Progress Update (June 2024)
preprocessing
updates
We just released a new set of chapters:
- Embeddings: this encompasses feature extraction tools such as PCA, MDS, UMAP, and shrunken centroids.
- Interactions and Nonlinear Features: interaction definitions and detection methods, basis expansions (polynomials, splines), and discretization.
- Overfitting: the first chapter of the “Optimization” part of the book. This sets up the next few chapters on resampling and model tuning.
We made liberal use of shinylive to include interactive demonstrations of a few concepts, such as UMAP parameters and natural splines. Converting these to a PDF for publishing will be interesting but, overall, shinylive has been a great tool to work with, and we think it is a great tool for learning.
We still have one more chapter to write for the “Preparation” part, on missing data. Because it will precede Overfitting, the chapter number for Overfitting will increment by one in the next release.
What We Are Working On Now
- Light/dark mode: Phil Karlton said that “There are only two hard things in Computer Science: cache invalidation and naming things.” Maybe the third is getting an elegant framework for seamlessly switching between light and dark mode. Quarto makes it a lot easier (and hopefully it will become trivial), but it is still not easy. I have started a test repo and a branch of this book to experiment with.
- A missing data chapter.
- A broad overview of resampling methods.
- Grid search (including nested resampling and racing).
- Iterative search.