The Automated Statistician
A project aimed at:
- Automating feature selection, transformation and understanding.
- Automating data collection and experiment design.
- Automating model discovery and explanation (Important aim).
- Automating allocation of computational resources.
- Automating inference.
The project was motivated by a shortage of data scientists and machine learning experts, which creates a need for a system that automates model discovery from data.
Main ingredients:
- An open-ended language of models (#DOUBT graphical models? In the time-series regression case below, the models are combinations of Gaussian process kernels.)
- A search procedure over the language of models.
- A principled approach of evaluating models.
- A procedure to automatically explain the models, i.e. make the assumptions more explicit and communicate them to non-experts.
For regression on time-series data, the search procedure builds a search tree whose nodes correspond to combinations of Gaussian process kernels, and the quantity maximized is the marginal likelihood.
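A minimal sketch of what such a search could look like, not the project's actual implementation. It assumes a greedy strategy, three base kernels (SE, PER, LIN) with fixed, hand-picked hyperparameters, and a fixed noise level; a real system would optimize the hyperparameters of every candidate and would likely penalize model complexity. All function names here are hypothetical, and the code uses only the standard library to stay dependency-free.

```python
import math

# Base kernels with fixed hyperparameters (an assumption made for brevity).
def se(a, b):   # squared exponential: smooth local variation
    return math.exp(-0.5 * (a - b) ** 2)

def per(a, b):  # periodic with period 1.0
    return math.exp(-2.0 * math.sin(math.pi * (a - b)) ** 2)

def lin(a, b):  # linear: trends
    return a * b

BASE = {"SE": se, "PER": per, "LIN": lin}

def combine(name_a, k_a, op, name_b, k_b):
    """Return (name, kernel) for the sum or product of two kernels."""
    if op == "+":
        return f"({name_a} + {name_b})", lambda a, b: k_a(a, b) + k_b(a, b)
    return f"({name_a} * {name_b})", lambda a, b: k_a(a, b) * k_b(a, b)

def log_marginal_likelihood(kernel, xs, ys, noise=0.1):
    """log p(y | x) for a zero-mean GP: -y'K^-1y/2 - log|K|/2 - n*log(2*pi)/2."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Cholesky factorization K = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: solve L z = y, so that y' K^-1 y = z' z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return (-0.5 * sum(v * v for v in z)
            - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2.0 * math.pi))

def greedy_search(xs, ys, depth=2):
    """Expand the best kernel so far by '+ base' or '* base' at each level."""
    scored = [(log_marginal_likelihood(k, xs, ys), name, k)
              for name, k in BASE.items()]
    best_score, best_name, best_k = max(scored, key=lambda t: t[0])
    for _ in range(depth):
        children = [combine(best_name, best_k, op, nb, kb)
                    for op in "+*" for nb, kb in BASE.items()]
        scored = [(log_marginal_likelihood(k, xs, ys), name, k)
                  for name, k in children]
        score, name, k = max(scored, key=lambda t: t[0])
        if score <= best_score:  # no child improves on its parent: stop
            break
        best_score, best_name, best_k = score, name, k
    return best_name, best_score

# Toy time series: a linear trend plus a periodic component.
xs = [0.1 * i for i in range(40)]
ys = [0.5 * x + math.sin(2.0 * math.pi * x) for x in xs]
print(greedy_search(xs, ys))
```

Each level of the loop corresponds to one level of the search tree: the children of a node are the current kernel combined with every base kernel by addition or multiplication, and only the best-scoring child is expanded further.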