The Automated Statistician

A project aimed at:

Automating feature selection, transformation and understanding.
Automating data collection and experiment design.
Automating model discovery and explanation (the most important aim).
Automating allocation of computational resources.
Automating inference.

The project was motivated by a shortage of data scientists and machine learning experts, which creates a need for a system that automates model discovery from data.

Main ingredients:

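An open-ended language of models (in the time-series regression case, a grammar of Gaussian process kernels combined by addition and multiplication, rather than graphical models).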
A search procedure over the language of models.
A principled approach to evaluating models.
A procedure to automatically explain the models, i.e. to make their assumptions explicit and communicate them to non-experts.

For regression on time-series data, the search procedure builds a search tree whose nodes correspond to combinations of Gaussian process kernels, and the quantity maximized is the marginal likelihood.
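
To make the kernel search concrete, here is a minimal sketch in pure Python. It is not the project's actual implementation. It assumes three base kernels (SE, PER, LIN) with fixed, hypothetical hyperparameters, a fixed noise level, and a greedy expansion that keeps only the best node at each level. A real system would also optimize the hyperparameters at every node. All function names (greedy_search, log_marginal_likelihood, etc.) are made up for illustration.

```python
import math

# Base kernels with fixed, hypothetical hyperparameters (an assumption;
# a real system would optimize these at every node of the search).
def se(x, y, ell=1.0):
    return math.exp(-0.5 * ((x - y) / ell) ** 2)

def per(x, y, period=1.0, ell=1.0):
    return math.exp(-2.0 * math.sin(math.pi * abs(x - y) / period) ** 2 / ell ** 2)

def lin(x, y):
    return x * y

BASE = {"SE": se, "PER": per, "LIN": lin}

# Sums and products of valid kernels are again valid kernels.
def add(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)

def mul(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_marginal_likelihood(kernel, xs, ys, noise=0.1):
    """log p(y | x) for a zero-mean GP: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution: solve L z = y, so that y^T K^-1 y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)

def greedy_search(xs, ys, depth=2):
    """Greedy walk down the search tree: expand only the best node per level."""
    def score(k):
        return log_marginal_likelihood(k, xs, ys)

    # Root level: pick the best single base kernel.
    best_name, best_k = max(BASE.items(), key=lambda item: score(item[1]))
    best_score = score(best_k)
    for _ in range(depth - 1):
        # Children of the current node: combine it with each base kernel via + or *.
        children = []
        for name, k in BASE.items():
            children.append((f"({best_name} + {name})", add(best_k, k)))
            children.append((f"({best_name} * {name})", mul(best_k, k)))
        scored = [(score(k), name, k) for name, k in children]
        top_score, top_name, top_k = max(scored, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no child improves the marginal likelihood; stop here
        best_name, best_k, best_score = top_name, top_k, top_score
    return best_name, best_score

# Toy periodic time series (synthetic data, for illustration only).
xs = [0.1 * i for i in range(30)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
name, score = greedy_search(xs, ys)
print(name, round(score, 2))
```

The sketch scores nodes by the raw marginal likelihood, as described above. Because the hyperparameters are fixed rather than fitted, the scores only illustrate the mechanics of the search and say little about which structure a tuned system would select.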