Pretraining Task Diversity and the Emergence of Non-Bayesian in-Context Learning for Regression - 2023
Details
Title: Pretraining Task Diversity and the Emergence of Non-Bayesian in-Context Learning for Regression
Author(s): Raventós, Allan and Paul, Mansheej and Chen, Feng and Ganguli, Surya
Link(s): http://arxiv.org/abs/2306.15063
Rough Notes
This work analyzes the In-Context Learning (ICL) capabilities of pretrained transformers (PTs), i.e., their ability to learn from a few examples without updating any weights. The paper studies ICL performance on linear regression while varying the task diversity of the pretraining dataset. Experiments show a task diversity threshold for the emergence of ICL: below this threshold, the PT cannot solve unseen regression tasks and instead behaves like a Bayesian estimator with the non-diverse pretraining task distribution as its prior. Beyond the threshold, the PT significantly outperforms that estimator and behaves like the optimal predictor under a Gaussian prior over all tasks, including those unseen during pretraining; its behaviour aligns with ridge regression.
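To make the two baselines concrete, below is a minimal NumPy sketch of the estimators the PT is compared against: ridge regression (Bayes-optimal under the full Gaussian task prior) and the posterior mean under a uniform prior over the finite pretraining task set (the paper's discrete minimum-MSE estimator). The noise model, the unit-variance priors, and all function names here are my assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two baseline estimators,
# assuming noisy linear regression y = w @ x + eps with eps ~ N(0, sigma2)
# and tasks w ~ N(0, I_d).
import numpy as np

def ridge_predictor(X, y, x_query, sigma2):
    """Posterior-mean prediction under the full Gaussian prior w ~ N(0, I_d).

    Equivalent to ridge regression with regularization lambda = sigma2; this
    is the Bayes-optimal predictor when tasks come from the continuous prior.
    """
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + sigma2 * np.eye(d), X.T @ y)
    return x_query @ w_hat

def discrete_bayes_predictor(X, y, x_query, tasks, sigma2):
    """Posterior-mean prediction under a uniform prior over a finite set of
    pretraining tasks (rows of `tasks`, shape [M, d]).

    This is the estimator the PT mimics below the task diversity threshold.
    """
    resid = y[None, :] - tasks @ X.T          # per-task residuals, shape [M, n]
    log_post = -0.5 * np.sum(resid ** 2, axis=1) / sigma2
    log_post -= log_post.max()                # stabilize the softmax
    post = np.exp(log_post)
    post /= post.sum()
    w_bar = post @ tasks                      # posterior-mean weight vector
    return x_query @ w_bar

# Toy comparison on a task NOT in the pretraining set: the discrete estimator
# stays anchored to the pretraining tasks, while ridge adapts to the context.
rng = np.random.default_rng(0)
d, n, M, sigma2 = 8, 16, 4, 0.25
tasks = rng.normal(size=(M, d))               # small, non-diverse task set
w_unseen = rng.normal(size=d)                 # unseen regression task
X = rng.normal(size=(n, d))
y = X @ w_unseen + rng.normal(scale=np.sqrt(sigma2), size=n)
x_q = rng.normal(size=d)
print("ridge:   ", ridge_predictor(X, y, x_q, sigma2))
print("discrete:", discrete_bayes_predictor(X, y, x_q, tasks, sigma2))
print("true:    ", x_q @ w_unseen)
```

The sketch mirrors the paper's framing: the task diversity threshold is where the PT's in-context predictions stop tracking the discrete posterior mean and start tracking ridge regression.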