Continual Learning (University of Pisa, Continual AI, AIDA)
Introduction
Content
Introduction and Motivation
Current state-of-the-art (SOTA) performance in many tasks is achieved by deep neural networks, and the dominant paradigm is to train them on huge, fixed datasets. Samples are typically very high dimensional (e.g. images). One aim of this lecture is to argue that, as this complexity grows, it becomes exponentially harder to gather a representative dataset up front.
Rather than building a model that can handle every possible problem it will face, we take an approach in the spirit of systems security: we want a methodology that handles the inevitable failures as gracefully as possible. This means adapting locally, where the model is wrong, and improving over time.
Some desiderata:
- A realistic time-scale, where data (and tasks) become available only over time.
- No access to previously encountered data.
- Constant computational and memory resources.
- Incremental development in the face of increasing complexity.
- Efficiency and scalability.
One of the main problems we face here is catastrophic forgetting: the phenomenon whereby neural networks abruptly and drastically forget previously learnt information when trained on new information, largely because gradient descent overwrites the weights that encoded the old knowledge.
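As a hedged illustration (not taken from the lecture), the sketch below trains a small network on two synthetic, conflicting binary tasks in sequence. The data, architecture, and hyperparameters are invented for demonstration, but the accuracy on the first task typically collapses after training on the second.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift):
    """Two Gaussian classes; `shift` changes the decision boundary between tasks."""
    x = torch.randn(2000, 2)
    y = (x[:, 0] + shift * x[:, 1] > 0).long()
    return x, y

task_a = make_task(shift=+1.0)
task_b = make_task(shift=-1.0)   # conflicting boundary -> interference with task A

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train(x, y, epochs=200):
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

@torch.no_grad()
def accuracy(x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

train(*task_a)
print("task A accuracy after training on A:", accuracy(*task_a))   # high
train(*task_b)                     # no access to task A data while learning B
print("task A accuracy after training on B:", accuracy(*task_a))   # typically drops sharply
print("task B accuracy after training on B:", accuracy(*task_b))
```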
We can imagine the setting as follows:
- A collection of experiences \(e_i\) which evolve over time.
- Each experience \(e_i\) consists of its own time index and a set of observed samples.
- Learning happens with respect to one and only one experience at a time.
See Definition 3 in the reference cited during the lecture for one formalization of the continual learning (CL) problem.
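A minimal sketch of this setting, under the assumption of a supervised stream; the `Experience` container and the constant-memory nearest-class-mean learner are hypothetical choices for illustration, not the lecture's formalization:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Experience:
    t: int              # time index of the experience
    x: np.ndarray       # observed samples, shape (n, d)
    y: np.ndarray       # labels, shape (n,)

class NearestMeanLearner:
    """Tiny learner that keeps one running mean per class (constant memory)."""
    def __init__(self):
        self.means, self.counts = {}, {}

    def update(self, x, y):
        for c in np.unique(y):
            xc = x[y == c]
            n_old = self.counts.get(c, 0)
            old = self.means.get(c, np.zeros(x.shape[1]))
            self.counts[c] = n_old + len(xc)
            self.means[c] = (old * n_old + xc.sum(0)) / self.counts[c]

    def predict(self, x):
        classes = list(self.means)
        dists = np.stack([np.linalg.norm(x - self.means[c], axis=1) for c in classes])
        return np.array(classes)[dists.argmin(0)]

def continual_fit(model, stream):
    for exp in stream:              # experiences arrive one at a time,
        model.update(exp.x, exp.y)  # with no access to earlier experiences' data
    return model
```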
Naming conventions are still under discussion: the field is variously called continual learning, incremental learning, lifelong learning, or continuous learning. Related paradigms include multi-task learning, meta-learning (learning to learn), transfer learning and domain adaptation, and online/streaming learning.
Catastrophic forgetting
This lecture focuses on catastrophic forgetting; specifically, we will look at a toy example with a single neuron and a deep learning example on Permuted and Split MNIST.
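As a rough sketch of how such benchmarks are commonly constructed (the number of experiences, data path, and subset size below are assumptions, not values from the lecture): Permuted MNIST applies a fixed random pixel permutation per experience, while Split MNIST partitions the dataset by label.

```python
import numpy as np
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Small subset of MNIST to keep the sketch fast; normalisation is omitted.
mnist = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())
subset = Subset(mnist, range(10_000))

def permuted_experience(dataset, permutation):
    """Flatten each image and re-order its pixels with the same fixed permutation."""
    xs, ys = [], []
    for img, label in dataset:
        xs.append(img.view(-1)[permutation])
        ys.append(label)
    return torch.stack(xs), torch.tensor(ys)

rng = np.random.default_rng(0)
permuted_experiences = [
    permuted_experience(subset, torch.from_numpy(rng.permutation(28 * 28)))
    for _ in range(3)    # e.g. three experiences, each with its own permutation
]

# Split MNIST instead partitions the data by label: experience k sees only
# digits {2k, 2k+1} (raw uint8 images here, transforms omitted for brevity).
split_experiences = []
for k in range(5):
    mask = (mnist.targets == 2 * k) | (mnist.targets == 2 * k + 1)
    split_experiences.append((mnist.data[mask], mnist.targets[mask]))
```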
Problem settings and Benchmarks
This lecture focuses on continual learning scenarios, existing benchmarks, and how to evaluate CL algorithms.
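One widely used evaluation protocol, sketched below on invented numbers, is the accuracy matrix of GEM-style metrics (Lopez-Paz & Ranzato, 2017): \(R_{i,j}\) is the test accuracy on experience \(j\) after training on experience \(i\), from which average accuracy and backward transfer (negative values indicate forgetting) are computed. This is offered as an illustration; the lecture may adopt different metrics.

```python
import numpy as np

# R[i, j]: accuracy on experience j after training on experience i (invented numbers).
R = np.array([[0.95, 0.10, 0.09],
              [0.60, 0.94, 0.11],
              [0.55, 0.70, 0.93]])
T = R.shape[0]

avg_acc = R[-1].mean()                                      # accuracy after the last experience
bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])   # backward transfer (< 0 means forgetting)

print(f"average accuracy: {avg_acc:.3f}")
print(f"backward transfer: {bwt:.3f}")
```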