Deep Bayesian Active Learning with Image Data - 2017
Details
Title: Deep Bayesian Active Learning with Image Data
Author(s): Gal, Yarin and Islam, Riashat and Ghahramani, Zoubin
Link(s): https://dl.acm.org/doi/10.5555/3305381.3305504
Rough Notes
Develops an active learning (AL) framework for high-dimensional data such as images, using Bayesian deep learning to obtain the model uncertainty that AL needs.
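Some challenges in combining AL with deep learning:
- AL repeatedly retrains the model on small labelled sets, whereas deep models usually need large amounts of data.
- Many acquisition functions rely on model uncertainty, which standard deep networks rarely represent.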
Existing AL methods for image data often rely on kernel-based models.
This paper uses Bayesian convolutional neural networks (Bayesian CNNs). After placing a prior on the network weights, approximate inference is performed with dropout-based techniques (MC dropout). In classification, this means sampling from what is called the dropout distribution, i.e. running \(T\) stochastic forward passes with dropout kept active at test time, and averaging the softmax outputs to get a Monte Carlo estimate of \(p(y=c|\mathbf{ x },\mathcal{ D })\).
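The Monte Carlo estimate above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-layer fully connected network, the weights `W1` and `W2`, the dropout rate `p_drop`, and the helper name `mc_dropout_probs` are all assumptions made for the example (the paper uses convolutional networks). Here \(T\) is the number of stochastic forward passes.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_probs(x, W1, W2, T=100, p_drop=0.5, rng=None):
    """Run T stochastic forward passes with dropout kept active.

    Returns an array of shape (T, N, C): one softmax output per pass,
    per input, per class. The toy network is hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(T):
        h = np.maximum(x @ W1, 0.0)            # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop   # fresh dropout mask per pass
        h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
        samples.append(softmax(h @ W2))
    return np.stack(samples)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # 4 inputs with 8 features each
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))    # 3 classes

probs = mc_dropout_probs(x, W1, W2, T=50, rng=rng)
p_mean = probs.mean(axis=0)      # MC estimate of p(y=c | x, D), shape (4, 3)
```

Each row of `p_mean` is a probability vector over classes. The per-pass array `probs` is kept because uncertainty measures that look at disagreement between passes need it.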
Acquisition functions used include:
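- Maximum predictive entropy, i.e. maximum entropy of \(p(y=c|\mathbf{ x },\mathcal{ D })\).
- BALD (Bayesian Active Learning by Disagreement), i.e. maximum mutual information between the prediction and the model parameters.
- Maximum variation ratios, i.e. maximum of \(1 - \max_c p(y=c|\mathbf{ x },\mathcal{ D })\).
- Maximum mean standard deviation, i.e. the standard deviation of \(p(y=c|\mathbf{ x },\omega)\) over the posterior, averaged over classes.
- Random acquisition (uniform baseline).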
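As a rough sketch, three of these scores and the random baseline can be computed from an array of sampled class probabilities of shape (T, N, C), with T dropout samples, N pool points, and C classes. The function names, the `eps` smoothing constant, and the `select_top_k` helper are assumptions made for illustration. BALD is left out of this sketch.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    # Entropy of the averaged predictive distribution.
    p_mean = probs.mean(axis=0)
    return -(p_mean * np.log(p_mean + eps)).sum(axis=-1)

def variation_ratios(probs):
    # 1 - max_c p(y=c | x, D): lack of confidence in the top class.
    return 1.0 - probs.mean(axis=0).max(axis=-1)

def mean_std(probs):
    # Std of each class probability across samples, averaged over classes.
    return probs.std(axis=0).mean(axis=-1)

def random_scores(probs, rng):
    # Baseline: uniform random scores, one per pool point.
    return rng.random(probs.shape[1])

def select_top_k(scores, k):
    # Indices of the k pool points with the highest acquisition score.
    return np.argsort(-scores)[:k]

# Toy pool: point 0 is confidently class 0, point 1 is maximally uncertain.
probs = np.array([[[1.0, 0.0], [0.5, 0.5]],
                  [[1.0, 0.0], [0.5, 0.5]]])
scores = predictive_entropy(probs)
chosen = select_top_k(scores, k=1)   # picks the uncertain point
```

In this toy pool both dropout samples agree on every point, so `mean_std` is zero everywhere, while predictive entropy and variation ratios still rank point 1 above point 0.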
Details on the BALD score: it is approximated by sampling from the approximate posterior (the dropout distribution) and summing over the classes. Recall the actual score is:
\[ \mathbb{ H }[y|\mathbf{ x },\mathcal{ D }] - \mathbb{ E }_{p(\omega|\mathcal{ D })}[\mathbb{ H }[y|\mathbf{ x },\omega]] \]
The Monte Carlo approximation is then:
\[ -\sum_{c}\left(\frac{1}{T}\sum_{t}\hat{p}_c^t\right)\log\left(\frac{1}{T}\sum_{t}\hat{p}_c^t\right) + \frac{1}{T}\sum_{c,t}\hat{p}_c^t \log \hat{p}_c^t \]
where \(\hat{p}_c^t\) is the probability that input \(\mathbf{ x }\) takes class \(c\) under parameters \(\hat{\omega}_t\), the \(t\)-th MC sample of the model parameters, and \(T\) is the number of samples.
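The Monte Carlo approximation translates almost term by term into array code. The sketch below assumes the sampled probabilities \(\hat{p}_c^t\) are stored in an array of shape (T, N, C), with N pool points. The function name `bald` and the `eps` constant, added to avoid \(\log 0\), are assumptions made for illustration.

```python
import numpy as np

def bald(probs, eps=1e-12):
    """MC estimate of the BALD score for each pool point.

    probs: array of shape (T, N, C) holding p_hat[t, n, c].
    """
    # First term: entropy of the sample-averaged prediction.
    p_mean = probs.mean(axis=0)
    entropy_of_mean = -(p_mean * np.log(p_mean + eps)).sum(axis=-1)
    # Second term: average over samples of the per-sample entropy.
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_of_entropy

# Two samples that disagree confidently on the class of one point.
disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
# Two samples that agree the point is ambiguous.
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])

scores = bald(np.concatenate([disagree, agree], axis=1))
# The first point scores about log 2, the second about 0.
```

The two toy points have the same predictive entropy, but only the first gets a high BALD score. The score rewards disagreement between posterior samples rather than ambiguity that every sample shares.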