NeurIPS 2022
Rough Notes
Cicero (Meta)
- piKL objective (AlphaStar used it to aid exploration; here it is used to model the other players / encourage human-like behaviour) - sketch of the objective after this list.
- Difference between lying (not done for AI-safety reasons) and withholding information -> Noam: never having it lie improved performance dramatically, although per professional human players, occasionally lying may help.
- Data augmentation to filter illegal moves (I guess this is in the paper)
- Dora vs Cicero cases
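My reconstruction of the piKL idea (my notation, not the slides): regularize the utility-maximizing policy toward a human imitation ("anchor") policy \(\tau\),
\[
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}[u(a)] - \lambda\, D_{\mathrm{KL}}(\pi \,\|\, \tau),
\]
which gives \(\pi^{*}(a) \propto \tau(a)\, e^{u(a)/\lambda}\): small \(\lambda\) recovers pure utility maximization, large \(\lambda\) pins play to the human-like anchor.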
UQ (IBM)
One-way comm
- Infinitesimal jackknife?
- Some coverage metrics: PICP (prediction interval coverage probability), MPIW (mean prediction interval width) - sketch after this list.
- Monotonic selective risk (MSR) - error decreases monotonically for every subgroup - (#DOUBT how is a subgroup defined?)
- Upper bound on cond. MI used in the eventual objective that imposes the sufficiency criterion, which is the theorem they proved.
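For my own reference, a minimal sketch of the two coverage metrics above (my code, generic definitions rather than anything from the talk):

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: fraction of targets inside their interval."""
    return np.mean((y >= lower) & (y <= upper))

def mpiw(lower, upper):
    """Mean Prediction Interval Width: average width of the predicted intervals."""
    return np.mean(upper - lower)
```

High PICP is trivial with huge intervals, which is presumably why MPIW is reported alongside it.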
Two-way comm
- Learning rejector and classifier concurrently - they cast the MILP as a differentiable relaxation - (#DOUBT relation between relaxing MILPs and Gumbel-softmax trick).
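On the #DOUBT above, my rough guess (not from the talk) at how the discrete classify/defer variable in the MILP could be relaxed with Gumbel-softmax so everything trains end-to-end; all names here are hypothetical:

```python
import torch.nn.functional as F

def relaxed_defer_loss(clf_logits, defer_logits, y, defer_cost=0.3, tau=0.5):
    """Soft 'classify or defer': a Gumbel-softmax sample over the two options
    weights the classifier loss against a fixed deferral cost, replacing the
    hard binary variable of the MILP with a differentiable one."""
    gate = F.gumbel_softmax(defer_logits, tau=tau)   # (batch, 2): [classify, defer]
    clf_loss = F.cross_entropy(clf_logits, y, reduction="none")
    return (gate[:, 0] * clf_loss + gate[:, 1] * defer_cost).mean()
```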
Richer comm
- Their disentangled model - (#TODO Look into exact mechanism to encourage disentanglement)
- See the UQ360 paper; I assume all/most other papers mentioned here will have some relation to it.
Learning agents (LG)
- keyword: Compositional generalizability (w/ hierarchical RL)
- nonlinearity - boiled cabbage vs. fried egg example.
- hypothesis ranking, and popularity and recency bias.
- language priors? from LLMs?
- EXAONE for scientific discovery, makes use of actual academic papers. CDEd paper.
Human-in-the-Loop (Toloka)
- Due to data drifts, anomalies etc.
Keynote 2
- Prediction Policy Problems Kleinberg et al. 2015
- Keywords: algorithmic auditing, regression discontinuity design
- Perdomo et al. 2022 - dropout crisis
- Knowles et al. 2015 - Dropout Early Warning System
- individual vs environment and malleable vs. non-malleable grouping reveals impact of system level factors
- Kirchner, ProPublica 2017 - probabilistic genotyping of DNA (DNA mixture interpretation paper by NIST)
Keynote 3
- Conformal prediction (missed)
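Since I missed this, a generic split-conformal regression sketch for my notes (the textbook recipe, not necessarily what was presented; function names are mine):

```python
import numpy as np

def split_conformal_interval(calib_residuals, test_pred, alpha=0.1):
    """Split conformal prediction: the (1-alpha) quantile of held-out absolute
    residuals gives intervals with marginal coverage >= 1 - alpha."""
    n = len(calib_residuals)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    q = np.quantile(calib_residuals, level)
    return test_pred - q, test_pred + q
```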
Keynote 4
- Model centric, data centric and human centric AI.
- Semantics are not captured well - at least not enough to get to the stickman-eating-ramen example.
- What makes a good user interface?
- Training language models to follow instructions with human feedback - can we perform something like experimental design after considering the LLM to be a prior?
- Some challenges: Incentivizing users to work w AI.
- Michael Bernstein (Stanford) 2 cultures of evaluation in AI vs HCI.
Town hall
- "Exhibitors" instead of "Sponsors" - no tiers, no emphasis on the scientific aspect.
- Stats: In person - 10300, virtual 3160. for reference, 2010 had 1354 participants. 9634 submissions, 2905 accepted.
Keynote -2
- Bias in the data generation.
- All datasets are biased, some are useful.
- Dataset selection Neurips2002
- OpenML, HuggingFace and ofc Kaggle
- Big-Bench
- Reduced, Reused and Recycled NeurIPS 2021 D&B
CML-4-Impact
- Some hot topics: partial discovery, needing high quality simulators
- Keywords: sequential discovery learns from some treatment then propagates outwards.
- Counterfactual generated by the machine as an expression of what it thinks something looks like, in its language
- go.topcoder.com/schmidtcentercancerchallenge - cells here are imagined to be in steady state.
- Human in the loop to transcend Kleinberg's theorem?
Questions from poster session
- Is this prior elicitation?
- Difference between graphical model and actual causal DAG
Information theory and cognitive science
Talk 1 Palmer
- Things aren't driven to optimality; there are only "best" sensors in comparison to other things.
- Flash lag illusion
- Optimal prediction slide references - the loss function \(I_{\text{past}} - \beta I_{\text{future}}\) is also important, or rather \(I_{\text{future}}\) as another information-related value, like for cognitive (something).
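Writing out how I read that objective (my notation): for a representation \(Z\) of the past input \(X_{\text{past}}\), the bottleneck tradeoff is
\[
\min_{p(z \mid x_{\text{past}})}\; I(Z; X_{\text{past}}) - \beta\, I(Z; X_{\text{future}}),
\]
i.e. compress the past as much as possible while keeping the bits that actually predict the future, with \(\beta\) setting the exchange rate between the two.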
Talk 2 Gornet
- Intuition: spatial structure influences temporal correlations
- Visual maps relation to reasoning slide
Talk 3 Sajid
- lots of evidence for a tradeoff between exploration and exploitation, e.g. in monkeys
- amortized Active inf?
Talk 5 Tatyana
- Hyperbolic geometry as a tool to represent hierarchical, tree-like networks (distance formula after this list)
- There is hierarchical structure in decision making
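For reference, the standard Poincaré-ball distance used for embedding tree-like hierarchies (textbook formula, not something shown in the talk):
\[
d(u, v) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\lVert u - v\rVert^{2}}{\left(1 - \lVert u\rVert^{2}\right)\left(1 - \lVert v\rVert^{2}\right)}\right);
\]
distances blow up near the boundary of the ball, which is what lets exponentially growing trees embed with low distortion.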
Talk 6
- Ask for his thesis
Talk 7 Information is not enough
- George Miller's paper is important
- Duncan Luce - "Whatever happened to information theory in psychology?"
- Ma, Husain, Bays 2014 Nature Neuroscience
- paper: in Neuron - Deep RL and its neuroscientific implications
- Fang Z. 2022 thesis - Learning generalizable representations through compression
- keyword: rate distortion RL - specific alg: rate distortion policy gradient - RD multi-agent
- paper: human RL w visual information
Lightning talks, spotlight
- Emergent communication (EC): using info-theoretic perspectives to make EC generalizable and translatable
- Generalizing w/ overly complex representations: against the simplicity bias - great slide just before the limitations
- soft labels learning more than true labels (see also "On the informativeness of supervision signals") - toy sketch after this list
- unsupervised machine translation: given a good prior, it's possible to PAC-learn a translator
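Toy illustration of the soft-label point above (my code, not the paper's): train against a full label distribution instead of a one-hot target, so mass on wrong-but-related classes still carries signal.

```python
import torch.nn.functional as F

def soft_label_loss(logits, soft_targets):
    """Cross-entropy against a probability vector (soft labels) rather than a
    one-hot class index; reduces to the usual loss when targets are one-hot."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```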
Andrew Gordon Wilson: when Bayesian orthodoxy goes wrong
- Sequence completion example for Occam's razor
- Modified Newtonian mechanics vs. General Relativity
- Marginal likelihood \(\neq\) generalization - the marginal likelihood is not the thing to optimize if you want to generalize (decomposition after this list)
- Marginal likelihood is great for scientific hypothesis testing
- What about normalizing images in the BMA for MNIST?
- What exactly is M -> empire strikes back if we assume a distribution over DAGs like from the Indian Chef's process?
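How I think about the marginal-likelihood point above (standard chain-rule identity, not from the slides):
\[
\log p(\mathcal{D} \mid \mathcal{M}) = \sum_{i=1}^{n} \log p(d_i \mid d_{1:i-1}, \mathcal{M}),
\]
so it scores how well the prior predicted the data sequentially, early terms included, whereas generalization after training only cares about the late-data terms - hence the two can disagree.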
Kun Zhang
- 3 dims (in causal disc. w/ obs data) are iid, parametric assumptions, and latent confounders
- wrong direction -> non-independent noise is the main principle (toy check after this list)
- Zhang has NeurIPS papers in 2019, '20, '22
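Toy check of the "noise independence breaks in the wrong direction" principle (my own sketch with a crude dependence score; not Zhang's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + 0.5 * x**3 + rng.uniform(-1, 1, size=5000)  # cause -> effect with additive independent noise

def residual_dependence(a, b):
    """Regress b on a cubic polynomial of a; return |corr(residual^2, a^2)| as a
    rough measure of dependence between residual and regressor."""
    A = np.vander(a, 4)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ coef
    return abs(np.corrcoef(resid**2, a**2)[0, 1])

print("x -> y:", residual_dependence(x, y))  # near zero: residual ~ independent of x
print("y -> x:", residual_dependence(y, x))  # typically larger in the anti-causal direction
```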
Panel disc.
- A benchmark does not test hypotheses
- Samy Bengio's "changing test set" idea relates to the "do imagenet classifiers generalize to imagenet" paper
- a good paper is simple and summarizable in 2ish sentences, and also scientifically rigorous