NeurIPS 2022

Rough Notes

Cicero (Meta)

  • piKL objective (AlphaStar used it to aid exploration; these guys used it to model the other players / encourage human-like behaviour) - see the sketch after this list.
  • Difference between lying (not done for AI-safety reasons) and withholding information -> Noam: never having it lie improved the performance dramatically, although per professional human players, rarely lying may help.
  • Data augmentation to filter illegal moves (I guess this is in the paper)
  • Dora vs Cicero cases
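
  A sketch of the piKL objective as I understood it (my notation, not
  from their slides): regularize each agent's policy toward an anchor
  policy \(\tau\) imitated from human play,

  \[
    \pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}[u(a)]
      - \lambda\, D_{\mathrm{KL}}(\pi \,\|\, \tau),
  \]

  which has the closed form \(\pi^{*}(a) \propto \tau(a)\, e^{u(a)/\lambda}\):
  large \(\lambda\) stays human-like, small \(\lambda\) chases utility.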

UQ (IBM)

One-way comm

  • Infinitesimal jackknife?
  • Some coverage metrics: PICP, MPIW - see the sketch after this list.
  • Monotonic selective risk (MSR) - monotone error decrease for every subgroup - (#DOUBT how is a subgroup defined?)
  • Upper bound on conditional MI used in the eventual objective that imposes the sufficiency criterion, which is the theorem they proved.
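
  Quick reminder of the two coverage metrics (standard definitions,
  code my own, not from the talk):

    import numpy as np

    def picp(y, lower, upper):
        """Prediction Interval Coverage Probability: fraction of
        true values landing inside their predicted interval."""
        return np.mean((y >= lower) & (y <= upper))

    def mpiw(lower, upper):
        """Mean Prediction Interval Width: average interval size
        (narrower is better at a fixed coverage level)."""
        return np.mean(upper - lower)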

Two-way comm

  • Learning rejector and classifier concurrently - they cast the MILP as a differentiable relaxation - (#DOUBT relation between relaxing MILPs and the Gumbel-softmax trick; a guess sketched after this list).
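
  My guess at the MILP / Gumbel-softmax connection, not the paper's
  actual construction: the hard binary "reject?" variable is what makes
  the joint problem a MILP; sampling it with Gumbel-softmax gives a
  differentiable surrogate, so rejector and classifier can be trained
  jointly by SGD. The module names and flat reject cost are mine.

    import torch
    import torch.nn.functional as F

    class ClassifierWithRejector(torch.nn.Module):
        def __init__(self, dim, n_classes):
            super().__init__()
            self.clf = torch.nn.Linear(dim, n_classes)  # classifier head
            self.rej = torch.nn.Linear(dim, 2)          # {predict, reject} logits

        def forward(self, x, tau=1.0):
            # Gumbel-softmax relaxes the discrete reject decision;
            # it anneals to a hard 0/1 choice as tau -> 0.
            gate = F.gumbel_softmax(self.rej(x), tau=tau)
            return self.clf(x), gate

    def joint_loss(logits, gate, y, reject_cost=0.3):
        ce = F.cross_entropy(logits, y, reduction="none")
        # pay cross-entropy when predicting, a flat cost when rejecting
        return (gate[:, 0] * ce + gate[:, 1] * reject_cost).mean()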

Richer comm

  • Their disentangled model - (#TODO Look into exact mechanism to encourage disentanglement)
  • See the UQ360 paper; I assume all/most other papers mentioned here relate to it.

Learning agents (LG)

  • Keyword: compositional generalizability (w/ hierarchical RL)
  • Nonlinearity - the boiled cabbage vs. fried egg example.
  • Hypothesis ranking; popularity and recency bias.
  • Language priors? From LLMs?
  • EXAONE for scientific discovery makes use of actual academic papers. CDEd paper.

Human-in-the-Loop (Toloka)

  • Due to data drift, anomalies, etc.

Keynote 2

  • "Prediction Policy Problems", Kleinberg et al. 2015
  • Keywords: algorithmic auditing, regression discontinuity design
  • Perdomo et al. 2022 - dropout crisis
  • Knowles et al. 2015 - Dropout Early Warning System
  • Individual vs. environment and malleable vs. non-malleable grouping reveals the impact of system-level factors
  • Kirchner, ProPublica 2017 - probabilistic genotyping of DNA (DNA mixture interpretation paper by NIST)

Keynote 3

  • Conformal prediction (missed) - minimal reminder sketch below.
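
  Since I missed it, a minimal split-conformal recipe for my own
  reference (the standard construction, nothing from the talk itself):

    import numpy as np

    def split_conformal(y_cal, yhat_cal, yhat_test, alpha=0.1):
        """Split conformal: calibration residuals give distribution-free
        intervals with ~(1 - alpha) marginal coverage."""
        n = len(y_cal)
        # finite-sample-corrected quantile of the calibration residuals
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        q = np.quantile(np.abs(y_cal - yhat_cal), level)
        return yhat_test - q, yhat_test + q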

Keynote 4

  • Model-centric, data-centric and human-centric AI.
  • Semantics are not captured well, at least not enough to get to the stickman eating ramen.
  • What makes a good user interface?
  • "Training language models to follow instructions with human feedback" - can we perform something like experimental design after treating the LLM as a prior?
  • Some challenges: incentivizing users to work with AI.
  • Michael Bernstein (Stanford) - two cultures of evaluation in AI vs. HCI.

Town hall

  • "Exhibitors" instead of "Sponsors" - no tiers, no emphasis on the scientific aspect.
  • Stats: in person - 10,300; virtual - 3,160 (for reference, 2010 had 1,354 participants). 9,634 submissions, 2,905 accepted.

Keynote -2

  • Bias in the data generation.
  • All datasets are biased, some are useful.
  • Dataset selection, NeurIPS 2022
  • OpenML, HuggingFace and of course Kaggle
  • Big-Bench
  • Reduced, Reused and Recycled NeurIPS 2021 D&B

CML-4-Impact

  • Some hot topics: partial discovery, the need for high-quality simulators
  • Keyword: sequential discovery - learn from some treatment, then propagate outwards.
  • Counterfactuals generated by the machine as an expression of what it thinks something looks like, in its own language
  • go.topcoder.com/schmidtcentercancerchallenge - cells here are imagined to be in steady state.
  • Human in the loop to transcend Kleinberg's theorem?

Questions from poster session

  • Is this prior elicitation?
  • Difference between a graphical model and an actual causal DAG

Information theory & cognitive science

Talk 1 Palmer

  • Things aren't driven to optimality; there are only "sensors" that are best in comparison to other things.
  • Flash-lag illusion
  • Optimal prediction slide references - the loss function \(I_{\text{past}} - \beta I_{\text{future}}\) is also important, or rather \(I_{\text{future}}\) as another information-related value for cognitive (something). See the note after this list.
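
  My reading of that slide: it is the information-bottleneck form of
  optimal prediction, where a representation \(R\) of the past is
  compressed while retaining information about the future,

  \[
    \min_{p(r \mid x_{\text{past}})}\;
      I(X_{\text{past}}; R) - \beta\, I(R; X_{\text{future}}),
  \]

  with \(\beta\) trading compression of the past against prediction of
  the future.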

Talk 2 Gornet

  • Intuition: spatial structure influences temporal correlations
  • Visual maps relation to reasoning slide

Talk 3 Sajid

  • Lots of evidence for a tradeoff between exploration and exploitation in, e.g., monkeys
  • Amortized active inference?

Talk 5 Tatyana

  • Hyperbolic geometry as a tool to represent hierarchical, tree-like networks
  • There is hierarchical structure in decision making

Talk 6

  • Ask for his thesis

Talk 7 Information is not enough

  • George Miller's paper is important
  • Duncan Luce - "Whatever Happened to Information Theory in Psychology?"
  • Ma, Husain, Bays 2014 Nature Neuroscience
  • Paper: in Neuron - "Deep Reinforcement Learning and Its Neuroscientific Implications"
  • Fang Z 2022 thesis - learning generalizable representations through compression
  • Keyword: rate-distortion RL - specific alg: rate-distortion policy gradient; also RD multi-agent. See the note after this list.
  • Paper: human RL with visual information
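
  The rate-distortion RL keyword, as I understand the usual
  capacity-limited formulation (my notation, not from the talk):
  maximize return minus an information cost on the state-action channel,

  \[
    \max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\sum_t r(s_t, a_t)\Big]
      - \beta\, I(S; A),
  \]

  where \(I(S;A)\) is the mutual information between states and emitted
  actions; a "rate-distortion policy gradient" presumably ascends this
  objective.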

Lightning talks, spotlight

  • Emergent communication (EC): using information-theoretic perspectives to make EC generalizable and translatable
  • Generalizing with overly complex representations: against the simplicity bias - great slide just before the limitations
  • Soft labels teach more than true labels (see also "On the Informativeness of Supervision Signals")
  • Unsupervised machine translation: given a good prior, it's possible to PAC-learn a translator

Andrew Gordon Wilson: when Bayesian orthodoxy goes wrong

  • Sequence completion example for Occam's razor
  • Modified Newtonian mechanics vs. General Relativity
  • Marginal likelihood \(\neq\) generalization - the marginal likelihood is not the thing to optimize if you want to generalize (see the decomposition after this list)
  • Marginal likelihood is great for scientific hypothesis testing
  • What about normalizing images in the BMA for MNIST?
  • What exactly is \(\mathcal{M}\)? -> empire strikes back if we assume a distribution over DAGs, like from the Indian Chefs Process?
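
  The decomposition that, I think, drives the "marginal likelihood
  \(\neq\) generalization" point: by the chain rule,

  \[
    \log p(\mathcal{D} \mid \mathcal{M})
      = \sum_{i=1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M}),
  \]

  so the early terms score predictions made essentially from the prior;
  a model with a diffuse prior can fit and generalize well after
  conditioning on data yet still have a low marginal likelihood.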

Kun Zhang

  • 3 dims (in causal discovery w/ observational data): i.i.d.-ness, parametric assumptions, and latent confounders
  • Wrong direction -> non-independent noise is the main principle - toy sketch after this list
    • Zhang has NeurIPS papers in 2019, '20, '22
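
  A toy illustration of the wrong-direction principle (my own example;
  uses a crude scale-dependence proxy rather than a proper independence
  test such as HSIC):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 5000)
    y = x ** 3 + rng.normal(0.0, 1.0, 5000)  # additive-noise model: x -> y

    def scale_dependence(u, resid):
        # crude proxy: does the residual's scale vary with |u|?
        return abs(np.corrcoef(np.abs(u), resid ** 2)[0, 1])

    # causal direction: residuals of y ~ poly(x) look independent of x
    r_fwd = y - np.polyval(np.polyfit(x, y, 3), x)
    # anti-causal direction: residuals of x ~ poly(y) depend on y
    r_bwd = x - np.polyval(np.polyfit(y, x, 3), y)

    print(scale_dependence(x, r_fwd))  # ~0
    print(scale_dependence(y, r_bwd))  # clearly larger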

Panel disc.

  • A benchmark does not test hypotheses
  • Samy Bengio's "changing test set" idea relates to the "Do ImageNet Classifiers Generalize to ImageNet?" paper
  • A good paper is simple and summarizable in ~2 sentences, and also scientifically rigorous
