Managing Complex Data Science Experiment Configurations with Hydra - Presented by Michal Karzynski - 2022

Details

Title: Managing Complex Data Science Experiment Configurations with Hydra - Presented by Michal Karzynski
Author(s): EuroPython Conference
Link(s): https://www.youtube.com/watch?v=bNGu8A6F3-8

Rough Notes

Data science experiments:

Have complex configurations.
Make it easy to lose track of which values worked best.
Often produce results that are difficult to reproduce.

The Python package Hydra is meant to help with these problems.

Suppose the following file, config.yaml, defines the parameters for a model:

model:
  a: 1
  b: 2
  c: 3

To use this in a Python machine learning experiment:

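import logging
import hydra
from omegaconf import DictConfig, OmegaConf

logger = logging.getLogger(__name__)

# Hydra loads ./config.yaml and passes it in as a DictConfig
@hydra.main(version_base="1.2", config_path=".", config_name="config")
def my_experiment(cfg: DictConfig) -> None:
    logger.info("Starting experiment")
    logger.info(f"model:\n{OmegaConf.to_yaml(cfg.model)}")

if __name__ == "__main__":
    my_experiment()  # Hydra parses the command line and supplies cfg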

The above assumes that config.yaml is in the same directory as the Python script (config_path="." is resolved relative to the file that declares the decorated function).

Hydra also works well with MLflow. The Hydra configuration can be passed to mlflow.log_params to record each parameter (after flattening the hierarchically nested dictionary), or the whole YAML file can be logged with mlflow.log_artifact.
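A minimal sketch of the flattening step, using a hypothetical helper flatten_params that joins nested keys with dots (for example model.a). It only descends into mappings, so a list in the config would be logged as a single value. The Hydra and MLflow calls are left as comments because they assume both packages are installed and that cfg comes from the decorated function above:

```python
from typing import Any, Dict, Mapping


def flatten_params(params: Mapping[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested mappings into one level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested sections such as "model"
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


config = {"model": {"a": 1, "b": 2, "c": 3}}
flat = flatten_params(config)
print(flat)  # {'model.a': 1, 'model.b': 2, 'model.c': 3}

# Inside the Hydra-decorated function (assumes hydra-core and mlflow are installed):
#   container = OmegaConf.to_container(cfg, resolve=True)
#   mlflow.log_params(flatten_params(container))
#   mlflow.log_artifact("config.yaml")  # path assumed relative to the launch directory
```

Dotted keys keep the original hierarchy readable in the MLflow UI while satisfying log_params, which expects a flat mapping of names to values.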