Managing Complex Data Science Experiment Configurations with Hydra - Presented by Michal Karzynski - 2022
Details
Title: Managing Complex Data Science Experiment Configurations with Hydra - Presented by Michal Karzynski
Author(s): EuroPython Conference
Link(s): https://www.youtube.com/watch?v=bNGu8A6F3-8
Rough Notes
Data science experiments:
- Have complex configurations.
- Make it easy to lose track of which parameter values worked best.
- Often produce results that are difficult to reproduce.
The Python package Hydra is meant to help with these problems.
Suppose you have the following file, config.yaml, which defines parameters for some model:

    model:
      a: 1
      b: 2
      c: 3

To use this in a Python machine learning experiment:
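
    import logging

    import hydra
    from omegaconf import DictConfig, OmegaConf

    # Hydra configures Python logging, so INFO messages reach the console and the run's log file.
    logger = logging.getLogger(__name__)

    @hydra.main(version_base="1.2", config_path=".", config_name="config")
    def my_experiment(cfg: DictConfig) -> None:
        logger.info("Starting experiment")
        logger.info(f"Model config:\n{OmegaConf.to_yaml(cfg.model)}")

    if __name__ == "__main__":
        my_experiment()
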
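The above assumes that config.yaml is in the same directory as the Python script: a relative config_path is resolved relative to the file that declares the decorated function, and config_name is given without the .yaml extension.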
Hydra also works well together with MLflow. The Hydra configuration can be passed to mlflow.log_params to save each parameter (after flattening the hierarchically nested dictionary), or the whole YAML file can be logged via mlflow.log_artifact.
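
A minimal sketch of both MLflow options, assuming `mlflow` and `omegaconf` are installed and an MLflow run is active when the logging helper is called. The helper names `flatten_config` and `log_config_to_mlflow` and the file name `resolved_config.yaml` are hypothetical, chosen for illustration; joining nested keys with dots (`model.a`) is one common convention, not something Hydra or MLflow prescribes.

```python
from typing import Any, Dict


def flatten_config(nested: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten a nested dict into one level with `sep`-joined keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections such as `model`.
            flat.update(flatten_config(value, full_key, sep))
        else:
            # Leaf value (lists are kept as-is and treated as a single value).
            flat[full_key] = value
    return flat


def log_config_to_mlflow(cfg: Any, config_file: str = "resolved_config.yaml") -> None:
    """Log a Hydra/OmegaConf config to the active MLflow run (hypothetical helper)."""
    # Imported here so that flatten_config works even without these packages installed.
    import mlflow
    from omegaconf import OmegaConf

    # Option 1: one MLflow parameter per leaf value, e.g. "model.a" -> 1.
    container = OmegaConf.to_container(cfg, resolve=True)
    mlflow.log_params(flatten_config(container))

    # Option 2: save the whole config as a YAML file and attach it to the run.
    OmegaConf.save(cfg, config_file)
    mlflow.log_artifact(config_file)
```

Inside `my_experiment`, this could be called as `with mlflow.start_run(): log_config_to_mlflow(cfg)`. The flattened parameters make each value visible on its own for comparing runs, while the YAML artifact preserves the original nesting.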