ML project configuration management

May 8, 2021
Georg Heiler

Configuration handling can get quite messy in complex machine learning pipelines. Facebook Research created Hydra to cope with this. It loads hierarchical configuration files and makes it easy to compose and reconfigure such workflows, for example by overriding individual values from the command line.

Think of a simple project setup as outlined below:

├── conf
│   └── config.yml 
├── my_ml_script.py

NOTICE: the configuration already lives in its own folder to future-proof the project, e.g. to later add specific configurations for each model that derive from a shared base configuration.

The config.yml file contains:

db:
  driver: mysql
  user: omry

The configuration can then be used directly in the Python script my_ml_script.py:

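import logging

import hydra
from omegaconf import DictConfig, OmegaConf

log = logging.getLogger(__name__)


# Hydra loads conf/config.yml, applies any command-line overrides and passes
# the resulting DictConfig to the function as cfg.
@hydra.main(config_path="conf", config_name="config.yml")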
def my_app(cfg: DictConfig) -> None:
    log.info(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()

Then call it from the command line, overriding configuration values as needed:

python my_ml_script.py db.driver=postgresql

However, this does not work directly in our beloved Jupyter notebook environment, because @hydra.main is designed to wrap a script's entry point and parse its command-line arguments. Getting it to work in Jupyter is not complicated, though: it only takes a few more imports, and the initialize function has to be called manually before composing the configuration:

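from hydra.experimental import compose, initialize
from omegaconf import OmegaConf

# Using initialize as a context manager lets the notebook cell be re-run
# without failing because Hydra is already initialized.
with initialize(config_path="conf"):
    # Compose the configuration and apply overrides, as on the command line.
    cfg = compose(config_name="config.yml", overrides=["db.driver=postgres", "db.user=me"])

print(OmegaConf.to_yaml(cfg))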

Author: Georg Heiler, senior data expert. My research interests include large geo-spatial time and network data analytics.