SAO: Metaxy - Field-Level Metadata Management for Incremental Multimodal ML Pipelines

Abstract
Multimodal machine learning pipelines that process video, audio, and images incur substantial GPU costs. Existing data orchestrators operate at table or asset granularity: when any part of a processing step changes, they invalidate and recompute entire downstream datasets, even fields unaffected by the change. We present Metaxy, an open-source Python library that introduces field-level dependency tracking at record granularity as a standalone metadata layer. Features are declared as Pydantic models organized into a directed acyclic graph (DAG), where nodes represent individual data fields and edges encode data flow. For each record, Metaxy computes hierarchical version hashes that combine user-specified code versions with upstream record versions, propagating changes along graph edges. At query time, the resolve_update operation compares expected and stored provenance to return exactly those records that are new, stale, or orphaned, enabling downstream systems to process only what changed. Metaxy supports pluggable metadata backends (DuckDB, ClickHouse, BigQuery, Delta Lake, and others) and integrates with orchestrators such as Dagster and distributed compute frameworks such as Ray. In production at Anam, Metaxy processes millions of training samples for a video generation model, eliminating redundant GPU recomputation and preserving full per-record lineage for reproducibility.
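The versioning idea in the abstract can be illustrated with a minimal sketch. This is not Metaxy's API: `field_version` and `diff_provenance` are hypothetical helpers written only with the Python standard library, and the choice of SHA-256 is an assumption. The sketch shows a field's version hash being derived from its code version plus its upstream versions, and a comparison of expected against stored hashes that sorts records into new, stale, and orphaned.

```python
import hashlib


def field_version(code_version: str, upstream_versions: list[str]) -> str:
    """Hash a field's code version together with its upstream versions.

    Hypothetical helper for illustration; not part of Metaxy.
    """
    h = hashlib.sha256()
    h.update(code_version.encode())
    # Sort upstream versions so the result does not depend on input order.
    for version in sorted(upstream_versions):
        h.update(b"\x00")  # separator between parts
        h.update(version.encode())
    return h.hexdigest()


def diff_provenance(
    expected: dict[str, str], stored: dict[str, str]
) -> dict[str, set[str]]:
    """Classify record IDs by comparing expected and stored version hashes."""
    shared = expected.keys() & stored.keys()
    return {
        "new": expected.keys() - stored.keys(),
        "stale": {rid for rid in shared if expected[rid] != stored[rid]},
        "orphaned": stored.keys() - expected.keys(),
    }


# Two downstream fields with different upstream dependencies.
audio_v1 = field_version("audio-code-1", [])
video_v1 = field_version("video-code-1", [])
transcript_v1 = field_version("asr-code-1", [audio_v1])
frames_v1 = field_version("frames-code-1", [video_v1])

# Bumping the audio code version changes the transcript hash only.
audio_v2 = field_version("audio-code-2", [])
transcript_v2 = field_version("asr-code-1", [audio_v2])
```

In this sketch, `transcript_v2` differs from `transcript_v1`, while `frames_v1` is unaffected because it does not depend on the audio field. That is the property that lets a downstream system skip recomputation for fields a change does not reach.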
Date
May 26, 2026 3:30 PM
Location
San Jose, California
Links
- Lightning deck: georgheiler.com/slides/2026-metaxy-sao-workshop
- Workshop: Supporting Our AI Overlords (SAO)
- Metaxy docs: docs.metaxy.io/stable

Authors
senior data expert
Georg is a co-founder of Jubust, a senior data expert at Magenta, and an ML-ops engineer at ASCII.
He solves challenges with data. His interests include geospatial graphs
and time series. Georg is transitioning the data platform of Magenta to the cloud
and handles large-scale multimodal ML-ops challenges at ASCII.