VDSG: Optimizing Multimodal AI Pipelines with Metaxy

May 29, 2026
Dr. Georg Heiler
Hernan Picatto
Abstract
The AI era has shifted compute toward complex multimodal pipelines, where a small input or code change can trigger expensive downstream recomputation. This talk introduces Metaxy, an open source Python framework for sample-level metadata versioning and field-level provenance. We show how Metaxy lets pipelines recompute only what actually changed, so startups, enterprises, and researchers can iterate faster, reduce cloud costs, and build more efficient AI workflows.
Date
May 29, 2026 6:00 PM
Location

A1 Telekom Austria, Lassallestraße 9, 1020 Vienna


Abstract

The AI era has changed the economics of data pipelines. Multimodal workflows often fan out into transcription, image understanding, embeddings, classification, extraction, review, and downstream analytics. Without precise metadata, a small change can invalidate too much of the pipeline and force costly reruns.

This talk introduces Metaxy, an open source Python framework for sample-level metadata versioning and field-level provenance. Metaxy acts as a control layer for incremental data pipelines: it records which fields depend on which upstream fields, computes what became stale, and lets the execution layer process only the affected records.
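To make the control-layer idea concrete, here is a minimal sketch in plain Python of field-level provenance for a single sample. It is not Metaxy's API: the dependency map, the code versions, and the helpers `field_versions` and `stale_fields` are hypothetical. Each derived field's version hashes its own code version together with the versions of its upstream fields, so a change propagates only along the fields that actually depend on it.

```python
import hashlib

# Hypothetical field-level dependency map (illustration only, not Metaxy's API):
# each derived field lists the upstream fields it reads.
FIELD_DEPS = {
    "transcript": ["audio"],
    "faces": ["frames"],
    "summary": ["transcript"],
}

# Hypothetical code versions: bumping one invalidates only that field
# and whatever depends on it.
CODE_VERSIONS = {"transcript": "v1", "faces": "v1", "summary": "v1"}


def _digest(*parts: str) -> str:
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()[:16]


def field_versions(raw_hashes: dict[str, str]) -> dict[str, str]:
    """Compute a provenance version for every field of one sample.

    Raw fields keep their content hash; a derived field's version hashes
    its name, its code version, and the versions of its upstream fields.
    """
    versions = dict(raw_hashes)

    def resolve(field: str) -> str:
        if field not in versions:
            upstream = [resolve(dep) for dep in FIELD_DEPS[field]]
            versions[field] = _digest(field, CODE_VERSIONS[field], *upstream)
        return versions[field]

    for field in FIELD_DEPS:
        resolve(field)
    return versions


def stale_fields(stored: dict[str, str], current: dict[str, str]) -> set[str]:
    """Derived fields whose recorded version no longer matches the current one."""
    return {f for f in FIELD_DEPS if stored.get(f) != current[f]}


# Usage: only the audio hash changes between the two runs.
before = field_versions({"audio": "a1", "frames": "f1"})
after = field_versions({"audio": "a2", "frames": "f1"})
print(sorted(stale_fields(before, after)))  # ['summary', 'transcript']
```

In this sketch, bumping `CODE_VERSIONS["faces"]` instead would mark only `faces` as stale, which mirrors how a code change can be scoped to a single field.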

We focus on practical examples across startup, enterprise, and research settings:

  • avoiding wasteful recomputation in multimodal AI workflows,
  • using field-level lineage to decide what can be skipped,
  • keeping provenance queryable across document, audio, image, and tabular data,
  • connecting Metaxy with orchestrators and compute engines such as Dagster, Ray, and Slurm.

The core idea is simple: if an audio file changes, recompute the transcription, but do not rerun face recognition, because it depends only on the video stream.
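At the batch level, the same rule decides which records each step has to process. The sketch below is a hypothetical illustration, not Metaxy code: `STEP_INPUTS`, `plan_work`, and the recorded input hashes are assumptions. A step is scheduled for a sample only when the hashes of the fields it reads differ from those recorded at its last run.

```python
# Hypothetical mapping from each processing step to the fields it reads.
STEP_INPUTS = {
    "transcription": ("audio",),
    "face_recognition": ("video",),
}


def plan_work(samples, last_run):
    """Return, per step, the sample IDs whose input hashes changed since the last run.

    samples:  {sample_id: {field_name: content_hash}}
    last_run: {(sample_id, step): tuple of input hashes recorded at that run}
    """
    plan = {step: [] for step in STEP_INPUTS}
    for sample_id, fields in samples.items():
        for step, inputs in STEP_INPUTS.items():
            current = tuple(fields[name] for name in inputs)
            # Missing history or changed inputs both mean the step must run.
            if last_run.get((sample_id, step)) != current:
                plan[step].append(sample_id)
    return plan


# Usage: clip-2 received a new audio track; no video changed.
samples = {
    "clip-1": {"audio": "a1", "video": "v1"},
    "clip-2": {"audio": "a2-new", "video": "v2"},
}
last_run = {
    ("clip-1", "transcription"): ("a1",),
    ("clip-1", "face_recognition"): ("v1",),
    ("clip-2", "transcription"): ("a2",),
    ("clip-2", "face_recognition"): ("v2",),
}
print(plan_work(samples, last_run))
# {'transcription': ['clip-2'], 'face_recognition': []}
```

Only `clip-2` is sent to transcription, and face recognition gets no work because no video hash changed.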

Dr. Georg Heiler
Authors
Senior data expert
Georg is a co-founder of Jubust, a senior data expert at Magenta, and an ML-ops engineer at ASCII. He solves challenges with data; his interests include geospatial graphs and time series. At Magenta he is moving the data platform to the cloud, and at ASCII he handles large-scale multimodal ML-ops challenges.
Hernan Picatto
Authors
Researcher & data scientist

Researcher at the Supply Chain Intelligence Institute Austria (ASCII).

My research interest lies at the intersection of forecasting extreme events and causal analysis in high-frequency time series.