Big-Data

Cost-efficient alternative to Databricks lock-in

Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as strong vendor lock-in and opaque costs. We show how a dedicated orchestrator (Dagster, via [dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can turn Databricks into an implementation detail, reduce costs, and improve developer productivity. It lets you take back control.

Sep 12, 2024

Cloud arbitrage for Spark pipelines

Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as strong vendor lock-in and opaque costs. We show how a dedicated orchestrator (Dagster, via [dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can turn Databricks into an implementation detail, reduce costs, and improve developer productivity. It lets you take back control.

Jun 21, 2024

Introduction to Geostatistics

Nov 10, 2023

Visual analytics of mobility network changes observed using mobile phone data during COVID-19 pandemic

Feb 8, 2023

Efficient Temporal Graph Analytics

Oct 19, 2022

AI-based root cause analysis of CPD interference sources in DOCSIS networks

Good-quality network connectivity is ever more important. For hybrid fiber coaxial (HFC) networks, searching for upstream *high noise* used to be cumbersome and time-consuming. Even with machine learning, the task remains challenging because of the heterogeneity of the network and its topological structure. We present the automation of a simple business rule (largest change of a specific value), compare its performance with state-of-the-art machine-learning methods, and conclude that precision@1 can be improved by a factor of 2.3. Because it is best when a fault does not occur in the first place, we also evaluate multiple approaches to forecasting network faults, which would allow predictive maintenance on the network.

May 10, 2022

Identifying the root cause of cable network problems with machine learning

Mar 14, 2022

Monitoring supply networks from mobile phone data for estimating the systemic risk of an economy

Figure description: (a) Probability $p(s|c)$ of finding a supply link, $s_{ij}$, given that a communication link, $c_{ij}$, exists between firms $i$ and $j$, for communication links exceeding a given call duration, $d_{ij}$. Error bars denote the quartiles of a bootstrap simulation described in SI Text 1.

Oct 13, 2021

Mobility analytics

Nov 16, 2020

Run the latest version of Spark

Execute the latest version of Spark on HDP (Hortonworks Data Platform).

Aug 31, 2020