Freelancer Data Architect / Data Strategist / Data Engineer / Data Scientist / Data & AI Management on freelance.de

Data Architect / Data Strategist / Data Engineer / Data Scientist / Data & AI Management

last online 2 days ago
  • on request
  • 80687 Munich
  • Radius (up to 200 km)
  • en  |  de  |  fr
  • 05.02.2026

Brief introduction

I led a team of up to 20 people (Data Architects, Data Engineers, Data Scientists, and front-end/back-end developers) to build a Big Data analytics and AI platform. I have hands-on, practical experience with cloud and on-premises data technologies.

Qualifications

  • Apache Spark (4 yrs)
  • Azure Synapse Analytics
  • Big Data (4 yrs)
  • Data Mining (5 yrs)
  • Data Science (5 yrs)
  • Databricks (3 yrs)
  • Django
  • Document Retrieval
  • Internet of Things (IoT)
  • Java (general) (3 yrs)
  • Machine Learning
  • Microsoft Azure (4 yrs)
  • Salesforce.com

Project & professional experience

Senior Data Engineer
Client name anonymized, Düsseldorf
6/2025 – ongoing (9 months)
Life Sciences
Description of activities

Data Engineering for SAP–Databricks Pipelines in Environmental Sustainability Reporting

Design and implementation of enterprise-scale SAP-to-Databricks data pipelines to support environmental sustainability reporting for chemicals, packaging, and real substances. The objective was to enable accurate tracking of the global environmental footprint and sustainability metrics, supporting regulatory compliance and monitoring of chemicals under investigation across multiple business departments.

Comprehensive analysis of SAP business-process-driven data pipelines, including in-depth examination of SAP table schemas, joins, inheritance and self-inheritance patterns, and complex inter-table relationships. Each business process typically involved 5 to 12 interrelated SAP tables, requiring careful orchestration to preserve business semantics.

Definition and implementation of a Data Mesh–based data pipeline architecture, alongside Databricks schema modeling to support reliable daily ingestion of SAP data into the Databricks Lakehouse.

Implementation of PySpark-based polymorphic class structures in Databricks to capture and encapsulate SAP business logic for chemical and real substance processes. This abstraction enabled a highly automated, robust, and error-free flow of substance data from SAP to Databricks.
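The polymorphic class pattern described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the real pipeline operates on PySpark DataFrames inside Databricks, and all class and field names here (`SubstancePipeline`, `ChemicalSubstancePipeline`, `substance_id`) are invented for the example.

```python
from abc import ABC, abstractmethod

class SubstancePipeline(ABC):
    """Base class encapsulating shared SAP-to-lakehouse logic."""

    def run(self, rows):
        # Template method: shared validation first, then the
        # process-specific transform supplied by each subclass.
        valid = [r for r in rows if r.get("substance_id") is not None]
        return self.transform(valid)

    @abstractmethod
    def transform(self, rows):
        ...

class ChemicalSubstancePipeline(SubstancePipeline):
    def transform(self, rows):
        # Process-specific rule: tag chemical substances for reporting.
        return [{**r, "category": "chemical"} for r in rows]

class RealSubstancePipeline(SubstancePipeline):
    def transform(self, rows):
        return [{**r, "category": "real"} for r in rows]

raw = [{"substance_id": 1}, {"substance_id": None}, {"substance_id": 2}]
out = ChemicalSubstancePipeline().run(raw)  # invalid row is filtered out
```

The design point is that shared SAP semantics live once in the base class, while each substance process only overrides its own transform.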

Post-processing of raw SAP data using specialized PySpark classes and Databricks utility functions for joins, data cleaning, filtering, and aggregation. These transformations produced customized data pipelines tailored to the specific needs of different business units.

Publication of processed datasets as governed data products, registered in the enterprise data catalog and made available for subscription by multiple departments. Creation of secure Databricks views governed by Unity Catalog authorization groups, enabling controlled access by downstream consumers. Integration with Power BI to allow business users to consume curated sustainability data products.

Implementation of data lifecycle management and end-to-end data lineage tracking within Databricks to ensure transparency, traceability, and compliance.

Ongoing, on-demand development of new sustainability data pipelines, frequently requested by business units to address evolving compliance requirements and newly defined sustainability data logic.

Building agentic workflows using Databricks Agent Bricks to automate dynamic data pipelines and workflows, addressing subtle toxicity nuances among chemical substances with very similar compositions.

Development of real-time and near–real-time data ingestion workflows using Azure Data Factory, Microsoft Fabric, Databricks Lakehouse, and Azure Event Hubs to ingest data from external systems, perform preprocessing in Azure, and persist curated outputs as governed data products.

Skills used

Azure Synapse Analytics, Big Data, Databricks, Microsoft Azure

Senior Agentic AI Scientist
Client name anonymized, Munich
11/2024 – 5/2025 (7 months)
IT & Development
Description of activities

1) Agentic AI with Reasoning and Reinforcement Learning:

Design and implementation of an enterprise-grade Agentic AI reasoning system based on System 2 reasoning, operating on large-scale enterprise and historical customer data. The system autonomously plans and executes complex collections of customer tasks, leveraging long-term customer interaction history.

Development of novel Reinforcement Learning (RL)-based AI agents to intelligently automate traditionally manual enterprise processes. These agents synthesize new, efficient AI workflows using Chain-of-Thought (CoT) and Tree-of-Thought (ToT) reasoning frameworks.

Implementation of Tree-of-Thought reasoning using Monte Carlo Tree Search (MCTS), guided by a custom-designed Upper Bound Cost function. Introduction of a new GRPO-based RL reward formulation for scoring and optimizing reasoning trajectories. Utilization of Proximal Policy Optimization (PPO) and Process Reward Models (PRM) to guide agent behavior and enable robust temporal reasoning.
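The selection step of such a tree search can be sketched in miniature. This toy uses the standard UCB1 formula and a synthetic hidden reward in place of the custom Upper Bound Cost function and GRPO/PPO-based rewards described above; branch names and reward values are invented.

```python
import math
import random

def ucb1(total_reward, visits, parent_visits, c=1.4):
    # Unvisited children are explored first.
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    # Pick the child with the highest upper confidence bound.
    return max(children, key=lambda ch: ucb1(ch["reward"], ch["visits"], parent_visits))

random.seed(0)
children = [{"name": "step_a", "reward": 0.0, "visits": 0},
            {"name": "step_b", "reward": 0.0, "visits": 0}]
true_value = {"step_a": 0.3, "step_b": 0.8}  # hidden quality of each branch

for t in range(1, 201):
    ch = select_child(children, t)
    # Noisy reward observation stands in for a learned reward model.
    ch["reward"] += true_value[ch["name"]] + random.uniform(-0.05, 0.05)
    ch["visits"] += 1

best = max(children, key=lambda ch: ch["visits"])
```

The search concentrates visits on the higher-value branch while still probing the weaker one, which is the behavior the exploration term buys.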

Creation of a novel Risk Agent Model (RAM) to compute and evaluate the economic, ethical, and corporate risks associated with each AI agent decision. Integration of a Human-in-the-Loop (HITL) module that continuously monitors high-risk decisions, enabling human operators to approve, modify, or halt actions prior to execution.



2) Optimization of GPU Training Cost and Performance:
Optimization of foundational model training to significantly reduce GPU costs. Re-engineering of training pipelines for NVIDIA A100 GPUs by transitioning from pure FP32 computation to mixed precision (FP32 + FP16/BF16).

Detailed FLOP analysis and recomputation, including precise token-size selection based on target foundational model parameter counts, to improve computational efficiency without degrading model quality.
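The flavor of this FLOP and token-budget arithmetic can be sketched with two common approximations: training cost C ≈ 6·N·D FLOPs and the Chinchilla-style rule of thumb D ≈ 20·N tokens. The model size below is illustrative, not the client's actual configuration.

```python
def training_flops(n_params, n_tokens):
    # ~6 FLOPs per parameter per token (forward + backward pass).
    return 6 * n_params * n_tokens

def chinchilla_tokens(n_params, ratio=20):
    # Rule-of-thumb compute-optimal token budget: ~20 tokens per parameter.
    return ratio * n_params

n = 7_000_000_000            # assumed 7B-parameter model
d = chinchilla_tokens(n)     # ~140B training tokens
c = training_flops(n, d)     # ~5.88e21 FLOPs
```

Estimates like these let one size the token budget to the parameter count before committing GPU hours.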

Low-level CUDA optimization using C++, enabling large-scale AI models to achieve high performance on lower-cost GPUs, reducing reliance on premium hardware such as A100 while maintaining training throughput and stability.

Skills used

Computer Science, Mathematical Optimization, Neural Networks, Reinforcement Learning, Transformers

Data Scientist and Data Engineer for Corporate Open AI ChatBot and Azure AI Search RAG Development
Client name anonymized, ...
4/2024 – 10/2024 (7 months)
Oil and gas industry
Description of activities

Development of Internal Corporate ChatBot based on Azure RAG Vectorization


Development of a RAG indexer pipeline that ingests EnBW corporate documents (e.g., PDFs, PowerPoint files, images) from SharePoint and MS Teams into Azure Blob Storage and the data lake.

Development of Python code that uses Azure Form Recognizer, Azure Document Intelligence, and Azure AI Search to perform OCR text extraction from PDFs and images.
Implementation of semantic summarization of large documents using LangChain and OpenAI. Development of document chunking and data vectorization techniques to build a vectorization index that is used by the corporate chatbot.
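The chunking step mentioned above can be sketched as a sliding window with overlap, so text that straddles a chunk boundary appears in two chunks. Sizes here are character-based and illustrative; the real pipeline chunked by tokens before embedding each chunk.

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into fixed-size chunks with overlapping boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the document
    return chunks

doc = "".join(str(i % 10) for i in range(250))  # synthetic document
chunks = chunk_text(doc)
```

The overlap is what keeps a sentence split across two chunks retrievable from either side during vector search.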

Usage of Azure ML Studio to create an integrated vectorization indexing pipeline for the RAG system, based on Azure AI Search enrichment skillsets. Creation of Azure Functions to handle custom skillsets during document processing.

Development and extension of a Python-based frontend chatbot application using Django.

Development of a CI/CD pipeline for the automatic end-to-end ingestion, cracking, chunking, enrichment, and vectorization of incoming documents from SharePoint and Azure Data Factory.

Skills used

Django, Document Retrieval, Microsoft Azure

Azure Data Architect & Data Scientist
Client name anonymized, .
7/2023 – 3/2024 (9 months)
Telecommunications
Description of activities

Real-Time Data Pipeline for a Marketing Use Case:
Creation of a data pipeline using Azure Data Factory, Auto Loader, Databricks Delta Live Tables, and a Kafka client, which ingests and reorganizes marketing data from Salesforce. Creation of a data pipeline that captures customer data from LinkedIn campaigns, new company followers from the LinkedIn company page, and customer profiles from LinkedIn Sales Navigator. This data is stored in the Databricks Lakehouse and used to build a machine learning model for Next Best Action.

360° Contact Nurturing and Next Best Action ML Model:
Inference of contact interest and engagement from social media and Salesforce data (e.g., leads, campaigns, portfolio). Building of machine learning models for Contact Nurturing and Next Best Action, as well as a cross-selling product/portfolio recommendation system based on neural networks, XGBoost, and matrix factorization. This enables the marketing team to hand over quality leads to the sales team.

Skills used

Databricks, Data Mining, Data Science, Microsoft Azure, Salesforce.com

Data Scientist & Cloud Data Architect
Client name anonymized, .
7/2022 – 6/2023 (1 year)
Telecommunications
Description of activities

Dockerization of a machine-learning-based Customer Default Credit Rating financial application on the AWS Cloud to support monthly credit rating of customers.

Algorithmic extension and retraining of a Customer Credit Rating application that comprises multiple classifiers and regressors to predict whether a customer will default on their credit, and by how much.

Skills used

Amazon Web Services (AWS), Databricks, Data Science, Microsoft Azure

Data Architect, Overall Data Quality Testing
Client name anonymized, .
11/2021 – 6/2022 (8 months)
Insurance
Description of activities

Creation of data pipelines and data flows using Azure Data Factory and Databricks Spark clusters.

Hands-on programming of customized data flow logic in PySpark, Scala, and Python to trigger dedicated Spark jobs that ensure the end-to-end movement of data from Azure Data Lake through several PostgreSQL databases to Power BI.

Skills used

Databricks, Microsoft Azure

Lead Data Scientist
Client name anonymized, Munich
8/2021 – 11/2021 (4 months)
Healthcare
Description of activities

Creation of a customized ResNet architecture for image classification. A wide variety of CNN, RNN, and word-embedding architectures were explored. Using the resulting image feature vectors, a multi-stage pipeline of models such as XGBoost, CNN, LSTM, custom ResNet, and U-Net was assembled.
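The building block that distinguishes ResNet-style architectures is the residual ("skip") connection: the block learns a correction F(x) and outputs x + F(x). The tiny numeric sketch below uses fixed toy weights and dense layers in plain Python; a real block uses learned convolutions and batch normalization.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, w, b):
    # One dense layer: w is a weight matrix (rows = output units).
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(w, b)]

def residual_block(x, w1, b1, w2, b2):
    f = linear(relu(linear(x, w1, b1)), w2, b2)  # F(x)
    return [xi + fi for xi, fi in zip(x, f)]     # skip connection: x + F(x)

x = [1.0, -2.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]   # F(x) = 0 -> identity block
out = residual_block(x, w1, b1, w2, b2)
```

With zero second-layer weights the block reduces to the identity, which is exactly why deep residual stacks remain trainable: an unhelpful block can learn to pass its input through unchanged.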

Skills used

Data Science, Data Mining, Machine Learning, Neural Networks, Amazon Web Services (AWS)

Data Architect / Data Engineer
Client name anonymized, Munich
10/2020 – 7/2021 (10 months)
Banking
Description of activities

- Setup of a predictive maintenance cloud infrastructure with Azure IoT Hub
- Automation of stream data transfer from IoT devices to IoT Hub
- Implementation of a messaging queue to store IoT data in Azure Data Lake
- Creation of Databricks notebooks and Spark clusters for IoT sensor data analysis
- Use of PySpark, deep neural networks, and other data mining and machine learning techniques to analyze the data
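The queue-then-batch step between device ingestion and the data lake can be sketched as follows. The class, batch size, and the in-memory "flushed" list are illustrative stand-ins for an Event Hub consumer writing batches to storage.

```python
from collections import deque

class SensorBuffer:
    """Buffer incoming readings and flush them to storage in batches."""

    def __init__(self, batch_size=3):
        self.queue = deque()
        self.batch_size = batch_size
        self.flushed = []  # stands in for batch writes to the data lake

    def publish(self, reading):
        self.queue.append(reading)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        batch = [self.queue.popleft() for _ in range(len(self.queue))]
        if batch:
            self.flushed.append(batch)

buf = SensorBuffer(batch_size=3)
for temp in [21.5, 21.7, 22.0, 22.4]:
    buf.publish({"sensor": "s1", "temp": temp})
# Three readings were flushed as one batch; the fourth waits in the queue.
```

Batching like this trades a little latency for far fewer, larger writes, which is the usual pattern for high-frequency sensor streams.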

Skills used

Big Data, Data Mining, Internet of Things (IoT)

Data Architect (permanent position)
Client name anonymized, Munich
3/2020 – 2/2021 (1 year)
Banking
Description of activities

- Data Hub concept creation and architecture on the Azure Cloud
- Implementation of data pipelines for ingestion in Azure Data Factory
- Implementation of Azure Functions to pre-process ingested data
- Secure orchestration and movement of data from ADF to Azure Data Lake using Azure Key Vault permissions
- Optimal partitioning of data lake data in Databricks Delta Lake format
- Creation of multiple Apache Spark data aggregation scripts in PySpark and Scala
- Setup of Databricks clusters, computation of business KPIs, and exposure of results as a REST interface to Qlik and Power BI visualizations
- Setup of an Azure Active Directory tenant, enabling Databricks AD passthrough, and management of Azure Apps and Cosmos DB
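The date-based partitioning idea behind the data lake layout can be sketched in a few lines: records are routed to year/month/day folders so queries filtered on date only touch the relevant partitions. The base path and layout below are examples, not the client's actual paths.

```python
from datetime import date

def partition_path(base, d):
    """Build a Hive-style year/month/day partition path for a record date."""
    return f"{base}/year={d.year}/month={d.month:02d}/day={d.day:02d}"

p = partition_path("abfss://lake/kpis", date(2021, 3, 5))
```

Zero-padding the month and day keeps lexicographic directory listings in chronological order, which simplifies range scans.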

Skills used

Apache Spark, Big Data, Microsoft Azure

Data Scientist (permanent position)
Client name anonymized, Munich
4/2017 – 1/2019 (1 year, 10 months)
Banking
Description of activities

- Architecture and development of a machine learning Python API that is seamlessly integrated with the Spring Boot Java-based platform. The Python API uses a Flask/Gunicorn server.
- Development of a machine learning smart alert system based on a combination of a self-made outlier detection algorithm and publicly available outlier detection algorithms
- Development of a machine learning system for bank data prediction
- Use of XGBoost and an ensemble of other machine learning techniques for bank data forecasting
- Deep reinforcement learning for banking network optimization, using both convolutional and recurrent neural networks to train multiple agents
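Combining a custom rule with a standard detector, as in the smart alert system above, can be sketched as a simple voting ensemble: raise an alert only when both a z-score test and an IQR test agree. Thresholds, the voting rule, and the data are illustrative, not the production configuration.

```python
import statistics

def zscore_outliers(xs, threshold=2.5):
    """Indices whose distance from the mean exceeds `threshold` std devs."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return {i for i, x in enumerate(xs) if sd and abs(x - mu) / sd > threshold}

def iqr_outliers(xs, k=1.5):
    """Indices outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    qs = statistics.quantiles(xs, n=4)
    q1, q3 = qs[0], qs[2]
    iqr = q3 - q1
    return {i for i, x in enumerate(xs) if x < q1 - k * iqr or x > q3 + k * iqr}

def smart_alerts(xs):
    # Both detectors must agree before an alert is raised.
    return sorted(zscore_outliers(xs) & iqr_outliers(xs))

data = [10, 11, 9, 10, 12, 10, 11, 95]
alerts = smart_alerts(data)
```

Requiring agreement between detectors with different failure modes is a cheap way to cut false-positive alerts.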

Skills used

Big Data, Data Mining, Data Science

Data Architect / Data Engineer / Software Developer (permanent position)
Client name anonymized, Munich
6/2016 – 7/2019 (3 years, 2 months)
Banking
Description of activities

- Buildout of an on-premises data center

- Installation of on-premises clusters with numerous head and worker nodes using Linux Ubuntu

- Development of data ingestion pipelines to ingest data from file protocols (FTP/SFTP), relational databases (e.g., MySQL, PostgreSQL, Oracle, MS SQL Server), document databases (e.g., MongoDB), ERP and analytics systems (e.g., Greenplum, SAP), and file systems

- Architecture and development of a data prep layer in which imported data is aggregated using custom Spark functions in Scala, Spark SQL, and Spark UDFs

- Utilization of Hive and HiveContext to improve the speed of Spark data aggregation

- Performance tuning and optimal data structure setup of MongoDB to deliver KPI results over billions of data points within microseconds

- Development of microservices to access business KPI results from MongoDB and other internal databases

- Development of Java and Spring Boot web-based applications

Skills used

Apache Hadoop, Apache Spark, Apache Tomcat, Data Mining, Data Science, Java (general), Spring Framework

Education

Data Mining & Machine Learning
Dr.-Ing.
2015
RWTH Aachen University

Further skills

Big Data: Databricks Delta Lake, Lakehouse, Azure Data Factory, Data Lake, AAD, Hadoop, Spark, Kafka, AWS, data ingestion and integration

Machine Learning: PyTorch, Reinforcement Learning, deep neural network architectures, neural search architectures, data pipelines, scikit-learn, Weka

Web Full Stack: Java, Vue, Angular, JSP, MongoDB, MySQL

Mobile: Swift, iOS app development

Embedded Systems/IoT: Azure IoT Hub, Event Hub, Mosquitto

Personal data

Languages
  • English (native speaker)
  • German (fluent)
  • French (basic knowledge)
Willingness to travel
Radius (up to 200 km)
Home office
preferred
Profile views
2449
Age
48
Professional experience
21 years and 3 months (since 11/2004)
Project management
12 years
