Big Data Engineer

Freelance Big Data Engineer on freelance.de
Europe
en  |  sl  |  no
€100/hour
0356 OSLO
04.10.2019

Brief introduction

Data Engineer with 5 years of experience in big data, infrastructure as code, automation, cloud and machine learning engineering. My focus is on open source technologies in the cloud.
Prior to that, 10 years with various Oracle technologies.

I offer

IT, Development
  • Apache Hadoop
  • Python
  • Scala
  • Amazon Web Services (AWS)
  • Apache Spark
  • Cloud (general)
Research, Science, Education
  • Data Science

Focus
  • Terraform
  • Infrastructure as Code
  • Hortonworks

Project & Work Experience

Hortonworks Data Platform consultant
IBM, Oslo
7/2018 – 2/2019 (8 months)
Services industry

Project description

Responsibilities:
• writing Terraform and Ansible scripts for dynamic provisioning of HDP 2.6 and HDP 3.0 clusters
• using VMware and AWS cloud infrastructures
• integration of IBM tools with HDP: Spectrum Scale (formerly GPFS), BigSql, DataServer Management, SPSS Analytic Server
• provisioning HDP clusters using the ansible-hortonworks GitHub repository
• manual and automatic installs of HDP services such as Ambari, HDFS, YARN, Ranger, Hive, Spark…
• configuration and administration of HDP clusters
• embedding HashiCorp tools (Terraform, Consul, Vault, Packer) to automate cluster provisioning

Details:
IBM's customer wanted to automate the creation of environments for data storage and analysis. All secrets are stored in Vault and the cluster configuration is defined in Consul. The solution reads the configuration and builds an HDP cluster based on it. Everything is automated and dynamic; clusters for test, production, storage, analysis, ... are provisioned in various sizes.
Terraform scripts read configuration values from Consul, store secrets in Vault, and provision the infrastructure. Ansible scripts, called from Terraform, set up the HDP architecture on that infrastructure, delivering a configured and functional HDP cluster ready to use.
When provisioning a default cluster, a certain number of instances is created, an HDP cluster is installed on them, a connection to the existing Spectrum Scale storage is established, and IBM's BigSql is installed. Different configurations in Consul allow creating different HDP clusters with different services, according to need.
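As a rough illustration of this flow (not the actual project code, which lives in Terraform and Ansible), a minimal Python sketch that pulls cluster settings from Consul's KV store and a secret from Vault's KV v2 engine over their HTTP APIs; the hostnames, key paths and field names are hypothetical placeholders.

```python
import requests

CONSUL = "http://consul.example.internal:8500"   # hypothetical Consul address
VAULT = "http://vault.example.internal:8200"     # hypothetical Vault address
VAULT_TOKEN = "s.xxxxxxxx"                        # injected at runtime, never hard-coded

def consul_kv(key: str) -> str:
    """Read a single raw value from Consul's KV store."""
    r = requests.get(f"{CONSUL}/v1/kv/{key}", params={"raw": "true"})
    r.raise_for_status()
    return r.text

def vault_secret(path: str) -> dict:
    """Read a secret from Vault's KV v2 engine (assumed mounted at 'secret/')."""
    r = requests.get(f"{VAULT}/v1/secret/data/{path}",
                     headers={"X-Vault-Token": VAULT_TOKEN})
    r.raise_for_status()
    return r.json()["data"]["data"]

# Cluster shape comes from Consul, credentials from Vault (hypothetical keys).
node_count = int(consul_kv("hdp/test-cluster/node_count"))
hdp_version = consul_kv("hdp/test-cluster/hdp_version")
ambari_admin_pw = vault_secret("hdp/test-cluster/ambari")["admin_password"]

print(f"Provisioning {node_count} nodes for HDP {hdp_version} ...")
# In the real setup Terraform consumes these values and then calls the
# ansible-hortonworks playbooks to install and configure the HDP services.
```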

Skills used

Apache Hadoop, DevOps, Software architecture / modelling


Data Scientist
Norwegian Business School, Oslo
1/2018 – 6/2018 (6 months)
Universities and research institutions

Project description

Responsibilities:
• AWS administrator – planning and building environments, automation of processes – pay as you go
• Collecting data sources and doing feature engineering
• Data analysis in Python (Pandas, NumPy) using Jupyter and PyCharm (see the sketch after this list)
• Introducing Data Science and AWS to the researchers, helping them use AWS, Jupyter, Linux, Python…
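A minimal pandas sketch of the kind of feature engineering mentioned above; the dataset, column names and derived features are hypothetical examples, not the actual research data.

```python
import pandas as pd

# Hypothetical example data; the real work used the researchers' datasets.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "purchase_date": pd.to_datetime(
        ["2018-01-03", "2018-02-11", "2018-01-20", "2018-03-05", "2018-04-17"]),
    "amount": [120.0, 80.5, 230.0, 15.0, 99.9],
})

# Simple time-based features derived from the raw timestamp.
df["purchase_month"] = df["purchase_date"].dt.month
df["purchase_weekday"] = df["purchase_date"].dt.weekday

# Aggregate features per customer, a typical feature-engineering step.
features = (
    df.groupby("customer_id")
      .agg(total_spent=("amount", "sum"),
           n_purchases=("amount", "count"),
           avg_amount=("amount", "mean"))
      .reset_index()
)
print(features)
```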

Skills used

Data Science, Python, Amazon Web Services (AWS)


Big Data Specialist
Deloitte, Oslo
11/2016 – 12/2017 (1 year, 2 months)
Retail

Project description

Responsibilities:
• Introducing Hadoop stack to the organization
• Designing and building Hadoop and Spark clusters in AWS
• Developing and driving the technical roadmap for data and development infrastructure
• Defining knowledge roadmaps for internal employees in the field of Hadoop
• Machine Learning (evolutionary algorithms, feature engineering, neural networks) using Python
• Testing new technologies

Details:
As a Big Data developer, my focus was on all levels of the stack. I built two clusters in AWS: one a pure Hadoop cluster (HDP 2.6), the other a Spark cluster with separate storage in S3. The latter launches on demand with dynamic resources. My tasks were the architecture, maintenance and upgrades of the clusters. Both clusters rely heavily on Spark as the computational engine, where I mostly used Scala (for data integration), Python (for data science / ML) and Spark SQL.
Hive is the data warehouse on top of HDFS, providing users with a SQL API.
I tested new visualization tools (Zeppelin, Druid, re:dash, Superset…) to find the best possible stack.
Key technology terms: Hortonworks, Ambari, HDFS, MapReduce2, YARN, ZooKeeper, Hive, Zeppelin, Spark, Storm, Ranger, Redis, Flume, Sqoop, Druid, scikit-learn, Jupyter.
PyCharm and Jupyter were used for the data science work. The main focus was on feature engineering, machine learning, evolutionary algorithms and neural networks.
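As a minimal illustration of the Spark/Hive setup described above, a PySpark sketch that reads raw data from the S3-backed storage and publishes it as a Hive table for the SQL users; the bucket, paths, column names and table names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Hive support exposes the result through the SQL API mentioned above.
spark = (SparkSession.builder
         .appName("sales-integration")          # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Read raw data from S3 (s3a:// on a Hadoop/Spark cluster).
raw = spark.read.parquet("s3a://example-bucket/raw/sales/")   # hypothetical path

# A light data-integration step, then persist the result as a Hive table.
cleaned = raw.dropDuplicates(["order_id"]).filter("amount > 0")
cleaned.write.mode("overwrite").saveAsTable("analytics.sales_cleaned")

# Downstream users can now query the table through Spark SQL / Hive.
spark.sql("SELECT COUNT(*) AS n FROM analytics.sales_cleaned").show()
```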

Skills used

Apache Hadoop, Python, Scala


Big Data Engineer
University of St. Gallen, St. Gallen
11/2015 – 10/2016 (1 year)
Universities and research institutions

Project description

Responsibilities:
• Big Data Full Stack Developer
• Researching, recommending and implementing Big Data technologies
• Developing and driving the technical roadmap for data and development infrastructure
• Administering Hortonworks & Apache Hadoop clusters in the cloud (OpenStack)
• Architecture, design, data modelling and data architecture recommendations
• Preparing analytical and data visualization environments
• Data analysis using Spark SQL, PySpark, Java, Hive and SparkR in Zeppelin and RStudio

Details:
The University of St. Gallen is involved in a project owned by Switch (www.switch.ch); the University's task is to provide infrastructure that covers researchers', students' and customers' needs for working on distributed systems with "Big Data" technologies.
My tasks include decision making around technologies, cluster architecture, cluster set-up on the Switch OpenStack cloud and cluster configuration, as well as testing, preparing and introducing the technologies to the users.
Key technology terms: OpenStack, Ambari, HDFS, MapReduce2, YARN, ZooKeeper, Hive, Zeppelin, Spark, Storm, Slider, Ranger, Redis, Elastic, Flume, MySQL, Sqoop, Cloudbreak.
The cluster computing framework offered to the users is Apache Spark, with a focus on Spark SQL, PySpark and SparkR.
Users interact with the cluster via the CLI, Apache Zeppelin or RStudio on client nodes. Statistical tools like Gauss, Stata or Matlab on top of Big Data technologies are under testing and evaluation.
I maintain 5 clusters, four of them Hortonworks distributions and one Apache. The main tool for administration is Ambari; some services (Zeppelin, Spark) are installed manually and maintained from the CLI.
With some eager students, we have started a Big Data Club at the University, with the goal of bringing Big Data closer to the business students.
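For illustration, a small Python sketch of the kind of day-to-day administration done through Ambari's REST API (listing service states on one of the clusters); the hostname, cluster name and credentials are hypothetical placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

AMBARI = "http://ambari.example.internal:8080"   # hypothetical Ambari server
CLUSTER = "research-hdp"                          # hypothetical cluster name
AUTH = HTTPBasicAuth("admin", "********")         # credentials kept out of code in practice
HEADERS = {"X-Requested-By": "ambari"}            # header Ambari requires on modifying calls

def service_states(cluster: str) -> dict:
    """Return {service_name: state} for every service in the cluster."""
    url = f"{AMBARI}/api/v1/clusters/{cluster}/services"
    r = requests.get(url, params={"fields": "ServiceInfo/state"},
                     auth=AUTH, headers=HEADERS)
    r.raise_for_status()
    return {item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
            for item in r.json()["items"]}

for name, state in sorted(service_states(CLUSTER).items()):
    print(f"{name:<15} {state}")
```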

Skills used

Apache Hadoop, Software architecture / modelling, Python


Education

1998 – 2004
University of Ljubljana

Qualifications

Hadoop, Hortonworks, YARN, Spark, Python, Machine Learning, Neural Networks, Ambari, Apache, Scala, Java, Storm, NoSQL, Redis, DevOps, HashiCorp, Ansible, Open Source

About me

I am a software developer with years of experience in programming, databases and information systems. Although I have worked a lot on big data architectures in the past, my career focus is now on automation, infrastructure as code and machine learning engineering.
Over 4 years of experience with open source (big data) technologies: installation, administration and configuration of Hadoop ecosystems (Apache and Hortonworks distributions) on AWS, building and configuring Spark clusters, and writing Spark code (Scala, PySpark and SparkR).

Personal details

Languages
  • English (Fluent)
  • Slovenian (Native)
  • Norwegian (Fluent)
  • German (Basic)
  • Serbian (Good)
Willingness to travel
Europe
Work permit
  • European Union
Age
39
Professional experience
19 years and 3 months (since 07/2000)
Project management
17 years
