Business Intelligence (BI)
Project & Professional Experience
Client name anonymized, Oslo
11/2016 – 12/2017
Description of activities
• Introducing Hadoop stack to the organization
• Designing and building Hadoop and Spark clusters in AWS
• Developing and driving the technical roadmap for data and development infrastructure
• Defining knowledge roadmaps for internal employees in the field of Hadoop
• Machine Learning (evolutionary algorithms, feature engineering, neural networks) using Python
• Testing new technologies
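The feature-engineering part of the machine-learning work can be illustrated with a small pure-Python sketch. The record shape and the two derived features (ratio to the mean, lag-1 difference) are hypothetical examples of hand-crafted features; the production pipelines used PySpark and scikit-learn rather than plain Python:

```python
from statistics import mean

def engineer_features(rows):
    """Derive simple features from raw value records.

    Two common hand-crafted features: the ratio of each value to the
    overall mean, and the difference to the previous value (lag-1).
    """
    values = [r["value"] for r in rows]
    overall_mean = mean(values)
    features = []
    prev = None
    for r in rows:
        features.append({
            "value": r["value"],
            "ratio_to_mean": r["value"] / overall_mean,
            "lag1_diff": 0.0 if prev is None else r["value"] - prev,
        })
        prev = r["value"]
    return features

# Toy input: three raw records with a single numeric column.
rows = [{"value": v} for v in (2.0, 4.0, 6.0)]
feats = engineer_features(rows)
```

The same logic maps directly onto Spark DataFrame operations (window functions for the lag, an aggregate for the mean) when run at cluster scale.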
As a Big Data developer my focus was on all levels of the stack. I built two clusters in AWS: one a pure Hadoop cluster (HDP 2.6), the other a Spark cluster with separate storage in S3. The latter launches on demand with dynamic resources. My tasks were architecture, maintenance and upgrades of the clusters. Both clusters rely heavily on Spark as the computational engine, where I mostly used Scala (for data integration), Python (for data science/ML) and SparkSQL.
Hive serves as the data warehouse on top of HDFS, providing users with a SQL API.
Tested new visualization tools (Zeppelin, Druid, re:dash, Superset…) to find the best possible stack.
Key technology terms: Hortonworks, Ambari, HDFS, MapReduce2, YARN, Zookeeper, Hive, Zeppelin, Spark, Storm, Ranger, Redis, Flume, Sqoop, Druid, scikit-learn, Jupyter.
PyCharm and Jupyter were used for the data science work. The main focus was on feature engineering, machine learning, evolutionary algorithms and neural networks.
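As a flavour of the evolutionary-algorithm work, here is a minimal truncation-selection loop in pure Python. The fitness function, population size and mutation scale are toy assumptions for illustration; the real experiments ran on the Spark cluster with scikit-learn and Jupyter:

```python
import random

def evolve(fitness, dim=5, pop_size=20, generations=100, seed=0):
    """Minimal evolutionary loop: truncation selection with
    Gaussian mutation and elitist replacement."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = [
            [g + rng.gauss(0, 0.3) for g in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children                # elitism: parents survive
    return max(pop, key=fitness)

# Toy fitness: maximize -sum(x^2), optimum at the origin.
best = evolve(lambda ind: -sum(g * g for g in ind))
```

Because the parents survive each generation, the best fitness never decreases, which makes the loop a simple but robust baseline for black-box optimization.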
Apache Hadoop, Python, Scala
Location: University of Ljubljana
Over 4 years’ experience with open source technologies (big data). Installation, administration and configuration of Hadoop ecosystems (Apache and Hortonworks distributions) in AWS. Building and configuring Spark clusters and writing Spark code (Scala, PySpark and SparkR).