A leading global technology company in the healthcare industry is looking for a Senior Data Engineer to join their team.
You will be tasked with analyzing data, designing solutions, and building production-ready models.
Responsibilities:
- Design and build data pipelines (mostly in Spark) to process terabytes of data
- Orchestrate data tasks in Airflow to run on Kubernetes/Hadoop for the ingestion, processing, and cleaning of data
- Create Docker images for various applications and deploy them on Kubernetes
- Design and build best-in-class processes to clean and standardize data
- Troubleshoot production issues in our Elastic environment
- Tune and optimize data processes
- Work on proofs of concept for Big Data and Data Science
- Model high-volume datasets to maximize performance for our BI & Data Science teams
- Create real-time analytics pipelines using Kafka/Spark Streaming
Requirements:
- Undergraduate degree in Computer Science or relevant field
- Proven experience developing data processes in Spark
- Hands-on experience writing complex SQL queries
- Experience building ETL/data pipelines
- At least one year of exposure to Kubernetes and Linux containers (e.g. Docker)
- Experience with related/complementary open-source platforms and languages (e.g. Scala, Python, Java, Linux)
- Exposure to the following technologies: Airflow; Hive/HBase/Presto; Jenkins/Travis; Kafka; and cloud platforms such as Amazon AWS or Microsoft Azure
- Previous experience with relational (RDBMS) and non-relational databases
- Analytical and problem-solving experience in Big Data and distributed-processing environments
- Experience working on projects using Agile/Scrum methodologies
- Exposure to DevOps methodology
- Knowledge of data-warehousing principles, architecture, and their implementation in large environments