Position Details

Salary: $109K – $200K USD / year

Type: Not specified

Experience: Mid

Exp. Years: 8+ years

Education: Master's degree or equivalent experience

Category: Data & Analytics

About this role

Seeking a data engineer to develop large-scale cloud-based data pipelines supporting AI and ML initiatives, with expertise in distributed data systems and cloud platforms.

Key Responsibilities

Build scalable data pipelines
Enable analytics and personalization
Develop ML data workflows
Collaborate with data scientists
Design data ingestion and governance frameworks

Technical Overview

Environment includes Hadoop, Spark, Hive, Presto, Databricks, cloud storage solutions (S3, Azure Blob), and data formats like Parquet and ORC, with a focus on scalable data workflows.

Ideal Candidate

An experienced data engineer with over 8 years of expertise in building large-scale, fault-tolerant data pipelines on cloud platforms like AWS and Azure. Skilled in distributed data technologies such as Hadoop, Spark, and Hive, with a focus on data governance and ML workflows.

Must-Have Skills

8+ years of data engineering experience
Experience with distributed data technologies (Hadoop, Hive, Presto, Spark)
3+ years with cloud technologies (Databricks, S3, Azure Blob Storage, EMR, Athena, Glue)
Experience with streaming data

Nice-to-Have Skills

Building scalable, fault-tolerant data pipelines
Data ingestion and transformation frameworks
Data governance
ML data workflows

Tools & Platforms

Hadoop, Hive, Presto, Spark, Databricks, S3, Azure Blob Storage, AWS EMR, Athena, Glue, Delta, Parquet, ORC

Required Skills

Data Engineering, Hadoop, Hive, Presto, Spark, Databricks, S3, Azure Blob Storage, AWS EMR, Athena, Glue, Delta, Parquet, ORC

Hard Skills

Data Engineering, Large-scale data pipelines, Cloud platforms, Apache Hadoop, Hive, Presto, Spark, Databricks, S3, Azure Blob Storage, Notebooks, AWS EMR, Athena, Glue, Delta, Parquet, ORC

Soft Skills

Collaboration, problem-solving, ownership, innovation, adaptability

Industry & Role

Industry: Data Science, Cloud, SaaS

Job Function: Data engineering for large-scale cloud-based analytics and ML support

Keywords for Your Resume

Data Engineering, Large-scale data pipelines, Cloud platforms, Hadoop, Hive, Presto, Spark, Databricks, S3, Azure Blob Storage, AWS EMR, Athena, Glue, Delta, Parquet, ORC

Deal Breakers

Less than 8 years of experience
Lack of cloud platform experience (Databricks, S3, EMR)
No experience with distributed data technologies
Unwillingness to work in a collaborative environment
