Position Details

Type Not Specified

Experience mid

Exp. Years Not specified

Education Not specified

Category Data & Analytics

About this role

Build and maintain foundational data infrastructure and analytics tools for Amazon PeopleInsights eXperience (APIX). Design scalable data pipelines and data lake capabilities that transform complex HR Ops and employee experience data into actionable, self-serve insights.

Key Responsibilities

Design and implement high-performance, cost-efficient data lake infrastructure using the AWS big data stack: Spark, Hive, SQL, Apache Airflow, AWS Glue, EMR, S3, Redshift, and OLAP technologies
Collaborate with Business Intelligence Engineers to build semantic layers and optimize SQL queries
Follow software best practices including coding standards, code reviews, and testing
Work directly with customers to integrate new data types, curate data profiles, perform data quality checks, and incorporate feedback
Enable technical and non-technical customers to drive self-serve analytics and ad-hoc reporting; iterate via proofs of concept

Technical Overview

Responsible for high-performance, cost-efficient data lake infrastructure on AWS using Spark, Hive, SQL, Apache Airflow, AWS Glue, Amazon EMR, Amazon S3, and Amazon Redshift with OLAP technologies. Collaborates on semantic layers, query optimization, testing/code reviews, and data quality workflows (profiling and validation), while supporting self-serve reporting.

Ideal Candidate

The ideal candidate is a mid-level Data Engineer experienced in building scalable data lake infrastructure and data pipelines in a big data environment. They have hands-on experience with AWS big data stack components (Spark, Hive, SQL, Apache Airflow, AWS Glue, EMR, S3, Redshift) and can optimize SQL for fast, cost-efficient analytics while enabling self-serve reporting and data quality workflows.

Must-Have Skills

Experience working with big data, building data lakes and data processing services
Experience with one or more query languages (SQL, PL/SQL, HiveQL, SparkSQL)
Experience with one or more scripting languages (Python, Scala)
Design, build, and maintain scalable data pipelines and infrastructure
Ability to optimize SQL queries for fast, cost-efficient access

Tools & Platforms

Amazon Web Services (AWS), AWS big data stack, Amazon EMR (Elastic MapReduce), Amazon S3 (Simple Storage Service), Amazon Redshift, Apache Airflow, AWS Glue, Hive, Spark, SQL, PL/SQL, HiveQL, SparkSQL, Python, Scala

Required Skills

Data lake infrastructure, AWS big data stack, Spark, Hive, SQL, Apache Airflow, AWS Glue, EMR, S3, Redshift, OLAP, semantic layers, optimize SQL queries, coding standards, code reviews, testing, data quality checks, data profiling, self-serve reporting, proofs of concept, Python, Scala, PL/SQL, HiveQL, SparkSQL

Hard Skills

data lake infrastructure, AWS big data stack, Spark, Hive, SQL, Apache Airflow, AWS Glue, Amazon EMR, S3, Redshift, OLAP technologies, semantic layers in reporting and analysis, optimize SQL queries, coding standards, code reviews, testing, data processing systems, query language, PL/SQL, HiveQL, SparkSQL, scripting language, Python, Scala, data quality checks, data profiling, building data pipelines, analytics tools, self-serve reporting, proofs of concept, integrating new data types

Soft Skills

Collaborative problem-solving
Ability to work with a degree of ambiguity
Willingness to develop quick proofs of concept, iterate, and improve
Inclusion culture
Cross-functional collaboration with product managers, business intelligence engineers, and HR business partners
Work with technical and non-technical internal customers
Customer integration and feedback incorporation

Industry & Role

Industry SaaS

Job Function Engineer scalable AWS data infrastructure and pipelines to deliver people analytics capabilities.

Role Subtype Data Engineer

Tech Domains Amazon Web Services, Python, SQL / PostgreSQL, Data & Analytics, Linux, Apache Airflow

Keywords for Your Resume

Data Engineer, Amazon PeopleInsights eXperience (APIX), APIX, Data team, data infrastructure, data lake, data pipelines, scalable data pipelines, AWS big data stack, Spark, Hive, SQL, Apache Airflow, AWS Glue, EMR, Amazon EMR (Elastic MapReduce), S3, Amazon S3, Redshift, Amazon Redshift, OLAP, semantic layers, reporting and analysis, optimize SQL queries, coding standards, code reviews, testing, data quality checks, data profiling, self-serve reporting, proofs of concept, Python, Scala, PL/SQL, HiveQL, SparkSQL

Deal Breakers

Must have experience building data lakes and data processing services
Must have experience with one or more query languages (SQL, PL/SQL, HiveQL, SparkSQL)
Must have experience with one or more scripting languages (Python, Scala)

Data Engineer, Amazon PeopleInsights eXperience (APIX)
