
Research Scientist II - AMZ9698004

at Amazon.com

📍 Seattle, WA, US 💰 $136K USD / year Posted April 14, 2026
Salary $136K USD / year
Type Full-Time
Experience senior
Exp. Years One year of research or work experience in the job offered, or as a Research Scientist, Research Assistant, Software Engineer, or a related occupation; must include 1 year of experience in the required skills listed below
Education PhD (or foreign equivalent) in Statistics, Psychometrics, Educational Measurement, Quantitative Psychology, Data Science, Industrial-Organizational (I/O) Psychology, or a related field
Category Science & Research

AWS is hiring multiple Research Scientist II candidates to perform and support psychometric research and operational work for certification exam development. The role covers item and test analyses, test form assembly, item bank health monitoring, standard setting, and statistical/psychometric modeling to ensure exam quality and validity.

  • Perform operational psychometric analysis (item and test analyses, item bank health, standard setting, job task analysis (JTA))
  • Build optimal test forms and pools using optimization techniques
  • Develop and apply statistical and psychometric modeling to evaluate exam validity, reliability, applicability, efficiency, and accuracy
  • Participate in research projects using ML, NLP, and GenAI to improve operational processes
  • Develop automation code in R or Python for psychometric workflow pipelines and report results

The technical scope includes operational psychometric tasks such as item response theory, classical test theory, equating and scaling, and form/pool assembly, along with complex test designs like linear-on-the-fly testing (LOFT) and computerized adaptive testing (CAT). The scientist will also develop automation code using R or Python and apply machine learning (ML) and/or natural language processing (NLP) techniques, while following ISO/IEC 17024:2012 and NCCA accreditation standards.

The ideal candidate is a Research Scientist II with a PhD in psychometrics, educational measurement, statistics, or a closely related quantitative field. They have at least 1 year of experience performing operational psychometric work for large-scale education, licensure, or certification programs, including item response theory, equating and scaling, standard setting, and job task analysis, plus programming skills in R and/or Python and exposure to ML or NLP.

  • PhD or foreign equivalent degree in Statistics, Psychometrics, Educational Measurement, Quantitative Psychology, Data Science, Industrial-Organizational (I/O) Psychology, or a related field
  • One year of research or work experience in the job offered or as a Research Scientist, Research Assistant, Software Engineer, or a related occupation
  • 1 year of experience in large-scale education, licensure, or certification assessment programs
  • Operational psychometric tasks on large-scale education, licensure, or certification assessment programs, including item analysis, equating and scaling, item response theory, classical test theory, form and pool assembly, item bank health analysis, standard setting, and job task analysis
  • At least one of the complex test designs, such as linear-on-the-fly testing (LOFT) or computerized adaptive testing (CAT)
  • At least one of the following areas: machine learning (ML) or natural language processing (NLP)
  • Programming skills in at least one script-based programming language (R, Python)
  • Follow accreditation standards set by ISO/IEC 17024:2012 and the National Council for Certifying Agencies (NCCA)
Generative Artificial Intelligence (GenAI); Natural Language Processing (NLP); Machine Learning (ML)
psychometric analysis; automated test assembly; item analysis; optimal item bank design; job task analysis; standard setting; quality assurance; project planning; item response theory; classical test theory; equating and scaling; form and pool assembly; item bank health analysis; linear-on-the-fly testing (LOFT); computerized adaptive testing (CAT); Machine Learning (ML); natural language processing (NLP); Generative Artificial Intelligence (GenAI); R; Python; statistical modeling; psychometric modeling; ISO/IEC 17024:2012; National Council for Certifying Agencies (NCCA)
psychometric aspects of exam development and operations; automated test assembly; item analyses; optimal item bank design; job task analysis; standard setting; quality assurance; project planning; item response theory; equating and scaling; classical test theory; form and pool assembly; item bank health analysis; operational psychometric tasks on large-scale education, licensure, or certification assessment programs; statistical modeling; psychometric modeling; evaluate and ensure exam validity, reliability, applicability, efficiency, and accuracy; machine learning (ML); natural language processing (NLP); generative artificial intelligence (GenAI); R; Python; automation code development for psychometric workflow pipeline; present, interpret, and communicate analysis results through written and oral reports; follow ISO/IEC 17024:2012 and National Council for Certifying Agencies (NCCA) accreditation standards for valid psychometric practices; building optimal test forms and pools via optimization techniques; analyzing and monitoring item bank health; developing and applying statistical and psychometric modeling; refresh test blueprints; standard setting studies; job task analysis (JTA)
presenting results; interpreting analysis outcomes; communicating to stakeholders through written reports; communicating to stakeholders through oral reports; stakeholder management; participating in research projects; engagement with professional community; collaboration
Industry Education
Job Function Conduct psychometric research and operational exam analytics to ensure AWS certification exams meet validity, reliability, and accreditation requirements.
Role Subtype Research Scientist
Tech Domains Python, AI & Machine Learning

  • PhD (or foreign equivalent) in an approved quantitative or psychometrics-related field
  • Must have 1 year of experience in operational psychometric tasks for large-scale education, licensure, or certification assessment programs
  • Must have experience with at least one complex test design (LOFT or CAT)
  • Must have programming skills in at least one of R or Python
