Position Details

Type Full-Time

Experience Mid-level

Exp. Years Not specified

Education Not specified

Category AI & Machine Learning

About this role

This role involves building and operationalizing AI/ML systems in mission-critical environments, focusing on deployment, monitoring, and reliability of models.

Key Responsibilities

Own ML lifecycle end-to-end
Deploy models into mission environments
Implement monitoring for models and systems
Build cloud-native ML infrastructure
Establish data discipline

Technical Overview

The technical environment includes Python, ML frameworks (PyTorch, TensorFlow), containerization with Docker, orchestration with Kubernetes, and monitoring tools like Prometheus and Grafana, within cloud-native and distributed systems.

Ideal Candidate

The ideal candidate is a mid-level AI/ML engineer with experience deploying machine learning systems into production, especially within classified or mission-critical environments. They possess strong skills in Python, ML frameworks, and cloud-native infrastructure, with a focus on reliability and system stability.

Must-Have Skills

Experience deploying ML systems into production environments
Strong background in Python and ML frameworks (PyTorch, TensorFlow)
Experience with ML pipeline orchestration tools (Kubeflow, Airflow, Argo)
Experience with containerization (Docker)
Experience with Kubernetes
Experience deploying models into mission environments
Experience with model versioning, lineage, reproducibility
Experience with monitoring tools (Prometheus, Grafana, OpenTelemetry)

Nice-to-Have Skills

Experience with cloud platforms (AWS, Azure, GCP)
Experience with data governance tools (lakeFS)
Experience with experiment tracking (MLflow, ClearML)
Experience with AI/ML models (LLMs, transformer models, computer vision models)
Knowledge of metadata standards (STAC)

Tools & Platforms

Kubeflow, Airflow, Argo, Docker, Kubernetes, Prometheus, Grafana, OpenTelemetry, MLflow, lakeFS

Required Skills

Python, PyTorch, TensorFlow, MLflow, Kubeflow, Airflow, Argo, Docker, Kubernetes, OpenTelemetry, Prometheus, Grafana, distributed systems, cloud-native infrastructure, model versioning, model deployment, model monitoring, data versioning

Hard Skills

Python, PyTorch, TensorFlow, MLflow, Kubeflow, Airflow, Argo, Docker, Kubernetes, OpenTelemetry, Prometheus, Grafana, Cloud-native infrastructure, Distributed systems, Model versioning, Model deployment, AI/ML systems, Model monitoring, Data versioning, Model reproducibility

Soft Skills

problem-solving, systems thinking, collaboration, reliability focus, attention to detail

Industry & Role

Industry Defense, Government/Public Sector

Job Function Develop and operate AI/ML systems for mission-critical deployment and monitoring

Role Subtype AI & Machine Learning

Tech Domains Python, Kubernetes, Docker, MLflow, TensorFlow, PyTorch, OpenTelemetry, Prometheus, Grafana

Clearance & Visa

Clearance Required TS/SCI Preferred

Visa Sponsorship Not Specified

Keywords for Your Resume

ML Ops, Machine learning, AI/ML systems, ML pipelines, Kubeflow, Airflow, Argo, Docker, Kubernetes, Model deployment, Model versioning, Model reproducibility, Model monitoring, Distributed systems, Cloud-native infrastructure, AI models, Computer vision, LLMs, Transformer models, Data versioning, OpenTelemetry, Prometheus, Grafana, Python, TensorFlow, PyTorch

Deal Breakers

Lack of experience with ML deployment in production environments
No experience with Kubernetes or Docker
No security clearance or experience working in classified environments
Lack of experience with monitoring tools (Prometheus, Grafana, OpenTelemetry)

MLOps Engineer — AI/ML Systems & Deployment (TS/SCI Preferred)
