MLOps Lead, Central Technology

Remote Full-time
About the position

Responsibilities
• Provide technical MLOps leadership for a team of MLOps Engineers, managing and leading the team in operating AI training and inference systems.
• Drive the application of MLOps and DevOps principles across multiple platforms, ensuring peak operational efficiency.
• Define an end-to-end metrics program, including full proactive monitoring and alerting systems for the MLOps team.
• Facilitate model training through collaboration with AI Researchers to ensure best practices in machine learning and deep learning.
• Optimize the Kubernetes-based AI Lifecycle platform through IaC practices and integrate it with on-prem HPC systems.
• Collaborate on data systems for AI model training with the Data Infrastructure Engineering team and Science data teams.
• Lead the MLOps team supporting the on-call rotation, with a focus on automation and proactive alerting.

Requirements
• BS, MS, or PhD degree in Computer Science or a related technical discipline, or equivalent experience.
• 7+ years of relevant coding and systems experience.
• 5+ years of systems architecture and design experience, with a broad range of MLOps experience.
• Proven technical leadership experience in SRE and MLOps.
• Strong experience scaling containerized applications on Kubernetes or Mesos.
• Cloud platform proficiency with AWS, GCP, or Microsoft Azure.
• MLOps experience working with medium- to large-scale GPU clusters in Kubernetes.
• Working knowledge of NVIDIA CUDA and AI/ML custom libraries.
• Knowledge of Linux systems optimization and administration.
• Solid coding experience with a systems language such as Rust, C/C++, C#, Go, Java, or Scala.
• Expertise with a scripting language such as Python, PHP, or Ruby.
• Experience integrating data with the AI Lifecycle.
• AI/ML platform operations experience in an environment with demanding data and systems platform challenges.
• Large-scale streaming data systems integration experience.
• Experience with Hadoop, Spark, and/or Kafka deployments.
• Experience with workflow scheduling tools such as Apache Airflow, Dagster, or Apache Beam.
• Understanding of Data Engineering, Data Governance, Data Infrastructure, and AI/ML execution platforms.

Nice-to-haves
• Experience with PyTorch, Keras, or TensorFlow.
• Experience with HPC and Slurm.

Benefits
• Generous employer match on employee 401(k) contributions.
• Annual benefit for employees that can be used for housing, student loan repayment, childcare, commuter costs, or other life needs.
• CZI Life of Service Gifts awarded to employees to support causes closest to them.
• Paid time off to volunteer at an organization of your choice.
• Funding for select family-forming benefits.
• Relocation support for employees moving to the Bay Area.
