Roadmap to a job
Machine Learning / AI Engineer
A 2026 ML/AI Engineer designs, ships, and operates intelligent systems end-to-end, not just notebooks
7 stages · 33 skills · 75 free resources
Core stack
Track your progress
0 / 38 done
Stage 01
Stage 0, Software Engineering Foundations
Write clean, tested Python and operate like an engineer (version control, environments, CLI) before touching any ML. This is the strongest predictor of hireability and the most under-rated step.
Python (core + intermediate)Essential3 links
Python is a high-level, dynamically typed programming language widely used in data science, machine learning, and backend development. It supports object-oriented, functional, and procedural styles, and its extensive ecosystem of libraries makes it the primary language for building and deploying ML systems.
Why it matters · The lingua franca of ML/AI; everything downstream assumes fluency (OOP, typing, error handling, comprehensions).
Git & GitHubEssential2 links
Git is a distributed version control system that tracks changes to source code over time, enabling collaboration and rollback. GitHub is a cloud-based hosting platform built on Git that adds pull requests, code review workflows, issue tracking, and CI/CD integrations for managing software projects.
Why it matters · Every team runs on version control, and your public repos double as portfolio and hiring signal.
Command line, virtual environments & dependency managementEssential2 links
The command line (shell) provides direct control over the operating system for running scripts, managing files, and automating tasks. Virtual environments (venv, conda, uv) isolate project dependencies, while tools like pip, requirements.txt, and pyproject.toml specify and lock package versions for reproducibility.
Why it matters · Reproducible environments (venv/conda/uv, requirements/pyproject) prevent the 'works on my machine' failures that sink ML projects.
SQL & relational databasesEssential3 links
SQL (Structured Query Language) is the standard language for querying and manipulating data stored in relational databases such as PostgreSQL, MySQL, and SQLite. It is used to filter, aggregate, join, and transform tabular data, and forms the foundation for data extraction in analytics and ML pipelines.
Why it matters · Most real ML data lives in databases; SQL is a near-universal requirement and needed for nearly every data-pulling task.
Testing, clean code & code review basicsRecommended2 links
Software testing involves writing automated checks (unit, integration) using frameworks like pytest to verify that code behaves correctly. Clean code practices emphasize readability, modularity, and consistent style, while code review is a collaborative process where teammates inspect changes before they are merged.
Why it matters · ML code that ships needs tests and readability; teams screen for software discipline, not just model accuracy.
Stage 02
Stage 1, Math & Statistics for ML (right-sized)
Build enough intuition to understand WHY models work, debug them, and read papers, without a year of pure theory. Learn it as a tool, alongside code.
Linear algebra (vectors, matrices, dot products)Essential2 links
Linear algebra is a branch of mathematics concerned with vectors, matrices, and linear transformations. In machine learning, it underlies data representations, dimensionality reduction, and the core computations in neural networks, including matrix multiplications used in forward and backward passes.
Why it matters · Embeddings, neural nets, and attention are all linear algebra; you can't reason about model internals without it.
Probability & statisticsEssential2 links
Probability theory describes the likelihood of events and forms the mathematical foundation for statistical inference. In ML, it is applied to modeling uncertainty, understanding data distributions, designing experiments, and evaluating models through metrics such as confidence intervals and hypothesis tests.
Why it matters · Distributions, hypothesis testing, and Bayes underpin model evaluation, uncertainty, and experiment design.
Calculus & gradients (intuition)Recommended2 links
Calculus studies rates of change and accumulation, with derivatives measuring how a function's output changes with its input. In machine learning, gradient descent relies on partial derivatives to iteratively minimize a loss function, and backpropagation uses the chain rule to compute gradients through a neural network.
Why it matters · Gradient descent and backprop are calculus; you need the intuition, not the ability to do proofs by hand.
Mathematics for Machine Learning (consolidated path)Optional2 links
Mathematics for Machine Learning is a structured curriculum that unifies linear algebra, multivariate calculus, probability, and principal component analysis (PCA) into a single coherent learning path. It is commonly taught through the book of the same name (Deisenroth et al.) and associated online courses, providing the mathematical grounding needed to understand modern ML algorithms.
Why it matters · A single track ties linear algebra, calculus, and PCA together if you prefer one structured route.
Stage 03
Stage 2, Data Wrangling & Classical Machine Learning
Turn messy data into features and train/evaluate classical models well. This is still the bread-and-butter of most production ML and the base that deep learning builds on.
NumPy, pandas & data visualizationEssential3 links
NumPy provides efficient N-dimensional array operations and numerical computing primitives in Python. Pandas builds on NumPy to offer DataFrame-based data manipulation for tabular datasets, while visualization libraries such as Matplotlib and Seaborn generate charts and plots for exploratory data analysis.
Why it matters · Most real ML work is cleaning, joining, and shaping data; pandas/NumPy are non-negotiable daily tools.
scikit-learn & classical algorithmsEssential3 links
scikit-learn is a Python library providing consistent implementations of classical machine learning algorithms including linear regression, decision trees, gradient boosting (XGBoost, LightGBM), k-nearest neighbors, k-means clustering, and support vector machines. It also supplies utilities for preprocessing, pipelines, and model selection.
Why it matters · Regression, trees, gradient boosting, KNN, k-means, and the train/validate/test workflow are foundational and still widely deployed.
Feature engineering & model evaluationEssential2 links
Feature engineering is the process of transforming raw data into informative inputs for a model, including encoding categorical variables, scaling, and creating interaction terms. Model evaluation assesses predictive performance using techniques such as cross-validation, precision-recall curves, ROC-AUC, and careful separation of training and test sets to prevent data leakage.
Why it matters · Cross-validation, leakage avoidance, and metric choice (precision/recall/ROC) separate working models from misleading ones.
ML theory (Andrew Ng specialization)Recommended1 link
The Machine Learning Specialization by Andrew Ng (DeepLearning.AI on Coursera) covers supervised learning, unsupervised learning, and best practices for model building including regularization and the bias-variance trade-off. It provides a widely recognized theoretical foundation through video lectures, assignments, and practical exercises in Python.
Why it matters · Gives a coherent mental model of supervised/unsupervised learning and the bias-variance trade-off; free to audit and widely respected.
Stage 04
Stage 3, Deep Learning with PyTorch
Build, train, and debug neural networks for vision, sequence, and tabular problems, using PyTorch, the framework most new ML postings ask for.
Neural network fundamentals & PyTorchEssential3 links
Neural networks are computational models composed of layers of parameterized units that learn representations from data through gradient-based optimization. PyTorch is an open-source deep learning framework that provides dynamic computation graphs, automatic differentiation (autograd), and GPU-accelerated tensor operations used to define, train, and deploy neural network models.
Why it matters · PyTorch now leads TensorFlow in new ML job postings; fluency with tensors, autograd, and training loops is core.
CNNs, RNNs & the Transformer architectureEssential3 links
Convolutional Neural Networks (CNNs) apply learned filters to extract spatial features and are the standard architecture for image tasks. Recurrent Neural Networks (RNNs) process sequential data with shared weights over time steps, while Transformers replace recurrence with self-attention mechanisms, enabling parallel processing and capturing long-range dependencies, which is the foundation for modern large language models.
Why it matters · Transformers (attention) power modern AI; understanding them is required to fine-tune, debug, and reason about LLMs.
Hugging Face Transformers (using & fine-tuning models)Essential2 links
Hugging Face Transformers is an open-source Python library that provides a unified API to load, run, and fine-tune thousands of pretrained models for NLP, vision, and multimodal tasks. It integrates with PyTorch and JAX and supports parameter-efficient fine-tuning methods such as LoRA through the PEFT library.
Why it matters · The standard way to load, run, and fine-tune open models (Llama, Mistral, Qwen, etc.); ubiquitous in industry.
TensorFlow / KerasOptional2 links
TensorFlow is an open-source machine learning framework developed by Google that supports building and training neural networks at scale, with deployment options across servers, mobile, and browsers. Keras is its high-level API that simplifies model construction through layer-based abstractions, functional and sequential interfaces, and built-in training loops.
Why it matters · Still present in many enterprise/legacy stacks, but PyTorch is the better primary investment in 2026.
Checkpoint
Don't wait, start applying
You don't have to finish the path to begin. Early applications and interviews show you exactly what to learn next.
Stage 05
Stage 4, LLM & AI Application Engineering
Build production AI features on foundation models: prompting, RAG over your own data, tool-using agents (including MCP for tool/data wiring), and, critically, evaluation. In 2026 this is a mainstream requirement, not a specialty.
LLM APIs, prompting & structured outputsEssential3 links
LLM APIs (such as those from Anthropic, OpenAI, and providers on OpenRouter) expose large language models over HTTP for tasks including text generation, summarization, and reasoning. Prompting techniques shape model behavior through system messages and few-shot examples, while structured outputs constrain responses to JSON schemas using tool/function calling or response format parameters.
Why it matters · The entry point to AI engineering: chat/completions APIs, tool/function calling, structured (JSON) outputs, and prompt caching.
Embeddings, vector databases & RAGEssential3 links
Embeddings are dense numerical representations of text, images, or other data produced by neural encoders that place semantically similar content close together in vector space. Vector databases (Pinecone, Weaviate, pgvector) index these embeddings for fast approximate nearest-neighbor retrieval. Retrieval-Augmented Generation (RAG) combines a retrieval step over a vector index with an LLM to ground responses in specific documents.
Why it matters · RAG is the most-deployed LLM pattern in 2026; you must handle chunking, hybrid search, reranking, and corpus drift.
AI agents & orchestrationEssential3 links
AI agents are systems where a language model iteratively reasons, selects actions, calls external tools, and updates its plan based on intermediate results. Orchestration frameworks such as LangGraph, CrewAI, and PydanticAI provide abstractions for defining agent graphs, managing state across steps, handling retries, and composing multi-agent workflows.
Why it matters · The fastest-growing AI skill: systems where an LLM plans, calls tools, holds state, and recovers from failure (LangGraph/CrewAI/PydanticAI).
Model Context Protocol (MCP) & tool integrationRecommended2 links
Model Context Protocol (MCP) is an open standard introduced by Anthropic in 2024 and adopted across the industry for connecting AI agents to external tools, data sources, and services through a standardized client-server interface. MCP servers expose resources and callable tools, allowing agents to retrieve data, execute actions, and integrate with APIs in a consistent, composable way.
Why it matters · MCP became the de-facto open standard in 2025-2026 for connecting agents to tools and data (backed by Anthropic, OpenAI, Google, Microsoft); increasingly expected for agentic roles.
LLM evaluation & guardrailsEssential2 links
LLM evaluation encompasses methods for measuring the quality, accuracy, and safety of language model outputs, including LLM-as-judge scoring, retrieval metrics (precision, recall, MRR), and task-specific benchmarks. Guardrails are validation layers applied at input and output to detect prompt injection, enforce output schemas, filter harmful content, and prevent model behavior from drifting outside acceptable boundaries.
Why it matters · Without evals (LLM-as-judge, retrieval metrics) and prompt-injection defense, AI features silently regress; eval rigor is exactly what employers screen for.
Fine-tuning (SFT / LoRA / DPO)Recommended2 links
Fine-tuning adapts a pretrained language model to a specific task or style by continuing training on a curated dataset. Supervised fine-tuning (SFT) trains on labeled examples, LoRA (Low-Rank Adaptation) injects small trainable weight matrices to reduce compute cost, and DPO (Direct Preference Optimization) aligns model outputs to human preferences without a separate reward model.
Why it matters · Useful for shaping behavior/format and cutting cost/latency with smaller models, but it's for behavior, not for teaching new facts (use RAG for that).
Stage 06
Stage 5, MLOps, Deployment & Production Systems
Ship models/AI systems as reliable services: containerize, serve via an API, deploy to a cloud, version data/models, and monitor in production. This is what turns a notebook into a hireable skill set.
Serving models as APIs (FastAPI) + DockerEssential2 links
FastAPI is a modern Python web framework for building HTTP APIs with automatic OpenAPI documentation and async support, commonly used to expose ML models as prediction endpoints. Docker packages an application and its dependencies into a portable container image, ensuring consistent behavior across development, testing, and production environments.
Why it matters · The standard pattern for exposing a model/agent; containerization is assumed for essentially any deployment.
Cloud platform (AWS, GCP, or Azure), pick oneEssential2 links
AWS, GCP, and Azure are the three leading public cloud platforms, each offering managed compute, storage, networking, databases, and ML-specific services (SageMaker, Vertex AI, Azure ML). Cloud platforms provide the infrastructure for training, deploying, and scaling ML models without managing physical hardware.
Why it matters · Fluency in one cloud is expected for nearly all ML roles; AWS leads in 2026 with GCP and Azure close behind.
Experiment tracking & data/model versioningEssential2 links
Experiment tracking tools such as MLflow and Weights and Biases (W&B) record hyperparameters, metrics, artifacts, and code for each training run, enabling comparison and reproducibility. Data and model versioning with tools like DVC (Data Version Control) applies Git-like semantics to large datasets and model checkpoints stored in remote storage.
Why it matters · MLflow/W&B + DVC make experiments reproducible and models auditable, core to any real ML pipeline.
End-to-end MLOps (pipelines, CI/CD, monitoring)Essential3 links
MLOps applies software engineering practices to the full ML lifecycle, from data ingestion and model training to deployment and monitoring. Pipeline orchestrators (Kubeflow, Prefect, Airflow) automate workflow steps, CI/CD systems (GitHub Actions) run tests and deploy on code changes, and monitoring tools track prediction drift, data distribution shift, latency, and cost in production.
Why it matters · Orchestration, GitHub Actions, and monitoring (drift, latency, cost) are the 'boring' skills that most distinguish hireable ML engineers.
LLM serving & inference optimization (vLLM, quantization)Recommended2 links
LLM serving frameworks such as vLLM use techniques like PagedAttention and continuous batching to maximize GPU throughput when hosting large language models. Quantization reduces model weight precision (FP8, INT4, GPTQ) to decrease memory footprint and increase inference speed, enabling larger models to run on fewer or less expensive accelerators.
Why it matters · Self-hosting LLMs cost-effectively (PagedAttention, continuous batching, FP8/INT4 quantization) is an increasingly demanded edge.
Kubernetes, Terraform & big data (Spark)Optional2 links
Kubernetes is a container orchestration system that automates deployment, scaling, and management of containerized workloads across clusters. Terraform is an infrastructure-as-code tool for provisioning and managing cloud resources declaratively. Apache Spark is a distributed computing engine for processing large-scale datasets in parallel across many nodes.
Why it matters · Needed at scale and in platform/infra-heavy roles, but not required to land a first ML/AI engineering job.
Stage 07
Stage 6, Portfolio, Specialization & Job Readiness
Prove you can deliver end-to-end and get hired. Build 2-3 deployed, documented projects, pick a depth area, and prepare for ML system design + coding interviews.
End-to-end portfolio projects (deployed + documented)Essential2 links
End-to-end portfolio projects demonstrate the ability to take a problem from raw data through modeling, serving, and monitoring to a live, accessible application. Documentation covers architecture decisions, dataset sources, model choices, and performance metrics, while deployment to a public URL or API makes the work verifiable and shareable.
Why it matters · Employers hire demonstrated delivery (data -> model/RAG/agent -> live API -> monitoring) over certificates; this is your strongest signal.
ML system design interview prepEssential2 links
ML system design interviews assess the ability to architect complete machine learning systems, covering problem framing, data collection and labeling, feature pipelines, model selection, training infrastructure, serving, and monitoring. Preparation involves studying canonical systems (recommendation engines, search ranking, fraud detection) and practicing structured trade-off discussions.
Why it matters · Mid/senior ML interviews center on designing data-to-serving systems (trade-offs, scaling, monitoring), not just algorithms.
Coding & DSA interview practiceRecommended2 links
Coding interviews for ML and AI engineering roles test general software engineering proficiency through algorithmic problems involving data structures (trees, graphs, hash maps) and algorithm design (sorting, dynamic programming, two-pointer techniques). Practice platforms such as LeetCode provide a large bank of problems organized by topic and difficulty.
Why it matters · ML/AI engineering roles still run software-engineering coding rounds; steady practice keeps you competitive.
Pick a depth specialization (NLP/LLMs, CV, RecSys, or platform/MLOps)Recommended2 links
Depth specialization means developing concentrated expertise in one ML subfield: Natural Language Processing and LLMs (text understanding, generation, fine-tuning), Computer Vision (image classification, detection, segmentation), Recommender Systems (collaborative filtering, ranking, retrieval), or ML platform and MLOps (infrastructure, pipelines, tooling). A clear specialization complements a general ML foundation and aligns with specific team needs.
Why it matters · A generalist foundation plus one credible depth area makes you memorable and matches how teams actually hire.
Land the job
Turn these skills into offers
ResuMax takes you from skilled to hired: a resume that proves it, applications tailored per role, and interview reps.
Train on this path
Atlas reads your resume, shows what you already have on this path, and coaches the gaps in order.