Summary

This work provides a practical guide to using Docker for ML and AI workflows. It explains why Docker is essential for reproducibility, portability, and scalability of ML models and pipelines. Key topics include Docker vs virtual machines, benefits for AI/ML roles (ML engineers, data scientists, DevOps), and how Docker integrates into the entire ML lifecycle—from data exploration and model training to experiment tracking, deployment, and monitoring. The work also highlights popular use cases like containerized notebooks, model serving, and cloud deployment, demonstrating how Docker ensures consistent, efficient, and scalable ML workflows.

image-2.png

The Importance of Docker for ML Models

  • AI is transforming every industry – from finance and healthcare to retail and manufacturing, AI-driven solutions are creating efficiency, insights, and automation.

  • ML models are growing more complex and harder to deploy – modern deep learning and LLMs require large dependencies, GPU acceleration, and distributed setups.

  • Need for scalable, portable, reproducible environments – researchers and engineers must ensure their models run consistently across development, testing, and production.

This is where Docker comes in. It provides a lightweight, standardized way to package and deploy ML models and applications.

Docker vs. Virtual Machines

| Feature | Virtual Machine (VM) | Docker (Container) |
| --- | --- | --- |
| Architecture | Runs a full guest OS on top of a hypervisor | Shares the host OS kernel; only packages app + dependencies |
| Resource Usage | Heavy – each VM needs its own OS | Lightweight – containers reuse the host OS |
| Startup Time | Minutes to boot | Seconds (or less) to start |
| Portability | Works across hypervisors but larger image size | Extremely portable, small images, runs anywhere Docker is supported |
| Efficiency | High overhead due to multiple OS layers | Very efficient, near-native performance |
| Scalability | Harder to scale quickly | Easy to orchestrate (Kubernetes, Docker Swarm) |
| Use Case | Good for running multiple OS types on one machine | Ideal for microservices, ML model deployment, CI/CD pipelines |

image.png

Image retrieved from https://k21academy.com/docker-kubernetes/docker-vs-virtual-machine/


Why Docker for AI/ML?

  • Reproducibility → Models and pipelines run identically on a laptop, test cluster, or cloud environment.

  • Simplified Dependencies → No more “works on my machine” issues. You can lock specific versions of Python, ML libraries (TensorFlow, PyTorch), or tools (MLflow, CUDA).

  • Model Versioning & Experimentation → Containers can capture different model versions, making it easier to track experiments and roll back if needed.

  • Portability → Build once, run anywhere: across Windows, Linux, macOS, or cloud platforms with Docker runtime.

  • Scalability → Containers integrate seamlessly with orchestration platforms (Kubernetes, ECS) to scale ML models and LLM services.

  • Efficiency → Lightweight containers maximize GPU/CPU utilization, avoiding the overhead of full virtual machines.

  • Rapid Iteration → Spin up or tear down environments in seconds, accelerating research and deployment cycles.

✅ Docker is the bridge between ML models and production systems — ensuring consistency, efficiency, and scalability.

image-4.png

How Docker Supports Different Roles in AI/ML

  • ML Engineer → Ensures reproducible experiments by locking specific versions of models and applications, while also enabling scalable infrastructure for inference and deployment.

  • DevOps Engineer → Simplifies deployment by containerizing models and seamlessly orchestrating them with Kubernetes or other platforms.

  • Data Scientist → Eliminates “environment hell” — no more “works on my machine” problems. Provides a consistent and reliable setup across teams.

  • AI Hobbyist → Makes it easy to run state-of-the-art models locally on a laptop or PC without complex setup.

👉 Overall, Docker streamlines workflows for all of these roles and removes much of the environment-management overhead from day-to-day ML work.



Where Docker Fits into the ML Workflow

Docker can support every stage of the ML lifecycle, making workflows reproducible, portable, and scalable.

  • Data Collection & Exploration → Run JupyterHub or Jupyter Notebooks inside Docker, ensuring a consistent environment for data cleaning, preprocessing, and exploration.

  • Model Development & Training → All dependencies (Python, CUDA, ML libraries) are defined in one container, so the same environment is used by every team member, avoiding version conflicts.

  • Experiment Tracking → Tools like MLflow, Weights & Biases, or TensorBoard can be containerized, making experiment management reproducible and easy to share.

  • Model Packaging → The trained model and its dependencies are bundled into a lightweight container image, ensuring it runs the same way in test, staging, and production.

  • Model Deployment → Containers can be deployed with Kubernetes, Docker Swarm, or cloud services, scaling inference efficiently across CPUs and GPUs.

  • Monitoring & Updates → Containers simplify rolling updates and monitoring of models, ensuring quick fixes, retraining, or scaling as data and usage evolve.

✅ Docker acts as the backbone of the ML workflow — from notebooks to deployment — providing consistency, portability, and scalability at every step.

🔹 Popular Use Cases for Docker in ML

  • Containerized Jupyter Notebooks → For research and reproducible data exploration.

  • GPU-Accelerated Training → Run ML/DL model training efficiently on GPU clusters.

  • Experiment Tracking → Combine MLflow (or similar tools) with Docker for consistent and reproducible experiment management.

  • Model Serving → Package models as REST APIs using Flask or FastAPI for easy integration (see the sketch after this list).

  • Deployment to Cloud & Platforms → Push models to Hugging Face Spaces, AWS, GCP, or Azure using Docker containers.

  • Edge & Portable Inference → Deploy lightweight containers on edge devices for real-time inference.
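
To make the model-serving use case concrete, here is a minimal sketch of a FastAPI app that wraps a trained scikit-learn model behind a REST endpoint. The file name app.py, the model path model.joblib, and the /predict route are illustrative choices, not fixed conventions:

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical model file copied into the image

class Features(BaseModel):
    values: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(features: Features):
    # scikit-learn estimators expect a 2-D array: one row per sample
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

Baked into an image together with uvicorn and started with docker run -p 8000:8000, this endpoint becomes reachable at http://localhost:8000/predict from any machine that can run the image.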

Well-Known Companies Using Docker for AI/ML

  1. Netflix – Orchestrating ML Workflows
Netflix uses Docker containers to manage and scale its machine learning workflows. By containerizing ML tasks, Netflix ensures consistent environments across development and production, powering critical use cases such as content recommendation and streaming optimization.

  2. Walmart – Scaling AI Solutions
Walmart leverages Docker to containerize its AI applications, enabling scalable and efficient deployment across its massive retail infrastructure. This supports diverse use cases, including inventory management and customer experience optimization.

  3. Uber – Streamlining ML Pipelines
Uber employs Docker to containerize its ML workflows, ensuring consistent environments for both development and deployment. This enhances scalability and reproducibility of models, which power key services like ETA predictions and dynamic pricing.

  4. IKEA (Ingka Group) – Scalable MLOps with Docker & Kubernetes
IKEA adopted Docker and Kubernetes to build a robust MLOps platform. This setup enables dynamic scaling, uniform development environments, improved collaboration, and enhanced security, accelerating prototyping and deployment while driving innovation.

  5. ZEISS Microscopy – Cross-Platform AI Deployment
ZEISS, a leader in optics and optoelectronics, uses Docker to deploy AI models across cloud and local Windows-based systems. Containerization ensures consistent performance and simplifies distribution of complex models, enhancing their microscopy software capabilities.

  6. NASA – Accelerating Data Analysis
NASA employs Docker to standardize and speed up ML workflows, especially for processing vast amounts of satellite data. Containerization ensures consistent, reproducible environments, which are essential for scientific research and analysis.

Running Models Locally with Docker

We can access a wide range of pre-trained models from Docker Hub, which is like a GitHub for Docker images.

Once we find an image, we can run it locally with a single command, without worrying about dependencies or environment setup. This makes it easy to test, experiment, or deploy models quickly.

Example: Running a Pre-trained Model with Docker

  1. Search for a model on Docker Hub Go to https://hub.docker.com/ and find a containerized ML model (e.g., tensorflow/tensorflow:latest-py3).

  2. Pull the Docker image

    docker pull tensorflow/tensorflow:latest-py3
    
  3. Run the container

    docker run -it --rm tensorflow/tensorflow:latest-py3 python
    
    • -it → interactive terminal
    • --rm → remove container after exit
    • python → starts a Python shell inside the container
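
Inside that Python shell, a quick sanity check confirms that the framework comes from the image rather than the host (a minimal sketch; the exact version printed depends on the tag you pulled):

import tensorflow as tf

# TensorFlow is baked into the image; nothing was installed on the host
print(tf.__version__)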

Using Docker with MCP Tooling

MCP (Model Context Protocol) allows AI models to access real-world tools in a controlled and standardized way.

By combining Docker + MCP, we can:

  • Self-host MCP toolkits → Run Terraform, Kubernetes, or CLI agents in a containerized environment.
  • Enable tool-aware autonomous agents → AI models can safely interact with external tools through MCP.

Example: Run a trusted MCP server using Docker:

docker run -p 3000:3000 realops/kubernetes-mcp-server:latest
  • This starts the MCP server on port 3000, ready to serve AI agents.

Install Docker Desktop

What is Docker Desktop?

Docker Desktop is an application that enables you to build, run, and manage Docker containers on your local machine. It provides:

  • A graphical interface to manage containers and images.
  • A Docker Engine to run containers.
  • Integration with Kubernetes for orchestration.
  • Tools for building, testing, and deploying applications, including ML models, in a consistent environment.

How to Download and Install Docker Desktop

  1. Download Go to https://www.docker.com/products/docker-desktop/ and download the installer for your operating system.

  2. Install

    • Windows: Run the .exe installer, follow the setup steps, and enable WSL 2 if prompted (required for Linux containers).
    • macOS: Open the .dmg file, drag Docker to Applications, and launch it.
  3. Verify Installation

    • Open a terminal or command prompt and run:

      docker version
      docker run hello-world
      
    • If successful, Docker is ready to use. Make sure both "Client" and "Server" are showing up:
In [1]:
# Docker commands can be run directly inside a Jupyter Notebook
#!docker version

Build and Manage ML Dev Environments with Docker

Docker Concepts – Images, Containers, Registries, Repositories, and Pulling Images

The goal is to leverage pre-built AI/ML Docker images to run real-world tools such as JupyterLab and MLflow, and to learn how to connect, manage, and persist work across the full ML lifecycle.

With Docker images, there’s no need to worry about library versions (e.g., Pandas, NumPy, etc.) since everything is already packaged within the environment. This makes setup quick, consistent, and easy.

To get started, ensure Docker is installed with both the Client and Server components.

Search the Docker Hub image registry for MLflow:

image.png

Docker Hub is a popular container image registry, but it’s not the only one. GitHub also provides its own container registry in addition to hosting code.

For example, you can pull an MLflow image from GitHub’s registry (ghcr.io):

docker pull ghcr.io/mlflow/mlflow:v3.4.0

This shows that GitHub is no longer just a code repository — it also serves as a container image registry, where both the latest and older versions of images are available.

image.png

In [2]:
# Here are all docker commands
#!docker
In [3]:
#!docker system info
In [4]:
# This shows the events while start running the containers
#!docker system events

Step 1: Pull the MLflow Image

In [5]:
!docker pull ghcr.io/mlflow/mlflow:latest
latest: Pulling from mlflow/mlflow
61f697520022: Pulling fs layer
61f697520022: Download complete
Digest: sha256:7924215e2a805104296c5eb963e43795adbe6b5c1a542faaa69e1e0cb51b2095
Status: Downloaded newer image for ghcr.io/mlflow/mlflow:latest
ghcr.io/mlflow/mlflow:latest

It pulls all the image layers. The parts of the reference ghcr.io/mlflow/mlflow:v3.4.0 are:

  • ghcr.io: the container registry; this one is GitHub Container Registry.
  • mlflow: the organization that created and publishes the image.
  • mlflow: the repository (the image name).
  • v3.4.0: the tag; we can request a specific version or latest.
In [6]:
# list the image locally
!docker image ls
REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
ghcr.io/mlflow/mlflow    latest    7924215e2a80   4 days ago      1.25GB
ghcr.io/mlflow/mlflow    v3.4.0    9c9e24a3fc24   3 weeks ago     1.25GB
nginx                    latest    8adbdcb969e2   8 weeks ago     279MB
postgres                 15        5ab68e212eab   4 months ago    608MB
node-app                 latest    fa3502158bb2   7 months ago    203MB
postgres                 latest    0321e2252ebf   7 months ago    621MB
python                   3         08471c63c5fd   8 months ago    1.47GB
ubuntu                   latest    72297848456d   8 months ago    117MB
ubuntu                   jammy     ed1544e45498   8 months ago    117MB
hello-world              latest    e0b569a5163a   8 months ago    20.4kB
apache/airflow           2.9.1     2514cee14aae   17 months ago   2.01GB
jupyter/scipy-notebook   latest    fca4bcc9cbd4   24 months ago   5.76GB

Step 2: Run the Container with Port Mapping

There are many ways to run a container (see !docker run --help). When running MLflow, we want to access its Web UI. Since MLflow runs inside the container on a specific port, it’s not directly available on the host machine. To make it accessible, we need to map the container’s port to a host port during startup.

In [7]:
## Get help for docker run 
#!docker run --help

When running a container, we use the option -p host_port:container_port to map ports.

For example, MLflow typically runs on port 5000 inside the container. To make it accessible from the host machine, we can map it to port 5001 on the host:

docker run -p 5001:5000 ghcr.io/mlflow/mlflow:latest mlflow server --host 0.0.0.0

The figure below illustrates how port mapping works. On our desktop or laptop, Docker runs inside a virtualized environment (the Docker host/VM). Within this host, containers run their own services—MLflow in this case, on port 5000.

Since we can’t directly connect to the container’s internal port from the desktop browser, we need to expose it by mapping it to a host port. By specifying -p 5001:5000, we’re saying:

  • 5001 → port on the host machine (what we connect to in the browser)
  • 5000 → port inside the container where MLflow is actually running

image-4.png

After the port mapping, we provide the image reference, which comes from the registry as discussed above:

Instead of just -p, we’ll also add the -d flag to run the container in detached mode. This means the container will run in the background, freeing up our terminal for other commands instead of streaming all the logs. On top of that, instead of Docker assigning a random name (like sad_mccarthy), we’ll specify a clear, meaningful name for the container using --name mlflow_test.

docker run -d -p 5001:5000 --name mlflow_test ghcr.io/mlflow/mlflow:latest mlflow server --host 0.0.0.0

When we visit http://localhost:5001 in the browser, Docker forwards the traffic to port 5000 inside the container. This mechanism is called port mapping or port forwarding.
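
We can also verify the mapping from the host with a quick Python request (a minimal sketch; it assumes the mlflow_test container above is running and the requests package is installed):

import requests

# Hit the *host* port 5001; Docker forwards the traffic to port 5000 in the container
response = requests.get("http://localhost:5001")
print(response.status_code)  # 200 means the MLflow UI answered through the mapped port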

image.png

So you would access MLflow at: 👉 http://localhost:5001

image-2.png

This is how MLflow runs inside a container. Let’s break down what happens behind the scenes, as illustrated in the diagram below:

  • The image is first pulled from a registry (e.g., GitHub or Docker Hub) and stored locally.
  • The image is then run to launch a container.
  • The Docker daemon attaches to the container, allowing it to track logs, status, and lifecycle events.
  • A network is created, and once the container starts, it transitions into the running state.
  • With networking and port mapping in place, we can connect to the containerized MLflow service from our host machine.

image-2.png

After running a container, how do we know whether it is running or has stopped? And what is the best way to keep a container running in the background? That is what the -d option provides. The commands below let us inspect container state (docker ps -a would also list stopped containers).

The docker ps command shows only running containers:

In [10]:
!docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                    NAMES
b6b94b6eb8ef   ghcr.io/mlflow/mlflow:latest   "mlflow server --hos…"   20 minutes ago   Up 20 minutes   0.0.0.0:5001->5000/tcp   mlflow_test
In [ ]:
#!docker logs b6b94b6eb8ef

docker container ls is an equivalent command that lists the running containers:

In [11]:
!docker container ls
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                    NAMES
b6b94b6eb8ef   ghcr.io/mlflow/mlflow:latest   "mlflow server --hos…"   20 minutes ago   Up 20 minutes   0.0.0.0:5001->5000/tcp   mlflow_test
In [12]:
# Show the most recently created container (-n 1)
!docker ps -n 1
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                    NAMES
b6b94b6eb8ef   ghcr.io/mlflow/mlflow:latest   "mlflow server --hos…"   20 minutes ago   Up 20 minutes   0.0.0.0:5001->5000/tcp   mlflow_test

Step 3: Launch Jupyter Notebook Inside Container

When running Jupyter with Docker, we don’t want our work to disappear when the container is deleted. To make sure our notebooks are always available, we mount a local directory into the container as a volume. This way, files are stored on our machine, not inside the container.

Example command:

docker run -d -p 8888:8888 --name jupyterlab_test \
  -v ~/ml-docker/notebooks:/home/jovyan/work jupyter/scipy-notebook:latest
  • docker run → Run a new container.
  • -d → Detached mode (runs in the background).
  • -p 8888:8888 → Maps port 8888 inside the container to port 8888 on the host, so you can access Jupyter in the browser at http://localhost:8888.
  • --name jupyterlab_test → Assigns a name to the container (jupyterlab_test).
  • -v ~/ml-docker/notebooks:/home/jovyan/work → Mounts a local folder (~/ml-docker/notebooks) to the container’s working directory (/home/jovyan/work). Anything you edit in Jupyter is saved locally, and local changes appear inside the container.
  • jupyter/scipy-notebook:latest → The Docker image to use. This image is part of the Jupyter Docker Stacks project, providing a ready-to-use scientific Python environment.

What’s included in jupyter/scipy-notebook

This image comes preloaded with:

  • Python 3
  • Jupyter Notebook & JupyterLab
  • NumPy, SciPy, pandas
  • Matplotlib, Seaborn
  • SymPy
  • scikit-learn, statsmodels
  • Joblib, cloudpickle
  • Tools like pip, conda, and git

✅ With this setup:

  • The notebooks persist on your machine (~/ml-docker/notebooks), even if the container is deleted.
  • We can edit files either inside Jupyter or directly on the host machine.
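
A quick way to see the mount in action from a notebook running inside the container (a minimal sketch; /home/jovyan/work is the container-side path we chose in the -v flag above):

from pathlib import Path

# Anything written under /home/jovyan/work lands in the mounted host folder
# (~/ml-docker/notebooks) and survives docker rm of the container
work = Path("/home/jovyan/work")
(work / "persistence_check.txt").write_text("stored on the host, not in the container\n")
print(sorted(p.name for p in work.iterdir()))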

image.png

In [14]:
#!docker rm -f jupyter_mlflow_test
In [15]:
!docker run -d -p 8888:8888 --name jupyter_mlflow_test \
    -v "D:\Learning\MyWebsite\FinalGithub\ToPublihsed\projects\Docker-for-AI-ML:/home/jovyan/work" \
    jupyter/scipy-notebook start-notebook.sh --notebook-dir=/home/jovyan/work
a9057c4068fd3b9b18c5e87cbd3b5b8dc742b746c1bca0c505456c3e1b63477b
In [16]:
!docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS                   PORTS                    NAMES
a9057c4068fd   jupyter/scipy-notebook         "tini -g -- start-no…"   10 seconds ago   Up 9 seconds (healthy)   0.0.0.0:8888->8888/tcp   jupyter_mlflow_test
b6b94b6eb8ef   ghcr.io/mlflow/mlflow:latest   "mlflow server --hos…"   24 minutes ago   Up 24 minutes            0.0.0.0:5001->5000/tcp   mlflow_test

This creates the notebook container, exposed on host port 8888:

image-2.png

After that, open the logs for jupyter_mlflow_test and click on http://127.0.0.1:8888/lab?token=.................. 5.jpg

Going to this link opens JupyterLab, but it requires a token, which can be retrieved with jupyter server list:

6.jpg

In [21]:
!jupyter server list
Currently running servers:
http://localhost:8888/?token=2dd656f2c8eea957812c3b92da307cb1e310eef97375b3e7 :: D:\Learning\MyWebsite\FinalGithub\ToPublihsed\projects\Docker-for-AI-ML

After entering this token, JupyterLab starts: image.png

Now we can create a virtual environment, open JupyterLab there, and install the required Python packages: 8.jpg

Step 4: Connect Jupyter Notebook to MLflow

After launching JupyterLab and MLflow as separate containers, we can now connect them together. JupyterLab runs in one container, while MLflow runs in another. The goal is to execute code or experiments in Jupyter and send the results to MLflow, which will handle tracking and logging.

MLflow is exposed on port 5000 inside its container, mapped to port 5001 on the host machine. To let Jupyter communicate with MLflow, we connect through the host port at host.docker.internal:5001. This special DNS name resolves to the Docker host from inside a container, so traffic from the Jupyter container reaches MLflow via the host's mapped port. See the illustration below:

image-2.png

To connect a Jupyter Notebook running directly on the host to MLflow, use this URL:

mlflow_tracking_uri = 'http://localhost:5001'

To connect JupyterLab running inside a container to MLflow, use this URL:

mlflow_tracking_uri = 'http://host.docker.internal:5001'
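
If the same notebook may run either on the host or inside a container, a small heuristic can pick the right URI automatically. This is a sketch based on an assumption: Docker creates a /.dockerenv marker file in most Linux containers, though that is not guaranteed on every runtime:

import os

# /.dockerenv usually exists inside Linux containers (an assumption, not a guarantee)
in_container = os.path.exists("/.dockerenv")
mlflow_tracking_uri = (
    "http://host.docker.internal:5001" if in_container
    else "http://localhost:5001"
)
print("Using tracking URI:", mlflow_tracking_uri)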

In [4]:
#%pip install scikit-learn==1.7.2
#%pip install matplotlib==3.10.6
#%pip install matplotlib_inline==0.1.7
#%pip install mlflow==3.4.0
#%pip install numpy==2.3.3
#%pip install pandas==2.3.3
In [86]:
#pip install pandas scikit-learn numpy
#pip install mlflow

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
import mlflow
from mlflow.models import infer_signature
import mlflow.sklearn

# This url is for running Jupyter Notebook
mlflow_tracking_uri = 'http://localhost:5001' 

## This url is for running Jupyter Lab
#mlflow_tracking_uri = 'http://host.docker.internal:5001' 

# set up url
mlflow.set_tracking_uri(mlflow_tracking_uri)
print("Tracking to:", mlflow.get_tracking_uri())
Tracking to: http://localhost:5001
In [87]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')

# Load a sample dataset (the Boston Housing dataset was removed from sklearn; using California Housing instead)
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing(as_frame=True)
df = data.frame

# Features + target
X = df.drop("MedHouseVal", axis=1)
y = df["MedHouseVal"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Start MLflow experiment
mlflow.set_experiment("MedHouseVal_LinearRegression")

with mlflow.start_run(run_name="linear_regression_run"):
    # Model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predictions
    y_pred = model.predict(X_test)

    # Metrics
    mse = mean_squared_error(y_test, y_pred)

    # Log parameters and metrics
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_metric("mse", mse)

    # Create prediction vs actual plot
    plt.figure(figsize=(6,6))
    plt.scatter(y_test, y_pred, alpha=0.5)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")
    plt.xlabel("Actual")
    plt.ylabel("Predicted")
    plt.title(f"Prediction vs Actual (MSE={mse:.3f})")
    plt.grid(True)
    plt.savefig("pred_vs_actual.png")
    plt.close()
   
    # Save plot
    mlflow.log_artifact("pred_vs_actual.png")

    # Log model
    mlflow.sklearn.log_model(model, "linear_regression_model")

print(f"✅ Run complete. MSE: {mse:.3f}")
print("You can now view results in MLflow UI.")
2025/10/12 10:30:36 WARNING mlflow.models.model: `artifact_path` is deprecated. Please use `name` instead.
2025/10/12 10:30:44 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run linear_regression_run at: http://localhost:5001/#/experiments/494636870324001332/runs/4930c83f63e449eca49ed5d5bc61968d
🧪 View experiment at: http://localhost:5001/#/experiments/494636870324001332
✅ Run complete. MSE: 0.556
You can now view results in MLflow UI.
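
Before switching to the UI, we can also confirm from code that the run landed on the tracking server (a minimal sketch; mlflow.search_runs returns a pandas DataFrame of runs):

import mlflow

# Query the tracking server configured above for runs in our experiment
runs = mlflow.search_runs(experiment_names=["MedHouseVal_LinearRegression"])
print(runs[["run_id", "metrics.mse", "status"]].head())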

This shows the experiment "MedHouseVal_LinearRegression" in the MLflow UI:

image.png

Docker Essentials for AI/ML Workflows

| Concept / Option | Explanation | AI/ML Use Case |
| --- | --- | --- |
| Image | A pre-packaged environment with code, libraries, and dependencies. Immutable blueprint to create containers. | Use official images like pytorch/pytorch or tensorflow/tensorflow to quickly spin up ML environments with GPU, CUDA, and Python pre-installed. |
| Container | A running instance of an image. It's isolated, but can interact with the host through ports/volumes. | Run Jupyter Notebook, MLflow, or model training inside containers for reproducibility. |
| Tag | A label that specifies a version of an image (default: latest). | Choose specific versions, e.g., pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime, to ensure experiments run with exact dependencies. |
| Port Mapping (-p) | Maps container ports to host machine ports: -p host:container. | Expose Jupyter Notebook (-p 8888:8888) or the MLflow UI (-p 5000:5000) to access them from a browser. |
| Detached Mode (-d) | Runs the container in the background. | Start long-running services like the MLflow tracking server without keeping the terminal busy. |
| Interactive Terminal (-it) | Attaches to the container with an interactive shell (stdin/stdout). | Debug inside the container, run bash, install missing Python packages, or start Jupyter interactively. |
| Volume Mount (-v) | Mounts host directories into the container for persistence: -v host_path:container_path. | Save datasets, models, or notebooks outside the container (e.g., -v ~/data:/workspace/data). Ensures data isn't lost when the container stops. |
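
The same options are available programmatically through the Docker SDK for Python (pip install docker). Below is a minimal sketch that reproduces the earlier MLflow run command; the container name mlflow_sdk_test is an illustrative choice:

import docker

client = docker.from_env()  # connect to the local Docker daemon

# Equivalent of:
# docker run -d -p 5001:5000 --name mlflow_sdk_test ghcr.io/mlflow/mlflow:latest \
#     mlflow server --host 0.0.0.0
container = client.containers.run(
    "ghcr.io/mlflow/mlflow:latest",
    command="mlflow server --host 0.0.0.0",
    detach=True,               # -d: run in the background
    ports={"5000/tcp": 5001},  # -p 5001:5000 (container port -> host port)
    name="mlflow_sdk_test",    # --name
)
print(container.name, container.status)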