Summary
This work provides a practical guide to using Docker for ML and AI workflows. It explains why Docker is essential for reproducibility, portability, and scalability of ML models and pipelines. Key topics include Docker vs virtual machines, benefits for AI/ML roles (ML engineers, data scientists, DevOps), and how Docker integrates into the entire ML lifecycle—from data exploration and model training to experiment tracking, deployment, and monitoring. The work also highlights popular use cases like containerized notebooks, model serving, and cloud deployment, demonstrating how Docker ensures consistent, efficient, and scalable ML workflows.
This is where Docker comes in. It provides a lightweight, standardized way to package and deploy ML models and applications.
| Feature | Virtual Machine (VM) | Docker (Container) |
|---|---|---|
| Architecture | Runs a full guest OS on top of a hypervisor | Shares the host OS kernel; only packages app + dependencies |
| Resource Usage | Heavy – each VM needs its own OS | Lightweight – containers reuse the host OS |
| Startup Time | Minutes to boot | Seconds (or less) to start |
| Portability | Works across hypervisors but larger image size | Extremely portable, small images, runs anywhere Docker is supported |
| Efficiency | High overhead due to multiple OS layers | Very efficient, near-native performance |
| Scalability | Harder to scale quickly | Easy to orchestrate (Kubernetes, Docker Swarm) |
| Use Case | Good for running multiple OS types on one machine | Ideal for microservices, ML model deployment, CI/CD pipelines |
Image retrieved from https://k21academy.com/docker-kubernetes/docker-vs-virtual-machine/
✅ Docker is the bridge between ML models and production systems — ensuring consistency, efficiency, and scalability.
How Docker Supports Different Roles in AI/ML
👉 Overall, Docker streamlines workflows and can boost ML productivity by 10% or more.
Where Docker Fits into the ML Workflow
Docker can support every stage of the ML lifecycle, making workflows reproducible, portable, and scalable.
✅ Docker acts as the backbone of the ML workflow — from notebooks to deployment — providing consistency, portability, and scalability at every step.
🔹 Popular Use Cases for Docker in ML
Running Models Locally with Docker
We can access a wide range of pre-trained models from Docker Hub, which is like a GitHub for Docker images.
Once we find the image, we can run it locally with a single command, without worrying about dependencies or environment setup. This makes it easy to test, experiment with, or deploy models quickly.
Example: Running a Pre-trained Model with Docker
Search for a model on Docker Hub
Go to https://hub.docker.com/ and find a containerized ML model (e.g., tensorflow/tensorflow:latest-py3).
Pull the Docker image
docker pull tensorflow/tensorflow:latest-py3
Run the container
docker run -it --rm tensorflow/tensorflow:latest-py3 python
-it → interactive terminal
--rm → remove container after exit
python → starts a Python shell inside the container

Using Docker with MCP Tooling
MCP (Model Context Protocol) allows AI models to access real-world tools in a controlled and standardized way.
By combining Docker + MCP, we can:
Example: Run a trusted MCP server using Docker:
docker run -p 3000:3000 realops/kubernetes-mcp-server:latest
What is Docker Desktop?
Docker Desktop is an application that enables you to build, run, and manage Docker containers on your local machine. It provides:
How to Download and Install Docker Desktop
Download
Install
Windows: run the .exe installer, follow the setup steps, and enable WSL 2 if prompted (required for Linux containers).
macOS: open the .dmg file, drag Docker to Applications, and launch it.

Verify Installation
Open a terminal or command prompt and run:
docker version
docker run hello-world
# Docker commands can be run directly inside a Jupyter Notebook
#!docker version
The goal is to leverage pre-built AI/ML Docker images to run real-world tools such as JupyterLab and MLflow, and to learn how to connect, manage, and persist work across the full ML lifecycle.
With Docker images, there’s no need to worry about library versions (e.g., Pandas, NumPy, etc.) since everything is already packaged within the environment. This makes setup quick, consistent, and easy.
To get started, ensure Docker is installed with both the Client and Server components.
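The client/server check can also be scripted from a notebook. The sketch below assumes that `docker version --format '{{json .}}'` prints a JSON object with `Client` and `Server` keys, and that `Server` is empty when the daemon is not reachable; the helper names `components_from_version_json` and `check_docker` are hypothetical, and the sample payload is hand-written for illustration.

```python
import json
import shutil
import subprocess


def components_from_version_json(text: str) -> dict:
    """Report which components appear in the JSON printed by `docker version`."""
    data = json.loads(text)
    return {
        "client": bool(data.get("Client")),
        "server": bool(data.get("Server")),
    }


def check_docker() -> dict:
    """Return client/server availability; both False if the CLI is not on PATH."""
    if shutil.which("docker") is None:
        return {"client": False, "server": False}
    proc = subprocess.run(
        ["docker", "version", "--format", "{{json .}}"],
        capture_output=True,
        text=True,
    )
    try:
        return components_from_version_json(proc.stdout)
    except json.JSONDecodeError:
        # Output was not JSON (for example, an error message); the CLI exists at least.
        return {"client": True, "server": False}


# Illustrative payload only, not captured from a real installation.
sample = '{"Client": {"Version": "0.0.0"}, "Server": null}'
print(components_from_version_json(sample))
```

Calling `check_docker()` on a machine where Docker Desktop is installed but not started would, under these assumptions, report the client as present and the server as missing.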
Look at the Docker Hub image registry for MLflow:
Docker Hub is a popular container image registry, but it’s not the only one. GitHub also provides its own container registry in addition to hosting code.
For example, you can pull an MLflow image from GitHub’s registry (ghcr.io):
docker pull ghcr.io/mlflow/mlflow:v3.4.0
This shows that GitHub is no longer just a code repository — it also serves as a container image registry, where both the latest and older versions of images are available.
# List all available docker commands
#!docker
#!docker system info
# Stream events as containers start and run
#!docker system events
!docker pull ghcr.io/mlflow/mlflow:latest
The pull downloads all image layers. The parts of ghcr.io/mlflow/mlflow:v3.4.0 are:
ghcr.io: the container registry, hosted by GitHub.
mlflow: the organization that created the image.
mlflow: the repository.
v3.4.0: the tag, either a specific version or latest.
# List the images stored locally
!docker image ls
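To make the parts of an image reference such as ghcr.io/mlflow/mlflow:v3.4.0 concrete, here is a simplified parser. It is a sketch rather than Docker's own implementation: it ignores digests (`@sha256:...`), and `ImageRef` and `parse_image_ref` are hypothetical names. It follows the usual convention that the first path component counts as a registry host only when it contains a dot or a colon or equals `localhost`, with Docker Hub (`docker.io`) and the `latest` tag as defaults.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/namespace/repository:tag' into its parts (simplified)."""
    # A colon after the last slash separates the tag; otherwise default to 'latest'.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]
    else:
        name, tag = ref, "latest"

    parts = name.split("/")
    first = parts[0]
    # The first component is treated as a registry host only if it looks like one.
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, parts = first, parts[1:]
    else:
        registry = "docker.io"  # Docker Hub is the default registry

    repository = parts[-1]
    namespace = "/".join(parts[:-1])
    if not namespace and registry == "docker.io":
        namespace = "library"  # official Docker Hub images live under 'library'
    return ImageRef(registry, namespace, repository, tag)


print(parse_image_ref("ghcr.io/mlflow/mlflow:v3.4.0"))
print(parse_image_ref("jupyter/scipy-notebook"))
```

The first call separates the registry, organization, repository, and tag described above; the second shows the defaults that apply when the registry and tag are omitted.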
There are many ways to run a container (see !docker run --help).
When running MLflow, we want to access its Web UI. Since MLflow runs inside the container on a specific port, it’s not directly available on the host machine. To make it accessible, we need to map the container’s port to a host port during startup.
## Get help for docker run
#!docker run --help
When running a container, we use the option -p host_port:container_port to map ports.
For example, MLflow typically runs on port 5000 inside the container. To make it accessible from the host machine, we can map it to port 5001 on the host:
docker run -p 5001:5000 ghcr.io/mlflow/mlflow:latest mlflow server --host 0.0.0.0
The figure below illustrates how port mapping works. On our desktop or laptop, Docker runs inside a virtualized environment (the Docker host/VM). Within this host, containers run their own services—MLflow in this case, on port 5000.
Since we can’t directly connect to the container’s internal port from the desktop browser, we need to expose it by mapping it to a host port. By specifying -p 5001:5000, we’re saying:
After the port mapping comes the image name, which refers to the registry image discussed earlier.
Instead of just -p, we’ll also add the -d flag to run the container in detached mode. This means the container will run in the background, freeing up our terminal for other commands instead of streaming all the logs. On top of that, instead of Docker assigning a random name (like sad_mccarthy), we’ll specify a clear, meaningful name for the container using --name mlflow_test.
docker run -d -p 5001:5000 --name mlflow_test ghcr.io/mlflow/mlflow:latest mlflow server --host 0.0.0.0
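The same command can be assembled from Python, which is handy when containers are started from a notebook or script. This is a minimal sketch: `docker_run_args` is a hypothetical helper that covers only the flags used so far (`-d`, `-p`, `--name`), and it builds the argument list without executing anything.

```python
import shlex
from typing import Mapping, Optional, Sequence


def docker_run_args(
    image: str,
    command: Sequence[str] = (),
    *,
    name: Optional[str] = None,
    detach: bool = False,
    ports: Optional[Mapping[int, int]] = None,  # host_port -> container_port
) -> list:
    """Assemble the argument list for `docker run` without executing it."""
    args = ["docker", "run"]
    if detach:
        args.append("-d")
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    if name:
        args += ["--name", name]
    args.append(image)
    args += list(command)
    return args


args = docker_run_args(
    "ghcr.io/mlflow/mlflow:latest",
    ["mlflow", "server", "--host", "0.0.0.0"],
    name="mlflow_test",
    detach=True,
    ports={5001: 5000},
)
print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, check=True)` would start the container; keeping the arguments as a list avoids shell quoting problems.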
When we visit http://localhost:5001 in the browser, Docker forwards the traffic to port 5000 inside the container. This mechanism is called port mapping or port forwarding.
So you would access MLflow at: 👉 http://localhost:5001
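A quick way to confirm that the mapped host port is accepting connections is a plain TCP check. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, and 5001 is the host port chosen in the text. An open port only indicates that something is listening, not that MLflow has finished starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 5001 is the host side of the -p 5001:5000 mapping used above.
print("Port 5001 reachable:", port_is_open("localhost", 5001))
```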
This is how MLflow runs inside a container. Let’s break down what happens behind the scenes, as illustrated in the diagram below:
After starting a container, we need a way to check whether it is running or has stopped. Detached mode (the -d option) is the better way to keep it running in the background.
The docker ps command shows only running containers:
!docker ps
#!docker logs b6b94b6eb8ef
docker container ls is the longer form of docker ps and lists the same running containers:
!docker container ls
# -n 1 shows the most recently created container, whatever its state
!docker ps -n 1
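For scripted checks, the container list can be read as JSON instead of a table. The sketch below assumes the output of `docker ps -a --format '{{json .}}'`, which prints one JSON object per line with fields such as `Names` and `State`; those field names should be verified against the installed Docker version. `parse_ps_output` and `running_names` are hypothetical helpers, and the sample text is hand-written rather than captured output.

```python
import json


def parse_ps_output(text: str) -> list:
    """Parse one JSON object per non-empty line into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def running_names(containers: list) -> list:
    """Names of containers whose State field is 'running'."""
    return [c["Names"] for c in containers if c.get("State") == "running"]


# Hand-written sample for illustration; not output captured from a real host.
sample = "\n".join([
    '{"ID": "aaaaaaaaaaaa", "Names": "mlflow_test", "State": "running"}',
    '{"ID": "bbbbbbbbbbbb", "Names": "old_job", "State": "exited"}',
])
print(running_names(parse_ps_output(sample)))
```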
When running Jupyter with Docker, we don’t want our work to disappear when the container is deleted. To make sure our notebooks are always available, we mount a local directory into the container as a volume. This way, files are stored on our machine, not inside the container.
Example command:
docker run -d -p 8888:8888 --name jupyterlab_test \
-v ~/ml-docker/notebooks:/home/jovyan/work jupyter/scipy-notebook:latest
docker run → Run a new container.
-d → Detached mode (runs in the background).
-p 8888:8888 → Maps port 8888 on the host to port 8888 inside the container, so you can access Jupyter in the browser at http://localhost:8888.
--name jupyterlab_test → Assigns a name to the container (jupyterlab_test).
-v ~/ml-docker/notebooks:/home/jovyan/work → Mounts a local folder (~/ml-docker/notebooks) to the container’s working directory (/home/jovyan/work). Anything you edit in Jupyter is saved locally, and local changes appear inside the container.
jupyter/scipy-notebook:latest → The Docker image to use. This image is part of the Jupyter Docker Stacks project, providing a ready-to-use scientific Python environment.

What’s included in jupyter/scipy-notebook
This image comes preloaded with:
pip, conda, and git

✅ With this setup:
Notebooks are saved on the host (in ~/ml-docker/notebooks), even if the container is deleted.
#!docker rm -f jupyter_mlflow_test
!docker run -d -p 8888:8888 --name jupyter_mlflow_test \
-v "D:\Learning\MyWebsite\FinalGithub\ToPublihsed\projects\Docker-for-AI-ML:/home/jovyan/work" \
jupyter/scipy-notebook start-notebook.sh --notebook-dir=/home/jovyan/work
!docker ps
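When the run command is built from Python, the host side of a volume mount can be prepared up front. This sketch expands `~`, creates the directory if it does not exist, and returns the `-v host_path:container_path` pair; `volume_flag` is a hypothetical helper, and the demo writes to a temporary directory so it does not touch a real home folder.

```python
import tempfile
from pathlib import Path


def volume_flag(host_dir: str, container_dir: str) -> list:
    """Create the host directory if needed and return ['-v', 'host:container']."""
    host_path = Path(host_dir).expanduser().resolve()
    host_path.mkdir(parents=True, exist_ok=True)
    return ["-v", f"{host_path}:{container_dir}"]


# Demo against a temporary directory; in practice the first argument
# would be a path such as ~/ml-docker/notebooks.
demo_root = tempfile.mkdtemp()
flag = volume_flag(f"{demo_root}/notebooks", "/home/jovyan/work")
print(flag)
```

Creating the folder beforehand keeps ownership with the current user, which avoids permission surprises when the container later writes notebooks into it.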
This creates a container running the notebook server on port 8888:
After that, open the logs for jupyter_mlflow_test and click the link of the form http://127.0.0.1:8888/lab?token=..................
This link opens JupyterLab, which requires a token; the token can be retrieved with jupyter server list:
!jupyter server list
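If the URL is copied from the logs, the token is simply its `token` query parameter. The sketch below extracts it with the standard library; `extract_token` and `token_from_logs` are hypothetical helpers, and `example-token` and `abc123` are made-up placeholders rather than real tokens.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_token(url: str) -> Optional[str]:
    """Return the value of the 'token' query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


def token_from_logs(log_text: str) -> Optional[str]:
    """Find the first URL carrying a token in log text and return that token."""
    match = re.search(r"https?://\S+\?token=\w+", log_text)
    return extract_token(match.group(0)) if match else None


# Placeholder values for illustration only.
print(extract_token("http://127.0.0.1:8888/lab?token=example-token"))
print(token_from_logs("    http://127.0.0.1:8888/lab?token=abc123\n"))
```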
After entering this token, JupyterLab starts:
Now we should create a virtual environment, open JupyterLab in it, and install the required Python packages:
After launching JupyterLab and MLflow as separate containers, we can now connect them together. JupyterLab runs in one container, while MLflow runs in another. The goal is to execute code or experiments in Jupyter and send the results to MLflow, which will handle tracking and logging.
MLflow is exposed on port 5000 inside its container, mapped to port 5001 on the host machine. To enable Jupyter to communicate with MLflow, we connect through the host port at host.docker.internal:5001. This special domain name resolves to the Docker host, allowing seamless communication between the two containers. See illustration below:
When the notebook runs directly on the host (Jupyter Notebook), use this URL to connect to MLflow:
mlflow_tracking_uri = 'http://localhost:5001'
When JupyterLab runs inside a container, use this URL instead:
mlflow_tracking_uri = 'http://host.docker.internal:5001'
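The choice between the two URLs can be automated. The sketch below relies on a common heuristic, the presence of `/.dockerenv`, to guess whether the code runs inside a Docker container; that check is an assumption and not an official API, and `running_in_container` and `tracking_uri` are hypothetical helpers.

```python
import os


def running_in_container() -> bool:
    """Heuristic: Docker typically creates /.dockerenv inside its containers."""
    return os.path.exists("/.dockerenv")


def tracking_uri(in_container: bool, host_port: int = 5001) -> str:
    """Pick the MLflow tracking URI for the current environment."""
    host = "host.docker.internal" if in_container else "localhost"
    return f"http://{host}:{host_port}"


mlflow_tracking_uri = tracking_uri(running_in_container())
print(mlflow_tracking_uri)
```

Note that `host.docker.internal` resolves out of the box on Docker Desktop; on a plain Linux Docker Engine the container usually has to be started with `--add-host=host.docker.internal:host-gateway` for the name to resolve.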
#%pip install scikit-learn==1.7.2
#%pip install matplotlib==3.10.6
#%pip install matplotlib_inline==0.1.7
#%pip install mlflow==3.4.0
#%pip install numpy==2.3.3
#%pip install pandas==2.3.3
#pip install pandas scikit-learn numpy
#pip install mlflow
import logging

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from mlflow.models import infer_signature
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
# This url is for running Jupyter Notebook
mlflow_tracking_uri = 'http://localhost:5001'
## This url is for running Jupyter Lab
#mlflow_tracking_uri = 'http://host.docker.internal:5001'
# set up url
mlflow.set_tracking_uri(mlflow_tracking_uri)
print("Tracking to:", mlflow.get_tracking_uri())
# pandas, numpy, and the scikit-learn helpers are already imported above
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
# Load sample dataset (Boston Housing from sklearn deprecated, using California housing instead)
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing(as_frame=True)
df = data.frame
# Features + target
X = df.drop("MedHouseVal", axis=1)
y = df["MedHouseVal"]
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Start MLflow experiment
mlflow.set_experiment("MedHouseVal_LinearRegression")
with mlflow.start_run(run_name="linear_regression_run"):
    # Model
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Predictions
    y_pred = model.predict(X_test)
    # Metrics
    mse = mean_squared_error(y_test, y_pred)
    # Log parameters and metrics
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_metric("mse", mse)
    # Create prediction vs actual plot
    plt.figure(figsize=(6,6))
    plt.scatter(y_test, y_pred, alpha=0.5)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")
    plt.xlabel("Actual")
    plt.ylabel("Predicted")
    plt.title(f"Prediction vs Actual (MSE={mse:.3f})")
    plt.grid(True)
    plt.savefig("pred_vs_actual.png")
    plt.close()
    # Log the saved plot as an artifact
    mlflow.log_artifact("pred_vs_actual.png")
    # Log model
    mlflow.sklearn.log_model(model, "linear_regression_model")
    print(f"✅ Run complete. MSE: {mse:.3f}")
    print("You can now view results in MLflow UI.")
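The run above logs only MSE, although `mean_absolute_error` and `r2_score` are imported as well. As a sketch of what those extra metrics compute, here is a plain-Python version that needs no libraries; `regression_metrics` is a hypothetical helper and the three-point data set is invented for illustration. In the notebook itself, the scikit-learn functions would normally be used instead.

```python
def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R² for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance around the mean
    ss_res = sum(r * r for r in residuals)              # variance left unexplained
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": mse ** 0.5, "mae": mae, "r2": r2}


# Invented toy values, chosen so the arithmetic is easy to follow.
metrics = regression_metrics([3, 5, 7], [2, 5, 9])
print(metrics)
```

A dictionary of this shape can be passed to `mlflow.log_metrics` inside the `with mlflow.start_run(...)` block so that all metrics appear together in the MLflow UI.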
This shows the experiment "MedHouseVal_LinearRegression" in MLflow:
| Concept / Option | Explanation | AI/ML Use Case |
|---|---|---|
| Image | A pre-packaged environment with code, libraries, and dependencies. Immutable blueprint to create containers. | Use official images like pytorch/pytorch or tensorflow/tensorflow to quickly spin up ML environments with GPU, CUDA, and Python pre-installed. |
| Container | A running instance of an image. It’s isolated, but can interact with the host through ports/volumes. | Run Jupyter Notebook, MLFlow, or model training inside containers for reproducibility. |
| Tag | A label that specifies a version of an image (default: latest). | Choose specific versions, e.g., pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime to ensure experiments run with exact dependencies. |
| Port Mapping (-p) | Maps container ports to host machine ports: -p host:container. | Expose Jupyter Notebook (-p 8888:8888) or MLFlow UI (-p 5000:5000) to access from browser. |
| Detached Mode (-d) | Runs the container in the background. | Start long-running services like MLFlow tracking server without keeping the terminal busy. |
| Interactive Terminal (-it) | Attaches to container with interactive shell (stdin/stdout). | Debug inside container, run bash, install missing Python packages, or start Jupyter interactively. |
| Volume Mount (-v) | Mounts host directories into container for persistence: -v host_path:container_path. | Save datasets, models, or notebooks outside the container (e.g., -v ~/data:/workspace/data). Ensures data isn’t lost when container stops. |