How to Run AI Image Generation Privately on Your Machine with Docker and Open WebUI

We've all faced the frustrations of cloud-based AI image services—privacy concerns, credit limits, and overly strict content filters. But what if you could run everything locally on your own machine, with a sleek chat interface? Docker Model Runner now makes that possible. By combining Docker's model management with Open WebUI, you can pull image-generation models, connect them to a local inference server, and generate images right from a chat interface—fully private, fully offline. No cloud subscription required. This guide answers your key questions about getting started.

What is Docker Model Runner and how does it enable local image generation?

Docker Model Runner acts as the control plane for AI models on your local machine. It downloads, manages, and runs inference backends for models like Stable Diffusion. The key is that it exposes a 100% OpenAI-compatible API, including the POST /v1/images/generations endpoint. This means any tool that speaks OpenAI's API—like Open WebUI—can talk to it without modification. When you run a model via Docker Model Runner, it handles all the heavy lifting: it unpacks the model files (packaged as DDUF artifacts), launches the correct inference engine, and serves requests locally. You never send a prompt to a remote server; everything stays on your hardware. This gives you full privacy, no rate limits, and complete control over the model behavior.
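Because the endpoint is OpenAI-compatible, any HTTP client can drive it. Below is a minimal Python sketch; the localhost URL and port are assumptions for illustration (use the address your own launch reports), and the model name stable-diffusion matches the model pulled later in this guide:

```python
import base64
import json
from urllib import request

# Assumed local endpoint -- check the address your Docker Model Runner
# launch actually prints; the port here is illustrative.
API_URL = "http://localhost:12434/v1/images/generations"

def build_image_request(prompt: str, model: str = "stable-diffusion",
                        size: str = "1024x1024", n: int = 1) -> bytes:
    """Build the JSON body for an OpenAI-style image-generation call."""
    return json.dumps({"model": model, "prompt": prompt,
                       "size": size, "n": n}).encode("utf-8")

def generate_image(prompt: str, out_path: str = "out.png") -> None:
    """POST the prompt to the local server and save the first image.

    Assumes an OpenAI-style response with base64 data at data[0].b64_json.
    """
    req = request.Request(API_URL, data=build_image_request(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        payload = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["data"][0]["b64_json"]))

# Example (requires the local server to be running):
# generate_image("a lighthouse at dusk, oil painting")
```

Nothing in this request ever leaves your machine; the URL points at the locally served API.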

Source: www.docker.com

What hardware and software do you need to get started?

To set up local image generation with Docker Model Runner and Open WebUI, you'll need Docker Desktop (macOS or Windows) or Docker Engine (Linux). For smooth operation, allocate at least 8 GB of free RAM for a small model; more RAM allows larger models or faster generation. A GPU is optional but highly recommended. Docker Model Runner supports NVIDIA CUDA (Linux, or Windows with WSL2), Apple Silicon (MPS), and CPU fallback. If you can run docker model version without errors, you're ready. The setup is lightweight: no additional Python environments or CUDA toolkits to install; Docker handles everything. Users with powerful GPUs will see dramatically faster generation times, but even a modern CPU can produce images, albeit more slowly.

How do you pull an image generation model with Docker Model Runner?

Pulling a model is as simple as a single command: docker model pull stable-diffusion. Docker Model Runner uses a compact packaging format called DDUF (Diffusers Unified Format) to distribute models through Docker Hub, just like any other OCI artifact. After the pull, verify with docker model inspect stable-diffusion. You'll see details like the model's SHA256 hash, tag, creation timestamp, and configuration—including the DDUF file name and the 6.94 GB size for the Stable Diffusion XL base model. This command confirms the model is stored locally as a single DDUF file, ready to be unpacked at runtime. The process is identical to pulling a Docker image; you can pull different versions or other compatible image-generation models in the future.
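Since pull and verify are plain CLI calls, the step can also be scripted. This is a minimal sketch, assuming only that the docker model CLI is on your PATH; it relays exactly the two commands shown above:

```python
import subprocess

def model_cmd(action: str, model: str) -> list[str]:
    """Assemble a `docker model` CLI invocation as an argument list."""
    return ["docker", "model", action, model]

def pull_and_inspect(model: str = "stable-diffusion") -> str:
    """Pull the model, then return the raw `docker model inspect` output
    (hash, tag, creation timestamp, DDUF file name, and size)."""
    subprocess.run(model_cmd("pull", model), check=True)
    result = subprocess.run(model_cmd("inspect", model),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Swapping in a different model name is all it takes to script pulls of other compatible image-generation models.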

What is a DDUF file and why is it used?

DDUF stands for Diffusers Unified Format. It's a single-file packaging format that bundles all components of a diffusion model into one portable artifact. A typical diffusion model consists of multiple parts: a text encoder (for converting prompts into embeddings), a VAE (for encoding/decoding images), a UNet or DiT (the core denoising network), and a scheduler configuration (for controlling the diffusion steps). Traditionally, these are stored as separate files or directories, making distribution and versioning messy. DDUF packs everything into a single file that Docker Model Runner knows how to unpack at runtime. This simplifies model distribution via Docker Hub, ensures consistency, and allows you to swap models without dealing with multiple dependencies. It's part of what makes the local setup so seamless.

How do you launch Open WebUI with Docker Model Runner?

This is where the magic happens. Docker Model Runner comes with a built-in launch command that automatically wires up Open WebUI against your local inference endpoint. Simply run: docker model launch openwebui. That single command does several things: it starts the image generation model you pulled (e.g., Stable Diffusion), launches an OpenAI-compatible API server on your machine (listening on a local port), and then spins up Open WebUI—a full-featured chat interface—pre-configured to talk to that local API. No configuration files, no environment variables, no manual linking. Open WebUI appears in your browser, and you can immediately start typing prompts to generate images. The entire stack runs locally, so your prompts never leave your computer. If you prefer different settings, you can later customize the launch by passing environment variables.


Can you use your own GPU, or is a CPU fallback available?

Yes, you can use your own GPU if you have compatible hardware. Docker Model Runner supports NVIDIA CUDA on Linux (and Windows via WSL2), Apple Silicon (MPS) on macOS, and a generic CPU fallback for systems without a dedicated GPU. When you run a model, Docker Model Runner automatically detects your hardware and selects the optimal backend—no manual switching required. GPU acceleration dramatically speeds up image generation: a single image on a modern NVIDIA RTX 3090 can take under 5 seconds, while CPU generation might take 30–60 seconds. But even without a GPU, you can still generate images; it's just slower. The flexibility ensures almost any modern machine can participate in local AI image generation.

Is the entire image generation process fully private?

Absolutely. Because everything runs locally on your machine, your prompts, generated images, and any model data never leave your computer. There is no cloud subscription, no API key sent to a third party, and no risk of your creative ideas being stored on someone else's server. Docker Model Runner downloads the model once (from Docker Hub), stores it locally, and runs the inference entirely in your own environment. The Open WebUI interface also runs locally, so even your chat history stays on your machine if configured that way. This makes the setup ideal for projects with sensitive content, proprietary designs, or anyone who simply values digital privacy. You are in complete control—same power as cloud AI services, none of the surveillance.

What kind of images can you generate with this setup?

The default model family is Stable Diffusion (in this guide, the Stable Diffusion XL base model), which can generate a wide variety of images from text prompts. You can create photorealistic scenes, artistic illustrations, fantasy creatures, product mockups, and more. Because you control the model, you can also fine-tune or swap it for other compatible image-generation models (e.g., SD 1.5, SD 2.1, or community variants). The Open WebUI interface supports custom prompts, negative prompts, and various generation parameters (steps, guidance scale, seed). There are no arbitrary content filters unless you choose to add them. So that request for a dragon wearing a business suit? Perfectly acceptable here. You can generate as many images as your hardware allows, without worrying about credits. The only limit is your imagination, and your VRAM.
