Exploring Stable Diffusion Setup

tltr: use AUTOMATIC1111/stable-diffusion-webui, it works like a charm.

In my exploration, I tried these setup:

Google Colab

Watch this video and in the description there is a link to Google Colab:
This is a very simple code example on how to run txt2img and img2img, however the free Google Colab runtime can’t run all the examples because of out of GPU memory.
But at least you don’t need to setup anything locally. It’s a nice start to understand the code.

Local setupCUDA & Pytorch

I later on tried the above code with local machine, I have Nvidia GPU so I need to setup CUDA.
My GPU supports CUDA 11.6 and Pytorch supports CUDA 11.6, so I downloaded CUDA 11.6 here.

For some reason the command generated from Pytorch didn’t work, torch.cuda.is_available() returns false even I’ve installed CUDA. After some searching, this works for me:
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html

Here are a few methods to confirm if CUDA is properly installed, open cmd and run these:
(1) NVIDIA System Management Interface

  • cd C:\Windows\System32\DriverStore\FileRepository\nvdmui.inf_amd64_xxxxxxxxxxx
  • nvidia-smi

(2) NVIDIA CUDA Compiler Driver NVCC

  • cd C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
  • nvcc --version

(3) Pytorch environment information

  • python -m torch.utils.collect_env

GPU out of memory

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


  1. On Windows, search “Environment Variable” > click “Environment Variable” > in System variables, click New
  2. Put variable to be “PYTORCH_CUDA_ALLOC_CONF” and value to be “garbage_collection_threshold:0.6,max_split_size_mb:128”.
    max_split_size_mb must be 21 or higher.

However I notice that setting PYTORCH_CUDA_ALLOC_CONF does not help fixing out of GPU memory issue, the problem comes from the python scripts loading too much stuff to GPU.

I also tried these in python script, no luck neither:

import gc

Optimized Stable Diffusion (github)

Since my machine will also run out of GPU memory using the simple code example above, so I tried this optimized version. However, there might be extra steps to get it working.

If you encounter ModuleNotFoundError: No module named 'omegaconf':

  1. Open cmd and cd to the Optimized Stable Diffusion folder
  2. pip install -e .

It you encounter errors about taming:

  1. pip install taming-transformers
  2. Clone https://github.com/CompVis/taming-transformers.git
  3. Replace
    by https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py

To download stable-diffusion-v-1-4-original model:

  1. Click on Files and versions tab
  2. Click on sd-v1-4.ckpt, click download
  3. Put this sd-v1-4.ckpt in models\ldm\stable-diffusion-v1\model.ckpt

From Task Manager it shows that Optimized Stable Diffusion only occupies less than half of my GPU memory:

Variations generated from a pic I drew in the past.

Stable Diffusion Webui (github)

I made a symlink so that I don’t have to copy the model.ckpt into this repo local folder (models\Stable-diffusion\model.ckpt).
Double click webui.bat to run it, it works like a charm!
Watch this video to understand what each settings does:

Original pic is a pic that I drew in the past, so for this exploration I try to generate some variations from it.
Prompt: white demon girl, breathing smoke from mouth, white horn on head, white glowing eyes, fantasy, holy, portrait, pretty, detailed, digital art, trending in Artstation

This version of stable diffusion is faster than the optimized one and utilizes the GPU memory without going out of memory: