tl;dr: use AUTOMATIC1111/stable-diffusion-webui, it works like a charm.
In my exploration, I tried these setups:
Google Colab
Watch this video; there is a link to the Google Colab notebook in the description:
https://www.youtube.com/watch?v=ltLNYA3lWAQ
The notebook is a very simple code example showing how to run txt2img and img2img; however, the free Google Colab runtime can't run all the examples because it runs out of GPU memory.
But at least you don't need to set up anything locally, and it's a nice starting point for understanding the code.
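For reference, here is a minimal txt2img sketch using the Hugging Face diffusers library. This is an assumption on my part – the notebook in the video may use different code, and older diffusers versions also required a use_auth_token argument to download the weights:

import torch
from diffusers import StableDiffusionPipeline

# Download the v1-4 weights from Hugging Face and move the pipeline to the GPU
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

# txt2img: one prompt in, one PIL image out
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")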
Local setup – CUDA & PyTorch
I later tried the above code on my local machine. I have an Nvidia GPU, so I needed to set up CUDA.
My GPU supports CUDA 11.6 and PyTorch supports CUDA 11.6, so I downloaded CUDA 11.6 here.
For some reason the install command generated by the PyTorch website didn't work: torch.cuda.is_available() returned False even though I had installed CUDA. After some searching, this worked for me:
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html
Here are a few ways to confirm that CUDA is properly installed. Open cmd and run these:
(1) NVIDIA System Management Interface
cd C:\Windows\System32\DriverStore\FileRepository\nvdmui.inf_amd64_xxxxxxxxxxx
nvidia-smi
(2) NVIDIA CUDA Compiler Driver NVCC
cd C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
nvcc --version
(3) PyTorch environment information
python -m torch.utils.collect_env
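(4) Quick check from the Python REPL – a small sketch using standard torch calls:
import torch
print(torch.cuda.is_available())      # should print True once CUDA is set up
print(torch.version.cuda)             # CUDA version the installed wheel was built for, e.g. 11.6
print(torch.cuda.get_device_name(0))  # your GPU model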
GPU out of memory

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
To set PYTORCH_CUDA_ALLOC_CONF:
- On Windows, search for “Environment Variables” > open the “Environment Variables” dialog > under System variables, click New
- Set the variable name to “PYTORCH_CUDA_ALLOC_CONF” and the value to “garbage_collection_threshold:0.6,max_split_size_mb:128”.
max_split_size_mb must be 21 or higher.
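The same setting can also be applied per script instead of system-wide. A minimal sketch – the variable has to be set before PyTorch initializes CUDA, so do it before the first CUDA call:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.6,max_split_size_mb:128"
import torch  # import (and any CUDA use) only after setting the variable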

However, I noticed that setting PYTORCH_CUDA_ALLOC_CONF does not help with the out-of-GPU-memory issue; the problem comes from the Python scripts loading too much onto the GPU.
I also tried these in the Python script, with no luck either:
import gc
gc.collect()                 # free unreferenced Python objects
torch.cuda.empty_cache()     # release cached, unused GPU memory back to the driver
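One general trick that does reduce memory (not something from the original scripts, just a common approach) is loading the model weights in half precision, which roughly halves the VRAM the weights occupy. With the diffusers pipeline from the earlier sketch, hypothetically:
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # fp16 weights use about half the VRAM of fp32
In my case, though, the real fix was switching to the optimized fork below.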
Optimized Stable Diffusion (github)
Since my machine also runs out of GPU memory with the simple code example above, I tried this optimized version. However, there may be extra steps to get it working.
If you encounter ModuleNotFoundError: No module named 'omegaconf':
- Open cmd and cd to the Optimized Stable Diffusion folder
pip install -e .
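If that still fails, installing the missing module directly should also work (omegaconf is a regular PyPI package):
pip install omegaconf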
If you encounter errors about taming:
- pip install taming-transformers
- Clone https://github.com/CompVis/taming-transformers.git
- Replace
C:\Users\YourName\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_xxxxx\LocalCache\local-packages\Python310\site-packages\taming\modules\vqvae\quantize.py
with the version from https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py
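As an alternative I've seen mentioned (an assumption – I didn't verify it in this setup): the community-maintained PyPI package taming-transformers-rom1504 already ships the fixed quantize.py, which may save the clone-and-replace steps:
pip install taming-transformers-rom1504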
To download the stable-diffusion-v-1-4-original model from its Hugging Face page:
- Click on Files and versions tab
- Click on sd-v1-4.ckpt, click download
- Save sd-v1-4.ckpt as models\ldm\stable-diffusion-v1\model.ckpt (i.e. rename it to model.ckpt)
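Once the checkpoint is in place, a typical txt2img run looks something like this (flag names taken from the optimized repo's README at the time – double-check them against your local copy):
python optimizedSD/optimized_txt2img.py --prompt "a painting of a forest" --H 512 --W 512 --n_samples 1 --ddim_steps 50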
Task Manager shows that Optimized Stable Diffusion occupies less than half of my GPU memory:


Stable Diffusion Webui (github)
I made a symlink so that I don't have to copy model.ckpt into this repo's local folder (models\Stable-diffusion\model.ckpt).
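On Windows, creating that symlink looks like this from an elevated cmd in the repo folder (mklink needs admin rights; the source path here is just a placeholder for wherever you saved the checkpoint):
mklink "models\Stable-diffusion\model.ckpt" "C:\path\to\sd-v1-4.ckpt"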
Double-click webui.bat to run it, and it works like a charm!
Watch this video to understand what each setting does:
https://www.youtube.com/watch?v=Z3IHmdqUar0

Prompt: white demon girl, breathing smoke from mouth, white horn on head, white glowing eyes, fantasy, holy, portrait, pretty, detailed, digital art, trending in Artstation
This version of Stable Diffusion is faster than the optimized one and makes full use of the GPU memory without running out:
