Exploring Stable Diffusion Setup

tl;dr: use AUTOMATIC1111/stable-diffusion-webui. It works like a charm.

In my exploration, I tried these setups:

Google Colab

Watch this video; there is a link to the Google Colab notebook in its description:
This is a very simple code example showing how to run txt2img and img2img. However, the free Google Colab runtime can’t run all the examples because it runs out of GPU memory.
But at least you don’t need to set up anything locally. It’s a nice start for understanding the code.

Local setup

CUDA & PyTorch

Later I tried the above code on my local machine. I have an Nvidia GPU, so I needed to set up CUDA.
My GPU supports CUDA 11.6 and PyTorch supports CUDA 11.6, so I downloaded CUDA 11.6 here.

For some reason the command generated by PyTorch didn’t work: torch.cuda.is_available() returned False even though I had installed CUDA. After some searching, this worked for me:
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html

Here are a few ways to confirm that CUDA is properly installed. Open cmd and run these:
(1) NVIDIA System Management Interface

  • cd C:\Windows\System32\DriverStore\FileRepository\nvdmui.inf_amd64_xxxxxxxxxxx
  • nvidia-smi

(2) NVIDIA CUDA Compiler Driver NVCC

  • cd C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
  • nvcc --version

(3) PyTorch environment information

  • python -m torch.utils.collect_env
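
The three checks above can also be gathered into one small Python script. This is only a sketch: it assumes nvidia-smi and nvcc are reachable on PATH (unlike the cd steps above), and the helper names tool_version, torch_cuda_status and cuda_report are made up for illustration.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the tool's output, or None if it is not on PATH or fails."""
    exe = shutil.which(command[0])
    if exe is None:
        return None
    try:
        done = subprocess.run(
            [exe, *command[1:]], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout.strip() if done.returncode == 0 else None


def torch_cuda_status():
    """Return what PyTorch reports about CUDA, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


def cuda_report():
    """Collect the results of all three checks in one dictionary."""
    return {
        "nvidia-smi": tool_version(["nvidia-smi"]),
        "nvcc": tool_version(["nvcc", "--version"]),
        "pytorch": torch_cuda_status(),
    }


if __name__ == "__main__":
    for name, value in cuda_report().items():
        print(f"{name}: {value if value is not None else 'not found'}")
```

If "built_for_cuda" is None while nvidia-smi and nvcc both respond, the installed torch wheel is probably a CPU-only build, which would match the is_available() problem described above.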

GPU out of memory

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


  1. On Windows, search for “Environment Variables” > click “Environment Variables” > under System variables, click New
  2. Set the variable name to “PYTORCH_CUDA_ALLOC_CONF” and the value to “garbage_collection_threshold:0.6,max_split_size_mb:128”.
    max_split_size_mb must be 21 or higher.
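
As an alternative to the system-wide variable, the same value can be set from the Python script itself. This is a minimal sketch, assuming it runs before torch is imported so the setting is already in place when the CUDA allocator starts. build_alloc_conf is a hypothetical helper, not part of PyTorch.

```python
import os


def build_alloc_conf(garbage_collection_threshold=0.6, max_split_size_mb=128):
    """Format the allocator options as comma-separated option:value pairs."""
    if max_split_size_mb < 21:
        raise ValueError("max_split_size_mb must be 21 or higher")
    return (
        f"garbage_collection_threshold:{garbage_collection_threshold},"
        f"max_split_size_mb:{max_split_size_mb}"
    )


# Set the variable for this process only; do this before "import torch".
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = build_alloc_conf()
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only affects the current process, so it is easy to try different values without touching the Windows settings.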

However, I noticed that setting PYTORCH_CUDA_ALLOC_CONF does not fix the out-of-GPU-memory issue. The problem comes from the Python scripts loading too much onto the GPU.

I also tried these in the Python script, with no luck either:

import gc
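
Only the import survives in the snippet above. For illustration, a common cleanup pattern that starts from import gc looks like the sketch below. This is an assumption about the kind of code meant here, not the exact code that was tried, and free_gpu_memory is a made-up name.

```python
import gc


def free_gpu_memory():
    """Collect unreachable Python objects, then release PyTorch's cached GPU blocks."""
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        # Assumption: torch may be missing; then only the gc pass runs.
        return collected
    if torch.cuda.is_available():
        # Returns cached, unused blocks to the driver; tensors that are
        # still referenced stay on the GPU.
        torch.cuda.empty_cache()
    return collected


if __name__ == "__main__":
    print(f"gc collected {free_gpu_memory()} objects")
```

A cleanup like this cannot free memory that live tensors or models still hold, which is consistent with it not helping when the script simply loads too much onto the GPU.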

Optimized Stable Diffusion (github)

Since my machine also runs out of GPU memory with the simple code example above, I tried this optimized version. However, there might be extra steps to get it working.

If you encounter ModuleNotFoundError: No module named 'omegaconf':

  1. Open cmd and cd to the Optimized Stable Diffusion folder
  2. pip install -e .

If you encounter errors about taming:

  1. pip install taming-transformers
  2. Clone https://github.com/CompVis/taming-transformers.git
  3. Replace
    by https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py

To download stable-diffusion-v-1-4-original model:

  1. Click on Files and versions tab
  2. Click on sd-v1-4.ckpt, click download
  3. Put this sd-v1-4.ckpt in models\ldm\stable-diffusion-v1\model.ckpt
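
Step 3 can be scripted as well. A minimal sketch, assuming the checkpoint has already been downloaded; place_checkpoint and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def place_checkpoint(downloaded, repo_root):
    """Copy the downloaded checkpoint to models/ldm/stable-diffusion-v1/model.ckpt."""
    target = Path(repo_root) / "models" / "ldm" / "stable-diffusion-v1" / "model.ckpt"
    target.parent.mkdir(parents=True, exist_ok=True)  # create the folders if missing
    shutil.copy2(downloaded, target)
    return target
```

For example, place_checkpoint("Downloads/sd-v1-4.ckpt", "stable-diffusion") would create the folder chain inside the repo and store the file under the name model.ckpt.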

Task Manager shows that Optimized Stable Diffusion occupies less than half of my GPU memory:

Variations generated from a pic I drew in the past.

Stable Diffusion Webui (github)

I made a symlink so that I don’t have to copy model.ckpt into this repo’s local folder (models\Stable-diffusion\model.ckpt).
Double-click webui.bat to run it. It works like a charm!
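
The symlink mentioned above can be created from Python too. This is a sketch under assumptions: link_checkpoint is a made-up helper, and on Windows creating symlinks usually requires an elevated prompt or Developer Mode.

```python
import os
from pathlib import Path


def link_checkpoint(existing_ckpt, webui_root):
    """Create models/Stable-diffusion/model.ckpt as a symlink to an existing checkpoint."""
    link = Path(webui_root) / "models" / "Stable-diffusion" / "model.ckpt"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    # Point the link at the absolute path so it keeps working
    # regardless of the current working directory.
    os.symlink(Path(existing_ckpt).resolve(), link)
    return link
```

The checkpoint is several gigabytes, so linking avoids keeping a second copy on disk.
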
Watch this video to understand what each setting does:

The original pic is one that I drew in the past, so for this exploration I tried to generate some variations from it.
Prompt: white demon girl, breathing smoke from mouth, white horn on head, white glowing eyes, fantasy, holy, portrait, pretty, detailed, digital art, trending in Artstation

This version of Stable Diffusion is faster than the optimized one and makes use of the GPU memory without running out of it:

URP and GPU instancing & MPB

Tested with: 2020.1.0b16.4138 + URP 9.0.0-preview.38, Windows DX11

Objects with same material & same property values

  • SRP batcher ON, material GPU instancing OFF: SRP batched
  • SRP batcher ON, material GPU instancing ON: SRP batched (because if the SRP batcher is ON, GPU instancing is ignored)
  • SRP batcher OFF, material GPU instancing ON: GPU instanced

Objects with different material & different property values

  • SRP batcher ON, material GPU instancing OFF: SRP batched
  • SRP batcher ON, material GPU instancing ON: SRP batched (because if the SRP batcher is ON, GPU instancing is ignored)
  • SRP batcher OFF, material GPU instancing ON: Not batched nor instanced (expected because they are different materials)

Objects with same material & different property values set by MaterialPropertyBlock

  • SRP batcher ON, material GPU instancing OFF: Not batched nor instanced (because the MPB values are not the same)
  • SRP batcher ON, material GPU instancing ON: Not batched nor instanced (same as above)
  • SRP batcher OFF, material GPU instancing ON: Not batched nor instanced (because the properties are non-instanced)
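
The test results above can be collected into a small lookup table, which makes the nine cases easier to compare. This is only a sketch in Python that records the observations. It is not Unity code, and the scenario labels and the observed_result helper are made up for illustration.

```python
# Observed results, keyed by (scenario, SRP batcher on, material GPU instancing on).
OBSERVED = {
    ("same material, same values", True, False): "SRP batched",
    ("same material, same values", True, True): "SRP batched",
    ("same material, same values", False, True): "GPU instanced",
    ("different materials, different values", True, False): "SRP batched",
    ("different materials, different values", True, True): "SRP batched",
    ("different materials, different values", False, True): "Not batched nor instanced",
    ("same material, MPB values differ", True, False): "Not batched nor instanced",
    ("same material, MPB values differ", True, True): "Not batched nor instanced",
    ("same material, MPB values differ", False, True): "Not batched nor instanced",
}


def observed_result(scenario, srp_batcher_on, gpu_instancing_on):
    """Return the observed result, or None for a combination that was not tested."""
    return OBSERVED.get((scenario, srp_batcher_on, gpu_instancing_on))


if __name__ == "__main__":
    for (scenario, batcher, instancing), result in OBSERVED.items():
        print(f"{scenario} | batcher={batcher} | instancing={instancing} -> {result}")
```

The combination with both the SRP batcher and GPU instancing OFF was not part of the test, so the lookup returns None for it.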

According to the Unity documentation on GPU instancing, users can use MaterialPropertyBlock to give each instance different material properties.
Apparently, the tests show that this is no longer true in the Universal Render Pipeline.


URP shader properties are all wrapped in the SRP batcher macro “CBUFFER_START(UnityPerMaterial)”.
GPU instancing needs to use the macro “UNITY_INSTANCING_BUFFER_START(MyProps)”.



Custom SceneView DrawMode

The beautiful scene is from the asset store package:
RPG Medieval Kingdom Kit by BEFFIO


Download : https://drive.google.com/open?id=1hloorB11hPcIXIIKBIRzqvon9mpmbYPK
If you want to do this in a custom SRP, please visit my GitHub

How to use :
Just add your shaders and names to the CustomDrawModeAsset

Thanks to Vlad Neykov for helping me make this little tool
( This is one of the tools I made during #UnityHackweek )

[Custom SRP] How to use Unity features?


If you are using 2019.1+, you might notice there is a big change to the SRP APIs.
I’ve created a new repository, which you can grab here. It is much cleaner and more minimal.





(My playground pipeline)

This note lists exactly which code enables each Unity feature when making a custom SRP.

*Note that my code may not be perfectly optimised, but the concept itself won’t change.
(!) Alert: The information below might be outdated. I stopped updating this note after the 2018.x releases.

icon_script In pipeline code
icon_shader In shader code
✅ No special handling is needed in code. Write the code as usual.


A change for blogging

I haven’t written any posts for years, as I moved to the United Kingdom in 2016, which was definitely a big change in my life and one I have enjoyed so much.

I’m working at Unity now and find that I gain new knowledge EVERY DAY.

To tackle the knowledge wave, I used to keep links or upload pics to my personal Facebook and share them with my friends in the game industry. But I found that it limited the size of my content and made it hard for me to trace my notes.

As for the blog here, the dark theme was too unclear and the spacing was too small. So now I have changed the theme to see how it goes 😀


So from now on, I hope to share the knowledge I gain, and this blog will also serve as my own notes.


I’m currently working on shader tutorials with lots of pictures and simple descriptions, so that people who are new to Unity can learn the techniques as well as the software itself.


Learning would never stop!