Exploring Stable Diffusion Setup

tl;dr: use AUTOMATIC1111/stable-diffusion-webui, it works like a charm.

In my exploration, I tried these setups:

Google Colab

Watch this video; there is a link to the Google Colab notebook in its description:
This is a very simple code example showing how to run txt2img and img2img. However, the free Google Colab runtime can’t run all of the examples because it runs out of GPU memory.
But at least you don’t need to set up anything locally. It’s a nice way to start understanding the code.

Local setup

CUDA & PyTorch

Later on I tried the code above on my local machine. I have an Nvidia GPU, so I needed to set up CUDA.
My GPU supports CUDA 11.6 and PyTorch supports CUDA 11.6, so I downloaded CUDA 11.6 here.

For some reason the install command generated by PyTorch didn’t work: torch.cuda.is_available() returned False even though I had installed CUDA. After some searching, this worked for me:
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html

Here are a few ways to confirm that CUDA is properly installed. Open cmd and run these:
(1) NVIDIA System Management Interface

  • cd C:\Windows\System32\DriverStore\FileRepository\nvdmui.inf_amd64_xxxxxxxxxxx
  • nvidia-smi

(2) NVIDIA CUDA Compiler Driver NVCC

  • cd C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
  • nvcc --version

(3) PyTorch environment information

  • python -m torch.utils.collect_env
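
The three checks above can also be gathered from a single Python script. This is only a minimal sketch: the helper name cuda_report is made up for illustration, and it assumes nvidia-smi and nvcc are reachable through PATH (if they are not, cd into the folders listed above first). It reports what it finds instead of failing when PyTorch is missing.

```python
import importlib.util
import shutil


def cuda_report():
    """Collect a few facts about the local CUDA / PyTorch setup.

    cuda_report is a hypothetical helper name, not part of PyTorch.
    """
    report = {
        # True if the executables can be found through PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        # None means a CPU-only build of PyTorch was installed.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If torch_cuda_build comes back as None, the installed wheel is a CPU-only build, which would explain torch.cuda.is_available() returning False even with CUDA installed.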

GPU out of memory

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


  1. On Windows, search “Environment Variable” > click “Environment Variable” > in System variables, click New
  2. Set the variable name to “PYTORCH_CUDA_ALLOC_CONF” and the value to “garbage_collection_threshold:0.6,max_split_size_mb:128”.
    Note that max_split_size_mb must be 21 or higher.
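
The same variable can be set from the Python script itself instead of through the Windows dialog. A minimal sketch, assuming it runs before torch is imported; build_alloc_conf is a hypothetical helper that only assembles the string and enforces the limits mentioned above.

```python
import os


def build_alloc_conf(gc_threshold=0.6, max_split_size_mb=128):
    """Build the value for PYTORCH_CUDA_ALLOC_CONF (hypothetical helper)."""
    if max_split_size_mb < 21:
        raise ValueError("max_split_size_mb must be 21 or higher")
    if not 0.0 < gc_threshold < 1.0:
        raise ValueError("garbage_collection_threshold must be between 0 and 1")
    return (
        f"garbage_collection_threshold:{gc_threshold},"
        f"max_split_size_mb:{max_split_size_mb}"
    )


# Set this before the first CUDA allocation; the safest place is
# before `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = build_alloc_conf()
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Setting it in the script only affects that one process, which makes it easier to experiment with different values than editing the system variable each time.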

However, I noticed that setting PYTORCH_CUDA_ALLOC_CONF does not fix the out-of-memory issue; the problem is that the Python scripts load too much onto the GPU.

I also tried these in the Python script, with no luck either:

import gc
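
A minimal sketch of this kind of manual cleanup, assuming the common pairing of gc.collect() with torch.cuda.empty_cache(). The helper name free_gpu_memory is hypothetical, and the exact calls tried may have differed.

```python
import gc
import importlib.util


def free_gpu_memory():
    """Run Python garbage collection, then release PyTorch's cached GPU blocks.

    free_gpu_memory is a hypothetical helper name.
    """
    collected = gc.collect()  # number of unreachable objects found
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            # Hands cached, unused blocks back to the driver. Tensors that
            # are still referenced (for example the loaded model) stay put.
            torch.cuda.empty_cache()
    return collected


if __name__ == "__main__":
    print("objects collected:", free_gpu_memory())
```

This only releases memory that nothing references anymore, so it cannot help when the model and its intermediate tensors genuinely need more than the GPU has.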

Optimized Stable Diffusion (github)

Since my machine also runs out of GPU memory with the simple code example above, I tried this optimized version. However, there might be extra steps to get it working.

If you encounter ModuleNotFoundError: No module named 'omegaconf':

  1. Open cmd and cd to the Optimized Stable Diffusion folder
  2. pip install -e .

If you encounter errors about taming:

  1. pip install taming-transformers
  2. Clone https://github.com/CompVis/taming-transformers.git
  3. Replace
    by https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py

To download the stable-diffusion-v-1-4-original model:

  1. Click on Files and versions tab
  2. Click on sd-v1-4.ckpt, click download
  3. Save this sd-v1-4.ckpt as models\ldm\stable-diffusion-v1\model.ckpt

Task Manager shows that Optimized Stable Diffusion occupies less than half of my GPU memory:

Variations generated from a pic I drew in the past.

Stable Diffusion Webui (github)

I made a symlink so that I don’t have to copy model.ckpt into this repo’s local folder (models\Stable-diffusion\model.ckpt).
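
The symlink can be created from Python as well as from the command line. A minimal sketch, assuming the checkpoint already exists somewhere else on disk; link_model is a hypothetical helper name.

```python
import os
from pathlib import Path


def link_model(checkpoint, link_path):
    """Create a symlink at link_path pointing to an existing checkpoint.

    link_model is a hypothetical helper name.
    """
    checkpoint = Path(checkpoint).resolve()
    link_path = Path(link_path)
    if not checkpoint.is_file():
        raise FileNotFoundError(checkpoint)
    if link_path.is_symlink() or link_path.exists():
        raise FileExistsError(link_path)
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # On Windows this needs Developer Mode or an elevated prompt.
    os.symlink(checkpoint, link_path)
    return link_path
```

From the webui folder, it would be called with the path of the existing model.ckpt as the first argument and models\Stable-diffusion\model.ckpt as the second.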
Double-click webui.bat to run it. It works like a charm!
Watch this video to understand what each setting does:

The original pic is one that I drew in the past, so for this exploration I tried to generate some variations of it.
Prompt: white demon girl, breathing smoke from mouth, white horn on head, white glowing eyes, fantasy, holy, portrait, pretty, detailed, digital art, trending in Artstation

This version of Stable Diffusion is faster than the optimized one and uses the GPU memory without running out of it:

Disable Fog for URP Lit ShaderGraph

Scenario: An environment has many objects with different materials that need fog, but a few special ShaderGraph material objects should not receive fog.

Hack for Editor:

  1. Of course, you have enabled Fog in Lighting Settings
  2. Create a ShaderGraph and add these 3 FOG keywords
  3. Make sure the References are exactly “FOG_LINEAR”, “FOG_EXP” and “FOG_EXP2”
  4. Create a material and use this ShaderGraph
  5. I’m using Exponential Squared fog in Lighting Settings, so I turn off “FOG_EXP” and “FOG_EXP2”; leaving “FOG_LINEAR” checked does the trick

Hack for Player:

  1. Create a shader variant stripping script
  2. Put it in Editor folder
  3. Make a player build

using System.Collections.Generic;
using UnityEditor.Build;
using UnityEditor.Rendering;
using UnityEngine;
using UnityEngine.Rendering;

class StrippingExample_Shader : IPreprocessShaders
{
    public StrippingExample_Shader()
    {
    }

    public int callbackOrder { get { return 99; } }

    public void OnProcessShader(Shader shader, ShaderSnippetData snippet, IList<ShaderCompilerData> data)
    {
        for (int i = 0; i < data.Count; ++i)
        {
            //Get a string of keywords
            string variantText = "";
            foreach(ShaderKeyword s in data[i].shaderKeywordSet.GetShaderKeywords())
            {
                variantText += " " +s.name;
            }

            bool wantToStrip = false;

            //Only stripping the shader graph that we don't want fog
            if (
                shader.name == "Shader Graphs/ShaderGraphNoFog" &&
                //Assumption: the second condition is missing from this listing;
                //matching any of the FOG_ keywords added to the graph is a guess
                variantText.Contains("FOG_")
            )
            {
                wantToStrip = true;
            }

            if ( wantToStrip )
            {
                //Strip the variant
                //(assumption: the usual remove-and-step-back pattern)
                data.RemoveAt(i);
                --i;
            }
        }
    }
}

Get the test project here if you need it (open with Unity 2022.2)

URP and GPU instancing & MPB

Tested with: 2020.1.0b16.4138 + URP 9.0.0-preview.38, Windows DX11

Objects with same material & same property values

SRP batcher | Material GPU instancing | Result
ON | OFF | SRP batched
ON | ON | SRP batched (because if SRP batcher is ON, GPU instancing will be ignored)
OFF | ON | GPU instanced

Objects with different material & different property values

SRP batcher | Material GPU instancing | Result
ON | OFF | SRP batched
ON | ON | SRP batched (because if SRP batcher is ON, GPU instancing will be ignored)
OFF | ON | Neither batched nor instanced (expected because they are different materials)

Objects with same material & different property values set by MaterialPropertyBlock

SRP batcher | Material GPU instancing | Result
ON | OFF | Neither batched nor instanced (because the MPB values are not the same)
ON | ON | Neither batched nor instanced (same as above)
OFF | ON | Neither batched nor instanced (because the properties are non-instanced)

According to the Unity documentation about GPU instancing, users can use MaterialPropertyBlock to give each instance different material properties.
However, the test showed that this is no longer true in the Universal Render Pipeline.


URP shader properties are all wrapped in the SRP batcher macro “CBUFFER_START(UnityPerMaterial)”.
GPU instancing needs this macro instead: “UNITY_INSTANCING_BUFFER_START(MyProps)”



3D scene need Linear but UI need Gamma

update: including UniversalRP (URP) workaround at bottom!

Having this problem?


Looking for a way to make only the UI match what was designed in Photoshop? Don’t want artists to change their workflow (for a possible solution for artists, please refer to this comment) because you are in the middle of development? Try this.
