Stable Diffusion Training and Inference on AMD GPUs with Ubuntu 22.04
In the rapidly evolving field of artificial intelligence, the need for computational resources that are both efficient and affordable is ever-present. AMD GPUs, known for strong gaming performance at prices lower than comparable Nvidia cards, are a viable option for AI training and inference as well. This article provides a comprehensive guide to setting up AMD GPUs on Ubuntu 22.04 for AI development, specifically using Kohya SS and Automatic 1111 with Stable Diffusion.
Preliminary Steps
Before diving into the specifics, ensure that your system runs on Ubuntu 22.04, as it offers the best support for AMD’s ROCm and the PyTorch library — key components in many AI projects.
Note: Windows users might face challenges, as PyTorch with ROCm support for Windows is not yet available.
Check if your graphics card is supported on Linux here.
Installing AMD Drivers and ROCm
Start by preparing your system with the necessary drivers and packages:
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
Installing Prerequisites
Next, install the libraries required for the various training and inference scenarios: python3, python3.10-venv, python3-tk, and python-is-python3 are needed in all cases; libstdc++-12-dev is used for C++ builds where needed; nvidia-cuda-toolkit and nvidia-cuda-toolkit-gcc are only needed when ZLUDA is used; libhdf5-dev and libhdf5-serial-dev provide Hierarchical Data Format 5 (HDF5) support used by some tools; and Git is needed for cloning the repositories of the applications used.
sudo apt install git python3 python3.10-venv python3-tk python-is-python3 libstdc++-12-dev nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc libhdf5-dev libhdf5-serial-dev
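After the install finishes, a quick sanity check that the key tools are on PATH can save debugging later. The helper below is just a convenience sketch; the name check_tools is ours and not part of any of the packages above:

```shell
# Print a "missing:" line for each required tool apt did not provide.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}

# No output means everything was found.
check_tools git python3 python nvcc
```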
Preparing the Working Directory
Set up a dedicated project directory:
sudo mkdir /opt/Projects
sudo chmod -R 777 /opt/Projects
sudo chown -R $LOGNAME:$LOGNAME /opt/Projects
Setting up ZLUDA
ZLUDA allows AMD GPUs to run CUDA workloads. Install and configure ZLUDA with the following commands:
cd /opt/Projects
mkdir zluda
cd zluda
wget https://github.com/vosen/ZLUDA/releases/download/v3/zluda-3-linux.tar.gz
sudo tar -zxvf zluda-3-linux.tar.gz -C /opt/rocm-6.0.2/lib
rm -rf zluda-3-linux.tar.gz
cd /opt/rocm-6.0.2/lib
sudo cp librocblas.so.4 librocblas.so.3
Now that ZLUDA is set up, we need to make sure this directory is on LD_LIBRARY_PATH whenever we want to use ZLUDA:
export LD_LIBRARY_PATH="/opt/rocm-6.0.2/lib:$LD_LIBRARY_PATH"
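If you run this in every shell (or switch between ROCm versions), it is easy to stack duplicate entries onto LD_LIBRARY_PATH. The helper function below guards against that; it is a sketch, and the name prepend_ld_path is ours, not part of ROCm or ZLUDA:

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
  case ":${LD_LIBRARY_PATH}:" in
    *":$1:"*) ;;  # already present: leave the path untouched
    *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
  export LD_LIBRARY_PATH
}

prepend_ld_path /opt/rocm-6.0.2/lib
```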
Setting up Kohya SS for Stable Diffusion Training
To install Kohya SS and its dependencies:
cd /opt/Projects
git clone --branch v23.0.11 https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
python3.10 -m venv venv
source setup.sh
source venv/bin/activate
pip uninstall torch torchvision xformers
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7
pip install opencv-python xformers
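With the ROCm build of PyTorch installed in the venv, it is worth confirming that torch can actually see the GPU before moving on; ROCm builds expose it through the regular torch.cuda API. A small check (it assumes python3 is on PATH; on cards without official support it may only report True after HSA_OVERRIDE_GFX_VERSION is exported, as described further below):

```shell
# Expect "cuda available: True" on a working ROCm + PyTorch setup.
python3 - <<'EOF'
try:
    import torch
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
EOF
```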
Now we need to set up accelerate by running:
accelerate config
Answer the prompts as follows:
- This machine
- No distributed training
- no
- no
- no
- all
- fp16
Export the GFX version of your graphics card. The version can be found by running the rocminfo command; for the 7900 XTX it is 11.0.0.
rocminfo # Run only if you don't know GFX version to find it.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export LD_LIBRARY_PATH=/opt/rocm-6.0.2/lib
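If you are unsure what value to export, it can be derived from the gfx target that rocminfo prints (e.g. gfx1100). The helper below is a sketch assuming the common encoding, in which the last two characters are the minor version and the (possibly hexadecimal) stepping, and everything before them is the major version:

```shell
# Turn a gfx target like gfx1100 into an HSA_OVERRIDE_GFX_VERSION value.
gfx_to_hsa_version() {
  t=${1#gfx}           # strip the "gfx" prefix
  major=${t%??}        # everything except the last two characters
  rest=${t#"$major"}
  minor=${rest%?}      # second-to-last character
  step=${rest#?}       # last character, may be a hex digit
  printf '%d.%d.%d\n' "$major" "$minor" "0x$step"
}

gfx_to_hsa_version gfx1100   # → 11.0.0
```

With that, export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_hsa_version gfx1100)" matches the manual export above.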
Now install bitsandbytes for ROCm (this will enable use of the 8-bit AdamW optimizer):
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
export ROCM_HOME=/opt/rocm
make hip ROCM_TARGET=gfx1100
pip install .
Now run your kohya_ss instance:
cd /opt/Projects/kohya_ss
source venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=11.0.0
LD_LIBRARY_PATH="/opt/rocm-6.0.2/lib:$LD_LIBRARY_PATH" python kohya_gui.py --listen 0.0.0.0 --server_port 7865 "$@"
Monitoring GPU Load During Training:
Keep an eye on your GPU load:
watch /opt/rocm-6.0.2/bin/rocm-smi
Setting up Automatic 1111 to Test Trained Models
The following commands clone the stable-diffusion-webui repository, create a Python virtual environment, upgrade pip and install wheel, and then run launch.py with a TORCH_COMMAND hint so that the ROCm 6.0 build of torch is installed during setup.
cd /opt/Projects
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel
TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0" python launch.py --precision full --no-half
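Once the webui is listening (port 7860 by default), a trained model can also be smoke-tested over its REST API, provided you add the --api flag to the launch command; the /sdapi/v1/txt2img endpoint comes from the webui's built-in API. The small payload builder below is a sketch and assumes the prompt contains no double quotes:

```shell
# Build a minimal JSON body for the txt2img endpoint.
txt2img_payload() {
  printf '{"prompt": "%s", "steps": %d}' "$1" "${2:-20}"
}

# Example request against a locally running webui started with --api:
# curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
#   -H 'Content-Type: application/json' \
#   -d "$(txt2img_payload 'a photo of a corgi')"
```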
Conclusion
This detailed setup enables the use of AMD GPUs for Stable Diffusion training and inference, leveraging the power of ROCm and tailored tools like Kohya SS and Automatic 1111. With these tools, you can efficiently run Stable Diffusion and other AI models, making the most of your AMD hardware on Ubuntu 22.04.
In the next article I will go through Stable Diffusion 1.5 and SDXL training with your own photos.