Stable Diffusion Training and Inference on AMD GPUs with Ubuntu 22.04
In the rapidly evolving field of artificial intelligence, the need for computational resources that are both efficient and affordable is ever-present. AMD GPUs, known for strong gaming performance at prices lower than comparable Nvidia cards, are a viable option for AI training and inference as well. This article provides a comprehensive guide to setting up AMD GPUs on Ubuntu 22.04 for AI development, specifically using Kohya SS and Automatic 1111 with Stable Diffusion.
Preliminary Steps
Before diving into the specifics, ensure that your system runs on Ubuntu 22.04, as it offers the best support for AMD’s ROCm and the PyTorch library — key components in many AI projects.
Note: Windows users might face challenges, as PyTorch with ROCm support for Windows is not yet available.
Check if your graphics card is supported on Linux here.
Installing AMD Drivers and ROCm
Start by preparing your system with the necessary drivers and packages:
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
Installing Prerequisites
Next, install the libraries required for the various training and inference scenarios: python3, python3.10-venv, python3-tk, and python-is-python3 are needed in all cases; libstdc++-12-dev is used for C++ builds where needed; nvidia-cuda-toolkit and nvidia-cuda-toolkit-gcc are only needed when ZLUDA is used; libhdf5-dev and libhdf5-serial-dev provide Hierarchical Data Format 5 (HDF5) support used by some tools; and Git is needed for cloning the repositories of the applications used.
sudo apt install git python3 python3.10-venv python3-tk python-is-python3 libstdc++-12-dev nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc libhdf5-dev libhdf5-serial-dev
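After the install finishes, a quick sanity check that the key tools are on PATH can save debugging later. The helper below is just a convenience sketch; the name check_tools is ours and not part of any of the packages above:

```shell
# Print a "missing:" line for each required tool apt did not provide.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}

# No output means everything was found.
check_tools git python3 python nvcc
```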
Preparing the Working Directory
Set up a dedicated project directory:
sudo mkdir /opt/Projects
sudo chmod -R 777 /opt/Projects
sudo chown -R $LOGNAME:$LOGNAME /opt/Projects
Setting up ZLUDA
ZLUDA allows AMD GPUs to run CUDA workloads. Install and configure ZLUDA with the following commands:
cd /opt/Projects
mkdir zluda
cd zluda
wget https://github.com/vosen/ZLUDA/releases/download/v3/zluda-3-linux.tar.gz
sudo tar -zxvf zluda-3-linux.tar.gz -C /opt/rocm-6.0.2/lib
rm -rf zluda-3-linux.tar.gz
cd /opt/rocm-6.0.2/lib
sudo cp librocblas.so.4 librocblas.so.3
Now that ZLUDA is set up, we need to make sure this directory is on LD_LIBRARY_PATH whenever we want to use ZLUDA:
export LD_LIBRARY_PATH="/opt/rocm-6.0.2/lib:$LD_LIBRARY_PATH"
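If you run this in every shell (or switch between ROCm versions), it is easy to stack duplicate entries onto LD_LIBRARY_PATH. The helper function below guards against that; it is a sketch, and the name prepend_ld_path is ours, not part of ROCm or ZLUDA:

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
  case ":${LD_LIBRARY_PATH}:" in
    *":$1:"*) ;;  # already present: leave the path untouched
    *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
  export LD_LIBRARY_PATH
}

prepend_ld_path /opt/rocm-6.0.2/lib
```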
Setting up Kohya SS for Stable Diffusion Training
To install Kohya SS and its dependencies:
cd /opt/Projects
git clone --branch v23.0.11 https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
python3.10 -m venv venv
source setup.sh
source venv/bin/activate
pip uninstall torch torchvision xformers
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7
pip install opencv-python xformers
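With the ROCm build of PyTorch installed in the venv, it is worth confirming that torch can actually see the GPU before moving on; ROCm builds expose it through the regular torch.cuda API. A small check (it assumes python3 is on PATH; on cards without official support it may only report True after HSA_OVERRIDE_GFX_VERSION is exported, as described further below):

```shell
# Expect "cuda available: True" on a working ROCm + PyTorch setup.
python3 - <<'EOF'
try:
    import torch
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
EOF
```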
Now we need to set up accelerate by running:
accelerate config
Answer the prompts as follows:
- This machine
- No distributed training
- no
- no
- no
- all
- fp16
Export the GFX version of your graphics card. The version can be found by running the rocminfo command; for the 7900 XTX it is 11.0.0.
rocminfo # Run only if you don't know GFX version to find it.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export LD_LIBRARY_PATH=/opt/rocm-6.0.2/lib
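If you are unsure what value to export, it can be derived from the gfx target that rocminfo prints (e.g. gfx1100). The helper below is a sketch assuming the common encoding, in which the last two characters are the minor version and the (possibly hexadecimal) stepping, and everything before them is the major version:

```shell
# Turn a gfx target like gfx1100 into an HSA_OVERRIDE_GFX_VERSION value.
gfx_to_hsa_version() {
  t=${1#gfx}           # strip the "gfx" prefix
  major=${t%??}        # everything except the last two characters
  rest=${t#"$major"}
  minor=${rest%?}      # second-to-last character
  step=${rest#?}       # last character, may be a hex digit
  printf '%d.%d.%d\n' "$major" "$minor" "0x$step"
}

gfx_to_hsa_version gfx1100   # → 11.0.0
```

With that, export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_hsa_version gfx1100)" matches the manual export above.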
Now install bitsandbytes for ROCm (this will enable use of the 8-bit AdamW optimizer):
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
export ROCM_HOME=/opt/rocm
make hip ROCM_TARGET=gfx1100
pip install .
Now run your kohya_ss instance:
cd /opt/Projects/kohya_ss
source venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=11.0.0
LD_LIBRARY_PATH="/opt/rocm-6.0.2/lib:$LD_LIBRARY_PATH" python kohya_gui.py --listen 0.0.0.0 --server_port 7865 "$@"
Monitoring GPU Load During Training:
Keep an eye on your GPU load:
watch /opt/rocm-6.0.2/bin/rocm-smi
Setting up Automatic 1111 to Test Trained Models
The following commands clone the stable-diffusion-webui repository, create a Python virtual environment, upgrade pip and install wheel, and then run launch.py with a TORCH_COMMAND hint so that the ROCm 6.0 build of torch is installed during setup.
cd /opt/Projects
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel
TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0" python launch.py --precision full --no-half
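Once the webui is listening (port 7860 by default), a trained model can also be smoke-tested over its REST API, provided you add the --api flag to the launch command; the /sdapi/v1/txt2img endpoint comes from the webui's built-in API. The small payload builder below is a sketch and assumes the prompt contains no double quotes:

```shell
# Build a minimal JSON body for the txt2img endpoint.
txt2img_payload() {
  printf '{"prompt": "%s", "steps": %d}' "$1" "${2:-20}"
}

# Example request against a locally running webui started with --api:
# curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
#   -H 'Content-Type: application/json' \
#   -d "$(txt2img_payload 'a photo of a corgi')"
```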
Conclusion
This detailed setup enables the use of AMD GPUs for Stable Diffusion training and inference, leveraging the power of ROCm and tailored tools like Kohya SS and Automatic 1111. With these tools, you can efficiently run Stable Diffusion and other AI models, making the most of your AMD hardware on Ubuntu 22.04.
In the next article I will go through Stable Diffusion 1.5 and SDXL training with your own photos.