Creating audio with AMD GPUs using Stable Audio Open 1.0

Things I want to do
Environment setup
Execution
If the model download fails
Regarding the execution environment
Change the script
Result
Websites I used as references

Things I want to do

I’ll try creating audio using Stable Audio Open 1.0 with an AMD GPU.

DirectML is used to run the model on the GPU.

Environment setup

Create a working folder.

Create a venv environment (optional)

If necessary, run the following commands in the command prompt to create and activate a venv environment.

python -m venv venv
venv\Scripts\activate.bat
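
To confirm that the venv is actually active before installing anything, a small check like the following can help. This is a minimal sketch using only the standard library; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
```

If this prints False after running activate.bat, the command prompt is probably still using the system-wide Python.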

Library Installation

Execute the following command to install the necessary libraries.

pip install scipy
pip install torch torchvision torchaudio
pip install torch-directml
pip install soundfile
pip install diffusers
pip install transformers
pip install torchsde
pip install accelerate
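
After the installation, it may be worth checking that every package can be found before running the main script. The following is a minimal sketch using only the standard library; the helper name missing_packages is made up for this example, and the list assumes the usual import names (for example, torch-directml is imported as torch_directml).

```python
import importlib.util

# Import names of the packages installed above (assumed names;
# the pip name torch-directml corresponds to the module torch_directml).
REQUIRED = [
    "scipy", "torch", "torchvision", "torchaudio", "torch_directml",
    "soundfile", "diffusers", "transformers", "torchsde", "accelerate",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec only looks the module up; it does not import it,
    # so this check is fast even for large packages such as torch.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```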

Script creation

Save the following content as a file named run.py.

import random

import soundfile as sf
import torch
import torch_directml
from diffusers import StableAudioPipeline

# Use the DirectML device so that the model runs on the AMD GPU.
dml = torch_directml.device()
repo_id = "stabilityai/stable-audio-open-1.0"
# The downloaded model files are cached in the "model" folder.
pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, cache_dir="model")
pipe = pipe.to(dml)

prompt = "sound of heart beat"

# Use a random seed so that each run produces different audio.
generator = torch.Generator()
generator.manual_seed(random.randint(1, 65535))

audio = pipe(
    prompt,
    num_inference_steps=2,  # number of denoising steps (quality)
    audio_end_in_s=0.5,  # length of the audio in seconds
    num_waveforms_per_prompt=1,
    generator=generator,
).audios

# Transpose from (channels, samples) to (samples, channels) for soundfile.
output = audio[0].T.float().cpu().numpy()
sf.write("output.wav", output, pipe.vae.sampling_rate)

Execution

Execute the following command to run the script.

python run.py

A file named output.wav will be created in the folder where you ran the command.
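
To check that output.wav has roughly the expected length, the file can be inspected with the standard wave module. This is a minimal sketch; the function name wav_duration_seconds is made up for this example, and it assumes the file was written as ordinary PCM WAV (which I believe is the soundfile default for .wav, but treat that as an assumption).

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()


# Only inspect the file if the script has already produced it.
if os.path.exists("output.wav"):
    print("output.wav length:", wav_duration_seconds("output.wav"), "seconds")
else:
    print("output.wav was not found. Run run.py first.")
```

With audio_end_in_s=0.5 in run.py, the printed length should be close to 0.5 seconds.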

If the model download fails

Please refer to the article below.

Regarding the execution environment

On my system (CPU: AMD Ryzen 7 7735HS / Memory: 32 GB / GPU: integrated GPU), the script barely runs while a browser and an editor are open. (Sometimes it succeeds, and sometimes it fails with an out-of-memory error.)
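
Because the failure is intermittent, one possible workaround is to retry the generation a few times. The following is a minimal sketch, not something from the original script: the helper run_with_retries is hypothetical, and it assumes the out-of-memory failure surfaces as a RuntimeError, which I have not confirmed for DirectML.

```python
import time


def run_with_retries(task, attempts=3, wait_seconds=5.0, retry_on=(RuntimeError,)):
    """Call task() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on as error:
            # Assumption: out-of-memory errors are raised as RuntimeError.
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                # Give the system a moment to release memory.
                time.sleep(wait_seconds)
    raise last_error
```

In run.py, the pipe(...) call could be wrapped in a small function and passed as task. Closing other applications before running is still likely to be more effective than retrying.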

Change the script

Change the prompt

Change the "sound of heart beat" part of prompt = "sound of heart beat".

Changing the length of an audio file

Change the 0.5 in audio_end_in_s=0.5 to the desired length. (The unit is seconds.)

Quality changes

Change the 2 in num_inference_steps=2 to another value.

The default value of this parameter is 100. I've set it to 2 to force the script to run on my weak system.

In an environment with more capable hardware, for example a discrete AMD graphics card, you should set a larger value.
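
Instead of editing run.py every time, the settings described in this section could be passed on the command line. The following is a minimal sketch using argparse; the option names (--prompt, --seconds, --steps, --seed) and the helpers parse_args and build_pipe_kwargs are made up for this example, and the defaults mirror the values used in run.py.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the settings that this section changes by hand."""
    parser = argparse.ArgumentParser(description="Generate audio with Stable Audio Open 1.0")
    parser.add_argument("--prompt", default="sound of heart beat")
    parser.add_argument("--seconds", type=float, default=0.5, help="audio length in seconds")
    parser.add_argument("--steps", type=int, default=2, help="number of inference steps")
    parser.add_argument("--seed", type=int, default=None, help="fixed seed (random if omitted)")
    args = parser.parse_args(argv)
    if args.seed is None:
        # Same range as the random seed in run.py.
        args.seed = random.randint(1, 65535)
    return args


def build_pipe_kwargs(args):
    """Map the parsed options to the keyword arguments passed to pipe()."""
    return {
        "prompt": args.prompt,
        "num_inference_steps": args.steps,
        "audio_end_in_s": args.seconds,
        "num_waveforms_per_prompt": 1,
    }


# Example with explicit arguments; pass no list to read the real command line.
example = parse_args(["--prompt", "rain on a tin roof", "--seconds", "1.0", "--steps", "10", "--seed", "42"])
print(example.seed, build_pipe_kwargs(example))
```

In run.py, the call would then become pipe(**build_pipe_kwargs(args), generator=generator), with generator.manual_seed(args.seed) so that a run can be reproduced by passing the same seed.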

Result

I was able to generate audio using an AMD GPU.

Websites I used as references

stabilityai/stable-audio-open-1.0 · Hugging Face
