Things I want to do
I’ll try creating audio using Stable Audio Open 1.0 with an AMD GPU.
I'll use DirectML.
Environment setup
Create a working folder.
Create a venv environment (optional)
If necessary, run the following commands in the command prompt to create and activate a venv environment.
python -m venv venv
venv\scripts\activate.bat
Library Installation
Execute the following command to install the necessary libraries.
pip install scipy
pip install torch torchvision torchaudio
pip install torch-directml
pip install soundfile
pip install diffusers
pip install transformers
pip install torchsde
pip install accelerate
Script creation
Save the following content as a file named run.py.
import scipy
import torch
import soundfile as sf
from diffusers import StableAudioPipeline
import torch_directml
import random
dml = torch_directml.device()
repo_id = "stabilityai/stable-audio-open-1.0"
pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, cache_dir="model")
pipe = pipe.to(dml)
prompt = "sound of heart beat"
generator = torch.Generator()
generator.manual_seed(random.randint(1, 65535))
audio = pipe(
    prompt,
    num_inference_steps=2,
    audio_end_in_s=0.5,
    num_waveforms_per_prompt=1,
    generator=generator,
).audios
output = audio[0].T.float().cpu().numpy()
sf.write("output.wav", output, pipe.vae.sampling_rate)
Execution
Execute the following command to run the script.
A file named output.wav will be created in the folder where you executed the command.
python run.py
If the model download fails
Please refer to the article below.
Regarding the execution environment
On my system (CPU: AMD Ryzen 7 7735HS / Memory: 32GB / GPU: Integrated GPU), it only barely runs while a browser and an editor are open. (Sometimes it succeeds, and sometimes it fails with an out-of-memory error.)
Change the script
Change the prompt
Change the "sound of heart beat" part of prompt = "sound of heart beat".
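If editing run.py for every new prompt gets tedious, one option is to read the prompt from the command line. The following is only a minimal sketch: the `--prompt` flag and the `parse_prompt` helper are names I made up for illustration, and they are not part of run.py or diffusers.

```python
import argparse


def parse_prompt(argv=None):
    """Return the prompt given on the command line, or the article's default."""
    parser = argparse.ArgumentParser(description="Generate audio from a text prompt.")
    # "--prompt" is a hypothetical flag name chosen for this sketch.
    parser.add_argument("--prompt", default="sound of heart beat")
    return parser.parse_args(argv).prompt


# Passing an explicit list stands in for real command-line arguments.
print(parse_prompt([]))
print(parse_prompt(["--prompt", "sound of rain"]))
```

In run.py, this would mean replacing the prompt line with prompt = parse_prompt() and running something like python run.py --prompt "sound of rain".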
Changing the length of an audio file
Change the 0.5 in `audio_end_in_s=0.5,` to the length you want. (The unit is seconds.)
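To get a feel for what the length setting means, here is a rough calculation of how many samples a clip of a given length contains. It assumes a sample rate of 44,100 Hz, which is what the Stable Audio Open 1.0 model card states; run.py itself reads the actual value from pipe.vae.sampling_rate. The function name samples_for_length is my own.

```python
def samples_for_length(seconds, sampling_rate=44100):
    """Number of samples per channel in a clip of the given length."""
    if seconds <= 0:
        raise ValueError("length must be positive")
    return round(seconds * sampling_rate)


# 0.5 seconds, as in run.py
print(samples_for_length(0.5))   # 22050
# 10 seconds
print(samples_for_length(10))    # 441000
```

According to the model card, the model generates clips of up to about 47 seconds, so longer values are unlikely to help.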
Quality changes
Change the 2 in num_inference_steps=2 to the number of steps you want.
The default for this parameter is 100. I've set it to 2 to force it to work on my weak system.
If you have a discrete AMD graphics card, for example, you should set a larger value.
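To find a step count your machine can handle, it may help to time a few values. Below is a minimal sketch: time_steps and its generate argument are hypothetical names, and generate stands in for whatever function wraps the pipe(...) call in run.py. A dummy function is used here so that the snippet runs without the model.

```python
import time


def time_steps(generate, step_counts):
    """Call generate(steps) for each step count and return {steps: seconds}."""
    timings = {}
    for steps in step_counts:
        start = time.perf_counter()
        generate(steps)
        timings[steps] = time.perf_counter() - start
    return timings


def dummy_generate(steps):
    # Stand-in for the real pipe(...) call; sleeps briefly per step.
    time.sleep(0.001 * steps)


for steps, seconds in time_steps(dummy_generate, [2, 10, 50]).items():
    print(f"{steps} steps: {seconds:.3f} s")
```

With the real pipeline, generate could be a small function that calls pipe(prompt, num_inference_steps=steps, ...) with the other arguments from run.py.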
Result
I was able to generate audio using an AMD GPU.
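To confirm that output.wav has the expected length, you can inspect it with Python's standard wave module. This is a minimal sketch under one assumption: that soundfile saved the file as 16-bit PCM (which I believe is its default for .wav). The describe_wav helper is a name I chose for illustration.

```python
import os
import wave


def describe_wav(path):
    """Return (channels, sample_rate, duration_in_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


# Only inspect the file if run.py has already produced it.
if os.path.exists("output.wav"):
    print(describe_wav("output.wav"))
```

If the wave module raises an error because the file is not plain PCM, sf.info("output.wav") from soundfile is an alternative.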