CPU-based audio creation using stable-audio-open-small

Things I want to do
Regarding licenses
Environment setup
Execution
Result
Websites I used as references

Things I want to do

In a previous article, I tried generating audio with Stable Audio Open 1.0, but it barely ran.

A new, lighter model, stable-audio-open-small, has been released, so I’ll give it a try on the CPU.

I tried various approaches, but I couldn’t get stable-audio-open-small to work with DirectML, because stable-audio-tools and torch-directml require conflicting versions of PyTorch.

Regarding licenses

Please refer to the following link for the model’s license.

It’s free for non-commercial use.

Professional Membership Agreement — Stability AI

Environment setup

Create a working folder.

Switch to a venv environment (optional)

If necessary, run the following commands in the command prompt to create and activate a venv environment.

Since this project may involve modifying installed libraries, I recommend using a venv environment.

python -m venv venv
venv\Scripts\activate.bat
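To confirm that the venv is really active before installing anything, a small check like the one below can help. This is a minimal sketch; the function name in_virtualenv is my own and not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints False, the activate script was probably not run in the current command prompt.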

Library Installation

Execute the following command to install the necessary libraries.

pip install stable-audio-tools

Error: AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?

Depending on the environment, the error below may occur during installation. Changing the Python version resolved it. pkgutil.ImpImporter was removed in Python 3.12, so the error appears there when a dependency's build script still references it.

Version where the problem occurred: 3.12.8

Version that did not experience the problem: 3.10.6

      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
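A quick way to see whether a given interpreter is likely to hit this error is to test for the attribute directly. The sketch below only inspects the standard library. The helper names are hypothetical, and it does not check what stable-audio-tools or its dependencies actually require.

```python
import pkgutil
import sys


def has_imp_importer() -> bool:
    """Return True if this Python still exposes pkgutil.ImpImporter."""
    return hasattr(pkgutil, "ImpImporter")


def describe_environment() -> str:
    """Summarize the Python version and whether the attribute exists."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "present" if has_imp_importer() else "missing"
    return f"Python {version}: pkgutil.ImpImporter is {status}"


print(describe_environment())
```

If the attribute is reported as missing, a build script that still refers to it will fail with the AttributeError shown above.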

Script creation

Save the following content as a file named run.py.

The same script also works in environments where CUDA is available. (It selects CUDA or the CPU automatically.)

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_total": 11
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=8,
    conditioning=conditioning,
    sample_size=sample_size,
    sampler_type="pingpong",
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)

Execution

Execute the following command to run the script.

A file named output.wav will be created in the folder where you executed the command.

python run.py
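Once the script finishes, the header of output.wav can be inspected with the standard library to confirm that it is a valid file of the expected length. This is a minimal sketch that assumes torchaudio wrote an ordinary 16-bit PCM WAV file; describe_wav is a name I made up for illustration.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header information from a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "seconds": frames / rate,
        }


# Usage, after run.py has produced the file:
# print(describe_wav("output.wav"))
```

For the script above, the reported length should be close to the seconds_total value in the conditioning, although the exact value depends on the model configuration.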

Execution time

The progress bar shows about 1 second per iteration (it), so the 8 sampling steps finish in about 10 seconds.

However, the script keeps running for a long time after the progress bar reaches 100%. (I didn’t time it precisely, but it had not finished after 3 hours, and it was done when I checked after leaving it overnight.)

By the way

The slow part is the following line in stable_audio_tools, which decodes the sampled latents back into audio.

sampled = model.pretransform.decode(sampled)
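Since I did not time this step precisely, a small timing helper like the one below could be used to measure it. This is only a sketch: timed and timings are hypothetical names, and the loop inside the with block is a stand-in for the real call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Measure the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store and print the elapsed time even if the block raises.
        results[label] = time.perf_counter() - start
        print(f"{label}: {results[label]:.2f} s")


timings = {}
with timed("example step", timings):
    sum(range(100_000))  # stand-in for a slow call such as the decode line
```

To time the real step, the decode line in generation.py would be wrapped in a with timed("decode", timings): block, which again means editing the installed library.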

If the model download fails

Please refer to the article below.

Error handling (ValueError: high is out of bounds for int32)

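The following error may occur and stop the script. It appears to be environment-dependent: it happens where NumPy's default integer type is 32-bit (for example, NumPy 1.x on Windows), because 2**32 - 1 does not fit in an int32.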

  File "F:\projects\python\StableAL\venv\lib\site-packages\stable_audio_tools\inference\generation.py", line 138, in generate_diffusion_cond
    seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1336, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

To avoid this, edit line 138 of venv\Lib\site-packages\stable_audio_tools\inference\generation.py as follows. (The line number may differ depending on the version of stable_audio_tools.)

Before:

    seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)

After:

    seed = seed if seed != -1 else np.random.randint(0, 2**31 - 1, dtype=np.uint32)
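An alternative that avoids editing the library is to pass a seed explicitly, since the traceback shows that the failing randint call only runs when seed is -1. The sketch below generates a seed that fits in a signed 32-bit integer. The helper make_seed is hypothetical, and it assumes that the installed version of generate_diffusion_cond accepts a seed keyword argument.

```python
import random

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def make_seed(rng=None) -> int:
    """Return a random seed in the range [0, INT32_MAX)."""
    rng = random if rng is None else rng
    return rng.randrange(0, INT32_MAX)  # upper bound is exclusive


# Usage in run.py (assumption: the seed keyword is supported):
# output = generate_diffusion_cond(
#     model,
#     steps=8,
#     conditioning=conditioning,
#     sample_size=sample_size,
#     sampler_type="pingpong",
#     device=device,
#     seed=make_seed(),
# )
```

Passing a fixed integer instead of make_seed() makes runs with the same prompt easier to compare.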

Result

I was able to generate audio with stable-audio-open-small on the CPU, but it takes so long that whether it is practical depends on how you plan to use it.

Websites I used as references

stabilityai/stable-audio-open-small · Hugging Face
