Let’s try running Z-Image-Turbo on the CPU.


Things I want to do

I’ll try using Alibaba’s Z-Image-Turbo image generation model in an environment without CUDA.


Environment setup

Create a working folder.
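For example, in a command prompt (the folder name here is just an illustration; use whatever you like):

```shell
mkdir z-image-turbo
cd z-image-turbo
```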

Set up a venv environment (optional)

If necessary, run the following commands in a command prompt to create and activate a venv environment.

python -m venv venv
venv\Scripts\activate.bat
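To confirm the venv is actually active, you can ask Python whether its prefix differs from the base installation, which is how the standard library itself detects a virtual environment:

```python
import sys

# In an active venv, sys.prefix points at the venv directory while
# sys.base_prefix still points at the original interpreter.
in_venv = sys.prefix != sys.base_prefix
print("venv active:", in_venv)
```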

Library Installation

Execute the following command to install the necessary libraries.

pip install git+https://github.com/huggingface/diffusers
pip install torch torchvision
pip install transformers
pip install accelerate
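To check that the installs succeeded without importing the heavy packages themselves, here is a quick sketch using `importlib` (the names below are the standard import names of these packages):

```python
import importlib.util

# Report whether each required package can be found on sys.path
required = ("torch", "torchvision", "transformers", "accelerate", "diffusers")
status = {name: importlib.util.find_spec(name) is not None for name in required}
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```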

Save the following content as a file named run.py.

import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cpu")

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=256,
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("example.png")

The code is basically the same as on the following page (the model card), but modified to run on the CPU. The generated image size has also been reduced for testing purposes.

Tongyi-MAI/Z-Image-Turbo · Hugging Face

Execution

Execute the following command to run the script.

python run.py

A file named example.png will be created in the folder where you ran the command.

Execution time

(The first run is slow because the model must be downloaded first; the download is over 20GB and, depending on your environment, may add an extra 1-2 hours.)
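If you want to see how much disk space the download is taking, the files land in the Hugging Face cache, which by default lives under `~/.cache/huggingface` and can be relocated with the `HF_HOME` environment variable. A rough sketch to total it up:

```python
import os
from pathlib import Path

# Default Hugging Face cache location; HF_HOME overrides it if set
cache = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))

# Sum the sizes of all files under the cache (0 if it does not exist yet)
total = sum(f.stat().st_size for f in cache.rglob("*") if f.is_file()) if cache.exists() else 0
print(f"{cache}: {total / 1024**3:.1f} GiB")
```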

Each iteration takes about 20 minutes to complete.

However, even after the progress bar shown above reaches 100%, it takes quite a while longer to finish. (I haven’t timed it precisely, but perhaps an extra hour?)
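To get a precise number instead of guessing, the generation call can be wrapped with `time.perf_counter()`. In the sketch below the `pipe(...)` call is commented out because it assumes the pipeline and prompt from run.py are already loaded:

```python
import time

start = time.perf_counter()
# image = pipe(prompt=prompt, height=256, width=256,
#              num_inference_steps=9, guidance_scale=0.0).images[0]
elapsed = time.perf_counter() - start
print(f"generation took {elapsed / 60:.1f} minutes")
```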


Thoughts

Flux.1 took about 20 minutes to output an image, so my honest impression is that this is slow.

However, I had never once managed to generate a decent 256×256 image with other image generation models. With Z-Image-Turbo, I was able to generate images of the same quality as the samples.

If your machine’s specifications are sufficient, this looks like a model worth trying.


Bonus

A record of a failed attempt to run it with DirectML.

import torch
from diffusers import ZImagePipeline
import torch_directml
dml = torch_directml.device()

# 1. Load the pipeline (float32 this time)
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float,
    low_cpu_mem_usage=False,
)
pipe.to(dml)

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=256,
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("example.png")

Error

The transfer is from float to float, so no dtype conversion should be necessary… is it a memory issue?

Traceback (most recent call last):
  File "F:\projects\python\Qwen-Image\zi.py", line 11, in <module>
    pipe.to(dml)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 545, in to
    module.to(device, dtype)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\transformers\modeling_utils.py", line 4343, in to
    return super().to(*args, **kwargs)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
RuntimeError
