Trying Z-Image-Turbo on a CPU

Goal
Environment setup
1. Switching to a venv (optional)
2. Installing the libraries
Running
1. Execution time
Impressions
Bonus

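Goal

I try out Z-Image-Turbo, Alibaba's image generation model, in an environment without CUDA.

Environment setup

Create a working folder.

Switching to a venv (optional)

If needed, run the following commands in Command Prompt to create a venv and activate it.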

python -m venv venv
venv\scripts\activate.bat
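
To confirm that the activation took effect before installing anything, a small check like the following can help. This is a minimal sketch, not part of the original steps; the function name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the venv is active, the interpreter path printed should sit inside the venv folder created above.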

Installing the libraries

Run the following commands to install the required libraries.

pip install git+https://github.com/huggingface/diffusers
pip install torch torchvision
pip install transformers
pip install accelerate
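
After the installs finish, it can be worth checking that every package actually landed in the active environment. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the pip commands above.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = ["diffusers", "torch", "torchvision", "transformers", "accelerate"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:14s} {version or 'NOT INSTALLED'}")
```

Once torch is installed, `torch.cuda.is_available()` returning `False` is a quick way to confirm that the environment really has no CUDA.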

Save the following as run.py.

import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cpu")  # changed from the model card so that inference runs on the CPU

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

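# Generate the image
image = pipe(
    prompt=prompt,
    height=256,  # reduced output size for a quick check
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]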

image.save("example.png")

This is essentially the code from the page below (the model card), modified to run on the CPU. I also reduced the size of the generated image to make the test run quicker.

Tongyi-MAI/Z-Image-Turbo · Hugging Face


Running

Run the script with the following command.

python run.py

A file named example.png is created in the folder where you ran the command.

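Execution time

(The first run is slow because the model has to be downloaded. It depends on your environment, but I expect the download to add 1 to 2 hours. More than 20 GB is downloaded.)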
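
As a rough cross-check of that estimate, the arithmetic below converts a download size and a line speed into minutes. The speeds are hypothetical examples, not measurements from this post; 20 GB in 1 to 2 hours corresponds to roughly 22 to 44 Mbit/s.

```python
def download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Minutes needed to fetch size_gb gigabytes at speed_mbps megabits per second."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / speed_mbps / 60


if __name__ == "__main__":
    for speed in (25, 50, 100):  # hypothetical line speeds in Mbit/s
        print(f"{speed:4d} Mbit/s -> {download_minutes(20, speed):6.1f} min")
```

Real downloads are usually somewhat slower than the nominal line speed, so these figures are best read as lower bounds.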

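Running the iterations takes about 20 minutes.

However, it takes quite a long time after the progress bar reaches 100%. (I did not measure it precisely, but maybe another hour or so?)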
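
Since the time after the progress bar completes was not measured, one way to pin it down would be to time each stage of run.py separately. The following is a minimal sketch using only the standard library; the `timed` helper and the `timings` dictionary are made-up names, and the workload is a stand-in.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; in run.py the from_pretrained(...) call, the
    # pipe(...) call and image.save(...) would each get their own block.
    with timed("demo"):
        sum(range(1_000_000))
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f} s")
```

Comparing the total time of the `pipe(...)` call with the time shown on the progress bar would indicate how much is spent after the iterations finish.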

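Impressions

Flux.1 took 20 minutes to output an image, so my honest impression is that this is slow.

That said, I have never gotten a decent image when generating at 256x256 with other image generation models. With Z-Image-Turbo, however, the generated image has the same quality as the sample.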

With sufficient specs

Bonus

A record of my failed attempt to run it with DirectML

import torch
from diffusers import ZImagePipeline
import torch_directml
dml = torch_directml.device()

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float,
    low_cpu_mem_usage=False,
)
pipe.to(dml)

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# Generate the image
image = pipe(
    prompt=prompt,
    height=256,
    width=256,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("example.png")

Error

It is float to float, so no conversion should be needed... maybe memory?

Traceback (most recent call last):
  File "F:\projects\python\Qwen-Image\zi.py", line 11, in <module>
    pipe.to(dml)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 545, in to
    module.to(device, dtype)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\transformers\modeling_utils.py", line 4343, in to
    return super().to(*args, **kwargs)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "F:\projects\python\Qwen-Image\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
RuntimeError
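
On the memory guess: the DirectML script loads the weights as torch.float (32-bit), whereas run.py uses torch.bfloat16 (16-bit). The sketch below is only a rough estimate, and it assumes that the roughly 20 GB download consists mostly of 16-bit weights, which this post does not confirm. The function name is hypothetical.

```python
# Bytes used per element for each dtype.
BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2, "float32": 4}


def weights_gb_after_cast(checkpoint_gb, stored_dtype, target_dtype):
    """Scale a checkpoint size by the ratio of bytes per element."""
    ratio = BYTES_PER_ELEMENT[target_dtype] / BYTES_PER_ELEMENT[stored_dtype]
    return checkpoint_gb * ratio


if __name__ == "__main__":
    # Assumption: the ~20 GB download is mostly bfloat16 weights.
    print(weights_gb_after_cast(20, "bfloat16", "float32"))  # prints 40.0
```

Under that assumption the float32 weights would need about 40 GB before they are even moved to the DirectML device, so running out of memory is at least a plausible cause. The traceback carries no message, so this remains unverified.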