Thursday, November 9, 2023

Silicon Mac で Stable Diffusion 1.5 / 2 / 2.1 を試す（Diffusers ライブラリを使用）

そこそこ速いシリコンマックでの Stable Diffusion まとめ。

A cup of coffee

実行環境の確認

$ python3 --version
Python 3.9.6

venv 環境の用意

$ python3 -m venv ~/venv-sd
$ source ~/venv-sd/bin/activate

pip で必要なライブラリのインストール

(venv-sd) $ pip install --upgrade pip
(venv-sd) $ pip install diffusers torch torchvision torchaudio transformers

もし入れないで画像生成すると accelerate も入れることをすすめられるので先に入れておく。

(venv-sd) $ pip install accelerate

runwayml SD 1.5

https://huggingface.co/runwayml/stable-diffusion-v1-5

Conda なしで Stable Diffusion するでも、やりましたね。

from diffusers import DiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = DiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "A cup of coffee, plain background, art by Hokusai."

image = pipe(prompt).images[0]
image.save("a-cup-of-coffee.png")

実行します。

(venv-sd) $ python main.py

この環境では 30秒程度で画像を１枚生成できました。

まだ seed 固定していないので、実行するたびに異なる絵ができます。

A cup of coffee 1 A cup of coffee 2

512 x 512 の大きさの画像が生成されます。ここでは 320 x 320 にリサイズして掲載しています。

runwayml SD 1.5（シードを指定）

シードを指定できるように修正したコード main.py:

import torch
from diffusers import DiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
seed = 45
device = "mps"
prompt = "A cup of coffee, plain background, art by Hokusai."

pipe = DiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to(device)

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

generator = torch.Generator(device).manual_seed(seed)
image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
image.save("a-cup-of-coffee.png")

実行すると、当たり前ですが、同じモデル・同じシード値を使ったので、以前のエントリー Conda なしで Stable Diffusion するで生成したのと全く同じ画像ができました。

A cup of coffee 3

stabilityai SD-2

https://huggingface.co/stabilityai/stable-diffusion-2

main.py

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)

pipe = pipe.to("mps")

#prompt = "A cup of coffee, plain background, art by Hokusai."
prompt = "A cup of coffee, simple background, art by Hokusai."

image = pipe(prompt).images[0]
image.save("a-cup-of-coffee.png")

SD 1.5 モデルより生成に倍くらいの時間がかかりました。

プロンプトで plain background を指定しても意図通りにならなかったので、代わりに simple background と指定してみました。

A cup of coffee

768 x 768 のサイズの絵を生成できます。ここでは 320 x 320 サイズにリサイズして掲載しています。

pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)

に代えて

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16)

とすることで（ torch_dtype=torch.float16 を指定）、少し速く画像を生成できるようになりました。

コード全体では次のようになりました。

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.float16)

pipe = pipe.to("mps")

prompt = "A cup of coffee, simple background, art by Hokusai."
image = pipe(prompt).images[0]
image.save("a-cup-of-coffee.png")

SD 1.5 のときと同じように、シード（Seed）値を指定したければ次のようにします。

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"
seed = 46
device = "mps"
prompt = "A cup of coffee, simple background, art by Hokusai."

generator = torch.Generator(device).manual_seed(seed)

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
pipe = pipe.to(device)
pipe.enable_attention_slicing()

image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
image.save("a-cup-of-coffee.png")

実行すると次の絵ができました。

A cup of coffee

北斎色の薄い絵になった気がする。著作権関連のことが言われだしたからその画家そのものな絵ができないように調整されたのだろうか。

stabilityai SD 2.1

https://huggingface.co/stabilityai/stable-diffusion-2-1

SD 2 の seed 指定できるコードの model_id の値を 2.1 用のモデル名に変更しただけです。

main.py

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

#model_id = "stabilityai/stable-diffusion-2"
model_id = "stabilityai/stable-diffusion-2-1"
seed = 46
device = "mps"
prompt = "A cup of coffee, simple background, art by Hokusai."

generator = torch.Generator(device).manual_seed(seed)

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
pipe = pipe.to(device)
pipe.enable_attention_slicing()

image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
image.save("a-cup-of-coffee.png")

実行すると次の絵ができました。

A cup of coffee

スケジューラーに https://huggingface.co/stabilityai/stable-diffusion-2-1のページのサンプルコードにある DPMSolverMultistepScheduler を使用するとこの環境（Silicon Mac）では真っ黒の絵しかできません。 cuda しか対応していないのかもしれません。（わかりません。）

そもそも EulerDiscreteScheduler とか DPMSolverMultistepScheduler について何もしらないで使っているので、調べてなにかわかったら、また追記します。

調べました Diffusers の Scheduler を試した

まとめ

SD をこの方法（Pytorch + Diffusers）で実行する場合、そこそこ速いシリコンマックは RTX 3060 12GB に匹敵するスピードがでることが判明した。

Controlnet についても書きたいのですが、力尽きました。後日書きます。

追伸：書きました。 Silicon Mac で Controlnet + Stable Diffusion 1.5 を試す