Trying out FLUX.1 [Schnell]

A cup of coffee

I ran FLUX.1 [Schnell] with stable-diffusion.cpp and generated an image containing a text string specified in the prompt. I followed the steps described at https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md.

On an Apple M1 Max, generating a 512x512 image takes about a minute and a half, and a 768x768 image takes about two and a half minutes.

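On macOS, you need to build the sd command with Metal support enabled and use that build. With an sd command built without Metal support, generating a single image takes more than 10 minutes (depending on the image size).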
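As a rough sketch of the Metal build mentioned above: the function below wraps the steps I understand the stable-diffusion.cpp README to describe. The function name build_sd_metal is my own, and the -DSD_METAL=ON flag is taken from the README as I know it, so check the current README in case the option has been renamed.

```shell
#!/bin/bash
# Sketch only: build stable-diffusion.cpp with Metal enabled.
# build_sd_metal is a hypothetical helper name, not part of the project.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_sd_metal() (
  set -e
  git clone --recursive https://github.com/leejet/stable-diffusion.cpp
  cd stable-diffusion.cpp
  mkdir -p build
  cd build
  # SD_METAL=ON is the option name assumed from the project README
  cmake .. -DSD_METAL=ON
  cmake --build . --config Release
)
```

Calling build_sd_metal in an empty working directory should leave the binary at stable-diffusion.cpp/build/bin/sd, which matches the ./bin/sd path used when running from the build directory.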

The prompt that generated the image at the top of this post:

a cup of coffee on the small wood dining table with a candle at night in the small lodge,
the cup side printed 'TULLY'S COFFEE',
high quality detail, art anime style,
best quality, 4k resolution

A script like the following generates it.

#!/bin/bash
# The text to print on the cup is given in single quotes inside the prompt.
prompt="a cup of coffee on the small wood dining table with a candle at night in the small lodge, the cup side printed 'TULLY'S COFFEE', high quality detail, art anime style, best quality, 4k resolution"

# Width and height in pixels (square image)
wh=512
seed=46

outfile="a-cup-of-coffee.png"

# Directory that holds the model files
dir=/path/to/models

model=flux1-schnell-q8_0.gguf
vae=ae.safetensors
clip_l=clip_l.safetensors
t5xxl=t5xxl_fp16.safetensors

# Variables are quoted so that paths containing spaces still work.
./bin/sd \
  --output "$outfile" \
  --seed "$seed" \
  --diffusion-model "$dir/$model" \
  --vae "$dir/$vae" \
  --clip_l "$dir/$clip_l" \
  --t5xxl "$dir/$t5xxl" \
  --cfg-scale 1 \
  --steps 6 \
  --sampling-method euler \
  -H "$wh" -W "$wh" \
  -p "$prompt"

Change dir=/path/to/models to match your environment.

P.S. Using the released binary on a MacBook Air M1

I also checked how long image generation takes on a MacBook Air M1 (Sonoma 14.6.1). From the stable-diffusion.cpp releases page, I used sd-master--bin-Darwin-macOS-14.6.1-arm64.zip from master-e410aeb, the latest release at the time of writing.

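After downloading, extract the archive with unzip, then use xattr to remove the quarantine attribute so that macOS does not show a warning.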

$ unzip sd-master--bin-Darwin-macOS-14.6.1-arm64.zip
$ xattr -rd com.apple.quarantine ./sd
$ xattr -rd com.apple.quarantine ./libstable-diffusion.dylib

The sd binary also has to be able to find libstable-diffusion.dylib:

export DYLD_LIBRARY_PATH=/path/to

I had to set this variable so that the directory containing the dylib is on the library search path.
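A small sketch of the same setting without hard-coding the path, assuming sd and libstable-diffusion.dylib were unzipped into the same directory. SD_DIR is a hypothetical variable I introduce here for illustration; it falls back to the current directory.

```shell
#!/bin/bash
# Assumption: sd and libstable-diffusion.dylib sit in the same directory.
# SD_DIR is a hypothetical variable; it defaults to the current directory.
sd_dir="${SD_DIR:-$PWD}"

# Warn early if the dylib is not where we expect it.
if [ ! -f "$sd_dir/libstable-diffusion.dylib" ]; then
  echo "warning: libstable-diffusion.dylib not found in $sd_dir" >&2
fi

# Prepend the directory, keeping any value that was already set.
export DYLD_LIBRARY_PATH="$sd_dir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
echo "DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH"
```

Run it from the directory where the zip was extracted, or set SD_DIR to that directory first.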

In the end, I generated the image with the following script, myscript.sh. For the quantized model, I also switched to the smaller flux1-schnell-q2_k.gguf.

#!/bin/bash

# Directory that contains libstable-diffusion.dylib
export DYLD_LIBRARY_PATH=/path/to

prompt="a cup of coffee on the small wood dining table with a candle at night in the small lodge, the cup side printed 'TULLY'S COFFEE', high quality detail, art anime style, best quality, 4k resolution"

# Width and height in pixels (square image)
wh=512
seed=46

outfile="a-cup-of-coffee.png"

# Directory that holds the model files
dir=/path/to/models

#model=flux1-schnell-q8_0.gguf
model=flux1-schnell-q2_k.gguf
vae=ae.safetensors
clip_l=clip_l.safetensors
t5xxl=t5xxl_fp16.safetensors

# Variables are quoted so that paths containing spaces still work.
./sd \
  --output "$outfile" \
  --seed "$seed" \
  --diffusion-model "$dir/$model" \
  --vae "$dir/$vae" \
  --clip_l "$dir/$clip_l" \
  --t5xxl "$dir/$t5xxl" \
  --cfg-scale 1 \
  --steps 6 \
  --sampling-method euler \
  -H "$wh" -W "$wh" \
  -p "$prompt"

Change export DYLD_LIBRARY_PATH=/path/to and dir=/path/to/models to match your environment.

The run took nearly 14 minutes. I specified 512x512 here, but the size can be any multiple of 64, so a smaller size such as 256x256 should generate an image faster.

$ time sh ./myscript.sh
...

real	13m50.822s
user	51m32.213s
sys	0m41.151s
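To pick a smaller size that still satisfies the multiple-of-64 rule mentioned above, a helper along these lines could feed the wh variable in the script. The function name round_to_64 and the example value 300 are my own, for illustration only. The pixel-count percentage is only a rough guide to how much less work there is, not a measured speedup.

```shell
#!/bin/bash
# Round a requested size down to the nearest multiple of 64 (minimum 64).
# round_to_64 is a hypothetical helper name.
round_to_64() {
  local n=$1
  local r=$(( n / 64 * 64 ))
  if [ "$r" -lt 64 ]; then
    r=64
  fi
  echo "$r"
}

# 300 is an arbitrary example request.
wh=$(round_to_64 300)
echo "wh=$wh"

# Pixel count relative to 512x512, as a whole-number percentage.
percent=$(( wh * wh * 100 / (512 * 512) ))
echo "pixels vs 512x512: ${percent}%"
```

With the example value, wh becomes 256, which has a quarter of the pixels of a 512x512 image.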

The generated image:

A cup of coffee (2)

The result differs slightly from the image generated with flux1-schnell-q8_0.gguf.

That's all.