Stable Diffusion をWebサーバ経由で実行する（Bottle を使用）

前回のエントリーで Hugging Face の Diffusers ライブラリを使用してテキストから画像を生成しました。今回はその続きで、それを Bottle を使ってWebサーバにしました。

venv 環境は前回作成した diffuers 環境をそのまま引き継いでいる点に注意してください。（OS は M1 macOS ではなく Linux(Ubuntu) を使います。）

a cup of coffee (1)

生成される画像は 512 x 512 ですが、ここでは 320 x 320 にリサイズして載せています。

Bottle をインストール

(diffusers) $ pip install bottle

その上で、前回のコードに Bottle をつかって /txt2img のエンドポイントを追加します。

sd-web-server.py

import torch
from diffusers import DiffusionPipeline
from io     import BytesIO
from bottle import Bottle, run, request, HTTPResponse

def toImage(prompt, seed, device):
    generator = torch.Generator(device).manual_seed(seed)
    with torch.autocast("cuda"):
        image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
    return image


model_id = "stabilityai/stable-diffusion-2-1-base"
seed = 45
device = "cuda"

pipe = DiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to(device)

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()


app = Bottle()

@app.post('/txt2img')
def txt2img():
    prompt = request.body.getvalue().decode('utf-8')
    image = toImage(prompt, seed, device)

    output = BytesIO()
    image.save(output, format="PNG")
    contents = output.getvalue()
    output.close()

    res = HTTPResponse(status=200, body=contents)
    res.set_header('Content-Type', 'image/png')
    return res

run(app, host='localhost', port=8080, debug=True)

ローカル環境でのみ使用する場合は、このコードの通り（ host='localhost' ）で良い。しかし、このサーバを使うクライアントを別マシンから実行する場合は、以下のように、明示的に動かしているサーバのIPアドレスを指定する必要がありました。（サーバのIPアドレスがが 192.168.10.100 の場合）
run(app, host='192.168.10.100', port=8080, debug=True)

必要であれば、 stabilityai/stable-diffusion-2-1-base に代えて、別のモデルを使うこともできます。

model_id = "runwayml/stable-diffusion-v1-5"
model_id = "CompVis/stable-diffusion-v1-4"

それではWebサーバを起動して作動を確かめます。

(diffusers) $ python sd-web-server.py

このWebサーバにアクセスして、テキストから画像を生成する場合は以下のようにします。

$ curl -X POST "http://localhost:8080/txt2img"\
    -d 'A cup of coffee, plain background, art by Hokusai.'\
    -H "Content-Type: text/plain"\
    -o coffee.png

これでカレントディレクトリに coffee.png が生成されます。

a cup of coffee, seed 45

seed 値を固定しないでランダムにする

このサーバでは現状は seed 値を 45 に固定しています。したがって、同じテキスト（プロンプト）に対して、同じ画像が生成されます。

そうではなく、毎回ランダムな seed 値にて異なる画像を得たい場合は、以下のようにします。

import random
my_random_seed = int(random.uniform(0, 999))

0..999 のランダムな値が得られるので、これを seed 値として使います。

このランダム値を実際に反映するには、sd-web-server.py の def txt2img 関数を以下のように書きかえます。

@app.post('/txt2img')
def txt2img():
    prompt = request.body.getvalue().decode('utf-8')
    my_random_seed = int(random.uniform(0, 999))
    image = toImage(prompt, my_random_seed, device)

このようにして同じプロンプト「A cup of coffee, plain background, art by Hokusai.」から得られた画像:

a cup of coffee (2)

a cup of coffee (3)

a cup of coffee (4)

seed値をランダムにしたので、（たとえ同じプロンプトでも）実行するたびに異なる画像を得ることができます。

まとめ

Webサーバにすることで、処理にかかる時間が半分くらい（体感では 10秒から15秒くらい）になるのが助かります。それは、以下の部分が Webサーバ起動時に一度だけ実行されるためです。

pipe = DiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to(device)

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

また、別のマシンから curl すれば画像が取得できるようになるのが地味に便利です。さらに、curl コマンドをシェルスクリプトに並べて記述しておけば、一度にたくさんのバリエーションの画像をバッチ実行できるので、とても便利です。ただ、それをやると怖いくらいに GPU のファンが回ります。

最後に、完成した（ランダム seedつき）コード掲載します。

sd-web-server-random.py

import torch
from diffusers import DiffusionPipeline
from io     import BytesIO
from bottle import Bottle, run, request, HTTPResponse
import random

def toImage(prompt, seed, device):
    generator = torch.Generator(device).manual_seed(seed)
    with torch.autocast("cuda"):
        image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
    return image


model_id = "stabilityai/stable-diffusion-2-1-base"
device = "cuda"

pipe = DiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to(device)

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()


app = Bottle()

@app.post('/txt2img')
def txt2img():
    prompt = request.body.getvalue().decode('utf-8')
    my_random_seed = int(random.uniform(0, 999))
    image = toImage(prompt, my_random_seed, device)

    output = BytesIO()
    image.save(output, format="PNG")
    contents = output.getvalue()
    output.close()

    res = HTTPResponse(status=200, body=contents)
    res.set_header('Content-Type', 'image/png')
    return res

run(app, host='localhost', port=8080, debug=True)

追記

いろいろ試していて気に入った画像ができたので、記録しておく。

a cup of flat white

model : stabilityai/stable-diffusion-2-1-base
seed : 45
prompt : A cup of Flat white, plain background, monotone, hand drawing art by Hokusai.