Wednesday, July 5, 2023

Stable Diffusion で画像を生成（Cannyエッジを経由）

コーヒーカップの写真を撮ってそこから欲しい画像を作り出す実験です。

a photogenic coffee cup

ここでは 768 x 768 サイズの画像を使用しています。掲載画像は 320 x 320 にリサイズしています。

Linux(Ubuntu 22.04) + CUDA の環境

Stable Diffusion するための環境構築については過去のエントリーをご覧ください。

Canny エッジ処理する

出発点となる画像です。

a-coffee-cup

実際に使用した元画像は 768x768 サイズの画像です。

この画像を Canny を使ってエッジを抜き出した画像にします。

canny.py

import cv2 as cv

input_image_file  = "a-coffee-cup_768x768.png"
output_image_file = "canny_a-coffee-cup.png"

image = cv.imread(input_image_file, cv.IMREAD_GRAYSCALE)
edges = cv.Canny(image,40,80)
cv.imwrite(output_image_file, edges)

閾値の 40, 80 は適当に選びました。

生成された画像:

a coffee cup

Canny エッジ処理された画像から画像を生成（Controlnet を使用）

次に Controlnet を使って、Canny エッジ画像からプロンプトとともに画像を生成します。

詳細はこちら https://huggingface.co/lllyasviel/sd-controlnet-canny をご覧ください。

txt-and-img2img.py

import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

def load_image(image_file_path):
    with open(image_file_path, "rb") as f:
        image = Image.open(f).convert("RGB")
    return image


device = "cuda"

cn_model_id = "lllyasviel/sd-controlnet-canny"
sd_model_id = "runwayml/stable-diffusion-v1-5"
seed = 42
input_image_file  = "canny_a-coffee-cup.png"
output_image_file = "a-photogenic-coffee-cup.png"

prompt = "photogenic, a cup of coffee"

controlnet = ControlNetModel.from_pretrained(
    cn_model_id,
    torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    sd_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16).to(device)


init_image = load_image(input_image_file)
#init_image.thumbnail((768, 768))

generator = torch.Generator(device).manual_seed(seed)

image = pipe(
    prompt=prompt,
    image=init_image,
    guidance_scale=7.5,
    generator=generator).images[0]

image.save(output_image_file)

生成された画像:

a photogenic coffee cup

プロンプトを以下のように変更して再度生成してみます。

prompt = "hand drawing art, a cup of coffee"

a hand drawing coffee cup

このように写真風ではなく手描き風の画像も生成できます。

さらに img2img する

この a-photogenic-coffee-cup.png から、さらに img2img を使って画像を生成してみます。

a photogenic coffee cup

img2img.py

import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline

def load_image(image_file_path):
    with open(image_file_path, "rb") as f:
        image = Image.open(f).convert("RGB")
    return image


device = "cuda"
sd_model_id = "stabilityai/stable-diffusion-2-1-base"

seed = 43
input_image_file = "a-photogenic-coffee-cup.png"
output_image_file = "a-photogenic-blue-coffee-cup.png"

prompt = "a blue color cup"

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    sd_model_id,
    revision="fp16",
    torch_dtype=torch.float16).to(device)

init_image = load_image(input_image_file)
#init_image.thumbnail((768, 768))

generator = torch.Generator(device).manual_seed(seed)

with torch.autocast("cuda"):
    image = pipe(
        prompt,
        image=init_image,
        guidance_scale=7.5,
        strength=0.75,
        generator=generator).images[0]

image.save(output_image_file)

生成された画像:

a photogenic blue coffee

プロンプトを以下のように変更して再度生成してみます。

prompt = "a black color cup"

a photogenic black coffee

うーん、カップの色を「青」または「黒」一色に変更したかったのですが、意図通りにはいきませんでした。プロンプトを工夫したり、seed値を変更すればうまくいくのかもしれません。

まとめ

実在画像から展開して少しテイストの異なる画像を作り出すことができました。ただし、自分が期待した画像を得るにはプロンプトを工夫したり、シード値を変更する必要がありそう。簡単に期待する画像を得られるわけではない。