Monday, July 24, 2023

DeepLab v3 Semantic Segmentation を TensorFlow.js で試す（その１）

semantic dog image segmentation

Semantic Segmentation in the Browser: DeepLab v3 Model を起点にあれこれ調べた結果を書き残します。

トレーニング済みのモデルとして提供されている次の３つ pascal, cityscapes, ade20k が使えますが、 pascal を試します。

dog jpg image

それではやってみましょう。

環境の確認

$ node --version
v18.17.0
$ npm --version
9.6.7

Node.js のプロジェクトディレクトリを作成して準備

$ mkdir my-deeplab-v3
$ cd my-deeplab-v3
$ npm init -y

必要なモジュールを入れる

$ npm install @tensorflow/tfjs-node
$ npm install @tensorflow-models/deeplab

ラベルを確認

以下のコードで、モデル pascal の場合のラベル名と対応する番号を確認します。

const deeplab = require('@tensorflow-models/deeplab')

const range = (v)=>{ return [...Array(v).keys()] }

const labels = deeplab.getLabels('pascal')
range(labels.length).forEach((index)=>{
    console.log(`${index} : ${labels[index]}`)
})

実行します。

0 : background
1 : aeroplane
2 : bicycle
3 : bird
4 : boat
5 : bottle
6 : bus
7 : car
8 : cat
9 : chair
10 : cow
11 : dining table
12 : dog
13 : horse
14 : motorbike
15 : person
16 : potted plant
17 : sheep
18 : sofa
19 : train
20 : TV

犬は 12 です、これを覚えておきます。

対象画像をロード

期待されている入力画像の tensor 形式(shape)は [縦, 横, 3] です。これは TensorFlow.js で JPG 画像を読んだ場合の普通の shape と同じなので、以下のように dog.jpg 画像を tf.node.decodeImage を使って読み込みます。

const dogImagePath = 'dog.jpg'
const dogImage = fs.readFileSync(dogImagePath)
const dogImageTensor = tf.node.decodeImage(dogImage)
console.log(dogImageTensor.shape) // [ 256, 256, 3 ]

deeplab v3 の pascal モデルをロードして推測

次のモデルをロードして、推測します。

@tensorflow-models/deeplab のモジュールには model.segment という簡単に扱えるようにした関数が用意されているようですが、原始的に model.predict の関数を使ってどんな結果が出るか確認します。

const loadModel = async () => {
    const modelName = 'pascal'   // set to your preferred model, either `pascal`, `cityscapes` or `ade20k`
    const quantizationBytes = 4  // either 1, 2 or 4
    return await deeplab.load({base: modelName, quantizationBytes})
}

loadModel().then((model)=>{
    // 画像をロードする.
    const dogImagePath = 'dog.jpg'
    const dogImage = fs.readFileSync(dogImagePath)
    const dogImageTensor = tf.node.decodeImage(dogImage)
    console.log(dogImageTensor.shape)

    // 推測する.
    const rawSegmentationMap = model.predict(dogImageTensor)
    console.log(rawSegmentationMap)

    // 後始末する.
    dogImageTensor.dispose()
    rawSegmentationMap.dispose()
})

loadModel して返される model は SemanticSegmentation です。詳細は ./node_modules/@tensorflow-models/deeplab/dist/index.js を見ましょう。

実行すると、以下の推測結果の tensor が出力されます。

Tensor {
  kept: false,
  isDisposedInternal: false,
  shape: [ 513, 513 ],
  dtype: 'int32',
  size: 263169,
  strides: [ 513 ],
  dataId: {},
  id: 485,
  rankType: '2',
  scopeId: 2
}

int32 の [513, 513] になっています。

256x256 のサイズの画像を与えたが結果が 513x513 になって返ってきた。これでいいのかな？（わからない）もしかすると 513x513 サイズの画像を入力として使うことを想定しているのかもしれない。違うかもしれない。とりあえずこのまま先に進む。

結果の tensor つまり rawSegmentationMap に 12 の値が入っているか調べて見る。

12 の値のピクセルを見つけたらカウントアップするコード:

// jsArray へ変換
const jsArray = rawSegmentationMap.arraySync()

// x,y ごとにピクセルの値を調べる
let dogPixelCount = 0
range(513).forEach((y)=>{
    range(513).forEach((x)=>{
        const predictedPixelValue = jsArray[y][x]
        if( predictedPixelValue==12 ){
            dogPixelCount += 1
        }
    })
})
console.log(dogPixelCount)

y, x の順番に注意

結果として 44886 のピクセルが犬（12）となりました。確かに犬を検出しているようです。

意図通りの位置で検出できているか調べるため、 Jimp を使ってビジュアル化（画像に変換）します。

$ npm install jimp

513x513 の黒塗りしたイメージを作成して、 12 の番号が出現したピクセルのみ白に変えます。

const image = new Jimp(513, 513, 'black', (err, image) => {})
const imageW = image.bitmap.width
const imageH = image.bitmap.height
image.scan(0, 0, imageW, imageH, (x, y, idx)=> {
    const predictedPixelValue = jsArray[y][x]

    if( predictedPixelValue==12 ){
        image.bitmap.data[idx + 0] = 255 // red
        image.bitmap.data[idx + 1] = 255 // green
        image.bitmap.data[idx + 2] = 255 // blue
        image.bitmap.data[idx + 3] = 255 // alpha
    }
})

// 元の画像サイズ(256x256) に戻してJPGとして書き出す.
image.resize(256, 256).write('masked-dog.jpg')

結果のマスク画像

masked dog jpg image

できました！

まとめ

完成した index.js を掲載します。

const fs = require('fs')
const tf = require('@tensorflow/tfjs-node')
const deeplab = require('@tensorflow-models/deeplab')
const Jimp = require('jimp')

const range = (v)=>{ return [...Array(v).keys()] }

/*
const labels = deeplab.getLabels('pascal')
range(labels.length).forEach((index)=>{
    console.log(`-${index} : ${labels[index]}`)
}
*/

const loadModel = async () => {
    const modelName = 'pascal'   // set to your preferred model, either `pascal`, `cityscapes` or `ade20k`
    const quantizationBytes = 4  // either 1, 2 or 4
    return await deeplab.load({base: modelName, quantizationBytes})
}

loadModel().then((model)=>{
    // 画像をロードする.
    const dogImagePath = 'dog.jpg'
    const dogImage = fs.readFileSync(dogImagePath)
    const dogImageTensor = tf.node.decodeImage(dogImage)
    console.log(dogImageTensor.shape)

    // 推測する.
    const rawSegmentationMap = model.predict(dogImageTensor)
    console.log(rawSegmentationMap.shape)

    const jsArray = rawSegmentationMap.arraySync()

    /*
    // 数える.
    let dogPixelCount = 0
    range(513).forEach((y)=>{
        range(513).forEach((x)=>{
            const predictedPixelValue = jsArray[y][x]
            if( predictedPixelValue==12 ){
                dogPixelCount += 1
            }
        })
    })
    console.log(dogPixelCount)
    */

    // 視覚化する.
    const image = new Jimp(513, 513, 'black', (err, image) => {})
    const imageW = image.bitmap.width
    const imageH = image.bitmap.height
    image.scan(0, 0, imageW, imageH, (x, y, idx)=> {
        const predictedPixelValue = jsArray[y][x]

        if( predictedPixelValue==12 ){
            image.bitmap.data[idx + 0] = 255 // red
            image.bitmap.data[idx + 1] = 255 // green
            image.bitmap.data[idx + 2] = 255 // blue
            image.bitmap.data[idx + 3] = 255 // alpha
        }
    })

    // 元の画像サイズ(256x256) に戻してJPGとして書き出す.
    image.resize(256, 256).write('masked-dog.jpg')

    // 後始末する.
    dogImageTensor.dispose()
    rawSegmentationMap.dispose()
})