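[Azure] Using Azure Cognitive Vision Services to get information about my own photos (including a photo of handwriting)

Source material: this walkthrough uses the Computer Vision API and follows the Microsoft Learn module below.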

https://docs.microsoft.com/ja-jp/learn/modules/create-computer-vision-service-to-classify-images/

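What to prepare (a free Azure environment)

Sign in with a valid MS Learn account and work through https://docs.microsoft.com/ja-jp/learn/modules/create-computer-vision-service-to-classify-images/3-analyze-images until you have obtained the [access key] in the sandbox.

Next, prepare the URL of an image to analyze.

Here I use https://pbs.twimg.com/profile_images/1394307242325778436/pbFqHGOm_400x400.jpg .
It is a picture of me wearing the full Tokyo 2020 volunteer uniform.

I will try SKU S1 in the Japan East region.
The features to analyze are passed as query parameters.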

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Categories,Description&details=Landmarks" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/profile_images/1394307242325778436/pbFqHGOm_400x400.jpg'}" \
| jq '.'
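The query string in the command above is just a comma-separated list per parameter. As a minimal sketch (the helper name `build_analyze_url` is my own, not part of any SDK), the same URL can be assembled in Python so that features are easy to swap:

```python
from urllib.parse import urlencode


def build_analyze_url(region, visual_features, details=()):
    """Assemble an Analyze Image (v2.0) URL like the one passed to curl."""
    base = f"https://{region}.api.cognitive.microsoft.com/vision/v2.0/analyze"
    params = {"visualFeatures": ",".join(visual_features)}
    if details:
        params["details"] = ",".join(details)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{base}?{urlencode(params, safe=',')}"


print(build_analyze_url("japaneast", ["Categories", "Description"], ["Landmarks"]))
```

Calling it with `["Adult", "Description"]` and no details would give the URL used for the adult-content check later in this post.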

The result is as follows.

{
  "categories": [
    {
      "name": "people_",      ← people
      "score": 0.85546875     ← score 0.85546875 (85.55%)
    }
  ],
  "description": {
    "tags": [                 ← the tags hold the information about the photo
      "person", "man", "wearing", "clothing", "blue", "hat", "baseball",
      "uniform", "holding", "dressed", "sunglasses", "skiing", "shirt",
      "glasses", "standing", "helmet", "young", "player", "bat", "ball", "snow"
    ],
    "captions": [
      {
        "text": "a man wearing a blue hat",      ← caption
        "confidence": 0.9509890037764904         ← confidence 0.950989 (95.10%)
      }
    ]
  },
  "requestId": "c3451935-2c35-407f-a02e-1b2c0f0a3376",
  "metadata": { "height": 400, "width": 400, "format": "Jpeg" }
}

Result: "a man wearing a blue hat", with 95% confidence.
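Reading the caption off the JSON by eye works for one image, but it can also be pulled out programmatically. The sketch below assumes the response has been saved as JSON text; `best_caption` is a hypothetical helper, and the sample is a trimmed copy of the response (tags shortened).

```python
import json

# Trimmed copy of the Analyze response (tag list shortened for brevity).
SAMPLE = """
{
  "categories": [{"name": "people_", "score": 0.85546875}],
  "description": {
    "tags": ["person", "man", "wearing", "clothing", "blue", "hat"],
    "captions": [
      {"text": "a man wearing a blue hat", "confidence": 0.9509890037764904}
    ]
  }
}
"""


def best_caption(response):
    """Return (text, confidence) of the highest-confidence caption, or None."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return None
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"], top["confidence"]


text, confidence = best_caption(json.loads(SAMPLE))
print(f"{text} ({confidence:.2%})")  # a man wearing a blue hat (95.10%)
```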

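Let's check the image, just to be sure.

[Image: 400x400.jpg, a man wearing a blue hat. (C) FXFROG.COM]

Next, let's check that it is not inappropriate (racy or adult) content. It is my own picture, so I'm a little nervous, lol.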

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Adult,Description" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/profile_images/1394307242325778436/pbFqHGOm_400x400.jpg'}" \
| jq '.'

The result is as follows.

{
  "adult": {
    "isAdultContent": false,      ← not adult content
    "isRacyContent": false,       ← not racy content
    "adultScore": 0.00773863960057497,
    "racyScore": 0.010551562532782555
  },
  "description": {
    "tags": [
      "person", "man", "wearing", "clothing", "blue", "hat", "baseball",
      "uniform", "holding", "dressed", "sunglasses", "skiing", "shirt",
      "glasses", "standing", "helmet", "young", "player", "bat", "ball", "snow"
    ],
    "captions": [
      {
        "text": "a man wearing a blue hat",
        "confidence": 0.9509890037764904
      }
    ]
  },
  "requestId": "9268dc3e-3cef-4489-a1fc-7597553c235d",
  "metadata": { "height": 400, "width": 400, "format": "Jpeg" }
}
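If this check were run over many images, the two boolean flags are the part worth testing. A minimal sketch, assuming the response has already been parsed into a dict (`is_flagged` is a hypothetical helper name):

```python
def is_flagged(response):
    """Return True when either adult-content flag in an Analyze response is set."""
    adult = response.get("adult", {})
    return bool(adult.get("isAdultContent") or adult.get("isRacyContent"))


# Values copied from the "adult" section of the response.
sample = {
    "adult": {
        "isAdultContent": False,
        "isRacyContent": False,
        "adultScore": 0.00773863960057497,
        "racyScore": 0.010551562532782555,
    }
}
print("flagged" if is_flagged(sample) else "ok")  # ok
```

The raw `adultScore` and `racyScore` values are left untouched here; applying a custom threshold to them would be a separate design decision.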

For the regions where Azure Computer Vision API v2.0 is available and the features it offers, see:

https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

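While we're at it, let's create a thumbnail (100 x 100) with the Computer Vision API and retrieve it with Storage Explorer.

In the Azure Portal, I used the storage account's [Storage Explorer (preview)] to download the generated thumbnail.

[Image: the generated thumbnail]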
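The command used to create the thumbnail is not shown in this post. As a rough sketch only, assuming the v2.0 `generateThumbnail` operation with `width`, `height` and `smartCropping` query parameters, a request could be built like this in Python (the helper names and the output path are my own):

```python
import json
import urllib.request


def build_thumbnail_request(region, key, image_url, width=100, height=100):
    """Build (but do not send) a POST request for a smart-cropped thumbnail."""
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com/vision/v2.0/generateThumbnail"
        f"?width={width}&height={height}&smartCropping=true"
    )
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def save_thumbnail(request, path="thumbnail.jpg"):
    """Send the request and write the returned image bytes to path."""
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

`save_thumbnail(build_thumbnail_request("japaneast", key, image_url))` would then write the image to the current directory, from where it can be picked up with whatever storage tooling is at hand.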

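Extracting and recognizing text contained in an image (1)

I'll use the same image (the original 400 x 400, not the thumbnail), though reading it is probably hard even for a human.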

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/ocr" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/profile_images/1394307242325778436/pbFqHGOm_400x400.jpg'}" \
| jq '.'


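As expected, that was too hard. Let me switch to different material.

Extracting and recognizing text contained in an image (2)

https://i1.wp.com/www.fxfrog.com/wp-content/uploads/2021/04/AzureFunction_HttpTrigger1_1.png

This image is full of text, so I have high hopes that it can be [extracted].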

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/ocr" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://i1.wp.com/www.fxfrog.com/wp-content/uploads/2021/04/AzureFunction_HttpTrigger1_1.png'}" \
| jq '.'

The result is as follows.

{
  "language": "en",
  "textAngle": 0,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "24,16,388,65",
      "lines": [
        {
          "boundingBox": "24,16,257,18",
          "words": [
            { "boundingBox": "24,16,38,15", "text": "Ffi—L" },
            { "boundingBox": "72,17,8,13", "text": ">" },
            { "boundingBox": "91,17,173,17", "text": "FunctionApp20210410" },
            { "boundingBox": "274,17,7,13", "text": ">" }
          ]
        },
        {
          "boundingBox": "293,17,93,17",
          "words": [
            { "boundingBox": "293,17,93,17", "text": "HttpTrigger1" }
          ]
        },
(rest omitted)
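The OCR response nests text as regions, then lines, then words, so getting readable lines back means joining the words of each line. A small sketch, assuming the response is already parsed into a dict; `ocr_lines` is a hypothetical helper and the sample is a trimmed fragment of the response:

```python
def ocr_lines(response):
    """Flatten an OCR response into a list of text lines."""
    lines = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Trimmed fragment of the OCR response (bounding boxes and some words dropped).
sample = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [{"text": ">"}, {"text": "FunctionApp20210410"}, {"text": ">"}]},
                {"words": [{"text": "HttpTrigger1"}]},
            ]
        }
    ],
}
for text in ocr_lines(sample):
    print(text)
```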

I'll also try handwriting recognition.

The image is sideways, but I figured it would probably work (no basis for that).

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Handwritten" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/media/E2p4wPpUYAIVXGL?format=jpg&name=large'}" \
-D -

The result is as follows.

HTTP/1.1 202 Accepted
Content-Length: 0
Operation-Location: https://japaneast.api.cognitive.microsoft.com/vision/v2.0/textOperations/c3bf264b-6d00-44ef-a71d-650b1acacf53
x-envoy-upstream-service-time: 736
CSP-Billing-Usage: CognitiveServices.ComputerVision.Transaction=1
apim-request-id: c3bf264b-6d00-44ef-a71d-650b1acacf53
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
warn-code: 299
warn-agent: -
warn-text: Computer Vision Recognize Text is retired and will be removed by 16 November 2020. Please migrate toComputer Vision's Read Operation. See the following page for added details: https://docs.microsoft.com/en-us/azure/cognitive-services/Computer-vision/concept-recognizing-text
Date: Sun, 30 May 2021 18:03:03 GMT

According to the warn-text header, the service has been retired!!! orz orz orz

I did not receive the JSON I was expecting.
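One thing worth noting about the dump: a 202 response has no body (Content-Length: 0), and as far as I understand the API, the Operation-Location header is the URL where the recognition result would normally be fetched with a follow-up GET. A sketch that pulls the status code and that header out of a `curl -D -` style dump (`parse_header_dump` is my own helper, and the sample is shortened):

```python
def parse_header_dump(dump):
    """Split a raw HTTP header dump into (status_code, {lowercased-name: value})."""
    lines = dump.strip().splitlines()
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a "Name: value" line
            headers[name.strip().lower()] = value.strip()
    return status, headers


# Shortened copy of the header dump.
DUMP = """HTTP/1.1 202 Accepted
Content-Length: 0
Operation-Location: https://japaneast.api.cognitive.microsoft.com/vision/v2.0/textOperations/c3bf264b-6d00-44ef-a71d-650b1acacf53
warn-code: 299
"""

status, headers = parse_header_dump(DUMP)
print(status, headers["operation-location"])
```

Whether the retired operation would still have returned a result at that URL is something I did not verify.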

Not satisfied with that, I followed the link given in the warning.

https://docs.microsoft.com/en-us/azure/cognitive-services/Computer-vision/concept-recognizing-text

The feature has been merged into the Read API.
https://centraluseuap.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/5d986960601faab4bf452005

Supported file formats: JPEG, PNG, BMP, PDF, and TIFF
For PDF and TIFF files, up to 2000 pages (only first two pages for the free tier) are processed.
The file size must be less than 50 MB (6 MB for the free tier) and dimensions at least 50 x 50 pixels and at most 10000 x 10000 pixels.
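The limits quoted above are easy to check locally before sending anything. The sketch below encodes only the numbers from that quote; the helper name, the treatment of 1 MB as 1024 * 1024 bytes, and the `jpg`/`tif` extension aliases are my own assumptions.

```python
MB = 1024 * 1024  # assumption: the quoted MB figures are treated as MiB
SUPPORTED = {"jpeg", "jpg", "png", "bmp", "pdf", "tiff", "tif"}


def read_api_problems(ext, size_bytes, width, height, free_tier=False):
    """Return a list of reasons a file would violate the quoted Read API limits."""
    problems = []
    if ext.lower().lstrip(".") not in SUPPORTED:
        problems.append(f"unsupported format: {ext}")
    limit_mb = 6 if free_tier else 50
    if size_bytes >= limit_mb * MB:
        problems.append(f"file must be smaller than {limit_mb} MB")
    if min(width, height) < 50:
        problems.append("dimensions must be at least 50 x 50 pixels")
    if max(width, height) > 10000:
        problems.append("dimensions must be at most 10000 x 10000 pixels")
    return problems


print(read_api_problems("jpg", 7 * MB, 400, 400, free_tier=True))
```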

↑ This looks like the kind of thing that shows up on Azure certification exams, so I'll keep it in mind for future study.
The last scenario I tried needs the Read API.

Let me try again, this time with a new image (in the correct orientation).

Image: https://pbs.twimg.com/media/E2p78cfVgAAaHER?format=jpg

The following two commands call different endpoints.

curl "https://japaneast.api.cognitive.microsoft.com/vision/v2.0/ocr?mode=Handwritten" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/media/E2p78cfVgAAaHER?format=jpg'}" \
-D -

curl "https://japaneast.api.cognitive.microsoft.com/vision/v3.2/read/analyze" \
-H "Ocp-Apim-Subscription-Key: $key" \
-H "Content-Type: application/json" \
-d "{'url' : 'https://pbs.twimg.com/media/E2p4wPpUYAIVXGL?format=jpg&name=large'}" \
-D -

Hmm, it's not working... and then the sandbox timed out.
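For a future retry: as I understand the Read API documentation, the call is asynchronous. The POST to `read/analyze` answers 202 with an Operation-Location header, and the recognized text comes from polling that URL with GET until `status` is `succeeded`. The output of the two commands above was not captured, so this is a sketch of the documented flow rather than a diagnosis; the function names, polling interval and attempt count are my own choices.

```python
import json
import time
import urllib.request


def submit_read(region, key, image_url):
    """POST an image URL to Read v3.2 and return the Operation-Location URL."""
    req = urllib.request.Request(
        f"https://{region}.api.cognitive.microsoft.com/vision/v3.2/read/analyze",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]


def fetch_json(url, key):
    """GET a URL with the subscription key and parse the JSON body."""
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_lines(operation_url, key, fetch=fetch_json, delay=1.0, attempts=30):
    """Poll the operation until it finishes; return the recognized text lines."""
    for _ in range(attempts):
        result = fetch(operation_url, key)
        status = result.get("status")
        if status == "succeeded":
            pages = result["analyzeResult"]["readResults"]
            return [line["text"] for page in pages for line in page["lines"]]
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(delay)  # still notStarted or running
    raise TimeoutError("Read operation did not finish in time")
```

Usage would be `wait_for_lines(submit_read("japaneast", key, image_url), key)`. The `fetch` parameter exists only so the polling loop can be exercised without a live subscription.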