Python Image Processing on Azure Databricks – Part 3, Text Recognition

We will conclude this image processing series by using Azure Cognitive Services to recognize text in the images we used in Part 1 and Part 2. Part 1 set up Azure Databricks and then used OpenCV for image comparison. Part 2 set up Azure Cognitive Services and then used Bing Search to retrieve an image from the web using a search query.

After this post we will have walked through all of the code needed to build a solution for processing and comparing images using Azure Databricks and Azure Cognitive Services. You could put the pieces together as follows, or modify them to fit your requirements.

diagram

Getting Started

If you have gone through Part 1 and Part 2, all dependencies should already be in place. All Azure resources were added in the previous posts, the cluster was created, and all libraries were attached to the cluster.

This post does not require any additional setup steps.

Azure Databricks: Image Text Recognize Notebook

Create a new Notebook in Azure Databricks for the text recognition code – similar to the steps in Part 1 and Part 2. Attach the Notebook to a running cluster (start a cluster if one is not running).

For your convenience, the following code is also available on GitHub.

A few new libraries are used to convert an image URL into an OpenCV image, to call the Azure Computer Vision SDK and for making API calls.

from skimage import io
import simplejson as json
import cv2
import matplotlib.pyplot as plt
import time
import numpy as np
from pyspark.sql import *
import re
from skimage import img_as_float
from skimage import img_as_ubyte
from skimage.color import rgba2rgb
import os
import requests
import http.client, urllib.request, urllib.parse, urllib.error, base64


from azure.cognitiveservices.vision.computervision import ComputerVisionAPI
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from azure.cognitiveservices.search.imagesearch import ImageSearchAPI
from azure.cognitiveservices.search.imagesearch.models import ImageType, ImageAspect, ImageInsightModule
from msrest.authentication import CognitiveServicesCredentials


The images and search sites can be dynamic. Here I am using a list of dicts for the products and a list for the sites; many other options could be used instead.


products = [{'Name': 'PAM Original Cooking Spray, 6 Ounce', 'File': 'PAM_Original_6_OZ.jpg' }]

sites = ['walmart.com','target.com']
The keys can be found by clicking ‘Keys’ on each Cognitive Services resource. You will also need to know the region in which you created the Computer Vision resource (for example, westus or eastus) and pass it in place of the ComputerVisionRegion placeholder.

Variables used in the code: the image folder, the subscription keys, and the Computer Vision location (region).

You can see additional Computer Vision features by visiting:

https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/


IMAGES_FOLDER = "/dbfs/mnt/images/"

IMG_SEARCH_SUBSCRIPTION_KEY = "[BingSearchKey]"

COMP_VIS_SUBSCRIPTION_KEY = "[ComputerVisionKey]"
COMPUTERVISION_LOCATION = os.environ.get("COMPUTERVISION_LOCATION", "[ComputerVisionRegion]")
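
The keys above are pasted into the notebook as placeholders, while the region is already read from an environment variable. As a minimal sketch, assuming you export the keys as environment variables on the cluster, the same pattern could cover all of the settings. The helper name get_setting and the environment variable names are hypothetical, not part of the original notebook:

```python
import os

def get_setting(name, placeholder):
    """Return the environment variable `name`, or the placeholder if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value if value else placeholder

# Hypothetical variable names; adjust them to whatever your cluster actually defines.
COMP_VIS_SUBSCRIPTION_KEY = get_setting("COMP_VIS_SUBSCRIPTION_KEY", "[ComputerVisionKey]")
COMPUTERVISION_LOCATION = get_setting("COMPUTERVISION_LOCATION", "[ComputerVisionRegion]")
```

Falling back to the bracketed placeholder keeps the notebook readable when nothing is configured, and the value is easy to spot if it was never replaced.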

Create Image Directory – only needs to run once


#%fs
#mkdirs "/mnt/images/"
dbutils.fs.mount(source = "wasbs://[container]@[storage-account].blob.core.windows.net/",mount_point = "/mnt/images/",extra_configs = {"fs.azure.account.key.[storage-account].blob.core.windows.net": "[storage-account-key]"})

Functions


Retrieve a list of image search results using Cognitive Services Image Search API. Arguments: search query string and subscription key.


def retrieve_images(search,key):
  client = ImageSearchAPI(CognitiveServicesCredentials(key))
 
  try:
    image_results = client.images.search(query=search)
    
    print("Search images for query " + search)
    return image_results
  except Exception as err:
    print("Encountered exception. {}".format(err))
    return None

Return single image from search results – for simplicity bring back first item.


def retrieve_first_img_url(image_results):
  if image_results.value:
    first_image_result = image_results.value[0] #grab first image from search results
    print("Image result count: {}".format(len(image_results.value)))
    print(first_image_result.content_url)
    
    url = first_image_result.content_url 
    #remove extra args from url, just grab upto image
    url_clean = re.match(r'.*(png|jpg|jpeg)',url,re.M|re.I).group(0)
    print(url_clean)
    
    return url_clean
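
The regular expression in retrieve_first_img_url keeps everything up to the last image extension in the URL and drops any query arguments that follow it. The sketch below isolates that step so it can be tried on its own. It also returns None when no extension is found, instead of failing on .group(0). The helper name clean_image_url and the example URLs are made up for illustration:

```python
import re

# Same pattern as retrieve_first_img_url: keep everything up to the last image extension.
IMAGE_URL_PATTERN = re.compile(r'.*(png|jpg|jpeg)', re.I)

def clean_image_url(url):
    """Return the URL trimmed after the image extension, or None if no extension is found."""
    match = IMAGE_URL_PATTERN.match(url)
    return match.group(0) if match else None

# Hypothetical URLs, for illustration only.
print(clean_image_url("https://images.example.com/pam.jpeg?width=450&height=450"))
print(clean_image_url("https://images.example.com/no-extension"))
```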

Convert URL into OpenCV image – this is important for image comparison using OpenCV


def url_to_image(url): 
    img = io.imread(url) #read url
    
    if img.ndim == 3 and img.shape[-1] == 4:
        img = rgba2rgb(img) #remove alpha, only when an alpha channel is present
    cv_image = cv2.cvtColor(img_as_ubyte(img), cv2.COLOR_RGB2BGR) #convert from skimage to opencv

    # return the image
    return cv_image

Plot function to simplify code


def plot_img(figtitle,subtitle,img1,img2,site):
  fig = plt.figure(figtitle, figsize=(10, 5))
  plt.suptitle(subtitle,fontsize=24)
  ax = fig.add_subplot(1, 2, 1)
  ax.set_title("Base",fontsize=12)
  plt.imshow(img1)
  plt.axis("off")
  ax = fig.add_subplot(1, 2, 2)
  ax.set_title(site,fontsize=12)
  plt.imshow(img2)
  plt.axis("off")

  display(plt.show())

Text Retrieve Functions


retrieve_text_from_img – Uses SDK function recognize_text_in_stream()

Recognize Text operation. When you use the Recognize Text interface, the response contains a field called “Operation-Location”. The “Operation-Location” field contains the URL that you must use for your Get Handwritten Text Operation Result operation.

Computer Vision SDK Python Docs

The code then captures the JSON results using the ‘Operation-Location’ URL.


def retrieve_text_from_img(img):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    
    #raw - returns the direct response alongside the deserialized response
    with open(os.path.join(IMAGES_FOLDER, img), "rb") as image_stream:
        txt_analysis2=client.recognize_text_in_stream(image_stream,raw=True)
    
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary) 
    time.sleep(20)
    
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key':COMP_VIS_SUBSCRIPTION_KEY}

    url = txt_analysis2.response.headers['Operation-Location']

    return json.loads(requests.get(url, headers=headers).text)

retrieve_text_from_url – Uses SDK function recognize_text()

Recognize Text operation. When you use the Recognize Text interface, the response contains a field called “Operation-Location”. The “Operation-Location” field contains the URL that you must use for your Get Handwritten Text Operation Result operation.

Computer Vision SDK Python Docs

The code then captures the JSON results using the ‘Operation-Location’ URL.


def retrieve_text_from_url(imgurl):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    txt_analysis2=client.recognize_text(imgurl,raw=True, mode='Printed')
    
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary)  
    time.sleep(20)
    
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key':COMP_VIS_SUBSCRIPTION_KEY}

    url = txt_analysis2.response.headers['Operation-Location']

    return json.loads(requests.get(url, headers=headers).text)
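
Both SDK functions above pause for a fixed 20 seconds, and the code comments note that a loop checking the status could be used instead. A minimal sketch of such a loop follows. It assumes the result JSON carries a 'status' field with the values 'NotStarted', 'Running', 'Succeeded' or 'Failed'; check this against the API version you call. The helper name poll_until_done and its parameters are hypothetical:

```python
import time

def poll_until_done(fetch_result, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch_result() until the returned dict reports a final status.

    fetch_result: zero-argument callable returning the parsed JSON result (a dict).
    Raises TimeoutError if no final status arrives within `timeout` seconds.
    """
    waited = 0.0
    while True:
        result = fetch_result()
        # Assumed status values: 'NotStarted', 'Running', 'Succeeded', 'Failed'.
        if result.get("status") in ("Succeeded", "Failed"):
            return result
        if waited >= timeout:
            raise TimeoutError("Text operation did not finish in {}s".format(timeout))
        sleep(interval)
        waited += interval
```

Inside either function, the time.sleep(20) call and the final return could then be replaced with something like return poll_until_done(lambda: json.loads(requests.get(url, headers=headers).text)). Passing the fetch step in as a callable keeps the loop independent of how the request is made.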

The preview OCR engine is much better, but I don’t believe it is accessible yet through the SDK. You can use the following function to call the V2 API; you will need to replace the region placeholder with your region.

This API is currently available in:

  • West US – westus.api.cognitive.microsoft.com
  • West US 2 – westus2.api.cognitive.microsoft.com
  • East US – eastus.api.cognitive.microsoft.com
  • East US 2 – eastus2.api.cognitive.microsoft.com
  • West Central US – westcentralus.api.cognitive.microsoft.com
  • South Central US – southcentralus.api.cognitive.microsoft.com
  • West Europe – westeurope.api.cognitive.microsoft.com
  • North Europe – northeurope.api.cognitive.microsoft.com
  • Southeast Asia – southeastasia.api.cognitive.microsoft.com
  • East Asia – eastasia.api.cognitive.microsoft.com
  • Australia East – australiaeast.api.cognitive.microsoft.com
  • Brazil South – brazilsouth.api.cognitive.microsoft.com
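
The host names in the list above all follow the same pattern: the region name followed by .api.cognitive.microsoft.com. As a small sketch, the host could be built from the region setting rather than edited by hand in each call. The helper name vision_host is hypothetical, and the check assumes region names contain only lowercase letters and digits, as in the list above:

```python
import re

def vision_host(region):
    """Build the regional host name, e.g. 'westus' -> 'westus.api.cognitive.microsoft.com'."""
    region = region.strip().lower()
    # Assumption: region names use only lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]+", region):
        raise ValueError("Unexpected region name: {!r}".format(region))
    return region + ".api.cognitive.microsoft.com"

print(vision_host("westus2"))
```

A side effect of the check is that an unreplaced placeholder such as [ComputerVisionRegion] raises a ValueError immediately instead of producing a connection error later.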

retrieve_text_from_url_v2 – The new preview OCR engine (through “Recognize Text” API operation) has even better text recognition results for English. The SDK used here is for V1 (which could change soon), we can specify V2 in the API call and compare the results.

Computer Vision V2 API


def retrieve_text_from_url_v2(imgurl):
  
  
  headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY
  }

  #pass in mode and set raw equal to 'true'
  params = urllib.parse.urlencode({
    'mode': 'Printed',
    'raw':'True'
  })

  try:
    conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
    #json.dumps produces a valid JSON body and escapes any quotes in the URL
    conn.request("POST", "/vision/v2.0/recognizeText?%s" % params, json.dumps({'url': imgurl}), headers)
    response = conn.getresponse()
    ol = response.headers.get('Operation-Location')
    conn.close()
  except Exception as e:
    #a generic Exception may not have errno/strerror, so print the exception itself
    print("Request failed: {}".format(e))
    return None
  
  #give Computer Vision some time to process image, could also be a while loop checking status (30s is arbitrary)
  time.sleep(30)
    
  #clear parms
  params = urllib.parse.urlencode({})

  try:
    conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
    conn.request("GET", "/vision/v2.0/textOperations/" + ol.split('/')[-1] + "/?%s" % params, "" , headers)
    response = conn.getresponse()
    data = response.read()
    conn.close()
  except Exception as e:
    #a generic Exception may not have errno/strerror, so print the exception itself
    print("Request failed: {}".format(e))
    return None
    
  return json.loads(data)
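
Two small steps in retrieve_text_from_url_v2 can be tried in isolation: serializing the request body and pulling the operation id out of the Operation-Location header. The sketch below uses the standard json module, and the helper names and the example header value are made up for illustration:

```python
import json

def build_recognize_body(imgurl):
    """Serialize the request body; json.dumps escapes any quotes inside the URL."""
    return json.dumps({"url": imgurl})

def operation_id(operation_location):
    """Return the last path segment of the Operation-Location URL."""
    return operation_location.rstrip("/").split("/")[-1]

# Hypothetical header value, for illustration only.
example = "https://westus.api.cognitive.microsoft.com/vision/v2.0/textOperations/abc123"
print(build_recognize_body("https://images.example.com/pam.jpeg"))
print(operation_id(example))
```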

#Use the first product and first site

product = products[0]
site = sites[0]

print("Product: " + product['Name'] + "\nSite: " + site)
Product: PAM Original Cooking Spray, 6 Ounce
Site: walmart.com

Retrieve img1 from blob storage (mounted dir) and img2 from Image Search API


print(product['Name'] + ":" + site)

img1 = IMAGES_FOLDER  + product['File']
orig_img =  cv2.imread(img1)

image_results = retrieve_images("site:" + site + " \"" +  product['Name'] + "\"",IMG_SEARCH_SUBSCRIPTION_KEY)
img2 = retrieve_first_img_url(image_results)
    
comp_img = url_to_image(img2)

plot_img("Image Compare" + site,"Original Images",cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB),cv2.cvtColor(comp_img, cv2.COLOR_BGR2RGB),site)

[Figure: original_images_walmart, the base image and the walmart.com image side by side]
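
The cells above handle only the first product and the first site. To cover every combination, the search queries could be generated in a loop. This is a sketch using the same query format as the retrieve_images call above; the helper name build_site_queries is hypothetical:

```python
def build_site_queries(products, sites):
    """Return (product name, site, query) tuples for every product/site pair."""
    queries = []
    for product in products:
        for site in sites:
            # Same format as the notebook: a site filter plus the quoted product name.
            query = 'site:{} "{}"'.format(site, product['Name'])
            queries.append((product['Name'], site, query))
    return queries

products = [{'Name': 'PAM Original Cooking Spray, 6 Ounce', 'File': 'PAM_Original_6_OZ.jpg'}]
sites = ['walmart.com', 'target.com']

for name, site, query in build_site_queries(products, sites):
    print(query)
```

Each query could then be passed to retrieve_images together with IMG_SEARCH_SUBSCRIPTION_KEY.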

Retrieve Text From Images

Retrieve text from first image, display words from JSON

words1 = retrieve_text_from_img(img1)
for b in words1['recognitionResult']['lines']:
    print(b['text'])
Accept header absent and forced to application/json
NO – STICK COOKING SPRAY
AN
HE
HIM
PRESER
COLORS
WITH RENEWABLE
OVER PRESSURE
PRECAUTIONS OH
MERLENE

Retrieve text from second image (url), display words from JSON

words2 = retrieve_text_from_url(img2)
for b in words2['recognitionResult']['lines']:
    print(b['text'])
Accept header absent and forced to application/json
NO- STICK COOKING SPRAY
AN
One
HIM
PRESED
COLORS
WITH ROMANABLE
UNDERPRESSURE
BUTCARSON
MEALONE
IF

Retrieve text from second image (url) using the V2 API, display words from JSON.

words3 = retrieve_text_from_url_v2(img2)
for b in words3['recognitionResult']['lines']:
    print(b['text'])
NO-STICK COOKING SPRAY
PAM
ORIGINAL
made CANOLA OIL BLEND
PRESER
NO
S3ALLVAX
ARTIFICIAL
COLORS
ERVING SUGGESTION
CAUTION FLAMMABLE
CONTENTS UNDER PRESSURE
HEAD PRECAUTIONS ON
BACK BEFORE USING
NET WT 6 OZ (.
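
The three loops above index ['recognitionResult']['lines'] directly, which raises a KeyError if the returned JSON does not contain a result, for example because the operation had not finished when it was fetched. A more defensive sketch is shown below. The helper name extract_lines is hypothetical, and the sample dict only mimics the shape used in this post:

```python
def extract_lines(result):
    """Return the recognized text lines, or an empty list if the result has none."""
    # 'recognitionResult' is only expected once the operation has succeeded.
    recognition = result.get('recognitionResult') or {}
    return [line['text'] for line in recognition.get('lines', []) if 'text' in line]

# Sample shaped like the JSON used above, with two lines taken from the V2 output.
sample = {'status': 'Succeeded',
          'recognitionResult': {'lines': [{'text': 'NO-STICK COOKING SPRAY'}, {'text': 'PAM'}]}}

for text in extract_lines(sample):
    print(text)
```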

 

Un-mount the images directory when done.
dbutils.fs.unmount("dbfs:/mnt/images/")

 

Conclusion

With the SDK it is very easy to capture the text on an image (local or URL). As you can see, the preview OCR engine in the V2 API does a much better job of recognizing text. Using the V2 API we get the NET WT on the PAM product along with additional details not captured by the SDK function calls, which use the V1 API.
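
One rough way to put a number on that difference is to check how many words of the product name appear in each OCR result. The sketch below is only an illustration, not part of the original notebook. It uses the first few lines of the outputs printed earlier, and the helper name word_set is hypothetical:

```python
import re

def word_set(lines):
    """Lower-case the lines and return the set of alphanumeric words they contain."""
    return {word for line in lines for word in re.findall(r"[a-z0-9]+", line.lower())}

# First lines of the outputs printed earlier in this post.
sdk_lines = ["NO- STICK COOKING SPRAY", "AN", "One", "HIM", "PRESED", "COLORS"]
v2_lines = ["NO-STICK COOKING SPRAY", "PAM", "ORIGINAL", "made CANOLA OIL BLEND"]

product_words = word_set(["PAM Original Cooking Spray, 6 Ounce"])
for label, lines in (("SDK", sdk_lines), ("V2", v2_lines)):
    found = sorted(product_words & word_set(lines))
    print(label, "matched product words:", found)
```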



Categories: AI, Azure Cognitive Services, Databricks, Python
