Python Image Processing on Azure Databricks – Part 3, Text Recognition

We will conclude the Python image processing series by using Azure Cognitive Services Computer Vision to recognize text in the images we have been using in Part 1 and Part 2. Part 1 set up Azure Databricks and then used OpenCV for image comparison. Part 2 set up Azure Cognitive Services and then used Bing Search to retrieve an image from the web using a search query.

After this post we will have walked through all of the code to build a solution for processing and comparing images using Azure Databricks and Azure Cognitive Services. You can put the pieces together as shown here or modify them to fit your requirements.

If you have read through Part 1 and Part 2, you should have already set up all the dependencies. The previous blogs covered adding all Azure resources, creating the cluster, and attaching all the necessary libraries to it.

Azure Databricks: Image Text Recognize Notebook

Create a new Notebook in Azure Databricks for the text recognition code – similar to the steps in Part 1 and Part 2. Attach the Notebook to a running cluster (start a cluster if one is not running).

For your convenience, the following code is also available on GitHub.

A few additional libraries help convert an image URL into an OpenCV image, provide access to the Azure Computer Vision SDK, and handle API calls.

from skimage import io
import simplejson as json
import cv2
import matplotlib.pyplot as plt
import time
import numpy as np
from pyspark.sql import *
import re
from skimage import img_as_float
from skimage import img_as_ubyte
from skimage.color import rgba2rgb
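import http.client, urllib.request, urllib.parse, urllib.error, base64
import os        # used for os.environ and os.path.join below
import requests  # used to fetch the Operation-Location results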


from azure.cognitiveservices.vision.computervision import ComputerVisionAPI
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from azure.cognitiveservices.search.imagesearch import ImageSearchAPI
from azure.cognitiveservices.search.imagesearch.models import ImageType, ImageAspect, ImageInsightModule
from msrest.authentication import CognitiveServicesCredentials

Images and search sites can dynamically change. I’m using a list and a dictionary, but many other options could be used instead.

products = [{'Name': 'PAM Original Cooking Spray, 6 Ounce', 'File': 'PAM_Original_6_OZ.jpg' }]

sites = ['walmart.com','target.com']
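
Since the list and dictionary drive everything that follows, it can help to see how they expand into search queries. The sketch below pairs every product with every site and builds the same `site:` query string used later in this post. The helper name `build_site_query` and the `search_plan` list are my own illustrative names, not part of any SDK.

```python
from itertools import product as cartesian_product

products = [{'Name': 'PAM Original Cooking Spray, 6 Ounce', 'File': 'PAM_Original_6_OZ.jpg'}]
sites = ['walmart.com', 'target.com']

def build_site_query(site, product_name):
    """Return a query that restricts results to one site and quotes the product name."""
    return 'site:{} "{}"'.format(site, product_name)

# One (file, site, query) tuple per product/site combination.
search_plan = [
    (item['File'], site, build_site_query(site, item['Name']))
    for item, site in cartesian_product(products, sites)
]

for file_name, site, query in search_plan:
    print(file_name, "->", query)
```

Looping over `search_plan` instead of indexing `products[0]` and `sites[0]` would be one way to extend the walkthrough to more products and sites.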

The keys can be found by clicking ‘Keys’ on each Cognitive Services resource. You will also need to know the region in which you created the Computer Vision resource (such as westus or eastus) and pass it in place of the ComputerVisionRegion placeholder.

Variables used in the code: the image folder, the subscription keys, and the Computer Vision location (region).

You can see additional Computer Vision features by visiting: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision

IMAGES_FOLDER = "/dbfs/mnt/images/"

IMG_SEARCH_SUBSCRIPTION_KEY = "[BingSearchKey]"

COMP_VIS_SUBSCRIPTION_KEY = "[ComputerVisionKey]"
COMPUTERVISION_LOCATION = os.environ.get("COMPUTERVISION_LOCATION", "[ComputerVisionRegion]")

Create the Image Directory

Create a dedicated directory for images if it does not already exist.

%fs
mkdirs "/mnt/images/"

Mount Blob Storage

The following snippet mounts an Azure Storage container (a blob store). For more information, see the Azure Databricks documentation on mounting Azure Storage.

dbutils.fs.mount(source = "wasbs://[container]@[storage-account].blob.core.windows.net/",mount_point = "/mnt/images/",extra_configs = {"fs.azure.account.key.[storage-account].blob.core.windows.net": "[storage-account-key]"})

Reusable Functions

We are going to develop a set of reusable functions in Python. The function below is designed to fetch a list of image search results using the Cognitive Services Image Search API. It takes two arguments: the search query string and the subscription key.

def retrieve_images(search,key):
  client = ImageSearchAPI(CognitiveServicesCredentials(key))
 
  try:
    image_results = client.images.search(query=search)
    
    print("Search images for query " + search)
    return image_results
  except Exception as err:
    print("Encountered exception. {}".format(err))
    return None

The function ‘retrieve_first_img_url’ is designed to fetch a single image from the search results, bringing back the first item for simplicity.

def retrieve_first_img_url(image_results):
  if image_results.value:
    first_image_result = image_results.value[0] #grab first image from search results
    print("Image result count: {}".format(len(image_results.value)))
    print(first_image_result.content_url)
    
    url = first_image_result.content_url 
    #remove extra args from url, keep everything up to the image extension
    url_clean = re.match(r'.*(png|jpg|jpeg)',url,re.M|re.I).group(0)
    print(url_clean)
    
    return url_clean
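
The URL clean-up step above assumes the regular expression always matches. As a minimal sketch, the same idea can be written so that a URL without a recognizable image extension returns `None` instead of raising an error. The function name `clean_image_url` and the example URLs are hypothetical, and the pattern is slightly stricter than the original because it requires a dot before the extension.

```python
import re
from typing import Optional

# Greedy match up to the last ".png", ".jpg" or ".jpeg" in the URL (case-insensitive).
_IMAGE_URL = re.compile(r'.*\.(?:png|jpe?g)', re.IGNORECASE)

def clean_image_url(url: str) -> Optional[str]:
    """Return the URL truncated after the image extension, or None if there is none."""
    match = _IMAGE_URL.match(url)
    return match.group(0) if match else None

print(clean_image_url("https://example.com/a/b.jpeg?odnHeight=450&odnWidth=450"))
print(clean_image_url("https://example.com/page"))
```

A caller can then skip a search result when the function returns `None` rather than letting the whole notebook cell fail.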

The next function ‘url_to_image’ is designed to convert a URL into an OpenCV image, which is a critical step for conducting image comparison using OpenCV.

def url_to_image(url): 
    img = io.imread(url) #read url
    
    if img.ndim == 3 and img.shape[2] == 4:
        img = rgba2rgb(img) #remove alpha (rgba2rgb only accepts 4-channel images)
    cv_image =cv2.cvtColor(img_as_ubyte(img), cv2.COLOR_RGB2BGR) #convert from skimage to opencv

    # return the image
    return cv_image

The function ‘plot_img’ displays two images side by side using matplotlib.

def plot_img(figtitle,subtitle,img1,img2,site):
  fig = plt.figure(figtitle, figsize=(10, 5))
  plt.suptitle(subtitle,fontsize=24)
  ax = fig.add_subplot(1, 2, 1)
  ax.set_title("Base",fontsize=12)
  plt.imshow(img1)
  plt.axis("off")
  ax = fig.add_subplot(1, 2, 2)
  ax.set_title(site,fontsize=12)
  plt.imshow(img2)
  plt.axis("off")

  display(plt.show())

Text Retrieve Functions

The function ‘retrieve_text_from_img’ utilizes the SDK function recognize_text_in_stream().

Recognize Text operation. When you use the Recognize Text interface, the response contains a field called “Operation-Location”. The “Operation-Location” field contains the URL that you must use for your Get Handwritten Text Operation Result operation

Computer Vision SDK Python Docs

The code captures the JSON results using the ‘Operation-Location’ URL, which points to the operation’s status and, once processing completes, its results.

def retrieve_text_from_img(img):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    
    #raw - returns the direct response alongside the deserialized response
    with open(os.path.join(IMAGES_FOLDER, img), "rb") as image_stream:
        txt_analysis2=client.recognize_text_in_stream(image_stream,raw=True)
    
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary) 
    time.sleep(20)
    
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key':COMP_VIS_SUBSCRIPTION_KEY}

    url = txt_analysis2.response.headers['Operation-Location']

    return json.loads(requests.get(url, headers=headers).text)

The function ‘retrieve_text_from_url’ is similar to the function above, but it extracts text from an image URL instead of a local image file.

def retrieve_text_from_url(imgurl):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    txt_analysis2=client.recognize_text(imgurl,raw=True, mode='Printed')
    
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary)  
    time.sleep(20)
    
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key':COMP_VIS_SUBSCRIPTION_KEY}

    url = txt_analysis2.response.headers['Operation-Location']

    return json.loads(requests.get(url, headers=headers).text)
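
Both functions wait a fixed 20 seconds, and the comments note that a loop checking the status could be used instead. Here is a minimal sketch of such a loop. It assumes the result JSON carries a `status` field that ends up as `Succeeded` or `Failed`; the name `poll_until_done` and its parameters are my own. The function takes any zero-argument callable, so it can be tried without calling the service.

```python
import time

def poll_until_done(fetch_result, interval=1.0, timeout=60.0):
    """Call fetch_result() until the returned dict reports a finished status.

    fetch_result is any zero-argument callable that returns the parsed JSON,
    for example a lambda wrapping the GET on the Operation-Location URL.
    The 'status' field and its values are assumptions about the response.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_result()
        status = str(result.get("status", "")).lower()
        if status in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Text operation still '{}' after {}s".format(status, timeout))
        time.sleep(interval)
```

Inside either function, the `time.sleep(20)` and the final `return` could then be replaced with something like `return poll_until_done(lambda: json.loads(requests.get(url, headers=headers).text))`.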

The updated OCR engine ‘preview’ shows significant improvement. However, as of this writing, it is not yet accessible through the SDK. You can use the following function to call the V2 API, replacing the region placeholder with your region.

This API is currently available in:

West US – westus.api.cognitive.microsoft.com
West US 2 – westus2.api.cognitive.microsoft.com
East US – eastus.api.cognitive.microsoft.com
East US 2 – eastus2.api.cognitive.microsoft.com
West Central US – westcentralus.api.cognitive.microsoft.com
South Central US – southcentralus.api.cognitive.microsoft.com
West Europe – westeurope.api.cognitive.microsoft.com
North Europe – northeurope.api.cognitive.microsoft.com
Southeast Asia – southeastasia.api.cognitive.microsoft.com
East Asia – eastasia.api.cognitive.microsoft.com
Australia East – australiaeast.api.cognitive.microsoft.com
Brazil South – brazilsouth.api.cognitive.microsoft.com

‘retrieve_text_from_url_v2’ calls the updated preview OCR engine through the “Recognize Text” API operation, which gives better text recognition results for English. The SDK currently targets V1, although this may change in the near future. By specifying V2 in the API call, we can compare the results.

def retrieve_text_from_url_v2(imgurl):
  
  
  headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY
  }

  #pass in mode and set raw equal to 'true'
  params = urllib.parse.urlencode({
    'mode': 'Printed',
    'raw':'True'
  })

  try:
    conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
    conn.request("POST", "/vision/v2.0/recognizeText?%s" % params, json.dumps({'url': imgurl}), headers)
    response = conn.getresponse()
    ol = response.headers.get('Operation-Location')
    conn.close()
  except Exception as e:
    print("Encountered exception. {}".format(e))
  
  #give Computer Vision some time to process image, could also be a while loop checking status (30s is arbitrary)
  time.sleep(30)
    
  #clear parms
  params = urllib.parse.urlencode({})

  try:
    conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
    conn.request("GET", "/vision/v2.0/textOperations/" + ol.split('/')[-1] + "/?%s" % params, "" , headers)
    response = conn.getresponse()
    data = response.read()
    conn.close()
  except Exception as e:
    print("Encountered exception. {}".format(e))
    
  return json.loads(data)

Product Comparison

The following will grab the first product (PAM Cooking Spray) and the first web site (Walmart.com).

product = products[0]
site = sites[0]

print("Product: " + product['Name'] + "\nSite: " + site)

Product: PAM Original Cooking Spray, 6 Ounce
Site: walmart.com

We will proceed to retrieve the original image from our Storage Account (mounted directory) and a similar image from the selected site.

print(product['Name'] + ":" + site)

img1 = IMAGES_FOLDER  + product['File']
orig_img =  cv2.imread(img1)

image_results = retrieve_images("site:" + site + " \"" +  product['Name'] + "\"",IMG_SEARCH_SUBSCRIPTION_KEY)
img2 = retrieve_first_img_url(image_results)
    
comp_img = url_to_image(img2)

Next, we can display (plot) the initial image alongside the image from the designated website.

plot_img("Image Compare" + site,"Original Images",cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB),cv2.cvtColor(comp_img, cv2.COLOR_BGR2RGB),site)

Retrieve Text from Images

The functions defined above now use Azure Cognitive Services Computer Vision via the SDK to extract text from each image.

First image, display words from JSON

words1 = retrieve_text_from_img(img1)
for b in words1['recognitionResult']['lines']:
    print(b['text'])

NO - STICK COOKING SPRAY
AN
HE
HIM
PRESER
COLORS
WITH RENEWABLE
OVER PRESSURE
PRECAUTIONS OH
MERLENE
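
The loop above indexes straight into `recognitionResult`, which fails with a `KeyError` if the operation has not finished or did not succeed. As a rough sketch, a small helper can return the lines as a list and fall back to an empty list. The name `extract_lines` is hypothetical, and `sample` is a hand-written dictionary that only mimics the shape used in this post.

```python
def extract_lines(result):
    """Return the recognized lines of text, or an empty list if none are present."""
    recognition = result.get("recognitionResult") or {}
    return [line["text"] for line in recognition.get("lines", []) if "text" in line]

# Hand-written sample with the same nesting as the JSON used above.
sample = {
    "status": "Succeeded",
    "recognitionResult": {"lines": [{"text": "NO - STICK COOKING SPRAY"}, {"text": "COLORS"}]},
}

print(extract_lines(sample))
print(extract_lines({"status": "Failed"}))
```

Keeping the lines in a list also makes it easier to compare the results from the different calls afterwards.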

Second image (URL), display words from JSON

words2 = retrieve_text_from_url(img2)
for b in words2['recognitionResult']['lines']:
    print(b['text'])

NO- STICK COOKING SPRAY
AN
One
HIM
PRESED
COLORS
WITH ROMANABLE
UNDERPRESSURE
BUTCARSON
MEALONE
IF

Second image (URL) using the V2 API, display words from JSON.

words3 = retrieve_text_from_url_v2(img2)
for b in words3['recognitionResult']['lines']:
    print(b['text'])

NO-STICK COOKING SPRAY
PAM
ORIGINAL
made CANOLA OIL BLEND
PRESER
NO
S3ALLVAX
ARTIFICIAL
COLORS
ERVING SUGGESTION
CAUTION FLAMMABLE
CONTENTS UNDER PRESSURE
HEAD PRECAUTIONS ON
BACK BEFORE USING
NET WT 6 OZ (.
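
To put a rough number on how much two results agree, one option is to compare the sets of recognized words. The sketch below uses Jaccard similarity, which is my choice for illustration and not something the SDK provides. It runs on a few lines copied from the V1 and V2 outputs above. The names `word_set` and `jaccard` are hypothetical.

```python
import re

def word_set(lines):
    """Lower-case the lines and split them into a set of alphanumeric words."""
    return {w for line in lines for w in re.findall(r"[a-z0-9]+", line.lower())}

def jaccard(a, b):
    """Return intersection size divided by union size, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# A few lines taken from the V1 (SDK) and V2 (REST) outputs shown above.
v1_lines = ["NO- STICK COOKING SPRAY", "COLORS", "UNDERPRESSURE"]
v2_lines = ["NO-STICK COOKING SPRAY", "COLORS", "CONTENTS UNDER PRESSURE"]

shared = word_set(v1_lines) & word_set(v2_lines)
print(sorted(shared))
print(round(jaccard(word_set(v1_lines), word_set(v2_lines)), 2))
```

A word-level comparison like this is crude: ‘UNDERPRESSURE’ and ‘UNDER PRESSURE’ count as different words, so treat the score as a quick sanity check rather than a measure of OCR accuracy.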

Un-mount the images directory when done.

dbutils.fs.unmount("dbfs:/mnt/images/")

Conclusion

With the SDK, capturing text from an image (whether local or from a URL) is straightforward. The preview OCR engine behind the V2 API clearly improves text recognition: it extracts the NET WT information from the PAM product, along with other details that the SDK function calls, which rely on the V1 API, do not return.

Stochastic Coder
