We will conclude this image processing series by using Azure Cognitive Services to recognize text on the images we have been working with in Part 1 and Part 2. Part 1 set up Azure Databricks and then used OpenCV for image comparison. Part 2 set up Azure Cognitive Services and then used Bing Search to retrieve an image from the web using a search query.
By the end of this post we will have walked through all of the code needed to build a solution for processing and comparing images using Azure Databricks and Azure Cognitive Services. You can use the pieces together as shown or modify them to fit your requirements.
Getting Started
If you have gone through Part 1 and Part 2, then all dependencies should already be in place: the Azure resources were added in the previous posts, the cluster was created, and all libraries were attached to the cluster.
This post does not require any additional set-up steps.
Azure Databricks: Image Text Recognize Notebook
Create a new Notebook in Azure Databricks for the text recognition code – similar to the steps in Part 1 and Part 2. Attach the Notebook to a running cluster (start a cluster if one is not running).
For your convenience, the following code is also available on GitHub.
A few new libraries are used: to convert an image URL into an OpenCV image, to call the Azure Computer Vision SDK, and to make raw REST API calls.
from skimage import io
from skimage import img_as_float
from skimage import img_as_ubyte
from skimage.color import rgba2rgb
import simplejson as json
import cv2
import matplotlib.pyplot as plt
import time
import os
import re
import requests
import numpy as np
from pyspark.sql import *
import http.client, urllib.request, urllib.parse, urllib.error, base64
from azure.cognitiveservices.vision.computervision import ComputerVisionAPI
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from azure.cognitiveservices.search.imagesearch import ImageSearchAPI
from azure.cognitiveservices.search.imagesearch.models import ImageType, ImageAspect, ImageInsightModule
from msrest.authentication import CognitiveServicesCredentials
products = [{'Name': 'PAM Original Cooking Spray, 6 Ounce',
             'File': 'PAM_Original_6_OZ.jpg'}]
sites = ['walmart.com', 'target.com']
IMAGES_FOLDER = "/dbfs/mnt/images/"
IMG_SEARCH_SUBSCRIPTION_KEY = "[BingSearchKey]"
COMP_VIS_SUBSCRIPTION_KEY = "[ComputerVisionKey]"
COMPUTERVISION_LOCATION = os.environ.get("COMPUTERVISION_LOCATION", "[ComputerVisionRegion]")
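Hard-coded keys keep the walkthrough simple, but for anything beyond a demo you may prefer Databricks secrets. A minimal sketch, assuming you have already created a secret scope (the scope and key names here are hypothetical):

#hypothetical secret scope and key names - replace with your own
IMG_SEARCH_SUBSCRIPTION_KEY = dbutils.secrets.get(scope="image-demo", key="bing-search-key")
COMP_VIS_SUBSCRIPTION_KEY = dbutils.secrets.get(scope="image-demo", key="computer-vision-key")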
#%fs
#mkdirs "/mnt/images/"
dbutils.fs.mount(
  source = "wasbs://[container]@[storage-account].blob.core.windows.net/",
  mount_point = "/mnt/images/",
  extra_configs = {"fs.azure.account.key.[storage-account].blob.core.windows.net": "[storage-account-key]"})
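To confirm the mount succeeded before moving on, you can list the folder contents (a quick sanity check, not in the original post):

display(dbutils.fs.ls("/mnt/images/"))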
def retrieve_images(search, key):
    client = ImageSearchAPI(CognitiveServicesCredentials(key))
    try:
        image_results = client.images.search(query=search)
        print("Search images for query " + search)
        return image_results
    except Exception as err:
        print("Encountered exception. {}".format(err))
        return None
def retrieve_first_img_url(image_results):
    if image_results.value:
        first_image_result = image_results.value[0]  #grab first image from search results
        print("Image result count: {}".format(len(image_results.value)))
        print(first_image_result.content_url)
        url = first_image_result.content_url
        #remove extra args from url, just grab up to the image extension
        url_clean = re.match(r'.*(png|jpg|jpeg)', url, re.M|re.I).group(0)
        print(url_clean)
        return url_clean
def url_to_image(url):
    img = io.imread(url)  #read image from url
    if img.shape[-1] == 4:
        img = rgba2rgb(img)  #remove alpha channel if present (rgba2rgb fails on plain RGB input)
    cv_image = cv2.cvtColor(img_as_ubyte(img), cv2.COLOR_RGB2BGR)  #convert from skimage to OpenCV
    # return the image
    return cv_image
def plot_img(figtitle, subtitle, img1, img2, site):
    fig = plt.figure(figtitle, figsize=(10, 5))
    plt.suptitle(subtitle, fontsize=24)
    ax = fig.add_subplot(1, 2, 1)
    ax.set_title("Base", fontsize=12)
    plt.imshow(img1)
    plt.axis("off")
    ax = fig.add_subplot(1, 2, 2)
    ax.set_title(site, fontsize=12)
    plt.imshow(img2)
    plt.axis("off")
    display(plt.show())
def retrieve_text_from_img(img):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    #raw - returns the direct response alongside the deserialized response
    with open(os.path.join(IMAGES_FOLDER, img), "rb") as image_stream:
        txt_analysis2 = client.recognize_text_in_stream(image_stream, raw=True)
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary)
    time.sleep(20)
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY}
    url = txt_analysis2.response.headers['Operation-Location']
    return json.loads(requests.get(url, headers=headers).text)
def retrieve_text_from_url(imgurl):
    client = ComputerVisionAPI(COMPUTERVISION_LOCATION, CognitiveServicesCredentials(COMP_VIS_SUBSCRIPTION_KEY))
    txt_analysis2 = client.recognize_text(imgurl, raw=True, mode='Printed')
    #give Computer Vision some time to process image, could also be a while loop checking status (20s is arbitrary)
    time.sleep(20)
    #Operation-Location contains url to results, use it to get the processed JSON results
    headers = {'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY}
    url = txt_analysis2.response.headers['Operation-Location']
    return json.loads(requests.get(url, headers=headers).text)
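Both helpers above use a fixed sleep, which the comments note is arbitrary. A more robust pattern is the while loop they mention: poll the Operation-Location URL until the service reports a terminal status. A minimal sketch (poll_for_text_result is a hypothetical helper, not part of the SDK):

def poll_for_text_result(operation_url, key, interval=1, max_tries=30):
    #poll the Operation-Location URL until Computer Vision finishes processing
    headers = {'Ocp-Apim-Subscription-Key': key}
    result = {}
    for _ in range(max_tries):
        result = json.loads(requests.get(operation_url, headers=headers).text)
        #'Succeeded' and 'Failed' are terminal; 'NotStarted'/'Running' mean keep waiting
        if result.get('status') in ('Succeeded', 'Failed'):
            break
        time.sleep(interval)
    return result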
The Computer Vision API is currently available in the following regions:
- West US – westus.api.cognitive.microsoft.com
- West US 2 – westus2.api.cognitive.microsoft.com
- East US – eastus.api.cognitive.microsoft.com
- East US 2 – eastus2.api.cognitive.microsoft.com
- West Central US – westcentralus.api.cognitive.microsoft.com
- South Central US – southcentralus.api.cognitive.microsoft.com
- West Europe – westeurope.api.cognitive.microsoft.com
- North Europe – northeurope.api.cognitive.microsoft.com
- Southeast Asia – southeastasia.api.cognitive.microsoft.com
- East Asia – eastasia.api.cognitive.microsoft.com
- Australia East – australiaeast.api.cognitive.microsoft.com
- Brazil South – brazilsouth.api.cognitive.microsoft.com
def retrieve_text_from_url_v2(imgurl):
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY
    }
    #pass in mode and set raw equal to 'true'
    params = urllib.parse.urlencode({ 'mode': 'Printed', 'raw': 'True' })
    try:
        conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
        conn.request("POST", "/vision/v2.0/recognizeText?%s" % params, "{'url':'" + imgurl + "'}", headers)
        response = conn.getresponse()
        ol = response.headers.get('Operation-Location')
        conn.close()
    except Exception as e:
        print("[Errno {0}] {1}".format(e.errno, e.strerror))
    #give Computer Vision some time to process image, could also be a while loop checking status (30s is arbitrary)
    time.sleep(30)
    #clear params
    params = urllib.parse.urlencode({})
    try:
        conn = http.client.HTTPSConnection('[ComputerVisionRegion].api.cognitive.microsoft.com')
        conn.request("GET", "/vision/v2.0/textOperations/" + ol.split('/')[-1] + "/?%s" % params, "", headers)
        response = conn.getresponse()
        data = response.read()
        conn.close()
    except Exception as e:
        print("[Errno {0}] {1}".format(e.errno, e.strerror))
    return json.loads(data)
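The SDK used in this series only wraps the V1 endpoint (as noted in the conclusion), which is why the function above calls the v2 REST API directly with http.client. For reference, the same two-step POST/poll flow can be written more compactly with the requests library; this is a sketch, not code from the original post:

def retrieve_text_from_url_v2_requests(imgurl):
    #kick off the asynchronous recognizeText operation (v2.0 REST API)
    base = 'https://[ComputerVisionRegion].api.cognitive.microsoft.com'
    headers = {'Ocp-Apim-Subscription-Key': COMP_VIS_SUBSCRIPTION_KEY}
    resp = requests.post(base + '/vision/v2.0/recognizeText',
                         params={'mode': 'Printed'}, headers=headers,
                         json={'url': imgurl})
    operation_url = resp.headers['Operation-Location']
    time.sleep(30)  #same arbitrary wait as above; a status-polling loop is more robust
    #fetch the processed JSON result from the Operation-Location URL
    return json.loads(requests.get(operation_url, headers=headers).text)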
#Use the first product and first site
product = products[0]
site = sites[0]
print("Product: " + product['Name'] + "\nSite: " + site)
print(product['Name'] + ":" + site)
img1 = IMAGES_FOLDER + product['File']
orig_img = cv2.imread(img1)
image_results = retrieve_images("site:" + site + " \"" + product['Name'] + "\"", IMG_SEARCH_SUBSCRIPTION_KEY)
img2 = retrieve_first_img_url(image_results)
comp_img = url_to_image(img2)
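Note that retrieve_images returns None on an exception and retrieve_first_img_url returns None when the search comes back empty, so a small guard placed before the url_to_image call avoids a confusing downstream failure (illustrative, not in the original):

if image_results is None or img2 is None:
    raise ValueError("No image result found for: " + product['Name'])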
plot_img("Image Compare" + site, "Original Images",
         cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB),
         cv2.cvtColor(comp_img, cv2.COLOR_BGR2RGB), site)
words1 = retrieve_text_from_img(img1)
for b in words1['recognitionResult']['lines']:
    print(b['text'])
NO – STICK COOKING SPRAY
AN
HE
HIM
PRESER
COLORS
WITH RENEWABLE
OVER PRESSURE
PRECAUTIONS OH
MERLENE
words2 = retrieve_text_from_url(img2)
for b in words2['recognitionResult']['lines']:
    print(b['text'])
NO- STICK COOKING SPRAY
AN
One
HIM
PRESED
COLORS
WITH ROMANABLE
UNDERPRESSURE
BUTCARSON
MEALONE
IF
words3 = retrieve_text_from_url_v2(img2)
for b in words3['recognitionResult']['lines']:
    print(b['text'])
PAM
ORIGINAL
made CANOLA OIL BLEND
PRESER
NO
S3ALLVAX
ARTIFICIAL
COLORS
ERVING SUGGESTION
CAUTION FLAMMABLE
CONTENTS UNDER PRESSURE
HEAD PRECAUTIONS ON
BACK BEFORE USING
NET WT 6 OZ (.
dbutils.fs.unmount("dbfs:/mnt/images/")
Conclusion
With the SDK it is very easy to capture the text on an image, whether it is a local file or a URL. As you can see, the OCR engine preview exposed through the V2 API does a much better job of recognizing text. Using the V2 API we get the NET WT on the PAM product, along with additional details not captured by the SDK function calls, which use the V1 API.
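If you want to quantify the difference rather than eyeball it, a quick comparison of how many lines each call recognized (illustrative, using the variables from the notebook above):

results = [('V1 SDK, local file', words1), ('V1 SDK, URL', words2), ('V2 REST, URL', words3)]
for name, result in results:
    lines = result['recognitionResult']['lines']
    print('{}: {} lines recognized'.format(name, len(lines)))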