In Part 1 of Image Processing on Azure Databricks we looked at using OpenCV to run an SSIM (structural similarity) comparison of two images stored in an Azure Storage Account. Here in Part 2 we are going to start making this process less static by introducing Azure Cognitive Services to help find images on the web to compare our base image(s) against. In Part 3 we will then use Azure Cognitive Services to retrieve text from the images.
Part 1 guides you through setting up your first cluster in Azure Databricks; if you don’t have an Azure Databricks cluster running, please check out Part 1.
Getting Started, Azure Cognitive Services
Azure Cognitive Services are a set of SDKs and APIs to make your solutions more intelligent. Vision, speech, search, and other intelligent capabilities are available to your solutions with a simple SDK or API call. Yes, it is fun to write your own text recognition solution in R or Python, but honestly, this is a powerful solution and a huge (HUGE) time saver. In this blog series we will be looking at Bing Search to retrieve images from the web and Computer Vision to retrieve text from our images. I have written web scraping solutions in both R and Python, and Bing Search (not just images) could very easily minimize the need for custom web scraping solutions.
Add Bing Search Cognitive Service
Go to the Microsoft Azure Portal and add a new Resource (as some steps were done in Part 1, only the steps unique to this part will be addressed here).
Search for “bing search”.
Select Bing Search and click Create.
Fill out the required fields on the Bing Search Create blade. F0 is the free tier and can be used for the examples in this blog series. Tough price to beat for this solution.
We are going to use the Image Search capability, and our code “as is” does not make more than 3 calls per second, so F0 ($0.00 / 1,000 calls) will be fine for this project. As you can see, there are some cool features such as Spell Check and Autosuggest; possible future blog post material, I think.
After filling out the required information and clicking Create, you will have the Bing Search Cognitive Service in your specified resource group (yours will be named differently).
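If you later loop over many products and sites, the code could exceed the 3 calls per second mentioned above. Here is a minimal sketch of one way to space calls out. The Throttle class and its parameters are hypothetical helpers for illustration, not part of the Bing Search SDK, and the limit itself is just the figure quoted above.

```python
import time


class Throttle:
    """Space calls so that roughly max_calls happen per period (hypothetical helper)."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls, in seconds.
        self.min_interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first call

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage sketch (names are assumptions):
# throttle = Throttle(max_calls=3, period=1.0)
# for query in queries:
#     throttle.wait()          # pause if the previous call was too recent
#     results = search(query)  # your own search call goes here
```

The clock and sleep arguments are only there so the behaviour can be checked without really waiting.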
Let’s do the same steps for Computer Vision. Add a new Resource and search for Computer Vision.
Enter the required information into the Computer Vision blade; again, F0 is a good pricing tier for this solution.
Cognitive Services resources in Resource Group (names will not be the same).
Azure Databricks: Image Search Notebook
In Part 1 we created a cluster; this cluster should be running so we can attach new libraries and a new Notebook to it. If it is not running, please start the cluster now.
Libraries
We will need to bring in a few libraries to use Cognitive Services in our Notebooks (scikit-image and opencv-python were added in Part 1). With your cluster running, go to the main page and click Library under New.
Change the Language to Upload Python Egg or PyPI, then in the PyPI Name field type: azure-cognitiveservices-search-imagesearch.
Click Install Library.
Do the same for: azure-cognitiveservices-vision-computervision
View Libraries
By clicking on your running cluster (after navigating to clusters) you can navigate to Libraries to view the loaded libraries.
Please ensure both of the Cognitive Services libraries have a ‘Loaded’ status.
Add a new Notebook
With the cluster running and the libraries added, we are finally ready to create a new Notebook. Back on the main landing page under New, click Notebook (you can name it whatever you want).
For your convenience the code below is also available in Image_Search.py on GitHub.
First we must bring in the libraries used in this code.
import re

import cv2
import matplotlib.pyplot as plt
from skimage import img_as_ubyte
from skimage import io  # needed by url_to_image (io.imread)
from skimage.color import rgba2rgb
from azure.cognitiveservices.search.imagesearch import ImageSearchAPI
from azure.cognitiveservices.search.imagesearch.models import ImageType, ImageAspect, ImageInsightModule
from msrest.authentication import CognitiveServicesCredentials
products = [{'Name': 'PAM Original 6 OZ', 'File': 'PAM_Original_6_OZ.jpg'}]
sites = ['walmart.com', 'target.com']

IMAGES_FOLDER = "/dbfs/mnt/images/"
IMG_SEARCH_SUBSCRIPTION_KEY = "[BingSearchKey]"
%fs mkdirs "/mnt/images/"
dbutils.fs.mount(
    source = "wasbs://[container]@[storage-account].blob.core.windows.net/",
    mount_point = "/mnt/images/",
    extra_configs = {"fs.azure.account.key.[storage-account].blob.core.windows.net": "[storage-account-key]"})
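If you re-run the notebook without unmounting first, the mount call will typically fail because the directory is already mounted. Below is a small sketch of a check you could run before mounting. The is_mounted helper is hypothetical; it assumes the entries returned by dbutils.fs.mounts() expose a mountPoint attribute, and it takes the list as a parameter so it can be tried outside Databricks.

```python
def is_mounted(mount_point, mounts):
    """Return True if mount_point appears in mounts (hypothetical helper).

    mounts is expected to be an iterable of objects with a mountPoint
    attribute, such as the result of dbutils.fs.mounts() in Databricks.
    """
    # Ignore a trailing slash so "/mnt/images/" and "/mnt/images" compare equal.
    target = mount_point.rstrip("/")
    return any(m.mountPoint.rstrip("/") == target for m in mounts)


# Usage sketch inside a Databricks notebook (dbutils only exists there):
# if not is_mounted("/mnt/images/", dbutils.fs.mounts()):
#     dbutils.fs.mount(source=..., mount_point="/mnt/images/", extra_configs=...)
```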
def retrieve_images(search, key):
    client = ImageSearchAPI(CognitiveServicesCredentials(key))
    try:
        image_results = client.images.search(query=search)
        print("Search images for query " + search)
        return image_results
    except Exception as err:
        print("Encountered exception. {}".format(err))
        return None  # Python has no null; None signals that the search failed
def retrieve_first_img_url(image_results):
    if image_results.value:
        first_image_result = image_results.value[0]  # grab first image from search results
        print("Image result count: {}".format(len(image_results.value)))
        print(first_image_result.content_url)
        url = first_image_result.content_url
        # remove extra args from url, just grab up to the image extension
        url_clean = re.match(r'.*(png|jpg|jpeg)', url, re.M | re.I).group(0)
        print(url_clean)
        return url_clean
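To see what the URL clean-up in retrieve_first_img_url does, here is the same regular expression applied to a few made-up URLs. The clean_image_url helper and the example.com addresses are assumptions for illustration only; unlike the function above, this sketch returns None instead of raising when no image extension is found.

```python
import re

# Same pattern as in retrieve_first_img_url: keep everything up to the
# last png/jpg/jpeg in the string (the leading .* is greedy).
IMAGE_URL_PATTERN = re.compile(r'.*(png|jpg|jpeg)', re.M | re.I)


def clean_image_url(url):
    """Strip trailing query arguments after the image extension (hypothetical helper)."""
    match = IMAGE_URL_PATTERN.match(url)
    return match.group(0) if match else None


print(clean_image_url("https://example.com/images/pam-original.jpeg?odnHeight=450&odnWidth=450"))
# -> https://example.com/images/pam-original.jpeg
print(clean_image_url("https://example.com/image?id=123"))
# -> None
```

One caveat: because the match runs to the last extension found anywhere in the string, a URL such as a.png?format=jpg is kept up to the final jpg rather than cut at .png.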
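def url_to_image(url):
    img = io.imread(url)  # read url (skimage.io)
    # rgba2rgb only accepts 4-channel input, so skip it for images without alpha (e.g. most JPEGs)
    if img.ndim == 3 and img.shape[-1] == 4:
        img = rgba2rgb(img)  # remove alpha
    cv_image = cv2.cvtColor(img_as_ubyte(img), cv2.COLOR_RGB2BGR)  # convert from skimage to opencv
    # return the image
    return cv_image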
def plot_img(figtitle, subtitle, img1, img2, site):
    fig = plt.figure(figtitle, figsize=(10, 5))
    plt.suptitle(subtitle, fontsize=24)
    ax = fig.add_subplot(1, 2, 1)
    ax.set_title("Base", fontsize=12)
    plt.imshow(img1)
    plt.axis("off")
    ax = fig.add_subplot(1, 2, 2)
    ax.set_title(site, fontsize=12)
    plt.imshow(img2)
    plt.axis("off")
    display(plt.show())
# Use the first product and first site
product = products[0]
site = sites[0]
print("Product: " + product['Name'] + "\nSite: " + site)

print(product['Name'] + ":" + site)
img1 = IMAGES_FOLDER + product['File']
orig_img = cv2.imread(img1)
# query format: "site: website.com search product string"
image_results = retrieve_images("site: " + site + " " + product['Name'], IMG_SEARCH_SUBSCRIPTION_KEY)
img2 = retrieve_first_img_url(image_results)
comp_img = url_to_image(img2)
Plot the images: the base image on the left and the image found for the site, labeled with the site name, on the right.
plot_img("Image Compare" + site,"Original Images",cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB),cv2.cvtColor(comp_img, cv2.COLOR_BGR2RGB),site)
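The cells above only use products[0] and sites[0]. If you add more products or sites, one way to cover every combination is to build the query strings up front. This is a sketch only: build_queries is a hypothetical helper that is not part of the original notebook, and the query format simply mirrors the string passed to retrieve_images above.

```python
from itertools import product as cartesian


def build_queries(products, sites):
    """Build one search query per (product, site) pair (hypothetical helper)."""
    return [
        {
            "Name": p["Name"],
            "File": p["File"],
            "Site": s,
            # Same format as the notebook: "site: <site> <product name>"
            "Query": "site: " + s + " " + p["Name"],
        }
        for p, s in cartesian(products, sites)
    ]


products = [{'Name': 'PAM Original 6 OZ', 'File': 'PAM_Original_6_OZ.jpg'}]
sites = ['walmart.com', 'target.com']

for entry in build_queries(products, sites):
    print(entry["Query"])
# -> site: walmart.com PAM Original 6 OZ
# -> site: target.com PAM Original 6 OZ
```

Each entry could then be passed through retrieve_images, retrieve_first_img_url, and url_to_image in turn, keeping the rate limit of your pricing tier in mind.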
Unmount the Azure Databricks directory.
dbutils.fs.unmount("dbfs:/mnt/images/")
In Part 3 we will complete this blog series by implementing the Computer Vision API and retrieving text from our images.