Recently I had to display healthcare data on a map and color code it based on cost. Nothing too difficult, except I did not have the addresses, just the name of the doctor, the clinic/hospital and the state. I have used Google’s APIs on past experimental projects for some brief address lookups, but now I needed to geocode ~30k records without an address.
What is required?
To get rolling you will need to create a Google developers account: https://developers.google.com/. There are limits on how many API calls can be made for free in one 24-hour period (roughly 1,000 to 2,000 calls when I wrote this). I believe this limit may be raised if billing information is supplied; however, I recommend thoroughly reviewing Google’s Terms of Service and billing details before use.
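To get a feel for what the free quota means in practice, here is a rough sketch that splits the row indices of a data.frame into daily batches. The 30,000 record count and the 1,000 calls per day figure are simply the numbers mentioned in this post, not values taken from Google’s current documentation, and the names nRecords, dailyLimit and batches are made up for illustration.

```r
# Assumed figures from this post; check your own quota before relying on them
nRecords   <- 30000   # roughly the number of records to geocode
dailyLimit <- 1000    # lower end of the free daily call range

# Assign each row index to a batch of at most dailyLimit rows
rowIds  <- seq_len(nRecords)
batches <- split(rowIds, ceiling(rowIds / dailyLimit))

length(batches)       # number of days needed at one batch per day
range(batches[[2]])   # first and last row index of the second batch
```

Each element of batches could then be used to subset the data.frame for one day's run, so a long job can be resumed without repeating calls.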
R Packages
require(jsonlite)
require(utils)
Calling the Google Places API from R
I am using a data.frame endearingly called drLocations. It contains the doctors’ names and hospital/clinic names that I want to get latitude and longitude for. URLencode from the utils package is needed to percent-encode the generated URL before calling it. I use the fromJSON function from the jsonlite package to fetch the URL and convert the JSON results into an R object.
plcUrl <- "https://maps.googleapis.com/maps/api/place/textsearch/json?query="
key <- "yourKEY"

# Initialize the result columns with default values
drLocations$geometry.location.lat <- 0
drLocations$geometry.location.lng <- 0
drLocations$address <- ""

# Windows only: use the Internet Explorer proxy settings (explained below)
setInternet2(TRUE)

for (i in 1:nrow(drLocations)) {
  # Build the search text for row i: name, hospital/clinic and state
  query <- paste(drLocations$First.Name[i], drLocations$Last.Name[i],
                 drLocations$HospitalName[i], drLocations$State[i], sep = "+")
  strurl <- as.character(paste(plcUrl, query, "&key=", key, sep = ""))
  rd <- fromJSON(URLencode(strurl))
  if (rd$status == "OK") {
    # Several matches can come back; keep the first one
    drLocations$geometry.location.lat[i] <- rd$results$geometry$location$lat[1]
    drLocations$geometry.location.lng[i] <- rd$results$geometry$location$lng[1]
    drLocations$address[i] <- rd$results$formatted_address[1]
  }
}
First I initialize the variables that will be used while iterating over the data.frame. plcUrl is set to the Places API URL, and key must be set to your assigned API key, otherwise the calls will fail. I also initialize geometry.location.lat, geometry.location.lng, and address to their default values. As I iterate through each drLocations record, I dynamically construct the Places API URL with the appropriate query for the name, hospital/clinic and state. Using jsonlite I build an R object from the results. If the result status equals "OK" (rd$status == "OK"), I set the results on the drLocations record.
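The URL construction described above can also be pulled out into a small function, which makes it easier to check a single request by hand. This is only a sketch: buildPlacesUrl and the sample values are hypothetical, and unlike the loop in this post it joins the terms with spaces and percent-encodes the whole search text with reserved = TRUE, so that a character such as "&" in a clinic name does not cut the query short.

```r
# Hypothetical helper: builds one Text Search request URL.
# utils::URLencode ships with R, so no extra packages are needed.
buildPlacesUrl <- function(firstName, lastName, hospital, state, key) {
  baseUrl <- "https://maps.googleapis.com/maps/api/place/textsearch/json"
  # Join the search terms with spaces, then percent-encode the whole text,
  # including reserved characters such as "&"
  queryText <- paste(firstName, lastName, hospital, state)
  encoded   <- utils::URLencode(queryText, reserved = TRUE)
  paste0(baseUrl, "?query=", encoded, "&key=", key)
}

# Made-up example values; prints the URL that would be requested
buildPlacesUrl("Jane", "Doe", "Mercy Clinic", "OH", "yourKEY")
```

The returned string can be passed straight to fromJSON, since the encoding has already been applied.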
You are probably wondering what setInternet2(TRUE) is being used for. During initial development I could not get R to connect outside of the corporate network I was working from. To get R to use the IE proxy settings, I had to bring this line of code in.
Some Possible Issues to Watch Out For
- You will not get a result status of "OK" from the API once you exceed your 24-hour API call threshold; the status will be something else, for example OVER_QUERY_LIMIT.
- It is possible to get back multiple results when your search criteria are too generic. At first I was searching only on the doctor’s name and hospital, and I received some unexpected results. Adding state, zip, city or any other details that help narrow down the results will only improve the outcome.
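Both issues can be made visible with a little bookkeeping. The sketch below assumes the parsed response has the same shape as the rd object used earlier in this post; firstPlaceResult is a hypothetical helper, and the two sample responses are hand-built stand-ins with made-up values, not real API output.

```r
# Hypothetical helper: pull the first match out of a parsed response
# and record how many matches came back.
firstPlaceResult <- function(rd) {
  if (!identical(rd$status, "OK")) {
    # Quota exceeded, no matches, bad key, etc.: keep the status, no coordinates
    return(list(status = rd$status, lat = NA_real_, lng = NA_real_,
                address = NA_character_, matches = 0L))
  }
  loc <- rd$results$geometry$location
  list(status  = rd$status,
       lat     = loc$lat[1],
       lng     = loc$lng[1],
       address = rd$results$formatted_address[1],
       matches = length(loc$lat))   # more than 1 means the search was ambiguous
}

# Hand-built stand-ins for parsed responses (made-up values)
twoHits <- list(status = "OK",
                results = list(formatted_address = c("1 Main St", "9 Oak Ave"),
                               geometry = list(location = list(lat = c(40.1, 41.2),
                                                               lng = c(-83.0, -82.5)))))
overLimit <- list(status = "OVER_QUERY_LIMIT", results = list())

firstPlaceResult(twoHits)$matches    # number of candidate matches
firstPlaceResult(overLimit)$status   # status to log before stopping for the day
```

Storing the status and match count next to each record makes it easy to find the rows that need a narrower search or a retry the next day.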
Below are the results rendered in a leaflet map via a Shiny App (possibly a future post).
I hope this small and simple post provides some benefit to anyone needing to obtain lats/longs without proper addresses.
Regards,
Jonathan
Thanks for your script, it’s helpful for my project. At first it worked well, but now I get an error message:
Error in open.connection(con, "rb") :
Timeout was reached: Connection timed out after 10000 milliseconds
Do you have any idea how to deal with it?