Parsing the Mapzen response data before actually calling the API

Before we make the geocode() function actually talk to Mapzen, let's just focus on a pre-fetched response and extracting the data from it.
This article is part of a sequence.
Project Example: Show Me Nearest Earthquakes
A step-by-step process of creating a project that downloads earthquake data, geocodes a user's location, and displays the nearest locations.

Summary

Contacting the Mapzen Geocoder and returning sensible results is a multi-step process…which we won’t complete just yet. So while nothing in woz.py changes, we’ll be adding code to utils/geocoding.py, including a function that will just “pretend” to get data from the Mapzen API.

Once we’re sure we can deal with Mapzen’s data format and extract the good parts from it, then we’ll finally contact Mapzen and have it geocode live locations.

Getting started

This exercise assumes you've completed the previous exercise:

    compciv-2016
    └── projects
        └── show-me-where
            └── woz.py
            └── utils
                └── geocoding.py
                └── __init__.py

Nothing will change in woz.py, but we will be making changes to geocoding.py.

How to turn one problem into 3 separate problems

There are many ways to approach this problem, but I'm going to aim for a somewhat convoluted path, in which the geocoding functionality is split up into 3 separate functions in utils/geocoding.py.

Here are the changes, in the order we'll implement them, in the utils/geocoding.py file:

  1. Make a fetch_mapzen_response() function that, for now, just returns the raw text data from a canned JSON response.
  2. Make a parse_mapzen_response() function that takes the raw text response from Mapzen (canned or not), extracts the necessary data, and returns a dictionary.
  3. Update geocode() so that it calls the other two: it passes the text from fetch_mapzen_response() to parse_mapzen_response() and returns the resulting dictionary.

Basically, fetch_mapzen_response() is going to do its thing, which is fetch some canned/stale – but legitimate – data from a given URL. And parse_mapzen_response() just doesn't care where the data came from. Its job is just to extract the geocoded data.

But how does fetch_mapzen_response() send its data to parse_mapzen_response()? That's geocode()'s job. Remember geocode()? It's just been returning a fake dictionary. But now it can do the work of coordinating the fetching and the parsing of geodata.

It's worth keeping in mind that we still aren't talking to Mapzen's API. That's OK. We're moving one step forward – which is working with data that, at one point, did come from Mapzen. In the next lesson, we'll do the relatively trivial step of contacting Mapzen and getting fresh data with every call.

It'll be a little confusing at first. But hopefully you'll see the reasoning and orderliness of writing separate functions that do their own thing.

Fetching and returning a canned response with fetch_mapzen_response()

Again, this will seem a little backwards…but let's not worry about contacting the actual Mapzen service yet, which would involve reading their documentation and also writing the function that reads the credentials file. Which is not that hard, but still…

Go into utils/geocoding.py and add this function – you don't have to write a verbose docstring, but you should write something:

def fetch_mapzen_response(location):
    """
    `location` is a string that will be passed onto Mapzen API for geocoding

    returns a text string containing JSON-formatted data from Mapzen
    """

No matter what, this function will require the requests library – whether it is fetching from Mapzen directly, or from some other online stashed file. You should add import requests to the top of utils/geocoding.py.

As for what stashed data file it should pull from…why not the one we used in the previous homework assignment?

http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json

Here's what the body of the function will look like – nothing different from past assignments in which you had to use requests.get(), though this time, you don't have to write it to disk:

def fetch_mapzen_response(location):
    """
    `location` is a string that will be passed onto Mapzen API for geocoding

    returns a text string containing JSON-formatted data from Mapzen
    """
    # ignore the location string for now
    SAMPLE_DATA_URL = 'http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json'
    resp = requests.get(SAMPLE_DATA_URL)
    return resp.text

Try it in ipython. Instead of running python woz.py, just run ipython. Then import the function from utils.geocoding:

>>> from utils.geocoding import fetch_mapzen_response
>>> x = fetch_mapzen_response("whatever it doesn't matter right now")
>>> type(x)
str
>>> len(x) # count number of text characters
6826

You should get a big block of JSON-formatted text – remember that fetch_mapzen_response returns a string.
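
To get a feel for what that block of text contains without scrolling through thousands of characters, here is a minimal sketch that works on a hand-trimmed stand-in. SAMPLE_TEXT is an assumption: it is not the real file, just a one-feature string shaped like the fields this lesson uses, with its values copied from the Stanford result shown later in this lesson. The point is only that the fetched value is a plain string until json.loads() turns it into a dictionary.

```python
import json

# Hypothetical, hand-trimmed stand-in for the canned response text.
# The real file is much longer and has more fields per feature.
SAMPLE_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-122.16608, 37.42411]},
      "properties": {
        "label": "Stanford, Santa Clara County, CA",
        "confidence": 0.949
      }
    }
  ]
}
"""

data = json.loads(SAMPLE_TEXT)          # str -> dict
top_keys = sorted(data.keys())          # what lives at the top level
feature_count = len(data['features'])   # how many results came back
first_label = data['features'][0]['properties']['label']

print(type(SAMPLE_TEXT).__name__, '->', type(data).__name__)
print(top_keys)
print(feature_count, first_label)
```

Running the same few lines against the text returned by fetch_mapzen_response() should show the same nesting, with more features and more properties.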

Let's parse a canned Mapzen JSON response

OK, fetch_mapzen_response() works well enough for the thing we currently care about, which is parse_mapzen_response().

Add this skeleton of a function to utils/geocoding.py:

def parse_mapzen_response(txt):
    """
    `txt` is a string containing JSON-formatted text from Mapzen's API

    returns a dictionary containing the useful key/values from the most
       relevant result.
    """

This parse_mapzen_response() function does the hard work on behalf of geocode() (remember geocode()? We'll get back to it): deserializing the txt string into a dictionary, picking out the first result, then extracting that result's longitude, latitude, confidence, and label.

At this point, it's worth revisiting the past homework assignment, in which you had to parse that same canned Mapzen JSON response. The key points were:

Implementing parse_mapzen_response()

It's actually not much different from the homework:

Whether or not Mapzen's response contains any results, we return gdict, which will at least have a status key.

Here's one way of doing it; again, this function goes into utils/geocoding.py. It calls json.loads(), so add import json to the top of that file as well:

def parse_mapzen_response(txt):
    """
    `txt` is a string containing JSON-formatted text from Mapzen's API

    returns a dictionary containing the useful key/values from the most
       relevant result.
    """
    gdict = {}  # start with an empty dict; 'status' gets set below
    data = json.loads(txt)
    if data['features']: # it has at least one feature...
        gdict['status'] = 'OK' 
        feature = data['features'][0] # pick out the first one
        props = feature['properties']  # just for easier reference
        gdict['confidence'] = props['confidence']
        gdict['label'] = props['label']

        # now get the coordinates
        coords = feature['geometry']['coordinates']
        gdict['longitude'] = coords[0]
        gdict['latitude'] = coords[1]
    else:
        gdict['status'] = None

    return gdict
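
One thing the version above takes for granted is that the response always has a features key; if it doesn't, data['features'] raises a KeyError. The sketch below is a hypothetical, slightly more defensive variant (the name parse_mapzen_response_safely is made up for illustration) that treats a missing key the same as an empty list. Whether you want that behavior is a design choice, not something this lesson requires.

```python
import json

def parse_mapzen_response_safely(txt):
    """
    Hypothetical variant of parse_mapzen_response(): same output shape,
    but a response without a usable 'features' list gets a status of None
    instead of raising KeyError.
    """
    data = json.loads(txt)
    features = data.get('features') or []   # missing key or empty list -> []
    if not features:
        return {'status': None}

    feature = features[0]                   # the first result
    props = feature['properties']
    # coordinates are stored longitude first, then latitude
    lng, lat = feature['geometry']['coordinates'][:2]
    return {
        'status': 'OK',
        'confidence': props['confidence'],
        'label': props['label'],
        'longitude': lng,
        'latitude': lat,
    }

# Two made-up inputs that exercise the "no results" branch
no_results = parse_mapzen_response_safely('{"features": []}')
no_key = parse_mapzen_response_safely('{"note": "made-up response without features"}')
print(no_results, no_key)
```

Either way, the caller can rely on checking the status key before touching latitude or longitude.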

Testing parse_mapzen_response()

That's a big function. Does it even work?

Only one way to find out.

Go into ipython. Then import both the fetch_mapzen_response and parse_mapzen_response functions from the utils.geocoding module.

Remember that fetch_mapzen_response() takes in a location string…but just returns the canned response (i.e. always "Stanford"). But that's OK, pass that response text directly to parse_mapzen_response and see what happens:

>>> from utils.geocoding import fetch_mapzen_response, parse_mapzen_response
>>> rawtext = fetch_mapzen_response("whatever")
>>> georesult = parse_mapzen_response(rawtext)
>>> type(georesult)
dict
>>> print(georesult)

The printed dictionary should be what we expect:

{'status': 'OK', 'latitude': 37.42411, 'label': 'Stanford, Santa Clara County, CA', 'longitude': -122.16608, 'confidence': 0.949}
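
The keys in your printout may come out in a different order than shown here (the printouts in this lesson don't all agree with each other either), which makes eyeballing results harder than it needs to be. A small sketch for a stable, readable display, using json.dumps() with sort_keys; the georesult dictionary below is typed in by hand to match the expected result rather than fetched.

```python
import json

# Hand-typed copy of the expected result, so this runs without the network
georesult = {
    'status': 'OK',
    'latitude': 37.42411,
    'label': 'Stanford, Santa Clara County, CA',
    'longitude': -122.16608,
    'confidence': 0.949,
}

# sort_keys gives the same key order on every run; indent puts one key per line
pretty = json.dumps(georesult, indent=2, sort_keys=True)
print(pretty)
```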

Once you're satisfied that it works, quit out of ipython and point your editor to utils/geocoding.py.

Putting the pieces together in geocode()

Remember the geocode function inside of utils/geocoding.py?

Remember how it just returned a dictionary full of fake data?

def geocode(location):
    """
    docstring
    """
    mydict = {}
    mydict['query_text'] = location
    mydict['latitude'] = 99
    mydict['longitude'] = -42
    mydict['confidence'] = 0.01
    mydict['label'] = "HA HA JK"
    return mydict

Now it doesn't have to do that anymore. Replace the fake arbitrary values with our…well, less fake response from fetch_mapzen_response and parse_mapzen_response. We still want to return a dictionary, mydict, though, and it should still include the original location, this time under the key query_string rather than query_text.

The geocode function, again, takes a location argument…which is mostly ignored by fetch_mapzen_response. But that's OK. As long as rawtext is some kind of Mapzen data response, we'll be able to return the correct kind of object:

def geocode(location):
    """
    that giant docstring
    """
    rawtext = fetch_mapzen_response(location)
    mydict = parse_mapzen_response(rawtext)
    # add the location string to mydict
    mydict['query_string'] = location 
    # return the dictionary
    return mydict

Does the geocode function still work?

You can test this in two ways. Via ipython:

>>> from utils.geocoding import geocode
>>> x = geocode("doesn't matter")
>>> type(x)
dict
>>> print(x)
{'longitude': -122.16608, 'latitude': 37.42411, 'confidence': 0.949, 'query_string': "doesn't matter", 'label': 'Stanford, Santa Clara County, CA', 'status': 'OK'}

And you can run woz.py:

$ python woz.py 
What do you want to do? geocode
What is your location? Anywhere
OK...geocoding: Anywhere
{'latitude': 37.42411, 'status': 'OK', 'longitude': -122.16608, 'query_string': 'Anywhere', 'label': 'Stanford, Santa Clara County, CA', 'confidence': 0.949}
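
Both of those checks need a network connection, since fetch_mapzen_response() downloads the canned file every time. Because the parsing step doesn't care where its text came from, the same flow can also be exercised entirely offline. The sketch below is one way to do that under stated assumptions: fetch_stub, CANNED_TEXT, and the extra fetch parameter on geocode() are all hypothetical additions for illustration and are not part of utils/geocoding.py as written in this lesson.

```python
import json

# Hypothetical one-feature stand-in for the canned Mapzen text
CANNED_TEXT = json.dumps({
    'features': [{
        'geometry': {'coordinates': [-122.16608, 37.42411]},
        'properties': {
            'label': 'Stanford, Santa Clara County, CA',
            'confidence': 0.949,
        },
    }]
})

def fetch_stub(location):
    """Hypothetical offline stand-in for fetch_mapzen_response()."""
    return CANNED_TEXT  # ignores location, just like the canned fetcher

def parse_mapzen_response(txt):
    """Compact copy of this lesson's parser, repeated so the sketch runs alone."""
    gdict = {}
    data = json.loads(txt)
    if data['features']:
        feature = data['features'][0]
        gdict['status'] = 'OK'
        gdict['confidence'] = feature['properties']['confidence']
        gdict['label'] = feature['properties']['label']
        gdict['longitude'] = feature['geometry']['coordinates'][0]
        gdict['latitude'] = feature['geometry']['coordinates'][1]
    else:
        gdict['status'] = None
    return gdict

def geocode(location, fetch=fetch_stub):
    """Same coordination as geocode(), with the fetcher passed in (an assumption)."""
    rawtext = fetch(location)
    mydict = parse_mapzen_response(rawtext)
    mydict['query_string'] = location
    return mydict

result = geocode('Anywhere')
print(result)
```

Swapping the fetcher this way is only a testing convenience; the two-line body of geocode() stays the same whether the text is canned, stubbed, or live.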

References and Related Readings

Deserializing the JSON text from the Google Maps and Mapzen geocoding APIs
Before programmatically using an API, we need to study its response.