Dissecting the Mapzen Search API response

A detailed examination of the data structure returned by the Mapzen Search API.
This article is part of a sequence.

Here's the endpoint for getting the 3 most relevant geocoded results for the address string, "450 Serra Mall, Stanford, CA":

(note that you need to replace YOUR-KEY-HERE with your actual Mapzen API key)

https://search.mapzen.com/v1/search?api_key=YOUR-KEY-HERE&size=3&text=450+Serra+Mall,+Stanford,+CA
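
As a rough sketch, the same kind of URL can be assembled in Python with the standard library's urllib.parse.urlencode, which takes care of escaping the spaces and commas in the address. The helper name build_search_url is hypothetical; the endpoint and the parameter names (api_key, size, text) are copied from the URL above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://search.mapzen.com/v1/search"

def build_search_url(text, api_key, size=3):
    """Hypothetical helper: return a Mapzen Search URL for an address string."""
    params = {"api_key": api_key, "size": size, "text": text}
    # urlencode turns spaces into "+" and commas into "%2C"
    return SEARCH_ENDPOINT + "?" + urlencode(params)

url = build_search_url("450 Serra Mall, Stanford, CA", "YOUR-KEY-HERE")
print(url)
```

The commas come out percent-encoded (%2C) rather than literal, which is an equivalent way of writing the same query string.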

You can see the full response, which I've downloaded and cached, here:

http://www.compciv.org/files/datadumps/apis/mapzen/search-450-serra-mall.json

http://maps.googleapis.com/maps/api/staticmap?size=800x350&markers=label:1|37.428554,-122.17106&markers=label:2|37.427478,-122.170321&markers=label:3|37.430134,-122.171492

Google Static Map of Mapzen Search API’s 3 results returned for 450 Serra Mall, Stanford, CA

The full response

{
  "geocoding": {...},
  "type": "FeatureCollection",
  "features": [...],
  "bbox": [...]
}
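
To get a feel for this top-level layout in Python, here is a minimal sketch that parses an abridged stand-in for the response and reports the type of each top-level value. The raw string is a shortened sample written for this example (its features list is empty and its geocoding object is trimmed), not the full cached file.

```python
import json

# Abridged stand-in for the real response, written for this example
raw = """
{
  "geocoding": {"version": "0.1"},
  "type": "FeatureCollection",
  "features": [],
  "bbox": [-122.171492, 37.427478, -122.170321, 37.430134]
}
"""

data = json.loads(raw)

# Map each top-level key to the Python type that json.loads produced for it
top_level = {key: type(value).__name__ for key, value in data.items()}
print(top_level)
```

JSON objects become Python dictionaries and JSON arrays become lists, so geocoding is a dict while features and bbox are lists.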

The geocoding object

{
  "geocoding": {...},  //<<---------------//
  "type": "FeatureCollection",
  "features": [...],
  "bbox": [...]
}

The "geocoding" key points to this object:

{
  "version": "0.1",
  "attribution": "https://search.mapzen.com/v1/attribution",
  "query": {
    "text": "450 Serra Mall, Stanford, CA",
    "parsed_text": {
      "name": "450 Serra Mall",
      "number": "450",
      "street": "Serra Mall",
      "state": "CA",
      "regions": [
        "Stanford"
      ],
      "admin_parts": "Stanford, CA"
    },
    "size": 3,
    "private": false,
    "querySize": 1
  },
  "engine": {
    "name": "Pelias",
    "author": "Mapzen",
    "version": "1.0"
  },
  "timestamp": 1455400661088
}

The geocoding.query

This contains information about the query that was sent to the Mapzen API. The most useful object is parsed_text, which takes the original text value – i.e.

"450 Serra Mall, Stanford, CA"

– and shows how the Mapzen API interpreted it:

{
  "name": "450 Serra Mall",
  "number": "450",
  "street": "Serra Mall",
  "state": "CA",
  "regions": [
    "Stanford"
  ],
  "admin_parts": "Stanford, CA"
}
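
A minimal sketch of reading those parsed pieces in Python follows. The raw string is an abridged copy of the geocoding object shown earlier. Using .get() for postalcode is an assumption on my part that not every query yields every key; this sample simply has no such key.

```python
import json

# Abridged copy of the "geocoding" object from the response above
raw = """
{
  "geocoding": {
    "query": {
      "text": "450 Serra Mall, Stanford, CA",
      "parsed_text": {
        "name": "450 Serra Mall",
        "number": "450",
        "street": "Serra Mall",
        "state": "CA",
        "regions": ["Stanford"]
      },
      "size": 3
    }
  }
}
"""

data = json.loads(raw)
parsed = data["geocoding"]["query"]["parsed_text"]

street_address = parsed["number"] + " " + parsed["street"]
city = parsed["regions"][0]
# This sample has no "postalcode" key, so .get() returns None here
postalcode = parsed.get("postalcode")

print(street_address, "|", city, "|", parsed["state"], "|", postalcode)
```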

The type string

This is a simple string describing the nature of the data in the response. For all of our searches, this will be "FeatureCollection".

The features collection

The features key points to the data we care about: the geocoded location results from the Mapzen Search servers.

This comes as a collection of objects (i.e. a list of dictionaries, in Python terms). Each object looks like this:

{
  "type": "Feature",
  "properties": {
    "id": "poi-address-osmnode-2438059060",
    "gid": "osm:address:poi-address-osmnode-2438059060",
    "layer": "address",
    "source": "osm",
    "name": "450 Serra Mall",
    "housenumber": "450",
    "street": "Serra Mall",
    "postalcode": "94305",
    "country_a": "USA",
    "country": "United States",
    "region": "California",
    "region_a": "CA",
    "county": "Santa Clara County",
    "locality": "Stanford",
    "neighbourhood": "Oak Creek",
    "confidence": 0.6,
    "label": "450 Serra Mall, Stanford, CA"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [
      -122.17106,
      37.428554
    ]
  }
}

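Each object has two objects of interest: properties, which holds the descriptive attributes of the location (name, address parts, confidence, label), and geometry, which holds the point's coordinates as a [longitude, latitude] pair.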
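
Here is a minimal sketch of pulling the properties and geometry objects out of one feature in Python. The feature dictionary is an abridged copy of the object above. Note that the coordinates list is ordered longitude first, then latitude, which is easy to get backwards.

```python
# Abridged copy of the feature object shown above
feature = {
    "type": "Feature",
    "properties": {
        "name": "450 Serra Mall",
        "postalcode": "94305",
        "confidence": 0.6,
        "label": "450 Serra Mall, Stanford, CA",
    },
    "geometry": {
        "type": "Point",
        "coordinates": [-122.17106, 37.428554],
    },
}

props = feature["properties"]
# The coordinates list is [longitude, latitude], in that order
longitude, latitude = feature["geometry"]["coordinates"]

print(props["label"], props["confidence"], latitude, longitude)
```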

Determining reliability of results with the properties.confidence score

One of the most important attributes in the properties object is confidence. By default, Mapzen Search returns results in descending order of confidence.

For the example data file, the confidence values (and the corresponding labels) of the three results look like this:

        "confidence": 0.943,
        "label": "450 Serra Mall, Stanford, CA"
        "confidence": 0.903,
        "label": "450 Serra Mall, Stanford, CA"
        "confidence": 0.604,
        "label": "385 Serra Mall, Stanford, CA"

The third result, which has a label of "385 Serra Mall, Stanford, CA", is clearly not what we want in this situation; Mapzen apparently provides a third-most-relevant result simply because the query asked for 3 results. But what's the difference between the first and second results, which have confidence values of 0.943 and 0.903 respectively?
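
One way to drop weak matches such as the third result is to filter on confidence. The sketch below uses the three confidence/label pairs listed above. The cutoff of 0.8 is an arbitrary value picked for this example, not a threshold recommended by Mapzen.

```python
# The confidence/label pairs of the three results, abridged to just those keys
features = [
    {"properties": {"confidence": 0.943, "label": "450 Serra Mall, Stanford, CA"}},
    {"properties": {"confidence": 0.903, "label": "450 Serra Mall, Stanford, CA"}},
    {"properties": {"confidence": 0.604, "label": "385 Serra Mall, Stanford, CA"}},
]

# Arbitrary cutoff chosen for illustration only
MIN_CONFIDENCE = 0.8

reliable = [
    f for f in features
    if f["properties"]["confidence"] >= MIN_CONFIDENCE
]
reliable_labels = [f["properties"]["label"] for f in reliable]
print(reliable_labels)
```

With this cutoff, the two "450 Serra Mall" results survive and the "385 Serra Mall" result is discarded.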

The first result seems to be more specific and complete; for example, it contains a postalcode. Here's what the 3 results look like on a map:

http://maps.googleapis.com/maps/api/staticmap?size=800x350&markers=label:1|37.428554,-122.17106&markers=label:2|37.427478,-122.170321&markers=label:3|37.430134,-122.171492

Google Static Map of Mapzen Search API’s 3 results returned for 450 Serra Mall, Stanford, CA
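
The map URL above can be generated from the results' coordinates. In this sketch, build_static_map_url is a hypothetical helper, and the URL format is simply copied from the one above; the sketch does not check whether Google still serves it as-is. The one real subtlety is ordering: each feature stores [longitude, latitude], while the markers parameter in that URL lists latitude first.

```python
STATIC_MAP_BASE = "http://maps.googleapis.com/maps/api/staticmap"

# [longitude, latitude] pairs, in the order the three results were returned
coordinates = [
    [-122.17106, 37.428554],
    [-122.170321, 37.427478],
    [-122.171492, 37.430134],
]

def build_static_map_url(coords, size="800x350"):
    """Hypothetical helper: one numbered marker per [lon, lat] pair."""
    markers = []
    for number, (lon, lat) in enumerate(coords, start=1):
        # Swap the order: the marker syntax used above is latitude,longitude
        markers.append("markers=label:{}|{},{}".format(number, lat, lon))
    return STATIC_MAP_BASE + "?size=" + size + "&" + "&".join(markers)

map_url = build_static_map_url(coordinates)
print(map_url)
```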

The bbox collection

As geocoded queries can return more than one "feature", the bbox list serves as a convenient way to determine the bounding box that contains all of the returned points.

The bbox object is a collection of 4 coordinates:

[
    -122.171492,
    37.427478,
    -122.170321,
    37.430134
]

It's just a simple list, so we can't refer to the values by human-friendly keys, e.g. latitude, longitude. But here's what each value is, based on its position in this list:

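index (0-based)    value
0                  longitude of the left-most (westernmost) point
1                  latitude of the bottom-most (southernmost) point
2                  longitude of the right-most (easternmost) point
3                  latitude of the top-most (northernmost) point

Another way to think of this is that the values at 0 and 1 constitute the bottom-left (southwest) coordinate pair of the bounding box, and the values at 2 and 3 constitute the top-right (northeast) pair. In the example above, the value at index 1 (37.427478) is the smallest latitude among the three results, and the value at index 3 (37.430134) is the largest.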

Note that when only 1 result/feature is returned, both coordinate pairs will be identical.
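
To check how the four bbox values relate to the returned points, this sketch unpacks the list into named variables and recomputes the box from the three results' coordinates. The variable names (min_lon, min_lat and so on) are my own labels; the numbers are the ones shown in this article.

```python
# The bbox list from the response
bbox = [-122.171492, 37.427478, -122.170321, 37.430134]

# (longitude, latitude) of each of the three returned results
points = [
    (-122.17106, 37.428554),
    (-122.170321, 37.427478),
    (-122.171492, 37.430134),
]

# Give the positional values readable names
min_lon, min_lat, max_lon, max_lat = bbox

# Recompute the bounding box from the points themselves
lons = [lon for lon, lat in points]
lats = [lat for lon, lat in points]
computed_bbox = [min(lons), min(lats), max(lons), max(lats)]

print(computed_bbox == bbox)
```

The recomputed list matches the bbox from the response, which confirms the ordering: smallest longitude, smallest latitude, largest longitude, largest latitude.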

http://maps.googleapis.com/maps/api/staticmap?size=800x350&markers=label:1|37.428554,-122.17106&markers=label:2|37.427478,-122.170321&markers=label:3|37.430134,-122.171492&path=color%3ablue|weight:3|37.427478,-122.171492|37.427478,-122.170321|37.430134,-122.170321|37.430134,-122.171492|37.427478,-122.171492

Google Static Map of Mapzen Search API’s 3 results returned for 450 Serra Mall, Stanford, CA, with the bounding box outlined in blue
