Visualize Geographic Data in Python Using scatter_mapbox

Suman Gautam
Published in Analytics Vidhya
4 min read · Mar 31, 2021

Photo credit: Image by Author (Glacier National Park)

We spend a lot of time on data visualization in any data science project. Often this means scatter plots, histograms, box plots and the like. But have you ever wondered what would happen if we visualized the same data in a geographical context? Doing so can help reveal spatial relationships. If we are lucky enough to have latitude and longitude information, we can create such plots relatively quickly using existing libraries such as geopandas and plotly.

In this article, I would like to discuss an amazingly simple and useful tool called scatter_mapbox, a Plotly Express function that uses latitude and longitude information to plot point data on a map. This exercise was performed as part of the exploratory data analysis for a linear regression project using a publicly available housing dataset from the King County area, Washington.

We will begin by importing the necessary libraries: pandas to hold the data and plotly.express (conventionally aliased as px) to draw the map.

In a nutshell, a single call to px.scatter_mapbox() covers the essential functionality and parameters of this tool.

The example below shows the house locations on the map. One obvious thing we can do is color-code each point, which lets us quickly visualize the spatial distribution of any feature. For example, we can use ‘price’ as the color attribute to locate areas with higher house prices.

House location color-coded with ‘price’ feature

Can you spot a pattern in the house price distribution on the map above?

Yes, one can observe that high-priced houses are concentrated in the northern and central regions.

Note that the color scale in the legend is continuous. This is the case for any numeric column. If the column is categorical, we get a discrete legend instead, as in the example below.

House location color-coded with ‘condition’ feature

The scatter_mapbox parameters are largely self-explanatory, and the function has a few neat features. One I found especially useful is the ‘hover_data’ parameter: we can pass it any number of columns from our pandas dataframe and instantly see that information for an individual data point on hover. We can also use the ‘size’ parameter to scale each marker by a column's value, i.e., bigger symbols for higher values.

House location color-coded with ‘price’ feature (also the size parameter increased to 20)

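One nice feature is that layout parameters can be updated after the figure has been created, using fig.update_layout(). Some of these settings can alternatively be passed directly to px.scatter_mapbox(). For example, the base map can be chosen either with the mapbox_style argument of px.scatter_mapbox() or afterwards with fig.update_layout(mapbox_style="open-street-map").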

Finally, one limitation of this tool is the narrow range of base maps. Currently, only a handful of maps are freely accessible to everyone. Some neat base maps are available, but they require creating a personal access token. For the most part, “open-street-map” is the one I found most useful for my purpose, although it is still not visually appealing.

Summary

scatter_mapbox is an extremely useful tool for visualizing data with geographical coordinates. It can also be used in conjunction with the ‘geopandas’ library, which is a whole other chapter to discuss. Using this tool allowed me to enhance the feature engineering for my linear regression model. For example, from the house price distribution map, I was able to derive a “distance from the city center” feature, which helped improve the predictive power of the model.
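One way such a distance feature might be computed is with the haversine (great-circle) formula. This is only a sketch: the choice of formula, the use of downtown Seattle as the city center, and the house coordinates are all assumptions of mine.

```python
import math

# Assumption: downtown Seattle stands in for the "city center".
CITY_CENTER = (47.6062, -122.3321)  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0            # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Made-up house coordinates (not real records).
houses = [(47.61, -122.33), (47.48, -122.15)]
distances_km = [haversine_km(lat, lon, *CITY_CENTER) for lat, lon in houses]
```

On a dataframe with `lat` and `long` columns, the same function could fill a new column (the name is hypothetical) with `df["dist_to_center_km"] = df.apply(lambda r: haversine_km(r["lat"], r["long"], *CITY_CENTER), axis=1)`.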

Reference: https://plotly.com/python/mapbox-layers/
