Creating awesome timelapses in Python using GeoPandas and ImageIO

Some time ago, I showed you how easy and fun it is to visualise geospatial data using GeoPandas.

I already teased that I'd also do a tutorial on creating timelapses using GeoPandas and ImageIO, and well... Here it is!

Today, we'll use the same 311 calls dataset as in the previous GeoPandas tutorial to create a timelapse of the number of calls per neighborhood over a period of 2 months.

Let's dive right in!

Importing our requirements

import pandas as pd
import numpy as np

import geopandas

from google.cloud.bigquery import Client

import imageio

import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

Note, I'm using JupyterLab to run this tutorial. If you want to follow along, I'd recommend you do that as well. :)

Loading in our dataset

query = """
SELECT
  CAST(created_date AS date) AS incident_date,
  neighborhood,
  COUNT(*) AS calls
FROM
  `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE  CAST(created_date AS date) between '2020-03-01' and '2020-05-01'
GROUP BY
  CAST(created_date AS date),
  neighborhood
"""

This query retrieves all calls between the 1st of March 2020 and the 1st of May 2020 (inclusive) and groups them by day and neighborhood.

client = Client(project='<your-project-here>')
df_calls = client.query(query).to_dataframe()

We're using the Google Cloud BigQuery client to retrieve the data and convert it to a pandas DataFrame right away.

The data looks like this:

Loading in the Shapefile

If you haven't followed along with the previous tutorial, the shapefile can be found here.

sf_map = geopandas.read_file('./data/SF_Find_Neighborhoods/geo_export_3d23e117-9bb1-47c4-a56b-c999603eef2d.shp')
sf_map.plot(figsize=(10,7));

Merging the datasets

So now we have the data (df_calls) as well as the map to plot the data onto (sf_map). Let's merge these two dataframes to make working with both a little easier:

geo_df = sf_map.merge(df_calls, 
                      left_on='name', 
                      right_on='neighborhood', 
                      how='left')
                      
geo_df.head(3)
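
If you want to check which neighborhoods on the map didn't find a match in the calls data, pandas' merge has an indicator option. Here's a minimal sketch on toy data; toy_map and toy_calls are made-up stand-ins for sf_map and df_calls, not the real tables:

```python
import pandas as pd

# Hypothetical stand-ins for sf_map and df_calls (toy values, not real data).
toy_map = pd.DataFrame({"name": ["Mission", "Sunset", "Presidio"]})
toy_calls = pd.DataFrame({
    "neighborhood": ["Mission", "Mission", "Sunset"],
    "incident_date": ["2020-03-01", "2020-03-02", "2020-03-01"],
    "calls": [12, 9, 4],
})

# indicator=True adds a "_merge" column telling us where each row came from.
checked = toy_map.merge(toy_calls, left_on="name", right_on="neighborhood",
                        how="left", indicator=True)

# Rows marked "left_only" are map neighborhoods without any call records.
unmatched = checked.loc[checked["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

On the real data, the same check should point at the rows that end up with null values after the merge.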

Preprocessing the data

geo_df.fillna({'calls': 0}, inplace=True)
geo_df['incident_date'] = pd.to_datetime(geo_df['incident_date'])
geo_df.drop(columns=['link', 'neighborhood'], inplace=True)

The data is still quite raw, so we will need to do some preprocessing before we can create our timelapse.

First, we fill all null values in the calls column. Because we did a left merge, a null here means that a neighborhood from the shapefile has no matching rows in the calls data, so we replace these nulls with 0.

Then, we convert the incident_date to a datetime.

Finally, we drop the link and neighborhood columns as we don't need these anymore.

As a final sanity check, we run the following to get any leftover null values:

geo_df.isna().sum()

This returns:

We see that the incident_date column still has two null values. Having no incident date is a problem: since we want to create one frame per day, a row without a date can't be assigned to any frame. So we will drop these rows.

geo_df = geo_df[~geo_df['incident_date'].isna()]

Creating the timelapse

Now, on to the really fun part: the timelapse!

We first sort our DataFrame and create a range of days to loop over.

sorted_geo_df = geo_df.sort_values(by='incident_date')
days = pd.date_range('2020-03-01', '2020-05-01', freq='D')
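
Before looping, it can be useful to know how many frames to expect and whether any day in the range has no rows at all (such a day would give an empty plot). A small sketch, assuming a toy dataframe in place of sorted_geo_df:

```python
import pandas as pd

days = pd.date_range("2020-03-01", "2020-05-01", freq="D")
print(len(days))  # 31 days in March + 30 in April + the 1st of May

# Hypothetical toy frame standing in for sorted_geo_df.
toy = pd.DataFrame({
    "incident_date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-03"]),
    "calls": [5, 2, 7],
})

# Days in the range for which the data contains no rows at all.
present_days = pd.DatetimeIndex(toy["incident_date"].unique())
missing_days = days.difference(present_days)
print(len(missing_days))
```

With the real data you'd swap toy for sorted_geo_df and hopefully see zero missing days.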

Then, we create a Timelapse class:

class Timelapse:
    
    def create(self, df, date_range, gif_name):
        figs = []
        for day in date_range:
            df_sub = df[df['incident_date'] == day]
            fig = self._create_plot(df_sub, day)
            figs.append(fig)

        self._save_to_gif(figs, gif_name)
            
    def _create_plot(self, df, day):
        fig, ax = plt.subplots(figsize=(10,10))
        fig.set_dpi(300) # for higher quality plots
        
        df['geometry'].plot(ax=ax, 
                            zorder=1, 
                            alpha=.95, 
                            cmap='YlOrRd')
        df[['geometry', 'calls']].plot(column = 'calls', 
                                       ax=ax, 
                                       cmap = 'YlOrRd',
                                       legend = True)
        
        ax.set_title(f'San Francisco 311 Call Counts per Neighborhood on {day.date()}', 
                     size=15)
        img = self._fig_to_img_array(fig)
        plt.close(fig)  # close this specific figure to free its memory
        return img     
    
    def _fig_to_img_array(self, fig):
        canvas = FigureCanvas(fig)
        canvas.draw()
        # buffer_rgba() is used here because tostring_rgb() is deprecated
        # and no longer available in recent Matplotlib releases
        rgba = np.asarray(canvas.buffer_rgba())  # shape: (height, width, 4)
        return rgba[..., :3].copy()              # drop the alpha channel

    def _save_to_gif(self, figures, gif_name):
        # duration is the time per frame; check whether your ImageIO version
        # expects seconds or milliseconds here
        imageio.mimsave(gif_name, figures, duration=1)

We can run the class like this:

gif_name = '311_calls_san_francisco_2020_daily.gif'
timelapse = Timelapse()
timelapse.create(sorted_geo_df, days, gif_name)

This is a lot to take in. So we'll go over it step by step:

  1. We initialise the Timelapse class
  2. We then call the create() method with our dataframe, the days to loop over, and the name of the gif. This will create our timelapse and save it as a gif.

The .create() method in more detail

def create(self, df, date_range, gif_name):
    figs = []
    for day in date_range:
        df_sub = df[df['incident_date'] == day]
        fig = self._create_plot(df_sub, day)
        figs.append(fig)

    self._save_to_gif(figs, gif_name)

  1. We first instantiate a figs list to hold the image arrays of our plotted figures.
  2. Then we loop over our date_range and do the following:
    i.)   We subset the data to only include rows from that day.
    ii.)  We create a plot from that subset and convert the figure to a NumPy array, because that's what ImageIO likes to work with (otherwise you'd have to store it as a file first and then read it back in as an array). Note that this procedure is not very well documented. This discussion was helpful to me in finding out how to convert the Matplotlib figure directly to a NumPy array.
    iii.) Finally, we store that array in our figs list.
  3. Once we have all of our figures, we create a gif out of them using ImageIO's mimsave method, which expects figures to be a list of image arrays (height × width × 3 for RGB), one per frame. We save the gif with a duration of 1 second between frames.
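
To make the figure-to-array step a bit more tangible, here's a standalone sketch of the conversion on a tiny dummy plot. The function name fig_to_rgb_array is my own for this example, and I'm assuming the Agg canvas's buffer_rgba() method rather than any GeoPandas-specific API:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg


def fig_to_rgb_array(fig):
    """Render a Matplotlib figure into a (height, width, 3) uint8 array."""
    canvas = FigureCanvasAgg(fig)
    canvas.draw()
    rgba = np.asarray(canvas.buffer_rgba())  # shape: (height, width, 4)
    return rgba[..., :3].copy()              # drop the alpha channel


# A 2x2 inch figure at 50 dpi renders to 100x100 pixels.
fig, ax = plt.subplots(figsize=(2, 2), dpi=50)
ax.plot([0, 1], [0, 1])
frame = fig_to_rgb_array(fig)
plt.close(fig)
print(frame.shape, frame.dtype)
```

The pixel size of each frame is simply the figure size in inches times the dpi, so a 10x10 inch figure at 300 dpi gives 3000x3000 pixel frames. If the gif gets too heavy, lowering the dpi is the first thing I'd try.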

You can of course completely customise the time between frames to your own wishes. I prefer at least a second of time between frames so it's easier to see what's going on in the frames themselves.
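
Another way to slow things down, which doesn't depend on how the duration argument is interpreted, is to repeat each frame a few times before saving. A minimal sketch; hold_frames is a hypothetical helper, and strings stand in for the image arrays:

```python
def hold_frames(frames, repeats):
    """Return a new list in which every frame appears `repeats` times in a row."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return [frame for frame in frames for _ in range(repeats)]


# Strings stand in for the image arrays here.
frames = ["day-1", "day-2", "day-3"]
slowed = hold_frames(frames, 2)
print(slowed)
```

You would pass the resulting list to mimsave instead of the original one. The trade-off is a larger list of frames in memory.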

The result

Without further ado, here's the result.

Note that we're displaying the absolute counts per neighborhood per day in this timelapse. Depending on the use case you may want to normalize the counts relative to the population (e.g. if you want to compare neighborhoods to buy a condo in and you don't want to live in a neighborhood with higher than average 311 calls).

If you simply need to know which area has the highest number of 311 calls (e.g. to determine which neighborhoods need more 311 responders), the absolute counts are what you need.
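
If you do go for normalized counts, the calculation itself is a one-liner. A sketch with made-up numbers; the population column is an assumption, since our dataset doesn't contain one and you'd have to bring it in from another source:

```python
import pandas as pd

# Hypothetical numbers for illustration only; real population figures
# would have to come from a separate source.
toy = pd.DataFrame({
    "name": ["A", "B"],
    "calls": [50, 50],
    "population": [10_000, 25_000],
})

# Calls per 1,000 residents: same absolute count, very different rate.
toy["calls_per_1000"] = toy["calls"] * 1000 / toy["population"]
print(toy[["name", "calls_per_1000"]])
```

You could then plot calls_per_1000 instead of calls in the timelapse.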

Now, obviously this is just a simple example. You can do so much more to improve it.

Small things that immediately come to mind: you could make the legend static so it doesn't change across frames, or you could highlight the neighborhoods with the highest number of calls even more by increasing the contrast or adding annotations.
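
For the static legend, one possible approach is to compute the color limits once over all days and reuse them in every frame. A rough sketch on toy data, using Matplotlib's Normalize to show the effect:

```python
import pandas as pd
from matplotlib.colors import Normalize

# Hypothetical toy data standing in for sorted_geo_df.
toy = pd.DataFrame({
    "incident_date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
    "calls": [0, 40, 100],
})

# Compute the colour limits once, over ALL days, instead of per frame.
vmin, vmax = toy["calls"].min(), toy["calls"].max()
norm = Normalize(vmin=vmin, vmax=vmax)

# With fixed limits, the same call count maps to the same colour in every frame.
print(float(norm(40)))

# Inside _create_plot, the same limits could be passed along, e.g.:
# df.plot(column='calls', ax=ax, cmap='YlOrRd', legend=True, vmin=vmin, vmax=vmax)
```

I haven't wired this into the Timelapse class above, so treat it as a starting point rather than a finished change.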

It's your turn now, go and make some cool timelapses! And if you do, don't forget to tag me on Twitter! I'd love to see what you come up with!

Let's keep in touch! 📫

If you have any comments, questions or want to collaborate, please email me at lucy@lucytalksdata.com or drop me a message on Twitter.