The Effect of Vaccines
Welcome to the latest post of the Viz4Sci Substack! In this post we will learn how to reconstruct one of the most famous visualizations of the effects of vaccination on disease prevalence, originally published by the Wall Street Journal in 2015. We’ll use the original WSJ data and build the entire visualization using nothing but matplotlib.
We hope you enjoy this post, and we’re looking forward to hearing your thoughts!
In a previous post, we explored the inverse correlation between vaccination rates and deaths. Here we explore the effect that vaccination has on disease prevalence: the number of infected individuals per capita.
The original visualization was published in the Wall Street Journal on February 11, 2015 and covers 7 different disease/vaccine pairs. For simplicity, we’ll focus on the figure for Measles, a childhood disease that used to kill up to 2.6 million people per year before the advent of vaccines in 1963.
As we can see in the WSJ visualization, the introduction of the vaccine had a drastic effect on the number of infections per 100,000 people:
I strongly encourage you to explore the WSJ piece. The original figure has a nice interactive feature that allows you to see the number of cases per 100,000 people represented by each cell.
The WSJ visualization uses a nice JSON file that contains all the relevant data. We can easily download and process it to obtain a clean pandas DataFrame:
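As a rough sketch of this step, assuming the JSON boils down to one record per state and year, we could do something like the following. The field names `state`, `year` and `incidence` and all the values are made up for illustration; the real WSJ file may well be laid out differently.

```python
import json

import pandas as pd

# Hypothetical payload: one record per (state, year). Values are invented,
# and the real file's field names and nesting may differ.
raw = """[
  {"state": "Alabama", "year": 1961, "incidence": 120.5},
  {"state": "Alabama", "year": 1962, "incidence": 98.1},
  {"state": "Alaska", "year": 1961, "incidence": 310.0},
  {"state": "Alaska", "year": 1962, "incidence": null}
]"""

records = json.loads(raw)                  # parse the JSON text into Python objects
df = pd.DataFrame.from_records(records)    # one row per record
df["year"] = df["year"].astype(int)        # make sure years are integers

print(df)
```

With the real data, the JSON text would come from the downloaded file instead of an inline string, and missing values (the `null` above) would show up as `NaN` in the DataFrame.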
A couple of lines of Python are enough to turn this data into a convenient matrix, where each row is a State and each column a Year.
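One way to do this, assuming a long-format DataFrame with hypothetical `state`, `year` and `incidence` columns (placeholder values, not the WSJ numbers), is pandas’ `pivot`:

```python
import pandas as pd

# Placeholder long-format data; column names and values are assumptions.
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "year": [1961, 1962, 1961, 1962],
    "incidence": [120.5, 98.1, 310.0, 275.3],
})

# Rows become states, columns become years, cells hold the incidence.
matrix = df.pivot(index="state", columns="year", values="incidence")

print(matrix)
```

`pivot` raises an error if a state/year pair appears more than once, which is a handy sanity check on the cleaned data.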
The resulting heatmap is already a reasonable start, although there’s clearly much room for improvement.
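A first pass at the heatmap can be as small as a single `imshow` call. The sketch below uses a random placeholder matrix in place of the real state-by-year data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder matrix standing in for the state-by-year incidence data.
rng = np.random.default_rng(0)
matrix = rng.random((5, 20)) * 1000

fig, ax = plt.subplots()
# aspect="auto" lets the cells stretch to fill the axes instead of being square.
im = ax.imshow(matrix, aspect="auto")
```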
Increasing the figure dimensions and adding a line marking when the vaccine was introduced are just a couple of quick improvements that immediately make the message clearer, despite the less-than-ideal colormap.
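Here is a minimal sketch of those two tweaks, using a placeholder matrix and an assumed range of years. Since `imshow` places column `i` at x-coordinate `i`, the line is drawn at the left edge of the 1963 column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: the year range and values are assumptions.
years = np.arange(1950, 1980)
rng = np.random.default_rng(0)
matrix = rng.random((10, years.size)) * 1000

# A wider figure gives each year column more room.
fig, ax = plt.subplots(figsize=(14, 6))
ax.imshow(matrix, aspect="auto")

# Column index of the year the vaccine was introduced.
vaccine_col = int(np.where(years == 1963)[0][0])
# Cell i spans [i - 0.5, i + 0.5], so i - 0.5 is its left edge.
line = ax.axvline(vaccine_col - 0.5, color="black", linewidth=2)
```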
We can also add a colorbar to help identify the values in each cell.
Unfortunately, matplotlib’s default colorbar is much too large and shrinks the heatmap to make room for itself.
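For reference, this is roughly what the default behavior looks like, again on placeholder data. Passing `ax=ax` tells matplotlib to carve the colorbar’s space out of the heatmap’s own axes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder matrix standing in for the real data.
rng = np.random.default_rng(0)
matrix = rng.random((10, 30)) * 1000

fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(matrix, aspect="auto")

# Default colorbar: as tall as the heatmap, with space taken from `ax`.
cbar = fig.colorbar(im, ax=ax)
```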
We can fix this by using matplotlib’s add_axes to add a new Axes object to the figure, defining both its location and size. Then we can make sure that the colorbar is drawn in this second Axes object, resulting in something much more reasonable.
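A sketch of this approach follows. The rectangle passed to `add_axes` is `[left, bottom, width, height]` in figure fractions; the numbers here are just a guess at something that looks reasonable, not the values from the original code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder matrix standing in for the real data.
rng = np.random.default_rng(0)
matrix = rng.random((10, 30)) * 1000

fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(matrix, aspect="auto")

# A slim axes to the right of the heatmap (position values are assumptions).
cax = fig.add_axes([0.92, 0.3, 0.015, 0.4])

# cax= draws the colorbar inside our axes instead of resizing `ax`.
cbar = fig.colorbar(im, cax=cax)
```

Because the colorbar now lives in its own axes, the heatmap keeps its full size and we control exactly how much room the colorbar takes.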
Next, we add the year and state names as the x- and y-axis tick labels, respectively, which gets us almost to where we want to be.
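Setting the tick labels might look like this. The list of states and the range of years are placeholders, and labeling every fifth year is our own choice to keep the x-axis readable:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder labels and data.
states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]
years = np.arange(1950, 1980)
rng = np.random.default_rng(0)
matrix = rng.random((len(states), years.size)) * 1000

fig, ax = plt.subplots(figsize=(14, 6))
ax.imshow(matrix, aspect="auto")

# imshow puts column i at x = i and row j at y = j, so ticks are plain indices.
ax.set_xticks(np.arange(0, years.size, 5))
ax.set_xticklabels([str(y) for y in years[::5]])
ax.set_yticks(np.arange(len(states)))
ax.set_yticklabels(states)
```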
The final piece we’re missing is the right colormap. In order to match the colormap used by the WSJ we must create a custom colormap using LinearSegmentedColormap, which allows us to associate arbitrary colors with specific values while automatically interpolating the intermediate colors. LinearSegmentedColormap expects the values to be between 0 and 1, so we must also create a Normalize object to automatically map the data values to this range. The whole process is taken care of by a small code snippet.
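The sketch below shows the general idea. The data range, the anchor values and the hex colors are all placeholders chosen for illustration; they are not the actual WSJ palette.

```python
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Assumed data range in cases per 100,000 people (placeholder).
norm = Normalize(vmin=0, vmax=4000)

# (data value, color) anchor points; values and colors are assumptions.
stops = [
    (0, "#e7f0fa"),
    (100, "#4e9ad4"),
    (1000, "#f5d93b"),
    (4000, "#c9201f"),
]

# from_list wants positions in [0, 1], starting at 0 and ending at 1,
# so we run each data value through the same Normalize object.
cmap = LinearSegmentedColormap.from_list(
    "wsj_like", [(float(norm(value)), color) for value, color in stops]
)

print(cmap(float(norm(500))))  # RGBA color interpolated between two anchors
```

Using the same `norm` both to build the colormap and to draw the image keeps the anchor colors lined up with the data values they were defined for.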
Now we can just pass this colormap to the original call to imshow to produce an image with the right color scheme:
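In practice this means handing both the colormap and the matching Normalize object to `imshow`. A minimal sketch, with a placeholder matrix and placeholder colors:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Placeholder data and colors (not the WSJ values).
rng = np.random.default_rng(0)
matrix = rng.random((10, 30)) * 4000

norm = Normalize(vmin=0, vmax=4000)
cmap = LinearSegmentedColormap.from_list(
    "wsj_like", [(0.0, "#e7f0fa"), (0.25, "#f5d93b"), (1.0, "#c9201f")]
)

fig, ax = plt.subplots(figsize=(14, 6))
# cmap picks the colors, norm maps data values onto the colormap's 0-1 range.
im = ax.imshow(matrix, aspect="auto", cmap=cmap, norm=norm)
```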
The last step is a bit of housekeeping to clean up the colorbar and make sure its tick labels are in the right place. With that in place, the final code produces an almost perfect reproduction of the original figure. The full code for this post can be found in this gist.
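For the colorbar housekeeping, something along these lines should work. The tick positions, the label text and the font size are our own assumptions rather than the values used in the gist:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder matrix standing in for the real data.
rng = np.random.default_rng(0)
matrix = rng.random((10, 30)) * 4000

fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(matrix, aspect="auto", vmin=0, vmax=4000)

# Slim colorbar in its own axes (position values are assumptions).
cax = fig.add_axes([0.92, 0.3, 0.015, 0.4])
cbar = fig.colorbar(im, cax=cax)

# Pin the ticks to round values and label the bar.
cbar.set_ticks([0, 1000, 2000, 3000, 4000])
cbar.set_label("Cases per 100,000 people")
cbar.ax.tick_params(labelsize=8)
```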
We hope you’ve enjoyed our latest post, and thank you for your continued support for Visualization For Science. We would love to hear your thoughts.