
One of the country’s go-to COVID resources comes not from the Centers for Disease Control, but from a web developer based in Schenectady, New York.
As COVID cases surged across the U.S. this summer, Dan Goodspeed found a unique way to visualize the numbers state-by-state over time. Using raw data from The New York Times, he created a program that mapped out cases as a table.
While Goodspeed continued making enhancements to the visualization, he and his audience also noticed a pattern.
“It was really starting to look like the policies of the state governments and the political beliefs of the people were taking over as a significant factor in COVID spread,” Goodspeed said in an email to the Institute.
So he worked on a new chart to reflect cases by state partisanship. This chart has been viewed on Twitter more than 8.5 million times.
Today, Goodspeed has five COVID charts to help conceptualize total cases per million since June 1, 2020.
We reached out to him to learn more about the inspiration behind the charts, the response, and what journalists can glean from the data.
What prompted you to create the charts?
Goodspeed: I was checking the Worldometers chart every day for the latest numbers as it had daily updates on cases/million, but it didn’t have easily accessible historical data so it was tough to see if states were doing better or worse. I wanted something where you could both see how a state is doing historically and how it’s doing compared to others.
My Facebook post from July 5:
“Last weekend I was inspired to try to visualize COVID cases per million state-by-state over time. I saw the NY Times releases detailed daily data, so I wrote a little program to make a big table of all the states and dates with red dots representing the number of new cases that week. It looks like a big mess and I do have a lot more ideas for it, though I doubt I’ll find the time for that with lots of other work on my plate.”
With a link that pointed to what is now at this URL.
It looked like chicken pox and was horrible on mobile devices; it didn’t even fit on my 27-inch monitor. But it still at least kind of showed the data how I wanted to see it. A dozen friends liked it and I was asked to keep updating it, which I did for a few days. But with every row I added, it got more and more unruly. I started showing every other day, but realized it was a bad way to show the data. An animation would be better.
I started working on a way to make a slider of dates, and the one row circles would change in size depending on the date. I thought about maybe making the circles the shape of the state it was representing for a better UX. While it wasn’t out of my ability, it would take a really long time, and then I saw a bar chart race animation on the dataisbeautiful subreddit and I knew this was the way to do it. Plus, bar chart races are common enough that there had to be existing code libraries to build off of. A little Googling pointed me to the web app Flourish, who in a lot of ways is the real hero here. They did everything I wanted, for free, and even hosted the data. I just had to format the data into what they were looking for.
The first time I ran it, I was blown away by how cool it looked. Following different states was exciting (I’m from NY but mostly living in Chicago while my wife attends school) so I had multiple horses in the race. I also knew it would be similarly exciting for other Americans to follow their state throughout the course of the year. I shared it to Reddit and Facebook, and it took off even more than I expected, getting hundreds of thousands of views, knocking my server offline for a bit. Plenty of people had suggestions for new charts, as did I. So I redesigned the “COVID” page to handle multiple charts, added donation links, and made a few more charts. None really took off as much as the first “new cases” one though. I think from the three next charts I had five donors chipping in a few dollars each.
I dutifully kept updating the charts (the more there is, the longer it takes), when I started noticing that all of the top states on the “new COVID cases” chart seemed to be what were generally considered to be Republican states. And it seemed to start around the time when states had their act together, the first week of June. June 2-4 every state had less than 1,000 new cases per million. That hadn’t happened since March. And it didn’t happen since.
When the first chart went viral, several people recommended organizing the states by Democrat vs. Republican… but I dismissed it for two reasons — 1) Most states are purple. No state is 100% one or the other. 2) Having spent most of my time looking at the first few months of data, it really didn’t seem like a partisan issue. It was more population density and amount of travel in and out of the area that was the leading factor in COVID spread.
But now it was really starting to look like the policies of the state governments and the political beliefs of the people were taking over as a significant factor in COVID spread. Some Googling pointed me toward the Cook Partisan Voting Index as a source for ranking which way a state leaned. So I punched in the data, chose June 1 as the start date, and let the animation run… expecting to see more red than blue as time went on. I did not expect at all just how drastic the change would be. The first time it ran, I told my wife “This one is going to be big.” I haven’t gasped like that since I saw the first chart run for the first time.
I did my usual posting to Facebook and Reddit. And it only got marginal traffic. It did not take off like the first one, which kind of surprised me. Tens of thousands of hits instead of hundreds of thousands the first day. No Reddit post got more than a few dozen upvotes. But then the second day, usually when the traffic starts going back down, people started sharing the link on Facebook, and even more on the third day, beating out the traffic records from the first chart. A friend messaged me that a video screen grab of my chart was making its way around Twitter. I checked it out, and it was definitely going viral, but with no mention of the source, and it was outdated, as I had been updating the chart every day, and it didn’t have any description of the data, or show any of the states with the fewest cases. So I dusted off my old Twitter account and started replying to the posts, saying people really should check out the original chart, and answering questions/misunderstandings about the chart. There were a lot of them. People started saying I should do my own tweet introducing the video and were convincing enough that I did. I went from 50 followers to 2,500 followers in a day.
What is your process for ingesting the data?
Goodspeed: As far as technically? I have a PHP script that pulls the data from the NYT GitHub account, does some normalization and calculates cases and deaths per million and then saves the data in a SQLite file. Then I have another PHP script that takes the data from the SQLite file and converts them into CSVs readable by Flourish.
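For readers who want to try something similar, here is a minimal sketch of that kind of pipeline. It is written in Python rather than PHP and is not Goodspeed's actual code. The table layout, the function names, the fictional states "Alpha" and "Beta" and their populations are all assumptions made for illustration. It also assumes the NYT file has the columns date, state, fips, cases and deaths with cumulative counts, and that a bar chart race template wants one row per state and one column per date.

```python
import csv
import io
import sqlite3

# Assumed location of the NYT state-level file. To use real data, fetch it with
# urllib.request.urlopen(NYT_STATES_URL).read().decode("utf-8").
NYT_STATES_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
)


def load_rows(csv_text, populations):
    """Parse NYT-style CSV text and convert the counts to per-million figures."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        pop = populations.get(rec["state"])
        if pop is None:
            continue  # skip any state we have no population figure for
        rows.append((
            rec["date"],
            rec["state"],
            int(rec["cases"]) * 1_000_000 / pop,
            int(rec["deaths"]) * 1_000_000 / pop,
        ))
    return rows


def save_rows(conn, rows):
    """Store the normalized rows in a SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily ("
        "date TEXT, state TEXT, cases_per_million REAL, "
        "deaths_per_million REAL, PRIMARY KEY (date, state))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def export_wide_csv(conn, column="cases_per_million"):
    """Return CSV text with one row per state and one column per date."""
    if column not in ("cases_per_million", "deaths_per_million"):
        raise ValueError("unknown column: " + column)
    dates = [d for (d,) in conn.execute(
        "SELECT DISTINCT date FROM daily ORDER BY date")]
    states = [s for (s,) in conn.execute(
        "SELECT DISTINCT state FROM daily ORDER BY state")]
    values = {
        (s, d): v
        for s, d, v in conn.execute(f"SELECT state, date, {column} FROM daily")
    }
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["state"] + dates)
    for s in states:
        # Missing state/date pairs are written as 0.0.
        writer.writerow([s] + [round(values.get((s, d), 0.0), 1) for d in dates])
    return out.getvalue()


# Fictional sample data and populations, for illustration only.
SAMPLE = """date,state,fips,cases,deaths
2020-06-01,Alpha,01,2000,20
2020-06-01,Beta,02,250,5
2020-06-02,Alpha,01,2100,21
2020-06-02,Beta,02,300,5
"""
POPULATIONS = {"Alpha": 2_000_000, "Beta": 500_000}

conn = sqlite3.connect(":memory:")
save_rows(conn, load_rows(SAMPLE, POPULATIONS))
wide_csv = export_wide_csv(conn)
print(wide_csv)
```

The sketch keeps the cumulative per-million figures as they come. A chart of new cases per week would need an extra step that subtracts each state's total from seven days earlier before exporting.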
Who did you consider the audience for this visualization? What do you hope they will take away from it?
Goodspeed: Originally I just did it for myself because I was kind of the COVID data expert being the guy who checked the stats every day. I just wanted something more digestible. And if it could be something I could share with others… great! The partisan one was a little different… in that the others were just a good way to present facts and data that no one else seemed to be doing. The partisan one was the one where I was showing a clear correlation that I discovered, and was surprised that no one else had shown anything like it before.
What are some of the hidden insights that journalists can learn from these charts?
Goodspeed: Use the data and tools available. I was shocked to see NYT giving out the raw data for free, updating daily. And the Flourish chart-building app is also amazing.
Read into the numbers and find patterns. Here’s a possible scoop — one of the next patterns I’m looking into: There are four states whose most populous city is less than 70,000. The Democratic Vermont and Maine… both of which are doing much better than any other Democratic states on the partisan chart. And the Republican Wyoming and West Virginia. Both of which are doing much better than any other Republican states on the chart. I don’t believe that’s a coincidence. I’m hoping to come up with some way to display that in a meaningful, useful way.
You broke your Twitter hiatus to share the visualizations, which have been very popular on social media. How do you anticipate you’ll use Twitter going forward?
Goodspeed: Well, now the people who follow me are probably more interested in my thoughts on political data, more so than my other hobbies, which would usually be about ultimate frisbee or music. So I guess at least for now, I’ll probably just use Twitter for that. At some point I may have to follow others.
And, finally, how has the response been? What about donations?
Goodspeed: Lots and lots of messages. The donations have been flowing in, between 100 and 200 so far. It’s almost like spam, how often my inbox is full of “You’ve got money!” emails (but it’s the good kind of spam). I’ve been taking down everyone’s email addresses and I hope to let them know first about future charts I make, as they should get something for their generosity.