A Code for DC project just automated a source for unreported car crash data. Here’s how technologists did it
Project creator and Women Who Code director Charlotte Lee Jackson tweeted earlier this week that the project, which collects data on car crashes and compares it with DC Department of Transportation (DDOT) reports, has completed an important step. It now automatically collects and refreshes data from multiple sources and places it in an easy-to-read dashboard that shows whether each crash has appeared in DDOT data.
The database starts with a download from an application called PulsePoint, which shows which incidents fire and emergency services responded to. The problem there, Jackson said, is that the app doesn’t show what kind of traffic collision occurred. The pipeline then turns to a site called OpenMHz, created by Luke Berndt, which collects the radio audio for a crash call and converts it to .WAV files. This is what distinguishes which crashes involved pedestrians or cyclists. The Crash Bot code runs the audio files through AWS Transcribe. On top of that, it retrieves information from several Twitter accounts that report local crashes and from the Citizen app, to track as many incidents as possible and compare what is recorded across multiple sources.
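The classification step described above, telling pedestrian and cyclist crashes apart once the dispatch audio has been transcribed, could be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not describe the project's actual rules, so the keyword patterns and the `classify_transcript` function are hypothetical, and the transcript text is assumed to have already come back from the transcription service as a plain string.

```python
import re

# Hypothetical keyword patterns; the article does not say which terms
# the Crash Bot code actually looks for in a transcript.
PATTERNS = {
    "pedestrian": re.compile(r"\b(pedestrian|person struck)\b"),
    "cyclist": re.compile(r"\b(cyclist|bicyclist|bicycle|bike)\b"),
}


def classify_transcript(text: str) -> set[str]:
    """Return the road-user labels mentioned in a dispatch transcript."""
    # Lowercase and collapse whitespace so matching is case-insensitive
    # and multi-word phrases survive odd spacing in the transcript.
    normalized = re.sub(r"\s+", " ", text.lower())
    return {label for label, pattern in PATTERNS.items() if pattern.search(normalized)}


if __name__ == "__main__":
    sample = "Engine 12 respond for a PEDESTRIAN struck, Good Hope Road"
    print(classify_transcript(sample))
```

Word boundaries keep a term like "bike" from matching inside "motorbike". A real pipeline would likely need a longer vocabulary and some tolerance for transcription errors.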
I’m glad to finally share that we at @CodeforDC have built a pipeline that automatically collects data on 911 calls for crashes involving pedestrians and cyclists and checks whether they show up in @DDOTDC crash data. Here are the tracks that were previously invisible: https://t.co/wRzZiWFLra
– Charlotte Lee Jackson (@cljack) July 7, 2021
Jackson noted that open data sources are key to pulling the project together.
“It’s not like scraping encrypted data from a secret source or anything like that,” Jackson said. “It’s all out there.”
Specifically, Berndt’s OpenMHz was key to the data collection, as it has audio files for all of the calls made. Berndt, who is based in DC, built the system using software-defined radio, which he said works like a police or fire scanner but can be tuned to more than one channel at a time, so he can hear what the dispatchers are talking about. He uses an RTL-SDR, a device for receiving radio signals, and an open source program called GNU Radio to build a “software version of any radio out there.” Some C++ code ties everything together, he added, to produce the .WAV files.
“I put it out there, and I started with what I was capturing in DC, but the software is open source and people have just started using it and contributing from all over the country,” Berndt said.
OpenMHz is mostly used by the media, Berndt said, but its use has spread across the country. Firefighters use it for training, and regular citizens use it to find out what is happening in their neighborhoods.
“[OpenMHz] started with personal curiosity and I think it has somehow grown. There are a lot of people who just want to be more aware of what’s going on in their community and what’s around them, and to understand what’s going on,” he said.
For Jackson, that meant compiling crash data to support better-informed policy decisions. Good Hope Road SE, she said, is an example of a trouble spot that DDOT data often misses. The Crash Bot project showed that DDOT reported three crashes in the area over the last six weeks, while the project detected at least five. She hopes such concrete examples can help residents demonstrate urgency when they submit traffic safety requests, or that DDOT will use the project data itself.
“Ideally, DDOT will start using this data when making these infrastructure and policy decisions about what changes to make on our streets. Because there’s a very different percentage of unreported crashes per ward,” Jackson said. “And the data that DDOT is currently using for East of [Anacostia] river … you can see exactly how incomplete it is.”
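The per-ward comparison Jackson describes can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `Crash` record, the matching rule (a detected crash counts as reported if an official record falls within 30 minutes and 200 meters of it) and the function names are hypothetical, not the project's actual method or schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Crash:
    """Hypothetical minimal crash record: time, location and ward."""
    when: datetime
    lat: float
    lon: float
    ward: int


def distance_m(a: Crash, b: Crash) -> float:
    """Great-circle distance in meters between two crashes (haversine)."""
    earth_radius_m = 6_371_000
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(h))


def is_reported(detected: Crash, official: list[Crash],
                max_gap: timedelta = timedelta(minutes=30),
                max_dist_m: float = 200.0) -> bool:
    """Assumed rule: some official record is close in both time and space."""
    return any(
        abs(detected.when - o.when) <= max_gap and distance_m(detected, o) <= max_dist_m
        for o in official
    )


def unreported_share_by_ward(detected: list[Crash], official: list[Crash]) -> dict[int, float]:
    """Fraction of detected crashes in each ward with no matching official record."""
    totals: dict[int, int] = defaultdict(int)
    missing: dict[int, int] = defaultdict(int)
    for crash in detected:
        totals[crash.ward] += 1
        if not is_reported(crash, official):
            missing[crash.ward] += 1
    return {ward: missing[ward] / totals[ward] for ward in totals}
```

With counts like the Good Hope Road SE example, five detected crashes and three matching official records, a calculation of this kind would give an unreported share of 40 percent for that area. The time and distance thresholds are arbitrary choices that a real analysis would need to tune.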
Berndt added that uses such as the Code for DC project are an important aspect of OpenMHz, one he would not want to lose if the site were fully encrypted.
“Having access to this data and information, and not just as a boring record, but also the actual calls and the kind of emotions and context around things, is really important for simply telling the civic history around cities and the people who work in them,” Berndt said.