Open data from transportation networks is improving the ways we understand how people move through their environments.
Adil Yalcin said he was always frustrated that so many data-analysis programs require the user to pick the design of the data visualization. In other words, users had to decide whether they wanted to view the information as a line graph, a chart, a map, or something else.
In an effort to simplify the process, the student and member of the Human-Computer Interaction Lab (HCIL) at the University of Maryland spent nearly two years creating Keshif, a data-analysis program that automatically chooses the best visualization.
“There is a lot of interest in creating visualizations easily and exploring data easily, but there is still an emphasis on the richness of our tools and offering a lot of design alternatives,” Yalcin said. “Actually all of these alternatives … create a lot of extra complexity.”
Originally, the tool was created to analyze any data set, but when he came across the data provided by government transit organizations at a Transportation Techies data hack night, he started to look into what data was available for D.C. It turned out that PlanItMetro’s data spreadsheets were a gold mine.
“What kind of perspectives can I offer?” Yalcin said. “The answer was immediate from the data structure: How many people got into station A and got into Station B and then look at that on the weekday, the weekend, and so on.”
One finding from his results, presented at last month’s Transportation Techies: In the thousands of trips starting from Gallery Place during midday, Union Station is the most popular destination while Farragut North stands out as the second most popular. What is driving that traffic?
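For readers who want to try this kind of question on their own data, here is a minimal Python sketch of the tally behind it: count trips between station pairs, then rank the destinations for one origin, day type, and time period. It is not Keshif's code. The record layout, the `top_destinations` helper, and every number in `SAMPLE_TRIPS` are assumptions invented for illustration, not actual Metro ridership figures.

```python
from collections import defaultdict

# Hypothetical origin-destination records:
# (origin, destination, day type, time period, trips).
# All trip counts are invented for illustration only.
SAMPLE_TRIPS = [
    ("Gallery Place", "Union Station", "weekday", "midday", 120),
    ("Gallery Place", "Farragut North", "weekday", "midday", 95),
    ("Gallery Place", "Metro Center", "weekday", "midday", 60),
    ("Gallery Place", "Union Station", "weekend", "midday", 40),
    ("Gallery Place", "Farragut North", "weekday", "am_peak", 30),
    ("Union Station", "Gallery Place", "weekday", "midday", 110),
]


def top_destinations(records, origin, day_type, period, limit=3):
    """Rank destinations by total trips from one origin for a day type and period."""
    totals = defaultdict(int)
    for rec_origin, destination, rec_day, rec_period, trips in records:
        if (rec_origin, rec_day, rec_period) == (origin, day_type, period):
            totals[destination] += trips
    # Highest trip count first; ties are broken alphabetically by station name.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    for station, trips in top_destinations(
        SAMPLE_TRIPS, "Gallery Place", "weekday", "midday"
    ):
        print(f"{station}: {trips}")
```

Swapping the day type or time period in the call gives the weekday-versus-weekend comparison Yalcin describes, assuming the real spreadsheets can be reshaped into rows like these.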
After the event, he spoke with representatives of BikeArlington and created a second visualization for them using data from Capital Bikeshare.
Currently, Keshif is a compilation of visualizations that Yalcin created himself, but the goal is to let any user take the data sets they are interested in and run them through the tool to get the best possible visualization. He said support from his advisors, Ben Bederson and Niklas Elmqvist, has been key to creating a product that is as easy as possible for people who are not computer scientists to use.
“To come to the stage where [it] becomes very intuitive and natural, it requires a lot of preparing in advance and also getting feedback from a lot of people,” Yalcin said. “Both professors and professionals in the field as well as people who might be caring about the data but who might not have the visualization or design skills, or even the broader public because that is what I’m trying to appeal to.”
He plans to conduct user studies of Keshif this summer and said he would be more than happy to speak to anyone who would like to participate in the user study or who has data that they want to explore.
And exploration is key. After all, that’s what “keshif” means in Yalcin’s native Turkish.
For more, here are ridership data sets for the D.C. Metro and Capital Bikeshare.
Photo by M.V. Jantzen