Your bus ETA courtesy of machine learning

By Amy Sprague
October 21, 2020

Starting small

city bus

Photo by SounderBruce from Seattle, United States / CC BY-SA

Your future self could have an easier time catching a bus, thanks to an ISE capstone project. Accurate arrival estimates matter to city bus riders: an average of 23 percent of King County Metro buses arrive behind schedule, and research shows that two-thirds of potential riders are more likely to choose the bus if they can access real-time updates.

King County Metro knows this well. The open-source app OneBusAway, the go-to tool for riders checking bus status, relies on the agency’s data as a main source. While the app works well, King County Metro connected with an ISE capstone team to explore using machine learning to automate arrival predictions and boost their accuracy. And, according to King County Metro, the team delivered a convincing proof of concept.

The current bus arrival prediction system takes a straightforward approach: it uses the distance and estimated speed of the bus on each segment of the route to estimate arrival times. The ISE team suspected they could do better with a machine learning program that incorporates more of the variables affecting bus routes, groups the route sections into clusters, and applies an individualized algorithm to each cluster.
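As a rough illustration of the distance-and-speed approach described above, here is a minimal Python sketch. The `Segment` class, the function name, and the numbers are all hypothetical; the article does not describe how the production system is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One stretch of a route (hypothetical structure for illustration)."""
    length_m: float   # remaining distance on this segment, in meters
    speed_mps: float  # estimated bus speed on this segment, in meters/second


def baseline_eta_seconds(segments: List[Segment]) -> float:
    """Sum of distance / speed over every segment between the bus and the stop."""
    total = 0.0
    for seg in segments:
        if seg.speed_mps <= 0:
            raise ValueError("estimated speed must be positive")
        total += seg.length_m / seg.speed_mps
    return total


# Made-up example: three segments that each take 100 seconds to cover.
route = [Segment(600.0, 6.0), Segment(900.0, 9.0), Segment(400.0, 4.0)]
eta = baseline_eta_seconds(route)
print(f"Estimated arrival in {eta:.0f} seconds")
```

A model like this only knows about distance and speed, which is why adding variables such as passenger load could plausibly improve on it.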

ISE capstone team member Vicky Tseng (ISE ’20) noted that, as a first step toward proving the concept, the team decided to focus on one of the city’s RapidRide bus routes, which provided extensive trip data for analysis: essentially stop arrival times combined with GPS coordinates and a tally of passengers boarding at each stop. Focusing on one route would be enough to try out the variables, tweak the methodology, and establish the promise of machine learning for city transit.

Selecting and adapting the right data

graph chart

The team applied Filter-Based Feature Selection, which determined that passenger load (the number of people on the bus) affects the bus arrival time more than any other variable.

While King County Metro’s data was crucial, the team suspected other variables that were not included in that dataset were important too. Conveniently, team member Aman Ankit (ISE ’20) had completed an internship with the Seattle Department of Transportation (SDOT) and remembered that the agency provided public access to road data, including road surface, width, speed limits, slopes, traffic volume, and more.

This raw data, from both King County Metro and SDOT, required substantial “cleaning up” before it could be fed into the machine learning software. Ankit spearheaded that effort by writing a Python program that automates the conversion of the data into a format easily digestible by various machine learning algorithms.

From both data sources, the team considered a total of 67 variables, then experimented with the data and analyzed the results. Using a method called “Filter-Based Feature Selection,” they brought the number of variables down to 15. They also found that the most impactful variable for bus arrival was passenger load, that is, how many people were on the bus.
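To give a sense of how filter-based feature selection works in general, the sketch below scores each candidate variable against the target on its own and keeps the top scorers. The scoring metric (absolute Pearson correlation), the feature names, and the data are assumptions made for illustration; the article does not say which metric or tool the team used.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(
    features: Dict[str, List[float]], target: List[float], k: int
) -> Tuple[List[str], Dict[str, float]]:
    """Score each feature independently of the others, then keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Made-up toy data: arrival delay in seconds and three candidate variables.
delay = [30.0, 60.0, 90.0, 120.0, 150.0]
candidates = {
    "passenger_load": [10.0, 20.0, 30.0, 40.0, 50.0],
    "road_width_m": [3.0, 5.0, 2.0, 4.0, 3.0],
    "speed_limit": [30.0, 30.0, 30.0, 30.0, 30.0],
}
selected, scores = select_top_k(candidates, delay, k=2)
print(selected)
```

Because each variable is scored without training a full model, this kind of filter is cheap to run over dozens of candidates, which suits a reduction like the one from 67 variables to 15.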

Next, the team divided the bus route into clusters primarily determined by the passenger load and time of day. They tested different numbers of clusters to understand what size resulted in the most accurate predictions.
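The idea of one model per cluster can be sketched as follows. This is a deliberately simplified stand-in, not the team's method: it uses fixed rule-based buckets rather than a clustering algorithm, and a plain average rather than a trained model. The peak hours, the load threshold, and the observations are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

ClusterKey = Tuple[str, str]


def cluster_key(hour: int, load: int, load_threshold: int = 30) -> ClusterKey:
    """Assign an observation to a cluster by time of day and passenger load."""
    period = "peak" if 7 <= hour < 10 or 16 <= hour < 19 else "offpeak"
    crowding = "high" if load >= load_threshold else "low"
    return (period, crowding)


def fit_cluster_means(
    observations: List[Tuple[int, int, float]]
) -> Dict[ClusterKey, float]:
    """Fit the simplest possible per-cluster model: the mean travel time."""
    grouped = defaultdict(list)
    for hour, load, travel_s in observations:
        grouped[cluster_key(hour, load)].append(travel_s)
    return {key: mean(values) for key, values in grouped.items()}


def predict(
    models: Dict[ClusterKey, float], hour: int, load: int, fallback: float
) -> float:
    """Use the matching cluster's model, or a fallback for unseen clusters."""
    return models.get(cluster_key(hour, load), fallback)


# Made-up observations: (hour of day, passengers on board, travel time in seconds).
observations = [
    (8, 45, 200.0),
    (8, 50, 220.0),
    (13, 10, 120.0),
    (14, 12, 140.0),
    (17, 20, 160.0),
]
models = fit_cluster_means(observations)
print(predict(models, hour=9, load=40, fallback=150.0))
```

Changing the bucket boundaries changes how many clusters exist and how much data each one sees, which is the kind of trade-off the team explored when testing different numbers of clusters.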

Tseng says, “We ran our data through the program with our new variables, and we could then compare past predictions of bus arrivals with the actual arrivals. Our simulations scored a 74 percent improvement in accuracy over the current prediction system.”
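A comparison like the one Tseng describes can be expressed as an error metric computed for both prediction systems against the actual arrivals. The sketch below assumes mean absolute error as the metric, which the article does not specify, and uses made-up numbers that are not the team's results.

```python
from typing import List


def mean_absolute_error(predicted: List[float], actual: List[float]) -> float:
    """Average size of the gap between predicted and actual arrival times."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def percent_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction in error, as a percentage of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up arrival times, in seconds after the start of the trip.
actual = [300.0, 420.0, 510.0, 600.0]
baseline_pred = [360.0, 360.0, 570.0, 540.0]  # each off by 60 seconds
model_pred = [330.0, 390.0, 540.0, 570.0]     # each off by 30 seconds

baseline_mae = mean_absolute_error(baseline_pred, actual)
model_mae = mean_absolute_error(model_pred, actual)
improvement = percent_improvement(baseline_mae, model_mae)
print(f"Improvement over baseline: {improvement:.0f} percent")
```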

Set up for future work

Tseng noted that this project proved the promise of these methods and that the work is ready for a future team to use and expand. The Python program the team developed is designed to be handed off as well. The project could expand to more bus routes to test the methodology further.

King County Metro’s Dr. Yi Mi agrees and shares the reaction of the agency’s managers: “No one expected the students to prove the concept in just one project. We were impressed. The students validated machine learning as a promising approach to provide our riders with more accurate information. The team established a framework and methodology that we would like to see advanced in the future.”

This ISE capstone team included Vicky Tseng, Aman Ankit, Joey Moran, Brad Luong, Gol-Dann Slater, Muse Wu, and Kai Tan with adviser Patty Buchanan. To see more about the methodologies and results of this project, check out the team’s final project poster.