An ISE capstone team delivers a proof of concept for more efficient time studies for Starbucks

By: Amy Sprague
May 19, 2022

A tall order

The ISE capstone team working for Starbucks had a very broad assignment – to create a model to help automate time and motion studies for Starbucks through machine learning (ML). Researcher Audrey Slater (ISE ‘21) said, “We knew we were going to use ML, but how we were going to do that and apply it to the Starbucks customer experience was a big question mark.”

This challenge of defining scope and process was intentional. Their mentor from Starbucks, Systems Engineering Manager Erik Anderson (ISE ‘04), said, “My approach with the team was somewhat vague deliberately. Every day at Starbucks, I am asked to solve very vague problems, so learning how to navigate that ambiguity to produce something useful is part of the process.”

Time and motion studies

Anderson noted, “The very foundation core of industrial engineering, our subject matter, our expertise, our value, basically everything we do starts at measuring. We measure everything. And while it’s the core of everything, it’s not very technical. But it’s inserting the improvement into the process that taps the higher level engineering toolkit.”

Businesses commonly use time and motion studies to improve efficiency and productivity, but the studies are labor-intensive, error-prone, and resource-intensive. And because tasks can take fractions of a second, human observation easily skews data and introduces bias. The challenge for the ISE capstone was to see whether ML could automate time and motion studies and address all three of those deficiencies.

https://youtu.be/-UtGbLmD4bQ

Caption: Team member Sara Stavaski illustrates a traditional time and motion study, which is observing an action and timing that action with a stopwatch.

Confronting limitations of time, expertise and budget

Slater remembers the initial period of the project as daunting. “We were not that familiar with machine learning, but we knew generally that developing an algorithm for a model from scratch and obtaining and training the system on the data is super time consuming. We would have to be much higher level software engineers to pull that off. And really good ML systems are really expensive, so we needed a better solution.”

The solution they chose was DeepHAR, an open-source machine learning model that they could train to identify a targeted barista activity. The team chose to focus on the “lid-to-handoff” motion. This motion covers the time from when the barista puts the lid on a cup to when the barista sets the cup down on the counter for customer pickup. Training the program to identify whether this action was happening or not would help them deliver a proof of concept to Starbucks.

They set up in the Tryer Center, Starbucks’ main user experience laboratory. The team recorded videos as four baristas each performed the lid-to-handoff motion twenty times. Then, they added twenty videos of other actions to train their model. Together, these 100 videos produced over 12,000 video frames that had to be manually labeled in a binary system as “lid-to-handoff” or “not lid-to-handoff.”
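To show how frame-by-frame binary labels can stand in for a stopwatch, here is a minimal Python sketch. It is an illustration only, not the team's code: the function name `motion_durations`, the 30 frames-per-second rate, and the sample clip are all assumptions made for this example.

```python
from itertools import groupby


def motion_durations(frame_labels, fps=30.0):
    """Return the length in seconds of each unbroken run of positive frames.

    frame_labels: one boolean per video frame, True when the frame is
                  labeled "lid-to-handoff" and False otherwise.
    fps:          frames per second of the recording (assumed value; the
                  article does not state the cameras' frame rate).
    """
    durations = []
    # groupby collapses consecutive identical labels into runs.
    for is_motion, run in groupby(frame_labels):
        if is_motion:
            frame_count = sum(1 for _ in run)
            durations.append(frame_count / fps)
    return durations


# Hypothetical clip: 15 frames of other activity, 90 frames of the
# lid-to-handoff motion, then 15 more frames of other activity.
clip = [False] * 15 + [True] * 90 + [False] * 15
print(motion_durations(clip))  # [3.0]
```

Under these assumptions, the 90 labeled frames translate into a 3-second observation, which is the kind of measurement a person with a stopwatch would otherwise record by hand.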

https://youtu.be/nDQUYu_THbc

Caption: Team member Bill Zhao demonstrates the “lid-to-handoff,” the key activity the team focused on, which lasted from the time the barista picks up a lid to the time the barista sets the cup down on the counter for customer pickup.

https://youtu.be/M7NOLzbXBcc

Caption: The team labeled the frames of the videos as either “Lid to handoff” if the barista was currently performing the lid-to-handoff motion or “unknown” if the barista had not started the action yet, had already finished the action, or was performing a different action. They used the Microsoft software VoTT to do the labeling.

https://www.youtube.com/watch?v=6JdTrfPjIzs

Caption: The team set up a mock barista station in Starbucks’ Tryer Center with two Microsoft Kinect DKs to film baristas. They used the video data with established machine learning algorithms.

“Grande” results

The team reported that the model recognized the lid-to-handoff motion with 95 percent accuracy. On timestamping, they reached 71 percent accuracy, which surpassed their Starbucks target, though the team reports that more training on a larger set of actions would improve performance. Overall, the team succeeded in proving that machine learning can be an effective tool for time and motion studies.

Slater had some final reflections on the capstone’s valuable lessons. “We were really constrained with time and worried about making the wrong decisions, but at a certain point, you just need to make the decision, move forward, and be confident with it. We followed our timeline and reassessed as things went right or wrong, and we just figured out how to get that final deliverable.” And of that deliverable, Anderson says, “They created something that was just fantastic.”

The researchers of this award-winning 2021 ISE capstone team were Daelyn Bergsman, Will Locatelli, Audrey Slater, Sara Stavaski, Payam Vafadari, and Bill Zhao. Faculty advisers were Patty Buchanan and Prashanth Rajivan.