This site is about: (1) my professional self, (2) my research into cognition and (3) musings about the intersection of cognition and design.
Jason H. Wong
Basic cognitive research is a necessary component of successful user-centered design. Only through scientific thinking can we make technology intuitive and productive. My goal is to integrate basic research with useful applications.
Those automated voice menu systems
I had a heck of a time with a voice menu system earlier today. My flight from Chicago O’Hare (ORD) to Washington National (DCA) was canceled. I had to call United first to make sure that it was, and that required me to say my flight number, “616.” The system only heard “16” and I had to sit through the entire flight information before I could restart and say the correct number. Then, once I confirmed that the flight was canceled, I had to listen to 5 menu options to try to rebook a flight. In my opinion, rebooking is too complicated to be done through a computer - I needed an agent. So I just gave up and started saying “Representative!” Then I had to listen to the system ask for confirmation, then ask me if I wanted to do a customer satisfaction survey. Then it transferred me, then I got a busy signal and was hung up on. And then I had to do it again. The entire process took me about 30 minutes, which gave me plenty of time to consider the content of this post.
First, some background:
The great thing about vision is that information can be presented in parallel: a single screen can display a ridiculous amount of information to absorb. Yes, attention is often a serial process, but the temporal dynamics of attention are under our control (as in, we choose how long we focus on something).
Audition, however, is a serial process. A spoken sentence is presented one word at a time, so you need to focus and remember every word. Also, the speed/loudness/etc. of the spoken information is not under our control.
There are two design choices I noted in United’s system that are necessary evils.
1.) The recorded voice speaks very slowly. This enables everyone who calls to understand the speaker. It also makes the already slow process of audition even slower, annoying people who want the speaker to go faster. But offering an option to speed up the voice would be complicated enough that it likely isn’t practical. The slow pace is only an annoyance to some of the population, which is better than leaving a portion of the population completely unable to use the system.
2.) The system allows voice responses along with keypad responses. This is useful because, otherwise, people would have to move the phone away from their face to press buttons. Being able to speak back when spoken to reinforces the idea that you’re having a conversation. However, voice recognition systems are not that great. You have to speak slowly and clearly. If I can already anticipate the question, I will speak my answer right away, and sometimes the voice system is not ready to pick that up immediately. Key presses, by contrast, produce only a small, fixed set of tones, making recognition much easier. This is likely more of a technological problem than a human factors one.
These design choices are coping mechanisms for the big problem with phone menu systems, one that is virtually unavoidable. You are trying to present multiple options to the user, and the user has to choose only one of them. This cries out for a parallel presentation of each possible option. However, since the phone is auditory, a serial presentation must be used, requiring working memory for which option is #1, which option is #2, and so on. And because users are impatient and don’t want to wait to see if option #7 is best for them, they may end up navigating to the wrong place. I have never come across a phone menu system that gets around this. The best solution I’ve seen is to keep pressing zero, saying “Representative”, or sounding angry. Some systems will automatically forward you to a person if you sound angry. That is pretty cool.
I personally haven’t encountered any good solutions to this problem. Likely, some good solutions that have not been widely implemented are:
1.) Keep instructions short for people who can catch on quickly, but allow for the person to say “Help” in order to get a more thorough explanation.
2.) Really limit the number of options people have to remember. Don’t have a menu that has 8 options. Keep the number of options to fewer than 4, if possible. Users who are patient enough to listen to all of them may have a tough time deciding which option to choose, and impatient users will pick something just to get the system to stop talking.
3.) Improve voice recognition systems so that, instead of requiring users to speak a predefined option, they allow the user to speak a few key phrases. Then the system could interpret those keywords and direct the user to the right place. It would be akin to searching the Internet.
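To make the third idea a little more concrete, here is a minimal sketch in Python of what keyword routing might look like once speech has already been turned into text. Everything in it is an assumption for illustration: the destination names, the keyword lists, and the `route_call` function are hypothetical and not taken from United’s system or any real product. It simply counts keyword matches per destination and falls back to a human agent when nothing matches or the result is ambiguous.

```python
import re

# Hypothetical destinations and trigger keywords, invented for illustration.
ROUTES = {
    "rebooking": {"rebook", "canceled", "cancelled", "missed", "change"},
    "flight_status": {"status", "delayed", "arrival", "departure", "gate"},
    "baggage": {"bag", "bags", "baggage", "luggage", "lost"},
}

# Words that should always reach a person (also an assumption).
AGENT_WORDS = {"representative", "agent", "human", "operator"}


def route_call(utterance: str) -> str:
    """Pick a destination from free-form caller text by keyword overlap."""
    # Lowercase the text and split it into a set of plain words.
    words = set(re.findall(r"[a-z]+", utterance.lower()))

    # An explicit request for a person wins over everything else.
    if words & AGENT_WORDS:
        return "agent"

    # Score each destination by how many of its keywords were spoken.
    scores = {dest: len(words & keywords) for dest, keywords in ROUTES.items()}
    best_score = max(scores.values())
    winners = [dest for dest, score in scores.items() if score == best_score]

    # No match, or a tie between destinations: hand off to a person
    # rather than guess and send the caller to the wrong place.
    if best_score == 0 or len(winners) > 1:
        return "agent"
    return winners[0]


if __name__ == "__main__":
    print(route_call("My flight was canceled and I need to rebook"))
    print(route_call("Representative!"))
```

The fallback to an agent is a deliberate choice in this sketch: a wrong guess costs the caller another trip through the menu, which is exactly the frustration described above.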
Here’s hoping that these design and technological issues are being worked on and implemented quickly! Customer service may actually become bearable, then.