This site is about: (1) my professional self, (2) my research into cognition and (3) musings about the intersection of cognition and design.
Jason H. Wong
Basic cognitive research is a necessary component of successful user-centered design. Only through scientific thinking can we make technology intuitive and productive. My goal is to integrate basic research with useful applications.
NASCAR: The necessity of top-notch vision.
So you’re driving down a racetrack at 200 mph. Things are flying past you at phenomenal speed, and you need to make sense of it all. Do you need spectacular vision? Sure. But it’s not just acuity (how good your vision is where your visual focus is) that matters. What also matters is how well you can process things in the periphery. From an article about NASCAR driver Tony Stewart in the New York Times:
For starters, Stewart has superb eyesight — 20/13 in one eye, 20/15 in the other — but it’s not visual acuity that matters so much as a driver’s ability to process everything that drifts into his periphery while he travels at 200 m.p.h. “A driver has to know what’s unfolding in front of him at a rate of a football field a second,” says Dr. Stephen Olvey, a founding fellow of the F.I.A. Institute for Motor Sport Safety.
When there is so much optic flow (a technical term meaning “stuff passing you as you are in motion”), it makes sense that you have to be able to deal with something that suddenly appears in your peripheral vision, like another car, debris, or the wall. Not only do you have to detect that event, but you also have to react to it. You need to make a saccadic eye movement to bring the event from the periphery into the fovea, the small portion of central vision where acuity is best.
A normal person can make a saccade within 250 ms (a rough estimate), or one-quarter of a second. 200 mph is about 293 feet per second, so in the time it takes to make a single eye movement, you’ve traveled about 73 feet. Add to that the fact that you’re effectively blind during a saccade, and suddenly 73 feet have passed before you know it.
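The arithmetic above is easy to check. This is a minimal sketch; the 250 ms saccade time is the rough estimate from this post, not a measured constant, and it is exposed as a parameter so you can try other values.

```python
# Distance covered during one eye movement at racing speed.
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def mph_to_fps(mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR

def feet_during_saccade(mph: float, saccade_s: float = 0.25) -> float:
    """Feet traveled while one saccade is made (assumed to take 250 ms)."""
    return mph_to_fps(mph) * saccade_s

speed = mph_to_fps(200)              # about 293.3 ft/s
distance = feet_during_saccade(200)  # about 73.3 ft
print(f"{speed:.1f} ft/s, {distance:.1f} ft per saccade")
```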
Want to know how good your peripheral vision is? The New York Times article mentioned above has an excellent demonstration that tests your peripheral vision and how quickly you can move your eyes. And if your performance is not as good as you’d like, you can train like a NASCAR champ:
Greg Zipadelli, Stewart’s crew chief, says his driver hones his talent with a popular training tool: PlayStation.
Thanks to John Fedota for the link to the original New York Times article!
Dueling Monitors
A study out of the University of Utah, written up in the Wall Street Journal’s Business Technology Blog, showed that bigger monitors led to faster completion of document-editing and spreadsheet tasks. Three setups were compared: an 18-inch monitor, a 24-inch monitor, and two 20-inch monitors. Compared with the 18-inch monitor, people were 52% faster with the 24-inch monitor and 44% faster with the two 20-inch monitors.
This is expected: bigger is better. What interests me is the roughly 6% improvement from moving from two 20-inch monitors to a single 24-inch monitor. Two 20-inch monitors provide much more screen space, but it’s not just size that matters.
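The roughly 6% figure follows if “X% faster” is read as a speed ratio against the common 18-inch baseline (an assumption about how the study reported it): 1.52 / 1.44 ≈ 1.056, rather than the 8-point gap you get by simply subtracting. The sketch also compares total screen area, assuming every monitor has the same aspect ratio so that area scales with the square of the diagonal.

```python
def relative_gain(faster_a: float, faster_b: float) -> float:
    """Speed gain of setup A over setup B, when each is 'X% faster' than one baseline.
    Assumes 'X% faster' means speed = baseline_speed * (1 + X)."""
    return (1 + faster_a) / (1 + faster_b) - 1

def relative_area(diagonals_a, diagonals_b) -> float:
    """Ratio of total screen area, assuming all screens share one aspect ratio."""
    return sum(d * d for d in diagonals_a) / sum(d * d for d in diagonals_b)

gain = relative_gain(0.52, 0.44)      # single 24-inch vs. two 20-inch
area = relative_area([20, 20], [24])  # two 20-inch vs. single 24-inch
print(f"24-inch is {gain:.1%} faster; dual 20-inch has {area:.2f}x the area")
```

So the dual setup has about 39% more area under that assumption, yet the single monitor still comes out ahead.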
Egly, Driver & Rafal (1994) were the first researchers to show the existence of object-based attention. That is, attention does not just form a spotlight (or zoom lens) that illuminates a particular portion of the visual field. Instead, attention can also mold itself to encompass a specific object, and there is a cost in switching between objects. The methodology they used was particularly ingenious.
The task was simply to detect a block that would appear in one corner of either rectangle (see below). That was it: press a button when you see the block (rightmost panel). Before the block appeared, though, other things happened. In the second panel, you see that one corner of one object was cued: it suddenly turned red. Participants did NOT have to respond to the cue, only to the block. So participants started a trial, saw a red cue, waited, and then a block would flash. Reaction time, meaning how long it took participants to press a button after the block flashed, was the primary measure.
The red cue served to prime attention to a certain location. In the example above, the red cue and the block target were in the same location, and reaction time was fastest. However, sometimes, the block could appear elsewhere. There are two critical conditions:
- The block was at the other end of the same object as the cue.
- The block was at the same end of the other object as the cue.
Critically, in these two conditions the block and cue are exactly the same distance apart. If attention were purely spatial and did not care about objects, reaction times in both conditions should be the same. This was not the case, though. Instead, participants were faster at detecting the block when it was located at the other end of the same object as the cue. The cue brought attention to that location, and then attention spread to the entire object. Therefore, when the target appeared on the same object, reaction time was faster. When the target appeared on the other object, attention had to be switched, and this led to slower reaction times.
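The logic of the two critical conditions can be made concrete with a small sketch. The coordinates are hypothetical (two vertical rectangles side by side, spaced so the gap between them equals each rectangle’s length); they are not the actual dimensions used by Egly, Driver & Rafal.

```python
import math

# Hypothetical layout: each rectangle is described by its two ends as (x, y).
RECTANGLES = {
    "left": [(-1.0, 1.0), (-1.0, -1.0)],
    "right": [(1.0, 1.0), (1.0, -1.0)],
}

def locate(point):
    """Return the name of the rectangle that has an end at this point."""
    for name, ends in RECTANGLES.items():
        if point in ends:
            return name
    raise ValueError(f"{point} is not a rectangle end")

def classify(cue, target):
    """Label a trial's condition and report the cue-to-target distance."""
    distance = math.dist(cue, target)
    if cue == target:
        condition = "valid"  # target appears exactly where the cue was
    elif locate(cue) == locate(target):
        condition = "invalid, same object"
    else:
        condition = "invalid, different object"
    return condition, distance

cue = (-1.0, 1.0)  # top end of the left rectangle
for target in [(-1.0, 1.0), (-1.0, -1.0), (1.0, 1.0)]:
    print(classify(cue, target))
```

The last two trials have identical distances, so a purely spatial account predicts identical reaction times; the object-based finding is that the same-object trial is faster.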
So what does this all mean for the research at hand? Two 20-inch monitors are two separate objects. Even if they are both placed perfectly in your field of view, you will have to make eye movements and shift attention between the two monitors. That is going to slow you down more than a single object (a single 24-inch monitor) in front of you would, because with one monitor you don’t have to switch your attention between objects.
This does raise the question, however: what exactly constitutes an object? Two separate physical monitors are certainly two objects. But if you have two spreadsheets open and are copying data from one to the other, does that count as switching between objects? Would it be better to copy and paste inside one spreadsheet and then make one large copy and paste into the new spreadsheet at the end? I don’t know the answers to these questions, but they are certainly worthy of research.
Surprise enemies!
In gaming, most levels work from a script: the player passes a certain point, then an enemy pops out of a doorway. However, researchers are using eye-tracking technology and our understanding of how eye movements and attention interact to display enemies where they are least likely to be noticed. Even though the eyes are focused on a particular location, attention could be elsewhere. As a great deal of attention-capture research has shown, an irrelevant object popping into existence can capture attention and the eyes even if the person is not looking there.
To learn how to predict where a person’s attention was focused, the pair tested subjects’ reactions to an image suddenly appearing on the computer screen under different circumstances.
The experiments showed two things. First, when someone is looking at a fixed point in a complex part of a scene, they find it harder to divert their attention to a new object. Second, the researchers confirmed previous research suggesting that when looking at a moving object, people tend to focus their attention slightly ahead of it.
Those results were used to design a first-person shoot ‘em up game that could choose to make enemies appear in places where they would be either easy or hard to see. The game tracks a player’s eyes to work out areas they are paying most, and least, attention to.
http://technology.newscientist.com/article.ns?id=dn13264&print=true
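The New Scientist article does not describe the game’s code, so the following is only a hypothetical sketch of the idea: estimate where attention is (shifted slightly ahead of a moving object the player is watching, per the second finding), then pick the candidate spawn point farthest from it for a hard-to-see enemy, or nearest to it for an easy one. The lead time, the pixel coordinates, and the use of plain distance are all assumptions, and the sketch ignores the first finding about scene complexity.

```python
import math

def attention_locus(gaze, velocity, lead_s=0.1):
    """Estimate the attended point as the gaze position shifted ahead along the
    watched object's motion. lead_s is a hypothetical lead time in seconds."""
    return (gaze[0] + velocity[0] * lead_s, gaze[1] + velocity[1] * lead_s)

def choose_spawn(candidates, gaze, velocity, hard=True, lead_s=0.1):
    """Pick the spawn point farthest from (hard) or nearest to (easy)
    the estimated locus of attention."""
    locus = attention_locus(gaze, velocity, lead_s)

    def distance(point):
        return math.dist(point, locus)

    return max(candidates, key=distance) if hard else min(candidates, key=distance)

doors = [(0, 0), (400, 300), (800, 0), (800, 600)]  # hypothetical spawn points
gaze = (700, 100)      # hypothetical eye-tracker reading, in screen pixels
velocity = (500, 0)    # watched object's motion, in pixels per second
print(choose_spawn(doors, gaze, velocity, hard=True))
print(choose_spawn(doors, gaze, velocity, hard=False))
```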
Investing at a Glance: Morningstar Style Boxes
Morningstar is an investment analysis firm that is best known for their in-depth analysis of mutual funds. What makes them relevant here is their fantastic data visualization tools. The company has determined that there are two important dimensions for understanding a particular stock: how big the company is and how quickly it’s likely to grow. They’ve taken these two dimensions and created a nine-square box with one along the x-axis and the other along the y-axis. It is a useful conceptualization, and they use it consistently across their site.
For example, they use this box for each mutual fund they analyze (a mutual fund owns a bunch of stocks, and you buy a piece of the fund). Glancing at the box immediately tells you what kind of companies you are buying. One key to investing is diversification, so you want to own companies in all the different boxes. The box shows where the fund sits on average and also where the majority of its stocks fall.
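A minimal sketch of how a fund’s holdings might be tallied into a 3×3 style box. The cutoffs and the growth score are hypothetical placeholders, not Morningstar’s actual methodology, and holdings are counted equally rather than weighted by position size; the point is only that two scores place every stock in one of nine cells, so a whole fund collapses into a glanceable grid.

```python
from collections import Counter

SIZES = ["small", "mid", "large"]
STYLES = ["value", "blend", "growth"]

def bucket(value, cutoffs):
    """Index 0, 1, or 2 for a value, given two ascending cutoffs."""
    low, high = cutoffs
    return 0 if value < low else (1 if value < high else 2)

def style_box(market_cap_bn, growth_score,
              cap_cutoffs=(2.0, 10.0), growth_cutoffs=(-0.33, 0.33)):
    """Place one stock in the 3x3 grid. All cutoffs are hypothetical."""
    size = SIZES[bucket(market_cap_bn, cap_cutoffs)]
    style = STYLES[bucket(growth_score, growth_cutoffs)]
    return size, style

def fund_profile(holdings):
    """Share of a fund's holdings (by count) in each cell of the style box."""
    counts = Counter(style_box(cap, growth) for cap, growth in holdings)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

# Hypothetical fund: (market cap in $ billions, growth score from -1 to 1)
fund = [(150.0, 0.8), (80.0, 0.5), (5.0, 0.0), (1.2, -0.6)]
for cell, share in sorted(fund_profile(fund).items()):
    print(cell, f"{share:.0%}")
```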
Additionally, they use this style box to visualize market performance at any given time. The Dow Jones Industrial Average that most people follow is made up of 30 stocks; their gains or losses get averaged, and that average is what is reported. However, the stock market is not just those 30 companies. Morningstar computes these averages throughout the day for the companies that fall in each of the nine boxes and displays them as an indication of market performance. At a glance, you can see whether the entire market is doing well or whether just a few areas are bringing up the average.
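The market-wide view works the same way: group stocks by cell and average their returns within each cell. A sketch, assuming each stock’s cell is already known and using a simple unweighted mean (the Dow itself is price-weighted, so this is only illustrative). The snapshot data are made up.

```python
from collections import defaultdict
from statistics import mean

def returns_by_cell(stocks):
    """Average daily return per style-box cell.
    `stocks` is an iterable of (cell, daily_return) pairs."""
    grouped = defaultdict(list)
    for cell, daily_return in stocks:
        grouped[cell].append(daily_return)
    return {cell: mean(values) for cell, values in grouped.items()}

# Hypothetical intraday snapshot: cell labels and returns are invented.
snapshot = [
    (("large", "growth"), 0.012), (("large", "growth"), 0.008),
    (("large", "value"), -0.004), (("small", "value"), -0.010),
    (("small", "growth"), 0.002),
]
overall = mean(r for _, r in snapshot)
print(f"overall average: {overall:+.2%}")
for cell, avg in sorted(returns_by_cell(snapshot).items()):
    print(cell, f"{avg:+.2%}")
```

Here the single overall number is slightly positive even though two of the four cells are down, which is exactly what the nine-box display makes visible.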
OK, so what does this have to do with cognition? Excellent visualizations convey information immediately. Once you understand the axes and boxes, you can get an immediate sense of how the market is doing, what kind of mutual fund you are buying, or a variety of other information. You can even see information displayed over time.
As a final and decidedly older example of data visualization, consider Charles Joseph Minard’s 1869 thematic map of Napoleon’s march on Moscow. You don’t even need to be able to read the text to understand the graph. The leftmost point is the start and end of Napoleon’s march. The rightmost point is Moscow. The width of the thick line represents the number of men in his army. Watching that number dwindle as he marches to Moscow and back is astounding. A line graph would have conveyed this information over time as well, but tying the data points to geographic locations adds impact and presents additional information that can be taken in at a single glance. It’s a classic in the field and worthy of study.
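Minard’s key encoding, band width proportional to the number of soldiers, is a simple linear scale. A sketch, with hypothetical troop counts and a hypothetical maximum width in pixels (these are not values read off the map):

```python
def band_widths(counts, max_width_px=40.0):
    """Map troop counts to band widths so that width is proportional to count.
    The largest count gets max_width_px, a hypothetical drawing parameter."""
    largest = max(counts)
    return [max_width_px * c / largest for c in counts]

# Hypothetical counts at successive points along the route.
counts = [400_000, 200_000, 100_000, 50_000, 10_000]
for count, width in zip(counts, band_widths(counts)):
    print(f"{count:>7,} men -> {width:5.1f} px")
```

Because width scales linearly with count, the shrinking band can be compared by eye from any point on the route to any other.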
Numbers are difficult for our brains to grasp intuitively; this is why data needs to be displayed in a chart or graph. When that graph is consistent and easily readable, an amazing amount of information can be gleaned in a very short amount of time. The psychology of data visualization and graph comprehension is a relatively new field, but it is squarely within Human Factors.
Those automated voice menu systems
I had a heck of a time with a voice menu system earlier today. My flight from Chicago O’Hare (ORD) to Washington National (DCA) was canceled. I had to call United first to make sure that it was, and that required me to say my flight number, “616.” The system only heard “16,” and I had to sit through all the information for that flight before I could restart and say the correct number. Then, once I confirmed that the flight was canceled, I had to listen to 5 menu options to try to rebook a flight. In my opinion, rebooking is too complicated to be done through a computer: I needed an agent. So I just gave up and started saying “Representative!” Then I had to listen to the system ask for confirmation, then ask me if I wanted to do a customer satisfaction survey. Then it transferred me, I got a busy signal, and I was hung up on. And then I had to do it all again. The entire process took me about 30 minutes, which gave me plenty of time to consider the content of this post.
First, some background:
The great thing about vision is that information can be presented in parallel: a single screen can display a ridiculous amount of information to absorb. Yes, attention is often a serial process, but the temporal dynamics of attention are under our control (as in, we choose how long we focus on something).
Audition, however, is a serial process. A spoken sentence is presented one word at a time, so you need to focus and remember every word. Also, the speed/loudness/etc. of the spoken information is not under our control.
There are two design choices I noted in United’s system that are necessary evils.
1.) The recorded voice speaks very slowly. This enables everyone who calls to understand the speaker. It also makes the already slow process of audition even slower, annoying people who want the speaker to go faster. But it would be so complicated to offer a way to make the voice speak faster that it likely isn’t feasible. This is only an annoyance to some of the population, which is better than leaving a portion of the population completely unable to use the system.
2.) The system allows voice responses along with keypad responses. This is useful because otherwise people would have to move the phone away from their face to press buttons. Being able to speak back when spoken to also reinforces the idea that you’re having a conversation. However, voice recognition systems are not that great. You have to speak slowly and clearly. If I can already anticipate the question, I will speak my answer early, and sometimes a voice system is not ready to pick that up immediately. The number keys on a phone produce a limited set of tones, making recognition much easier. This is likely more of a technological problem than a human factors one.
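The contrast with speech recognition is easy to see in code. Each keypad press sends a fixed pair of tones (dual-tone multi-frequency, or DTMF: one of four low frequencies for the row, one of four high frequencies for the column), so a receiver only has to decide which of eight frequencies are present. The sketch below synthesizes key presses and decodes them with the Goertzel algorithm; the 8 kHz sample rate and 0.1 s tone length are assumptions chosen for illustration.

```python
import math

LOW = [697, 770, 852, 941]       # Hz, one per keypad row
HIGH = [1209, 1336, 1477, 1633]  # Hz, one per keypad column
KEYS = ["123A", "456B", "789C", "*0#D"]

def synthesize(key, rate=8000, n=800):
    """Samples of the two-tone signal for one key (n samples at `rate` Hz)."""
    row = next(i for i, r in enumerate(KEYS) if key in r)
    col = KEYS[row].index(key)
    f1, f2 = LOW[row], HIGH[col]
    return [math.sin(2 * math.pi * f1 * t / rate) + math.sin(2 * math.pi * f2 * t / rate)
            for t in range(n)]

def goertzel_power(samples, freq, rate=8000):
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def decode(samples, rate=8000):
    """Pick the strongest low tone and strongest high tone, then look up the key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i], rate))
    return KEYS[row][col]

print("".join(decode(synthesize(k)) for k in "616"))
```

Eight known frequencies and sixteen possible answers make this a far easier problem than recognizing “616” from arbitrary speech.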
These design choices are coping mechanisms for the big problem with phone menu systems, which is virtually unavoidable. You are trying to present multiple options to the user, and the user has to choose only one of them. This cries out for presenting every option in parallel. However, since the phone is auditory, a serial presentation must be used, and the caller has to hold in working memory which option is #1, which is #2, and so on. And because users are impatient and don’t want to wait to see whether option #7 is best for them, they may end up navigating to the wrong place. I have never come across a phone menu system that gets around this. The best workaround I’ve seen is to keep pressing zero, say “Representative,” or sound angry. Some systems will automatically forward you to a person if you sound angry. That is pretty cool.
Some solutions that could help, but that I have not seen widely implemented, are:
1.) Keep instructions short for people who can catch on quickly, but allow for the person to say “Help” in order to get a more thorough explanation.
2.) Really limit the number of options people have to remember. Don’t have a menu with 8 options; keep the number of options to fewer than 4 if possible. Users who are patient enough to listen to all of them may have a tough time deciding which option to choose, and impatient users will just choose something to get the system to stop talking.
3.) Improve voice recognition so that, instead of requiring users to speak a predefined option, the system lets them say a few key phrases, interprets those keywords, and directs the user to the right place. It would be akin to searching the Internet.
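Suggestions 1 and 3 can be combined in a small sketch: match keywords in whatever the caller said against each destination, and treat “help” as a request for the longer explanation. The destinations, keyword lists, and help text are hypothetical, and a real system would need speech recognition and far more robust language understanding; this only illustrates the routing idea, falling back to a person when nothing matches.

```python
import re

# Hypothetical destinations and the keywords that point to each one.
ROUTES = {
    "flight status": {"status", "delayed", "canceled", "cancelled", "arrival"},
    "rebooking": {"rebook", "change", "another", "new", "canceled", "cancelled"},
    "baggage": {"bag", "bags", "baggage", "luggage", "lost"},
    "agent": {"agent", "representative", "person", "human"},
}
HELP_TEXT = "You can say things like 'my flight was canceled' or 'lost bag'."

def route(utterance):
    """Return the destination whose keywords best match the caller's words."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if "help" in words:
        return HELP_TEXT
    scores = {dest: len(words & keywords) for dest, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Nothing matched: hand the caller to a person rather than guessing.
    return best if scores[best] > 0 else "agent"

print(route("My flight was canceled and I need to rebook"))
print(route("Help"))
print(route("I left my umbrella on the plane"))
```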
Here’s hoping that these design and technological issues are being worked on and implemented quickly! Customer service may actually become bearable, then.





