Evaluating design prototypes in empirical usability tests with human users takes a lot of time, organization and money. What if we could simulate the operation of an interactive prototype using a computer? We at Ergosign asked ourselves this and other questions from the field of model-based usability evaluation as part of the SiBed research project.
Sven, what does SiBed stand for?
Sven: SiBed is short for “simulating the operation of user interfaces based on a cognitive model for the predictive analysis of usability”. To put it simply, Dieter Wallach, Vladimir Albach and I are trying to develop software that can operate an interactive design prototype just like a routine human user would. We can use this simulation software to predict significant usability attributes of the prototype being tested.
How have design prototypes been evaluated up to now?
Sven: Currently, design prototypes are mostly evaluated using analytical approaches such as expert reviews and heuristic analyses. Another common way of testing prototypes is the empirical approach, which involves representative users in usability tests. Within the SiBed research project, we at Ergosign are investigating a third category of analysis that has rarely been used so far: model-based evaluation.
What exactly is “model-based evaluation”?
Sven: It comprises analytical procedures that use scientific models of user behavior to predict the usability of a system. These models range from narrow aspects of performance, such as predicting the duration of a targeted mouse movement (Fitts’ Law), to very complex matters like modeling human learning.
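To make the Fitts’ Law example concrete, here is a minimal Python sketch of the widely used Shannon formulation, MT = a + b * log2(D / W + 1). The function name and the constants a and b are placeholders chosen for illustration; in practice they are fitted to measurements for a given device and user group, and nothing here is taken from the SiBed project itself.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.1):
    """Predict pointing time in seconds with Fitts' Law (Shannon formulation).

    distance: distance from the cursor to the target center
    width:    target width along the movement axis (same unit as distance)
    a, b:     empirical constants; the defaults are illustrative placeholders
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    # Index of difficulty in bits: grows with distance, shrinks with width.
    index_of_difficulty = math.log2(distance / width + 1)
    return a + b * index_of_difficulty


if __name__ == "__main__":
    # Same distance, two target sizes: the smaller target takes longer to hit.
    print(fitts_movement_time(distance=400, width=40))
    print(fitts_movement_time(distance=400, width=10))
```

The model says that small, distant targets take longer to hit than large, nearby ones, which is why it is useful for comparing button sizes and placements.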
Can you give us a few examples of model-based evaluation?
Sven: Probably the best-known example of model-based evaluation is the keystroke level model (KLM) by Card, Moran and Newell (1983). That’s the simplest member of a whole family of GOMS techniques. The abbreviation GOMS stands for the four main components of any such analysis: goals, operators, methods and selection rules. In a GOMS analysis, the interaction is analyzed in terms of the underlying goals of a hypothetical user. These goals are then broken down into the individual user interactions (operators) that are necessary to reach them. Methods are the different operator sequences that achieve the same goal. Selection rules then establish which method should be used in which circumstances. Within the GOMS family, the KLM variant is restricted to specifying concrete operators at the level of individual keystrokes.
This technique can be used to predict the performance of user interfaces - i.e. we can make statements about the operating efficiency of a user interface. Designers can use a KLM model to break a task down into smaller steps, for which GOMS/KLM then provides empirically established time values. The sum of these times gives you the duration that a routine user would need to carry out the task without error. But users aren’t machines; they are human beings who make mistakes. That’s why we as UX designers are interested in how long it takes a “real” user to operate the interface and which mistakes they might make. This is the only way to optimize UX design to really reflect user needs.
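The summing step of a KLM analysis can be sketched in a few lines of Python. The operator times below are the values commonly cited from Card, Moran and Newell; the function name, the choice to count a mouse click as a keystroke, and the example task are assumptions made for illustration only.

```python
# Commonly cited KLM operator times in seconds.
KLM_TIMES = {
    "K": 0.28,  # keystroke or button press (average non-secretarial typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move ("home") the hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_estimate(operators, times=KLM_TIMES):
    """Return the predicted error-free execution time for an operator sequence."""
    unknown = [op for op in operators if op not in times]
    if unknown:
        raise ValueError(f"unknown KLM operators: {unknown}")
    return sum(times[op] for op in operators)


if __name__ == "__main__":
    # Hypothetical task: think, point at a field, click it, move the hands
    # to the keyboard, think again, then type five characters.
    task = "MPKHM" + "K" * 5
    print(f"{klm_estimate(task):.2f} s")
```

Because the prediction is a plain sum, two design alternatives can be compared by writing down their operator sequences and comparing the totals. The result still describes routine, error-free use only.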
That sounds exciting... and how are you solving that with SiBed?
Sven: A GOMS analysis must be carried out manually, which is inconvenient. The program CogTool already makes things easier. Using a graphical interface, it lets users demonstrate click paths. But screens have to be reconstructed and annotated in this program, which creates a significant workload. We want to seamlessly integrate model-based evaluation into the design process - without generating a load of extra work. So we developed a prototype plug-in for Antetype - our own prototyping tool - and called it Antetype P/M. With just a few adjustments, an interactive Antetype prototype can then be evaluated automatically.
The core of our approach is a cognitive user model. The term “cognitive model” covers a scientifically founded idea of how a human user would behave when operating the prototype. We implemented this model based on the ACT-R framework. ACT-R builds on a large body of psychological assumptions about the structure and processes of human cognition. The benefit is that we can simulate our ACT-R models on a computer. We are integrating this model into Antetype P/M, allowing it to “see” the interactive prototype and carry out actions such as mouse clicks.
How does evaluation with Antetype P/M work?
Sven: Designers create an interactive prototype in Antetype as usual. They then “show” it to the model by demonstrating the click paths for the evaluation scenarios. Interactions that can’t simply be demonstrated in this way can be specified in a dialog. This could be comparing two prices, for example - when simply looking at the click path, the model only sees that I went for the cheaper option, not that I compared both prices and then opted for the cheaper product.
Here comes the most exciting part of evaluating with Antetype P/M. With this learned knowledge about the task and its existing general knowledge of how humans interact with user interfaces, the simulated user can now operate a prototype.
Antetype P/M even visualizes the simulated user’s eye movements and shows how the cursor moves across the screen, for instance. Designers can therefore observe in real time what this virtual test participant is doing and recognize important usability problems right away. Figure 1 shows how this kind of visualization looks on an interface for a riveting machine used on airplane wing units.
Finally, Antetype P/M provides an overview of all the results: this includes the time it took to complete the observed task and each individual interaction that was required on this journey. These values could be used to compare design alternatives or existing systems, for instance. Especially when it comes to the design of user interfaces for highly routine tasks, small performance improvements can have enormous consequences in terms of cost savings.
How exactly is it different from GOMS or CogTool?
Sven: GOMS and CogTool give us performance predictions for routine, error-free operation. With Antetype P/M, integration into a comprehensive prototyping tool such as Antetype vastly reduces the workload, but that is not the only difference. The first simulation run models a beginner who is seeing the interface for the first time and has only a description of what they should do. We can then carry out the same task several times with the same model and observe how the user interface affects the learning curve. This lets us predict how long it will take for users to become experts.
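The idea of a learning curve can be illustrated with the power law of practice, a well-known regularity in which task time falls as a power function of the number of repetitions: T(n) = T(1) * n^(-alpha). The sketch below is an illustration under that assumption only; it is not how Antetype P/M or ACT-R computes learning, and the function names and parameter values are hypothetical.

```python
def practice_curve(first_trial_time, learning_rate, trials):
    """Return predicted task times for trials 1..trials (power law of practice)."""
    return [first_trial_time * n ** (-learning_rate) for n in range(1, trials + 1)]


def trials_to_reach(first_trial_time, learning_rate, target_time, max_trials=10_000):
    """Return the first trial whose predicted time is at or below target_time.

    Returns None if the target is not reached within max_trials.
    """
    for n in range(1, max_trials + 1):
        if first_trial_time * n ** (-learning_rate) <= target_time:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 10 s on the first attempt, learning rate 0.5.
    for trial, seconds in enumerate(practice_curve(10.0, 0.5, 5), start=1):
        print(f"trial {trial}: {seconds:.2f} s")
    print(trials_to_reach(10.0, 0.5, target_time=5.5))
```

Comparing such curves for two design alternatives shows not only which one is faster for experts, but also which one gets users to expert speed in fewer repetitions.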
How good are these predictions then? Can you assess the quality of the predictions?
Sven: Of course, we asked ourselves this question when we were completing the first prototypes of Antetype P/M. That’s why we carried out multiple studies where we compared our software’s predictions with the times of real users. We report the impressive results of this comparison in two scientific papers.
That all sounds super interesting... so when can I use Antetype P/M?
Sven: At the current stage of development, we’re working on a “proof of concept”. Up to now, we’ve been able to show that performance predictions à la GOMS can be carried out considerably more conveniently with Antetype P/M and that we can also make statements about the learnability of a user interface. Antetype P/M supports the simulation of motor interactions such as eye and mouse movements and the simulation of cognitive processes such as comparing prices in the interface and recognizing grouping. We’re currently working on giving the cognitive model more general knowledge about the operation of user interfaces so we can simulate more complex interactions as well. We’re also experimenting with the simulation of errors during operation.
Where can I find out more about the research project?
Sven: We presented a comprehensive overview of the functions and details of Antetype P/M last year at the Human Computer Interaction International (HCII) conference in Las Vegas. There’s also a publication in the conference proceedings. We presented our latest results at the Designing Interactive Systems conference in San Diego. The paper has just been published.
Any final words?
Sven: At this point, we’d like to thank the Federal Ministry for Education and Research (BMBF) for funding the SiBed project under funding ID 01IS16037A.
Thank you, Sven, for a great chat!
As a UX Researcher and Software Engineer at Ergosign, Sven is always on the look-out for new ways to improve interactions between humans and machines.
As part of the SiBed research project, he is investigating the possibilities of model-based usability evaluations using cognitive models in Antetype.
Looking ahead, the internally developed prototype may be developed further and released as an Antetype plug-in.