June 8, 2021
User testing should clear a path for great UX. But what do we do when what users say and what they do don't align?
I recently conducted user testing for a client looking to improve the purchase flow for one of their most critical products. While I can’t give it all away, the goal was to make buying an access product for a collective of winter sports resorts easier.
After a comprehensive audit of the resort and travel spaces, we found that two concept models dominated the category. So beginning with pencil sketches, I moved through the UX process and eventually landed on mid-fidelity prototypes of each model.
The first prototype was a relatively conventional experience for buying time- and date-based access products: users first engaged a calendar to select the date they intended to use the product, then chose a product type.
The second prototype was more innovative in the resort space and hewed toward contemporary UI conventions found across buying experiences in the travel category. This model required users to pick a product category first, then view product types.
After client stakeholders reviewed both prototypes, we got to work determining which version would make the purchase experience easier, faster, and more efficient. After six weeks of testing, the results continued to show a divergence between the qualitative and quantitative data.
As UX designers, we should remain neutral in our pursuit of the best experience, and the UX team felt either option would move the client forward. That said, isn't there almost always excitement behind, and a lean toward, one of the competing prototypes? Cheerleading a concept before testing is hard to ignore in the client-services space.
And this case was no different. My client and development team felt there would be a pretty clear winner between the two once the dust settled. Even so, we paid special attention to maintaining neutrality in crafting the test protocols.
The first protocol was a first-click test. A panel of test participants was shown static images of each version’s default state. They were asked where they would click to initiate a purchase. Garnering first-click success data provides some validation that the design correctly launches the customer journey.
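If you're curious how first-click success gets tallied, here's a minimal sketch in Python. It isn't our actual analysis code, and the hotspot coordinates and click points are purely hypothetical; the idea is simply to check whether each participant's first click lands inside the intended target region.

```python
# A minimal sketch (not the actual study code) of scoring first-click success.
# The hotspot coordinates and click data below are hypothetical.

def in_hotspot(click, hotspot):
    """Return True if an (x, y) click lands inside the target rectangle."""
    x, y = click
    return (hotspot["x"] <= x <= hotspot["x"] + hotspot["w"]
            and hotspot["y"] <= y <= hotspot["y"] + hotspot["h"])

def first_click_success_rate(clicks, hotspot):
    """Share of participants whose first click hit the intended target."""
    hits = sum(1 for click in clicks if in_hotspot(click, hotspot))
    return hits / len(clicks)

# Hypothetical target regions on each version's static default state.
v1_hotspot = {"x": 120, "y": 300, "w": 280, "h": 60}    # V1: calendar field
v2_hotspot = {"x": 120, "y": 240, "w": 280, "h": 120}   # V2: category tiles

# Hypothetical first-click coordinates from the panel.
v1_clicks = [(150, 320), (400, 500), (200, 340)]
v2_clicks = [(180, 260), (300, 310), (140, 250)]

print(f"V1 first-click success: {first_click_success_rate(v1_clicks, v1_hotspot):.0%}")
print(f"V2 first-click success: {first_click_success_rate(v2_clicks, v2_hotspot):.0%}")
```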
The second protocol provided testers with Figma prototypes of both versions. UsabilityHub's Figma task protocol makes this a relatively easy test to conduct. Participants were tasked with buying a single product. We crafted this test as a first look at the prototypes in action as users moved through the buying process.
We learned a great deal, and those insights fueled a second iteration of each prototype. We used these enhanced prototypes for the third protocol, again proctored through UsabilityHub's Figma prototype test tool. This test required participants to purchase multiple access products across two product categories. We crafted it to reveal the challenges each prototype posed when users made more than one buying decision.
Consistently, we found that first-click data, task completion data, and error-free task completion data all dramatically favored the design the test team considered the more innovative approach. In short, the quantitative data showed the more innovative version clearly performing better.
However, each of the three protocols also asked participants to indicate which version they preferred after completing the tasks, and the responses showed an unmistakable preference for the more conservative version. In short, the qualitative data favored the more conservative version.
So here's the rub: Users performed significantly better on the innovative version in Protocols 2 and 3 when measured on success rate, time to completion, and number of interactions. But the preference questions showed a majority of testers favoring the more conservative version.
What happens next when quantitative and qualitative test results conflict? How does the design team move ahead?
Because error-free task completion rates and time to successful completion favored the second, more innovative version, we are pushing ahead with it in upcoming sprints.
In this case, the data showed a significantly higher task completion rate for V2, the innovative version, than for V1, the conservative one. The quantitative data pointed to a clear winner in V2. But how do we account for the qualitative preference for V1?
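For readers who want to see how a completion-rate gap like that can be checked for significance, here's a rough two-proportion z-test sketch in Python. The pass/fail counts are hypothetical placeholders, not the study's real numbers.

```python
# A minimal sketch of checking whether V2's task completion rate is
# significantly higher than V1's, using a two-proportion z-test.
# The counts below are hypothetical, not the study's actual data.
from math import sqrt, erf

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Return the z statistic and two-sided p-value for p_a vs. p_b."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal CDF tail
    return z, p_value

# Hypothetical: 34 of 40 participants completed the task on V2, 24 of 40 on V1.
z, p = two_proportion_ztest(34, 40, 24, 40)
print(f"z = {z:.2f}, p = {p:.3f}")  # a small p-value suggests a real difference
```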
As UX researchers, we’re keenly aware of the many biases that can creep into our findings. One of the best ways to identify bias is to look at the various responses of each participant: Analyze both what they say and what they do.
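Here's a small, hypothetical sketch of that say-versus-do comparison: it cross-references each participant's stated preference with the version they actually performed better on. The participant records are invented stand-ins, not our panel data.

```python
# Cross-reference what each participant said (preference) with what they
# did (which version they completed faster, error-free). Hypothetical data.
from collections import Counter

participants = [
    {"id": "p01", "preferred": "V1", "faster_on": "V2"},
    {"id": "p02", "preferred": "V1", "faster_on": "V2"},
    {"id": "p03", "preferred": "V2", "faster_on": "V2"},
    {"id": "p04", "preferred": "V1", "faster_on": "V1"},
]

# Count how often stated preference matches observed performance.
crosstab = Counter((p["preferred"], p["faster_on"]) for p in participants)
mismatches = sum(n for (said, did), n in crosstab.items() if said != did)

print(crosstab)
print(f"{mismatches}/{len(participants)} participants preferred the version "
      "they did NOT perform better on")
```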
As noted, many data sets showed faster error-free completion times on the prototype participants didn't select as their preferred version. When we looked at the written responses that accompanied the quantitative data, we saw comments indicating that participants' stated preferences were influenced by how they felt others might choose.
To paraphrase many responses, participants noted that while they preferred the version on which they performed better, they felt that others would be more likely to prefer the first, more conservative version.
So in this case, social desirability bias is the likely culprit. While participants performed significantly better on the more innovative experience, they believed others would favor a more conventional one, and they selected the version they thought most people would like rather than the one they actually preferred.
Where do we draw the line between preference and performance? In this case, performance wins out. Reducing errors means increasing conversions. And for this product, even a modest conversion increase provides a substantial revenue increase.
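To make that last point concrete, here's a back-of-the-envelope illustration; every figure in it is an assumption for the sake of the example, not a number from this engagement.

```python
# A rough, hypothetical illustration of why a modest conversion lift matters.
# None of these figures come from the study; they are placeholders.
monthly_sessions = 50_000        # purchase-flow visits per month (assumed)
average_order_value = 220.00     # average access-product order (assumed)
baseline_conversion = 0.030      # 3.0% conversion on the current flow (assumed)
improved_conversion = 0.033      # a modest 0.3-point lift from fewer errors

baseline_revenue = monthly_sessions * baseline_conversion * average_order_value
improved_revenue = monthly_sessions * improved_conversion * average_order_value

print(f"Added monthly revenue: ${improved_revenue - baseline_revenue:,.0f}")
# On these assumptions, a 0.3-point lift adds roughly $33,000 per month.
```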