To Shelf Test or not?

When asking consumers to evaluate marketing stimuli in an online research environment (whether that be a packaging design, a product, or advertising creative), there are two crucial things you need to be aware of:

1. “Content” and “Context” play an equally important role in driving real-world marketing effects.
2. To get the most robust and reliable research outcomes it’s essential that the “Content” is separated from the “Context”.

Let’s explore each of these in further detail…

1. “Content” and “Context” play an equally important role in driving real-world marketing effects.

This means that what you “put out there” as marketers is just as important as the context it appears in. Therefore, for packaging research it’s not only important to understand how consumers respond to your design in isolation, but also how it will perform when it appears alongside competitors (i.e. reflecting the way shoppers will experience the product in-store).

It’s important to note this phenomenon isn’t unique to the retail environment; the exact same principles apply to advertising. For example, if your advertising is placed in a premium online environment (e.g. on a reputable news service website) then the expected brand impact would be far greater than if the exact same ads were run in a low-quality environment (e.g. on an “adult entertainment” website).

2. To get the most robust and reliable research outcomes it’s essential that the “Content” is separated from the “Context”.

It goes without saying that the most critical component of packaging research is understanding the on-shelf battleground. However, while this might sound counterintuitive given our first point, the most reliable way to predict how a packaging design will perform in a real-world shopping environment is to test designs monadically: one brand and one design at a time.

Clunky, artificial “shelf mockups” (a.k.a. “planograms”) — which only loosely relate to the realities of a real-world shopping environment — are a poor way to simulate the in-store experience (thus limiting how accurately these insights can predict behavior).

Shelf mockups: All style, no substance?

Testing shelf mockups is an embedded part of many consumer insight professionals’ toolkits: “It’s the way we’ve always done things!” However, not enough scrutiny has been placed on their validity. The motivations behind testing them are of course very good: it has long been believed that simulating how shoppers will experience the product in-store leads to more reliable predictions of behavior.

While attempting to replicate as many of the variables at play in a retail environment might make us feel better about ourselves and how diligently we’re doing our jobs, shelf mockups present a distorted picture of reality. When we take a step back and look at things objectively, it becomes clear that in an online research environment it’s impossible to replicate all the contextual factors at play when shoppers encounter a product at shelf, and the more of them we try to account for, the more detached from reality the mockup becomes.

What does this mean for packaging design research?

Firstly, we acknowledge that no approach to packaging research is perfect. However, through our decade of testing experience we’ve repeatedly found that the most robust way to evaluate design effectiveness is to test your brand and at least a handful of your key competitors in a monadic way. In other words, each respondent is exposed to one design at a time and evaluates it in isolation. By utilizing metrics grounded in the empirical evidence and wealth of established learnings at our disposal, we’re able to confidently predict the expected shopper response when the packaging eventually makes its way into a competitive shelf environment.
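To make the idea of a monadic design concrete, here is a minimal sketch of how respondents might be split into cells so that each person sees exactly one design. It is an illustration only, not a description of our own tooling: the function name, the design labels, and the sample size of 400 are all hypothetical.

```python
import random
from collections import Counter


def assign_monadic_cells(respondent_ids, designs, seed=0):
    """Assign every respondent to exactly one design (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)  # randomize the order before dealing out cells
    # Deal respondents out round-robin so cell sizes differ by at most one.
    return {rid: designs[i % len(designs)] for i, rid in enumerate(shuffled)}


# Hypothetical stimuli: your own pack plus a handful of key competitors.
designs = ["our_brand", "competitor_a", "competitor_b", "competitor_c"]
assignment = assign_monadic_cells(range(1, 401), designs)
cell_sizes = Counter(assignment.values())
```

Because no respondent ever sees a second design, each cell’s scores reflect that design alone, and the cells can be compared against one another afterwards.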

The same goes for advertising, which is why we test all creative stimuli “out of context”. It’s impossible to replicate all the exact placements, the various mindsets of consumers, and the levels of clutter and distraction in a real-life viewing environment. Trying to simulate even just a few of these variables is risky, as it can introduce biases that make the results less valid. It’s for these same reasons that most shelf mockup testing ends up becoming a glorified beauty contest that, at best, only tells us about one small component of what it takes to get a product off the shelf and into shopping baskets.

What contextual factors limit the usefulness of shelf mockups for packaging design research?

By adopting a monadic approach and testing packaging designs in isolation, we can neutralize many of the contextual factors at play in a real-world shopping environment. We’ve outlined three of the deadliest contextual biases below 👇

1. The way products are merchandised on-shelf is often completely different across retailers.

Consider for a moment how widely retailers vary in the products they stock, the number of shelf facings they give to each brand, and the way those facings are organized on-shelf. The competitive set also changes from one retailer to the next, including how many products each competitor has on shelf and the real estate given to them.

This presents a conundrum if you’re looking to use shelf mockup research to inform design decisions. Given the variability across retailers and store formats, the logical conclusion would be to planogram test across every retail environment the product will be stocked in. However, this very rarely happens given the effort that would be involved in meticulously planning and executing testing across every single retailer, store format, region of the country, etc. (not to mention the eye-watering costs associated with doing so!).

Instead, what generally ends up happening is that a single shelf mockup is developed that’s intended to be “representative” of all retail environments (although we’re not exactly sure what that means...). Or a planogram is chosen from the retailer where the brand generates its biggest sales volumes (acknowledging this likely still represents only a small fraction of the total stores the product is sold in).

That planograms aren’t representative of the retail environment calls into question the validity of testing them in the first place, and their usefulness for addressing the primary goal of packaging research — to evaluate the design’s effectiveness! With the shelf dynamic in any category constantly changing it also means that this approach would necessitate undertaking new planogram research every time something changes — no matter how small, and even if the change doesn’t have anything to do with a packaging (re)design.

If you’re sitting there and still not entirely convinced, you can easily validate these claims for yourself: The next time you’re undertaking packaging research, simply test planograms across multiple retail environments. When considerable discrepancies reveal themselves, only one question should remain: What’s the purpose of testing shelf mockups if the insights largely reflect contextual factors and only very loosely relate to the effectiveness of the design itself? As with any form of testing, people are only able to respond to the stimulus that’s put in front of them, so it’s important to remember that any variation in this context will skew the results.

2. While a planogram might look aesthetically pleasing when meticulously curated for planning purposes, it’s completely disconnected from the realities of the in-store experience.

Planograms represent an idealized snapshot of the category at shelf, a beautiful — but highly artificial — depiction of how the product will appear in-store. But they’re deliberately designed this way for the very important purpose they serve: To help marketers and designers assess the appeal of a packaging design in a competitive shelf environment, and to identify opportunities to craft a distinctive positioning for the brand. However, the planograms we’re generally familiar with as marketers are something virtually no shopper ever experiences in reality. The reasons for this are numerous.

For instance, depending on how expansive the category is, many planograms need to be set back quite far in order to fit every product into frame. Contrast this with the actual in-store environment: Narrow aisles, various on- and off-shelf obstructions, varying placement of the category, low/exhausted inventory, and a multitude of adjacencies. This means that a single planogram won’t ever accurately represent the way shoppers view and experience the category in real life.

A shopper’s height and proximity to shelf also mean their gaze will only ever be fixed on a small handful of brands at once, which is in stark contrast to what a perfectly curated planogram would have you believe. It’s important for marketers to remember that consumers pay scant regard to the physical bounds of the category and have even less interest in consciously weighing up all the various options available to them.

Instead, all the evidence tells us that shoppers are simply looking to address the ‘job’ they came to do, which means finding the brand that best meets their needs and wants as quickly as possible. Therefore, the reality for most shoppers is that they will only ever see a small fraction of the brands they’re exposed to when testing planograms in a controlled research environment.

3. Online shopping is becoming increasingly popular, and the context in which packaging appears when viewed digitally is entirely different from the physical shopping experience.

The types of insights gathered through planogram research (the number of facings, their placement, brand blocking, etc.) are of no relevance to the online retail environment, where the context in which the brand and packaging are viewed can take an almost infinite number of forms. So, if you think brick-and-mortar planogram testing has its challenges, spare a thought for e-commerce, where there’s a virtually endless number of ways a brand can show up! That’s why testing packaging designs free of contextual biases and in a monadic way leads to more robust insights that better reflect the realities of how the product will be encountered “out in the wild”. What’s more, the insights can be scaled and generalized more readily across all the various touchpoints where consumers will eventually interact with the brand.

So, is planogram testing truly capable of providing reliable insights into the effectiveness of packaging designs?

The reason why shelf mockup testing is eagerly pushed by market research vendors is simple — it makes them more money! This is a function of it being a more labor-intensive process, along with the perception that it’s a more “scientific” and rigorous undertaking (especially if eye-tracking, heatmaps, and other neuro-based tools are sold in as part of it). However, the most important question marketers and consumer insights professionals must ask themselves is this: How can planogram testing confidently inform decision-making when it doesn’t provide reliable insights into the effectiveness of the design itself?

Brands have limited control over the context their products appear in (and virtually no control over how this context changes over time), including the number of facings, shelf placement, competitive set, adjacencies, etc. It therefore makes no sense that critical design decisions (something that is actually entirely within the brand’s control!) are made based on factors completely beyond the brand’s influence.

Many marketers are under the mistaken belief that planogram testing evaluates shelf impact, when in reality it assesses the effectiveness of a packaging design under very specific conditions that, firstly, don’t reflect reality and, secondly, don’t reflect the countless retail contexts the design will eventually appear in. That’s why planogram testing ultimately fails to yield actionable insights when it’s used as a tool for informing packaging design decisions.

When getting consumer validation costs as little as $2k (and you can receive a tailored report containing in-depth insights and clear actions within 48 hours), it’s easier than ever for marketers to involve consumers in critical business decisions.

Learn more about our Packaging Testing solution or Get in Touch to chat with a member of our team.