
How We Built a Skeptical Audience (On Purpose)

LLMs are polite by default. That's a disaster for design feedback.

7 min read

Here's an experiment you can run right now. Take the worst landing page you've ever seen — the one with the stock photo handshake, the "synergize your workflow" headline, and the pricing that requires a PhD to decode. Upload it to ChatGPT and ask for feedback.

It will find something nice to say about it.

It always does. "The layout provides a clean structure." "The color scheme conveys professionalism." "The value proposition, while broad, gives users a general sense of the product's capabilities." The page is a dumpster fire and the AI is complimenting the font choice.

This isn't a flaw in the prompt. It's a flaw in the architecture. And if you're using AI for design feedback, it's silently destroying the value of every review you run.

The Root Cause

LLMs were trained to be liked, not to be honest.

Every major large language model on the market was fine-tuned with human feedback. Humans rated responses. And humans — consistently, systematically, across millions of ratings — rewarded responses that were balanced, diplomatic, and encouraging. They punished responses that were blunt, negative, or confrontational.

Over thousands of training iterations, the models learned the lesson: hedging gets rewarded. "Some users might find this confusing" scores better than "this is confusing." "Consider improving the contrast" scores better than "this text is unreadable." The training process selected for agreeableness the way evolution selects for camouflage — not because anyone decided to, but because the pressure was relentless and the adaptation was inevitable.

The result is a structural bias baked into the weights of every model. You can fight it with prompts — "be critical," "be harsh," "pretend you're a demanding user" — but the underlying tendency remains. The model will be somewhat more critical, in the way that a polite person asked to "be honest" will be somewhat more honest. The pull toward center never fully releases.

For creative writing or customer service, this is a feature. For design feedback, it's a smoke detector that's been trained to whisper.

Why Prompting Doesn't Fix It

"Be more critical" creates a different problem.

The obvious fix — tell the AI to be more critical — creates uniformly critical feedback. Every design gets the same treatment. Your genuinely strong landing page gets torn apart the same way your genuinely weak one does. You've replaced false positives with false negatives, and you still can't tell which feedback to act on.

This is the core problem: a single AI model produces a single perspective. It can't disagree with itself. It can't have some part of it think the design is brilliant while another part thinks it's predatory. It gives you one voice — polite or harsh, depending on the prompt — and presents it as analysis.

What you actually want is what you'd get from a real audience: a distribution of reactions. Some people love it. Some people are suspicious. Some people don't care about the thing you spent three weeks perfecting. The signal is in the variance — in understanding who objects and why — not in any single verdict.

The Insight

Don't fix the model. Fix the population.

We tried the prompting approach first. It didn't work. So we asked a different question: what if skepticism wasn't a prompt instruction but a personality trait that varied across a population of evaluators?

In the real world, some people are naturally skeptical and others are naturally trusting. A room full of people evaluating your design won't all react the same way — and the disagreements are the most valuable data. When the skeptics and the optimists flag the same concern, that's a universal problem. When only the skeptics flag it, that's a risk you can weigh. When only the optimists flag it, something is deeply wrong.
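That triage rule is simple enough to sketch in code. A minimal sketch, assuming feedback arrives as (concern, disposition) pairs; the function name and labels are illustrative, not Prior.Run's API:

```python
from collections import defaultdict

def triage(flags):
    """flags: iterable of (concern, disposition) pairs, where disposition is
    'skeptic', 'neutral', or 'optimist'. Returns a verdict per concern."""
    by_concern = defaultdict(set)
    for concern, disposition in flags:
        by_concern[concern].add(disposition)

    verdicts = {}
    for concern, who in by_concern.items():
        if {"skeptic", "optimist"} <= who:
            # Both ends of the panel agree: this is not a matter of taste.
            verdicts[concern] = "universal problem: fix it"
        elif who == {"skeptic"}:
            verdicts[concern] = "skeptic-only: a risk to weigh"
        elif "optimist" in who:
            # Only the trusting personas noticed it: something is deeply wrong.
            verdicts[concern] = "optimist-only: investigate immediately"
        else:
            verdicts[concern] = "neutral-only: worth a closer look"
    return verdicts

print(triage([
    ("per-seat pricing is buried", "skeptic"),
    ("per-seat pricing is buried", "optimist"),
    ("testimonials feel manufactured", "skeptic"),
]))
```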

So we built a synthetic audience where skepticism varies the way it varies in real populations — but deliberately weighted toward the critical end. The majority of our panel is naturally skeptical. A smaller group is neutral. A smaller group still is naturally trusting. This isn't a 50/50 split. It's asymmetric by design.

The asymmetry serves a specific purpose: it counteracts the LLM's built-in politeness. When you combine a skeptical personality with a model that wants to be agreeable, the two forces fight each other — and the result is more honest than either would produce alone. The personality pushes toward criticism. The model pulls toward balance. The tension produces something closer to truth.
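Concretely, "asymmetric by design" can be as simple as a weighted draw over dispositions. A minimal sketch; the weights below are illustrative, not our production split:

```python
import random

# Illustrative weights only: a panel deliberately skewed toward skepticism,
# not Prior.Run's actual production distribution.
DISPOSITION_WEIGHTS = {"skeptical": 0.6, "neutral": 0.25, "trusting": 0.15}

def sample_panel(size: int, seed: int | None = None) -> list[str]:
    """Draw a panel of evaluator dispositions, weighted toward the critical end."""
    rng = random.Random(seed)
    dispositions, weights = zip(*DISPOSITION_WEIGHTS.items())
    return rng.choices(dispositions, weights=weights, k=size)

panel = sample_panel(20, seed=7)
print(panel.count("skeptical"), "skeptics out of", len(panel))
```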

What This Looks Like

Three skeptics walk into your pricing page. They find three different problems.

Not all skepticism is the same skepticism. This is the detail that makes population-level modeling more powerful than a single critical prompt.

One type of skeptic reads your pricing page like a contract. They're methodical, detail-oriented, looking for what you're not saying. They notice the asterisk. They calculate the annual cost before you tell them. They're the person who googles your company name plus "hidden fees" before signing up. Their feedback: "The pricing feels dishonest — the per-seat cost isn't clear until you're three clicks deep."

Another type of skeptic is contrarian by nature. They resist persuasion techniques the way some people resist sales pitches — instinctively. The testimonial carousel doesn't just fail to convince them; it backfires. They read social proof as evidence that you're trying to manipulate them. Their feedback: "The testimonials made me trust the product less, not more."

A third type of skeptic isn't hostile at all — they're anxious. They're not doubting your product; they're afraid of making a mistake. They want a comparison table. They need to know they can cancel. Their feedback: "I couldn't find any information about cancellation, so I assumed there's a catch."

Same page. Three different problems. All of them real. All of them invisible to a single AI giving balanced feedback — and invisible to a single AI told to "be more critical," which would just uniformly shred the page without distinguishing between these three very different failure modes.

Same pricing page. Three skeptics. Three different problems.

Each archetype surfaces a problem the others miss. That's why population-level modeling beats a single critical prompt.

The Analyst · Methodical · Detail-oriented · Low trust
Reads your pricing page like a contract. Calculates the annual cost before you show it. Notices the asterisk. Googles your company name + "hidden fees" before signing up.
"The per-seat pricing isn't clear until three clicks deep. That feels deliberately hidden."

The Contrarian · Independent · Resistant · Very low trust
Actively resists persuasion techniques. Social proof doesn't just fail — it backfires. Testimonials make them trust you less. They specifically avoid the product everyone else is buying.
"The testimonial carousel made me trust this less, not more. It feels like manufactured consensus."

The Worrier · Anxious · Thorough · Risk-averse
Not hostile — afraid of making a mistake. Opens your page in one tab and a competitor in another. Looking for a comparison table. Needs to know they can cancel. Their skepticism is protective.
"I couldn't find cancellation terms anywhere. When a product hides the exit, I assume there's a catch."

A single AI told to “be critical” gives you one voice.
A diverse panel gives you the full picture.
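If you wanted to represent those archetypes as data, a rough sketch might look like this (field names and phrasing are invented for illustration, not a real Prior.Run schema):

```python
from dataclasses import dataclass

@dataclass
class Archetype:
    name: str
    traits: list[str]
    trust: str          # baseline trust level
    watches_for: str    # what this skeptic scrutinizes first

# Illustrative definitions drawn from the three cards above.
ARCHETYPES = [
    Archetype("The Analyst", ["methodical", "detail-oriented"], "low",
              "hidden costs, fine print, pricing that only adds up three clicks deep"),
    Archetype("The Contrarian", ["independent", "resistant to persuasion"], "very low",
              "social proof and testimonials, which read as manufactured consensus"),
    Archetype("The Worrier", ["anxious", "thorough", "risk-averse"], "guarded",
              "exit terms: cancellation, refunds, anything that looks like a trap"),
]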

The Professional Layer

Your users aren't just personalities. They're professionals.

There's a second dimension of skepticism that has nothing to do with personality: domain expertise. A nurse evaluating a health product will catch clinical claims that sound authoritative to everyone else but are actually meaningless. A software engineer will notice that your "AI-powered" feature doesn't explain what the AI actually does. A financial analyst will spot that your ROI projection conveniently excludes implementation costs.

Our synthetic users carry professional expertise that shapes their BS detection. When someone with twenty years in fintech looks at your financial product, they're not just skeptical by temperament — they're skeptical with precision. They know exactly which claims are inflated because they've made (or debunked) those claims themselves.

The combination of personality-driven skepticism and expertise-driven skepticism is what makes the feedback surgical. Not just "this feels wrong" but "this feels wrong because I know this industry and this specific claim doesn't hold up."
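As a rough sketch of how the two layers might compose into a single evaluator brief (hypothetical wording and function, not the prompt Prior.Run actually uses):

```python
def build_evaluator_prompt(disposition: str, profession: str, years: int, artifact: str) -> str:
    """Compose personality-driven and expertise-driven skepticism into one brief.
    Purely illustrative wording; not Prior.Run's production prompt."""
    return (
        f"You are a {disposition} evaluator with {years} years of experience as a {profession}. "
        f"React to the page below as that person would: flag claims your professional "
        f"experience tells you are inflated, and say plainly what would make you leave.\n\n"
        f"{artifact}"
    )

prompt = build_evaluator_prompt(
    disposition="naturally skeptical",
    profession="fintech product manager",
    years=20,
    artifact="<landing page copy here>",
)
print(prompt)
```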

The Uncomfortable Truth

The feedback you need most is the feedback that stings.

Teams using Prior.Run for the first time often have the same reaction: "This is harsh."

They're right. It is. Because we're surfacing the reaction of the person who was going to leave your page anyway — the one who bounced in eight seconds, who will never show up to a usability study, who already decided your product wasn't for them before you even knew they existed.

That person's feedback is the most valuable feedback you'll ever receive. And no tool that's been trained to be polite will ever give it to you.

Polite feedback feels good. Skeptical feedback ships better products.

Topics: LLM bias · synthetic users · design feedback · agreeableness · skepticism
