Your growth team runs 20 experiments a quarter. Each one takes 2-3 weeks of engineering time to implement, instrument, and analyze. At the end of the quarter, you've shipped 20 tests.
Fourteen of them showed no statistically significant result.
This isn't bad luck. It's the base rate. Industry data consistently shows that 60-80% of A/B tests produce no meaningful difference. The design variant that took a week to build and two weeks to test performed identically to the control. The engineering time, the analytics overhead, the opportunity cost of not running a different test — all of it, wasted on an experiment that was never going to move the needle.
What if you could have known that before you built it?
— The Experiment Tax
Every test you run has a cost. Even the ones that fail.
Teams talk about experimentation velocity — running more tests, faster. But velocity without direction is just expensive thrashing. Running 40 experiments a quarter instead of 20 doesn't help if the hit rate stays the same. You've doubled the throughput and doubled the waste.
The real cost of a failed experiment isn't the engineering time, though that's significant. It's the opportunity cost. Every week your team spends building and running a test that shows no result is a week they're not spending on the test that would have shown a 15% lift. The experiment pipeline is a queue, and every dud in the queue pushes the winners further out.
The best growth teams in the world don't run more experiments than everyone else. They run better experiments. Their hit rate is higher because they've developed an instinct for which design changes will move the metric and which ones are noise. That instinct comes from experience — and from having a way to pre-screen ideas before committing engineering resources.
— The Pre-Flight Check
Screen the design before you build the experiment.
A pilot doesn't take off without a pre-flight checklist. An engineer doesn't deploy without a code review. But growth teams routinely commit weeks of work to experiments without ever checking whether the design change is likely to matter.
A pre-flight check for experiments works like this: before you commit to building a variant, run the design past a synthetic audience. Not to get a definitive answer — you'll still need the A/B test for that. But to answer a simpler question: is there enough signal here to justify spending engineering time on it?
If your synthetic audience can't distinguish between the control and the variant — if the reactions are essentially identical — that's a strong signal that real users won't distinguish them either. You've just saved two weeks of engineering time and two weeks of test runtime. Move on to the next idea.
If your synthetic audience flags a strong preference — especially if different segments prefer different variants — now you have something worth testing. You've also learned which segments to watch in the real test, which makes the analysis sharper and the results more actionable.
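To make that go/no-go logic concrete, here is a minimal sketch. Everything in it is hypothetical: `preflight`, the shape of `reactions`, and the 10-point threshold are placeholders standing in for whatever your synthetic-audience tool actually returns. The point is the shape of the decision, not a specific API.

```python
from collections import Counter

# Hypothetical pre-flight: ask a synthetic audience to pick between control
# and variant, then decide whether the gap justifies building the real test.
# `reactions` stands in for whatever your tool returns -- here, a list of
# (segment, choice) pairs where choice is "control" or "variant".

MIN_GAP = 0.10  # placeholder: require at least a 10-point preference gap

def preflight(reactions):
    overall = Counter(choice for _, choice in reactions)
    total = overall["control"] + overall["variant"]
    gap = abs(overall["variant"] - overall["control"]) / total

    # Per-segment breakdown: divergent segments are worth testing even
    # when the overall preference washes out.
    by_segment = {}
    for segment, choice in reactions:
        by_segment.setdefault(segment, Counter())[choice] += 1
    divergent = [
        seg for seg, votes in by_segment.items()
        if abs(votes["variant"] - votes["control"]) / sum(votes.values()) >= MIN_GAP
    ]

    if gap >= MIN_GAP or divergent:
        return f"build it: overall gap {gap:.0%}, segments to watch: {divergent}"
    return f"skip it: gap {gap:.0%} suggests real users won't distinguish them either"

# Example: six synthetic respondents split across two segments whose
# preferences point in opposite directions.
print(preflight([
    ("smb", "variant"), ("smb", "variant"), ("smb", "control"),
    ("enterprise", "control"), ("enterprise", "control"), ("enterprise", "variant"),
]))
```

In this example the overall vote is a dead heat, but both segments show a strong preference in opposite directions: exactly the "worth testing, and watch these segments" case described above.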
— The Math
Doubling your hit rate is worth more than doubling your velocity.
Say your team runs 20 experiments per quarter with a 30% hit rate. That's 6 winners per quarter driving measurable improvement.
Option A: double your experiment velocity. Run 40 tests. At the same hit rate, you get 12 winners — but you've also doubled the engineering cost, the analytics load, and the organizational complexity. And your team is exhausted.
Option B: keep running 20 experiments, but pre-screen them to kill the obvious losers before building. If pre-screening eliminates half of the duds, your effective hit rate jumps from 30% to roughly 46%; kill a bit more than half and you clear 50%. That's ten winners from 20 experiments, almost as good as Option A at half the cost and none of the burnout.
The real win is Option B plus a modest velocity increase. Pre-screen to raise the hit rate, then run slightly more experiments. The compound effect of a higher hit rate with moderate throughput increase dramatically outperforms raw velocity with random selection.
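For anyone who wants to sanity-check those numbers, here is a minimal sketch of the expected-winners arithmetic. The function and its parameters are illustrative, and it bakes in one optimistic assumption: the pre-screen removes only duds, never true winners.

```python
# Expected winners per quarter under each option. Assumes the pre-screen
# only removes duds, never true winners -- an optimistic simplification.

def expected_winners(experiments, hit_rate, duds_killed=0.0):
    # Removing a fraction of the losers from the candidate pool raises
    # the effective hit rate of the experiments you actually build.
    survivors = hit_rate + (1 - hit_rate) * (1 - duds_killed)
    effective_rate = hit_rate / survivors
    return experiments * effective_rate, effective_rate

for label, args in [
    ("Baseline", (20, 0.30)),        # 6.0 winners at 30%
    ("Option A", (40, 0.30)),        # 12.0 winners, at double the cost
    ("Option B", (20, 0.30, 0.5)),   # ~9.2 winners at ~46%, same cost
    ("Option B+", (20, 0.30, 0.6)),  # ~10.3 winners at ~52%
]:
    winners, rate = expected_winners(*args)
    print(f"{label}: {winners:.1f} winners at {rate:.0%} effective hit rate")
```

Killing half the duds gets you to roughly nine winners; you need to catch a bit under 60% of them to hit the ten-winner figure below.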
The math that changes everything: same number of experiments, same engineering cost, dramatically different results.
Without pre-screening (run everything, hope for the best): 6 winners per quarter.
With pre-screening (kill the obvious losers first, run the rest): 10 winners per quarter.
That's 67% more winners from the same 20 experiments. No extra engineering cost. No extra analytics overhead.
— What to Pre-Screen For
Not every experiment needs a pre-flight. These ones do.
Pre-screening works best for experiments where the change is visual, emotional, or trust-related — the kinds of changes that live in the gap between what users say and what they do.
Pricing page redesigns: will this build more trust or less?
Onboarding flow changes: does this feel simpler or more confusing?
Landing page copy variations: does the new messaging resonate with the target segment, or only with the copywriter who wrote it?
CTA placement and styling: does moving the button actually change behavior, or just look different?
Pre-screening works less well for purely mechanical changes — button color tests, font size tweaks, layout shifts that don't change the meaning. For those, just run the test. The cost of building the variant is low enough that pre-screening adds friction without proportional value.
The sweet spot is the experiment that would take a week to build and three weeks to run: real engineering investment, uncertain outcome. Those are the experiments where a 30-minute pre-flight saves 30 days of waiting for a flat result.
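One way to see that sweet spot is as a back-of-the-envelope expected-value check, sketched below. Every name and number here is an assumption: `p_flat` is the flat-result base rate from earlier, and `catch_rate` is how often the pre-flight correctly screens out a dud.

```python
def expected_preflight_savings(build_days, run_days, p_flat=0.7, catch_rate=0.5):
    # Expected days of build time and runtime saved per experiment:
    # the chance the test would have come back flat, times the chance the
    # pre-flight catches it, times the time a flat test would have burned.
    return p_flat * catch_rate * (build_days + run_days)

# The sweet spot: a week to build, three weeks to run.
print(expected_preflight_savings(build_days=5, run_days=21))   # ~9 days saved

# A purely mechanical tweak a synthetic audience can't meaningfully judge:
# assume it screens out essentially nothing, so the pre-flight is pure friction.
print(expected_preflight_savings(build_days=0.5, run_days=14, catch_rate=0.0))  # 0
```

Against a pre-flight that costs half an hour, roughly nine expected days saved is an easy call; zero is not.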
— The Cultural Shift
Stop celebrating experiment volume. Start celebrating experiment quality.
The best growth teams we've worked with have made a subtle but important cultural shift. They stopped tracking how many experiments they run and started tracking how many experiments produce actionable results.
This shift changes the incentive structure for the better. When the metric is volume, teams are incentivized to run easy, low-risk tests that fill the pipeline. When the metric is actionable results, teams are incentivized to choose experiments that matter, and to pre-screen the marginal ones before committing.
The experiment that never gets built because a pre-flight check showed it wouldn't matter? That's not a failure. That's the most efficient experiment you've ever run. It gave you a result in minutes instead of weeks, and it freed up the engineering slot for an experiment that actually moves the number.
Kill the losers early. Ship the winners faster. That's not less experimentation. It's better experimentation.
The most efficient experiment you've ever run is the one that never gets built.