HubSpot will happily declare an email A/B winner with 200 sends and a 1-point open-rate gap. That’s noise, not signal. Most teams promote the “winner,” then watch performance regress. Here’s how to test honestly.

Minimum sample size

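For a meaningful open-rate test (assuming a 25% baseline and detecting a 3-point lift), you need roughly 1,800 contacts per variant at a bare minimum; at the conventional 5% significance level and 80% power, the figure is closer to 3,400. For click rate at a 4% baseline detecting a 1-point lift, you need around 6,000 per variant (about 6,700 at those same settings). If your list is smaller than that, you don’t have a valid test.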

If your test list is under 4,000 total, don’t A/B test. Run two single sends in different weeks and look at directional patterns.
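
Per-variant figures like these depend on the significance level and statistical power you choose. Here is a minimal Python sketch using the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_variant` and the defaults (two-sided 5% significance, 80% power) are assumptions for illustration, not settings taken from HubSpot.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, lift, alpha=0.05, power=0.80):
    """Contacts needed per variant to detect an absolute lift in a rate.

    Uses the normal-approximation formula for two proportions.
    alpha (two-sided) and power are assumed defaults, not article values.
    """
    p1, p2 = baseline, baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


if __name__ == "__main__":
    print(sample_size_per_variant(0.25, 0.03))  # open rate: 25% -> 28%
    print(sample_size_per_variant(0.04, 0.01))  # click rate: 4% -> 5%
```

With these assumed defaults the two calls return roughly 3,400 and 6,700. Lower power or a one-sided test gives smaller numbers, at the cost of missing real lifts more often.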

Test one variable, not three

Subject line OR send time OR preview text. Never all three. If you change three variables and one variant wins, you don’t know which change caused the lift. HubSpot’s UI will let you; your decision-making shouldn’t.

Pick your decision metric before you launch

Open rate has been unreliable since iOS 15, because Apple’s Mail Privacy Protection preloads messages and inflates opens. Use click rate or downstream conversion as the decision metric. Document it in the test name: 2026-04-pricing-cta-test-decisionmetric-CTR.

Wait the full cycle

Most B2B email engagement happens in the first 48 hours; long-tail engagement runs to 7 days. Don’t call the test at 24 hours because one variant is “ahead.” Wait the full week.

Account for day-of-week confounds

Tuesday vs Thursday changes everything. HubSpot’s split test sends both variants at the same time, which controls for this. If you’re running serial tests instead of split tests, randomize day of week or you’re testing days, not creative.
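
For serial tests, one simple way to randomize day of week is to shuffle which variant gets which send day each week. This is a minimal sketch: `assign_send_days` and its parameters are hypothetical names, and it only produces a schedule for you to follow. It does not call any HubSpot API.

```python
import random


def assign_send_days(variants, days, weeks, seed=None):
    """Return one {variant: day} mapping per week, with days shuffled each week."""
    if len(variants) != len(days):
        raise ValueError("need exactly one send day per variant")
    rng = random.Random(seed)  # pass a seed to make the schedule reproducible
    schedule = []
    for _ in range(weeks):
        shuffled = list(days)
        rng.shuffle(shuffled)
        schedule.append(dict(zip(variants, shuffled)))
    return schedule


if __name__ == "__main__":
    plan = assign_send_days(["A", "B"], ["Tuesday", "Thursday"], weeks=4, seed=7)
    for week_number, assignment in enumerate(plan, start=1):
        print(week_number, assignment)
```

Over enough weeks, each variant lands on each day about equally often, so a day-of-week effect is less likely to masquerade as a creative effect.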

Document the loss

A null result is a result. Maintain an experiment_log.md:

Test: 2026-04 pricing CTA color
Hypothesis: Orange CTA outperforms blue
Result: No statistically significant difference (p=0.34, n=8200/variant)
Decision: Stay with blue, retest only if creative direction changes

Most teams forget the losing tests, which means they re-run them every 18 months.
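
A log entry in that shape can be generated from raw counts. The sketch below computes a two-sided p-value with a pooled two-proportion z-test and formats the entry. The function names and the click counts are hypothetical, chosen only to illustrate the call; they are not the data behind the entry shown earlier.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def format_log_entry(test, hypothesis, p_value, n_per_variant, decision, alpha=0.05):
    """Build a plain-text entry in the same four-line shape as the log."""
    verdict = ("Statistically significant difference" if p_value < alpha
               else "No statistically significant difference")
    return (
        f"Test: {test}\n"
        f"Hypothesis: {hypothesis}\n"
        f"Result: {verdict} (p={p_value:.2f}, n={n_per_variant}/variant)\n"
        f"Decision: {decision}\n"
    )


# Hypothetical click counts for two variants of 8,200 contacts each.
p = two_proportion_p_value(conv_a=328, n_a=8200, conv_b=352, n_b=8200)
entry = format_log_entry(
    "2026-04 pricing CTA color",
    "Orange CTA outperforms blue",
    p,
    8200,
    "Stay with blue, retest only if creative direction changes",
)
print(entry)  # append this text to experiment_log.md
```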

Don’t test what doesn’t move the needle

Subject line emoji on a transactional email isn’t worth testing. Save A/B capacity for nurture-funnel CTAs and welcome series, where compounded lift matters.

What to do this week

Audit your last 10 A/B tests. Calculate whether each had the sample size to detect the lift you claimed. Treat the underpowered “wins” as unproven, and retest the ones that actually mattered.
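
One way to run that audit is to estimate the power each past test had, meaning the chance it could detect the lift you claimed at the sample size you actually used. This is a rough sketch using the normal approximation. `achieved_power`, the 80% threshold, and the rows in `past_tests` are illustrative assumptions: the first row mirrors the 200-send scenario from the introduction with an assumed 25% baseline.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(baseline, lift, n_per_variant, alpha=0.05):
    """Approximate chance a two-sided test detects `lift` with n contacts per variant."""
    p1, p2 = baseline, baseline + lift
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(lift) / se - z_alpha)


# Hypothetical audit rows: (test name, baseline rate, claimed lift, contacts per variant)
past_tests = [
    ("subject-line-test", 0.25, 0.01, 100),
    ("pricing-cta-test", 0.04, 0.01, 8200),
]

for name, baseline, lift, n in past_tests:
    power = achieved_power(baseline, lift, n)
    status = "adequately powered" if power >= 0.80 else "underpowered"
    print(f"{name}: power={power:.0%} ({status})")
```

A test flagged as underpowered is not necessarily wrong. It just could not have told you either way.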
