March 28, 2026 · 11 min read

A/B testing email campaigns: what to test and how to analyze

A marketer sends a campaign. Open rate: 18%. Good or bad? Without an A/B test, there is no answer, only guesses. Testing turns guesses into data, but only when done right: a large enough sample, a single variable, and a clearly defined metric.

Why test campaigns at all

Marketer intuition is useful. It is also systematically wrong about predictions. A subject line that seems dull pulls record open rates. The red button everyone called "aggressive" gets twice the clicks of the blue one. We see this with clients constantly: a team spends a week arguing over a subject line. An A/B test takes four hours and gives a clear answer. 23% open rate vs. 17% leaves nothing to argue about.

The point is not only picking a winner for one campaign. After six months of regular tests you know things about your audience that no blog post can tell you: subscribers respond better to questions in the subject than to statements; first-name personalization in the preheader adds 3-4% to opens; emails with one link get more clicks than emails with five.

What to test: the full list

Subject line. The most important element, because it directly drives open rate. Variables: length (short, under 40 characters, vs. long, up to 80), numbers, question vs. statement, emoji vs. none, personalization. In our data the subject line accounts for 60-70% of the open decision. Start here.

Preheader (preview text). The text shown next to the subject in the inbox. Underrated: a good preheader can add 5-8% to open rate. Test a complement to the subject vs. an independent phrase.

Sender name. "Anna from Company" vs. "Company". A personal name in B2B typically lifts open rate 10-15%. In e-commerce the brand name usually wins.

Send time and day. Tuesday at 10 a.m. or Thursday at 2 p.m.? Start with two contrasting options (morning vs. evening, weekday vs. weekend), then narrow from there.

Content and layout. Long vs. short. One block vs. several. Text vs. visual. One client found plain text with no images outperformed a designed HTML template by 34% on clicks.

CTA. Button text, color, size, position. "Buy now" vs. "Browse the catalog". Test this separately from other changes, or you will not know what moved the number.

Offer. 10% discount vs. free shipping. Promo code vs. link to a sale. Testing offers through email is one of the fastest ways to validate a business hypothesis.

How an A/B test works

Take one campaign and create two versions differing by a single element. From your full list, pull a test sample — usually 20-30% of the segment. Half gets version A, half gets version B. After 2-4 hours (open rate) or 12-24 hours (clicks and conversions), check the result. The winner goes to the remaining 70-80%. Most ESPs automate all of this — Mailchimp, GetResponse, and Brevo each have built-in A/B tests.
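
The split described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ESP implements it: the function name `split_for_ab_test`, the 25% test share, and the example addresses are all assumptions made for the example.

```python
import random


def split_for_ab_test(addresses, test_share=0.25, seed=42):
    """Split a segment into variant A, variant B, and a holdout for the winner."""
    shuffled = list(addresses)
    # Shuffle with a fixed seed so the split is random but reproducible.
    random.Random(seed).shuffle(shuffled)

    test_size = int(len(shuffled) * test_share)
    half = test_size // 2

    variant_a = shuffled[:half]
    variant_b = shuffled[half:2 * half]
    # Everyone outside the test sample receives the winning version later.
    holdout = shuffled[2 * half:]
    return variant_a, variant_b, holdout


# Hypothetical segment of 10,000 addresses.
segment = [f"user{i}@example.com" for i in range(10_000)]
a, b, rest = split_for_ab_test(segment, test_share=0.25)
print(len(a), len(b), len(rest))  # 1250 1250 7500
```

The shuffle is the important part: taking the first 2,500 rows of an export sorted by signup date would compare old subscribers with new ones instead of version A with version B.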

One variable per test. If you change the subject line, preheader, and send time simultaneously, you do not know what drove the result. You only know that version B performed better. The why stays hidden.

Sample size: how many addresses you need

This is the question most marketers skip. A test on 200 addresses that shows 22% vs. 24% open rate is not statistically significant. That is noise, not signal. If your open rate is around 20% and you want to detect a 3-point lift at 95% confidence, you need at least 1,500-2,000 addresses per variant. To detect a 1-point difference: 15,000-20,000.
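
For readers who want to run the numbers for their own list, here is a sketch using the standard normal-approximation formula for comparing two proportions. The text gives a confidence level but no statistical power, so the `power` parameter is an assumption: 80% is a common convention, and 50% means the test catches a real lift only half the time.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(p_base, lift, alpha=0.05, power=0.80):
    """Addresses per variant needed to detect an absolute lift (two-sided test)."""
    p_new = p_base + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> about 1.96
    z_power = NormalDist().inv_cdf(power)          # 80% power -> about 0.84
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


for lift in (0.03, 0.01):
    n_50 = sample_size_per_variant(0.20, lift, power=0.50)
    n_80 = sample_size_per_variant(0.20, lift, power=0.80)
    print(f"lift {lift:.0%}: {n_50} per variant at 50% power, {n_80} at 80% power")
```

With a 20% baseline this gives roughly 1,400 to 2,900 addresses per variant for a 3-point lift and roughly 12,500 to 25,600 for a 1-point lift, depending on the power chosen. The figures in the text fall inside those ranges, so read them as a floor rather than a comfortable target.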

Practical rule: if your segment has fewer than 5,000 subscribers, test only large changes. Radically different subject lines, completely different structure, fundamentally different offer. Small variations on small samples produce unreliable results.

Choosing the right metric

Subject line: open rate. Content and CTA: click rate. Offer: conversion (purchase, registration, download). The mistake is common: testing the subject line while watching clicks, or testing the CTA while judging by open rate, which the email body does not affect. Also: do not confuse clicks with conversions. Version A may get more clicks but version B more purchases, because A attracted the curious and B attracted people ready to buy.

Revenue per email (total campaign revenue divided by emails sent) is often more informative than CTR or conversion rate alone. A version with lower CTR can generate more revenue if it reaches a higher-value audience.
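
A small sketch of how these metrics relate to each other. The class name `VariantStats` and all the numbers are invented for illustration. Rates here are calculated per email sent, which is one convention among several (some tools report clicks per open instead).

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    sent: int
    opens: int
    clicks: int
    purchases: int
    revenue: float

    @property
    def open_rate(self):
        return self.opens / self.sent

    @property
    def click_rate(self):
        return self.clicks / self.sent

    @property
    def conversion_rate(self):
        return self.purchases / self.sent

    @property
    def revenue_per_email(self):
        # Total campaign revenue divided by emails sent.
        return self.revenue / self.sent


# Made-up results: A attracts more clicks, B attracts more buyers.
variant_a = VariantStats(sent=5000, opens=1000, clicks=200, purchases=20, revenue=800.0)
variant_b = VariantStats(sent=5000, opens=1000, clicks=150, purchases=30, revenue=1500.0)

for name, v in (("A", variant_a), ("B", variant_b)):
    print(f"{name}: CTR {v.click_rate:.1%}, conversion {v.conversion_rate:.1%}, "
          f"revenue per email {v.revenue_per_email:.2f}")
```

In this invented case A wins on CTR (4.0% vs. 3.0%) while B wins on conversion and on revenue per email (0.30 vs. 0.16), which is exactly the situation where the choice of primary metric decides the winner.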

When a test is done

The most common mistake is stopping too early. One hour after send, only the most active subscribers have opened. For open rate: 3-4 hours (one time zone), 6-8 hours (multiple zones). For CTR: at least 12 hours, preferably 24. For conversions: 24-48 hours, because someone may click now and buy tomorrow.

Decide the wait time upfront and do not change it. Do not check every 15 minutes and do not declare a winner when the gap "looks convincing." This is the peeking problem, and it produces false positives. Set up the test, leave, come back at the specified time. That is it.
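
The effect of peeking can be shown with a small simulation. It is a simplified model, not a replay of a real campaign: both variants have the same true open rate, so any declared "winner" is a false positive, and each "look" is modeled as another batch of recipients whose outcome becomes known. All parameter values are assumptions chosen for illustration.

```python
import random
from math import sqrt


def z_score(opens_a, opens_b, n):
    """Two-proportion z statistic for two variants of equal size n."""
    pooled = (opens_a + opens_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (opens_b - opens_a) / n / se


def false_positive_rates(runs=500, per_variant=2000, looks=16, rate=0.20, seed=1):
    """Compare 'stop at the first convincing gap' with 'check once at the end'."""
    rng = random.Random(seed)
    batch = per_variant // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        opens_a = opens_b = 0
        crossed = False
        z = 0.0
        for look in range(1, looks + 1):
            # Same true rate for A and B: there is no real difference to find.
            opens_a += sum(rng.random() < rate for _ in range(batch))
            opens_b += sum(rng.random() < rate for _ in range(batch))
            z = z_score(opens_a, opens_b, look * batch)
            if abs(z) > 1.96:
                crossed = True  # a peeker would have declared a winner here
        peeking_hits += crossed
        final_hits += abs(z) > 1.96  # only the last look counts
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = false_positive_rates()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives with one check at the end: {final_rate:.1%}")
```

Checking once at the planned time keeps the false positive rate near the nominal 5%. Checking at every look and stopping at the first gap that crosses the threshold pushes it well above that, even though nothing about the emails differs.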

Mistakes that invalidate tests

Multiple variables at once. You changed the subject and preheader in one test. Results improved 5%. Which one did it? Unknown. One test, one variable. The exception is multivariate testing (MVT), but that requires 50,000+ addresses and a dedicated tool.

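Testing on a dirty list. If 15-20% of addresses are invalid, results are skewed. Dead addresses sit in the denominator of your open rate calculation, so every rate, and every gap between variants, shrinks by the invalid share. A gap that would be 5 points on a clean list comes out closer to 4 on a dirty one, both rates look lower than they are, and a borderline result is easier to write off as no meaningful difference. There is one.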

Small sample, small variation. Test on 500 addresses: "Hi!" vs. "Hello!". Result: 19.2% vs. 19.8%. Statistically a draw. The marketer picks B and believes they optimized something. They picked a random number.

No record of results. Twenty tests over six months, and only the last one is remembered. Keep a spreadsheet: date, what you tested, sample size, result, statistical significance, conclusion. A year from now that table is worth more than any course.

One test, permanent conclusion. "We tested emojis in 2024, they don't work." Audiences change. Email clients change rendering. Rerun key tests every six months.

A real example: cosmetics e-commerce

List: 35,000 subscribers. Open rate: 16%. CTR: 2.1%. Two campaigns per week, never tested. First step: list validation. Of 35,000 addresses, 5,600 were invalid (16%). After cleaning: 29,400 valid. Open rate on the first post-cleanup campaign: 19.4%. More than three points up with zero content changes.

Second step: subject line test. Control: "March new arrivals: skincare". Test: "What's new in skincare? 5 products we added this week." Sample: 8,000 addresses (4,000 per variant). Wait: 4 hours. Result: 18.7% vs. 23.1%. The question-format subject won by 4.4 points. Statistically significant.

Third step: CTA test. "Browse catalog" vs. "Find something for me". Sample: 6,000. Wait: 24 hours. CTR: 2.3% vs. 3.1%. Personalized phrasing won. After two months: open rate up from 16% to 22%, CTR from 2.1% to 3.4%, email revenue up 38%. Most of the gain came from list cleaning and subject line tests. Not a new design, not a new ESP, not a copywriter.
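
The significance of the two tests in this example can be checked with a two-proportion z-test. This is a sketch under stated assumptions: the counts are reconstructed from the rounded percentages, the CTA sample is assumed to be split evenly (3,000 per variant), and CTR is assumed to be clicks per delivered email. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Subject line test: 4,000 per variant, 18.7% vs. 23.1% opens (748 vs. 924).
z_subject, p_subject = two_proportion_z_test(748, 4000, 924, 4000)

# CTA test: assumed 3,000 per variant, 2.3% vs. 3.1% clicks (69 vs. 93).
z_cta, p_cta = two_proportion_z_test(69, 3000, 93, 3000)

print(f"subject line: z = {z_subject:.2f}, p = {p_subject:.6f}")
print(f"CTA: z = {z_cta:.2f}, p = {p_cta:.3f}")
```

Under these assumptions the subject line result clears the usual 0.05 threshold by a wide margin. The CTA result lands close to the threshold, which is a good reason to rerun that test before treating the conclusion as settled.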

A/B test vs. multivariate test

A classic A/B test compares two versions of one element. Multivariate testing (MVT) tests combinations: three subjects, two preheaders, and two CTAs produce 12 combinations. MVT finds the best combination, not just the best single element. The problem: each combination gets 1/12 of the test audience. On a list below 50,000 subscribers, MVT rarely reaches statistical significance. For most companies, A/B is the right choice. More advanced: multi-armed bandit, supported by Mailchimp and Klaviyo, which shifts traffic toward the better variant as data comes in rather than waiting for a fixed endpoint.
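
The combination count is easy to verify, and the same few lines show why MVT is hungry for audience. The variant labels and the test audience of 12,000 are placeholders invented for the example.

```python
from itertools import product

# Placeholder variants: three subjects, two preheaders, two CTAs.
subjects = ["subject 1", "subject 2", "subject 3"]
preheaders = ["preheader 1", "preheader 2"]
ctas = ["cta 1", "cta 2"]

combinations = list(product(subjects, preheaders, ctas))

test_audience = 12_000  # hypothetical test sample
per_combination = test_audience // len(combinations)
per_ab_variant = test_audience // 2

print(len(combinations))  # 12
print(per_combination)    # 1000 addresses per combination in MVT
print(per_ab_variant)     # 6000 addresses per variant in a plain A/B test
```

The same audience that gives a plain A/B test 6,000 addresses per variant leaves each MVT combination with 1,000, which is enough only for very large differences.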

Building testing into your workflow

Testing should not be a separate project. If you send two campaigns a week, make one an A/B test. For subject line or preheader tests, the body is identical, so setup takes five extra minutes. A practical schedule for teams just starting out:

Weeks 1-4: test subject lines. Two variants per campaign. Record results.
Weeks 5-8: test preheaders. Lock the subject line using what you learned.
Weeks 9-12: test send times. Morning vs. evening, weekday vs. weekend.
Weeks 13-16: test content and CTAs. Requires two versions of the email body.

Four months in, you will have 16+ documented tests and know more about your audience than after a year of sending without testing.

Why list cleanliness matters for A/B tests

Invalid addresses are not just junk in the database. For A/B testing they are poisoned data. Say your test sample has 4,000 addresses per variant, 600 of them dead. Those 600 will not open A or B, but they sit in the denominator. The test shows 19% vs. 20%, a draw. On a clean list the same test would show 22.4% vs. 23.5%, and the difference would be visible.
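
The arithmetic behind that example, as a short sketch. It reads the example as 600 dead addresses in each variant of 4,000, which is the interpretation that reproduces the figures in the text.

```python
sent_per_variant = 4000
dead_per_variant = 600  # assumed to be 600 in each variant
valid_per_variant = sent_per_variant - dead_per_variant  # 3400

# Opens implied by the measured 19% and 20% on the uncleaned sample.
opens = {"A": round(0.19 * sent_per_variant), "B": round(0.20 * sent_per_variant)}

for label, count in opens.items():
    dirty_rate = count / sent_per_variant   # dead addresses in the denominator
    clean_rate = count / valid_per_variant  # only addresses that could open
    print(f"{label}: {dirty_rate:.1%} on the raw list, {clean_rate:.1%} on valid addresses")

# A: 19.0% on the raw list, 22.4% on valid addresses
# B: 20.0% on the raw list, 23.5% on valid addresses
```

The number of opens is identical in both calculations. Only the denominator changes, and with it both the reported rates and the gap between them.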

There is another effect: if invalid addresses are distributed unevenly between variants (possible on small samples), one variant gets more dead weight and loses not because it is worse but because it drew the unlucky half. Run the segment through uChecker, remove invalid and risky addresses, then launch the test.

Pre-launch checklist

Segment validated. Check the list, remove invalid addresses before testing.
One variable. If testing subject line, content and design are identical. If CTA, subject and content are the same.
Sample size is sufficient. Minimum 1,000 per variant. For small differences: at least 5,000.
Metric defined upfront. Open rate, CTR, conversion, or revenue per email. One primary metric per test.
Wait time fixed. 3-4 h for open rate, 12-24 h for clicks, 24-48 h for conversions. Do not stop early.
Results will be recorded. Date, hypothesis, variants, sample, result, significance, conclusion.
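
If a spreadsheet feels too loose, the same record can be kept as a CSV file. A minimal sketch: the field names follow the checklist item above, while the class name `AbTestRecord`, the helper `to_csv`, and the date in the sample row are placeholders invented for the example.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AbTestRecord:
    date: str
    hypothesis: str
    variants: str
    sample_size: int
    result: str
    significant: bool
    conclusion: str


def to_csv(records):
    """Serialize test records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AbTestRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


log = [
    AbTestRecord(
        date="2026-03-03",  # placeholder date
        hypothesis="A question in the subject line lifts open rate",
        variants="statement subject vs. question subject",
        sample_size=8000,
        result="18.7% vs. 23.1% open rate",
        significant=True,
        conclusion="Use question-format subjects; retest in six months",
    )
]
print(to_csv(log))
```

Writing the hypothesis down before the send, not after, is what keeps the log honest.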

A/B testing is a habit, not a project. After a year the difference between teams that test and teams that guess becomes obvious. Start simple: send your next campaign with two subject line variants. Write down the result. Repeat. After four months you will have sixteen data points. That is more than 90% of companies sending campaigns without a single test.

Before launching an A/B test, validate your list in uChecker — clean data makes test results reliable, not random.
