March 24, 2026 · 12 min read

Email regex validation: patterns from naive to RFC 5322

Every developer has written an email regex at some point. Most got it wrong. Some patterns reject half of all legitimate addresses; others happily accept strings that have nothing to do with email. This article walks through concrete regex patterns from the trivially simple to the RFC 5322-adjacent, with working code in three languages, and explains why regex alone isn't enough for real validation.

Why check email with a regular expression at all

Regex has one job in the email context: catch obvious garbage client-side before the data ever hits your server. Registration forms, newsletter signups, checkout flows. Someone types “asdf” or forgets the @ symbol. Regex catches that in microseconds, no network round-trip needed.

The boundary matters. It checks syntax, not reality. It has no idea whether the domain exists, whether the MX server is responding, or whether the mailbox is disposable. Think of it like verifying a passport by format: you can confirm the number has the right length, but you can't confirm the document is genuine.

Developers who rely only on regex routinely end up with databases containing 20-30% invalid addresses. The syntax is fine; the addresses don't exist. Treat regex as a first filter, not a solution.

Level 1: the naive pattern

The simplest version, common in beginner tutorials:

.+@.+\..+

What it does: requires at least one character before @, at least one after, a dot, and something after the dot. That's it.

The problems are obvious. Because . matches any character, this passes “hello@..com”, “@@domain.com”, and strings with spaces such as “user @exam ple.com”. For a contact form on a landing page that's probably acceptable. For account registration or a payment form, it creates problems downstream.
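A quick way to see how permissive the naive pattern is: run it over a few junk strings. This is a minimal sketch; the sample inputs are made up for illustration.

```javascript
// The naive pattern exactly as shown above (no anchors).
const naiveRe = /.+@.+\..+/;

// Hypothetical sample inputs; none of these is a usable address.
const junk = ["hello@..com", "@@domain.com", "user @exam ple.com", "a@b.c d"];

for (const s of junk) {
  console.log(JSON.stringify(s), naiveRe.test(s)); // each line ends in true
}

// It does still reject input with no @, or with no dot after the @.
console.log(naiveRe.test("missing-at.com")); // false
console.log(naiveRe.test("user@domain"));    // false
```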

Level 2: the practical pattern

This pattern covers the large majority of real-world addresses and stays readable:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breaking it down:

^                    # start of string
[a-zA-Z0-9._%+-]+   # local part: letters, digits, dot, _, %, +, -
@                    # separator
[a-zA-Z0-9.-]+      # domain: letters, digits, dot, hyphen
\.                  # dot before TLD
[a-zA-Z]{2,}        # TLD: at least 2 letters
$                    # end of string

This handles the vast majority of addresses correctly: user@example.com, firstname.lastname@company.co.uk, tag+filter@gmail.com. It rejects strings without @, addresses with spaces, and domains without a TLD.

It has limits. It rejects internationalized domains (user@почта.рф), doesn't handle quoted local parts ("john doe"@example.com), and doesn't check part lengths or dot placement: a..b@example.com and user@-.com both pass. In practice, nearly all addresses in commercial email lists fit this format.
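If internationalized domains matter, one possible workaround is to convert the domain to its ASCII (punycode) form before matching. The sketch below leans on the WHATWG URL parser that ships with browsers and Node.js. The helper names and the relaxed TLD rule are assumptions made for illustration, not part of the practical pattern.

```javascript
// Hypothetical helper: convert a domain to ASCII via the built-in URL parser.
function toAsciiDomain(domain) {
  // Refuse characters that would change how the URL is parsed.
  if (/[\s\/?#:@\\\[\]%]/.test(domain)) return null;
  try {
    return new URL(`http://${domain}`).hostname;
  } catch {
    return null; // the parser rejected the host
  }
}

// Practical pattern with one change (an assumption): the TLD may be
// letters only, or a punycode label starting with "xn--".
const asciiEmailRe =
  /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(?:[a-zA-Z]{2,}|xn--[a-zA-Z0-9-]+)$/;

function isValidEmailIdn(email) {
  const at = email.lastIndexOf("@");
  if (at < 1) return false; // no @, or nothing before it
  const asciiDomain = toAsciiDomain(email.slice(at + 1));
  if (asciiDomain === null) return false;
  return asciiEmailRe.test(email.slice(0, at) + "@" + asciiDomain);
}

console.log(isValidEmailIdn("user@почта.рф"));      // true
console.log(isValidEmailIdn("user@example.com"));   // true
console.log(isValidEmailIdn("user@exam ple.com"));  // false
```

The local part is still limited to ASCII here. Internationalized local parts are a separate problem that this sketch does not attempt.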

The same pattern in three languages:

JavaScript

function isValidEmail(email) {
  const re = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  return re.test(email);
}
isValidEmail("user@example.com");     // true
isValidEmail("tag+test@gmail.com");   // true
isValidEmail("user@.com");            // false

Python

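import re

def is_valid_email(email: str) -> bool:
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    # fullmatch() must consume the whole string; with match(), "$" would
    # also accept an address followed by a trailing newline.
    return re.fullmatch(pattern, email) is not None

is_valid_email("user@example.com")     # True
is_valid_email("missing-at.com")       # False
is_valid_email("user@example.com\n")   # False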

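Go

import "regexp"

// emailRe is compiled once at package init; MustCompile panics on a bad pattern.
var emailRe = regexp.MustCompile(
    `^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$`,
)

func isValidEmail(email string) bool { return emailRe.MatchString(email) }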

Level 3: RFC 5322-adjacent

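RFC 5322 defines the email address format. Full compliance requires recursive parsing (comments can nest, for example), which a regular expression cannot do. But you can get close. The pattern below circulates widely and shows up in production validation code. It handles quoted strings in the local part, nested subdomains, numeric TLDs, and other edge cases. It is wrapped across lines here for readability: join the lines without whitespace before use, and compile it case-insensitively, since it lists only lowercase letters.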

^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
  |"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
  |\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
  |\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
  (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
  |\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$

It looks alarming. In practice this lives inside validation libraries, not hand-written code. If you see it in a codebase without comments and a source link, that's a code review conversation waiting to happen.

What it covers that the practical pattern misses:

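Quoted local part: "john..doe"@example.com (this particular pattern does not match an unescaped space inside the quotes, so "john doe"@example.com still fails)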
IP address instead of domain: user@[192.168.1.1]
Special characters in local part: !#$%&'*+/=?^_`{|}~
Escaped characters inside quoted strings

For a SaaS registration form, the practical pattern is the better choice: simpler, readable, debuggable. Nobody signs up with "john doe"@[192.168.1.1]. For a mail server or SMTP library, the RFC pattern is justified.
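For reference, here is a sketch of the same RFC 5322-adjacent pattern as a single JavaScript literal, with the line breaks removed and the i flag added. The variable name is arbitrary, and the / characters are escaped only because of the literal syntax.

```javascript
// RFC 5322-adjacent pattern from above, joined into one line, case-insensitive.
const rfcLikeRe = /^(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$/i;

console.log(rfcLikeRe.test("User@Example.COM"));         // true
console.log(rfcLikeRe.test('"john..doe"@example.com'));  // true
console.log(rfcLikeRe.test("user@[192.168.1.1]"));       // true
console.log(rfcLikeRe.test('"john doe"@example.com'));   // false: unescaped space
console.log(rfcLikeRe.test("user@domain"));              // false: no dot in domain
```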

HTML5 input type="email" and built-in validation

Before writing your own regex, remember that browsers already do basic checking. An input with type="email" is validated against a pattern defined in the WHATWG HTML spec. That pattern is stricter than the naive one and deliberately departs from RFC 5322: it rejects quoted local parts and IP literals, but accepts dotless domains such as user@localhost. For most web forms, that's enough.

Combining input type="email" on the frontend with a server-side regex gives two layers of protection with minimal extra code. The browser shows a native error; the server catches requests from bots and custom clients that skip the browser entirely.
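To get an idea of what the browser accepts, the pattern published in the WHATWG HTML spec can be run directly. The version below is reproduced as a sketch, so check it against the current spec text before relying on it. The variable name is an assumption.

```javascript
// Pattern given in the WHATWG HTML spec for <input type="email">.
const html5EmailRe = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

console.log(html5EmailRe.test("user@example.com"));        // true
console.log(html5EmailRe.test("user@localhost"));          // true: dotless domain allowed
console.log(html5EmailRe.test('"john doe"@example.com'));  // false: no quoted local parts
console.log(html5EmailRe.test("user@[192.168.1.1]"));      // false: no IP literals
```

Because it accepts user@localhost, a server-side check that requires a TLD is still stricter than the browser in that one respect.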

Common mistakes in email regex

After years of code reviews, the same five mistakes come up repeatedly.

1. Capping TLD length at three characters

# Bad: rejects .info, .museum, .company
\.[a-zA-Z]{2,3}$

# Good: accepts any TLD length
\.[a-zA-Z]{2,}$

ICANN has issued hundreds of new gTLDs since 2014. Addresses like user@startup.technology or contact@my.company are perfectly valid. A pattern with {2,3} silently drops them.

2. Blocking the + character in the local part

# Bad: rejects tag+filter@gmail.com
^[a-zA-Z0-9._-]+@

# Good: + is included
^[a-zA-Z0-9._%+-]+@

Plus-addressing works on Gmail, Outlook, Fastmail, and many other providers. Technically savvy users rely on it. Block the + and you cut off part of your audience.

3. Missing anchors ^ and $

# Bad: matches email-like substring anywhere
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/

# Good: validates the entire string
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

Without anchors the regex finds an email-shaped substring inside arbitrary text. “buy cheap meds user@spam.com right now” passes validation.

4. Case-sensitive domain check

The domain part of an email is case-insensitive by standard. User@GMAIL.COM and user@gmail.com route to the same mailbox. Without the /i flag or explicit A-Z in the character class, you'll reject addresses with uppercase letters in the domain.
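A small sketch of both options: matching case-insensitively, or lowercasing the domain before storing the address. The helper name is hypothetical. Only the domain is lowercased, because the standards leave the case handling of the local part to the receiving server.

```javascript
// Lowercase everything from the last "@" onward; leave the local part as typed.
function normalizeDomainCase(email) {
  const at = email.lastIndexOf("@");
  if (at === -1) return email; // not an address; let the validator reject it
  return email.slice(0, at) + email.slice(at).toLowerCase();
}

const lowercaseOnlyRe = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/;
const caseInsensitiveRe = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

console.log(lowercaseOnlyRe.test("User@GMAIL.COM"));    // false
console.log(caseInsensitiveRe.test("User@GMAIL.COM"));  // true
console.log(normalizeDomainCase("User@GMAIL.COM"));     // User@gmail.com
```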

5. Trying to validate everything in one expression

Packing syntax, local-part length (max 64 characters), domain length (max 253 characters), no consecutive dots, and no leading or trailing dot into a single expression produces a 300-character regex nobody can read, test, or fix. Split the checks into stages:

function validateEmail(email) {
  if (!email || !email.includes("@")) return false;
  const [local, domain] = email.split("@");
  if (local.length > 64 || domain.length > 253) return false;
  if (local.startsWith(".") || local.endsWith(".")) return false;
  if (local.includes("..")) return false;
  const re = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  return re.test(email);
}

Each stage is testable in isolation, and each early return can be mapped to its own failure reason. The regex stays simple and does what it's good at: checking allowed characters.
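One way to surface those failure reasons is to return a small result object instead of a bare boolean. This is a sketch: the function name and the reason labels are made up for illustration.

```javascript
const practicalRe = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Same stages as above, but every failure carries a (hypothetical) reason code.
function checkEmail(email) {
  if (typeof email !== "string" || email.length === 0) {
    return { ok: false, reason: "empty" };
  }
  const at = email.lastIndexOf("@");
  if (at === -1) return { ok: false, reason: "missing_at" };

  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (local.length === 0 || local.length > 64) return { ok: false, reason: "local_length" };
  if (domain.length === 0 || domain.length > 253) return { ok: false, reason: "domain_length" };
  if (local.startsWith(".") || local.endsWith(".")) return { ok: false, reason: "edge_dot" };
  if (local.includes("..")) return { ok: false, reason: "consecutive_dots" };
  if (!practicalRe.test(email)) return { ok: false, reason: "bad_characters" };
  return { ok: true, reason: null };
}

console.log(checkEmail("a..b@example.com")); // { ok: false, reason: 'consecutive_dots' }
```

A form handler can then map each reason to its own user-facing message, and unit tests can assert on the reason rather than just on pass or fail.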

Pattern comparison

What passes and what fails at each level:

Address	Naive	Practical	RFC 5322
user@example.com	pass	pass	pass
tag+test@gmail.com	pass	pass	pass
user@sub.domain.co.uk	pass	pass	pass
"john..doe"@example.com	pass	fail	pass
user@[192.168.1.1]	pass	fail	pass
missing-at.com	fail	fail	fail
user@domain	fail	fail	fail
user @exam ple.com	pass	fail	fail

The naive pattern is too permissive. The RFC pattern is too complex for typical use. The practical pattern hits the right balance for most web applications.

Libraries instead of hand-rolled regex

For production code, tested and maintained libraries are worth the dependency:

# JavaScript / TypeScript
npm install zod
# or
npm install validator

# Python
pip install email-validator

# Go
go get github.com/badoux/checkmail

Zod is a widely used validation library in TypeScript projects:

import { z } from "zod";
const schema = z.object({ email: z.string().email("Invalid email address") });
const result = schema.safeParse({ email: "user@example.com" });
if (!result.success) console.log(result.error.issues);

Python's email-validator goes beyond syntax: it checks DNS records for the domain and returns the normalized address form.

from email_validator import validate_email, EmailNotValidError
try:
    info = validate_email("user@example.com", check_deliverability=True)
    normalized = info.normalized
except EmailNotValidError as e:
    print(str(e))

Libraries solve the syntax problem. Even with DNS checking, they still can't tell you whether the mailbox is active, whether it's a spam trap, or whether it will accept mail next week.

Why regex alone isn't enough for real validation

test@example.com passes every regex. Syntactically it's perfect. But example.com is reserved by IANA for documentation and doesn't accept mail. Regex has no way to know that.

Real email validation has several layers that regex physically cannot handle:

MX record check. Does the domain have a mail server? That requires a DNS query.
SMTP handshake. Is the server responding? Does it accept this specific address? That requires an actual connection.
Catch-all domains. Some servers accept any address at their domain. user123456@company.com passes SMTP, but that doesn't mean anyone reads that mailbox.
Disposable domains. Addresses on mailinator.com, guerrillamail.com, and hundreds of similar services are syntactically valid but useless for mailing lists.
Spam traps. Addresses set up specifically to catch spammers. Hitting one damages the sender domain's reputation.

Each of those layers needs server-side logic, network requests, and up-to-date databases. Regex operates on text, not infrastructure.
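Two of these layers can be sketched in a few lines to show why they sit outside regex. Everything here is illustrative: the two-entry domain list stands in for a maintained database, and resolveMx is an injected function so the check can be exercised without network access.

```javascript
// Hypothetical sample; a real deployment would load a maintained, regularly updated list.
const DISPOSABLE_DOMAINS = new Set(["mailinator.com", "guerrillamail.com"]);

function isDisposable(email) {
  const domain = email.slice(email.lastIndexOf("@") + 1).toLowerCase();
  return DISPOSABLE_DOMAINS.has(domain);
}

// resolveMx is assumed to take a domain and resolve to an array of MX records,
// rejecting when the lookup fails.
// Treating "no MX records" as undeliverable is a simplification: mail servers
// may fall back to the domain's address records.
async function hasMailServer(email, resolveMx) {
  const domain = email.slice(email.lastIndexOf("@") + 1);
  try {
    const records = await resolveMx(domain);
    return records.length > 0;
  } catch {
    return false; // lookup error, e.g. the domain does not exist
  }
}
```

In Node.js, a wrapper such as (domain) => dns.promises.resolveMx(domain) could be passed as the resolver. SMTP probing, catch-all detection, and spam-trap data need considerably more infrastructure than this.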

Email regex is spell-check. Necessary, but it doesn't verify facts. A syntactically correct address can be dead, disposable, or a trap.

A working strategy: regex + validation service

A solid email validation pipeline looks like this:

User enters email
        │
[1] HTML5 input type="email"     ← browser, free
        │
[2] Client-side regex            ← instant feedback
        │
[3] Server-side regex + length   ← bot protection
        │
[4] Validation service (API)     ← MX, SMTP, disposable, traps
        │
Address added to database

The first three layers cut 70-80% of garbage instantly. The fourth handles the rest server-side. Together they give validation that actually protects the database.

Client-side regex wired to a server validation API:

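function onSubmit(email) {
  const re = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  if (!re.test(email)) { showError("Check the email format"); return; }
  fetch("/api/validate-email", {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // tell the server the body is JSON
    body: JSON.stringify({ email }),
  })
    .then(res => {
      if (!res.ok) throw new Error(`HTTP ${res.status}`); // fetch does not reject on 4xx/5xx
      return res.json();
    })
    .then(data => {
      if (data.status === "valid") proceedWithRegistration(email);
      else showError(data.reason);
    })
    .catch(() => showError("Could not verify the email, please try again"));
}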

On the backend, /api/validate-email calls a validation service that checks MX, SMTP, catch-all status, and disposable domain lists, then returns a result with a risk level. Regex handled its part on the frontend. Everything else is infrastructure.
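The server side of that exchange might look roughly like the sketch below. It is framework-independent and purely illustrative. callValidationService is an assumed async function wrapping whichever validation service is used, and the "unknown" status for outages is a design assumption rather than anything the service prescribes.

```javascript
const serverRe = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical core of /api/validate-email: takes the parsed JSON body and
// resolves to { status, reason }.
async function validateEmailRequest(body, callValidationService) {
  const email = typeof body?.email === "string" ? body.email.trim() : "";

  // Layer 3: repeat the cheap checks, since bots never run the client-side code.
  // 254 is the commonly cited upper bound for a full address.
  if (email.length === 0 || email.length > 254 || !serverRe.test(email)) {
    return { status: "invalid", reason: "Check the email format" };
  }

  // Layer 4: delegate MX, SMTP, disposable, and trap checks to the service.
  try {
    const result = await callValidationService(email);
    return { status: result.status, reason: result.reason ?? null };
  } catch {
    // Assumption: when the service is unreachable, report "unknown" instead of rejecting.
    return { status: "unknown", reason: "Validation service unavailable" };
  }
}
```

Whether an "unknown" result should block registration or let it through with a flag is a product decision. The client code above treats anything other than "valid" as an error.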

Regex checks the format. Checking whether an address is real requires a validation service. uChecker checks MX records, SMTP, catch-all domains, disposable providers, and spam traps. 30 free checks to get started.
