uChecker

Python email validation: libraries, code examples and pitfalls

Your registration form happily accepts "test@@gmial.con". The CRM accumulates thousands of dead addresses. Campaigns go nowhere and the bounce rate climbs. Any Python developer who has been told to "add email validation" knows this story. Here are the specific libraries, working code, and an explanation of why regex alone is not enough.

Three levels of email validation

Email validation is not one operation but three sequential steps. Each is more accurate than the last and costs more in time and resources.

Syntax check confirms the string has the shape RFC 5321/5322 allow: a local part, an @ sign, and a valid domain. It cuts obvious garbage: empty strings, spaces, double dots, addresses missing a domain. It needs no network requests and runs instantly.

DNS check queries the domain's MX records. If the domain has no mail server, there is nowhere to deliver the message. This catches invented domains and many domain typos (gmial.com, yaho.com).

SMTP check connects to the mail server and asks whether a specific mailbox exists. The most accurate method, but with caveats: servers can lie (catch-all domains accept everything), block frequent requests, and require TLS.

In practice, the first two levels handle most use cases. SMTP checks make sense when accuracy is critical, such as before a bulk send.
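To show what the first level means in practice, here are its checks as plain string operations. This is a hypothetical helper (basic_syntax_errors is not part of any library) and deliberately rough. It ignores quoted local parts, which may legally contain @ and spaces. It also flags dotless domains, which are syntactically allowed but not deliverable on the public internet. The 64- and 254-octet limits come from RFC 5321. The library in the next section does this properly.

```python
def basic_syntax_errors(address: str) -> list[str]:
    """Returns the problems found; an empty list means the address passed
    these rough checks (this is NOT full RFC 5322 validation)."""
    if not address:
        return ["empty string"]
    errors = []
    if any(ch.isspace() for ch in address):
        errors.append("contains whitespace")
    if address.count("@") != 1:
        errors.append("must contain exactly one @")
        return errors  # the remaining checks need a single local@domain split
    local, domain = address.split("@")
    if not local:
        errors.append("empty local part")
    if not domain:
        errors.append("missing domain")
    # Dots may not start, end, or repeat in an unquoted local part or a domain
    for name, part in (("local part", local), ("domain", domain)):
        if part.startswith(".") or part.endswith(".") or ".." in part:
            errors.append(f"misplaced dot in {name}")
    if domain and "." not in domain:
        errors.append("domain has no dot")
    # RFC 5321 size limits: 64 octets for the local part, 254 for the address
    if len(local.encode()) > 64:
        errors.append("local part longer than 64 octets")
    if len(address.encode()) > 254:
        errors.append("address longer than 254 octets")
    return errors
```

It rejects "test@@gmial.con" from the introduction but accepts "test@gmial.con". The syntax of that address is fine, so only the DNS level can reveal that ".con" is not a real top-level domain.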

email-validator: syntax and DNS in one library

The email-validator library (by Joshua Tauberer) has become the de facto standard for email validation in Python projects. It checks syntax against RFC 5321/5322, normalizes the address (lowercases the domain, applies Unicode normalization), and optionally verifies DNS.

Install:

pip install email-validator

Basic example: validate a single address with DNS check:

from email_validator import validate_email, EmailNotValidError

def check_email(address: str) -> dict:
    """Validates email: syntax + DNS."""
    try:
        result = validate_email(address, check_deliverability=True)
        return {
            "valid": True,
            "normalized": result.normalized,
            "local_part": result.local_part,
            "domain": result.domain,
            "mx": result.mx,
        }
    except EmailNotValidError as e:
        return {"valid": False, "error": str(e)}

# Examples
print(check_email("User@Gmail.COM"))
# {'valid': True, 'normalized': 'User@gmail.com', ...}

print(check_email("bad@@broken"))
# {'valid': False, 'error': 'An @ sign may not ...'}

print(check_email("user@nonexistent-domain.xyz"))
# {'valid': False, 'error': 'The domain name ... does not exist.'}

The check_deliverability=True parameter turns on DNS validation: the library queries the domain's MX records and, if there are none, falls back to A/AAAA records. If nothing resolves, the address is rejected. For registration forms where response time matters, you can skip this during the request and run it asynchronously afterwards.
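One way to run the deliverability check after the request has returned is a background worker pool. This is a minimal stdlib sketch. schedule_deliverability_check, is_deliverable, and on_undeliverable are hypothetical names, and the pool size of 4 is an assumption. is_deliverable could wrap validate_email(email, check_deliverability=True) and return False on EmailNotValidError. on_undeliverable might flag the account and ask the user to re-enter the address.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# One shared pool for background checks (size is an assumption to tune)
_dns_pool = ThreadPoolExecutor(max_workers=4)

def schedule_deliverability_check(
    email: str,
    is_deliverable: Callable[[str], bool],
    on_undeliverable: Callable[[str], None],
) -> Future:
    """Returns immediately; the slow DNS check runs on a worker thread."""
    def job() -> None:
        try:
            ok = is_deliverable(email)
        except Exception:
            return  # DNS timeout or similar: result unknown, do not flag
        if not ok:
            on_undeliverable(email)
    return _dns_pool.submit(job)
```

A DNS failure is treated as "unknown" rather than "bad", so a slow resolver never locks out a real user. The signup handler itself only pays for the syntax check.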

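Normalization is a less obvious but useful feature. The library lowercases the domain, applies Unicode normalization, and exposes the ASCII-compatible (Punycode) form of the domain in result.ascii_domain. This prevents a class of duplicates in your database: "User@Gmail.com" and "User@gmail.com" normalize to the same address. The local part keeps its case, though, because RFC 5321 treats it as case-sensitive, so "user@gmail.com" still counts as a different address.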
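If you need to compare addresses in a context where email-validator is not available, a rough stdlib approximation of its domain normalization looks like this. It is a sketch, not the library's algorithm. Python's built-in idna codec implements the older IDNA 2003 rules, while email-validator follows IDNA 2008. Optional local-part casefolding is a policy choice, since RFC 5321 treats the local part as case-sensitive. dedupe_key and dedupe are hypothetical helpers, meant for addresses that have already passed validation.

```python
import unicodedata

def dedupe_key(address: str, casefold_local: bool = False) -> str:
    """Builds a comparison key for duplicate detection (rough approximation)."""
    local, _, domain = address.strip().rpartition("@")
    # NFC-normalize so visually identical Unicode strings compare equal
    local = unicodedata.normalize("NFC", local)
    # Domains are case-insensitive; the built-in "idna" codec (IDNA 2003)
    # converts internationalized labels to Punycode, e.g. bücher -> xn--bcher-kva
    domain = domain.lower().encode("idna").decode("ascii")
    # Most providers ignore local-part case, but the RFC does not require it
    if casefold_local:
        local = local.casefold()
    return f"{local}@{domain}"

def dedupe(addresses: list[str], casefold_local: bool = False) -> list[str]:
    """Keeps the first occurrence of each address, preserving order."""
    seen = set()
    unique = []
    for address in addresses:
        key = dedupe_key(address, casefold_local)
        if key not in seen:
            seen.add(key)
            unique.append(address)
    return unique
```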

If you only need syntax validation without network requests (in unit tests, for example):

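from email_validator import validate_email

# Syntax only: no DNS queries, safe for offline unit tests
result = validate_email(
    "user@example.com",
    check_deliverability=False,
)
print(result.normalized)  # user@example.com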

Manual DNS lookup with dns.resolver

Sometimes you need full control over DNS queries: a custom timeout, a specific DNS server, per-request logging. That is where dnspython fits in.

pip install dnspython

The function below fetches a domain's MX records sorted by priority. When there are no MX records, it falls back to the A record, because per RFC 5321 a host with only an A record can still accept mail (the "implicit MX" rule):

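import dns.resolver

def get_mail_servers(domain: str, timeout: float = 5.0) -> list[str]:
    """Returns the domain's mail servers, most preferred first.

    DNS timeouts and server failures are not caught: they mean "unknown",
    not "no mail server", so the caller decides whether to retry.
    """
    resolver = dns.resolver.Resolver()
    resolver.timeout = timeout   # wait per nameserver
    resolver.lifetime = timeout  # total budget for the whole query

    # Try MX records first
    try:
        mx_records = resolver.resolve(domain, "MX")
        servers = sorted(mx_records, key=lambda r: r.preference)
        hosts = [str(r.exchange).rstrip(".") for r in servers]
        # A null MX (RFC 7505, exchange ".") means the domain accepts no mail
        return [h for h in hosts if h]
    except dns.resolver.NXDOMAIN:
        return []  # the domain does not exist, so an A lookup is pointless
    except dns.resolver.NoAnswer:
        pass  # the domain exists but publishes no MX records

    # Fall back to the A record (RFC 5321 implicit MX)
    try:
        a_records = resolver.resolve(domain, "A")
        return [str(r) for r in a_records]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []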

# Example
print(get_mail_servers("gmail.com"))
# ['gmail-smtp-in.l.google.com', 'alt1.gmail-smtp-in.l.google.com', ...]

print(get_mail_servers("nonexistent-domain.xyz"))
# []

An empty list is a strong sign that the domain does not accept mail. The presence of MX records, though, does not guarantee that a specific mailbox exists. That requires the next level: SMTP.

SMTP check with smtplib

SMTP validation simulates the start of a mail delivery. You connect to the mail server, send the EHLO, MAIL FROM, and RCPT TO commands, and read the reply to RCPT TO to determine whether the mailbox exists. No email is actually sent: the connection closes before the DATA command.

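import smtplib

import dns.exception
import dns.resolver

def smtp_check(email: str, timeout: int = 10) -> dict:
    """Checks whether a mailbox exists via SMTP.

    Returns a dict with fields: exists, code, message.
    exists is True/False when the server gave a definite answer and
    None when the result is unknown (temporary error, no connection).
    """
    domain = email.rsplit("@", 1)[-1]

    # 1. Get the most preferred MX server
    try:
        mx = dns.resolver.resolve(domain, "MX")
        mail_server = str(min(mx, key=lambda r: r.preference).exchange).rstrip(".")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return {"exists": False, "code": 0, "message": "No MX record"}
    except dns.exception.DNSException as e:
        return {"exists": None, "code": 0, "message": f"DNS error: {e}"}
    if not mail_server:  # null MX (RFC 7505): the domain accepts no mail
        return {"exists": False, "code": 0, "message": "Domain accepts no mail"}

    # 2. Connect via SMTP; the context manager sends QUIT even on errors
    try:
        with smtplib.SMTP(mail_server, 25, timeout=timeout) as smtp:
            smtp.ehlo("verify.example.com")  # use a hostname you control
            code, msg = smtp.mail("check@verify.example.com")
            if code != 250:
                # Sender refused: the RCPT TO answer would be meaningless
                return {"exists": None, "code": code, "message": msg.decode(errors="replace")}
            code, msg = smtp.rcpt(email)
    except smtplib.SMTPServerDisconnected:
        return {"exists": None, "code": 0, "message": "Server disconnected"}
    except smtplib.SMTPConnectError:
        return {"exists": None, "code": 0, "message": "Server refused the session"}
    except OSError as e:  # refused TCP connection, timeout; SMTPException is an OSError
        return {"exists": None, "code": 0, "message": str(e)}

    # 3. Interpret the RCPT TO reply: 250/251 accepted, 5xx rejected, 4xx temporary
    if code in (250, 251):
        exists = True
    elif 500 <= code < 600:
        exists = False
    else:
        exists = None
    return {"exists": exists, "code": code, "message": msg.decode(errors="replace")}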

# Example
result = smtp_check("user@gmail.com")
print(result)
# {'exists': False, 'code': 550, 'message': '5.1.1 ... not found'}

Response code 250 (or 251) means the server accepts the recipient, which usually, but not always, means the mailbox exists. 550 means the mailbox was not found. 450 or 451 is a temporary error worth retrying later.
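The same code-to-verdict mapping as a small function, so every stage of a pipeline reads replies the same way. classify_rcpt_reply is a hypothetical helper and a simplification: some servers also use 5xx codes for policy blocks (a blocklisted sending IP, for example). A "rejected" verdict is therefore strong evidence, not proof.

```python
def classify_rcpt_reply(code: int) -> str:
    """Maps an RCPT TO reply code to a verdict (simplified reading of RFC 5321)."""
    if code in (250, 251):
        # Accepted (251: accepted and forwarded). On a catch-all domain
        # this says nothing about whether the mailbox really exists.
        return "accepted"
    if 400 <= code < 500:
        return "retry"     # transient: greylisting, rate limits, busy server
    if 500 <= code < 600:
        return "rejected"  # permanent: usually no such mailbox, sometimes policy
    return "unknown"       # 0 (no connection) or anything unexpected
```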

SMTP validation is a powerful tool, not a universal one. Catch-all servers return 250 for any address. Gmail and Outlook aggressively throttle frequent checks. At scale, SMTP validation does not work without IP rotation and serious infrastructure behind it.

SMTP validation pitfalls

On the surface SMTP validation looks like the perfect approach: ask the server directly, get a definitive answer. The reality is messier.

Catch-all domains. Corporate servers are often configured to accept mail for any address in the domain: RCPT TO returns 250 for "asdfkjh@company.com" just as it does for "ceo@company.com". You can detect catch-all by probing an address in the same domain that almost certainly does not exist, such as a long random string. If the server answers 250, the domain is catch-all, and a 250 for the real address tells you nothing.
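A sketch of that probe. The probe argument is a hypothetical callable that takes an address and returns the RCPT TO reply code. A thin wrapper around smtp_check that returns its "code" field would do. is_catch_all and random_local_part are illustrative names, and the "nonexistent-" prefix is an arbitrary choice.

```python
import secrets
from typing import Callable, Optional

def random_local_part(n_bytes: int = 12) -> str:
    """A long random local part that is very unlikely to be a real mailbox."""
    return "nonexistent-" + secrets.token_hex(n_bytes)

def is_catch_all(domain: str, probe: Callable[[str], int]) -> Optional[bool]:
    """True if the server accepts a random address, False if it rejects it,
    None if the answer was inconclusive (4xx, no connection)."""
    code = probe(f"{random_local_part()}@{domain}")
    if code in (250, 251):
        return True
    if 500 <= code < 600:
        return False
    return None
```

When is_catch_all returns True, report a 250 for the real address as "unknown" rather than "exists". Caching the verdict per domain avoids probing the same server again for every address.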

Greylisting. Some servers deliberately reject the first delivery attempt from an unfamiliar sender or IP with a temporary code (450 or 451) and expect a retry a few minutes later. Real mail servers retry; spammers usually do not. For a validator that reads anything other than 250 as "no such mailbox", this produces a false negative on the first attempt.
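Handling greylisting means retrying on 4xx with growing delays. The following is a minimal sketch of exponential backoff with jitter. check is a hypothetical callable returning the RCPT TO reply code, and the 300-second base delay is an assumption matching "a few minutes", not a documented value. The injectable sleep keeps the logic testable.

```python
import random
import time
from typing import Callable

def check_with_retry(
    address: str,
    check: Callable[[str], int],
    max_attempts: int = 3,
    base_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeats `check` while it returns a temporary (4xx) code.

    Delays double after each 4xx (300 s, 600 s, ...) plus up to 10%
    random jitter. Returns the last reply code seen.
    """
    code = check(address)
    for attempt in range(1, max_attempts):
        if not 400 <= code < 500:
            break  # definite answer (2xx/5xx) or no connection: stop retrying
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
        code = check(address)
    return code
```

In a batch job you would rather not block a worker for ten minutes; pushing 4xx addresses onto a queue to revisit later follows the same schedule without holding a thread.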

Rate limiting. Gmail, Outlook, and Yahoo restrict SMTP connections from a single IP. After a few dozen checks you will hit a temporary block. Validating a thousand addresses from one server takes hours, not minutes.
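A per-domain throttle keeps a batch under those limits by spacing out checks against the same provider. This is a simple sketch. DomainThrottle is a hypothetical class, and the 5-second interval is an assumption to tune, not a documented limit of Gmail, Outlook, or Yahoo. Since many domains share Google or Microsoft MX hosts, keying on the MX host instead of the recipient domain may be closer to how providers count.

```python
import time
from typing import Callable

class DomainThrottle:
    """Enforces a minimum interval between checks against the same key."""

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: dict[str, float] = {}  # key -> time of the last check

    def wait(self, domain: str) -> float:
        """Blocks until `domain` may be checked again; returns seconds waited."""
        domain = domain.lower()
        now = self.clock()
        waited = 0.0
        last = self._last.get(domain)
        if last is not None:
            waited = max(0.0, last + self.min_interval - now)
            if waited:
                self.sleep(waited)
        self._last[domain] = now + waited
        return waited
```

The class is not thread-safe as written; with several workers, guard wait() with a lock or give each domain its own queue.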

Port 25 blocked. Most cloud providers (AWS, GCP, Azure) block outbound connections to port 25 by default. Code that works on your laptop may not work in production.
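Before debugging validation code in production, it is worth confirming that the host can open outbound TCP connections to port 25 at all. A minimal stdlib sketch follows; can_reach_port is a hypothetical helper. A successful connect only proves the port is reachable from that host, not that the server will answer RCPT TO honestly.

```python
import socket

def can_reach_port(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Returns True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (blocked egress is often silently dropped),
        # or the hostname did not resolve
        return False

# Run on the production host, e.g. against Gmail's MX from the DNS example:
# print(can_reach_port("gmail-smtp-in.l.google.com"))
```

On a host where the provider blocks port 25, expect False after the full timeout rather than an immediate refusal.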

Legal and ethical considerations. Bulk SMTP probing can look like scanning. Not every mail administrator will take it well. In some jurisdictions these checks may fall under spam legislation.

Why regex for email is a bad idea

The temptation to write r"^[\w.-]+@[\w.-]+\.\w+$" is real. One line, no dependencies, works "for most cases". The problem is that the email standard (RFC 5321/5322) permits constructs that no simple regex can handle.

A fully RFC-compliant regular expression runs to several kilobytes. It is impossible to maintain, hard to debug, and still cannot check DNS or SMTP. The address user@nonexistent-domain-12345.com passes any regex, because its only problem is a domain that does not exist.
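To make that concrete, here is the naive pattern quoted above run against a few addresses. It fails in both directions. It rejects addresses that are valid and common (plus addressing, quoted local parts). It accepts ones that RFC 5321 forbids: consecutive dots in an unquoted local part, and a domain label that starts or ends with a hyphen.

```python
import re

# The naive one-liner from the text above
NAIVE = re.compile(r"^[\w.-]+@[\w.-]+\.\w+$")

samples = {
    "user+tag@example.com": True,     # plus addressing: valid and common
    '"john doe"@example.com': True,   # quoted local part: valid per RFC 5321/5322
    "john..doe@example.com": False,   # consecutive dots: invalid unquoted
    "user@-example-.com": False,      # labels may not start or end with '-'
}

for address, rfc_valid in samples.items():
    matched = bool(NAIVE.match(address))
    print(f"{address!r:28} RFC-valid={rfc_valid!s:5} regex={matched}")
```

Every row disagrees: a real user with a plus-tagged address cannot register, while malformed addresses slip through.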

The practical advice: use regex only as a rough pre-filter (has @, is not empty) and delegate real validation to a dedicated library.

import re

# Simple regex - only as a fast pre-filter
BASIC_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quick_filter(email: str) -> bool:
    """Cuts obvious garbage before calling full validation."""
    return bool(BASIC_PATTERN.match(email))

# Examples
print(quick_filter("user@example.com"))   # True
print(quick_filter("not-an-email"))       # False
print(quick_filter("user@fake.xyz"))      # True - regex cannot see the problem

The last example shows the limitation: regex let through an address with a non-existent domain. A DNS lookup is the only way to catch that.

Full pipeline: syntax + DNS + SMTP

In practice it makes sense to combine all three levels into a single function. Each stage only runs if the previous one passed:

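from email_validator import validate_email, EmailNotValidError
import dns.exception
import dns.resolver
import smtplib

def full_validate(email: str) -> dict:
    """Three-level validation: syntax, DNS, SMTP."""

    # Level 1: syntax (no network requests)
    try:
        info = validate_email(email, check_deliverability=False)
        normalized = info.normalized
        domain = info.ascii_domain  # Punycode form, safe to pass to DNS
    except EmailNotValidError as e:
        return {"level": "syntax", "valid": False, "error": str(e)}

    # Level 2: DNS (MX only; get_mail_servers above adds the A-record fallback)
    try:
        mx = dns.resolver.resolve(domain, "MX")
        mail_server = str(min(mx, key=lambda r: r.preference).exchange).rstrip(".")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return {"level": "dns", "valid": False, "error": "No MX record"}
    except dns.exception.DNSException as e:
        # Timeouts and server failures mean "unknown", not "invalid"
        return {"level": "dns", "valid": None, "error": f"DNS error: {e}"}
    if not mail_server:  # null MX (RFC 7505): the domain accepts no mail
        return {"level": "dns", "valid": False, "error": "Domain accepts no mail"}

    # Level 3: SMTP; the context manager sends QUIT even if a command fails
    try:
        with smtplib.SMTP(mail_server, 25, timeout=10) as smtp:
            smtp.ehlo("verify.example.com")  # use a hostname you control
            smtp.mail("check@verify.example.com")
            code, _ = smtp.rcpt(normalized)
    except Exception as e:  # refused connection, timeout, disconnect, ...
        return {
            "level": "smtp",
            "valid": None,
            "error": f"SMTP error: {e}",
            "normalized": normalized,
        }

    # 250/251: accepted; 5xx: rejected; 4xx (greylisting, throttling): unknown
    if code in (250, 251):
        valid = True
    elif 500 <= code < 600:
        valid = False
    else:
        valid = None
    return {"level": "smtp", "valid": valid, "code": code, "normalized": normalized}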

# Examples
print(full_validate("User@Gmail.com"))
# {'level': 'smtp', 'valid': True, 'code': 250, 'normalized': 'User@gmail.com'}

print(full_validate("test@@broken"))
# {'level': 'syntax', 'valid': False, 'error': '...'}

This pipeline works for one-off checks and small lists. For batch processing, add async support (aiosmtplib instead of smtplib), a connection pool, and retry logic with exponential backoff.
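As a stepping stone before rewriting the SMTP code with aiosmtplib, you can bound concurrency around the existing blocking full_validate with asyncio.to_thread and a semaphore. This swaps worker threads in for a native async SMTP client. validate_many is a hypothetical helper, and the default concurrency of 10 is an assumption to tune against the provider throttling described earlier. asyncio.to_thread requires Python 3.9 or newer.

```python
import asyncio
from typing import Callable, Iterable

async def validate_many(
    addresses: Iterable[str],
    check: Callable[[str], dict],
    concurrency: int = 10,
) -> dict[str, dict]:
    """Runs a blocking `check` (for example full_validate) over many
    addresses with at most `concurrency` checks in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(address: str) -> tuple[str, dict]:
        async with semaphore:
            try:
                # to_thread keeps the event loop free while smtplib blocks
                result = await asyncio.to_thread(check, address)
            except Exception as e:  # one failure must not sink the whole batch
                result = {"valid": None, "error": str(e)}
        return address, result

    pairs = await asyncio.gather(*(run_one(a) for a in addresses))
    return dict(pairs)

# Usage (needs network access and an open outbound port 25):
# results = asyncio.run(validate_many(emails, full_validate, concurrency=5))
```

The semaphore caps open SMTP sessions, which matters more than raw speed here: too many parallel connections to one provider is exactly what triggers the blocks.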

Bulk validation: when you have thousands of addresses

Local SMTP validation of ten thousand addresses runs into hard limits fast. Servers block your IP, cloud hosts restrict outbound traffic on port 25, and each connection takes several seconds.

For bulk work, an API is more practical. Here is an example using the uChecker API: upload a file, start a task, poll for status:

import os
import time

import requests

API_KEY = os.environ["UCHECKER_API_KEY"]
BASE = "https://api.uchecker.net"
HEADERS = {"x-api-key": API_KEY}

def bulk_validate(file_path: str, poll_interval: float = 5.0, max_wait: float = 3600.0) -> dict:
    """Uploads a file and waits for validation results."""

    # 1. Upload the file
    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{BASE}/api/v1/validate/bulk",
            headers=HEADERS,
            files={"file": f},
            timeout=60,
        )
    resp.raise_for_status()  # fail loudly on a bad key or a rejected file
    task_id = resp.json()["task_id"]
    print(f"Task created: {task_id}")

    # 2. Poll until the task finishes, but not forever
    deadline = time.monotonic() + max_wait
    while True:
        resp = requests.get(
            f"{BASE}/api/v1/tasks/{task_id}",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        status = resp.json()

        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            return {"error": status.get("message", "Task failed")}
        if time.monotonic() > deadline:
            return {"error": f"Task {task_id} not finished after {max_wait} s"}

        print(f"Progress: {status.get('progress', 0)}%")
        time.sleep(poll_interval)

    # 3. Fetch results
    resp = requests.get(
        f"{BASE}/api/v1/tasks/{task_id}/result",
        headers=HEADERS,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Usage
result = bulk_validate("emails.csv")
if "error" in result:
    print(f"Validation failed: {result['error']}")
else:
    print(f"Valid: {result['valid_count']}, Invalid: {result['invalid_count']}")

The API handles IP rotation, greylisting bypass, catch-all detection, and dozens of other edge cases. Instead of building SMTP infrastructure, you upload one file and get back clean results.

Approach comparison

| Method | Accuracy | Speed | Limitations |
| --- | --- | --- | --- |
| Regex | Low | < 1 ms | No domain or mailbox check |
| email-validator | Medium | 50-200 ms | No mailbox check |
| SMTP (smtplib) | High | 2-10 sec | Catch-all, blocking, port 25 |
| API (uChecker) | High | 1-3 sec | Requires API key |

Which approach to choose

Registration form. Use email-validator with check_deliverability=True. Syntax plus DNS catches most errors, runs fast, and needs no external services.

Cleaning an existing list. For a few hundred addresses, combine email-validator with smtplib. For thousands or tens of thousands, use a validation API. Local SMTP at that scale drowns in blocks and timeouts.

CI/CD pipeline. If you validate emails in automated tests or during deployment, syntax-only is enough: check_deliverability=False. DNS and SMTP requests will slow tests and make them flaky.

Real-time in production. For real-time checks on every incoming address (webhook, API endpoint, Telegram bot), use a validation API over HTTP: one request, a response within seconds, and no SMTP infrastructure to maintain.

The general rule: the larger the list and the stricter the accuracy requirement, the more it makes sense to hand validation off to a dedicated service. Rolling your own SMTP checker is a good learning exercise and a workable solution for small volumes. At production scale with thousands of addresses, it creates more problems than it solves.

Check your list with uChecker: 30 free checks give you a sense of your list quality. An API key is issued immediately after sign-up.
