Email content filtering: how spam filters analyze your messages
Content filtering is what happens when a mail server inspects an incoming message before deciding what to do with it: deliver it to the inbox, file it as spam, or reject it outright. The filter reads the text, HTML markup, links, attachments, and headers, runs them through a ruleset or a machine-learning model, and produces a score. Cross the threshold, and the message goes to spam.
What a content filter actually checks
Modern spam filters do not hunt for one bad word. They evaluate dozens of signals at once and add up the points. The main categories:
Text and phrasing. Filters track word and phrase frequency. "Free," "act now," "earn money from home," "click here" each add points to the spam score. One trigger phrase in context is usually fine. Ten of them packed into a subject line and preheader is a problem. Filters also flag excessive capitals, strings of exclamation marks, and character substitutions meant to slip past simpler pattern matchers, such as replacing the letter "o" with a zero or with a look-alike Unicode character.
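As a rough illustration of how these text signals add up, here is a minimal sketch. The phrase weights, the 30% capitals cutoff, and the function name `text_score` are all assumptions made for the example; real filters tune such values from data and use far more signals.

```python
import re

# Hypothetical weights, chosen only for illustration.
TRIGGER_PHRASES = {
    "free": 0.5,
    "act now": 1.0,
    "earn money from home": 1.5,
    "click here": 0.8,
}


def text_score(subject: str, body: str) -> float:
    """Add up points for trigger phrases, shouting, and obfuscation."""
    text = f"{subject}\n{body}"
    lowered = text.lower()
    score = 0.0

    # Each occurrence of a trigger phrase adds its weight.
    for phrase, weight in TRIGGER_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        score += weight * len(re.findall(pattern, lowered))

    # Excessive capitals: more than 30% of words fully upper-case.
    words = re.findall(r"[A-Za-z]{2,}", text)
    if words and sum(w.isupper() for w in words) / len(words) > 0.3:
        score += 1.0

    # Runs of three or more exclamation marks.
    score += 0.5 * len(re.findall(r"!{3,}", text))

    # A digit wedged between letters, e.g. "m0ney" (crude heuristic).
    if re.search(r"\b[a-z]+[0-9]+[a-z]+\b", lowered):
        score += 1.0

    return score


clean = text_score("Pricing update", "The starter plan is free for small teams.")
spammy = text_score("FREE!!! ACT NOW", "Click here to earn m0ney")
print(clean, spammy)
```

A single "free" in context scores low, while the second message trips several checks at once, which is the clustering effect described above.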
HTML markup. Malformed or suspicious HTML raises the score. Common culprits: hidden text (font-size: 0, or text colored to match the background), tag soup that looks machine-generated, and a poor text-to-image ratio. A message that is a single image with no text gets a high score because the filter cannot read what is inside the picture.
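The markup checks can be sketched with Python's standard `html.parser`. This example only looks for `font-size: 0` in inline styles and for image-only bodies; the class name `MarkupAudit` and the returned keys are hypothetical, and detecting text colored to match the background would need real CSS resolution, which is out of scope here.

```python
import re
from html.parser import HTMLParser

# Matches "font-size: 0" or "font-size:0px", but not "0.9em" or "10px".
ZERO_FONT = re.compile(r"font-size\s*:\s*0(?![\d.])", re.IGNORECASE)


class MarkupAudit(HTMLParser):
    """Counts images, zero-size inline styles, and text characters."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.hidden_styles = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        style = dict(attrs).get("style") or ""
        if ZERO_FONT.search(style):
            self.hidden_styles += 1

    def handle_data(self, data):
        # Simplification: counts all character data, including any
        # <style> or <script> content.
        self.text_chars += len(data.strip())


def audit_html(html: str) -> dict:
    parser = MarkupAudit()
    parser.feed(html)
    parser.close()
    return {
        "images": parser.images,
        "hidden_styles": parser.hidden_styles,
        "text_chars": parser.text_chars,
        "image_only": parser.images > 0 and parser.text_chars == 0,
    }


print(audit_html('<html><body><img src="promo.png"></body></html>'))
```

An `image_only` result of `True` corresponds to the single-image message that a filter cannot read.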
Links. Every URL in the message is checked against blacklists: SURBL, URIBL, and Spamhaus DBL are the common ones. Shortened links (bit.ly, tinyurl) add suspicion because they hide the real destination. Many different domains in a single message also raise eyebrows.
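A minimal sketch of the link checks follows. It extracts hosts, flags the two shorteners named above, and counts distinct hosts. The helper `blocklist_query_name` only builds the DNS name that a domain blocklist lookup would query; it performs no network call. The zone `dbl.spamhaus.org` in the usage line is an assumption for illustration, and each list's own usage terms and documentation should be checked before querying it.

```python
import re
from urllib.parse import urlsplit

# Shortener hosts taken from the examples in the text; not exhaustive.
SHORTENERS = {"bit.ly", "tinyurl.com"}
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_report(body: str) -> dict:
    """Collect the distinct hosts linked from a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host)
    return {
        "hosts": sorted(hosts),
        "shortened": sorted(h for h in hosts if h in SHORTENERS),
        "distinct_hosts": len(hosts),
    }


def blocklist_query_name(domain: str, zone: str) -> str:
    """Build the DNS name a domain-blocklist lookup would resolve.

    DNS-based lists are queried by prepending the domain to the list's
    zone; whether an answer means "listed" depends on the list.
    """
    return f"{domain}.{zone}"


report = link_report("Visit https://example.com/sale or http://bit.ly/xyz")
print(report)
print(blocklist_query_name("example.com", "dbl.spamhaus.org"))
```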
Headers. The filter checks that From, Reply-To, and Return-Path are consistent with each other. A mismatch between the visible sender address and the actual envelope sender is a classic forgery signal. DKIM signature, SPF record, and DMARC result are all examined here too.
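The header consistency check can be approximated with the standard `email` package. This sketch compares exact domains, which is stricter than what production filters typically do (they usually allow related subdomains), and it does not verify SPF, DKIM, or DMARC. The function names are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(header_value) -> str:
    """Return the lower-cased domain of an address header, or ''."""
    _, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def header_mismatches(raw_message: str) -> list:
    """List Reply-To / Return-Path domains that differ from From."""
    msg = message_from_string(raw_message)
    from_domain = domain_of(msg["From"])
    findings = []
    for name in ("Reply-To", "Return-Path"):
        if msg[name] is None:
            continue
        domain = domain_of(msg[name])
        if domain and domain != from_domain:
            findings.append(
                f"{name} domain {domain} differs from From domain {from_domain}"
            )
    return findings


raw = (
    "From: Shop <news@example.com>\n"
    "Reply-To: help@example.com\n"
    "Return-Path: <bounce@other.example.net>\n"
    "Subject: Hi\n"
    "\n"
    "Body\n"
)
print(header_mismatches(raw))
```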
Attachments. Executable files (.exe, .bat, .scr) are blocked by nearly every provider. Password-protected archives raise suspicion. Even a plain PDF can push the score up if the filter cannot parse its contents.
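A minimal sketch of an extension-based attachment check is shown below. The blocked extensions come from the text; the archive list and the three verdict labels are assumptions for the example. Real filters also inspect file contents, since a filename alone is easy to fake.

```python
from pathlib import PurePosixPath

BLOCKED_EXTENSIONS = {".exe", ".bat", ".scr"}
# Assumed set of archive types that deserve a closer look.
ARCHIVE_EXTENSIONS = {".zip", ".rar", ".7z"}


def attachment_verdict(filename: str) -> str:
    """Classify an attachment by its final extension only."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension in BLOCKED_EXTENSIONS:
        return "block"
    if extension in ARCHIVE_EXTENSIONS:
        return "inspect"
    return "allow"


for name in ("invoice.pdf.exe", "photos.zip", "report.pdf"):
    print(name, attachment_verdict(name))
```

Checking the final extension catches the common double-extension disguise such as `invoice.pdf.exe`.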
How SpamAssassin works
SpamAssassin is one of the most widely deployed open-source filters. It runs a message through hundreds of individual rules, each carrying a positive or negative weight, and sums the weights into a final score. At the default threshold of 5.0 the message is marked as spam; many servers are configured to block anything scoring above 10 outright. Rules cover everything from body text to technical headers.
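The weighted-sum idea can be sketched in a few lines. This is not SpamAssassin's code or rule set: the rule names (suffixed `_DEMO`), weights, and message dictionary shape are invented for illustration. Only the 5.0 and 10 thresholds come from the text above, and the reject threshold is a local policy choice rather than a SpamAssassin default.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    weight: float
    matches: Callable[[dict], bool]


# Hypothetical rules; positive weights push toward spam, negative away.
RULES = [
    Rule("SUBJ_ALL_CAPS_DEMO", 1.5, lambda m: m["subject"].isupper()),
    Rule("BODY_ACT_NOW_DEMO", 2.0, lambda m: "act now" in m["body"].lower()),
    Rule("SHORT_URL_DEMO", 2.0, lambda m: "bit.ly/" in m["body"].lower()),
    Rule("DKIM_VALID_DEMO", -0.5, lambda m: m.get("dkim_valid", False)),
]


def classify(message: dict, spam_at: float = 5.0, reject_at: float = 10.0):
    """Sum the weights of all matching rules and pick a verdict."""
    fired = [rule for rule in RULES if rule.matches(message)]
    total = sum(rule.weight for rule in fired)
    if total > reject_at:
        verdict = "reject"
    elif total >= spam_at:  # reaching the threshold counts as spam
        verdict = "spam"
    else:
        verdict = "deliver"
    return total, verdict, [rule.name for rule in fired]


msg = {
    "subject": "LAST CHANCE",
    "body": "Act now: http://bit.ly/deal",
    "dkim_valid": True,
}
print(classify(msg))
```

Note how the valid DKIM signature subtracts points but does not rescue the message: three content rules still carry it to the threshold.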
Gmail, Outlook, and Yahoo use proprietary filters built on machine learning. These look beyond content: they factor in whether recipients open mail from this sender, whether they drag it out of spam, whether they reply. Content filtering at large providers is one layer of a stack that also includes sender reputation and engagement analysis.
Writing email that passes content filters
- Avoid clustering trigger words. One "free" in a pricing section is fine. Three in the subject line plus two in the preheader is not.
- Keep the text-to-image ratio at 60/40 or higher in favor of text. Do not send a message that is a single image.
- Use clean, valid HTML. Avoid inline styles that hide content.
- Do not shorten links. Use your own tracking domain, served over HTTPS, and make sure that domain has a solid reputation.
- Include a plain-text version (multipart/alternative). Filters tend to treat messages with a readable plain-text part more favorably.
- Check your spam score before sending. Tools like mail-tester.com or GlockApps show exactly which rules fire on your message.
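For the plain-text recommendation in the list above, here is a minimal sketch of building a multipart/alternative message with Python's standard `email.message.EmailMessage`. The addresses and content are placeholders, and `build_message` is a hypothetical helper name; most sending platforms generate this structure for you.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, text, html) -> EmailMessage:
    """Create a message with both plain-text and HTML alternatives."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)                      # text/plain part
    msg.add_alternative(html, subtype="html")  # text/html part
    return msg


msg = build_message(
    "news@example.com",
    "reader@example.org",
    "Spring pricing",
    "Our spring pricing is live: https://example.com/pricing",
    '<p>Our spring pricing is <a href="https://example.com/pricing">live</a>.</p>',
)
print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
```

The plain-text part goes first and the HTML part last, so clients that can render HTML prefer it while the text remains readable to everything else.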
Content and sender reputation together
Content filtering does not work in isolation. A sender with strong IP and domain reputation gets some leeway on minor content issues. A sender with a poor reputation will see even clean content land in spam. Content is one factor. A healthy subscriber list, proper authentication (SPF, DKIM, DMARC), and consistent engagement matter just as much.
uChecker takes pressure off your content from a different angle: validating your list removes invalid addresses, spam traps, and role-based mailboxes. A cleaner recipient list means better sender reputation, which gives your content more room to pass filters without changes.
