Why You Shouldn't Validate Emails with Regex
Why email regex fails: it rejects valid addresses, accepts undeliverable ones, and can trigger ReDoS. Use HTML5 email input or a library instead.
A regex cannot validate an email address — it can only check rough shape, and even the most carefully crafted pattern will both reject legitimate addresses and accept syntactically plausible ones that will never deliver. The reason isn’t that you haven’t found the right pattern yet; it’s that the question “is this email valid?” conflates three different problems, and a regex can only ever touch one of them — the least useful one. This article separates those three problems, cites what the specs actually say (RFC 5321, RFC 5322, RFC 6531, and the WHATWG HTML Living Standard), shows exactly where popular patterns break in both directions, and gives you the pragmatic JS/TS code to use instead.
Key Takeaways
- Email validation has three distinct layers — a UX sanity check, syntactic validation against RFC 5321/5322, and existence verification — and only a confirmation email proves the address actually receives mail.
- The HTML Living Standard’s <input type="email"> uses a regex the spec itself labels a “willful violation of RFC 5322”; it is deliberately not RFC-complete and is a better default than any pattern you’ll write by hand.
- Popular email regexes fail in both directions: they reject real addresses (plus-addressing, new gTLDs, quoted local parts, internationalized addresses per RFC 6531) and accept undeliverable ones.
- A backtracking-prone email regex run server-side in Node.js can be exploited with a short crafted input to block the event loop — a ReDoS denial-of-service vector (CWE-1333).
- The one length constraint worth enforcing before any pattern runs comes from RFC 5321 §4.5.3.1.3: the address itself is capped at 254 characters.
The three layers of email validation
Email validation has three distinct layers: a UX sanity check (does it look like an email?), syntactic validation (does it conform to RFC 5321/5322 rules?), and existence verification (does this mailbox actually receive mail?). Only the third layer proves the address works, and only a confirmation email delivers it. The references that tell you to “stop using regex” are right, but they blur these layers together. Keeping them separate is what tells you which tool belongs where.
- Layer 1: UX sanity check. A cheap, fast, client-side check that catches obvious typos (alicegmail.com, a trailing space) and gives immediate feedback. This is the only layer where a regex belongs, and even here you want the smallest pattern that does the job.
- Layer 2: syntactic validation. Does the string conform to the grammar in the email RFCs? This is far harder than it looks, defeats hand-written regex, and, critically, proves nothing about deliverability. A perfectly RFC-conformant address can point at a domain that doesn’t exist.
- Layer 3: existence verification. Does a real mailbox receive mail at this address? The only proof that an email address works is a successfully delivered message; a confirmation email does in one step what no regex can do at all.
The mistake nearly every “ultimate email regex” makes is trying to do Layer 2 perfectly, when Layer 2 doesn’t answer the question anyone actually cares about. The question is Layer 3, and no pattern reaches it.
What “valid” actually means
A valid email address is far more permissive than most regexes assume, because the grammar in RFC 5321 (SMTP) and RFC 5322 (the message format) allows constructs that look broken. The local part — everything before the @ — can contain a long list of special characters and can even be a quoted string.
The unquoted local part is built from atext, defined in RFC 5322 §3.2.3, which permits these characters alongside letters and digits:
! # $ % & ' * + - / = ? ^ _ ` { | } ~
That means user+tag@example.com (plus-addressing) is valid — the + is ordinary atext, per RFC 5321 §4.1.2. Whether the receiving server treats the +tag as a subaddress is implementation-specific (RFC 5233), but the address itself is well-formed. The local part can also be a quoted string: "user name"@example.com is valid per RFC 5321 §4.1.2 and RFC 5322 §3.2.4, spaces and all. The domain can be an IP address literal in brackets — user@[192.168.1.1] is valid per RFC 5321 §4.1.3.
There is one constraint worth enforcing cheaply. RFC 5321 §4.5.3.1.3 caps the forward-path at 256 octets including the angle brackets, which leaves 254 octets for the address itself (254 characters when the address is plain ASCII); the local part is capped at 64 octets (§4.5.3.1.1) and the domain at 255 (§4.5.3.1.2). Length is the one property a plain comparison checks correctly, with no regex involved.
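One subtlety: JavaScript’s string length counts UTF-16 code units, while the RFC 5321 limits are in octets. For an ASCII address the two agree; for an address with non-ASCII characters they do not. A minimal sketch (the function names are ours, not from any library) that measures UTF-8 octets with the standard TextEncoder:

```typescript
// Sketch (names are ours): RFC 5321 size limits measured in UTF-8 octets.
const utf8 = new TextEncoder();

/** Number of UTF-8 octets in a string (String.prototype.length counts UTF-16 code units). */
function octets(s: string): number {
  return utf8.encode(s).length;
}

/** True if the address fits the RFC 5321 size limits. Checks length only, not syntax. */
function withinRfc5321Limits(address: string): boolean {
  // The domain can never contain "@", so the last "@" is the separator,
  // even when a quoted local part contains one.
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) return false; // need a local part and a domain
  const local = address.slice(0, at);
  return (
    octets(address) <= 254 && // §4.5.3.1.3: 256-octet path minus the two angle brackets
    octets(local) <= 64 // §4.5.3.1.1; the 255-octet domain cap (§4.5.3.1.2) is implied by the 254 cap
  );
}
```

A local part of 22 CJK characters has a string length of 22 but occupies 66 octets, so it fails the 64-octet limit even though a naive `.length` check would pass it.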
Internationalized addresses (EAI)
Internationalized email addresses defined in RFC 6531, such as 用户@例子.广告, are valid, and no ASCII-only regex handles them; this is a library problem, not a regex problem. EAI (RFC 6531 §3.3) extends the local part to allow UTF-8, and the domain can be written in non-ASCII Unicode. This is distinct from IDNA punycode-encoded domains (RFC 5891), which cover only the domain: EAI covers the local part too. Any pattern that assumes [a-zA-Z0-9] for the local part is wrong for a growing slice of the world’s users, and no practical regex accepts both ASCII and Unicode local parts correctly without also accepting garbage.
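The split between IDNA and EAI is easy to see in code: the domain has an ASCII (punycode) form, but the local part does not. A minimal sketch with hypothetical helper names; it assumes the WHATWG URL parser, which applies IDNA ToASCII to a URL’s host, is an acceptable stand-in for a dedicated IDNA library:

```typescript
// Sketch (names are ours): what IDNA can and cannot do for an EAI address.
function splitAddress(address: string): { local: string; domain: string } | null {
  const at = address.lastIndexOf("@"); // the domain never contains "@"
  if (at < 1 || at === address.length - 1) return null;
  return { local: address.slice(0, at), domain: address.slice(at + 1) };
}

/**
 * Convert a Unicode domain to its ASCII (punycode A-label) form.
 * Assumption: the WHATWG URL parser's host processing serves as ToASCII here.
 */
function toAsciiDomain(domain: string): string | null {
  // Reject characters the URL parser would treat as delimiters rather than host text.
  if (/[\/?#@\\:\s]/.test(domain)) return null;
  try {
    return new URL(`http://${domain}`).hostname;
  } catch {
    return null; // not a valid host name
  }
}

/**
 * RFC 6531: a non-ASCII local part has no ASCII encoding, so it can only be
 * relayed by servers that support the SMTPUTF8 extension.
 */
function needsSmtpUtf8(local: string): boolean {
  return /[^\x00-\x7F]/.test(local);
}
```

For 用户@例子.广告, the domain converts to xn-- labels, but the local part 用户 stays Unicode and still needs SMTPUTF8 support along the delivery path, which is why punycode alone doesn’t solve EAI.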
Why email validation regex fails in both directions
A hand-written email regex fails in both directions: it produces false negatives (rejecting deliverable addresses) and false positives (accepting addresses that conform to the grammar but will never receive mail). Both failure modes ship to production constantly because test suites tend to use test@example.com, which passes every pattern.
Take a widely copy-pasted Stack Overflow pattern, one that requires a 2-to-4-character TLD:
// A common copy-pasted pattern. Do not use this.
const bad = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$/;
Here is what it does to real addresses:
| Address | What it tests | This regex | Correct result | Notes |
|---|---|---|---|---|
| name+filter@gmail.com | plus-addressing | ✅ accepts | ✅ valid | passes here, but stricter patterns reject + |
| user@studio.photography | long gTLD | ❌ rejects | ✅ valid | {2,4} rejects TLDs longer than 4 chars |
| "user name"@example.com | quoted local part | ❌ rejects | ✅ valid | quoted strings and spaces are valid |
| 用户@例子.广告 | EAI (RFC 6531) | ❌ rejects | ✅ valid | ASCII-only character classes |
| someone@validformat.test | nonexistent domain | ✅ accepts | ❌ undeliverable | syntax is fine; the domain doesn’t resolve |
A regex that rejects name+filter@gmail.com (plus-addressing) or user@studio.photography (a gTLD delegated under ICANN’s New gTLD Program, with .photography added to the root in 2013) isn’t being strict; it’s being wrong. Both are syntactically valid addresses using standard email features. The {2,4} TLD constraint alone breaks .photography, .accountants, .engineering, and hundreds of other delegated TLDs.
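To check the table for yourself, a short script can run the pattern above over the same addresses. The classify helper and the row list are ours; the expected answers mirror the table’s “Correct result” column. It should report one ok row, three false negatives, and one false positive:

```typescript
// The copy-pasted pattern from above. Do not use it for real validation.
const bad = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$/;

type Verdict = "ok" | "false negative" | "false positive";

/** Compare the regex's answer with the answer we wanted for this address. */
function classify(address: string, wanted: boolean): Verdict {
  const accepted = bad.test(address);
  if (accepted === wanted) return "ok";
  return accepted ? "false positive" : "false negative";
}

// `wanted` mirrors the table: true = valid and should get through, false = undeliverable.
const rows: Array<[string, boolean]> = [
  ["name+filter@gmail.com", true],
  ["user@studio.photography", true],
  ['"user name"@example.com', true],
  ["用户@例子.广告", true],
  ["someone@validformat.test", false],
];

for (const [address, wanted] of rows) {
  console.log(`${address} -> ${classify(address, wanted)}`);
}
```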
Session replays often show the cost of this: a user hits a validation error, retypes the address several times, and then abandons the form. Form-usability research commonly cites validation friction as one contributor to abandonment and lower conversion.
False positives — addresses that pass the regex yet never deliver — are just as real. someone@validformat.test passes the pattern above and most others, yet .test is a reserved TLD (RFC 2606) that will never deliver. Syntactic conformance and deliverability are independent properties, and a regex only ever sees the first.
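Catching a false positive like this takes knowledge that lives outside the grammar. As a cheap illustration only, assuming you want to pre-filter obvious test input, here is a check (the name is ours) against the four TLDs that RFC 2606 §2 reserves: .test, .example, .invalid, and .localhost. It does not cover reserved second-level names such as example.com, and it is no substitute for a confirmation email:

```typescript
// RFC 2606 §2 reserves these TLDs; addresses under them cannot reach a public mailbox.
const RESERVED_TLDS = new Set(["test", "example", "invalid", "localhost"]);

/** True if the address's top-level domain is one of the RFC 2606 reserved TLDs. */
function usesReservedTld(address: string): boolean {
  const domain = address.slice(address.lastIndexOf("@") + 1).toLowerCase();
  const labels = domain.replace(/\.$/, "").split("."); // ignore a trailing root dot
  return RESERVED_TLDS.has(labels[labels.length - 1]);
}
```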
ReDoS: when the regex is the vulnerability
A backtracking-prone email regex run server-side in Node.js can be exploited with a crafted input to block the event loop. That is a denial-of-service vector (CWE-1333: Inefficient Regular Expression Complexity; see also the OWASP ReDoS reference) that has nothing to do with email and everything to do with catastrophic backtracking. Patterns with nested quantifiers over overlapping character classes can take exponential time on inputs that almost match, and adjacent quantifiers over overlapping classes can still take polynomial (for example, quadratic) time.
Here is a reproducible demonstration. The pattern’s ([a-zA-Z0-9]+)* puts a + quantifier inside a * quantifier, so a run of letters can be divided among the group’s repetitions in many different ways. A long run of one character followed by a non-matching character forces the engine to try exponentially many of those divisions before it can fail:
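// Node.js v24. Run with: node redos.js
// A deliberately vulnerable, backtracking-prone pattern: ([a-zA-Z0-9]+)* nests a + inside a *.
const evil = /^([a-zA-Z0-9]+)*@example\.com$/;
// Crafted near-matches: n 'a's, then a character that breaks the match.
// Keep n small: each extra 'a' roughly doubles the time, so n = 40 would
// block the event loop for far longer than you would want to wait.
for (let n = 20; n <= 26; n++) {
  const attack = "a".repeat(n) + "!";
  const start = performance.now();
  evil.test(attack); // synchronous: nothing else runs on this thread meanwhile
  console.log(`n=${n}: ${(performance.now() - start).toFixed(1)} ms`);
}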
On a current Node.js build, increasing the repeat count makes the match time grow explosively — each added character roughly doubles the work. Because Node’s regex engine runs synchronously on the main thread, a single request carrying this input stalls the event loop and blocks every other request in flight. The (x+)* shape is the tell: any group that can match the same substring in more than one way, under an outer quantifier, is a candidate for catastrophic backtracking. The fix is not a cleverer pattern — it’s not building this class of pattern at all, which is exactly what delegating to the platform or a maintained library buys you.
Syntax is not deliverability
Even a perfectly RFC-conformant address tells you nothing about whether mail will arrive. A regex cannot check that the domain exists, that it has MX records, that the mailbox is provisioned, or that the address isn’t a disposable throwaway. These are network and policy questions, not grammar questions. An address like realuser@gmail.com and a typo’d realuser@gmial.com are both syntactically valid; a DNS lookup can tell them apart only if the typo’d domain happens not to exist (typo domains are often registered by someone), and only an actual delivery distinguishes a live mailbox from a dead one.
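The DNS part of that question can be sketched as follows, with the caveat that the result is a hint, not a verdict. The function name and status labels are ours, and the lookup is injected so the logic can be tested offline; in Node.js you would pass resolveMx from node:dns/promises, which resolves to records with exchange and priority fields. Two spec details shape the result: RFC 5321 §5.1 falls back to a domain’s A/AAAA address when it has no MX records, so “no MX” is not proof of undeliverability, and RFC 7505 defines a “null MX” record that explicitly says a domain accepts no mail. The error codes assume Node’s convention of ENOTFOUND for a nonexistent name and ENODATA for a name with no records of the requested type:

```typescript
/** Shape of one MX answer, matching what Node's dns.promises.resolveMx resolves to. */
interface MxAnswer { exchange: string; priority: number }
type MxLookup = (domain: string) => Promise<MxAnswer[]>;
type MxStatus = "has-mx" | "null-mx" | "no-mx" | "no-domain" | "lookup-failed";

async function mxStatus(domain: string, lookup: MxLookup): Promise<MxStatus> {
  try {
    const records = await lookup(domain);
    if (records.length === 0) return "no-mx";
    // RFC 7505 null MX: a single record whose exchange is the root ("no mail here").
    // Depending on the resolver, the root may be reported as "" or ".".
    const only = records.length === 1 ? records[0].exchange : undefined;
    return only === "" || only === "." ? "null-mx" : "has-mx";
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ENOTFOUND") return "no-domain"; // the name does not exist
    // ENODATA: the name exists but publishes no MX. RFC 5321 §5.1 then falls back to
    // the domain's A/AAAA address, so this alone does not prove the address is dead.
    if (code === "ENODATA") return "no-mx";
    return "lookup-failed"; // timeouts and other errors: unknown, not invalid
  }
}
```

Even "has-mx" only says the domain accepts mail in general; it says nothing about whether this particular mailbox exists.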
Disposable and temporary-email domains are a related, separate concern: addresses that are syntactically and operationally valid but exist to evade your signup. Detecting them requires a maintained blocklist of provider domains, not a pattern — the domain list changes constantly, and any list you hard-code goes stale. Treat this as a policy layer on top of validation, not part of it.
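A sketch of that policy layer, assuming a blocklist you load and refresh from a maintained source. The two entries below are placeholders under the reserved .example TLD, not real providers, and the function name is ours. It also walks up parent domains, since a provider may issue addresses on subdomains:

```typescript
// Placeholder entries: in practice, load a maintained list at startup and refresh it.
const disposableDomains: ReadonlySet<string> = new Set(["throwaway.example", "tempmail.example"]);

/** True if the address's domain, or any parent domain, is on the blocklist. */
function isDisposable(address: string, blocklist: ReadonlySet<string> = disposableDomains): boolean {
  let domain = address.slice(address.lastIndexOf("@") + 1).toLowerCase().replace(/\.$/, "");
  // inbox7.throwaway.example -> throwaway.example; stops before the bare TLD.
  while (domain.includes(".")) {
    if (blocklist.has(domain)) return true;
    domain = domain.slice(domain.indexOf(".") + 1);
  }
  return false;
}
```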
What to do instead
Use the layered approach: a minimal sanity check for UX, the platform’s built-in validation for syntax, a maintained library only when you need more, and a confirmation email for the one thing that actually matters. Here is the order, from cheapest to authoritative.
1. A minimal sanity check
For instant client-side feedback, the smallest useful pattern is the one from the original “stop validating with regex” argument: require something, an @, something, a dot, and something. Pair it with a length check.
/**
* Layer 1 sanity check: catches obvious typos, nothing more.
* Deliberately permissive — it is NOT proof of validity.
* @param value - the raw input string
* @returns true if the value has the rough shape of an email and is <= 254 chars
*/
export function looksLikeEmail(value: string): boolean {
if (value.length > 254) return false; // RFC 5321 §4.5.3.1.3
return /.+@.+\..+/.test(value);
}
This sanity check rejects alicegmail.com and alice@localhost, and accepts plus-addressing and long gTLDs. Its backtracking is polynomial rather than exponential, and the 254-character cap runs first, so the work per call stays small. Do not treat its true as “valid”; it’s a typo catcher.
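In a form handler, the check is usually wrapped with a trim and a user-facing message. The trim matters: /.+@.+\..+/ happily accepts "alice@example.com " with a trailing space, whereas <input type="email"> strips leading and trailing whitespace from its value before validating. A hypothetical helper (the name and messages are ours) that repeats looksLikeEmail so the snippet stands alone:

```typescript
/** Same Layer 1 check as above: length cap, then a deliberately loose shape test. */
function looksLikeEmail(value: string): boolean {
  if (value.length > 254) return false; // RFC 5321 §4.5.3.1.3
  return /.+@.+\..+/.test(value);
}

/** Returns an error message for the UI, or null when the value looks plausible. */
function emailFieldError(raw: string): string | null {
  const value = raw.trim(); // the pattern alone would let stray whitespace through
  if (value === "") return "Please enter your email address.";
  if (!looksLikeEmail(value)) {
    return "That doesn't look like an email address. Check for a missing @ or domain.";
  }
  return null;
}
```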
2. Prefer the platform: <input type="email">
The best default for syntactic validation is the browser’s own <input type="email">, and it’s worth knowing exactly what it does. The HTML Living Standard’s <input type="email"> uses a regex the spec itself calls a “willful violation of RFC 5322” — it’s intentionally not RFC-complete, trading spec accuracy for usability, and it’s a better default than any pattern you’ll write yourself. The spec quotes the exact pattern:
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
Read clause by clause:
- [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+ matches the local part, allowing the atext special characters. It deliberately does not support quoted local parts ("user name"@…).
- @ is exactly one separator; neither character class around it admits another @.
- [a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? matches one domain label: it starts and ends with an alphanumeric character, allows hyphens inside, and is capped at 63 characters. It deliberately does not support IP-literal domains (user@[192.168.1.1]).
- (?:\.<label>)* matches zero or more additional dot-separated labels, so single-label and multi-label domains both pass.
The WHATWG documents the tradeoff openly: the spec explains that RFC 5322’s syntax is simultaneously too strict before the @, too vague after it, and too lax (allowing comments, whitespace, and quoted strings in ways unfamiliar to most users) to be practical for a form field. So the pattern rejects some technically valid RFC 5322 addresses (quoted parts, IP literals) on purpose; those forms are rare in real signups, and supporting them invites more bugs than it prevents. That is the right tradeoff for a form field, and it’s why <input type="email"> should be your Layer 2 baseline: it has no backtracking pathology and it matches what browsers already enforce.
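To mirror the browser’s check on the server, for example for requests that don’t come from your own form, you can reuse the spec’s pattern directly. A minimal sketch with our own names, pairing the pattern with the 254-character cap. It omits any IDN-to-punycode conversion a browser may apply to the domain before matching, so a Unicode domain a browser would accept fails here:

```typescript
// The WHATWG pattern quoted above, as a JavaScript regex literal ("/" escaped).
const WHATWG_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

/** Layer 2 baseline on the server: RFC 5321 length cap plus the browser's pattern. */
function isPlausibleEmail(value: string): boolean {
  return value.length <= 254 && WHATWG_EMAIL.test(value);
}
```

Unlike the ReDoS example, a 10,000-character near-match fails quickly here: the unbounded local-part run is followed by a literal @ that its class cannot match, and every label repetition is either bounded or separated by a literal dot.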
3. Reach for a maintained library only when you need more
If you need server-side syntactic validation beyond the HTML5 pattern, use a maintained, well-tested library rather than rolling your own. The validator package (npm validator, MIT-licensed) exposes an isEmail function that supports quoted local parts and provides options for IP-literal domains and display names:
import isEmail from "validator/lib/isEmail";
/**
* Layer 2 syntactic validation, server-side.
* @param email - candidate address (already length-checked)
* @returns true if syntactically valid per validator's RFC-aligned rules
*/
export function isSyntacticallyValid(email: string): boolean {
return isEmail(email, { allow_utf8_local_part: true });
}
Prefer this over the older email-validator package, which has not been published since 2018. A library gets you tested edge-case handling and an active maintainer fixing the cases your hand-written pattern never will — including, with the right options, EAI addresses.
4. The real answer: send a confirmation email
The only step that proves an address works is delivery. Send a confirmation message with a one-time link; treat the address as verified only after the user clicks it. This is double opt-in, and it makes elaborate upstream validation redundant — a malformed or undeliverable address simply never confirms.
/**
* Sketch of the verification flow. Storage and mailer are app-specific.
* @param email - a string that already passed length + syntax checks
*/
async function startEmailVerification(email: string): Promise<void> {
const token = crypto.randomUUID();
await storePendingVerification(email, token); // expires after, e.g., 24h
const link = `https://app.example.com/verify?token=${token}`;
await sendMail(email, "Confirm your email", `Click to confirm: ${link}`);
// Mark the account verified only when /verify is hit with a valid token.
}
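The sketch above leaves the /verify side as a comment. One way to fill it in, assuming an in-memory Map as a stand-in for your database and names of our own choosing: store only a SHA-256 hash of each token, so a leaked table of pending verifications can’t be replayed, and delete the entry on first use so each link works once. It uses randomBytes for a 256-bit token rather than crypto.randomUUID(); either is a reasonable choice:

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical in-memory store keyed by token hash; a real app would use its database.
interface Pending { email: string; expiresAt: number }
const pending = new Map<string, Pending>();
const TTL_MS = 24 * 60 * 60 * 1000; // e.g. 24h

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

/** Issue a single-use token; only its hash is stored. */
function issueToken(email: string, now: number = Date.now()): string {
  const token = randomBytes(32).toString("base64url"); // URL-safe, goes into the emailed link
  pending.set(hashToken(token), { email, expiresAt: now + TTL_MS });
  return token;
}

/** Called by /verify. Returns the confirmed email, or null if unknown, used, or expired. */
function consumeToken(token: string, now: number = Date.now()): string | null {
  const key = hashToken(token);
  const entry = pending.get(key);
  if (!entry) return null;
  pending.delete(key); // single use, even if it turns out to be expired
  return now <= entry.expiresAt ? entry.email : null;
}
```

Only when consumeToken returns an address should the account be marked verified.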
Sending a confirmation email is where source after source eventually lands, because delivery is the one thing no regex can test. As Jamie Zawinski put it, “Some people, when confronted with a problem, think, ‘I know, I’ll use regular expressions.’ Now they have two problems.” For email, the second problem is that the regex still didn’t answer the question.
Conclusion
Stop trying to validate the address and start trying to verify the mailbox. Use a minimal pattern plus a 254-character cap for instant UX feedback, lean on <input type="email"> or a maintained library like validator for syntax, and gate every real account behind a confirmation email — that final step is the only one that proves anyone is home. The next time a signup form needs an email field, reach for the platform and the confirmation flow, not the Stack Overflow pattern.
FAQs
What is the maximum valid length of an email address?
An email address is capped at 254 characters. This derives from RFC 5321 section 4.5.3.1.3, which limits the forward-path to 256 octets including the surrounding angle brackets, leaving 254 for the address itself. The local part is separately capped at 64 octets and the domain at 255 octets. A simple length comparison enforces this correctly, which is the one validation worth doing before any pattern runs.
Does the HTML5 email input validate against the full RFC 5322 grammar?
No. The HTML Living Standard explicitly describes its email input rule as a 'willful violation of RFC 5322.' It deliberately rejects technically valid forms like quoted local parts ("user name"@example.com) and IP-literal domains (user@[192.168.1.1]), which are rare in real signups. The tradeoff favors usability over spec-completeness, which makes it a safer default than a hand-written pattern, but it is not a complete RFC validator.
How can an email validation regex cause a denial-of-service attack?
A regex with nested or adjacent quantifiers over overlapping character classes, such as the shape ([a-zA-Z0-9]+)*, can take exponential time on inputs that almost match. This is catastrophic backtracking, classified as CWE-1333. Run server-side in Node.js, where the regex engine executes synchronously on the main thread, a single crafted request can stall the event loop and block every other request in flight. The fix is avoiding this pattern class entirely, not writing a cleverer one.
Can a regex check whether an email address actually exists?
No. A regex only inspects the string's shape; it cannot verify that the domain exists, that it has MX records, or that the mailbox is provisioned. Syntactic conformance and deliverability are independent properties. An address like realuser@gmial.com is syntactically valid but, because of the typo, will not reach the intended user, and someone@validformat.test passes most patterns yet uses a reserved TLD that never delivers. Only a successfully delivered confirmation email proves an address receives mail.
Why do email regexes reject valid addresses like name+filter@gmail.com?
Plus-addressing is fully valid because the plus sign is ordinary atext under RFC 5322 section 3.2.3 and RFC 5321 section 4.1.2. Patterns that reject it, along with addresses on long gTLDs like .photography or internationalized addresses defined in RFC 6531, are not being strict but wrong. These false negatives reach production because test suites use test@example.com, which passes every pattern, so the rejection of real addresses never surfaces in tests.