Encoding HTML entities means turning characters that have special meaning in HTML — &, <, >, ", ' — into their safe entity equivalents so they display as text instead of being parsed as markup. Decoding reverses it. This guide gives you the idiomatic, correct way to do both in JavaScript, PHP, and Python, plus the specific traps in each.
First, the rule that makes all of this matter: encode on output, into the right context. Escaping < to &lt; is not cosmetic; it is the core defense against cross-site scripting (XSS). Get the function and the context right and untrusted text is rendered harmlessly.
If you just need to convert a string once, skip the code and paste it into the HTML entity encoder/decoder. For anything programmatic, read on.
JavaScript
JavaScript has no single built-in htmlencode() function, which surprises people. The approach depends on whether you are in a browser or in Node.
Encoding in the browser
The smallest correct encoder uses the DOM to do the escaping for you:
function encodeHTML(str) {
  // Let the browser serialize the text node; it escapes &, <, and > for us.
  const div = document.createElement('div');
  div.textContent = str;
  return div.innerHTML;
}
encodeHTML('<script>alert("x")</script>');
// "&lt;script&gt;alert("x")&lt;/script&gt;"
Setting textContent and reading back innerHTML lets the browser escape &, <, and > correctly. Note it does not escape quotes — that is fine for text content, but not enough for attribute values. For attributes, escape manually:
function encodeHTMLAttribute(str) {
  return str
    .replace(/&/g, '&amp;')  // must run first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
The order matters: & must be replaced first, otherwise you double-encode the ampersands you introduce in the later replacements.
Decoding in the browser
The classic "textarea trick" decodes entities by letting the browser parse them:
function decodeHTML(str) {
  // A textarea treats its content as text, so entities are decoded but no markup is built.
  const txt = document.createElement('textarea');
  txt.innerHTML = str;
  return txt.value;
}
decodeHTML('Tom &amp; Jerry &copy; 2025');
// "Tom & Jerry © 2025"
A safer, modern alternative that does not rely on element side effects is DOMParser:
function decodeHTML(str) {
  return new DOMParser()
    .parseFromString(str, 'text/html')
    .documentElement.textContent;
}
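Important: never decode untrusted HTML by assigning it to innerHTML of an ordinary element such as a div and then reading the rendered result. That can run event handlers (for example an onerror attribute on an img), even when the element is not attached to the page. The two techniques above avoid this for different reasons: a textarea parses its content as plain text rather than markup, and a document created by DOMParser is inert, so its scripts and event handlers do not run.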
Node.js
Node has no DOM, so use a library. The de-facto standard is he (named after "HTML entities"):
const he = require('he');
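// By default he.encode() emits hex references; useNamedReferences asks for named ones.
he.encode('<foo> & "bar"', { useNamedReferences: true }); // '&lt;foo&gt; &amp; &quot;bar&quot;'
he.decode('Tom &amp; Jerry'); // 'Tom & Jerry'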
// Encode everything to numeric references:
he.encode('café', { encodeEverything: true });
he is spec-compliant for all 2,000+ named references, which hand-rolled replacements never are.
PHP
PHP has first-class built-ins, but three of them are easy to confuse: htmlspecialchars, htmlentities, and html_entity_decode.
Encoding
// Escapes ONLY the five HTML-significant characters: & < > " '
$safe = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Escapes EVERY character that has a named entity (é, ©, etc.) too
$safe = htmlentities($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
Always pass the flags explicitly. ENT_QUOTES escapes both single and double quotes. Before PHP 8.1 the default (ENT_COMPAT) escaped only double quotes, which leaves single-quoted attributes vulnerable, so code that relies on the default behaves differently across versions. Always state the encoding ('UTF-8') as well; relying on the default has caused security bugs across PHP versions.
htmlspecialchars vs htmlentities
This is the most common PHP question, and the answer is simple: use htmlspecialchars almost always.
- htmlspecialchars encodes only & < > " '. With a UTF-8 document, which is everything today, every other character (accents, symbols, emoji) renders correctly as its raw self. This is what you want for escaping output.
- htmlentities additionally converts accented and special characters into named entities like &eacute;. This bloats your output and is only useful when targeting a legacy non-UTF-8 encoding.
In a modern UTF-8 codebase, htmlentities solves a problem you do not have. Reach for htmlspecialchars.
Decoding
// Reverses htmlentities (and htmlspecialchars) — decodes ALL entities:
$text = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Reverses ONLY the five special characters:
$text = htmlspecialchars_decode($encoded, ENT_QUOTES);
Use html_entity_decode when the input might contain named entities like &copy; or &nbsp;; use htmlspecialchars_decode when you only ever encoded the five basics and want a faster, narrower reversal.
Python
Python keeps it minimal and correct with the standard-library html module — no third-party package needed.
Encoding
import html
html.escape('<a href="x">Tom & Jerry</a>')
# '&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'
# Leave quotes alone (text content, not attributes):
html.escape('Tom & Jerry', quote=False)
# 'Tom &amp; Jerry'
html.escape() escapes & < > always, and " ' as well unless you pass quote=False. The default quote=True is the safe choice — keep it on unless you have a specific reason not to.
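To make the quote=True versus quote=False distinction concrete, here is a minimal sketch that builds a link from untrusted values. The render_link helper is hypothetical and exists only for illustration; it escapes values but does not validate the URL scheme, which is a separate concern.

```python
import html


def render_link(url: str, label: str) -> str:
    """Hypothetical helper: build an <a> tag from untrusted values."""
    # quote=True (the default) also escapes " and ', so the value
    # cannot break out of the double-quoted href attribute.
    safe_url = html.escape(url)
    # Text content only needs & < > escaped, so quote=False is acceptable here.
    safe_label = html.escape(label, quote=False)
    return f'<a href="{safe_url}">{safe_label}</a>'


print(render_link('/search?q="cats"&page=2', 'Tom & "Jerry"'))
# <a href="/search?q=&quot;cats&quot;&amp;page=2">Tom &amp; "Jerry"</a>
```

The quotes in the label survive unescaped because they are harmless in text content, while the same characters in the href value are turned into &quot;.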
Decoding
import html
html.unescape('Tom &amp; Jerry &copy; 2025')
# 'Tom & Jerry © 2025'
html.unescape('caf&#233; &#x20AC;100')
# 'café €100'
html.unescape() understands the full set of named references plus decimal and hex numeric references, so it reverses anything a browser would. It replaced the HTMLParser.unescape() method found in Python 2 and early Python 3 code (deprecated in Python 3.4 and removed in 3.9); if you find that in old code, swap it for html.unescape.
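As a quick check of the three reference forms, the following sketch decodes one named, one decimal, and one hex reference, then round-trips a string through html.escape and html.unescape. The sample strings are illustrative only.

```python
import html

# One sample per reference style (illustrative values).
samples = {
    "named": "&copy; 2025",
    "decimal": "caf&#233;",
    "hex": "&#x20AC;100",
}
decoded = {kind: html.unescape(text) for kind, text in samples.items()}
print(decoded)
# {'named': '© 2025', 'decimal': 'café', 'hex': '€100'}

# Escaping and then unescaping returns the original text.
original = 'Tom & Jerry say "hi" <now>'
round_trip = html.unescape(html.escape(original))
print(round_trip == original)  # True
```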
Side-by-Side Cheat Sheet
| Task | JavaScript | PHP | Python |
|---|---|---|---|
| Encode (basics) | he.encode() / DOM trick | htmlspecialchars() | html.escape() |
| Encode (everything) | he.encode(s,{encodeEverything:true}) | htmlentities() | n/a (rarely needed) |
| Decode | he.decode() / DOMParser | html_entity_decode() | html.unescape() |
| Decode basics only | textarea trick | htmlspecialchars_decode() | html.unescape() |
Cross-Cutting Gotchas
- Encode the ampersand first. In any hand-written encoder, replace & before <, >, etc., or you will double-encode. This is the number-one cause of &amp;lt; showing up in output. See How to Fix Double-Encoded HTML Entities.
- Encode once, at output. Storing pre-encoded text in your database and then encoding again at render time is exactly how double-encoding happens. Store raw text; encode only when writing into HTML.
- Match the context. Escaping for HTML text is different from escaping for an HTML attribute, a URL, or a JavaScript string. Use ENT_QUOTES / quote=True whenever the value lands inside an attribute.
- Do not decode untrusted HTML into the live DOM. In the browser, decode via textarea / DOMParser, never by assigning untrusted markup to innerHTML of an attached element.
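The ampersand-ordering gotcha is easiest to see side by side. This is a small Python sketch with two hand-written encoders; both function names are made up for the comparison, and in real Python code html.escape would be the better choice.

```python
def encode_wrong_order(text: str) -> str:
    # Bug: & is replaced last, so it re-encodes the & inside "&lt;" and "&gt;".
    return (text.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("&", "&amp;"))


def encode_ampersand_first(text: str) -> str:
    # Correct: & goes first, so later replacements never get touched again.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;"))


print(encode_wrong_order("a < b & c"))      # a &amp;lt; b &amp; c
print(encode_ampersand_first("a < b & c"))  # a &lt; b &amp; c
```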
For a no-code check of any of these, the encoder/decoder tool shows you the exact named, decimal, and hex output so you can confirm your code is producing what you expect.
FAQ
What is the difference between htmlspecialchars and htmlentities in PHP?
htmlspecialchars encodes only the five HTML-significant characters (& < > " '). htmlentities additionally converts every character that has a named entity, such as accented letters, into entities. On a UTF-8 site you almost always want htmlspecialchars — htmlentities produces larger output and only matters for legacy non-UTF-8 encodings.
How do I decode HTML entities in JavaScript without a library?
In the browser, create a textarea, set its innerHTML to the encoded string, and read back its value; or use DOMParser().parseFromString(str, 'text/html') and read documentElement.textContent. Both let the browser do spec-correct decoding without inserting the content into the live page. In Node.js, use the he library's he.decode().
Is html.escape in Python enough to prevent XSS?
It handles HTML text and attribute contexts (with the default quote=True) correctly. But XSS prevention is context-specific: data going into a URL, a <script> block, or inline CSS needs different escaping. html.escape is the right tool for HTML text and attributes, not a universal shield for every context.
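A rough illustration of "different context, different escaping", using only the standard library. The sample input is invented, the choice of json.dumps plus a \u003c replacement for the script context is one common approach rather than the only one, and inline CSS is not covered here. In practice a templating engine with autoescaping usually handles these cases.

```python
import html
import json
from urllib.parse import quote

user_input = 'Tom & "Jerry" </script>'  # illustrative untrusted value

# HTML text or a quoted attribute value.
as_html = html.escape(user_input)

# A URL query component: percent-encoding, not entities.
as_url = quote(user_input, safe="")

# A JavaScript string literal inside a <script> block: json.dumps yields a
# valid JS string, and replacing "<" keeps "</script>" from ending the block.
as_js = json.dumps(user_input).replace("<", "\\u003c")

print(as_html)  # Tom &amp; &quot;Jerry&quot; &lt;/script&gt;
print(as_url)   # Tom%20%26%20%22Jerry%22%20%3C%2Fscript%3E
print(as_js)    # "Tom & \"Jerry\" \u003c/script>"
```

The same input produces three different safe forms, which is why no single function can cover every context.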
Why does my output show &amp;amp; instead of &amp;?
The text was encoded twice. The first pass turned & into &amp;; the second pass turned that &amp; into &amp;amp;. Encode exactly once, at output time, and store raw text in your database. The fix and detection steps are covered in How to Fix Double-Encoded HTML Entities.
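The two passes can be reproduced in a few lines of Python. This is only a demonstration of the symptom, with an invented sample string; html.unescape stands in for the single level of decoding a browser performs when rendering.

```python
import html

raw = "Tom & Jerry"
once = html.escape(raw)    # correct: encoded a single time at output
twice = html.escape(once)  # bug: encoding text that was already encoded

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# What the reader sees after the browser decodes one level:
print(html.unescape(once))   # Tom & Jerry
print(html.unescape(twice))  # Tom &amp; Jerry
```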
Which is faster, named or numeric encoding?
The performance difference is negligible for normal payloads. Choose based on output, not speed: named entities (&copy;) are more readable, while numeric entities (&#169;) work for characters with no name and in encoding-uncertain contexts. The tool lets you generate either form.
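For comparison, here is a sketch that produces all three forms in Python. The three helper names are hypothetical. Note one assumption that matters: html.entities.codepoint2name covers only the HTML 4 entity names, so characters outside that set fall back to a decimal reference in this sketch.

```python
from html.entities import codepoint2name


def to_decimal_ref(ch: str) -> str:
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    return f"&#x{ord(ch):X};"


def to_named_ref(ch: str) -> str:
    # Fall back to a decimal reference when the character has no HTML 4 name.
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else to_decimal_ref(ch)


print(to_named_ref("©"), to_decimal_ref("©"), to_hex_ref("©"))
# &copy; &#169; &#xA9;

# Numeric references for every non-ASCII character in a string:
print("café ©".encode("ascii", "xmlcharrefreplace").decode("ascii"))
# caf&#233; &#169;
```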
Do I need the he library, or are built-ins enough?
In the browser, the built-in DOM techniques are enough for encoding and decoding. In Node.js there is no DOM, so he (or a similar library) is the standard, spec-complete choice. PHP and Python both ship complete built-ins (htmlspecialchars/html_entity_decode and html.escape/html.unescape) and need no extra package.