Encode & Decode HTML Entities in JS, PHP & Python


Encoding HTML entities means turning characters that have special meaning in HTML — &, <, >, ", ' — into their safe entity equivalents so they display as text instead of being parsed as markup. Decoding reverses it. This guide gives you the idiomatic, correct way to do both in JavaScript, PHP, and Python, plus the specific traps in each.

First, the rule that makes all of this matter: encode on output, into the right context. The reason you escape < to &lt; is not cosmetic — it is the core defense against cross-site scripting (XSS). Get the function and the context right and untrusted text is rendered harmlessly.

If you just need to convert a string once, skip the code and paste it into the HTML entity encoder/decoder. For anything programmatic, read on.


JavaScript

JavaScript has no single built-in htmlencode() function, which surprises people. The approach depends on whether you are in a browser or in Node.

Encoding in the browser

The smallest correct encoder uses the DOM to do the escaping for you:

function encodeHTML(str) {
  const div = document.createElement('div');
  div.textContent = str;
  return div.innerHTML;
}

encodeHTML('<script>alert("x")</script>');
// '&lt;script&gt;alert("x")&lt;/script&gt;'

Setting textContent and reading back innerHTML lets the browser escape &, <, and > correctly. Note it does not escape quotes — that is fine for text content, but not enough for attribute values. For attributes, escape manually:

function encodeHTMLAttribute(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

The order matters: & must be replaced first, otherwise you double-encode the ampersands you introduce in the later replacements.

Decoding in the browser

The classic "textarea trick" decodes entities by letting the browser parse them:

function decodeHTML(str) {
  const txt = document.createElement('textarea');
  txt.innerHTML = str;
  return txt.value;
}

decodeHTML('Tom &amp; Jerry &copy; 2025');
// "Tom & Jerry © 2025"

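Another option is DOMParser, which parses the string into a separate, inert document instead of relying on a form element's value. Note that it parses the input as HTML, so any literal tags in the string are treated as markup and do not appear in the result: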

function decodeHTML(str) {
  return new DOMParser()
    .parseFromString(str, 'text/html')
    .documentElement.textContent;
}

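Important: never decode untrusted HTML by assigning it to innerHTML of an ordinary element such as a div and then reading the rendered result. That can run event handlers (for example an onerror attribute on an img), even if the element is never attached to the page. The textarea technique is safe because a textarea treats its content as plain text and never builds child elements from it. The DOMParser technique is safe because the document it produces is inert: scripts inside it do not run.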

Node.js

Node has no DOM, so use a library. The de-facto standard is he (named after "HTML entities"):

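const he = require('he');

// escape() touches only the HTML-significant characters:
he.escape('<foo> & "bar"');   // '&lt;foo&gt; &amp; &quot;bar&quot;'
he.decode('Tom &amp; Jerry'); // 'Tom & Jerry'

// encode() also converts non-ASCII characters, as hex references by default:
he.encode('café');            // 'caf&#xE9;'

// Encode every character, ASCII letters included:
he.encode('café', { encodeEverything: true });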

he is spec-compliant for all 2,000+ named references, which hand-rolled replacements never are.


PHP

PHP has first-class built-ins, but three of them are easy to confuse: htmlspecialchars, htmlentities, and html_entity_decode.

Encoding

// Escapes ONLY the five HTML-significant characters: & < > " '
$safe = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Escapes EVERY character that has a named entity (é, ©, etc.) too
$safe = htmlentities($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

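Always pass the flags explicitly. ENT_QUOTES escapes both single and double quotes. Before PHP 8.1 the default (ENT_COMPAT) escaped only double quotes, which leaves single-quoted attributes vulnerable on older installations. Always state the encoding ('UTF-8') as well: the default was ISO-8859-1 before PHP 5.4 and has followed the default_charset setting since PHP 5.6, so relying on it ties your escaping to the PHP version and server configuration.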

htmlspecialchars vs htmlentities

This is the most common PHP question, and the answer is simple: use htmlspecialchars almost always.

  • htmlspecialchars encodes only & < > " '. In a UTF-8 document, which covers nearly every site today, every other character (accents, symbols, emoji) renders correctly as its raw self. This is what you want for escaping output.
  • htmlentities additionally converts accented and special characters into named entities like &eacute;. This bloats your output and is only useful when targeting a legacy non-UTF-8 encoding.

In a modern UTF-8 codebase, htmlentities solves a problem you do not have. Reach for htmlspecialchars.

Decoding

// Reverses htmlentities (and htmlspecialchars) — decodes ALL entities:
$text = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Reverses ONLY the five special characters (ENT_HTML5 is needed to decode &apos;):
$text = htmlspecialchars_decode($encoded, ENT_QUOTES | ENT_HTML5);

Use html_entity_decode when the input might contain named entities like &copy; or &nbsp;; use htmlspecialchars_decode when you only ever encoded the five basics and want a faster, narrower reversal.


Python

Python keeps it minimal and correct with the standard-library html module — no third-party package needed.

Encoding

import html

html.escape('<a href="x">Tom & Jerry</a>')
# '&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'

# Leave quotes alone (text content, not attributes):
html.escape('Tom & Jerry', quote=False)
# 'Tom &amp; Jerry'

html.escape() escapes & < > always, and " ' as well unless you pass quote=False. The default quote=True is the safe choice — keep it on unless you have a specific reason not to.
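To see why the default matters, here is a minimal sketch of a value landing inside a double-quoted attribute. The render_input helper and the sample payload are hypothetical, made up for illustration; only html.escape comes from the standard library.

```python
import html

def render_input(value: str, quote: bool = True) -> str:
    """Hypothetical helper: put an untrusted value inside a double-quoted attribute."""
    return '<input value="{}">'.format(html.escape(value, quote=quote))

payload = '" onfocus="alert(1)'

safe = render_input(payload)                 # quotes become &quot;
unsafe = render_input(payload, quote=False)  # quotes pass through untouched

print(safe)    # <input value="&quot; onfocus=&quot;alert(1)">
print(unsafe)  # <input value="" onfocus="alert(1)">
```

With quote=False the payload closes the attribute and adds one of its own, which is exactly the case the default protects against.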

Decoding

import html

html.unescape('Tom &amp; Jerry &copy; 2025')
# 'Tom & Jerry © 2025'

html.unescape('caf&eacute; &#8364;100')
# 'café €100'

html.unescape() understands the full set of named references plus decimal and hex numeric references, so it reverses anything a browser would. It was added in Python 3.4 and replaces the older HTMLParser.unescape() method, which was deprecated at the same time and removed in Python 3.9. If you find that in old code, swap it for html.unescape.
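As a quick check of the three reference forms, the sketch below decodes the same character written as a named, a decimal, and a hex reference, then confirms that escaping followed by unescaping returns the original text. The sample strings are arbitrary.

```python
import html

# Named, decimal, and hex references for the same character decode alike.
forms = ['&copy;', '&#169;', '&#xA9;']
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['©', '©', '©']

# Escaping and then unescaping returns the original text.
original = '<a href="x">Tom & Jerry\'s</a>'
round_trip = html.unescape(html.escape(original))
print(round_trip == original)  # True
```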


Side-by-Side Cheat Sheet

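Task                | JavaScript                               | PHP                       | Python
--------------------|------------------------------------------|---------------------------|----------------------------------------
Encode (basics)     | he.escape() / DOM trick                  | htmlspecialchars()        | html.escape()
Encode (everything) | he.encode(s, {encodeEverything: true})   | htmlentities()            | no built-in (rarely needed)
Decode              | he.decode() / textarea trick / DOMParser | html_entity_decode()      | html.unescape()
Decode basics only  | none (the options above decode all)      | htmlspecialchars_decode() | none (html.unescape() decodes all)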
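Python has no direct htmlentities() equivalent, which is why its "Encode (everything)" cell is marked as rarely needed. If ASCII-only output is required, one possible approach is to escape first and then let the xmlcharrefreplace error handler turn the remaining non-ASCII characters into decimal references. The encode_non_ascii name is hypothetical, and unlike htmlentities() this produces numeric rather than named references.

```python
import html

def encode_non_ascii(text: str) -> str:
    """Escape the HTML-significant characters, then convert every
    non-ASCII character to a decimal numeric reference."""
    escaped = html.escape(text)  # must run first, or the new '&' would be re-escaped
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(encode_non_ascii('café & crème © 2025'))
# caf&#233; &amp; cr&#232;me &#169; 2025
```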

Cross-Cutting Gotchas

  • Encode the ampersand first. In any hand-written encoder, replace & before <, >, etc., or you will double-encode. This is the number-one cause of &amp;lt; showing up in output. See How to Fix Double-Encoded HTML Entities.
  • Encode once, at output. Storing pre-encoded text in your database and then encoding again at render time is exactly how double-encoding happens. Store raw text; encode only when writing into HTML.
  • Match the context. Escaping for HTML text is different from escaping for an HTML attribute, a URL, or a JavaScript string. Use ENT_QUOTES / quote=True whenever the value lands inside an attribute.
  • Do not decode untrusted HTML through an ordinary element. In the browser, decode via textarea/DOMParser, never by assigning untrusted markup to innerHTML of a regular element such as a div, whether or not it is attached to the page.
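The first two gotchas produce the same symptom, and a small sketch makes that visible. The two hand-written encoders below are hypothetical and differ only in where the ampersand replacement sits; the last lines show pre-encoded stored text being encoded again at render time.

```python
import html

def encode_wrong_order(text: str) -> str:
    """Buggy on purpose: '&' is replaced last, so it re-encodes earlier output."""
    return text.replace('<', '&lt;').replace('>', '&gt;').replace('&', '&amp;')

def encode_right_order(text: str) -> str:
    """'&' is replaced first, so the later replacements are left alone."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

print(encode_wrong_order('a < b & c'))  # a &amp;lt; b &amp; c
print(encode_right_order('a < b & c'))  # a &lt; b &amp; c

# Encoding already-encoded text gives the same result as the wrong order.
stored = html.escape('a < b')  # 'a &lt; b' saved pre-encoded (the mistake)
print(html.escape(stored))     # a &amp;lt; b
```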

For a no-code check of any of these, the encoder/decoder tool shows you the exact named, decimal, and hex output so you can confirm your code is producing what you expect.


FAQ

What is the difference between htmlspecialchars and htmlentities in PHP?

htmlspecialchars encodes only the five HTML-significant characters (& < > " '). htmlentities additionally converts every character that has a named entity, such as accented letters, into entities. On a UTF-8 site you almost always want htmlspecialchars; htmlentities produces larger output and only matters for legacy non-UTF-8 encodings.

How do I decode HTML entities in JavaScript without a library?

In the browser, create a textarea, set its innerHTML to the encoded string, and read back its value; or call new DOMParser().parseFromString(str, 'text/html') and read documentElement.textContent. Both let the browser do spec-correct decoding without inserting the content into the live page. In Node.js, use the he library's he.decode().

Is html.escape in Python enough to prevent XSS?

With the default quote=True, it correctly handles HTML text and quoted attribute values. But XSS prevention is context-specific: data going into a URL, a <script> block, inline CSS, or an unquoted attribute needs different escaping or validation. html.escape is the right tool for HTML text and quoted attributes, not a universal shield for every context.
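A rough sketch of what "different escaping" can look like, using only the standard library. The sample value and the is_allowed_href helper are hypothetical, and the scheme allow-list is an assumption for illustration, not a complete URL sanitizer.

```python
import html
import json
from urllib.parse import quote, urlsplit

untrusted = 'Tom & "Jerry" </script>'

# HTML text or a quoted attribute: html.escape is the right tool.
as_html = html.escape(untrusted)

# URL query component: percent-encoding, not entities.
as_url_param = quote(untrusted, safe='')

# JavaScript string literal inside a <script> block: JSON-encode, and break
# up '</' so the value cannot close the script element early.
as_js_string = json.dumps(untrusted).replace('</', '<\\/')

def is_allowed_href(url: str) -> bool:
    """Hypothetical check: html.escape leaves 'javascript:' URLs unchanged,
    so link targets need a separate scheme allow-list."""
    return urlsplit(url).scheme in ('http', 'https', '')

print(as_html)       # Tom &amp; &quot;Jerry&quot; &lt;/script&gt;
print(as_url_param)  # Tom%20%26%20%22Jerry%22%20%3C%2Fscript%3E
print(is_allowed_href('javascript:alert(1)'))  # False
```

The same input produces three different safe forms, which is the practical meaning of "match the context".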

Why does my output show &amp;amp; instead of &?

The text was encoded twice. The first pass turned & into &amp;; the second pass encoded the & at the start of that &amp; again, producing &amp;amp;. Encode exactly once, at output time, and store raw text in your database. The fix and detection steps are covered in How to Fix Double-Encoded HTML Entities.
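For a quick look at the symptom in code, the sketch below encodes a string twice and then flags the result with a regular expression. The looks_double_encoded helper and its pattern are hypothetical heuristics: they will also flag text that legitimately displays an entity's source form, so treat a match as a hint, not proof.

```python
import html
import re

# Heuristic: '&amp;' followed directly by what looks like an entity body.
DOUBLE_ENCODED = re.compile(
    r'&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);'
)

def looks_double_encoded(text: str) -> bool:
    return DOUBLE_ENCODED.search(text) is not None

once = html.escape('Tom & Jerry')  # 'Tom &amp; Jerry'
twice = html.escape(once)          # 'Tom &amp;amp; Jerry'

print(looks_double_encoded(once))   # False
print(looks_double_encoded(twice))  # True

# Undoing exactly one layer leaves correctly encoded text.
print(html.unescape(twice))         # Tom &amp; Jerry
```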

Which is faster, named or numeric encoding?

The performance difference is negligible for normal payloads. Choose based on output, not speed: named entities (&copy;) are more readable, numeric entities (&#169;) work for characters with no name and in encoding-uncertain contexts. The tool lets you generate either form.
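To compare the forms side by side, here is a small sketch that builds all three references for a single character. The entity_forms helper is hypothetical; it relies on html.entities.codepoint2name, which only knows the HTML 4 entity names, so newer HTML5 names come back as None.

```python
from html.entities import codepoint2name

def entity_forms(char: str) -> dict:
    """Return the named (if any), decimal, and hex references for one character."""
    code = ord(char)
    name = codepoint2name.get(code)  # HTML 4 names only
    return {
        'named': f'&{name};' if name else None,
        'decimal': f'&#{code};',
        'hex': f'&#x{code:X};',
    }

print(entity_forms('\u00a9'))      # copyright sign
# {'named': '&copy;', 'decimal': '&#169;', 'hex': '&#xA9;'}
print(entity_forms('\U0001F600'))  # an emoji, which has no entity name
# {'named': None, 'decimal': '&#128512;', 'hex': '&#x1F600;'}
```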

Do I need the he library, or are built-ins enough?

In the browser, the built-in DOM techniques are enough for encoding and decoding. In Node.js there is no DOM, so he (or a similar library) is the standard, spec-complete choice. PHP and Python both ship complete built-ins (htmlspecialchars/html_entity_decode and html.escape/html.unescape) and need no extra package.
