How to Fix Double-Encoded HTML Entities

If your page is showing &amp; where an & should be, or &lt; as visible text instead of a <, you are looking at double-encoded HTML entities. The content was encoded twice, and the browser only undoes one layer. This guide explains exactly why it happens, how to confirm it, and how to fix both the symptom and the cause.

The fastest way to confirm the diagnosis: paste the broken text into the HTML entity decoder. If decoding it once gives you something that still contains entities (like &amp;), and decoding that gives clean text, you have double-encoding.


What Double-Encoding Actually Is

HTML encoding replaces & with &amp;. The crucial detail is that the ampersand is itself a special character, so when you encode already-encoded text, the & at the start of &amp; is encoded again and the whole entity becomes &amp;amp;.

Walk through it with the ampersand in "Tom & Jerry":

Raw text:          Tom & Jerry
Encoded once:      Tom &amp; Jerry      → browser shows: Tom & Jerry  ✅
Encoded twice:     Tom &amp;amp; Jerry  → browser shows: Tom &amp; Jerry  ❌

When the browser renders &amp;amp;, it decodes the outer &amp; back to &, leaving the literal text amp; behind. That is why you see &amp; on screen: it is the half-decoded remains of a double-encoded ampersand.

The same thing happens to every entity:

<  →  &lt;    →  &amp;lt;    → page shows the text "&lt;"
"  →  &quot;  →  &amp;quot;  → page shows the text "&quot;"
©  →  &copy;  →  &amp;copy;  → page shows the text "&copy;"

So the tell-tale signs are: literal &amp; on the page, or literal &lt; / &gt; / &quot; / &copy; appearing as text instead of as the symbol they name.
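
The walk-through above can be reproduced in a few lines. This is a minimal sketch using Python's standard html module; html.unescape stands in for the single decode pass a browser performs, which is an approximation for illustration rather than a model of a full HTML parser.

```python
import html

raw = "Tom & Jerry"
once = html.escape(raw)    # correct: encoded a single time
twice = html.escape(once)  # the bug: encoding already-encoded text

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# One decode pass, as a browser would apply when rendering the text.
print(html.unescape(once))   # Tom & Jerry
print(html.unescape(twice))  # Tom &amp; Jerry
```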


Why It Happens

Double-encoding is almost always a pipeline problem — two stages each doing their job, unaware the other already did it.

1. Encoding on both store and display

The classic cause. Code encodes user input before saving it to the database, and the template encodes again on output. Each step is individually correct; together they double-encode. The right model is: store raw text, encode only at output.

2. A CMS / framework that auto-escapes, plus manual escaping

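Modern templating engines and UI libraries often escape output for you: Twig, Blade's {{ }} syntax, and React's JSX do so by default, and Jinja does when autoescaping is enabled (as it is for HTML templates in Flask). If you also pass already-escaped text into them, you get two layers. In WordPress, functions like esc_html() applied to content that a plugin or the editor already escaped produce the same result.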

3. WordPress and shortcodes / migrations

WordPress is a frequent offender. Pasting from the visual editor, certain plugins, importing/exporting content, or migrating a database with a tool that re-escapes can each add a layer. A particularly common case: an importer runs htmlentities on content that already contained entities.

4. Form round-trips

A value is displayed in a form field (encoded once so it is safe in the value attribute), the user submits it unchanged, and the server encodes it again on save. Repeat across a few edits and you can even get triple-encoding (&amp;amp;amp;).
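
To see how the layers pile up, here is a small simulation of that round-trip. The three functions are hypothetical stand-ins (not any framework's API) for the template, the browser, and a save handler that wrongly encodes before storing.

```python
import html

def render_form_value(stored: str) -> str:
    """Template step: encode so the value is safe inside value="..."."""
    return html.escape(stored, quote=True)

def browser_submit(attribute_value: str) -> str:
    """The browser decodes the attribute once and submits it unchanged."""
    return html.unescape(attribute_value)

def buggy_save(submitted: str) -> str:
    """The bug: encoding again before writing to storage."""
    return html.escape(submitted, quote=True)

stored = "Tom & Jerry"
history = []
for _ in range(3):  # three edits where the user changes nothing
    stored = buggy_save(browser_submit(render_form_value(stored)))
    history.append(stored)

for value in history:
    print(value)
# Tom &amp; Jerry
# Tom &amp;amp; Jerry
# Tom &amp;amp;amp; Jerry
```

Each unchanged edit adds one stored layer, and the template adds one more on display.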

5. APIs and JSON layers

Data is HTML-encoded by a backend, sent through a JSON API, then encoded again by a frontend that assumes it received raw text. Each layer thinks it is the one responsible for escaping.


How to Detect It

Eyeball test

Look for &amp; rendered literally on the page, or &lt;, &gt;, &quot;, &nbsp;, &copy; appearing as visible text. A symbol's name showing up where the symbol belongs is the signature of double-encoding.

View source vs. rendered

View the page source. If you see &amp;amp; or &amp;lt; in the raw HTML, it is confirmed — the source literally contains the doubly-encoded sequence.
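
If you have many pages to check, the same inspection can be roughly automated. The sketch below assumes you already have the page source as a string; the pattern and the function name find_double_encoded are illustrative. A match is only a hint, since a page that intentionally displays the text "&lt;" (a tutorial, for example) contains &amp;lt; in its source legitimately.

```python
import re

# "&amp;" immediately followed by what looks like the rest of an entity:
# a name (amp, lt, copy), a decimal reference (#60) or a hex one (#x3C).
DOUBLE_ENCODED = re.compile(
    r"&amp;(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);"
)

def find_double_encoded(source: str) -> list:
    """Return every suspicious sequence found in raw HTML source."""
    return DOUBLE_ENCODED.findall(source)

print(find_double_encoded("<p>Tom &amp;amp; Jerry &amp;lt;3</p>"))
# ['&amp;amp;', '&amp;lt;']
print(find_double_encoded("<p>Tom &amp; Jerry</p>"))
# []
```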

Decode test with the tool

Paste the affected text into the decoder. Count how many passes it takes to reach clean text:

  • One pass → clean: correctly encoded, no problem.
  • Two passes → clean: double-encoded.
  • Three passes → clean: triple-encoded (yes, it happens).
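
The pass-counting can also be scripted for diagnosis. This is a sketch, with encoding_depth as a hypothetical helper name: it counts how many decode passes change the text, capped at max_passes. Treat the number as a measurement to review by hand on a few samples, not as permission to decode that many times automatically, because text that legitimately contains an entity-like sequence reports a higher depth too.

```python
import html

def encoding_depth(text: str, max_passes: int = 5) -> int:
    """Count decode passes until the text stops changing."""
    depth = 0
    current = text
    while depth < max_passes:
        decoded = html.unescape(current)
        if decoded == current:
            break  # nothing left to decode
        current = decoded
        depth += 1
    return depth

print(encoding_depth("Tom & Jerry"))              # 0
print(encoding_depth("Tom &amp; Jerry"))          # 1
print(encoding_depth("Tom &amp;amp; Jerry"))      # 2
print(encoding_depth("Tom &amp;amp;amp; Jerry"))  # 3
```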

Database check

Query the raw stored value. If &amp;amp; is sitting in the database itself, the corruption happened on write and is now persisted — that changes the fix (see below).


How to Fix It

There are two distinct jobs: clean the data that is already broken, and stop it from happening again. You can do them in that order, but never skip the second.

Fix 1: Decode the existing content

If the data is double-encoded, decode it one extra time. Programmatically:

// PHP: decode twice to peel both layers
$clean = html_entity_decode(
  html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'),
  ENT_QUOTES | ENT_HTML5, 'UTF-8'
);

# Python: decode twice to peel both layers
import html
clean = html.unescape(html.unescape(text))

// JavaScript (browser): run the decode twice.
// DOMParser parses the input as HTML, so any real tags already in the
// input are treated as markup and dropped from textContent.
const decode = s => new DOMParser()
  .parseFromString(s, 'text/html').documentElement.textContent;
const clean = decode(decode(text));

For a one-off cleanup of a snippet, just run it through the decoder tool twice.

Be careful: only decode as many times as it was over-encoded. Blindly decoding in a loop "until no entities remain" will corrupt content that legitimately contains an entity-like sequence, and can re-introduce the very < characters you were trying to neutralize — an XSS risk. Determine the exact depth first.
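
A short sketch of why the loop is dangerous. Both helper names are hypothetical; the example assumes an author who genuinely typed the characters &lt; in their text, and that the text was then encoded once, correctly.

```python
import html

def decode_layers(text: str, layers: int) -> str:
    """Decode exactly `layers` times, where `layers` was measured beforehand."""
    for _ in range(layers):
        text = html.unescape(text)
    return text

def decode_until_clean(text: str) -> str:
    """Anti-pattern shown for comparison: decode until nothing changes."""
    while True:
        decoded = html.unescape(text)
        if decoded == text:
            return text
        text = decoded

author_text = "type &lt; to get <"  # what the author actually wrote
stored = html.escape(author_text)   # encoded once, correctly

print(stored)                      # type &amp;lt; to get &lt;
print(decode_layers(stored, 1))    # type &lt; to get <   (author's text, intact)
print(decode_until_clean(stored))  # type < to get <      (over-decoded)
```

The open-ended loop destroys the literal &lt; the author wrote and hands back extra raw < characters. Whatever depth you decode to, the result still has to be encoded once at output.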

Fix 2: Repair the source so it stops recurring

Decoding the stored data is a one-time patch. If the pipeline still double-encodes, the problem returns on the next save. Find the duplicate encode and remove one of them:

  • Store raw, encode on output. Remove any htmlspecialchars / htmlentities / esc_html that runs before saving to the database. Keep the one in your template.
  • Trust your template engine. Twig, Blade, and React escape by default, and Jinja does when autoescaping is enabled. Do not pre-escape values you hand to them: pass raw text and let the engine escape once.
  • WordPress: avoid double esc_* on content. If the editor or a plugin already escaped it, do not escape again on output. Be cautious with import/migration tools that re-encode.
  • API boundaries: decide one layer owns escaping. Typically the backend stores and returns raw text, and the rendering layer (the thing that actually writes HTML) escapes.
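
The "store raw, encode on output" rule from the list above can be sketched as follows. The dictionary is a stand-in for a real table, and save_comment and render_comment are hypothetical names; in a real application the template engine's auto-escaping would usually play the role of render_comment.

```python
import html

database = {}  # stand-in for a real table: id -> raw text

def save_comment(comment_id: int, text: str) -> None:
    """Write path: store exactly what the user typed, with no escaping."""
    database[comment_id] = text

def render_comment(comment_id: int) -> str:
    """Read path: the only place where HTML encoding happens."""
    return "<p>" + html.escape(database[comment_id]) + "</p>"

save_comment(1, "Tom & Jerry <3")
print(database[1])        # Tom & Jerry <3
print(render_comment(1))  # <p>Tom &amp; Jerry &lt;3</p>
```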

Fix 3: Bulk-clean a database (carefully)

If corruption is already persisted across many rows, write a one-time migration that decodes the affected columns the correct number of times. Always:

  1. Back up first.
  2. Test the transformation on a copy.
  3. Decode by the exact measured depth, not in an open-ended loop.
  4. Spot-check rows that legitimately contain & or < to be sure you have not over-decoded.
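
A one-time migration along those lines might look like the sketch below. It runs against an in-memory SQLite database so it is safe to try; the posts table, the body column, and EXTRA_LAYERS = 1 are assumptions for the demo, and in practice the depth comes from your own measurement on a backed-up copy.

```python
import html
import sqlite3

EXTRA_LAYERS = 1  # assumption for this demo: measured beforehand

def peel(text: str, layers: int) -> str:
    """Decode a fixed number of layers, never an open-ended loop."""
    for _ in range(layers):
        text = html.unescape(text)
    return text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (body) VALUES (?)",
    [("Tom &amp; Jerry",), ("5 &lt; 7",), ("plain text",)],
)

rows = conn.execute("SELECT id, body FROM posts").fetchall()
with conn:  # one transaction: commits on success, rolls back on error
    for row_id, body in rows:
        fixed = peel(body, EXTRA_LAYERS)
        if fixed != body:
            conn.execute(
                "UPDATE posts SET body = ? WHERE id = ?", (fixed, row_id)
            )

print(conn.execute("SELECT body FROM posts ORDER BY id").fetchall())
# [('Tom & Jerry',), ('5 < 7',), ('plain text',)]
```

Wrapping the updates in a single transaction means a failure partway through leaves the table untouched.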

A Note on Under-Encoding (the opposite problem)

Sometimes the issue is the reverse: raw < or & reaches the browser unescaped and breaks the layout, or worse, executes injected markup. The fix there is to encode (once!) on output. If your page renders a stray < as the start of a tag that swallows the rest of your content, you are under-encoded, not over-encoded — encode it with the encoder tool or htmlspecialchars/html.escape in code.
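
For completeness, a minimal sketch of the under-encoding fix in Python: encode exactly once, at the point where the text is written into HTML. The comment string is made-up sample input.

```python
import html

comment = '<script>alert("hi")</script> & more'  # untrusted input
safe = html.escape(comment)  # quote=True by default, so quotes are encoded too

print(safe)
# &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; more

# One decode pass (what the browser does) gives back the original text,
# which is then displayed as text instead of being parsed as markup.
print(html.unescape(safe) == comment)  # True
```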


FAQ

Why does my website show &amp; instead of &?

Because the text was HTML-encoded twice. The & became &amp;, then that & became &amp;amp;. The browser decodes one layer back to &amp; and displays it literally. Decode the stored value one extra time, then remove the duplicate encoding step in your pipeline so it does not recur.

How do I know if my content is double-encoded or just normally encoded?

Paste it into the decoder. If a single decode pass produces clean, entity-free text, it was encoded correctly. If after one pass the result still contains &amp; or other entities and a second pass is needed, it was double-encoded.

Why does WordPress double-encode my content?

Usually because two things escape the same content: the visual editor or a plugin escapes on input, and the theme escapes again on output — or a migration/import tool re-encoded existing entities. Find and remove one of the encode steps. Avoid running esc_html() / htmlentities() on content that has already been escaped.

Is it safe to just decode in a loop until no entities are left?

No. Looping "until clean" can over-decode content that legitimately contains entity-like text and can resurrect raw < characters, creating an XSS vulnerability. Measure exactly how many times the content was over-encoded and decode precisely that many times.

Can text be triple-encoded?

Yes. Each round-trip through a form field or a re-encoding migration can add another layer, giving &amp;amp;amp;. The detection method is the same: count how many decode passes the tool needs to reach clean text, then decode the data by exactly that depth.

My page shows raw < and the layout breaks — is that double-encoding?

No, that is the opposite: under-encoding. A raw < is being parsed as the start of a tag. Encode the output once with htmlspecialchars (PHP), html.escape (Python), the he library or DOM trick (JavaScript), or the encoder, so the < becomes &lt; and renders as text.
