How to Escape Characters in HTML

Sushant Poudel Feb 02, 2024
  1. Introduction to Character Escapes in HTML
  2. Characters That Must Be Escaped in HTML

This tutorial introduces character escapes in HTML and explains which characters must be escaped.

Introduction to Character Escapes in HTML

We use character escapes in markup languages like HTML, XML, and XHTML to represent Unicode characters using only ASCII characters.

Character escapes are needed when we want to display characters such as < and >, which have a special meaning in markup languages. If we do not escape these characters, the browser interprets them as markup, and we do not get the desired output.

Many Unicode characters can be escaped in HTML, for example ", ', <, >, and &.

The full list of named character references (HTML entities) is defined in the HTML standard. A character can be represented mainly in two ways: as a numeric character reference or as a named character reference.

A numeric character reference can be written in either decimal or hexadecimal form.

Let’s look at the following example, which shows the representation of a no-break space using the different character references.

Example Code:

<p> Hi Jack&nbsp;! </p> <!-- named character reference -->
<p> Hi Jack&#xA0;! </p> <!-- hexadecimal numeric character reference -->
<p> Hi Jack&#160;! </p> <!-- decimal numeric character reference -->

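Output: all three paragraphs render identically as Hi Jack !, with a no-break space before the exclamation mark.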

As seen in the example above, the named, hexadecimal numeric, and decimal numeric character references of the no-break space are &nbsp;, &#xA0;, and &#160;, respectively. Note that a decimal numeric character reference starts with &#, a hexadecimal one starts with &#x, and both end with ;.
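The equivalence of the three forms can also be checked outside the browser. The following is a minimal sketch in Python, assuming the standard library's html module; the variable names are only illustrative.

```python
import html
from html.entities import name2codepoint

# The three spellings of the no-break space used in the example above.
references = ["&nbsp;", "&#xA0;", "&#160;"]

# html.unescape decodes each reference to the character it stands for.
decoded = [html.unescape(ref) for ref in references]

# All three decode to the same single character, U+00A0.
all_same = all(ch == "\u00a0" for ch in decoded)

# The named reference maps to code point 160, which is A0 in hexadecimal.
code_point = name2codepoint["nbsp"]
decimal_ref = f"&#{code_point};"
hex_ref = f"&#x{code_point:X};"

print(all_same, decimal_ref, hex_ref)
```

The numeric forms are simply the code point of the character written in base 10 or base 16, which is why they can be derived from each other.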

Which characters need to be escaped in HTML depends on the context. The document body, attribute values, and the contents of style and script tags each have their own rules.
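To illustrate how the context changes what has to be escaped, here is a small sketch using Python's html.escape. The sample string is made up for illustration. With quote=False only &, <, and > are replaced, which is enough for text in the document body; with quote=True (the default) the double and single quotes are replaced as well, which is what a quoted attribute value needs.

```python
import html

# Hypothetical user-supplied text, used only for illustration.
text = 'Tom & "Jerry" <3'

# Body text: only &, < and > are replaced.
body_safe = html.escape(text, quote=False)

# Attribute value: quotes are replaced too, so the value cannot end early.
attr_safe = html.escape(text, quote=True)

markup = f'<p title="{attr_safe}">{body_safe}</p>'
print(markup)
```

This sketch covers only body text and quoted attribute values; it makes no claim about the contents of style or script tags.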

Characters That Must Be Escaped in HTML

This section discusses the characters that must be escaped in HTML. Three characters should always be escaped:

  • (<)
  • (>)
  • (&)

Markup languages like HTML and XML use the less-than and greater-than symbols, < and >, to delimit tags. We will call them syntax wrappers. We should escape these syntax wrappers in the document body; otherwise, the browser treats them as markup.

The named character references of the syntax wrappers are shown below.

  • (<) &lt;
  • (>) &gt;

Let's see what happens when these characters are not escaped and how we can escape them.

In the example below, the first link contains the unescaped text "the <a> tag" between its opening and closing <a> tags.

As a result, the hyperlink is applied only to the word "the", because the browser interprets the inner <a> as markup rather than as text.

But this is not our goal. Our goal is to display <a> as literal text, so we need to escape the syntax wrappers around the a.

Therefore, we use the &lt; and &gt; character references to escape the syntax wrappers. The browser displays &lt;a&gt; as the text <a>.

As a result, in the second link in the example below, the hyperlink is applied to the whole text "the <a> tag". This is why we should escape the syntax wrappers in HTML.

Example Code:

<a href="#"> the <a> tag </a> <br>
<a href="#"> the &lt;a&gt; tag </a> 

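Output: in the first line, only the word "the" is rendered as a hyperlink; in the second line, the whole text "the <a> tag" is rendered as a hyperlink.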
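When the text comes from a program rather than being typed by hand, the same escaping can be done in code. This is a minimal sketch, assuming Python's standard html module; link_text and the surrounding markup are hypothetical.

```python
import html

# Text we want to display literally inside the link.
link_text = "the <a> tag"

# Replace < and > with &lt; and &gt; so the browser shows them as text.
escaped = html.escape(link_text, quote=False)

anchor = f'<a href="#"> {escaped} </a>'
print(anchor)  # <a href="#"> the &lt;a&gt; tag </a>
```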

Every character reference begins with an ampersand. So, if we want to display a character reference itself in HTML instead of the character it stands for, we need to escape the ampersand symbol & as &amp;.

The example is shown below.

Example Code:

<p> The character reference of the symbol &lt; is &amp;lt; </p>

Output:

The character reference of the symbol < is &lt;

Our goal is to show the character reference of the < symbol, which is &lt;. But when we write the reference &lt; directly, the browser converts it into <.

To display the named character reference itself, we escape the ampersand in &lt; by writing &amp;lt;. After escaping the ampersand, we can write the remaining characters as usual, as shown in the example above.
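The same idea matters when writing an escaping routine by hand: the ampersand has to be replaced first, otherwise the ampersands introduced by &lt; and &gt; would be escaped a second time. The function names below are hypothetical, and the code is only a sketch of that ordering in Python, not a replacement for a library routine.

```python
def escape_body(text: str) -> str:
    """Escape &, < and > for text in the document body."""
    # The ampersand goes first so later replacements are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    return text.replace(">", "&gt;")


def escape_wrong_order(text: str) -> str:
    """Same replacements, but with the ampersand handled last (a bug)."""
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text.replace("&", "&amp;")


# Showing a reference literally: the & of &lt; becomes &amp;.
literal = escape_body("&lt;")       # "&amp;lt;"
correct = escape_body("<a>")        # "&lt;a&gt;"
broken = escape_wrong_order("<a>")  # "&amp;lt;a&amp;gt;"
print(literal, correct, broken)
```

With the wrong order, the browser would display the text &lt;a&gt; instead of <a>, because the ampersands of the references were themselves escaped.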

This article explained why we should escape characters and which characters should be escaped in HTML. We also learned about named character references and numeric character references.


Sushant is a software engineering student and a tech enthusiast. He finds joy in writing blogs on programming and imparting his knowledge to the community.
