This tutorial introduces HTML character escapes and explains which characters must be escaped in HTML.
Introduction to Character Escapes in HTML
We use character escapes in markup languages like HTML, XML, and XHTML to represent Unicode characters using only ASCII characters.
Character escapes are needed when we want to display characters such as < and >, which have a special meaning in markup languages. If we do not escape these characters, the browser treats them as markup, and we will not get the desired output.
Various Unicode characters can be escaped in HTML; the no-break space, <, >, and & are the ones used in this tutorial.
A complete list of named character references is available in the HTML specification. Characters can be represented in two main ways: numeric character references and named character references.
Numeric character references can be written in either decimal or hexadecimal form.
Let’s look at the following example, which shows the representation of a no-break space using the different character references.
<p> Hi&nbsp;Jack ! </p> <!-- named character reference -->
<p> Hi&#xA0;Jack ! </p> <!-- hexadecimal numeric character reference -->
<p> Hi&#160;Jack ! </p> <!-- decimal numeric character reference -->
Hi Jack !
Hi Jack !
Hi Jack !
As seen in the example above, the named, hexadecimal numeric, and decimal numeric character references of the no-break space are &nbsp;, &#xA0;, and &#160;, respectively. Notice that numeric character references start with the characters &# and end with a semicolon (;). The hexadecimal form adds an x after the &#.
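The three forms can also be generated and checked programmatically. The following is a minimal Python sketch, not part of the original example: the helper names decimal_reference and hex_reference are hypothetical, and html.unescape from the standard library is used only to confirm that every form decodes to the same character.

```python
import html

NO_BREAK_SPACE = "\u00a0"  # the no-break space, code point U+00A0


def decimal_reference(ch: str) -> str:
    """Build a decimal numeric character reference: &# + code point + ;"""
    return f"&#{ord(ch)};"


def hex_reference(ch: str) -> str:
    """Build a hexadecimal numeric character reference: &#x + code point + ;"""
    return f"&#x{ord(ch):X};"


print(decimal_reference(NO_BREAK_SPACE))  # &#160;
print(hex_reference(NO_BREAK_SPACE))      # &#xA0;

# The named, hexadecimal, and decimal forms all decode to the same character.
for ref in ("&nbsp;", hex_reference(NO_BREAK_SPACE), decimal_reference(NO_BREAK_SPACE)):
    assert html.unescape(ref) == NO_BREAK_SPACE
```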
In HTML, characters may need to be escaped in several contexts, including the document body, attribute values, and the contents of style and script tags.
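The difference between two of these contexts can be illustrated with Python's standard html.escape function, used here only as a sketch. With quote=False it escapes &, <, and >, which covers text in the document body. With quote=True (the default) it also escapes quotation marks, which matters inside a quoted attribute value. The sample string is made up for illustration.

```python
import html

text = 'Tom & "Jerry" <3'

# Text in the document body: &, <, and > are escaped.
body_text = html.escape(text, quote=False)
print(body_text)  # Tom &amp; "Jerry" &lt;3

# Quoted attribute value: quotation marks are escaped as well.
attr_value = html.escape(text, quote=True)
print(attr_value)  # Tom &amp; &quot;Jerry&quot; &lt;3

# Each escaped form goes into its own context.
print(f'<p title="{attr_value}">{body_text}</p>')
```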
Characters That Must Be Escaped in HTML
This section discusses the characters that must be escaped in HTML. There are three characters we should never forget to escape: <, >, and &.
Markup languages like HTML and XML use the less-than and greater-than symbols, < and >, to delimit tags. These symbols are also called syntax wrappers. We should escape the syntax wrappers in the document body; otherwise, they will interfere with the markup syntax.
The named character references of the syntax wrappers are &lt; for < and &gt; for >.
Next, we will see what happens when these characters are not escaped and how to escape them.
In the example below, the first <a> tag contains the text "the <a> tag" between its opening and closing tags, without any escaping. As a result, the hyperlink is applied only to the word "the", because the browser reads the inner <a> as another tag rather than as text.
But this is not our goal. Our goal is to display <a> as plain text inside the link, so we need to escape the syntax wrappers around the a. Therefore, we use the &lt; and &gt; character references. The sequence &lt;a&gt; represents the text <a>.
As a result, in the second <a> tag in the example below, the hyperlink is applied to the whole text, "the <a> tag". This is why we should escape the syntax wrappers in HTML.
<a href="#"> the <a> tag </a> <br>
<a href="#"> the &lt;a&gt; tag </a>
the tag
the <a> tag
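When the link text is produced by a program rather than typed by hand, the same escaping can be applied automatically. The sketch below uses Python's standard html.escape. The link_text variable and the surrounding markup are only illustrative.

```python
import html

link_text = "the <a> tag"

# Without escaping, the inner <a> would be parsed as markup.
unescaped = f'<a href="#"> {link_text} </a>'

# html.escape replaces < with &lt; and > with &gt;.
escaped = f'<a href="#"> {html.escape(link_text)} </a>'

print(unescaped)  # <a href="#"> the <a> tag </a>
print(escaped)    # <a href="#"> the &lt;a&gt; tag </a>
```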
The ampersand (&) is the first character of every character reference. So, if we want to display a character reference itself as text in HTML, we need to escape the ampersand. Its named character reference is &amp;.
The example is shown below.
<p> The character reference of the symbol &lt; is &amp;lt; </p>
The character reference of the symbol < is &lt;
Our goal is to show the character reference of the < symbol. But when we write the reference &lt;, the browser converts it into <.
To display the named character reference itself, we should escape the ampersand in &lt; by writing &amp;lt;. After escaping the ampersand, we can write the remaining characters as usual, as shown in the example above.
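The same rule can be checked in code. In this sketch, Python's standard html.escape is applied to the reference &lt; itself. Only the ampersand is replaced, and decoding the result once, as a browser would, yields the literal text &lt;.

```python
import html

reference = "&lt;"

# Escaping the reference replaces only its ampersand.
escaped = html.escape(reference)
print(escaped)  # &amp;lt;

# Decoding once gives the reference as literal text.
print(html.unescape(escaped))  # &lt;

# Decoding the unescaped reference gives the character it stands for.
print(html.unescape(reference))  # <
```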
This article explained why we should escape characters and which characters must be escaped in HTML. We also learned about named and numeric character references.