How to Escape HTML in Java

Zeeshan Afridi Feb 02, 2024
  1. How to Escape HTML Tags
  2. How to Escape HTML in Java

This article explains how to escape HTML characters and symbols in Java. We can use the StringEscapeUtils.escapeHtml4(str) method from the Apache Commons Text library (commons-text) to escape HTML symbols and characters in Java.

How to Escape HTML Tags

To escape HTML tags in a Java program, we first need to recognize what a tag looks like. Take the <head> tag: in HTML markup, anything that starts with < and ends with > is treated as a tag.

We can use this characteristic to escape HTML tags. To understand it better, let's look at the example below.

<html lang="en-US">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <link rel="shortcut icon" href="https://www.w3schools.com/images/w3schools_green.jpg" type="image/x-icon">
</head>

The example above contains several HTML tags: <html>, <head>, <meta>, and <link>. Each of them has a special meaning to the browser.

To see this on a real page, right-click any webpage and select Inspect. The developer tools show the page's structure, which is built from different HTML tags.
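To make the "starts with < and ends with >" rule concrete, here is a minimal sketch that pulls the tag names out of the snippet above with a regular expression. It uses only the JDK; the class name TagFinder and the method findTags are illustrative and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFinder {
    // Naive rule: a tag starts with '<' and ends with the next '>'.
    // Group 1 captures an optional '/' (closing tag), group 2 the tag name.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Returns tag names in order of appearance; closing tags keep their '/'.
    public static List<String> findTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group(1) + m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html lang=\"en-US\">\n"
            + "<head>\n"
            + "    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\n"
            + "    <link rel=\"shortcut icon\" href=\"https://www.w3schools.com/images/w3schools_green.jpg\" type=\"image/x-icon\">\n"
            + "</head>";
        System.out.println(findTags(html)); // [html, head, meta, link, /head]
    }
}
```

The pattern is deliberately naive: a > inside an attribute value, an HTML comment, or a <script> block would confuse it, so production code that needs to find tags should use a real HTML parser instead of a regex.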

In HTML, every tag is enclosed in the less-than (<) and greater-than (>) symbols, so these characters have a special meaning to the browser. If you write their HTML entity names instead, the browser no longer treats the markup as tags; it displays the text literally instead of interpreting it.

So replace < with the entity name &lt; and > with &gt;. Double quotes inside attribute values become &quot;, and an ampersand itself must be written as &amp; so it is not mistaken for the start of an entity. Applying these replacements to the snippet above gives:

&lt;html lang=&quot;en-US&quot;&gt;
&lt;head&gt;
    &lt;meta http-equiv=&quot;content-type&quot; content=&quot;text/html; charset=utf-8&quot; /&gt;
    &lt;link rel=&quot;shortcut icon&quot; href=&quot; https://www.w3schools.com/images/w3schools_green.jpg &quot; type=&quot;image/x-icon&quot;&gt;
&lt;/head&gt;
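The replacement rule is simple enough to sketch in plain Java. The hypothetical escapeBasicHtml method below handles only the four characters that appear in the example (<, >, " and &). It illustrates the idea and is not a substitute for StringEscapeUtils.escapeHtml4(), which also converts many non-ASCII characters into named entities (for example, é becomes &eacute;).

```java
public class BasicHtmlEscaper {
    // Replaces the four characters used in the example with HTML entities.
    // '&' is escaped too; otherwise existing text such as "&lt;" would be
    // displayed as "<" instead of literally.
    public static String escapeBasicHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '&': out.append("&amp;"); break;
                default: out.append(c); // every other character is kept as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html lang=\"en-US\">";
        System.out.println(escapeBasicHtml(html)); // &lt;html lang=&quot;en-US&quot;&gt;
    }
}
```

Building the result in a single StringBuilder pass avoids the ordering pitfalls of chained String.replace calls, where replacing < before & would turn the new &lt; into &amp;lt;.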

Now that we understand what escaping HTML means, let's see how to do it in Java.

How to Escape HTML in Java

As mentioned at the beginning of this guide, we will use a third-party library, Apache Commons Text, published by the Apache Software Foundation. The foundation is named after the Apache, Native American nations from the Southwestern United States.

The foundation's Apache Commons project provides reusable Java components that speed up everyday development tasks.

One of these components, Commons Text, includes a class for escaping HTML in a string. All you need to do is add its dependency to your pom.xml file.

Add the Commons Text Dependency to Use StringEscapeUtils in Java

To use StringEscapeUtils, you must add the commons-text dependency.

<dependency>
	<groupId>org.apache.commons</groupId>
	<artifactId>commons-text</artifactId>
	<version>1.12.0</version>
</dependency>
  1. Insert this dependency in your pom.xml, then proceed as follows.
  2. The methods we need to escape and unescape HTML in Java are StringEscapeUtils.escapeHtml4() and StringEscapeUtils.unescapeHtml4(), from the org.apache.commons.text package.
  3. Write the following code in your Java program.
import org.apache.commons.text.StringEscapeUtils;

public class EscapeHtmlExample {
    public static void main(String[] args) {
        String html = "<html lang=\"en-US\">\r\n"
            + "<head>\r\n"
            + "    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\r\n"
            + "    <link rel=\"shortcut icon\" href=\" https://www.w3schools.com/images/w3schools_green.jpg \" type=\"image/x-icon\">\r\n"
            + "</head>";

        // Escape <, >, " and & (plus other HTML 4 special characters) as entities
        String escapedOutput = StringEscapeUtils.escapeHtml4(html);
        System.out.println(escapedOutput); // print the escaped HTML
    }
}

The html string holds the HTML snippet from the first section of this article.

The key line in this program is StringEscapeUtils.escapeHtml4(html), which converts the special characters in html into HTML entities. The StringEscapeUtils class offers other escaping methods as well, but here we only need escapeHtml4().

If you run this code, it prints the escaped output shown in the first section of this article.

Get the Original Unescaped Data in Java

Using the same StringEscapeUtils class, we can easily convert the escaped string back to its original form. To do so, use the following code in your Java program.

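import org.apache.commons.text.StringEscapeUtils;

public class UnescapeHtmlExample {
    public static void main(String[] args) {
        String html = "<html lang=\"en-US\">\r\n"
            + "<head>\r\n"
            + "    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\r\n"
            + "    <link rel=\"shortcut icon\" href=\" https://www.w3schools.com/images/w3schools_green.jpg \" type=\"image/x-icon\">\r\n"
            + "</head>";

        // Escape the HTML, then convert the entities back to the original characters
        String escapedOutput = StringEscapeUtils.escapeHtml4(html);
        String original = StringEscapeUtils.unescapeHtml4(escapedOutput);
        System.out.println(original); // prints the original HTML snippet
    }
}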

In the above code, we call the unescapeHtml4() method of the StringEscapeUtils class right after escapeHtml4() to convert the escaped data back into the original HTML.
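Unescaping is the reverse mapping. As a rough illustration, the hypothetical unescapeBasicHtml method below undoes only the four basic entities &lt;, &gt;, &quot; and &amp;. StringEscapeUtils.unescapeHtml4() goes further: it also decodes numeric character references such as &#60; and the full set of HTML 4 named entities.

```java
public class BasicHtmlUnescaper {
    // Reverses the four basic entities. "&amp;" is replaced last so that an
    // escaped entity such as "&amp;lt;" becomes the literal text "&lt;", not "<".
    public static String unescapeBasicHtml(String text) {
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;html lang=&quot;en-US&quot;&gt;";
        System.out.println(unescapeBasicHtml(escaped));    // <html lang="en-US">
        System.out.println(unescapeBasicHtml("&amp;lt;")); // &lt;
    }
}
```

The order of the replace calls matters: if &amp; were decoded first, "&amp;lt;" would turn into "&lt;" and then into "<", unescaping the text twice.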

Run the above code, and you will get this output.

<html lang="en-US">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <link rel="shortcut icon" href=" https://www.w3schools.com/images/w3schools_green.jpg " type="image/x-icon">
</head>

Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.
