How to Encode HTML in PHP

Habdul Hazeez Feb 02, 2024
  1. Encode With htmlspecialchars()
  2. Encode With htmlentities()
  3. Encode With htmlentities() and HTML5 Encoding
  4. Encode With A Custom Method
HTML encoding helps prevent cross-site scripting (XSS) in PHP web applications by converting user-supplied data into HTML entities before it is output. This tutorial shows how to encode data with htmlspecialchars(), htmlentities(), and a custom method.

Encode With htmlspecialchars()

PHP htmlspecialchars() is a built-in function that can convert special characters to HTML entities. The syntax is as follows:

htmlspecialchars( $string, $flags, $encoding, $double_encode )

Explanation of the parameters:

  • $string: The input string
  • $flags: A bitmask that controls how the function handles quotes, invalid byte sequences, and the document type. This parameter is optional
  • $encoding: Specifies the encoding used by the function. This parameter is optional
  • $double_encode: A Boolean parameter that dictates whether PHP encodes existing entities. It defaults to true; if you set it to false, PHP will not encode existing entities
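
The effect of $double_encode is easiest to see on a string that already contains an entity. The following is a minimal sketch: the sample string is made up for illustration, and the flags are passed explicitly so the result does not depend on the defaults of your PHP version.

```php
<?php
    // Hypothetical input that already contains the entity &amp;
    $alreadyEncoded = "Tom &amp; Jerry <3";

    // $double_encode = true (the default): the & of &amp; is encoded again
    $doubleEncoded = htmlspecialchars($alreadyEncoded, ENT_QUOTES, 'UTF-8', true);

    // $double_encode = false: existing entities are left alone
    $singleEncoded = htmlspecialchars($alreadyEncoded, ENT_QUOTES, 'UTF-8', false);

    echo $doubleEncoded, "\n"; // Tom &amp;amp; Jerry &lt;3
    echo $singleEncoded, "\n"; // Tom &amp; Jerry &lt;3
```

Setting the parameter to false can be useful when the input may be partially encoded already, although it also means a literal "&amp;" typed by a user is not preserved as typed.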

htmlspecialchars() returns the converted string. However, if the input contains a byte sequence that is invalid in the given encoding, it returns an empty string unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set (ENT_SUBSTITUTE is part of the default flags as of PHP 8.1).
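
The empty-string behavior for invalid input can be reproduced with a byte that is not valid UTF-8. This is a sketch under the assumption that the data arrives in a legacy encoding; the sample value is hypothetical, and the flags are passed explicitly because the defaults differ between PHP versions.

```php
<?php
    // "caf" followed by a lone 0xE9 byte (é in ISO-8859-1), which is not valid UTF-8
    $invalidUtf8 = "caf\xE9";

    // Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string
    $strict = htmlspecialchars($invalidUtf8, ENT_QUOTES, 'UTF-8');

    // With ENT_SUBSTITUTE, the invalid byte is replaced with U+FFFD instead
    $substituted = htmlspecialchars($invalidUtf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    var_dump($strict === "");      // bool(true)
    var_dump($substituted === ""); // bool(false)
```

An unexpectedly empty result from htmlspecialchars() is therefore often a sign that the input is not in the encoding you passed.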

The next example converts a string with htmlspecialchars() without passing any flags, so the function falls back to its default flags.

<?php
    $stringToEncode = "A <b>bold text</b> a'nd á <script>alert();</script> tag";

    $encodedString = htmlspecialchars($stringToEncode);

    echo $encodedString;
?>

Output:

A <b>bold text</b> a'nd á <script>alert();</script> tag

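When you view the source of the web page on a PHP version before 8.1, where the default flag is ENT_COMPAT, you'll observe that the apostrophe and the á character are not encoded. As of PHP 8.1, the default flags include ENT_QUOTES, so the apostrophe is encoded as &#039; even without an explicit flag.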

A &lt;b&gt;bold text&lt;/b&gt; a'nd á &lt;script&gt;alert();&lt;/script&gt; tag

Now, if you pass the ENT_QUOTES flag and an encoding to htmlspecialchars(), the apostrophe is encoded regardless of the PHP version, but á is still left as is.

<?php
    $stringToEncode = "A <b>bold text</b> a'nd á <script>alert();</script> tag";

    $encodedString = htmlspecialchars($stringToEncode, ENT_QUOTES, 'UTF-8');
    
    echo $encodedString;
?>

Output:

A <b>bold text</b> a'nd á <script>alert();</script> tag

The page source shows that htmlspecialchars() encodes the apostrophe as &#039;:

A &lt;b&gt;bold text&lt;/b&gt; a&#039;nd á &lt;script&gt;alert();&lt;/script&gt; tag
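
Because the same flags and encoding are usually wanted everywhere, the call is often wrapped in a small helper. The sketch below assumes a helper named escapeHtml(); it is a hypothetical name, not a PHP built-in, and the sample value is made up.

```php
<?php
    /**
     * Hypothetical helper (not a PHP built-in): escapes a value for use in
     * HTML text or inside a quoted attribute value.
     */
    function escapeHtml(string $value): string
    {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    $userName = "O'Reilly \"<script>\"";

    // Both quote styles are encoded, so the value cannot break out of the attribute
    echo "<input type='text' value='" . escapeHtml($userName) . "'>";
```

Encoding both single and double quotes matters here because the attribute value in this example is delimited by single quotes.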

Encode With htmlentities()

htmlentities() is also a built-in PHP function. Unlike htmlspecialchars(), it converts every character that has an HTML entity equivalent. Its syntax is as follows:

htmlentities( $string, $flags, $encoding, $double_encode )

The following is an explanation of the parameters:

  • $string: The input string
  • $flags: A bitmask that controls how the function handles quotes, invalid byte sequences, and the document type. This parameter is optional
  • $encoding: Specifies the encoding used by the function. This parameter is optional
  • $double_encode: A Boolean parameter that dictates whether PHP encodes existing entities. It defaults to true; if you set it to false, PHP will not encode existing entities

The return value for this function is the encoded string.
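
To make the difference between the two functions concrete, the following sketch runs both on the same made-up string. The flags and encoding are passed explicitly so the comparison does not depend on version defaults.

```php
<?php
    // "\u{E9}" is é, written as an escape so the example does not depend on
    // the encoding the file is saved in
    $sample = "caf\u{E9} <b>";

    $special  = htmlspecialchars($sample, ENT_QUOTES, 'UTF-8');
    $entities = htmlentities($sample, ENT_QUOTES, 'UTF-8');

    echo $special, "\n";  // café &lt;b&gt;
    echo $entities, "\n"; // caf&eacute; &lt;b&gt;
```

Both results are equally safe to output in a UTF-8 page; htmlentities() only adds named entities for characters that UTF-8 can already represent directly.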

The following is an example of converting a string with htmlentities(). Here htmlentities() is not used with any flag.

<?php    
    $stringToEncode = "A <b>bold text</b> ánd a <script>alert();</script> tag's";

    $encodedString = htmlentities($stringToEncode);

    echo $encodedString;
?>

Output:

A <b>bold text</b> ánd a <script>alert();</script> tag's

The page source shows that the function encodes the á character without any flag. On PHP versions before 8.1 the apostrophe is not encoded; as of PHP 8.1 the default flags include ENT_QUOTES, so it is encoded as &#039;.

A &lt;b&gt;bold text&lt;/b&gt; &aacute;nd a &lt;script&gt;alert();&lt;/script&gt; tag's

Passing ENT_QUOTES explicitly makes the function encode the apostrophe on every PHP version.

<?php
    $stringToEncode = "A <b>bold text</b> ánd a <script>alert();</script> tag's";

    $encodedString = htmlentities($stringToEncode, ENT_QUOTES, 'UTF-8');

    echo $encodedString;
?>

Output:

A <b>bold text</b> ánd a <script>alert();</script> tag's

View source of the page:

A &lt;b&gt;bold text&lt;/b&gt; &aacute;nd a &lt;script&gt;alert();&lt;/script&gt; tag&#039;s
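
One way to check that nothing was lost in the encoding is to decode the result again. The sketch below uses html_entity_decode(), a built-in PHP function that this tutorial does not otherwise cover, with the same flags and encoding; the sample string is a shortened version of the one above.

```php
<?php
    // "\u{E1}" is á, written as an escape so the file encoding does not matter
    $original = "\u{E1}nd a <script>alert();</script> tag's";

    $encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');
    $decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

    echo $encoded, "\n"; // &aacute;nd a &lt;script&gt;alert();&lt;/script&gt; tag&#039;s
    var_dump($decoded === $original); // bool(true)
```

Decoding is only meant for checks like this one; decoded user input should never be written back into a page without encoding it again.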

Encode With htmlentities() and HTML5 Encoding

When your string contains non-English characters or symbols, you can combine the ENT_HTML5 flag with the UTF-8 encoding.

The ENT_HTML5 flag instructs the function to treat the string as HTML5, which makes the larger set of HTML5 named entities available, and the UTF-8 encoding argument lets the function interpret any standard Unicode character.

The following is an example of how to use htmlentities() with an HTML5 flag and UTF-8 encoding:

<?php
    $stringToEncode = "àéò ©€ ♣♦ ↠ ↔↛ āžšķūņ ↙ ℜ℞ ∀∂∋ rūķīš ○";

    $encodedString = htmlentities($stringToEncode, ENT_HTML5, 'UTF-8');

    echo $encodedString;
?>

View source of the page (wrapped here across several lines):

&agrave;&eacute;&ograve; &copy;&euro; &clubs;&diamondsuit;
&twoheadrightarrow; &harr;&nrarr; &amacr;&zcaron;&scaron;
&kcedil;&umacr;&ncedil; &swarr; &Rfr;&rx; &forall;&part;&ReverseElement;
r&umacr;&kcedil;&imacr;&scaron; &cir;
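
Keep in mind that the flags argument replaces the defaults, so ENT_HTML5 on its own carries no quote flag. The following sketch, using a made-up sample string, compares ENT_HTML5 alone with ENT_QUOTES | ENT_HTML5.

```php
<?php
    // "\u{E1}" is á, written as an escape so the file encoding does not matter
    $sample = "it's \u{E1}";

    // ENT_HTML5 on its own: no quote flag is set, so the apostrophe is left as is
    $html5Only = htmlentities($sample, ENT_HTML5, 'UTF-8');

    // Combined with ENT_QUOTES, the apostrophe becomes the HTML5 entity &apos;
    $html5Quotes = htmlentities($sample, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    echo $html5Only, "\n";   // it's &aacute;
    echo $html5Quotes, "\n"; // it&apos;s &aacute;
```

If the encoded value ends up inside an HTML attribute, the combined form is the safer choice.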

Encode With A Custom Method

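If you want to roll your own encoding scheme, a custom method can come in handy. This method takes your input string, converts every byte to its hexadecimal value, and wraps the result in a script that decodes it in the browser. Note that this obfuscates the markup rather than neutralizing it, because the browser writes the original HTML back into the page.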

The following HTML has a text area and a single submit button. The form action points to a file that will encode the string passed into the form input.

<main>
    <h1>Enter HTML code and click the submit button</h1>
    <form action='encodedoutput.php' method='post'>
        <div class="form-row">
            <textarea rows='15' cols='50' name='texttoencode' required></textarea>
        </div>
        <div class="form-row">
            <input type='submit'>
        </div>
    </form>
</main>

The next code block is the PHP code that will perform the encoding. Save it as encodedoutput.php.

<?php
    // Make sure the form field was actually submitted
    if (isset($_POST['texttoencode'])) {
        // Reject empty text
        if ($_POST['texttoencode'] === "") {
            echo "Invalid text";
            die();
        }

        // Convert every byte of the input to two hexadecimal digits
        $inputHTML = bin2hex($_POST['texttoencode']);
        // Append "%" after each pair of digits, e.g. "3c%73%..."
        $splitHTML = chunk_split($inputHTML, 2, "%");
        // Drop the trailing "%" that chunk_split() adds after the last pair
        $HTMLSubString = substr($splitHTML, 0, -1);

        // The leading "%" completes the first pair
        $encodedOutput = "<script>document.write(unescape('%$HTMLSubString'));</script>";

    } else {
        echo "Not allowed";
        die();
    }
?>

<textarea rows='15' cols='60'>
    <?php
        // $encodedOutput is always set here; the script exits earlier otherwise
        echo $encodedOutput;
    ?>
</textarea>

Sample output for <script>alert("Hello world");</script>:

<script>document.write(unescape('%3c%73%63%72%69%70%74%3e%61%6c%65%72%74%28%22%48%65%6c%6c%6f%20%77%6f%72%6c%64%22%29%3b%3c%2f%73%63%72%69%70%74%3e'));</script>
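
The encoding step can also be pulled out into a function, which makes it easier to test on its own. The sketch below is one possible variant, not the code used above: percentHexEncode() is a hypothetical name, and it builds the string with str_split() and implode() instead of chunk_split() and substr(). PHP's rawurldecode() is used only to confirm that the encoding is reversible.

```php
<?php
    /**
     * Hypothetical helper: percent-hex encodes every byte of $html,
     * e.g. "<b>" becomes "%3c%62%3e".
     */
    function percentHexEncode(string $html): string
    {
        if ($html === '') {
            return '';
        }

        // bin2hex() gives two hex digits per byte; prefix each pair with "%"
        return '%' . implode('%', str_split(bin2hex($html), 2));
    }

    $encoded = percentHexEncode('<b>Hi</b>');

    echo $encoded, "\n";               // %3c%62%3e%48%69%3c%2f%62%3e
    echo rawurldecode($encoded), "\n"; // <b>Hi</b>
```

The second line of output shows the original markup coming back unchanged, which is why this approach should be treated as obfuscation and not as a replacement for htmlspecialchars().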
