Read HTML Table in a Pandas DataFrame

  1. Use the read_html() Method to Read HTML Tables in a Pandas DataFrame
  2. Use the read_html() Method to Read HTML Table From a URL
  3. Use the read_html() Method to Read HTML Table From a String
  4. Use the read_html() Method to Read HTML Table From a File

This tutorial demonstrates how to read HTML tables from a URL, a string, or a file and convert them into Pandas DataFrames in Python. read_html() is a quick and handy Pandas function for scraping HTML tables into DataFrames.

Use the read_html() Method to Read HTML Tables in a Pandas DataFrame

The read_html() function takes a website URL, an HTML string (wrapped in io.StringIO in Pandas 2.1 and later), or an HTML file as its argument. It parses every table it finds and returns them as a list of DataFrames, because a single website, string, or file can contain multiple tables.
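
To make the "list of DataFrames" return value concrete, here is a minimal sketch. The HTML fragment and its contents are made up for illustration, and flavor="lxml" assumes the lxml parser is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML fragment containing two separate tables.
html = """
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td>Lahore</td><td>Pakistan</td></tr>
</table>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Pen</td><td>5</td></tr>
  <tr><td>Book</td><td>20</td></tr>
</table>
"""

# StringIO wraps the literal HTML in a file-like object.
# flavor="lxml" pins the parser (assumes lxml is installed).
tables = pd.read_html(StringIO(html), flavor="lxml")

print(type(tables))  # read_html() returns a plain Python list
print(len(tables))   # one DataFrame per <table> element
for df in tables:
    print(df.shape)
```

Each element of the list is an ordinary DataFrame, so the usual indexing and filtering operations apply to it.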

If the input contains no table, read_html() raises ValueError: No tables found.
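
One way to handle the missing-table case is to catch the exception and fall back to an empty list. The helper name read_tables_or_empty is hypothetical, and the sketch assumes lxml is installed.

```python
from io import StringIO

import pandas as pd


def read_tables_or_empty(html_text):
    """Return the parsed tables, or an empty list if parsing raises ValueError.

    Hypothetical helper for illustration; it is not part of Pandas.
    """
    try:
        return pd.read_html(StringIO(html_text), flavor="lxml")
    except ValueError as err:
        # Raised when no <table> is found (other ValueErrors land here too).
        print("read_html failed:", err)
        return []


no_tables = read_tables_or_empty("<html><body><p>Just text.</p></body></html>")
print(len(no_tables))
```

Returning an empty list keeps the calling code simple, since it can loop over the result without a separate error branch.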

Install lxml in Python

lxml is a Python library for parsing and processing HTML and XML data. read_html() needs an HTML parser, and lxml is the one it tries first, so we install it with the following command before calling read_html(). If we are using Jupyter Notebook, we restart the kernel after installing.

#Python 3.x
pip install lxml
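
To confirm the installation worked, a small check like the following can help. It only reports whether each package is importable; bs4 and html5lib are listed because Pandas can use them together as an alternative parser.

```python
import importlib.util

# Check which optional HTML parser packages can be imported here.
parser_packages = ("lxml", "bs4", "html5lib")
available = {
    name: importlib.util.find_spec(name) is not None
    for name in parser_packages
}

for name, found in available.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```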

Use the read_html() Method to Read HTML Table From a URL

We pass the website's URL to read_html() to read all of its tables into a list of Pandas DataFrames. We can call the built-in len() function on that list to count the number of tables returned.

Here, we get a list of two tables. To access the first table, we use index 0 of the list.

Example Code:

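#Python 3.x
import pandas as pd

# Fetch the page and parse every <table> element into a DataFrame
tables = pd.read_html('https://www.w3schools.com/html/html_tables.asp')
print('No of tables returned:', len(tables))
# display() is available in Jupyter/IPython; use print() in a plain script
display(tables[0])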

Output:

Pandas Read HTML From URL - Output
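
When a page contains several tables, the match parameter of read_html() can narrow the result to tables whose text matches a string or regular expression. The sketch below uses a made-up HTML fragment instead of a live URL so it runs offline, and it assumes lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical fragment standing in for a downloaded page with two tables.
html = """
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td>Lahore</td><td>Pakistan</td></tr>
</table>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Pen</td><td>5</td></tr>
</table>
"""

# Keep only the tables that contain text matching "Price".
price_tables = pd.read_html(StringIO(html), match="Price", flavor="lxml")

print(len(price_tables))
print(price_tables[0])
```

Filtering this way avoids depending on the position of a table in the list, which can change when the page layout changes.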

Use the read_html() Method to Read HTML Table From a String

In the following code, an HTML table is stored as a string in the table variable. To convert the table into a Pandas DataFrame, we call read_html() and pass it the HTML string.

There is only one table in the HTML string, so the returned list contains a single DataFrame. We display the table by accessing it at index 0.

Example Code:

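#Python 3.x
from io import StringIO

import pandas as pd

table = '''<table>
    <thead>
        <tr>
            <th>Name</th>
            <th>Department</th>
            <th>Marks</th>
            <th>Age</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Robert</td>
            <td>CS</td>
            <td>60</td>
            <td>20</td>
        </tr>
        <tr>
            <td>Sam</td>
            <td>SE</td>
            <td>81</td>
            <td>21</td>
        </tr>
        <tr>
            <td>Alia</td>
            <td>SE</td>
            <td>79</td>
            <td>20</td>
        </tr>
    </tbody>
</table>'''
# Passing literal HTML directly is deprecated since Pandas 2.1,
# so wrap the string in StringIO to give read_html() a file-like object
df_table = pd.read_html(StringIO(table))
# display() is available in Jupyter/IPython; use print() in a plain script
display(df_table[0])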

Output:

Pandas Read HTML From String - Output
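
Once the table is parsed, the result behaves like any other DataFrame. The following sketch uses a compact copy of the same table to inspect the columns and filter rows. The threshold of 70 is an arbitrary value chosen for illustration, and lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Compact copy of the student table used in this section.
table = """
<table>
  <thead>
    <tr><th>Name</th><th>Department</th><th>Marks</th><th>Age</th></tr>
  </thead>
  <tbody>
    <tr><td>Robert</td><td>CS</td><td>60</td><td>20</td></tr>
    <tr><td>Sam</td><td>SE</td><td>81</td><td>21</td></tr>
    <tr><td>Alia</td><td>SE</td><td>79</td><td>20</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(table), flavor="lxml")[0]

print(list(df.columns))  # header cells become the column names
print(df.dtypes)         # numeric-looking cells are converted to numbers

# Arbitrary threshold for illustration: rows with Marks above 70.
high_marks = df[df["Marks"] > 70]
print(high_marks)
```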

Use the read_html() Method to Read HTML Table From a File

We can also read an HTML table stored in a text file into a Pandas DataFrame through file handling. First, we put the text file that contains the table in the current directory.

Or, if we are using a Jupyter notebook, we upload the text file to the home directory. Then we open the text file with open(), passing the filename and 'r' as the mode because we only need to read the file.

Finally, we read the file contents with the read() method and pass them to read_html() to get the DataFrame.

Example Code:

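#Python 3.x
from io import StringIO

import pandas as pd

table_path = 'table.txt'
with open(table_path, 'r') as f:
    # Wrap the text in StringIO; passing literal HTML is deprecated since Pandas 2.1
    df_table = pd.read_html(StringIO(f.read()))
# display() is available in Jupyter/IPython; use print() in a plain script
display(df_table[0])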

Output:

Pandas Read HTML From File - Output
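
As an alternative to opening the file ourselves, read_html() also accepts a file path directly. The sketch below writes a small, made-up table to a temporary file so it is self-contained. The file name table.html is hypothetical, and lxml is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Made-up table content written to disk only for this demonstration.
html = (
    "<table>"
    "<tr><th>Name</th><th>Marks</th></tr>"
    "<tr><td>Robert</td><td>60</td></tr>"
    "</table>"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    table_path = os.path.join(tmp_dir, "table.html")  # hypothetical file name
    with open(table_path, "w", encoding="utf-8") as f:
        f.write(html)

    # Pass the path itself; read_html() opens and parses the file.
    tables = pd.read_html(table_path, flavor="lxml")

print(tables[0])
```

Passing the path avoids loading the file contents into a string first, which keeps the code shorter when the file already exists on disk.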

Author: Fariba Laiq
