How to Read XLSX Files Using Python and Pandas

Vaibhav Vaibhav Feb 02, 2024
How to Read XLSX Files Using Python and Pandas

Pandas is a very robust and optimized library that helps handle data efficiently, perform operations over that data with ease, and analyze it with all sorts of plots and graphs. Common operations such as merging, concatenation, and reshaping can be performed very easily using pandas.

Pandas also supports reading files and storing the data inside the files into various objects such as arrays and DataFrames. One such file is .xlsx. .xlsx files are Microsoft Excel Open XML Format Spreadsheet files that are compressed and XML based. This article will talk about how to read .xlsx files using pandas.

Reading .xlsx Files Using pandas in Python

To read .xlsx files using pandas, we can use the read_excel() function. This function reads an excel file into a pandas Dataframe. And, we can use this function to read xlsx, xls, xlsm, xlsb, odf, ods, and odt files. Since excel files can contain multiple sheets, this function can read a single and multiple sheets.

The read_excel() function has a couple of parameters. We will discuss some main ones.

  • io: This parameter can be str, bytes, ExcelFile, xlrd.Book, path object, or file-like object. It is essentially the file or the object that has to be read. One can also pass a valid file path in the form of a string to this parameter.
  • sheet_name: This parameter can be str, int, list, or None. The default value is 0. Since excel files are made up of sheets, we can specify a single sheet or a bunch of sheets to read. We can not only use integer indexes (2, 0, [1, 3, 7]) but also string names to specify the sheets (Sheet1, Sheet3, ["Sheet4", "Sheet5", "Sheet8"]).
  • header: The row that should be considered as the header for the excel file. The default value is 0 because the first row is usually the header and considers zero-based indexing. If there is no header, set it to None.
  • names: We can specify what columns we wish to read from the excel file. It accepts a list of column names. The default value is None. Since column names are found inside the header, the file should have one. Otherwise, just set header = None.

Now let us see how we can use this function to read a .xlxs file. Refer to the following code.

file_name = "my_file.xlsx"  # File name
sheet_name = 3  # 4th sheet
header = 1  # The header is the 2nd row
df = pd.read_excel(file_name=file_name, sheet_name=sheet_name, header=header)
print(df.head())  # Prints first 5 rows from the top along with the header
print(df.tail())  # Prints first 5 rows from the bottom along with the header
Vaibhav Vaibhav avatar Vaibhav Vaibhav avatar

Vaibhav is an artificial intelligence and cloud computing stan. He likes to build end-to-end full-stack web and mobile applications. Besides computer science and technology, he loves playing cricket and badminton, going on bike rides, and doodling.

Related Article - Python Excel