Pandas is a very robust and optimized library that helps handle data efficiently, perform operations over that data with ease, and analyze it with all sorts of plots and graphs. Common operations such as merging, concatenation, and reshaping can be performed very easily using pandas.
Pandas also supports reading files and storing the data inside the files into various objects such as arrays and
DataFrames. One such file is
.xlsx files are Microsoft Excel Open XML Format Spreadsheet files that are compressed and XML based. This article will talk about how to read
.xlsx files using pandas.
.xlsx Files Using
pandas in Python
.xlsx files using
pandas, we can use the
read_excel() function. This function reads an excel file into a
pandas Dataframe. And, we can use this function to read
odt files. Since excel files can contain multiple sheets, this function can read a single and multiple sheets.
read_excel() function has a couple of parameters. We will discuss some main ones.
io: This parameter can be
path object, or
file-like object. It is essentially the file or the object that has to be read. One can also pass a valid file path in the form of a string to this parameter.
sheet_name: This parameter can be
None. The default value is
0. Since excel files are made up of sheets, we can specify a single sheet or a bunch of sheets to read. We can not only use integer indexes (
[1, 3, 7]) but also string names to specify the sheets (
["Sheet4", "Sheet5", "Sheet8"]).
header: The row that should be considered as the header for the excel file. The default value is
0because the first row is usually the header and considers zero-based indexing. If there is no header, set it to
names: We can specify what columns we wish to read from the excel file. It accepts a list of column names. The default value is
None. Since column names are found inside the header, the file should have one. Otherwise, just set
header = None.
Now let us see how we can use this function to read a
.xlxs file. Refer to the following code.
file_name = "my_file.xlsx" # File name sheet_name = 3 # 4th sheet header = 1 # The header is the 2nd row df = pd.read_excel(file_name=file_name, sheet_name=sheet_name, header=header) print(df.head()) # Prints first 5 rows from the top along with the header print(df.tail()) # Prints first 5 rows from the bottom along with the header