Reshape Pandas DataFrame From Wide to Long

Reshape Pandas DataFrame From Wide to Long

  1. Wide Data Format vs. Long Data Format in Pandas
  2. Uses of the Wide Data Format
  3. Uses of the Long Data Format
  4. Use pd.melt() to Reshape Pandas DataFrame From Wide to Long Format
  5. Use pd.unstack() to Reshape Pandas DataFrame From Wide to Long Format
  6. Use pd.wide_to_long() to Reshape Pandas DataFrame From Wide to Long Format

Reshaping the Pandas dataframe is one of the most used data-wrangling tasks in data analysis. It is also addressed as transporting, unpivoting/pivoting a table from wide to long.

In this tutorial, we will learn the difference between wide and long data formats, which will lead to their uses, followed by different code examples demonstrating how to reshape the Pandas dataframe from wide to long.

Wide Data Format vs. Long Data Format in Pandas

We can have a dataset in two formats - either it would be wide or long. The primary difference between wide and long data formats is given below.

  1. Wide Data Format - The values in the first column do not repeat.
  2. Long Data Format - The values in the first column do repeat.

Let’s take the example data frames containing lab and theory exam marks for groups A, B, C, and D to understand both formats clearly.

reshape pandas data frame from wide to long - wide vs long data format

As we can see, both data frames have the same information but in different formats.

When to use which data format to represent the dataset? It depends on what we want to do with the data.

Uses of the Wide Data Format

The wide data format is used to record real-world data because it is easy to understand for our brains. We also use this data format if we are analyzing data.

Let’s take the previous example of having marks for lab and theory exams.

If we want to calculate the average of lab and theory exams per group, it would be easier to keep data in the wide format because it would be easy to read lab and theory values for each group in the same record (row).

See the following screenshot.

reshape pandas data frame from wide to long - use of wide data format

Uses of the Long Data Format

We primarily use long data format when we visualize multiple variables in one plot using some statistical tool, for instance, R programming language.

We must have to convert wide to long data format to let the software create the plot, for instance, plot multiple columns, create heatmap, etc. Sometimes, we also need to reshape datasets for data wrangling using Python.

Use pd.melt() to Reshape Pandas DataFrame From Wide to Long Format

Example Code:

import pandas as pd

df = pd.DataFrame({
    'Groups': ['A', 'B', 'C', 'D'],
    'lab': [25, 21, 14, 22],
    'theory': [60, 55, 49, 69]
})

print("Wide Data Frame:\n")
print(df)

df = pd.melt(df, id_vars='Groups', value_vars=['lab', 'theory'])

print("\n\nLong Data Frame:\n")
print(df)

OUTPUT:

Wide Data Frame:

  Groups  lab  theory
0      A   25      60
1      B   21      55
2      C   14      49
3      D   22      69


Long Data Frame:

  Groups variable  value
0      A      lab     25
1      B      lab     21
2      C      lab     14
3      D      lab     22
4      A   theory     60
5      B   theory     55
6      C   theory     49
7      D   theory     69

Here, we have a data frame with three columns, Groups, lab, and theory in a wide data format which we convert to a long data format using the pd.melt() function.

pd.melt() is used to unpivot a dataframe from wide to long data format (optionally, leaving the identifiers set). It transforms a dataframe into a format where one or multiple columns/fields are identifier variables (id_vars).

Excluding the id_vars, all other columns are considered measured variables (value_vars). These are unpivoted to the row axis, leaving two non-identifier columns (variable and value).

Use pd.unstack() to Reshape Pandas DataFrame From Wide to Long Format

Example Code:

import pandas as pd

df = pd.DataFrame({
    'Groups': ['A', 'B', 'C', 'D'],
    'lab': [25, 21, 14, 22],
    'theory': [60, 55, 49, 69]
})

print("Wide Data Frame:\n")
print(df)

df = df.unstack()

print("\n\nLong Data Frame:\n")
print(df)

OUTPUT:

Wide Data Frame:

  Groups  lab  theory
0      A   25      60
1      B   21      55
2      C   14      49
3      D   22      69


Long Data Frame:

Groups  0     A
        1     B
        2     C
        3     D
lab     0    25
        1    21
        2    14
        3    22
theory  0    60
        1    55
        2    49
        3    69
dtype: object

This example uses the same data frame having three columns, Groups, lab, and theory, but here, we are using DataFrame.unstack to reshape the data frame from wide to long data format.

It pivots the level of (necessarily hierarchical) index labels and returns a dataframe containing a new level of field/column labels whose inner-most level has the pivoted index labels.

Remember, we will get a series as an output if an index is not the MultiIndex. Also, if flexible and fast solution is required for a particular problem then, prefer pd.melt() instead of DataFrame.unstack().

Use pd.wide_to_long() to Reshape Pandas DataFrame From Wide to Long Format

Example Code:

import pandas as pd

df = pd.DataFrame({
    'Groups': ['A', 'B', 'C', 'D'],
    'lab1': [25, 21, 14, 22],
    'lab2': [25, 21, 14, 22],
    'theory1': [60, 55, 49, 69],
    'theory2': [60, 55, 49, 69]
})

print("Wide Data Frame:\n")
print(df)

df = pd.wide_to_long(df, stubnames=['lab', 'theory'], i='Groups', j='Exams')

print("\n\nLong Data Frame:\n")
print(df)

OUTPUT:

Wide Data Frame:

  Groups  lab1  lab2  theory1  theory2
0      A    25    25       60       60
1      B    21    21       55       55
2      C    14    14       49       49
3      D    22    22       69       69


Long Data Frame:

Groups Exams  lab  theory
A      1       25      60
B      1       21      55
C      1       14      49
D      1       22      69
A      2       25      60
B      2       21      55
C      2       14      49
D      2       22      69

Here, we have a dataframe with lab1, lab2, theory1, and theory2 marks for four groups: ’ A, B, C, and D. Before understanding the pd.wide_to_long()` function, let’s look at the following visual to understand how the wide data format is reshaped into the long data format.

reshape pandas data frame from wide to long - visualize wide to long function

The pd.wide_to_long() functions in a very particular manner, it actually uses pd.melt() function under the hood. It accepts four mandatory parameters we will understand below, but the essential thing is how the column names are formed and displayed.

See the following column format for wide_to_long() function.

reshape pandas data frame from wide to long - wide to long variable names

We are passing four parameters to the pd.wide_to_long() function to reshape the data frame from wide to long data format. The four parameters that we used are listed below:

  1. df - This is the dataframe we want to reshape.
  2. stubnames - We can also say group names (prefixes) that need to be grouped. In our case, these are lab and theory.
  3. i - It is the identifier variable(s) that is not supposed to be stacked.
  4. j - The column’s name that contains the suffixes, or we can say the labels of columns.

The optional parameters are sep (separator) and suffix. You can read more about pd.melt(), pd.wide_to_long(), and DataFrame.unstack() here.

Mehvish Ashiq avatar Mehvish Ashiq avatar

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.

LinkedIn GitHub Facebook

Related Article - Pandas DataFrame

  • Get Pandas DataFrame Column Headers as a List
  • Delete Pandas DataFrame Column
  • Convert Pandas Column to Datetime
  • Convert a Float to an Integer in Pandas DataFrame
  • Sort Pandas DataFrame by One Column's Values
  • Get the Aggregate of Pandas Group-By and Sum