# Pandas Scatter Plot Regression Line

The plotting tools that come with Pandas are excellent. Although there are many other plotting libraries, such as `Seaborn`, `Bokeh`, and `Plotly`, I find `Pandas` plotting satisfactory for most of my requirements.

Pandas' built-in scatter plot does not draw a regression line, however. This article explores how to add one to a scatter plot of Pandas data using Python's `Seaborn` library, `Matplotlib`, and scikit-learn.

## Draw a Regression Using Scatter Plot With Pandas

In Python, you can draw a regression line on a scatter plot of Pandas data. The following code creates a basic scatter plot from a Pandas DataFrame:

``````python
df.plot.scatter(x='one', y='two', title='Scatterplot')
``````

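Pandas' `plot.scatter()` has no built-in regression option: none of its parameters fits a regression line or displays the parameters of a fit. A call such as `df.plot.scatter(x='one', y='two', title='Scatterplot', Regression_line)` is not valid Python, because a positional argument cannot follow keyword arguments. The method does, however, return the Matplotlib `Axes` it drew on, so a fitted line can be added to the same plot afterwards.

``````python
# plot.scatter() returns the Matplotlib Axes it drew on
ax = df.plot.scatter(x='one', y='two', title='Scatterplot')
``````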
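Since Pandas' scatter plot has no regression-line parameter, one way to get the line is to fit it yourself with NumPy's `np.polyfit()` and draw it on the `Axes` that `plot.scatter()` returns. This is a minimal sketch, not the only approach; the DataFrame values are made up for illustration, and only the column names `'one'` and `'two'` come from the snippets above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data only; the column names match the snippets above.
df = pd.DataFrame({
    "one": [1, 2, 3, 4, 5, 6, 7, 8],
    "two": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1],
})

# Draw the scatter plot with Pandas; it returns a Matplotlib Axes.
ax = df.plot.scatter(x="one", y="two", title="Scatterplot")

# Least-squares fit of a first-degree polynomial, i.e. a straight line.
slope, intercept = np.polyfit(df["one"], df["two"], 1)

# Draw the fitted line on the same Axes and show the fit parameters in the legend.
xs = np.linspace(df["one"].min(), df["one"].max(), 100)
ax.plot(xs, slope * xs + intercept, color="red",
        label=f"y = {slope:.2f}x + {intercept:.2f}")
ax.legend()
plt.show()
```

Putting the fitted equation in the legend is one way to "display the parameters of the fit" that Pandas itself does not provide.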

Adding a regression line to a scatter plot of two numerical variables makes their linear trend easy to see. The rest of this article walks through several ways to add one.

There are three core steps to do it.

1. Import the required libraries.
2. Create, load, or import the data.
3. Use the `regplot()` or `lmplot()` function to plot the graph.
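The three steps can be sketched end to end as follows. This assumes Seaborn, Matplotlib, NumPy, and Pandas are installed, and it uses synthetic data generated with a fixed random seed, purely for illustration.

```python
# Step 1: import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: create some data (synthetic, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
df = pd.DataFrame({"x": x, "y": 3 * x + 5 + rng.normal(0, 2, size=50)})

# Step 3: draw the scatter plot and the fitted regression line in one call
ax = sns.regplot(x="x", y="y", data=df)
ax.set_title("Scatter plot with regression line")
plt.show()
```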

Before you start, install these libraries with whichever of the following commands matches your Python setup.

Code - `seaborn`:

``````shell
# in a virtual environment or using Python 2
pip install seaborn

# for python3 (could also be pip3.10 depending on your version)
pip3 install seaborn

# if you get a permissions error
sudo pip3 install seaborn

# if you don't have pip in your PATH environment variable
python -m pip install seaborn

# for python3 (could also be pip3.10 depending on your version)
python3 -m pip install seaborn

# alternative for Ubuntu/Debian
sudo apt-get install python3-seaborn

# alternative for CentOS
sudo yum install python3-seaborn

# alternative for Fedora
sudo dnf install python3-seaborn

# for Anaconda
conda install -c conda-forge seaborn
``````

Code - `matplotlib`:

``````shell
# in a virtual environment or using Python 2
pip install matplotlib

# for python3 (could also be pip3.10 depending on your version)
pip3 install matplotlib

# if you get a permissions error
sudo pip3 install matplotlib

# if you don't have pip in your PATH environment variable
python -m pip install matplotlib

# for python3 (could also be pip3.10 depending on your version)
python3 -m pip install matplotlib

# alternative for Ubuntu/Debian
sudo apt-get install python3-matplotlib

# alternative for CentOS
sudo yum install python3-matplotlib

# alternative for Fedora
sudo dnf install python3-matplotlib

# for Anaconda
conda install -c conda-forge matplotlib
``````
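After installing, a quick way to confirm that both packages import in the interpreter you intend to use is to print their versions. This is only a sanity check; the exact version strings depend on your installation.

```python
# Confirm both libraries import correctly and report their versions.
import matplotlib
import seaborn

print("matplotlib", matplotlib.__version__)
print("seaborn", seaborn.__version__)
```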

## Use `regplot()` to Draw a Regression

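`regplot()` plots the data together with a fitted linear regression model. Seaborn offers several options for estimating the regression model (such as `order`, `logistic`, `lowess`, `robust`, and `logx`), and these options are mutually exclusive, so use only one of them at a time.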

Code Example:

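``````python
# importing libraries
import seaborn as sb
import matplotlib.pyplot as plt

# assumption: df is Seaborn's iris sample dataset, whose columns match the ones below
# (load_dataset downloads it on first use)
df = sb.load_dataset("iris")

# scatter plot with a fitted linear regression line; ci=None hides the confidence band
sb.regplot(x="sepal_length", y="petal_length", ci=None, data=df)
plt.show()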
``````

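To illustrate one of those estimation options, the sketch below compares the default straight-line fit with `order=2`, which fits a quadratic polynomial instead. It uses synthetic data with a curved trend, made up for illustration. `order` was chosen here because it does not need extra packages; according to Seaborn's documentation, the `lowess`, `robust`, and `logistic` options require `statsmodels`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data with a quadratic trend (illustration only).
rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 60)
df = pd.DataFrame({"x": x, "y": x**2 + rng.normal(0, 2, size=x.size)})

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Default: a first-order (straight-line) fit.
sns.regplot(x="x", y="y", data=df, ci=None, ax=left)
left.set_title("order=1 (default)")

# order=2 fits a second-degree polynomial instead of a straight line.
sns.regplot(x="x", y="y", data=df, ci=None, order=2, ax=right)
right.set_title("order=2")

plt.show()
```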

## Use `lmplot()` to Draw a Regression

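Another straightforward option is `lmplot()`. It draws the data points in 2D along with a line representing a fitted linear regression model. Unlike `regplot()`, which draws on a single Matplotlib `Axes`, `lmplot()` is a figure-level function that combines `regplot()` with a Seaborn `FacetGrid`.

The `x` and `y` parameters name the DataFrame columns plotted on the horizontal and vertical axes, respectively.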

Code Example:

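``````python
# importing libraries
import seaborn as sb
import matplotlib.pyplot as plt

# assumption: df is Seaborn's iris sample dataset, whose columns match the ones below
df = sb.load_dataset("iris")

# lmplot draws the points and the fitted regression line on its own figure
sb.lmplot(x="sepal_length", y="petal_length", ci=None, data=df)
plt.show()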
``````

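Because `lmplot()` is built on a `FacetGrid`, it can fit a separate regression line per group via the `hue` parameter. The sketch below uses two synthetic groups, `"A"` and `"B"`, with different slopes; the data and the column names `x`, `y`, and `group` are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Two synthetic groups with different slopes (illustration only).
rng = np.random.default_rng(2)
x = np.tile(np.linspace(0, 10, 30), 2)
group = np.repeat(["A", "B"], 30)
slope = np.where(group == "A", 1.0, 2.5)
df = pd.DataFrame({"x": x, "y": slope * x + rng.normal(0, 1.5, size=x.size),
                   "group": group})

# hue= fits and draws one regression line per group on the same Axes.
g = sns.lmplot(data=df, x="x", y="y", hue="group", ci=None)
g.set_axis_labels("X Axis", "Y Axis")
plt.show()
```

Passing `col="group"` instead of `hue="group"` would draw each group in its own panel rather than on shared axes.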

## Use `sklearn` to Merge the Regression Line With the Scatter Plot

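If you want the fitted model itself and not just the picture, you can train scikit-learn's `LinearRegression` and draw its predictions over a Matplotlib scatter plot. The code expects `marks_df` to be a DataFrame whose feature column(s) come first and whose target is the last column.

Code Example:

``````python
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# marks_df: feature column(s) first, target last.
# These values are hypothetical placeholders; load your own data here instead.
marks_df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "marks": [35, 41, 50, 54, 61, 66, 74, 79, 86, 92],
})

X = marks_df.iloc[:, :-1].values  # every column except the last, as a 2D array
y = marks_df.iloc[:, -1].values   # the last column (the target)

# hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# training points in green, the fitted line through the test predictions in black
plt.scatter(X_train, y_train, color='g')
plt.plot(X_test, y_pred, color='k')
plt.show()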
``````

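A fitted `LinearRegression` also exposes the line's slope (`coef_`) and intercept (`intercept_`), so you can report the equation and draw the line across the full range of the data rather than only through the test points. This sketch fits on all rows of a small hypothetical DataFrame (the `hours`/`marks` values are invented) and checks that the result agrees with `np.polyfit`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Hypothetical two-column data: feature first, target second.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5, 6, 7, 8],
                   "marks": [40, 44, 51, 55, 62, 65, 71, 76]})

X = df[["hours"]]  # scikit-learn expects a 2D feature matrix
y = df["marks"]

model = LinearRegression().fit(X, y)
slope, intercept = model.coef_[0], model.intercept_

# Predict on an evenly spaced grid so the line spans the whole x range.
grid = pd.DataFrame({"hours": np.linspace(df["hours"].min(), df["hours"].max(), 100)})
plt.scatter(df["hours"], df["marks"], color="g")
plt.plot(grid["hours"], model.predict(grid), color="k",
         label=f"marks = {slope:.2f} * hours + {intercept:.2f}")
plt.legend()
plt.show()
```

Ordinary least squares with one feature is the same problem `np.polyfit(x, y, 1)` solves, so both approaches should give the same line.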

## Use `Matplotlib` for Pandas Scatter Plot Regression Line

Using `Matplotlib` directly, the following code produces a scatter plot from a small sample dataset and adds a least-squares regression line computed with NumPy's `np.polyfit()`.

Code Example:

``````python
# import libraries
import numpy as np
import matplotlib.pyplot as plt

# creating data
x = np.array([1, 3, 1, 5, 0, 9, 5, 7, 6, 7, 3, 7])
y = np.array([13, 18, 17, 12, 23, 14, 27, 25, 24, 23, 36, 31])

# create a simple scatterplot
plt.plot(x, y, 'o')

# obtain the m (slope) and b (intercept) of the linear regression line
m, b = np.polyfit(x, y, 1)

# add a linear regression line to the scatterplot
plt.plot(x, m*x + b)
plt.show()
``````

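To see what `np.polyfit(x, y, 1)` computes, the slope and intercept can also be obtained from the closed-form ordinary least-squares formulas: the slope is the sum of products of deviations from the means divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The sketch below reuses the sample numbers from the Matplotlib example and checks that both methods agree.

```python
import numpy as np

# Same sample numbers as the Matplotlib example.
x = np.array([1, 3, 1, 5, 0, 9, 5, 7, 6, 7, 3, 7])
y = np.array([13, 18, 17, 12, 23, 14, 27, 25, 24, 23, 36, 31])

# Closed-form ordinary least squares:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
# intercept = y_mean - slope * x_mean
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# np.polyfit with degree 1 solves the same least-squares problem.
m, b = np.polyfit(x, y, 1)
print(slope, intercept)
print(m, b)
```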

## Use `seaborn` to Draw Regression Line

First, import the modules this example needs: pandas, random, Matplotlib, and Seaborn.

``````python
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
``````

Next, create an empty DataFrame and fill its `x` and `y` columns with 70 random integers each, drawn without replacement from 1 to 499 with `random.sample()`. The first five rows are then printed with `print(df.head())`.

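``````python
# start from an empty DataFrame and fill two columns with 70 distinct random integers each
df = pd.DataFrame()
df['x'] = random.sample(range(1, 500), 70)
df['y'] = random.sample(range(1, 500), 70)

# show the first five rows
print(df.head())
``````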

We first use `sns.lmplot` to draw a scatter plot without a regression line. We pass the column names for `x` and `y`, the DataFrame as `data`, and `fit_reg=False` because we don't want a regression line yet; `scatter_kws` passes styling options to the points (diamond markers, `"D"`, of size 20).

We also set the plot title and the x- and y-axis labels.

``````python
# keyword arguments are required for x and y in Seaborn 0.12 and later
sns.lmplot(data=df, x='x', y='y', fit_reg=False, scatter_kws={"marker": "D", "s": 20})
plt.title('Scatter Plot of Data without Regression Line')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
``````

To draw the regression line as well, set `fit_reg` to `True` (its default value). This draws a fitted regression line, with its confidence band, alongside the scatter plot.

The title and axis labels are set the same way as before.

``````python
sns.lmplot(data=df, x='x', y='y', fit_reg=True, scatter_kws={"marker": "D", "s": 20})

plt.title('Scatter Plot of Data with Regression Line')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
``````
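Because `lmplot()` returns a `FacetGrid`, you can also label the plot through the grid object instead of Matplotlib's "current Axes", and style the regression line separately from the points with `line_kws`. The sketch below does both; the red line colour and `random.seed(0)` (added so reruns produce the same data) are illustrative choices, not part of the original example.

```python
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

random.seed(0)  # fix the random data so reruns give the same plot
df = pd.DataFrame({"x": random.sample(range(1, 500), 70),
                   "y": random.sample(range(1, 500), 70)})

# lmplot returns a FacetGrid; scatter_kws styles the points,
# line_kws styles the regression line.
g = sns.lmplot(data=df, x="x", y="y", fit_reg=True,
               scatter_kws={"marker": "D", "s": 20},
               line_kws={"color": "red"})

# Label the grid's single Axes directly.
g.set_axis_labels("X Axis", "Y Axis")
g.ax.set_title("Scatter Plot of Data with Regression Line")
plt.show()
```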

Output of `print(df.head())` (the values differ on every run because the data is random):

``````     x    y
0   79  386
1  412   42
2  239  139
3  129  279
4  404  239
``````

## Conclusion

This is how to add a regression line to a scatter plot of Pandas data with Matplotlib or Seaborn. A regression line makes the linear trend between two numerical variables easy to see.

In this article, we covered two Seaborn functions for drawing scatter plots with regression lines, `regplot()` and `lmplot()`, and saw how to add a regression line to a scatter plot with Matplotlib and scikit-learn.

Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.