How to Execute SQL Query in Pandas

Fariba Laiq Feb 02, 2024
  1. Use pandasql to Run SQL Queries in Python
  2. Use DuckDB to Run SQL Queries in Python
  3. Use Fugue to Run SQL Queries in Python
How to Execute SQL Query in Pandas

SQL stands for Structured Query Language; it is a well-known language used to interact with a relational database. There are many ways to run SQL queries in Python.

Use pandasql to Run SQL Queries in Python

This package has an sqldf method like the sqldf in R. The pandasql provides a more familiar way to perform CRUD operations on the data frame.

Before we use pandasql, we have to install it first using the following command.

#Python 3.x
pip install -U pandasql

We will import the sqldf method from the pandasql module to run a query. Then we will call the sqldf method that takes two arguments.

The first argument is a SQL query in string format. The second argument is a set of session/environment variables (globals() or locals()).

In the following code, first, we have uploaded the Student.csv data to our current working directory of the Jupyter notebook. Then we have read the student’s records using the traditional SQL Select query over the data frame; it will show all records in the data frame.

# Python 3.x
import pandas as pd
from pandasql import sqldf


def mysql(q):
    return sqldf(q, globals())


df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df")

Output:

SQL Select Query

We have used the WHERE clause in the SQL query to display only selected records that satisfy the given condition in the following code.

# Python 3.x
import pandas as pd
from pandasql import sqldf


def mysql(q):
    return sqldf(q, globals())


df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df WHERE Department = 'SE'")

Output:

SQL Where Clause

Use DuckDB to Run SQL Queries in Python

DuckDB is a Python API and a database management system that uses SQL queries to interact with the database.

To use DuckDB, we should install it first using the following command.

#Python 3.x
pip install duckdb

In the following code, we have imported the duckdb and Pandas package, read the CSV file and run the query by calling the query() method with duckdb. We will pass the query (as an argument) to the query() method.

The code will return the result as a data frame. We can write any SQL query of our choice according to the data frame.

# Python 3.x
import pandas as pd
import duckdb

df = pd.read_csv("Student.csv")
duckdb.query("SELECT * FROM df").df()

Output:

SQL Select Query

Use Fugue to Run SQL Queries in Python

Fugue is a unified interface for distributed computing that allows users to run Python, Pandas, and SQL code on Spark and Dask without rewriting.

We have to install it first using the following command to use fugue.

#Python 3.x
pip install fugue[sql]

We have imported Pandas and fugue packages in the following code and loaded the data frame with the CSV file. Then we will pass our SQL query to the fsql() method and call the run() method with it.

# Python 3.x
import pandas as pd
from fugue_sql import fsql

df = pd.read_csv("Student.csv")
query = "SELECT * FROM df PRINT"
fsql(query).run()

Output:

Fugue SQL

Author: Fariba Laiq
Fariba Laiq avatar Fariba Laiq avatar

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.

LinkedIn

Related Article - Pandas SQL