Pandas read_sql_query in Python

Manav Narula Jan 05, 2023
Pandas read_sql_query in Python

SQL is a programming language used to create and maintain databases in a Relational Database Management System. We can create many database applications using Python.

Several libraries like SQLite, SQLAlchemy, and more can be used to create connections to SQL applications installed on the device.

In Python, we can store tables as DataFrame objects using the pandas library. We can work with SQL queries with this library as well.

This tutorial will demonstrate how to read results from a SQL query using the pandas.read_sql_query() function.

Python Pandas read_sql_query

Using the pandas.read_sql_query() function, we can directly read the result from a query into a DataFrame. We need to create a connection to an SQL database to use this function.

In our example, we will use the sqlite3 library. We can use this library to work with the SQLite database application.

The connect() method is used to create a connection, and the name of the database is provided in this function. We will create the connection and execute the SELECT statement to load data into the DataFrame.

See the code below.

import pandas as pd
import sqlite3

connection = sqlite3.connect("delftstack.db")
crsr = connection.cursor()
ct_sql = """CREATE TABLE data3 (
number INTEGER,
name VARCHAR(20));"""
crsr.execute(ct_sql)
crsr.execute("""INSERT INTO data3 VALUES (1, "GEORGE");""")
crsr.execute("""INSERT INTO data3 VALUES (2, "KEVIN");""")
df = pd.read_sql_query("""SELECT number,name FROM data3""", con=connection)
print(df)
connection.close()

Output:

   number    name
0       1  GEORGE
1       2   KEVIN

In the above example, we read data from the data1 table in the delftstack.db database. The result of the query is stored in the df DataFrame.

We can use some convenient parameters also with this function. The index_col parameter is used to specify the index for the DataFrame, and the column names can be provided with the columns parameter.

The coerce_float parameter can convert non-numeric, non-string values to floating-point numbers.

The most useful parameter here is the chunksize parameter. We may encounter huge tables and datasets in SQL.

So if we read them at once into a DataFrame, then a lot of memory is consumed internally. This can be avoided with the chunksize parameter that creates chunks of data of the specified rows.

It creates iterator objects of such chunks that can be iterated over and used to work with the data.

The pandas.read_sql is another function available in the pandas library that can read the result of an SQL query to a DataFrame. However, this is just a wrapper around the read_sql_query and read_sql_table functions for backward compatibility.

Whenever the pandas.read_sql() function encounters an SQL query, it gets routed to the read_sql_query function discussed in this tutorial.

An alternative to this function is by using the fetchall() function. This function fetches all the rows of the result of an SQL query.

We will execute the SELECT statement to print the rows and use this function to retrieve them and store them in a DataFrame using the pandas.DataFrame constructor.

See the following example.

import pandas as pd
import sqlite3

connection = sqlite3.connect("delftstack.db")
crsr = connection.cursor()
ct_sql = """CREATE TABLE data4 (
number INTEGER,
name VARCHAR(20));"""
crsr.execute(ct_sql)
crsr.execute("""INSERT INTO data4 VALUES (1, "GEORGE");""")
crsr.execute("""INSERT INTO data4 VALUES (2, "KEVIN");""")
crsr.execute("""SELECT number,name FROM data4""")
df_new = pd.DataFrame(crsr.fetchall(), columns=["number", "name"])
print(df_new)
connection.close()

Output:

   number    name
0       1  GEORGE
1       2   KEVIN
Author: Manav Narula
Manav Narula avatar Manav Narula avatar

Manav is a IT Professional who has a lot of experience as a core developer in many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.

LinkedIn

Related Article - Pandas SQL