Custom Search Engine Using Google API in Python

  1. Create a Search Engine Using Google CSE Platform
  2. Implement the Custom Search API in Python
  3. Conclusion

This article explains how to create a Custom Search Engine (CSE) and query it through the Google Search API in Python. A CSE is a search engine that developers can configure and embed in any application, such as a website or a mobile app.

Many apps use the Google Custom Search Engine for tasks that would otherwise require web scraping. This article explains how to set up a CSE and use its Google Search API in Python.

Scraping Google Search directly is highly discouraged because Google starts restricting the searches after only a few requests.

Create a Search Engine Using Google CSE Platform

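Using the Google Search API in Python to get search results is a three-step process: create a custom search engine, get an API key, and send search queries from a Python script. Unlike web scraping, which pulls results directly from the Google search page, this method creates a custom search engine and uses it to fetch results.

This fetches the same kind of results as scraping without being blocked after a few requests, although the API is still subject to its own usage quota.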

To create a search engine, open the Programmable Search Engine page. Give the search engine a name and add a sample URL under What to search?.

This sample URL can be removed later, which is what we will do.

Confirm reCAPTCHA and click on Create to create a custom search engine. This search engine needs to be tweaked to access the entire web.

Click on Customize on the next page.

Create CSE

Under Basic, some essential data can be found, like the search engine ID, which will be used to send search requests. Copy the search engine ID and store it.

Scroll down to Search Features and turn on the Search the entire web option.

In the Sites to search section, tick the checkbox of the added URL and delete it. This will make the search engine open to the entire web.

Modify CSE

Once the Custom Search Engine is created, it is time to use the Google Search API in Python.

First, we need to get an API key for the created search engine.

Get a Google API Key

Google Cloud exposes Google services through Application Programming Interfaces (APIs) so that they can be embedded in third-party applications. A Google Cloud project is needed to get a Custom Search API key, which the Python scripts then use to call the Google Search API.

There are two ways to fetch an API key for the custom search engine:

  1. Create a project in Google Cloud and get a Google Custom Search API key.
  2. Get a JSON API key.

Both methods require a Google Cloud project.

Create a Project in Google Cloud and Get a Google Custom Search API

Head over to the credentials page of Google Cloud. Then, click on New Project.

Google Cloud - New Project

Name it and leave the organization box as it is. Then, click on Create.

Google Cloud - Click Create

After creating the project, we need to create an API key for it. In the left-hand side panel, select Credentials and then click on the Create Credentials button at the top.

Inside Create Credentials, select API key.

Google Cloud - Credential API Key

Selecting the API key option will create an API key for the project. Click on Show key to copy the API key.

Google Cloud - API Key Created

The API key fetched with this method is inactive: it cannot be used for searches until the Custom Search API is enabled for the project.

When the Python script that uses this key runs for the first time, the request fails, and the error message typically includes a link for enabling the API. After the API is enabled, the custom search engine can be used.

Get a JSON API Key

This method is simpler because the key does not need to be activated. The API key can be fetched directly if a Google Cloud project already exists.

Go to the guide page of the programmable search engine website.

Click on the Get a key button to open a pop-up asking to choose the project.

Google Cloud - JSON API Get Key

Click on the project and select Next to create an API key for the project.

JSON API - Select Project

Click on Show key to get the API key.

Google Cloud - JSON API Show Key

This JSON API key can be used directly, whereas the API key created manually through the Credentials tab in Google Cloud needs to be activated.

Implement the Custom Search API in Python

After the CSE ID and the API key are ready, the Google Search API can be used inside Python scripts. The two programs below walk through the process.

Example 1:

For the Google Search API in Python to work, we need a Python library that can call the API with our key. We can use the Google API Python Client.

To install it, open CMD or the terminal of any IDE that runs Python.

Inside CMD, write the command:

pip install google-api-python-client

This will install the Python package into the system.

A Python script needs to be created that will send search queries to the custom search engine and return the result.

Code- custom_search_engine.py:

from googleapiclient.discovery import build

my_api_key = "The API_KEY you acquired"
my_cse_id = "The search-engine-ID you created"


def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # "items" is absent when the search matches nothing, so fall back to an empty list
    return res.get("items", [])


results = google_search('"How to code in Python"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)

Let’s break down the code to understand what it does. The first line of code imports the build function from the googleapiclient.discovery module of the Google API Python Client.

Two variables, my_api_key and my_cse_id, store the API key and the custom search engine ID, respectively.
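Hard-coding the two credentials is fine for a quick test, but they are easy to leak when the script is shared. A minimal sketch of one alternative is to read them from environment variables. The variable names GOOGLE_API_KEY and GOOGLE_CSE_ID and the helper load_credentials are hypothetical and not part of any Google library.

```python
import os


def load_credentials(env=None):
    """Return (api_key, cse_id) read from environment variables.

    GOOGLE_API_KEY and GOOGLE_CSE_ID are hypothetical names chosen for
    this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    cse_id = env.get("GOOGLE_CSE_ID")
    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("GOOGLE_API_KEY", api_key), ("GOOGLE_CSE_ID", cse_id))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, cse_id


# Demonstration with a stand-in mapping instead of the real environment.
my_api_key, my_cse_id = load_credentials(
    {"GOOGLE_API_KEY": "demo-key", "GOOGLE_CSE_ID": "demo-cse-id"}
)
print(my_cse_id)
```

Calling load_credentials() without an argument reads from the real environment, so the values never have to appear in the script itself.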

A function google_search is defined with four parameters: search_term, which stores the search query, api_key for passing the API key, cse_id for passing the custom search engine’s ID, and lastly **kwargs, which collects any extra keyword arguments.

The code below calls the build function to create a service object for version v1 of the customsearch API, authenticated with the API key, and stores it in the variable service.

service = build("customsearch", "v1", developerKey=api_key)

The next line calls service.cse().list() to prepare a search request for the custom search engine and execute() to send it. The decoded response is stored in the variable res.

The call list(q=search_term, cx=cse_id, **kwargs) takes the search term and the search engine ID, while **kwargs forwards optional parameters such as num, which limits the number of results returned.

res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
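Because a single call returns at most 10 results, fetching more means calling list() several times with different num and start values passed through **kwargs. The sketch below only computes those keyword arguments and sends no requests. The helper name paging_kwargs is hypothetical.

```python
MAX_PER_REQUEST = 10  # the API returns at most 10 results per call


def paging_kwargs(total, per_request=MAX_PER_REQUEST):
    """Yield keyword-argument dicts that together cover `total` results."""
    if not 1 <= per_request <= MAX_PER_REQUEST:
        raise ValueError("per_request must be between 1 and 10")
    start = 1  # result indexes are 1-based
    while start <= total:
        # The last request may need fewer results than a full page.
        num = min(per_request, total - start + 1)
        yield {"num": num, "start": start}
        start += num


for kwargs in paging_kwargs(25):
    print(kwargs)
```

Each dictionary could then be passed on as google_search(term, my_api_key, my_cse_id, **kwargs), with the returned lists joined together.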

Lastly, the function returns the items entry of res, which holds the list of search results.

Finally, google_search is called with the search query as the first argument, followed by the API key, the CSE ID, and num=10, the number of results to return.

The returned list is stored in the variable results, and a for loop prints each result in it.

results = google_search('"How to code in Python"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)
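Each result is a dictionary with many fields, so printing it whole produces a lot of output. As a rough sketch, the fields of interest can be picked out first. The helper summarize is hypothetical, and sample_results is made-up data shaped like the items list, used here so the snippet runs without an API key.

```python
def summarize(items):
    """Reduce each result dictionary to its title, link, and snippet."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in items
    ]


# Made-up sample in the shape of the API's "items" list, for illustration only.
sample_results = [
    {
        "kind": "customsearch#result",
        "title": "Example page",
        "link": "https://example.com/",
        "snippet": "An example snippet.",
        "displayLink": "example.com",
    },
    {"title": "No snippet here", "link": "https://example.org/"},
]

for row in summarize(sample_results):
    print(row["title"], "->", row["link"])
```

In the script above, summarize(results) could take the place of printing each result directly.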

Output:

Python Implement the Custom Search API - Output 1

Example 2:

In this example, we will write a Python script that sends search requests without the Google API client library. The program uses the API key and the CSE ID to call the REST endpoint of the Google Search API directly through the requests HTTP library, which is a third-party package installed with pip install requests.

Code:

import requests

API_KEY = "Your API Key"

SEARCH_ENGINE_ID = "Your CSE ID"

# the search query you want
query = "Starboy"
# using the first page
page = 1
# construct the URL
# doc: https://developers.google.com/custom-search/v1/using_rest
# calculating start, (page=2) => (start=11), (page=3) => (start=21)
start = (page - 1) * 10 + 1
url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={query}&start={start}"

# make the API request
data = requests.get(url).json()

# get the results ("items" is missing when nothing matches, so default to an empty list)
search_items = data.get("items", [])
# iterate over the results (at most 10 per page)
for i, search_item in enumerate(search_items, start=1):
    try:
        long_description = search_item["pagemap"]["metatags"][0]["og:description"]
    except KeyError:
        long_description = "N/A"
    # get the title of the page
    title = search_item.get("title")
    # get the page snippet
    snippet = search_item.get("snippet")
    # alternatively, you also can get the HTML snippet (bolded keywords)
    html_snippet = search_item.get("htmlSnippet")
    # extract page url
    link = search_item.get("link")
    # print results
    print("="*10, f"Result #{i+start-1}", "="*10)
    print("Title:", title)
    print("Description:", snippet)
    print("Long description:", long_description)
    print("URL:", link, "\n")

Let’s understand what the above code does.

The first line imports the requests HTTP library. Two variables, API_KEY and SEARCH_ENGINE_ID, are then initialized with the previously created credentials.

import requests

API_KEY = "Your API Key"
SEARCH_ENGINE_ID = "Your CSE ID"

The variable query stores the search term that the application will look for. The variable page selects which page of search results to fetch, while the variable start holds the index of the first result on that page.

Every page has 10 search results. With page = 1, start is 1 and the first 10 search results are shown, meaning the first page. With page = 2, start is 11, so the results begin at the 11th.
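The page arithmetic can be checked with a small helper. This is a sketch only, and the function name start_index is hypothetical.

```python
RESULTS_PER_PAGE = 10  # the page size used in this article


def start_index(page, per_page=RESULTS_PER_PAGE):
    """Return the 1-based index of the first result on the given page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * per_page + 1


for page in (1, 2, 3):
    print(page, start_index(page))  # prints 1 1, then 2 11, then 3 21
```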

The variable url stores the request URL used to get the search results from the custom search engine. It carries the API key, the search engine ID, the search query, and the index of the first result to return.

query = "Starboy"
page = 1
start = (page - 1) * 10 + 1
url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={query}&start={start}"
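Formatting the query straight into the URL works for a single word such as Starboy, but a query containing characters like & or # would change the meaning of the URL. One possible way to avoid that, sketched here with the standard library, is to let urlencode build the query string. The helper name build_search_url and the demo credentials are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cse_id, query, start=1):
    """Return the request URL with every parameter percent-encoded."""
    params = {"key": api_key, "cx": cse_id, "q": query, "start": start}
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
print(build_search_url("demo-key", "demo-cse-id", "rock & roll", start=11))
# https://www.googleapis.com/customsearch/v1?key=demo-key&cx=demo-cse-id&q=rock+%26+roll&start=11
```

The requests library can do the same encoding itself when the parameters are passed as requests.get(BASE_URL, params=params).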

The program sends an API request to the stored URL with requests.get and saves the decoded JSON response in the variable data.

The variable search_items holds the list of search results. A for loop iterates over it, numbering the results from 1.

The first value looked up for each result is its long description, which is wrapped in a try/except block.

If the result has an og:description meta tag, its value is stored in the variable long_description. Otherwise, N/A is stored.

data = requests.get(url).json()
search_items = data.get("items", [])
for i, search_item in enumerate(search_items, start=1):
    try:
        long_description = search_item["pagemap"]["metatags"][0]["og:description"]
    except KeyError:
        long_description = "N/A"
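The try/except above catches only KeyError, so a result whose metatags list is empty would still raise IndexError. A minimal sketch of a lookup that tolerates every missing level is shown below. The helper name get_long_description is hypothetical, and the sample dictionaries are made up.

```python
def get_long_description(search_item, default="N/A"):
    """Return the og:description meta tag of a result, or `default`."""
    pagemap = search_item.get("pagemap") or {}
    metatags = pagemap.get("metatags") or []
    if not metatags:
        return default
    # Only the first metatags entry is inspected, as in the code above.
    return metatags[0].get("og:description", default)


# Made-up results, for illustration only.
with_tag = {"pagemap": {"metatags": [{"og:description": "A longer description."}]}}
empty_list = {"pagemap": {"metatags": []}}
no_pagemap = {"title": "Bare result"}

for item in (with_tag, empty_list, no_pagemap):
    print(get_long_description(item))
```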

In the code below, each attribute of a search result is stored in a variable of the same name. This happens once for every search result, up to 10 times per page.

title = search_item.get("title")
snippet = search_item.get("snippet")
html_snippet = search_item.get("htmlSnippet")
link = search_item.get("link")

Finally, all the results are printed. The first line prints the result’s number, followed by attributes like the title, description, and URL.

print("="*10, f"Result #{i+start-1}", "="*10)
print("Title:", title)
print("Description:", snippet)
print("Long description:", long_description)
print("URL:", link, "\n")

The results are printed using the Google Search API in Python without the Google API client library.
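When a request fails, for instance because the key is invalid or the API is not enabled, the JSON response contains an error object instead of items. The sketch below checks for that before reading the results. The helper name extract_items is hypothetical, and the sample responses are made up rather than captured from the API.

```python
def extract_items(data):
    """Return the result list from a decoded API response.

    Raises RuntimeError when the response carries an "error" object and
    returns an empty list when the search matched nothing.
    """
    error = data.get("error")
    if error:
        code = error.get("code", "unknown")
        message = error.get("message", "no message provided")
        raise RuntimeError(f"API request failed ({code}): {message}")
    return data.get("items", [])


# Made-up responses, for illustration only.
ok_response = {"items": [{"title": "Example page", "link": "https://example.com/"}]}
failed_response = {"error": {"code": 403, "message": "Example failure message"}}

print(len(extract_items(ok_response)))
try:
    extract_items(failed_response)
except RuntimeError as exc:
    print(exc)
```

In the script above, extract_items(data) could take the place of data.get("items").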

Output:

Python Implement the Custom Search API - Output 2

Conclusion

This article has explained how to create a client that sends search queries to a custom search engine using the Google Search API in Python. The reader should now be able to create a custom search engine, fetch API keys, and write Python scripts that send search requests.