Scroll Down Web Page Using Selenium in Python

  1. Use a Scale to Scroll Down Web Page in Python
  2. Implement Scroll Down Up to the Bottom of the Webpage in Python
  3. Infinite Scrolling in Python
  4. Scroll Browser to a Target Element Using Selenium in Python
  5. Conclusion

This article demonstrates how to scroll webpages using Selenium. Selenium is a browser automation tool whose Python bindings let a script send commands to a web browser and control it as required.

Use a Scale to Scroll Down Web Page in Python

Web pages come in several kinds: some can be scrolled down to a fixed bottom, while others, like Facebook, keep loading content endlessly. In the same way, scrolling can be either limited or endless.

Which kind of scrolling is required depends on the program. The program here demonstrates how to scroll a webpage by a defined amount.

Import Packages

This program requires three imports, each with a different purpose.

  1. selenium - The Selenium package contains the web automation sub-packages. The program uses webdriver, the module that controls the browser.
  2. ChromeDriverManager - A class from the webdriver_manager package that downloads a ChromeDriver binary matching the installed Chrome browser.
  3. time - A module from the Python standard library, used here to pause the script for a given period.

Implement Scroll Using Selenium Webdriver in Python

The program needs to load ChromeDriverManager to implement scrolling. The driver is installed and stored in the variable driver.

All further commands are sent through this variable. The syntax driver.maximize_window() maximizes the Chrome window.

To get the contents from a web address, syntax driver.get("URL") is used, where the web address is placed in the URL. The webpage’s title is fetched using driver.title and printed.

Once the window is open and the contents of the web page are loaded, scrolling is implemented using the syntax driver.execute_script("window.scrollTo(0, x)").

The driver runs this JavaScript inside the browser, which scrolls the page to the vertical position x, measured in pixels from the top of the page.

The script then waits 10 seconds using time.sleep(10) so the scrolled page stays visible. Finally, driver.close() closes the browser window.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

from webdriver_manager.chrome import ChromeDriverManager
import time

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()

driver.get("https://www.theatlantic.com/culture/archive/2022/06/how-vacations-make-friendships-stronger/661349/?utm_source=pocket-newtab-intl-en")

print("Webpage Title= " + driver.title)

# Scroll to 1920 pixels from the top of the page
driver.execute_script("window.scrollTo(0, 1920)")
time.sleep(10)
driver.close()

Output:

Browser scrolling
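
A fixed offset such as 1920 jumps straight to that position. As a minimal sketch, the same execute_script() call can also move the page in smaller increments with the browser's window.scrollBy. The helper name scroll_in_steps and its parameters are hypothetical and not part of Selenium; the function only assumes that driver has an execute_script() method, as a Selenium WebDriver does.

```python
import time


def scroll_in_steps(driver, step=400, count=5, pause=0.5):
    """Scroll down `count` times by `step` pixels, pausing between steps.

    `driver` is assumed to be a Selenium WebDriver (any object with an
    `execute_script` method works). Returns the total distance requested.
    """
    for _ in range(count):
        # Extra arguments to execute_script are exposed to the script
        # as the JavaScript `arguments` array.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)
    return step * count
```

With the program above, a call such as scroll_in_steps(driver, step=480, count=4) would request the same 1920 pixels in four visible steps.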

Implement Scroll Down Up to the Bottom of the Webpage in Python

This program shows how to scroll to the bottom of the webpage using Selenium Webdriver.

This program reuses the imports from the previous program, and it loads the driver and opens the URL with driver.get() in the same way.

A variable SCROLL_PAUSE_TIME is created that stores the number of seconds to wait after each scroll.

To scroll to the bottom of the page, the program must know how tall the page is.

To get that height, the syntax driver.execute_script() is used. It runs JavaScript code inside the webpage.

The script returns document.body.scrollHeight, the total height of the page content in pixels, much like the total length of a string.

This value is the bottom limit of the scroll, and it is stored in the variable last_height.

Inside the while loop, JavaScript is injected to scroll the browser until scrollHeight using the syntax below.

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

A load time is provided using the syntax time.sleep(SCROLL_PAUSE_TIME). This gives the browser some time to load new content if there is any.

If no new content loads within SCROLL_PAUSE_TIME, the page height stays the same and the loop ends.

Once the browser has scrolled to scrollHeight, JavaScript is run again to fetch the current scrollHeight, which is stored in the variable new_height.

If new_height equals last_height, the loop breaks because no more scrolling is required. Otherwise, last_height is updated to new_height and the loop repeats.

Because SCROLL_PAUSE_TIME is 5, the script waits 5 seconds after each scroll before it checks the height again.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

from webdriver_manager.chrome import ChromeDriverManager
import time

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()

driver.get("https://www.delftstack.com/")

SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Close the browser and release the driver once the bottom is reached
driver.quit()

Output:

Scrolled to the bottom
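
The fixed time.sleep(SCROLL_PAUSE_TIME) always waits the full pause, even when new content arrives sooner. One possible refinement, sketched here under the assumption that polling the height is acceptable for the page, is to check scrollHeight repeatedly and return as soon as it changes. The function name wait_for_new_height and its parameters are hypothetical.

```python
import time


def wait_for_new_height(driver, last_height, timeout=5.0, poll=0.25):
    """Poll the page height until it differs from `last_height` or `timeout` elapses.

    Returns the most recent height. If nothing new loaded in time,
    the returned value equals `last_height`.
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight")
        if height != last_height or time.monotonic() >= deadline:
            return height
        time.sleep(poll)
```

Inside the while loop, new_height = wait_for_new_height(driver, last_height, SCROLL_PAUSE_TIME) could stand in for the sleep and the second height query; the comparison with last_height stays the same.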

Infinite Scrolling in Python

Webpages like Facebook and Twitter can be scrolled infinitely because new content keeps loading and displaying as the user scrolls.

Here, infinite scrolling occurs when the page height keeps increasing. Each scroll loads more content, so new_height never equals last_height and the loop runs indefinitely.

Inside the browser, every scroll to the bottom loads new objects, and the program then scrolls to the bottom of those as well.

Cases like these have two possible outcomes:

  1. Memory use keeps growing as more content is loaded, and the browser or system can eventually crash.
  2. The loop stops early if new objects take longer to load than SCROLL_PAUSE_TIME, because the height appears unchanged.

Infinite Scrolling
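
One way to avoid the endless loop is to cap the number of scrolls. The sketch below is a bounded variant of the earlier loop; the function name scroll_to_bottom and the max_scrolls parameter are hypothetical, and a suitable limit depends on the page.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_scrolls=20):
    """Scroll until the page height stops growing or `max_scrolls` is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for scrolls in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load new content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return scrolls  # nothing new loaded, so the bottom was reached
        last_height = new_height
    return max_scrolls  # limit hit; the page may still have more content
```

On a page that ends, the function returns as soon as the height stops changing. On an endless feed, it returns after max_scrolls iterations instead of running until memory runs out.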

Scroll Browser to a Target Element Using Selenium in Python

This section explains how to find an element in a webpage and scroll the browser to it. Three things must be known to achieve this outcome.

  1. The URL of the webpage.
  2. The XPath of the target element.
  3. The average time it takes to load the page.

The URL of the webpage can be copied from the address bar of any browser. If the target element is on one of the subpages, then the subpage's address must be given instead of the website's home page.

XPath is a query language for selecting elements in an XML or HTML document. Just as every webpage has a URL, every element inside a webpage can be addressed by a path.

Fetch XPath of the Website

To fetch the XPath, go to the webpage, press F12, or right-click and choose inspect element. A panel will appear at the bottom of the browser.

A small icon of a black cursor over a square box appears on the top left-hand side of the panel.

Clicking on the icon puts the browser into object selection mode, where hovering the cursor over an element of the webpage highlights it in blue.

Clicking on an element in object selection mode displays the HTML of that element. Right-click the highlighted HTML inside the inspect panel, go to Copy, and select Copy XPath.

This will copy the XPath inside the clipboard.
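
To make the idea of a path concrete, here is a small sketch using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The markup is made up for illustration, and Selenium evaluates full XPath in the browser rather than with this module, so this is only an analogy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a real page.
PAGE = """
<html>
  <body>
    <div><p>Intro</p></div>
    <div><h1>Learn to Code</h1></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())  # root is the <html> element

# Path relative to <html>: the second <div> under <body>, then its <h1>.
# The equivalent absolute XPath in a browser would be /html/body/div[2]/h1.
heading = root.find("body/div[2]/h1")
print(heading.text)
```

Each step in the path names a child element, and a number in square brackets picks one of several siblings with the same tag, counting from 1.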

Imports

The program requires the following imports: webdriver and Service from the selenium package, By from selenium.webdriver.common.by, and the time module from the standard library.

Import Driver and Fetching Target Element

This program requires chromedriver, which can be downloaded from the ChromeDriver download page.

Unzip the downloaded package and copy the path of the .exe file into the syntax below. In Selenium 4, the path is passed through a Service object.

driver = webdriver.Chrome(service=Service("path of chromedriver.exe"))

The URL of the webpage needs to be put inside the parameters of the syntax driver.get().

The syntax driver.find_element() searches for an element, and the arguments (By.XPATH, "your XPath") tell it to locate the element by the given XPath. The XPath is put inside the double quotes.

The element found at that XPath is stored in the variable el, and el.click() clicks it. Selenium scrolls an element into view before clicking it, which brings the browser to the target.

time.sleep(10) keeps the browser open for 10 seconds so the result can be observed.

driver.quit() closes the browser and releases the driver.

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the driver path through a Service object
service = Service("C:/Users/Win 10/Downloads/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get("https://www.w3schools.com/")

# Locate the target element by its XPath, then click it
el = driver.find_element(By.XPATH, "/html/body/div[5]/div[9]/div/h1")
el.click()
time.sleep(10)
driver.quit()

Output:

Import Driver and Fetching Target Element
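
Clicking is not always wanted, for example when the target is a link. As a hedged alternative, the browser's own scrollIntoView method can be called on the element through execute_script(). The helper name scroll_to_element is hypothetical; the function assumes element is a WebElement returned by driver.find_element().

```python
def scroll_to_element(driver, element):
    """Ask the browser to scroll `element` to the middle of the viewport.

    `driver` is assumed to be a Selenium WebDriver and `element` a
    WebElement found with driver.find_element().
    """
    # The element is passed as an argument and appears in the script
    # as arguments[0].
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    return element
```

In the program above, scroll_to_element(driver, el) could take the place of el.click() to bring the heading into view without triggering a click.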

Conclusion

This article showed how to write Python programs that use Selenium WebDriver to scroll the browser by a fixed amount, to the bottom of a page, and to a target element.
