Python Address Parser

Abid Ullah Oct 10, 2023
  1. Parse Address Using Python Library PyParsing
  2. Parse Address From CSV File Using PyParsing in Python

This article shows how to parse addresses in Python. We will use the pyparsing library to parse a single address by hand, and then use the functions of pyparsing to parse addresses stored in a CSV file.

We’ll start with a simple example and then move on to a complex one.

Parse Address Using Python Library PyParsing

The pyparsing module is a Python library for building parsers for text data directly in Python code.

Because an address is a short piece of text with a fairly predictable structure, pyparsing lets us describe that structure as a grammar and split the address into its parts.

We will first look at a real example of parsing a simple address with the PyParsing module.

After that, we will look at a more extensive example to demonstrate how PyParsing can be used to extract address data from a file.

Simple Address Parsing Using PyParsing

Let’s start with a basic example of parsing an address with the Python library PyParsing. We will parse the following address.

567 Main Street

Follow these steps to parse this address:

  • Import pyparsing library

    First, we will import every public name from the pyparsing library by using *.

    from pyparsing import *
    
  • Create a variable

    Now we will create a variable and assign the address we want to parse to it.

    address = "567 Main Street"
    
  • Break down

    Now we will describe the parts of the address. Word(nums) matches a run of digits, and Word(alphas) matches a run of letters.

    addressParser = Word(nums) + Word(alphas) + Word(alphas)
    
  • Parse

    Now we will call parseString on the parser and store the result in a variable.

    addressParts = addressParser.parseString(address)
    
  • Print

    Finally, we will print the variable and see the result.

    print(addressParts)
    

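Let’s write the entire code and run it to see the result. This version uses a slightly longer address, 123 Main Street FL, so the parser needs a fourth Word(alphas) for the state.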

from pyparsing import *

address = "123 Main Street FL"
addressParser = Word(nums) + Word(alphas) + Word(alphas) + Word(alphas)
addressParts = addressParser.parseString(address)
print(addressParts)

Output:

['123', 'Main', 'Street', 'FL']

This code parses the address into four parts, in order: the street number, the street name, the street type, and the state.
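Positions in a list are easy to mix up, so it can help to name each part. The following is a minimal sketch rather than part of the original example: it attaches result names to the same four-part parser (the names number, street, street_type, and state are our own illustrative choices) and shows that parseString raises ParseException when the input does not fit the grammar.

```python
from pyparsing import ParseException, Word, alphas, nums

# Same grammar as above, with an illustrative name attached to each part.
address_parser = (
    Word(nums)("number")
    + Word(alphas)("street")
    + Word(alphas)("street_type")
    + Word(alphas)("state")
)

parts = address_parser.parseString("123 Main Street FL")
print(parts.number, parts.state)  # 123 FL
print(parts.asDict())

# A three-word address has no state, so the four-part grammar rejects it.
try:
    address_parser.parseString("567 Main Street")
    matched = True
except ParseException as err:
    matched = False
    print("No match:", err)
```

Named results keep the calling code readable when the grammar grows, and catching ParseException lets a script skip or log addresses that do not fit.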

Four Useful Functions of PyParsing

Once a parser is defined, we can run it with one of four methods.

  1. parseString - parseString parses text from the beginning of the input. By default, it ignores any unmatched content at the end; pass parseAll=True to require the whole input to match.
  2. scanString - scanString is a generator that searches the input string for matches and yields the matched tokens together with their start and end positions, somewhat like re.finditer().
  3. searchString - searchString is similar to scanString, except that it returns only a list of the matched tokens, without positions, somewhat like re.findall().
  4. transformString - transformString is similar to scanString, but it returns the input text with each match replaced by the tokens the parser produces, so parse actions or Suppress can substitute or remove the matched text.
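The four methods in the list above can be compared side by side on one string. This is a small sketch under our own assumptions: the sample text and the masking parse action are invented for illustration, and the camelCase method names follow the ones used in this article.

```python
from pyparsing import ParseException, Word, nums

number = Word(nums)
text = "Suite 12, 567 Main Street, FL 33316"

# parseString: matches at the start; trailing text is ignored by default.
first = number.parseString("567 Main Street")
print(first.asList())  # ['567']

# With parseAll=True the leftover " Main Street" makes the parse fail.
try:
    number.parseString("567 Main Street", parseAll=True)
    strict_ok = True
except ParseException:
    strict_ok = False

# scanString: yields (tokens, start, end) for every match in the text.
scanned = [
    (tokens[0], text[start:end])
    for tokens, start, end in number.scanString(text)
]

# searchString: returns only the tokens of every match.
found = number.searchString(text).asList()
print(found)  # [['12'], ['567'], ['33316']]

# transformString: returns the text with each match replaced by
# whatever the parse action returns (here, one "#" per digit).
mask = number.copy().setParseAction(lambda tokens: "#" * len(tokens[0]))
masked = mask.transformString(text)
print(masked)  # Suite ##, ### Main Street, FL #####
```

The copy() call keeps the parse action from being attached to the original number parser, so the earlier calls are unaffected.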

Parse Address From CSV File Using PyParsing in Python

Address information is frequently stored in CSV files. Because addresses vary a great deal in how they are structured, they can be hard to parse.

The pyparsing module simplifies extracting addresses from such files by describing the expected structure as a grammar. To begin, let’s define a few straightforward rules for how to parse an address correctly.

After that, we will apply these principles to parsing address-containing CSV files.

Assume our address CSV file looks something like this:

city=LAUDERDALE, state=FL, Zipcode: 33316

We will have to parse the string in key=value format. A KEY=VALUE string has three parts: the key, the equals sign, and the value.

The equals sign does not need to appear in the parsed output. Wrapping an expression in Suppress() keeps its token out of the results.

Tokens should preferably have names associated with them, because named tokens are easier to retrieve from the results. A name can be assigned with the setResultsName() method or, as a shorthand, by calling the expression with the name as an argument when the parser is constructed.
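As a quick illustration of these two ideas, the sketch below builds a key=value parser in which the equals sign is suppressed and both tokens are named. The input string "state=FL" and the variable names are our own choices for the example.

```python
from pyparsing import Suppress, Word, alphanums

# Both spellings attach the name "key" to the token.
key_long = Word(alphanums).setResultsName("key")
key_short = Word(alphanums)("key")

# Suppress("=") matches the equals sign but drops it from the results.
pair = key_short + Suppress("=") + Word(alphanums)("value")
result = pair.parseString("state=FL")

print(result.asList())              # ['state', 'FL']
print(result.key, result["value"])  # state FL

# The long spelling behaves the same way.
pair_long = key_long + Suppress("=") + Word(alphanums)("value")
long_result = pair_long.parseString("state=FL")
```

Named tokens can be read either as attributes or with dictionary-style indexing, whichever suits the surrounding code.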

Let’s try the code and see how pyparsing works with CSV files.

We will start with importing the pyparsing library with all its functions and modules.

from pyparsing import *

Secondly, we will create a parser for the key part of the input. We use alphanums because address data can contain both letters and digits.

key = Word(alphanums)("key")

We want to leave the = sign out of the parsed output, so we wrap it in Suppress.

equals = Suppress("=")

Now, we will create a parser for the value part. Again, we use alphanums because address data can contain both letters and digits.

value = Word(alphanums)("value")

Now, we will create another variable that combines the three parsers into a single expression.

keyValueExpression = key + equals + value

Now we will open the CSV file of addresses using a with statement and call read() to load its entire contents into a string.

with open("/content/address.csv") as address_file:
    address_text = address_file.read()

After this, we will use a for loop with the scanString function of pyparsing to find each key=value pair in the text. scanString yields a tuple of the matched tokens and their start and end positions, so the tokens are the first element.

for adrs in keyValueExpression.scanString(address_text):
    result = adrs[0]

And lastly, inside the loop, we will use the print function to see the result.

    print("{0} is {1}".format(result.key, result.value))

That completes the steps. Now let’s put the entire code together, run it, and see what output we get when we provide a CSV file with the address.

# import the pyparsing names used below
from pyparsing import *

# key and value are runs of letters and digits, stored under result names
key = Word(alphanums)("key")
# leave the = sign out of the output
equals = Suppress("=")
value = Word(alphanums)("value")
keyValueExpression = key + equals + value
# use a with statement to open the CSV file and read it into a string
with open("/content/address.csv") as address_file:
    address_text = address_file.read()
# scan the text for every key=value pair
for adrs in keyValueExpression.scanString(address_text):
    # each item is (tokens, start, end); keep the tokens
    result = adrs[0]
    # print the output
    print("{0} is {1}".format(result.key, result.value))

Output:

city is LAUDERDALE
state is FL

The output shows the key=value pairs our file contains. The address.csv file held only one address.

The Zipcode entry does not appear because it is written as Zipcode: 33316, with a colon instead of the equals sign that the parser expects.
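The sample line also contains Zipcode: 33316, which uses a colon rather than an equals sign, so a parser that only accepts = skips it. One possible adjustment, sketched here on an in-memory copy of the sample line instead of the file, is to accept either separator. The names line, separator, and pairs are our own, and this is only one way to handle the mixed format.

```python
from pyparsing import Literal, Suppress, Word, alphanums

# In-memory copy of the sample line, so no file is needed.
line = "city=LAUDERDALE, state=FL, Zipcode: 33316"

key = Word(alphanums)("key")
# Accept "=" or ":" between key and value, and drop it from the output.
separator = Suppress(Literal("=") | Literal(":"))
value = Word(alphanums)("value")
pair = key + separator + value

# scanString yields (tokens, start, end); only the tokens are needed here.
pairs = [
    (tokens.key, tokens.value)
    for tokens, start, end in pair.scanString(line)
]
for name, val in pairs:
    print("{0} is {1}".format(name, val))
```

With this change the loop prints three lines, one each for city, state, and Zipcode.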

PyParsing offers a more readable and structured alternative to regular expressions when parsing text into tokens and retrieving or replacing individual tokens.

For example, nested fields are no problem for PyParsing, but they are difficult to handle with regular expressions. In this respect, PyParsing is closer to classic parser tools such as lex and yacc.

In other words, regular expressions can be used to search for tags and extract data from HTML, but they cannot verify that an HTML file is well formed. A pyparsing grammar can express that kind of structure.
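To make the point about nested fields concrete, here is a short sketch using the nestedExpr helper from pyparsing, which by default matches parenthesised groups at any depth. The input string is an invented illustration, not data from the article.

```python
from pyparsing import nestedExpr

# nestedExpr() uses "(" and ")" as the default opening and closing delimiters.
nested = nestedExpr()

# Each parenthesised group becomes a nested list in the result.
tree = nested.parseString(
    "(unit (number 12) (street (name Main) (type Street)))"
).asList()
print(tree)
# [['unit', ['number', '12'], ['street', ['name', 'Main'], ['type', 'Street']]]]
```

The result mirrors the nesting of the input, which is the kind of structure a single regular expression struggles to capture.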

We hope you find this article helpful in understanding the address parser used in Python.

Author: Abid Ullah
