# Python Apriori Algorithm

This tutorial will discuss the implementation of the apriori algorithm in Python.

## Explanation of the Apriori Algorithm

The Apriori algorithm is widely used for market basket analysis, i.e., to analyze which items are sold together with which other items. This is useful for shop owners who want to increase their sales by placing items that are sold together close to each other or by offering discounts on them.

The algorithm is based on the Apriori property: if an itemset is frequent, all of its non-empty subsets must also be frequent. Let’s look at a small example to illustrate this.

Let’s say that in our store, milk, butter, and bread are frequently sold together. This implies that the pairs milk and butter, milk and bread, and butter and bread are also frequently sold together.

It follows that the frequency of an itemset can never exceed the frequency of any of its non-empty subsets. We can illustrate this by extending our previous example.

In our store, milk, butter, and bread are sold together 3 times. This implies that each of the pairs milk and butter, milk and bread, and butter and bread is sold together at least 3 times.
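The following sketch checks this property on a handful of hypothetical transactions: the triple milk, butter, bread appears in 3 of them, and every non-empty subset of it appears in at least as many. The `count` helper and the data are illustrative only.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the tutorial's dataset).
transactions = [
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


full = ("bread", "butter", "milk")
full_count = count(full, transactions)
print(full, full_count)  # ('bread', 'butter', 'milk') 3

# Every non-empty proper subset occurs in at least as many transactions.
for size in (1, 2):
    for subset in combinations(full, size):
        print(subset, count(subset, transactions))
        assert count(subset, transactions) >= full_count
```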

## Apriori Algorithm in Python

Before implementing this algorithm, we need to understand how the apriori algorithm works.

At the start of the algorithm, we specify a minimum support threshold. The support of an itemset is the fraction of all transactions that contain it, i.e., the estimated probability that a transaction contains that itemset:

$$ \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}} $$
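As a minimal sketch, the support formula translates directly into Python. The `support` helper and the toy transactions below are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical toy transactions, for illustration only.
transactions = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["bread", "butter", "milk"],
    ["eggs"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


print(support({"milk"}, transactions))           # 3 of 4 transactions -> 0.75
print(support({"milk", "bread"}, transactions))  # 2 of 4 transactions -> 0.5
```

For example, with the 7501 transactions used later in this tutorial, a `min_support` of 0.003 keeps only itemsets that appear in at least 23 transactions (0.003 × 7501 ≈ 22.5).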

Apart from support, there are other measures such as confidence and lift. We won’t cover them in detail in this tutorial, although we will set minimum thresholds for them when calling `apyori`.

The steps we need to follow to implement the apriori algorithm are listed below.

1. The algorithm starts with `1-itemsets`. Here, 1 is the number of items in the itemset.
2. It removes every itemset that does not meet the minimum support requirement.
3. It then increases the number of items (`k`) by one, builds candidate `k`-itemsets from the frequent itemsets of the previous pass, and repeats step 2. This continues until the specified maximum `k` is reached or no itemset meets the minimum support requirement.
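To make these steps concrete, here is a minimal from-scratch sketch of the level-wise search. The function name `apriori_frequent_itemsets` and the toy transactions are hypothetical; this is not how the `apyori` module is implemented internally, and counting support by rescanning every transaction is only suitable for small examples.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support, max_k=None):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Educational sketch of the level-wise Apriori search; support is computed
    by a full scan of the transactions for every candidate, so it is slow.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: the first candidates are all 1-itemsets.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates and (max_k is None or k <= max_k):
        # Step 2: keep only the candidates that meet the minimum support.
        level = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)

        # Step 3: join frequent k-itemsets into (k + 1)-item candidates and
        # prune any candidate that has an infrequent k-item subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Hypothetical toy transactions, for illustration only.
transactions = [
    ["milk", "butter", "bread"],
    ["milk", "butter", "bread", "eggs"],
    ["milk", "butter", "bread"],
    ["milk", "bread"],
    ["butter", "eggs"],
]
result = apriori_frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.6` (at least 3 of the 5 transactions), `eggs` is dropped in the first pass, so no candidate containing it is ever generated. The result holds the three remaining single items, all three pairs, and the triple milk, butter, bread.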

## Implement the Apriori Algorithm in Python

To implement the Apriori algorithm, we will use Python’s `apyori` module. It is an external module, so we need to install it separately.

The `pip` command to install the `apyori` module is below.

```shell
pip install apyori
```

We’ll be using the Market Basket Optimization dataset from Kaggle.

```python
# pandas reads the CSV file; apyori provides the apriori() function.
import pandas as pd
from apyori import apriori
```

We have imported all the libraries required for our operations in the code given above. Now, we need to read the dataset using `pandas`.

This has been implemented in the following code snippet.

```python
# The file has no header row, so every line is read as a transaction.
market_data = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
```

Now, let’s check the total number of transactions in our dataset.

```python
len(market_data)
```

Output:

```text
7501
```

The output shows that our dataset contains 7501 transactions. There is one small problem with this data: the transactions are of variable length, so `pandas` fills the missing cells of shorter rows with `NaN`.

This is expected for real-world data, since customers buy different numbers of items.
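A minimal sketch of this padding behavior on a hypothetical three-line CSV is shown below. In this sketch the first line is the longest; `pandas` infers the number of columns from the first line when `header=None`, so a later, longer line would instead cause a parsing error.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row and variable-length lines.
csv_text = "milk,bread,butter\neggs\nmilk,eggs\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)

# Number of padded (missing) cells in each row.
print(df.isna().sum(axis=1).tolist())  # [0, 2, 1]
```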

To run the apriori algorithm, we need to convert the rows of the dataframe into a list of transactions, where each transaction is a list of item names. This has been implemented in the following code snippet.

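```python
# Convert each row into a list of 20 strings, one per column.
# Missing cells (NaN) become the string 'nan'.
transacts = []
for i in range(0, len(market_data)):
    transacts.append([str(market_data.values[i, j]) for j in range(0, 20)])
```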

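In the above code, we initialized the list `transacts` and stored each transaction in it as a list of 20 strings. Because `str()` is applied to every cell, transactions with fewer than 20 items are padded with the string `'nan'`.

`apyori` does not treat `'nan'` specially; it is counted like any other item. In this example, no rule involving `'nan'` passes the thresholds we set below, so it does not show up in the results, but dropping the missing cells before building the transactions is cleaner.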
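If you prefer to drop the missing cells instead of keeping them as `'nan'` strings, a sketch like the following builds the transactions with `Series.dropna()`. The small stand-in dataframe is hypothetical; in the tutorial, `market_data` comes from `pd.read_csv(...)`.

```python
import pandas as pd

# Hypothetical stand-in for market_data: variable-length rows padded with
# missing values, as pd.read_csv(..., header=None) would produce.
market_data = pd.DataFrame([
    ["milk", "bread", "butter"],
    ["eggs", None, None],
    ["milk", "eggs", None],
])

# One list of item names per row, with the missing cells removed.
transacts = [row.dropna().astype(str).tolist() for _, row in market_data.iterrows()]
print(transacts)  # [['milk', 'bread', 'butter'], ['eggs'], ['milk', 'eggs']]
```

Each transaction now has its own length, which `apyori` accepts, since it only needs an iterable of item collections.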

We now generate association rules from our data with the `apriori()` function. This is demonstrated in the following code block.

```python
# Mine association rules between pairs of items.
rules = apriori(transactions=transacts, min_support=0.003, min_confidence=0.2,
                min_lift=3, min_length=2, max_length=2)
```

We passed minimum thresholds for support, confidence, and lift, and set `max_length = 2` so that itemsets contain at most two items, i.e., we want to find pairs of items that were frequently sold together. The `min_length` argument is not used by `apyori`; single items still drop out of the results because a rule with an empty left-hand side has a lift of 1, below our `min_lift` of 3.
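For reference, the usual definitions of the other two measures are confidence(A → B) = support(A ∪ B) / support(A) and lift(A → B) = confidence(A → B) / support(B). The sketch below computes them on hypothetical toy data; the helper names are illustrative and are not part of `apyori`.

```python
# Hypothetical toy transactions, for illustration only.
transactions = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "water"],
    ["pasta", "shrimp"],
    ["water"],
    ["water", "eggs"],
    ["escalope", "water"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence of lhs -> rhs divided by the support of rhs."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


print(round(confidence({"pasta"}, {"escalope"}, transactions), 3))  # 0.667
print(round(lift({"pasta"}, {"escalope"}, transactions), 3))        # 1.333
```

A lift above 1 means the right-hand item is bought more often when the left-hand item is in the basket than it is overall, so a threshold such as `min_lift = 3` keeps only pairs with a strong positive association.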

The association rules found by the apriori algorithm are returned in the `rules` generator object. We now need a mechanism to convert these rules into a `pandas` dataframe.

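The following code snippet shows a function `inspect()` that takes the list of records produced by `apriori()` and turns the first rule of each record into a tuple of left-hand side, support, confidence, lift, and right-hand side. We then load these tuples into a `pandas` dataframe.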

```python
def inspect(output):
    # Each record is (items, support, ordered_statistics); each ordered
    # statistic is (items_base, items_add, confidence, lift).
    # Only the first rule of each record is kept.
    Left_Hand_Side = [tuple(result[2][0][0])[0] for result in output]
    support = [result[1] for result in output]
    confidence = [result[2][0][2] for result in output]
    lift = [result[2][0][3] for result in output]
    Right_Hand_Side = [tuple(result[2][0][1])[0] for result in output]
    return list(zip(Left_Hand_Side, support, confidence, lift, Right_Hand_Side))

# rules is a generator, so materialize it once into a list.
output = list(rules)
output_data = pd.DataFrame(inspect(output), columns=['Left_Hand_Side', 'Support', 'Confidence', 'Lift', 'Right_Hand_Side'])
print(output_data)
```

Output:

```text
         Left_Hand_Side   Support  Confidence      Lift Right_Hand_Side
0           light cream  0.004533    0.290598  4.843951         chicken
1  mushroom cream sauce  0.005733    0.300699  3.790833        escalope
2                 pasta  0.005866    0.372881  4.700812        escalope
3         fromage blanc  0.003333    0.245098  5.164271           honey
4         herb & pepper  0.015998    0.323450  3.291994     ground beef
5          tomato sauce  0.005333    0.377358  3.840659     ground beef
6           light cream  0.003200    0.205128  3.114710       olive oil
7     whole wheat pasta  0.007999    0.271493  4.122410       olive oil
8                 pasta  0.005066    0.322034  4.506672          shrimp
```

We can now sort the rules by lift and display the top 5 with the following code.

```python
# Show the 5 rules with the highest lift.
print(output_data.nlargest(n=5, columns='Lift'))
```

Output:

```text
      Left_Hand_Side   Support  Confidence      Lift Right_Hand_Side
3      fromage blanc  0.003333    0.245098  5.164271           honey
0        light cream  0.004533    0.290598  4.843951         chicken
2              pasta  0.005866    0.372881  4.700812        escalope
8              pasta  0.005066    0.322034  4.506672          shrimp
7  whole wheat pasta  0.007999    0.271493  4.122410       olive oil
```

Apriori is a simple algorithm for market basket analysis. It can provide helpful insights for increasing the sales of items in a market or store.

The main disadvantage of this algorithm is that it needs a lot of memory for large datasets, because the number of candidate itemsets it generates grows rapidly with the number of distinct items.
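To get a feel for this growth, the number of possible k-item combinations of n distinct items is C(n, k). The sketch below assumes a hypothetical catalogue of 100 items. Apriori’s pruning usually generates far fewer candidates than this upper bound, but the count can still explode when many items are frequent.

```python
from math import comb

# Hypothetical number of distinct items in the store (an assumption).
n_items = 100

# Number of possible itemsets of each size k.
counts = {k: comb(n_items, k) for k in range(1, 5)}
for k, c in counts.items():
    print(k, c)
```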

We ran into this limitation ourselves: this tutorial was originally meant to use the UCI Online Retail dataset, but due to memory limitations, we switched to the Market Basket Optimization dataset.

Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with the Python programming language. He loves solving complex problems and sharing his results on the internet.