Market Basket Analysis 101: Anticipating Customer Behavior

In the retail and restaurant businesses, market basket analysis (MBA) is a set of statistical affinity calculations that help managers better understand — and ultimately serve — their customers by highlighting purchasing patterns. In simplest terms, MBA shows what combinations of products most frequently occur together in orders. These relationships can be used to increase profitability through cross-selling, recommendations, promotions, or even the placement of items on a menu or in a store.

The approach is based on the theory that customers who buy a certain item (or group of items) are more likely to buy another specific item (or group of items). For example, at a quick-serve restaurant (QSR), a customer who buys a sandwich and cookies is more likely to buy a drink than one who did not buy a sandwich. This correlation becomes more valuable if it is shown to be stronger than the correlation between the sandwich and the drink without the cookies.
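To make that comparison concrete, the sketch below computes the share of orders that include a drink among orders with a sandwich and cookies, and compares it with orders that have a sandwich but no cookies. The orders and the helper name `purchase_rate` are hypothetical; a real analysis would run over a large transaction history.

```python
# Hypothetical toy orders (one set of items per order).
ORDERS = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich", "drink"},
    {"sandwich"},
    {"sandwich", "chips"},
    {"salad", "drink"},
    {"cookies"},
]


def purchase_rate(orders, required, excluded, target):
    """Share of orders that contain every item in `required` and none in
    `excluded` which also contain `target`. Returns None if nothing matches."""
    matching = [o for o in orders if required <= o and not (excluded & o)]
    if not matching:
        return None
    return sum(target in o for o in matching) / len(matching)


with_cookies = purchase_rate(ORDERS, {"sandwich", "cookies"}, set(), "drink")
without_cookies = purchase_rate(ORDERS, {"sandwich"}, {"cookies"}, "drink")
print(f"drink rate, sandwich + cookies:   {with_cookies:.2f}")
print(f"drink rate, sandwich, no cookies: {without_cookies:.2f}")
```

In this toy data, two of three sandwich-and-cookies orders include a drink, versus one of three sandwich-only orders, which is the kind of gap the paragraph above describes.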

MBA can also be used to recommend a purchase based on the absence of a common pairing. For example, a QSR customer who orders only a small sandwich might be more likely to buy a dessert or a second sandwich than someone who bought a large sandwich. Staff trained to recognize these situations can offer the additional items, possibly at a discount to make the option more attractive.
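One way to act on a missing pairing is to look up the rules whose antecedent is already in the basket and whose consequent is not, then rank the missing items by confidence. This is a minimal sketch that assumes rules have already been mined; the `Rule` type, the `suggest` function, the item names, and the confidence values are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """A mined association rule (antecedent => consequent) and its confidence."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


# Hypothetical rules and confidence values, as if mined from past orders.
RULES = [
    Rule(frozenset({"small sandwich"}), frozenset({"dessert"}), 0.45),
    Rule(frozenset({"small sandwich"}), frozenset({"drink"}), 0.35),
    Rule(frozenset({"sandwich", "cookies"}), frozenset({"drink"}), 0.67),
]


def suggest(basket, rules, min_confidence=0.3) -> List[str]:
    """Items to offer: consequents of rules whose antecedent is fully in the
    basket and whose consequent does not overlap it, ranked by confidence."""
    basket = frozenset(basket)
    scores = {}
    for rule in rules:
        if rule.confidence < min_confidence:
            continue
        if rule.antecedent <= basket and not (rule.consequent & basket):
            for item in rule.consequent:
                # Keep the best confidence seen for each suggested item.
                scores[item] = max(scores.get(item, 0.0), rule.confidence)
    return sorted(scores, key=scores.get, reverse=True)


print(suggest({"small sandwich"}, RULES))
print(suggest({"small sandwich", "dessert"}, RULES))
```

Once a customer adds the dessert, it drops out of the suggestions, so staff (or an app) would only prompt for pairings that are still missing.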

Applied more deeply, MBA allows companies to identify keystone products: those that differentiate them in the market and could hurt business if they became unavailable or more expensive. Gourmet or other specialty items in a grocery store might have limited appeal, but the customers they attract (and their subsequent spending) could justify high-visibility placement. Similarly, customers ordering through the company’s app could be interested in items or combinations that offer extra loyalty points.

Generating reliable insights from MBA typically requires large volumes of transactional data, and large data sets are difficult to process without highly scalable storage and compute resources. Modern cloud-based architectures make analytics more agile, with the capacity to test several theories of customer behavior or to examine the effectiveness of a recent marketing campaign. Smartbridge clients are using Microsoft Azure and Snowflake as a data lake, with BI tools such as MicroStrategy integrated with specialized R statistical packages, to solve these problems.

Market Basket Analysis, Explained

MBA aims to find relationships and establish patterns across purchases. Each relationship is modeled as a conditional rule:

IF {sandwich, cookies} THEN {drink}

In shorthand notation, the rule is written as {sandwich, cookies} → {drink}, which translates to “the items on the right are likely to be ordered with the items on the left.”

A collection of items purchased by a customer is an itemset. The set of items on the left-hand side (sandwich, cookies in the example above) is the antecedent of the rule, while the set on the right (drink) is the consequent. The support of an itemset is the relative frequency with which it appears in transactions; for example, the support of {sandwich, cookies} is the probability that an order contains both a sandwich and cookies. The support of the rule is the support of the antecedent and consequent together, i.e., the share of orders that contain a sandwich, cookies, and a drink. In a QSR, the support of an item or item combination helps to identify keystone products. Hence, if a sandwich and cookies have high support, they can be priced to attract people to the store.
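As a minimal sketch (in plain Python rather than the R tooling mentioned above), support can be computed by counting the transactions that contain every item in an itemset. The toy transactions and the `support` helper are hypothetical.

```python
# Hypothetical toy transactions (one set of items per order).
TRANSACTIONS = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich", "drink"},
    {"salad", "drink"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


print(support({"sandwich", "cookies"}, TRANSACTIONS))           # itemset support: 0.6
print(support({"sandwich", "cookies", "drink"}, TRANSACTIONS))  # rule support: 0.4
```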

The probability that a customer will purchase a drink given that they purchased a sandwich and cookies is the confidence of the rule. Confidence can inform product placement strategy and help increase profitability: placing high-margin items near the high-confidence (driver) items they are associated with can increase the overall margin on purchases.
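Confidence follows from the same counts: the number of orders containing both sides of the rule divided by the number containing the antecedent. The sketch below uses hypothetical toy data and helper names (`count`, `confidence`); note that swapping the two sides of a rule generally changes its confidence.

```python
# Hypothetical toy transactions (one set of items per order).
TRANSACTIONS = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich", "drink"},
    {"salad", "drink"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as count(A with C) / count(A)."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_both = count(set(antecedent) | set(consequent), transactions)
    return n_both / n_antecedent


# 2 of the 3 sandwich-and-cookies orders include a drink.
print(confidence({"sandwich", "cookies"}, {"drink"}, TRANSACTIONS))
# Reversed rule: 2 of the 4 drink orders include a sandwich and cookies.
print(confidence({"drink"}, {"sandwich", "cookies"}, TRANSACTIONS))
```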

The lift of the rule is the support of the left-hand side (sandwich, cookies) co-occurring with the right-hand side (drink), divided by the probability that the two sides would co-occur if they were independent (the product of their individual supports). A lift greater than 1 means the items appear together more often than chance alone would predict.
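Written out, the three measures look like this. Here X is the antecedent, Y the consequent, and T the set of transactions; this notation is introduced for the sketch and does not come from any particular package.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\[4pt]
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\[4pt]
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
  = \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{aligned}
```

For instance, with hypothetical supports supp(X) = 0.2, supp(Y) = 0.5 and supp(X ∪ Y) = 0.15, the confidence is 0.15 / 0.2 = 0.75 and the lift is 0.75 / 0.5 = 1.5, so X and Y co-occur 1.5 times as often as independence would predict.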

Market basket analysts therefore search for rules with lift greater than 1, backed by high confidence and, often, high support.
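That screening step can be sketched as follows. The thresholds (`min_support=0.2`, `min_confidence=0.6`), the candidate rules, and the toy transactions are hypothetical assumptions; in practice they are tuned to the data volume and the business question.

```python
# Hypothetical toy transactions (one set of items per order).
TRANSACTIONS = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "drink"},
    {"salad"},
    {"salad", "cookies"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    conf = both / support(antecedent, transactions)
    lift = conf / support(consequent, transactions)
    return {"support": both, "confidence": conf, "lift": lift}


def keep_rule(metrics, min_support=0.2, min_confidence=0.6):
    """Keep rules with lift above 1 that also clear the (hypothetical) thresholds."""
    return (
        metrics["lift"] > 1.0
        and metrics["confidence"] >= min_confidence
        and metrics["support"] >= min_support
    )


CANDIDATES = [
    ({"sandwich", "cookies"}, {"drink"}),
    ({"salad"}, {"cookies"}),
]

for lhs, rhs in CANDIDATES:
    m = rule_metrics(lhs, rhs, TRANSACTIONS)
    verdict = "keep" if keep_rule(m) else "drop"
    print(sorted(lhs), "=>", sorted(rhs), {k: round(v, 2) for k, v in m.items()}, verdict)
```

The first rule has a confidence of 1.0 and a lift of about 1.67, so it is kept; the second falls below both the confidence threshold and a lift of 1.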

The Apriori algorithm is a commonly applied technique in computational statistics. It identifies the itemsets whose support is at least a predefined minimum value (the frequent itemsets) and then calculates the confidence of all possible rules that can be formed from those itemsets.
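For readers who want to see the mechanics, here is a compact, unoptimized sketch of the Apriori idea: grow frequent itemsets one size at a time, prune candidates that have an infrequent subset, then derive the rules that meet a minimum confidence. It is an illustration under toy assumptions, not the arules implementation; the function names and data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate's occurrences and keep those meeting the threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    size = 2
    while current:
        prev = list(current)
        # Join: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune: drop candidates with any infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so frequent[lhs] exists.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy transactions.
ORDERS = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "drink"},
    {"salad", "drink"},
    {"cookies"},
]

for lhs, rhs, conf in generate_rules(apriori(ORDERS, min_support=0.4), min_confidence=0.9):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

The pruning step is what keeps Apriori tractable: because any superset of an infrequent itemset must also be infrequent, those candidates never need to be counted.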

Market Basket Analysis Example

The Apriori algorithm is implemented in the arules package, which can be installed and run in R. Data is loaded into the engine in the following format:

The first column is the order/transaction number, and the second is the item name or, more often, the item ID. The next steps usually involve aggregating each transaction’s records into a single record containing an array of items, and then converting the dataset into an arules transactions object. The result of that aggregation is shown below:
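A minimal Python equivalent of the aggregation step groups the long-format rows by order number. The rows and item IDs are hypothetical. In arules itself, this step is commonly handled by `read.transactions()` with `format = "single"`, or by coercing a list of item vectors to the `"transactions"` class.

```python
from collections import defaultdict

# Hypothetical long-format rows: (order/transaction number, item ID), one row per line item.
ROWS = [
    (1001, "SANDWICH"),
    (1001, "COOKIES"),
    (1001, "SODA"),
    (1002, "SANDWICH"),
    (1002, "SODA"),
    (1003, "ICE_CREAM"),
    (1003, "SODA"),
    (1003, "SODA"),  # repeated line item; a basket records presence only
]


def to_baskets(rows):
    """Group (order_id, item) rows into one set of items per order."""
    baskets = defaultdict(set)
    for order_id, item in rows:
        baskets[order_id].add(item)
    return dict(baskets)


for order_id, items in sorted(to_baskets(ROWS).items()):
    print(order_id, sorted(items))
```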

Finally, the Apriori logic is applied to the transactions, producing the following result set:

Overall, we see a number of strong rules with soda as the consequent (RHS), which could indicate that soda is a keystone product category. A manager could decide to keep soda prices and margins low to drive sales volume. Bottled water is the exception, presumably because customers treat bottled water and soda as alternatives to each other.

Perhaps a more interesting result is that all of the rules with ice cream on the left-hand side show significant lift with a confidence of 1; that is, every order containing the rule’s antecedent also contained its consequent. (Associations in the other direction are not nearly as strong.) This could lead us to promote ice cream further, with the expectation that customers will purchase other items at the same time.

Other Uses for Market Basket Analysis

Although MBA is best known as a retail technique, it is applicable in many other areas:

More and more organizations are discovering ways to use market basket analysis to uncover associations and hidden relationships. As industry leaders continue to explore the technique’s value, a predictive version of market basket analysis, aimed at identifying sequential purchases, is making inroads across many sectors.

Originally published at on July 16, 2019.




We’re geeks for the enterprise systems and tech that sustains and strengthens business. Simplifying business transformation.
