Pareto Principle-80/20 Rule

Pareto distribution with an example

İkbal Arslan
3 min read · Mar 11, 2021
Photo by Austin Distel on Unsplash

The 80/20 rule, also known as the Pareto principle, states that for many events, roughly 80% of the effects come from 20% of the causes.

Vilfredo Pareto was a late nineteenth-century economist/sociologist who first noted and reported his observation that about 80 percent of wealth was concentrated in about 20 percent of a population. This is the basis for what we now call the Pareto Principle [1].

The Pareto principle has been applied in a variety of fields, from economics and business to biology and criminology, in an attempt not only to explain observations in the field but also to fine-tune the practices in that field toward improved efficacy. Business is the field where this rule is observed most often. Business managers, for example, find that [1,2]:

  • 80% of their profits come from 20% of their customers,
  • 80% of their complaints come from 20% of their customers,
  • 80% of their profits come from 20% of the time they spend,
  • 80% of their sales come from 20% of their products, and
  • 80% of their sales are made by 20% of their sales staff.

Let’s work through an example of the Pareto principle together. We’ll find the products that make up about 80 percent of a company’s revenue.

As a first step, the required libraries are imported and the dataset is read. Using df.head(), we examine the first 5 rows of the dataset.

df.head()
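This first step might look roughly like the sketch below. The real file is not shown here, so a small made-up CSV sample stands in for it, and column names such as Invoice, Price, Description and Customer ID are assumptions (only StockCode and Quantity are named in this post).

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real dataset file.
# The rows and most column names are assumptions for illustration only.
csv_text = """Invoice,StockCode,Description,Quantity,Price,Customer ID
489434,85048,ITEM A,12,2.50,13085
489434,79323P,ITEM B,4,1.25,13085
C489449,22087,ITEM C,-12,3.00,16321
489435,22350,ITEM D,6,2.00,
"""

# With a real file this would be pd.read_csv("path/to/file.csv") or pd.read_excel(...).
df = pd.read_csv(io.StringIO(csv_text))

# Show the first 5 rows (here the sample only has 4).
print(df.head())
```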

Next, we check whether the dataset contains any missing values:

missing values
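A minimal sketch of this check, assuming a pandas DataFrame named df. The toy data and the Description and Customer ID column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data with some gaps; the values and column names are assumptions.
df = pd.DataFrame({
    "StockCode": ["85048", "22087", "22350"],
    "Description": ["ITEM A", None, "ITEM D"],
    "Customer ID": [13085.0, np.nan, np.nan],
})

# Count the missing values in each column.
missing = df.isnull().sum()
print(missing)
```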

Missing values are observed in the dataset. Since we will not use the variables that contain them, they can be dropped for this example:
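One way to do this is to drop the affected columns, as in the sketch below. Which columns actually contain missing values is not stated in this post, so Description and Customer ID are assumed names. Dropping rows with dropna() would be another reasonable reading.

```python
import numpy as np
import pandas as pd

# Toy data; the columns with gaps are hypothetical.
df = pd.DataFrame({
    "StockCode": ["85048", "22087", "22350"],
    "Quantity": [12, 4, 6],
    "Description": ["ITEM A", None, "ITEM D"],
    "Customer ID": [13085.0, np.nan, np.nan],
})

# Drop the columns we will not use (assumed to be the ones with missing values).
df = df.drop(columns=["Description", "Customer ID"])
print(df.isnull().sum())
```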

Now we will create a new variable, TotalPrice, by multiplying price and quantity. This gives us the total amount spent on each line of an invoice.
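As a small sketch, assuming the price column is called Price (an assumption) and the quantity column is called Quantity:

```python
import pandas as pd

# Toy data; the Price column name is an assumption.
df = pd.DataFrame({
    "StockCode": ["85048", "79323P"],
    "Quantity": [12, 4],
    "Price": [2.50, 1.25],
})

# Total amount for each row: unit price times quantity.
df["TotalPrice"] = df["Price"] * df["Quantity"]
print(df)
```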

An invoice code that starts with the letter ‘C’ indicates a cancellation. That’s why we use ~ to keep only the invoices that do not contain “C”:
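A sketch of this filter, assuming the invoice code lives in a column called Invoice (the column name and the sample values are assumptions):

```python
import pandas as pd

# Toy data; "C489449" plays the role of a cancelled invoice.
df = pd.DataFrame({
    "Invoice": ["489434", "C489449", "489435"],
    "Quantity": [12, -12, 6],
})

# str.contains marks invoices that contain "C"; ~ negates the mask,
# so only the non-cancelled invoices are kept.
df = df[~df["Invoice"].astype(str).str.contains("C", na=False)]
print(df)
```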

To keep the dataset consistent (because of the cancellations), we also keep only the rows with Quantity greater than 0:
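This can be sketched with a simple boolean mask on toy data:

```python
import pandas as pd

# Toy data with one negative and one zero quantity.
df = pd.DataFrame({
    "StockCode": ["85048", "22087", "22350"],
    "Quantity": [12, -12, 0],
})

# Keep only the rows where Quantity is strictly positive.
df = df[df["Quantity"] > 0]
print(df)
```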

We group the dataset by StockCode and sort values by TotalPrice:

df1.head()
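A possible version of this step on toy data is sketched below. The name df1 follows the caption above; the descending sort order is an assumption, chosen so the best-selling products come first.

```python
import pandas as pd

# Toy data: product "A" appears on two invoice lines.
df = pd.DataFrame({
    "StockCode": ["A", "B", "A", "C"],
    "TotalPrice": [10.0, 50.0, 30.0, 5.0],
})

# Sum the revenue per product, then sort from highest to lowest.
df1 = (
    df.groupby("StockCode")
      .agg({"TotalPrice": "sum"})
      .sort_values("TotalPrice", ascending=False)
)
print(df1.head())
```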

Since we are looking for the products that make up about 80 percent of the company’s revenue, we calculate the cumulative sum and the cumulative percentage.
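A sketch of the calculation, assuming the products are already sorted by revenue in descending order. The column names CumSum and CumPct are hypothetical, picked for this illustration.

```python
import pandas as pd

# Toy per-product revenue, already sorted from highest to lowest.
df1 = pd.DataFrame(
    {"TotalPrice": [50.0, 30.0, 15.0, 5.0]},
    index=pd.Index(["B", "A", "D", "C"], name="StockCode"),
)

# Running total of revenue, and its share of the overall revenue in percent.
df1["CumSum"] = df1["TotalPrice"].cumsum()
df1["CumPct"] = 100 * df1["CumSum"] / df1["TotalPrice"].sum()
print(df1)
```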

Let’s check how many unique products we have:

df["StockCode"].nunique() is 3665. By creating a new dataframe, we will find the products that make up 80% of the revenue:
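On toy data, the new dataframe could be built as sketched here. CumPct is a hypothetical column holding the cumulative revenue percentage, and the cut-off of 80 is applied with <=, which is an assumption about how the boundary is handled.

```python
import pandas as pd

# Toy per-product table, sorted by revenue, with a cumulative percentage column.
df1 = pd.DataFrame({
    "StockCode": ["B", "A", "D", "C"],
    "TotalPrice": [50.0, 30.0, 15.0, 5.0],
    "CumPct": [50.0, 80.0, 95.0, 100.0],
})

# Keep the products whose cumulative share stays within the first 80%.
df_urun = df1[df1["CumPct"] <= 80]
print(df_urun["StockCode"].nunique())
```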

df_urun["StockCode"].nunique() is 776. Finally, let’s check the percentage of products that 80% of the sales come from.
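The check itself is a simple ratio. The sketch below plugs in the two counts reported above; the variable names are hypothetical.

```python
# Counts reported in this post.
top_products = 776     # products that make up about 80% of the revenue
all_products = 3665    # all unique products

# Share of products, in percent, rounded to two decimals.
share = round(top_products / all_products * 100, 2)
print(share)
```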

The result is 21.17%, which is close to 20%. This shows us that about 80% of the sales revenue comes from roughly 20% of the products.

Thank you for reading!

The full version of the project and the dataset story can be found on my GitHub account:

REFERENCES:

[1] Sanders, R. (1987). The Pareto principle: its use and abuse. Journal of Services Marketing.

[2] Kiremire, A. R. (2011). The application of the pareto principle in software engineering. Consulted January, 13, 2016.

M. Vahit KESKİN & veribilimiokulu.com
