## Wednesday, November 30, 2016

### Association Rule Mining: Example & R Code

Let's work through the following association rule mining problem: given the transactions below, find all frequent itemsets using the Apriori algorithm.

Minimum support = 30%

Solution: Association Rule Mining

| Transaction ID | Items Bought |
|---|---|
| 1 | {Laptop, Printer, Tablet, Headset} |
| 2 | {Printer, Monitor, Tablet} |
| 3 | {Laptop, Printer, Tablet, Headset} |
| 4 | {Laptop, Monitor, Tablet, Headset} |
| 5 | {Printer, Monitor, Tablet, Headset} |
| 6 | {Printer, Tablet, Headset} |
| 7 | {Monitor, Tablet} |
| 8 | {Laptop, Printer, Monitor} |
| 9 | {Laptop, Tablet, Headset} |
| 10 | {Printer, Tablet} |

a)

Step 1: Calculate the minimum support count.

The minimum support is given as 30% and there are 10 transactions, so the minimum support count is 0.3 × 10 = 3.

C1

| Itemset | Support Count |
|---|---|
| {Laptop} | 5 |
| {Printer} | 7 |
| {Tablet} | 9 |
| {Headset} | 6 |
| {Monitor} | 5 |

L1

| Itemset | Support Count |
|---|---|
| {Laptop} | 5 |
| {Printer} | 7 |
| {Tablet} | 9 |
| {Headset} | 6 |
| {Monitor} | 5 |
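The support counts in C1 can be checked with a few lines of base R. This is a sketch: the `transactions` list is typed in from the table above, and `support_count` is my own helper name, not a library function.

```r
# Transactions from the table above, as a list of character vectors
transactions <- list(
  c("Laptop", "Printer", "Tablet", "Headset"),
  c("Printer", "Monitor", "Tablet"),
  c("Laptop", "Printer", "Tablet", "Headset"),
  c("Laptop", "Monitor", "Tablet", "Headset"),
  c("Printer", "Monitor", "Tablet", "Headset"),
  c("Printer", "Tablet", "Headset"),
  c("Monitor", "Tablet"),
  c("Laptop", "Printer", "Monitor"),
  c("Laptop", "Tablet", "Headset"),
  c("Printer", "Tablet")
)

# Support count of an itemset = number of transactions containing all its items
support_count <- function(itemset, txns) {
  sum(vapply(txns, function(t) all(itemset %in% t), logical(1)))
}

# Single items (C1 / L1)
items <- c("Laptop", "Printer", "Tablet", "Headset", "Monitor")
sapply(items, function(i) support_count(i, transactions))
#  Laptop Printer  Tablet Headset Monitor
#       5       7       9       6       5
```

The same helper works for larger itemsets, e.g. `support_count(c("Printer", "Tablet"), transactions)` gives the count 6 shown in C2 below.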

C2

| Itemset | Support Count |
|---|---|
| {Laptop, Printer} | 3 |
| {Laptop, Tablet} | 4 |
| {Laptop, Headset} | 4 |
| {Laptop, Monitor} | 2 |
| {Printer, Tablet} | 6 |
| {Printer, Headset} | 4 |
| {Printer, Monitor} | 3 |
| {Tablet, Headset} | 6 |
| {Tablet, Monitor} | 4 |
| {Headset, Monitor} | 2 |

L2

| Itemset | Support Count |
|---|---|
| {Laptop, Printer} | 3 |
| {Laptop, Tablet} | 4 |
| {Laptop, Headset} | 4 |
| {Printer, Tablet} | 6 |
| {Printer, Headset} | 4 |
| {Printer, Monitor} | 3 |
| {Tablet, Headset} | 6 |
| {Tablet, Monitor} | 4 |

C3

| Itemset | Support Count |
|---|---|
| {Laptop, Printer, Tablet} | 2 |
| {Laptop, Printer, Headset} | 2 |
| {Laptop, Tablet, Headset} | 4 |
| {Printer, Monitor, Tablet} | 2 |
| {Tablet, Printer, Headset} | 4 |

L3

| Itemset | Support Count |
|---|---|
| {Laptop, Tablet, Headset} | 4 |
| {Tablet, Printer, Headset} | 4 |

C4 will not be generated, because every candidate 4-itemset has at least one 3-item subset that is not in L3 (for example, {Laptop, Printer, Tablet} is not in L3). So we stop here.
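The join-and-prune step that rules out C4 can be sketched in base R. This is a sketch: `gen_candidates` is my own helper name, not part of any library.

```r
# Apriori candidate generation: join L_{k-1} with itself, then prune any
# candidate that has an infrequent (k-1)-subset.
gen_candidates <- function(prev_freq) {
  k <- length(prev_freq[[1]]) + 1
  cands <- list()
  n <- length(prev_freq)
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    u <- sort(union(prev_freq[[i]], prev_freq[[j]]))
    if (length(u) == k) cands <- c(cands, list(u))
  }
  cands <- unique(cands)
  # Prune: every (k-1)-subset of a candidate must appear in prev_freq
  keep <- vapply(cands, function(cand) {
    all(apply(combn(cand, k - 1), 2, function(s) {
      any(vapply(prev_freq, function(f) setequal(f, s), logical(1)))
    }))
  }, logical(1))
  cands[keep]
}

# L3 from the table above
L3 <- list(c("Headset", "Laptop", "Tablet"),
           c("Headset", "Printer", "Tablet"))
gen_candidates(L3)
# → list(): the only join candidate {Headset, Laptop, Printer, Tablet} is
#   pruned because its subset {Headset, Laptop, Printer} is not in L3
```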

Result:

Frequent itemsets:

- 1-itemsets: {Laptop}, {Printer}, {Tablet}, {Headset}, {Monitor}
- 2-itemsets: {Laptop, Printer}, {Laptop, Tablet}, {Laptop, Headset}, {Printer, Tablet}, {Printer, Headset}, {Printer, Monitor}, {Tablet, Headset}, {Tablet, Monitor}
- 3-itemsets: {Laptop, Tablet, Headset}, {Tablet, Printer, Headset}

Efficiency of Apriori vs. the FP-Growth Algorithm:

When the database is scanned to check the candidates in Ck and produce Lk, a large number of transactions are read even if they contain no candidate k-itemset. For any large application, repeatedly reading data from disk is expensive.

For example, with 100 items there are 2^100 - 1 possible itemsets, which is about 1.27×10^30 candidate sub-patterns. That is far too large a set for any computer to store and evaluate.

The FP-Growth algorithm, in contrast, scans the database only twice, which makes it much more efficient and faster than Apriori:

1st scan: find the frequent single items (the frequent 1-itemsets).
2nd scan: construct the FP-tree.

Implementation of the above problem using the R programming language:
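Below is a minimal sketch using the widely used `arules` package, assuming it is installed; the object names (`txn_list`, `txns`, `freq`, `rules`) and the 70% confidence threshold are my own choices, not part of the original post.

```r
# install.packages("arules")  # if not already installed
library(arules)

# Transactions from the table above
txn_list <- list(
  c("Laptop", "Printer", "Tablet", "Headset"),
  c("Printer", "Monitor", "Tablet"),
  c("Laptop", "Printer", "Tablet", "Headset"),
  c("Laptop", "Monitor", "Tablet", "Headset"),
  c("Printer", "Monitor", "Tablet", "Headset"),
  c("Printer", "Tablet", "Headset"),
  c("Monitor", "Tablet"),
  c("Laptop", "Printer", "Monitor"),
  c("Laptop", "Tablet", "Headset"),
  c("Printer", "Tablet")
)
txns <- as(txn_list, "transactions")

# Frequent itemsets at minimum support 30%
freq <- apriori(txns,
                parameter = list(supp = 0.3, target = "frequent itemsets"))
inspect(sort(freq, by = "support"))

# Association rules at the same support (70% confidence is an example choice)
rules <- apriori(txns, parameter = list(supp = 0.3, conf = 0.7))
inspect(sort(rules, by = "lift"))
```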

Output: the frequent itemsets with their supports, which should match the 15 itemsets found by hand above (e.g., {Tablet} with support 0.9).