Minimum support = 30%
Solution: Association Rules Mining
Transaction
ID
|
Items
Bought
|
1
|
{Laptop,
Printer, Tablet, Headset}
|
2
|
{Printer,
Monitor, Tablet}
|
3
|
{Laptop,
Printer, Tablet, Headset}
|
4
|
{Laptop,
Monitor, Tablet, Headset}
|
5
|
{Printer,
Monitor, Tablet, Headset}
|
6
|
{Printer,
Tablet, Headset}
|
7
|
{Monitor,
Tablet}
|
8
|
{Laptop,
Printer, Monitor}
|
9
|
{Laptop,
Tablet, Headset}
|
10
|
{Printer,
Tablet}
|
a)
Step 1: Calculating minimum support count :
It’s given that
minimum support percentage is 30%, so, the minimum
support count is 3
C1
Items
|
Support Count
|
{Laptop}
|
5
|
{Printer}
|
7
|
{Tablet}
|
9
|
{Headset}
|
6
|
{Monitor}
|
5
|
L1
Items
|
Support Count
|
{Laptop}
|
5
|
{Printer}
|
7
|
{Tablet}
|
9
|
{Headset}
|
6
|
{Monitor}
|
5
|
C2
Items
|
Support Count
|
{Laptop, Printer}
|
3
|
{Laptop, Tablet}
|
4
|
{Laptop, Headset}
|
4
|
{Laptop,
Monitor}
|
2
|
{Printer, Tablet}
|
6
|
{Printer, Headset}
|
4
|
{Printer, Monitor}
|
3
|
{Tablet, Headset}
|
6
|
{Tablet, Monitor}
|
4
|
{Headset,
Monitor}
|
2
|
L2
Items
|
Support Count
|
{Laptop, Printer}
|
3
|
{Laptop, Tablet}
|
4
|
{Laptop, Headset}
|
4
|
{Printer, Tablet}
|
6
|
{Printer, Headset}
|
4
|
{Printer, Monitor}
|
3
|
{Tablet, Headset}
|
6
|
{Tablet, Monitor}
|
4
|
C3
Items
|
Support Count
|
{Laptop,
Printer, Tablet}
|
2
|
{Laptop,
Printer, Headset}
|
2
|
{Laptop, Tablet,
Headset}
|
4
|
{Printer,
Monitor, Tablet}
|
2
|
{Tablet, Printer,
Headset}
|
4
|
L3
Items
|
Support Count
|
{Laptop, Tablet,
Headset}
|
4
|
{Tablet, Printer,
Headset}
|
4
|
C4 will not be
generated because one of its subset is not in L3. So, we are stopping here.
Result :
Frequent Item sets
{Laptop}
|
{Printer}
|
{Tablet}
|
{Headset}
|
{Monitor}
|
{Laptop, Printer}
|
{Laptop, Tablet}
|
{Laptop, Headset}
|
{Printer, Tablet}
|
{Printer, Headset}
|
{Printer, Monitor}
|
{Tablet, Headset}
|
{Tablet, Monitor}
|
{Laptop, Tablet,
Headset}
|
{Tablet, Printer,
Headset}
|
Efficiency of Apriori vs FP Growth Algorithm:
When
a database is scanned to check Ck for creating Lk, a large number of
transactions will be scanned even they do not contain any k-item set. For any large application, reading data from
disk is really resource consuming one.
For
example, the number of frequent pattern
generated for 100 item set is 2^100 -1, which is 1.27×10^30 frequent
sub-patterns! It’s a too huge set for any computer to store and compute.
In
FP Growth algorithm, it scans database
only twice. It’s much more efficient and faster than Apriori algorithm.
1st
time : Scan DB once and find single item frequent pattern
2nd
time : Scan DB again and construct a tree.
Implementation of the above problem using R programming language:
Output:
No comments :
Post a Comment