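FP-growth has many open-source implementations, and pyfpgrowth is one of the better known. The complete demo code is in 11-3.py in this book's GitHub repository. Installing pyfpgrowth is straightforward: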
pip install pyfpgrowth
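pyfpgrowth wraps its implementation in the two functions shown below, where support is the minimum support threshold and minConf is the minimum confidence threshold:
import pyfpgrowth
patterns = pyfpgrowth.find_frequent_patterns(transactions, support)
rules = pyfpgrowth.generate_association_rules(patterns, minConf)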
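To make the two thresholds concrete, here is a minimal sketch that computes a support count and a confidence by hand in plain Python. It does not call pyfpgrowth; the helper names support_count and confidence and the toy baskets are illustrative assumptions, and support is treated as an absolute number of transactions.

```python
def support_count(transactions, itemset):
    """Return how many transactions contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))


def confidence(transactions, antecedent, consequent):
    """Return support(antecedent and consequent) / support(antecedent)."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical toy data, used only to illustrate the two measures.
baskets = [["milk", "bread"], ["milk"], ["milk", "bread", "eggs"], ["bread"]]

# {milk, bread} appears in 2 of the 4 baskets.
print(support_count(baskets, ["milk", "bread"]))

# Of the 3 baskets containing milk, 2 also contain bread.
print(confidence(baskets, ["milk"], ["bread"]))
```

With a minimum support of 2, the itemset {milk, bread} would count as frequent; with a minimum confidence of 0.7, the rule milk -> bread (confidence 2/3) would be rejected.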
Suppose we want to mine frequent itemsets from the following data:
transactions = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3], [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
The thresholds are a minimum support of 2 (an absolute count of transactions) and a minimum confidence of 0.7:
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
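The resulting rules are shown below. Each key is a rule's antecedent, and each value is a (consequent, confidence) pair: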
{(1, 5): ((2,), 1.0), (5,): ((1, 2), 1.0), (2, 5): ((1,), 1.0), (4,): ((2,), 1.0)}
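As a sanity check, the sketch below recomputes each reported confidence directly from the transactions. It is plain Python and does not call pyfpgrowth: the rules dictionary is copied from the output above, and the helper support_count and the dictionary checked are names introduced here for illustration.

```python
transactions = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
                [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]

# The rules dictionary, copied from the output shown above.
rules = {(1, 5): ((2,), 1.0), (5,): ((1, 2), 1.0),
         (2, 5): ((1,), 1.0), (4,): ((2,), 1.0)}


def support_count(itemset):
    """Return how many transactions contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))


checked = {}
for antecedent, (consequent, conf) in rules.items():
    both = support_count(antecedent + consequent)  # antecedent and consequent together
    base = support_count(antecedent)               # antecedent alone
    checked[antecedent] = both / base
    print(f"{antecedent} -> {consequent}: reported {conf}, recomputed {both}/{base}")
```

For example, item 4 appears in two transactions and both of them also contain item 2, so the rule (4,) -> (2,) has confidence 2/2 = 1.0, which clears the 0.7 threshold.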