Basics of FP-Growth

bernardo_pagnon (Member, University Professor, 60 posts)
edited March 2020 in Help
Hello all,

I am struggling quite a bit with the FP-Growth operator. I got all sorts of errors (no binominal attributes even after I manually set them to binominal, outputs that I cannot understand, etc.). I am trying to run the smallest possible example: 2 transactions, 3 products (juice, meat and milk)! My Excel file looks like this:

0 0 1
0 0 1

What am I doing wrong? What are the basic errors one should avoid when using FP-Growth? I read the help page for this operator in RM, and I also found it extremely confusing. Any help is appreciated; I just want to use the operator in the simplest possible way.
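A minimal Python sketch of the input FP-Growth works on, assuming the three columns above are juice, meat and milk in that order and that 1 means "bought": each 0/1 row becomes a basket of the items marked 1, which is what a numerical-to-binominal conversion is meant to achieve before the operator runs. The helper names are hypothetical. Note that in this toy file both transactions contain only milk, so milk (support 1.0) is the only item that can appear in any frequent itemset.

```python
# Hypothetical reconstruction of the two-transaction example from the question.
# Assumption: the columns are juice, meat, milk (in that order) and 1 means "bought".
ITEMS = ["juice", "meat", "milk"]
ROWS = [
    [0, 0, 1],
    [0, 0, 1],
]


def to_baskets(rows, items):
    """Turn each 0/1 row into the set of items marked 1."""
    return [frozenset(item for item, flag in zip(items, row) if flag == 1) for row in rows]


def item_support(baskets, item):
    """Fraction of baskets that contain the given item."""
    return sum(1 for basket in baskets if item in basket) / len(baskets)


baskets = to_baskets(ROWS, ITEMS)
for item in ITEMS:
    print(item, item_support(baskets, item))
# juice 0.0
# meat 0.0
# milk 1.0
```

With data this small, any itemset containing juice or meat has support 0, so an empty or milk-only result is expected rather than a sign that the operator is broken.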

Regards,
Bernardo

Best Answer

  • bernardo_pagnon (Member, University Professor, 60 posts)
    Solution Accepted
    Oh, now I see: this option has two modes, and when "find min number of itemsets" is checked, the operator ignores the min support value.

    Solved!!!
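This also explains the symptom described later in the thread (itemsets with support 0.75 or 0.6 showing up although min support is 0.95): in that mode the threshold is not a hard filter. A minimal Python sketch of the behaviour, assuming the operator simply lowers the support threshold until at least `min_itemsets` itemsets are found. `find_min_number_of_itemsets`, the fixed `step` and the toy baskets are hypothetical, and the itemsets are found by brute-force enumeration, not by an actual FP-tree (the resulting itemsets are the same on data this small).

```python
from itertools import combinations


def support(baskets, itemset):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def frequent_itemsets(baskets, min_support):
    """Exhaustive enumeration (not an FP-tree): all itemsets with support >= min_support."""
    items = sorted(set().union(*baskets))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(baskets, itemset)
            if s >= min_support:
                found[itemset] = s
    return found


def find_min_number_of_itemsets(baskets, min_support, min_itemsets, step=0.05):
    """Hypothetical model of the 'find min number of itemsets' mode:
    keep lowering the threshold until at least min_itemsets itemsets are found.
    The fixed step is an assumption, not RapidMiner's actual schedule."""
    threshold = min_support
    found = frequent_itemsets(baskets, threshold)
    while len(found) < min_itemsets and threshold > 0:
        threshold = max(0.0, round(threshold - step, 10))  # round() avoids float drift
        found = frequent_itemsets(baskets, threshold)
    return threshold, found


# Hypothetical toy data, not taken from the thread.
baskets = [{"juice", "milk"}, {"milk"}, {"meat", "milk"}, {"juice"}]

print(frequent_itemsets(baskets, 0.95))  # {} : nothing reaches 0.95
threshold, found = find_min_number_of_itemsets(baskets, 0.95, min_itemsets=2)
print(threshold)  # 0.5
```

With a hard threshold of 0.95 the result is empty, but in the "find min number of itemsets" mode the sketch walks the threshold down to 0.5 and returns milk (0.75) and juice (0.5). Unchecking that option is what makes min support act as a strict cut-off.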

Answers

  • bernardo_pagnon (Member, University Professor, 60 posts)
    Follow-up: I have been playing with the data set from chapter 8 of the book RapidMiner: Data Mining Use Cases and Business Analytics Applications, which is available at http://rapidminerbook.com/.
    I think there is something weird going on. Following exactly the steps the author suggests, I got the same results as he did: for instance, the frequency of "juices" as a single item was 0.780, while the one for desserts was 0.312. Then I reproduced the same setup, but this time using "Read CSV" and the "Numerical to Binominal" operator. The resulting frequencies were 0.220 for juices and 0.312 for desserts. I checked in Excel with COUNTIF, and the latter results seem to be the correct ones. Strange. It seems that RM is not counting those singletons properly, or that some operator inverts some of the values. I would appreciate it if someone could check that.
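The two juice figures add up to exactly 1.0 (0.780 + 0.220), which is what you would see if one run counted the baskets without juices (the 0s) instead of the baskets with them, while desserts were mapped the same way in both runs. A minimal Python check of that relationship, using a hypothetical 0/1 column rather than the book's data; `share_of_ones_and_zeros` is an illustrative helper:

```python
def share_of_ones_and_zeros(column):
    """Support of 'item present' (value 1) and of 'item absent' (value 0)."""
    n = len(column)
    ones = sum(1 for value in column if value == 1)
    return ones / n, (n - ones) / n


# Hypothetical 0/1 column, NOT the book's data: 2 of 10 baskets contain juices.
juices = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
present, absent = share_of_ones_and_zeros(juices)
print(present, absent)  # 0.2 0.8
```

If the support reported for an item equals one minus its COUNTIF share, the true/false mapping for that attribute is the first thing worth inspecting.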

    Best,
    Bernardo
  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, 363 posts, RM Data Scientist)
    Hi @bernardo_pagnon,

    I tested this on the same market data, downloaded from http://rapidminerbook.com/index.php/chapter-downloads/chapter-8/
    The frequency output for "juices" is 0.219613, which matches your Excel COUNTIF result.


    support = (number of baskets that contain the item or itemset) / (total number of baskets in the database)
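The support formula can be written as a few lines of Python. This is a minimal sketch on hypothetical baskets (not the chapter 8 data); an itemset counts for a basket only when every one of its items is in that basket.

```python
def support(baskets, itemset):
    """support(X) = (# baskets containing every item of X) / (# baskets)."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


# Hypothetical baskets, not the chapter 8 data.
baskets = [
    {"juices", "desserts"},
    {"juices"},
    {"desserts", "milk"},
    {"milk"},
]
print(support(baskets, {"juices"}))              # 0.5
print(support(baskets, {"juices", "desserts"}))  # 0.25
```

For a single item this is exactly the COUNTIF share of 1s in that column divided by the number of rows.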
    Attached is the process for reference.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">


    Cheers,
    YY
  • bernardo_pagnon (Member, University Professor, 60 posts)
    Dear YY,

    thank you so much for your reply, and for taking the time to reproduce the results.
    Take a look at this process. I did the same thing, and the results are pretty weird.

    Regards,
    Bernardo

  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, 363 posts, RM Data Scientist)
    edited March 2020
    That is because your "min support" is set far too high, so no association rules are extracted at that threshold.


    You have opened duplicate threads on the same question. To keep the discussion in one place and make the issue easier to track down, please continue here:
    https://community.www.turtlecreekpls.com/discussion/45849/fp-growth-itemset-one-of-the-items-is-oversupported#latest

  • bernardo_pagnon (Member, University Professor, 60 posts)
    Thank you for your reply, and sorry for opening multiple threads with the same question. I still do not get it: if the threshold is high, the output of FP-Growth should be empty. It often happens that I set 0.95 and the frequent itemsets output shows combinations with support 0.75, 0.6, etc. I don't see the purpose of the min support parameter if it does not help me cut combinations below the 0.95 level.

    Best,
    Bernardo