"How to join TID"

wasylniwasylni MemberPosts:6Contributor II
edited June 2019 inHelp
Hello,

I am trying to find some association rules using RapidMiner 4.4.000
I am connecting to MySQL database and dig data from it by entering below SQL statements:

SELECT
`orders_products`.`orders_id` AS `TID`, `orders_products`.`products_name` AS
`ITEM`
FROM
`orders_products`;


after executing above I am getting data :

TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers
5 Courage Under Fire
6 The Matrix
7 There's Something About Mary
7 Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory
8 SWAT 3: Close Quarters Battle
.
.
.
up to 25000 records (15000 transactions)

Can someone advise how to distinct TID to have all products for given TID in one row (like below)

TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers, Courage Under Fire
6 The Matrix
7 There's Something About Mary, Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory, SWAT 3: Close Quarters Battle

???

Thanks,
wasyl





Tagged:

Answers

  • wasylniwasylni MemberPosts:6Contributor II
    when Nominal2Binominal is applied I am getting
    1.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    2.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    3.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    4.0 "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0“false”“假”“假”“true”“假”“假”"false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0 "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    6.0 "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    7.0 "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    7.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    8.0 "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    8.0 "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    9.0 "false" "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    9.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"


    but still can not Distinct unique TID

  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, MemberPosts:294RM Product Management
    Hi,

    you have to use the [tt]Examples2AttributesPivoting[/tt] operator. Presumably, you also have to generate a column consisting of ones to acutally fill the newly created attributes with some values.

    Kind regards,
    Tobias
  • wasylniwasylni MemberPosts:6Contributor II
    Thanks,

    can you give me some tip on how to get results like these below (pivoting does not work or I am doing something wrong)

    example transaction 5
    5.0 "false" "false" "false""true""false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0 "false" "false" "false" "false""true""false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"

    i would like to get distinct data (in one row) like

    5.0 "false" "false" "false""true" "true""false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
  • steffensteffen MemberPosts:347Maven
    Hello

    I studied your problem and tried various approaches, but I wasnt successful, too. The best result I could achieve with the Example2AttributePivoting was this:



    <不相上下ameter key="filename" value=""/>




    <不相上下ameter key="attribute" value="TID"/>



    <不相上下ameter key="condition_class" value="attribute_value_filter"/>
    <不相上下ameter key="parameter_string" value="TID=%{a}"/>







    <不相上下ameter key="group_attribute" value="TID"/>
    <不相上下ameter key="index_attribute" value="id"/>


    ... giving data in csv format as in your first post.

    The result of this approach breaks the one column = one item - policy, which is not preferred in general.

    So what now: The only thing I can suggest is to apply Nominal2Binominal, Sorting using TID and then your own selfwritten - operator which aggregates the rows using the boolean "OR"-operator for every TID for every column. Maybe there is an approach using only available operators, but I cannot think of any.

    regards,

    Steffen

    PS@RapidMiner-Team: This is another usecase for scripting, isnt it ?;)
  • wasylniwasylni MemberPosts:6Contributor II
    How can I build my own operator,

    Can I apply somecomplex;)SQL/PLSQL statements or something similar to distinct mentioned above rows ?

    regards,
    wasylni
  • keithkeith MemberPosts:157Guru
    This can be done entirely with MySQL's GROUP_CONCAT operatorhttp://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat, and doesn't need RM at all (assuming I've understood the question correctly):

    SELECT
    `orders_products`.`orders_id` AS `TID`,
    GROUP_CONCAT(`orders_products`.`products_name` ORDER BY 1 SEPARATOR ',') as `ITEM`
    FROM
    `orders_products`;
    GROUP BY `orders_products`.`orders_id`

    I use Oracle and MS SQL Server, which don't have GROUP_CONCAT, and so haven't tested this in MySQL, but the link above is from the MySQL documentation, so if it doesn't work, it should be easy to figure out why.
  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, MemberPosts:294RM Product Management
    Hi,

    finally I had another look into your issue. I have to correct my former advise to create a column of ones before applying the pivoting operator. Nevertheless you have to create kind of dummy column that has to have different values for each dataset row. Here is the code:



    <不相上下ameter key="filename" value="tdata.csv"/>




    <不相上下ameter key="name" value="id"/>


    <不相上下ameter key="group_attribute" value="TID"/>
    <不相上下ameter key="index_attribute" value="ITEM"/>




    <不相上下ameter key="condition_class" value="attribute_name_filter"/>
    <参数键=“attribute_name_regex”值="TID"/>
    <不相上下ameter key="invert_selection" value="true"/>

    <不相上下ameter key="attributes" value=".*"/>


    <不相上下ameter key="replace_what" value="?"/>
    <不相上下ameter key="replace_by" value="false"/>
    <不相上下ameter key="add_default_mapping" value="true"/>
    <不相上下ameter key="default_value" value="true"/>



    Hope that solves your problem,
    Tobias
  • wasylniwasylni MemberPosts:6Contributor II
    Hello,
    Thanks for all your replies guys!

    I have tested all proposed solutions to my problem and it looks like the best for further progres will be simplest solution proposed by keith,
    as I am running out of memory on the one proposed by steffen, and there seems to be some problem with Tobias solution(or maybe its me ???).


    What I have done is:
    connected to MySQL database
    I`ve applied sql
    <不相上下ameter key="query" value="SELECT
     `orders_products`.`orders_id` AS `TID`,
     GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR ',') as `ITEM`
    FROM
     `orders_products` GROUP BY `orders_products`.`orders_id`;"/>
    <不相上下ameter key="label_attribute" value="ITEM"/>
    <不相上下ameter key="id_attribute" value="TID"/>
    and I have result as shown below:

    行没有TID项
    1 1.0 18
    2 2.0 14
    3 3.0 18
    4 4.0 26
    5 5.0 4,16
    6 6.0 6
    7 7.0 18,19
    8 8.0 10,21
    9 9.0 7,14
    十10.0 26
    11 11.0 12,17
    12 12.0 12
    13 13.0 1
    14 14.0 2,14,23,25

    .
    .
    .
    up to transactions15000


    can you please advise what do I have to apply next in order to get some rules of above data set?

    Thanks,
    wasylni


  • wasylniwasylni MemberPosts:6Contributor II
    I think I might have found solution:

    I will apply

    1)GROUP_CONCAT `orders_products`.`products_id` as ITEM

    2)split

    3)apply FPGrowth

    4)appl AM

    what do you think ?




    SESION DETAILS HERE



    <不相上下ameter key="query" value="SELECT GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR ',') as `ITEM` FROM `orders_products` GROUP BY `orders_products`.`orders_id`;"/>


    <不相上下ameter key="attributes" value="ITEM"/>
    <不相上下ameter key="apply_to_special_features" value="true"/>
    <不相上下ameter key="split_mode" value="unordered_split"/>


    <不相上下ameter key="find_min_number_of_itemsets" value="false"/>
    <不相上下ameter key="min_support" value="0.0010"/>


    <不相上下ameter key="keep_frequent_item_sets" value="true"/>
    <不相上下ameter key="min_confidence" value="0.5"/>


  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, MemberPosts:294RM Product Management
    Hi,

    if it works (?) this seems to be a solution to your problem!;)The steps after having loaded the data seems to be ok, if the data is loaded as you have posted.

    Kind regards,
    Tobias
Sign InorRegisterto comment.