Relating multiple datasets and clustering: please help me

shahab (Member, Posts: 8, Contributor II)
edited December 2018 in Help

Hi everybody,

I want to read three datasets (CSV files): the first is user data (user ID, etc.), the second is movie data (movie ID, etc.), and the last is rating data (user ID, movie ID, etc.).

After reading these three datasets, I want to use k-means clustering to cluster users based on their movie ratings. Can you help me?
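Outside RapidMiner, the reading step can be sketched in plain Python (standard library only). This is a minimal sketch, not the process discussed in this thread. It assumes semicolon-separated files with header rows named UserID, Gender, Age, MovieID, Title, Genres and Rating, and it uses hypothetical toy rows in place of the real users.csv, movies.csv and ratings.csv. Filling unrated movies with 0.0 is also a simplifying assumption.

```python
import csv
import io

# Hypothetical toy contents; real files would be opened with
# open("users.csv", newline="") etc. instead of io.StringIO.
USERS = "UserID;Gender;Age\n1;M;25\n2;F;17\n"
MOVIES = "MovieID;Title;Genres\n10;Movie A;Comedy\n20;Movie B;Drama\n"
RATINGS = "UserID;MovieID;Rating\n1;10;5\n1;20;3\n2;10;4\n"


def read_table(text, delimiter=";"):
    """Parse a delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def rating_vectors(ratings, movie_ids):
    """Build one vector per user: the rating of each movie, 0.0 if unrated."""
    index = {movie_id: i for i, movie_id in enumerate(movie_ids)}
    vectors = {}
    for row in ratings:
        vec = vectors.setdefault(row["UserID"], [0.0] * len(movie_ids))
        vec[index[row["MovieID"]]] = float(row["Rating"])
    return vectors


users = read_table(USERS)
movies = read_table(MOVIES)
ratings = read_table(RATINGS)
movie_ids = [m["MovieID"] for m in movies]
vectors = rating_vectors(ratings, movie_ids)
print(vectors)  # {'1': [5.0, 3.0], '2': [4.0, 0.0]}
```

Each user then has one rating vector of the same length, which is the shape of input k-means expects.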

Best Answer

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
    Solution Accepted

    Hi @shahab,

    Can you share your dataset(s) (and possibly your process) so that we can better understand the problem and help you?

    Regards,

    Lionel


Answers

  • shahab (Member, Posts: 8, Contributor II)

    Sure.

    Because one of the files is large, I uploaded it to a file-hosting site:

    http://s8.picofile.com/file/8332549950/ratings.csv.html

    I have three datasets:

    1- Users: some user data and a unique identifier (User ID).

    2- Movies: some attributes, such as genres and name, and a unique identifier (Movie ID).

    3- Ratings: user ID, movie ID and the users' ratings of the movies.

    I want to cluster users based on age, gender and movie ratings with k-means clustering.

    How can I do this? Thanks.

    movies.csv 166K
    users.csv 112.9K
  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)

    Hi @shahab,

    Here is a possible solution to cluster your data:

    I hope it helps,

    Regards,

    Lionel

  • shahab (Member, Posts: 8, Contributor II)

    Hi Mr Lionel,

    I tried your solution, but it didn't solve my problem.

    Let me describe my data and application again.

    I have three datasets:

    Users dataset: contains user attributes such as UserID;Gender;Age;Occupation;Zip-code.

    UserID is a unique ID, and Gender and Age are the attributes used in my sample and model.

    Movies dataset: consists of MovieID;Title;Genres.

    In this dataset, MovieID is unique and the other columns are movie attributes.

    Ratings dataset: has UserID and MovieID as identifiers, plus the user's rating of each movie (and hence of each genre).

    I want to cluster my users based on age, gender and favorite genre.

    Here, gender has two values (man/woman) and age falls into one of four age ranges.

    Please help me.

    Thank you so much.

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)

    Hi @shahab,

    "I used your solution but i couldnt to solve my problem by using this solution"

    Can you be more explicit?

    Personally, I don't know what to add to the process I shared.

    Regards,

    Lionel

  • shahab (Member, Posts: 8, Contributor II)

    Hi Mr Lionel, and thanks a lot.

    I attached the datasets earlier; they are in the messages above.

    Do you need me to upload them again?

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)

    Hi @shahab,

    No need to upload your datasets again: I worked with these datasets to build the process I shared.

    You said: "I used your solution but i couldnt to solve my problem by using this solution". But I don't understand why the process I shared doesn't answer your problem. So could you be more explicit about this "problem"? As I said, I don't know what to add to (or remove from) the process I shared.

    Regards,

    Lionel

  • shahab (Member, Posts: 8, Contributor II)

    Hi Lionel, and thanks a lot.

    Could you describe your process in more detail?

    In this process there are two Read CSV operators, while we have three datasets in total.

    Which two datasets should these operators import?

    Regards,

    Shahab

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)

    Hi @shahab,

    According to your first post, you want to cluster the data based on "age-gender and movie ratings".

    From the users.csv dataset, I use the following attributes: UserID, gender and age.

    From the ratings.csv dataset, I use the following attributes: UserID, Ratings.

    Then, I apply the Join operator to these two datasets, with UserID as the key attribute.

    The resulting dataset contains the following attributes: UserID, gender, age, Ratings.
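    The Join step can be sketched the same way in plain Python. This is a hypothetical inner join on UserID using toy rows rather than the real files; a ratings row whose UserID has no match in the users data is dropped, as in an inner join.

```python
def inner_join(left, right, key):
    """Inner join two lists of dicts on `key`, like a relational join."""
    by_key = {}
    for left_row in left:
        by_key.setdefault(left_row[key], []).append(left_row)
    joined = []
    for right_row in right:
        # Rows of `right` without a matching key produce no output row.
        for left_row in by_key.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# Hypothetical rows, already parsed from users.csv and ratings.csv.
users = [
    {"UserID": "1", "Gender": "M", "Age": "25"},
    {"UserID": "2", "Gender": "F", "Age": "17"},
]
ratings = [
    {"UserID": "1", "MovieID": "10", "Rating": "5"},
    {"UserID": "2", "MovieID": "10", "Rating": "4"},
    {"UserID": "3", "MovieID": "20", "Rating": "2"},  # no matching user: dropped
]

joined = inner_join(users, ratings, "UserID")
for row in joined:
    print(row["UserID"], row["Gender"], row["Age"], row["Rating"])
# 1 M 25 5
# 2 F 17 4
```

    Each joined row is one rating, so a user with many ratings appears in many rows.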

    Then, I select only these three attributes (gender, age, Ratings) and apply a clustering model...

    ...which should, a priori, give you what you want.
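    For illustration, here is a minimal k-means (Lloyd's algorithm) in plain Python; it is not RapidMiner's own implementation. It assumes the joined data has already been reduced to one numeric row per user: gender coded as 0/1 (a hypothetical encoding), age, and average rating. If ratings.csv holds one row per (user, movie) pair and the joined rows are clustered directly, the clusters group individual ratings rather than users. Centroids start at the first k rows only to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means; centroids start at the first k points so the
    result is deterministic (real tools usually use random or k-means++ starts)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-user rows: (gender code, age, average rating),
# with M = 0 and F = 1 as an assumed encoding.
rows = [(0, 25, 5.0), (0, 24, 4.0), (1, 17, 2.0), (1, 16, 1.0)]
labels, centroids = kmeans(rows, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 24.5, 4.5], [1.0, 16.5, 1.5]]
```

    Note that age dominates the Euclidean distance here because of its larger scale, which is why normalizing the attributes first matters.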

    So, in conclusion, your third dataset (Movies) is not needed.

    NB: If you want to cluster data based on "age-gender and favorite genre" (as in another of your posts), you do indeed have to join the Movies dataset to the other datasets, so that you end up with a single dataset containing the following attributes: UserID, age, gender and genre.

    After that, you could use the Aggregate operator to obtain the "favorite genre" per UserID (and thus per age-gender).
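    The join-then-aggregate idea can be sketched in plain Python. Two assumptions are labeled here: the Genres column holds several genres separated by "|", and "favorite genre" means the genre with the highest average rating for that user (ties broken alphabetically). Counting the most-rated genre would be an equally valid definition. The rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical parsed rows; the "|" separator inside Genres is an assumption.
movies = [
    {"MovieID": "10", "Genres": "Comedy|Romance"},
    {"MovieID": "20", "Genres": "Drama"},
    {"MovieID": "30", "Genres": "Comedy"},
]
ratings = [
    {"UserID": "1", "MovieID": "10", "Rating": "5"},
    {"UserID": "1", "MovieID": "20", "Rating": "2"},
    {"UserID": "1", "MovieID": "30", "Rating": "4"},
    {"UserID": "2", "MovieID": "20", "Rating": "5"},
]


def favorite_genres(ratings, movies):
    """Favorite genre per user = genre with the highest mean rating
    (ties broken alphabetically). This definition is an assumption."""
    genres_of = {m["MovieID"]: m["Genres"].split("|") for m in movies}
    totals = defaultdict(lambda: [0.0, 0])  # (user, genre) -> [sum, count]
    for r in ratings:  # join each rating to its movie's genres via MovieID
        for genre in genres_of.get(r["MovieID"], []):
            acc = totals[(r["UserID"], genre)]
            acc[0] += float(r["Rating"])
            acc[1] += 1
    best = {}
    # Sorted order + strict ">" keeps the alphabetically first genre on ties.
    for (user, genre), (total, count) in sorted(totals.items()):
        mean = total / count
        if user not in best or mean > best[user][1]:
            best[user] = (genre, mean)
    return {user: genre for user, (genre, _) in best.items()}


print(favorite_genres(ratings, movies))  # {'1': 'Romance', '2': 'Drama'}
```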

    I hope it helps,

    Regards,

    Lionel

  • shahab (Member, Posts: 8, Contributor II)
    Hi Mr Lionel, and thanks for your explanations. But we have some problems with this part:
    " NB : If you want to cluster data based on "age -gender and favorite genre" (in a other of your post), you have, in deed, to join

    the Movies dataset to other datasets, to have in fine in a unique dataset the following attributes : UserID, age - gender and genre.

    After you can maybe use the Aggregate operator to obtain the "favorite genre" according to UserID (and thus age-gender)."

    How can we combine all of the datasets, select the attributes, and produce output in this format:

    clusters of users (each with an ID, a gender and an age range) based on their age range and their favorite genres. For example, we want to categorize users by age range: 5-10 as children (male and female), 10-18 as teenagers, and so on.
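    A minimal sketch of the age binning in plain Python. Only the two ranges named above are included; the remaining ranges would be appended to the hypothetical AGE_RANGES list. Because 5-10 and 10-18 share the boundary 10, the sketch assumes half-open ranges [low, high), so a 10-year-old counts as a teenager.

```python
# Only the two ranges named in the post; further ranges would be appended here.
# Half-open intervals [low, high) avoid the overlap at age 10.
AGE_RANGES = [(5, 10, "children"), (10, 18, "teenagers")]


def age_group(age, ranges=AGE_RANGES):
    """Return the label of the range containing `age`, or None if none does."""
    for low, high, label in ranges:
        if low <= age < high:
            return label
    return None


def segment(user):
    """Combine gender and age range into one group key, e.g. ('F', 'children')."""
    return (user["Gender"], age_group(int(user["Age"])))


print(age_group(7), age_group(10), age_group(30))  # children teenagers None
print(segment({"Gender": "F", "Age": "12"}))  # ('F', 'teenagers')
```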

    Please help us. Thanks a lot.

    Sincerely,
    Shahab

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn)
    I'm confused by your description of the task you want. Do you actually want to cluster on age and gender together with movie rating, or only on movie rating, within a predefined set of gender and age splits? These are two different tasks.
    If you want to do the latter, then you will need to create your age and gender bins and then run a separate clustering analysis for each of them (which you can do using loops). This will give you clusters based on movie ratings within each group defined by age/gender.
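    The group-then-cluster loop can be sketched in plain Python. This is a hypothetical illustration, not a RapidMiner process: users are grouped by a (gender, age range) key, and a small k-means is run separately inside each group on toy rating vectors. The choice of k = 2 (capped at the group size) is arbitrary.

```python
import math
from collections import defaultdict


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means, centroids initialised to the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:  # converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical users: (UserID, group key, rating vector over the same movies).
users = [
    ("1", ("F", "teenagers"), [5.0, 1.0]),
    ("2", ("F", "teenagers"), [4.0, 1.0]),
    ("3", ("F", "teenagers"), [1.0, 5.0]),
    ("4", ("M", "children"), [3.0, 3.0]),
    ("5", ("M", "children"), [1.0, 4.0]),
]

groups = defaultdict(list)
for user_id, key, vector in users:
    groups[key].append((user_id, vector))

# One separate clustering per gender/age group (the "loop").
assignments = {}
for key, members in groups.items():
    k = min(2, len(members))  # cannot ask for more clusters than users
    labels = kmeans([vec for _, vec in members], k)
    for (user_id, _), label in zip(members, labels):
        assignments[user_id] = (key, label)

for user_id, (key, label) in sorted(assignments.items()):
    print(user_id, key, label)
```

    Cluster numbers are only comparable within a group: cluster 0 of the female teenagers has nothing to do with cluster 0 of the male children.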
    If you want age, gender, and movie rating all to be used in clustering, make sure you have normalized your data first. But this is not going to give you discrete user categorizations (e.g., all females between 20-29) for defining your clusters as you have described above.
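    Normalization can be sketched as a z-score per column (mean 0, standard deviation 1), which is one common choice; RapidMiner's Normalize operator serves the same purpose. The rows below are hypothetical (gender code, age, average rating).

```python
import statistics


def z_normalize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1.
    Constant columns are set to 0.0 to avoid dividing by zero."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical rows: (gender code 0/1, age in years, average rating).
rows = [(0, 20, 4.0), (1, 40, 2.0)]
print(z_normalize(rows))  # [[-1.0, -1.0, 1.0], [1.0, 1.0, -1.0]]
```

    After this step, age no longer outweighs gender and rating in the Euclidean distance that k-means uses.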
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • shahab (Member, Posts: 8, Contributor II)
    Hi, and thanks. Please help me do both of these operations.
