Relating multiple datasets and clustering: please help me

shahab · August 2018

Hi everybody,

I want to read three datasets (CSV files): the first is user data with a user ID and other attributes, the second is movie data with a movie ID and other attributes, and the third is rating data with a user ID, a movie ID and other attributes.

Finally, after reading these three datasets, I want to use k-means clustering to cluster users based on their movie ratings. Can you help me?

lionelderkrikor · August 2018

Hi @shahab,

Can you share your dataset(s) (and possibly your process) so that we can better understand and help you?

Regards,

Lionel

shahab · August 2018

Sure.

Because one of the files is large, I uploaded it to a file-hosting site:

http://s8.picofile.com/file/8332549950/ratings.csv.html

I have three datasets:

1- Users, with some user data and a unique identifier (User ID).

2- Movies, with some attributes such as genre and name, and a unique identifier (Movie ID).

3- Ratings, with the User ID, the Movie ID and the users' ratings of the movies.

I want to cluster users based on age, gender and movie ratings with k-means clustering.

How can I do this? Thanks.

lionelderkrikor · August 2018

Hi @shahab,

Here is a possible solution to cluster your data:





[RapidMiner process XML; only the fragment `<macros/>` survives]

I hope it helps,

Regards,

Lionel

shahab · August 2018

Hi Mr. Lionel,

I tried your solution, but I couldn't solve my problem with it.

Let me describe my data and application again.

I have three datasets:

Users dataset: contains user attributes such as UserID;Gender;Age;Occupation;Zip-code

UserID is a unique ID, and Gender and Age are used for my sample and model.

Movies dataset: consists of MovieID;Title;Genres

In this dataset MovieID is unique and the others are movie attributes.

Ratings dataset: has UserID and MovieID as identifiers, and the ratings users gave to movies of each genre.

I want to cluster my users based on age, gender and favorite genre.

Here, user gender has two values (man, woman) and age falls into one of four age ranges.

Please help me.

Thank you so much.

lionelderkrikor · August 2018

Hi @shahab,

"I used your solution but i couldnt to solve my problem by using this solution"

Can you be more explicit?

Personally, I don't know what to add to the process I shared...

Regards,

Lionel

shahab · August 2018

Hi Mr. Lionel, and thanks a lot.

I attached the datasets that I uploaded previously.

They are in the messages above.

Do you need me to upload them again?

lionelderkrikor · August 2018

Hi @shahab,

No need to upload your datasets again: I worked with these datasets to build the process I shared.

You said: "I used your solution but i couldnt to solve my problem by using this solution". But I don't understand why the process I shared doesn't solve your problem. So could you be more explicit about this "problem"? As I said, I don't know what to add to (or remove from) the process I shared.

Regards,

Lionel

shahab · August 2018

Hi Lionel, and thanks a lot.

Would you describe your process in detail?

In this process we have two Read CSV operators, while we have three datasets in total.

Which two datasets must be imported with these operators?

Regards,

Shahab

lionelderkrikor · August 2018

Hi @shahab,

You want to cluster the data based on "age-gender and movie ratings", according to your first post.

In the users.csv dataset, I have the following attributes: UserID, gender and age.

In the ratings.csv dataset, I have the following attributes: UserID, Ratings.

Then I apply the Join operator between these two datasets, with UserID as the key attribute.

The resulting dataset contains the following attributes: UserID, gender, age, Ratings.

Then I select only these three attributes (gender, age, Ratings) and apply a clustering model...

... which should, a priori, give you what you want.

So, in conclusion, there is no need for your third dataset (the Movies dataset).
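As a rough illustration outside RapidMiner, the Join step described above could be sketched in plain Python as follows. The sample rows, the semicolon separator, and the helper names `read_rows` and `join_on_user` are assumptions made for illustration; they are not taken from the shared process.

```python
import csv
import io

# Tiny made-up samples standing in for users.csv and ratings.csv (assumed layout).
USERS_CSV = """UserID;Gender;Age
1;F;18
2;M;35
3;M;25
"""
RATINGS_CSV = """UserID;MovieID;Rating
1;10;5
1;11;3
2;10;4
3;12;2
"""


def read_rows(text):
    """Parse semicolon-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def join_on_user(users, ratings):
    """Inner join on UserID: attach each user's Gender and Age to every rating row."""
    by_id = {u["UserID"]: u for u in users}
    joined = []
    for r in ratings:
        user = by_id.get(r["UserID"])
        if user is None:
            continue  # a rating without a matching user is dropped (inner join)
        joined.append({
            "UserID": r["UserID"],
            "Gender": user["Gender"],
            "Age": int(user["Age"]),
            "Rating": int(r["Rating"]),
        })
    return joined


joined = join_on_user(read_rows(USERS_CSV), read_rows(RATINGS_CSV))
for row in joined:
    print(row)
```

The joined rows keep one line per rating, so a user who rated many movies appears many times; averaging the ratings per user before clustering is one possible way to get a single row per user.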

NB: If you want to cluster data based on "age -gender and favorite genre" (as in another of your posts), you do indeed have to join the Movies dataset to the other datasets, so that in the end a single dataset contains the following attributes: UserID, age, gender and genre.

Afterwards you can perhaps use the Aggregate operator to obtain the "favorite genre" for each UserID (and thus for each age and gender).
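One possible reading of "favorite genre" is the genre with the highest mean rating per user. The plain-Python sketch below aggregates on that assumption; the sample data, the function name `favorite_genres`, and the alphabetical tie-break are all hypothetical choices, not part of the shared process.

```python
from collections import defaultdict

# Made-up data: (UserID, MovieID, Rating) rows and a MovieID -> genres lookup.
ratings = [(1, 10, 5), (1, 11, 3), (1, 12, 4), (2, 10, 2), (2, 11, 5)]
movie_genres = {10: ["Comedy"], 11: ["Drama"], 12: ["Comedy", "Romance"]}


def favorite_genres(ratings, movie_genres):
    """Return {UserID: genre with the highest mean rating for that user}."""
    # user -> genre -> [sum of ratings, number of ratings]
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for user_id, movie_id, rating in ratings:
        for genre in movie_genres.get(movie_id, []):
            acc = totals[user_id][genre]
            acc[0] += rating
            acc[1] += 1
    favorites = {}
    for user_id, per_genre in totals.items():
        # Highest mean first; ties are broken alphabetically by genre name.
        favorites[user_id] = min(
            per_genre,
            key=lambda g: (-per_genre[g][0] / per_genre[g][1], g),
        )
    return favorites


favorites = favorite_genres(ratings, movie_genres)
print(favorites)
```

Other definitions (for example, the most frequently rated genre) would only change the key used to pick the winner.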

I hope it helps,

Regards,

Lionel

shahab · January 2019

Hi Mr. Lionel, and thanks for your explanation. But we have some problems with this section:
" NB : If you want to cluster data based on "age -gender and favorite genre" (in a other of your post), you have, in deed, to join

the Movies dataset to other datasets, to have in fine in a unique dataset the following attributes : UserID, age - gender and genre.

After you can maybe use the Aggregate operator to obtain the "favorite genre" according to UserID (and thus age-gender)."

How can we combine all the datasets and produce output from our inputs in this format:

clusters of users, who have an ID, a gender and an age range (for example, we want to categorize users by age range, such as 5-10 as children (male and female), 10-18 as teenagers, and so on), together with their favorite genres.

Please help us. Thanks a lot.

Sincerely, Shahab

Telcontar120 · January 2019

I'm confused by your description of the task you want. Do you actually want to cluster based on age and gender along with movie rating, or do you actually want to cluster only based on movie rating, based on a predefined set of gender and age splits? Because these are two different tasks.
If you want to do the latter, then you will need to create your age and gender bins and then run a separate clustering analysis for each of them (which you can do using loops). This will give you clusters based on movie ratings within each group defined by age/gender.
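To make the binning step concrete, here is a small plain-Python sketch that assigns users to age/gender groups; a separate clustering run would then be applied to each group inside the loop. The bin edges, field names, and sample users are hypothetical.

```python
from collections import defaultdict

# Hypothetical age ranges: (lower bound inclusive, upper bound exclusive, label).
AGE_BINS = [(0, 18, "under 18"), (18, 35, "18-34"), (35, 50, "35-49"), (50, 200, "50+")]


def age_bin(age):
    """Return the label of the age range that contains `age`."""
    for low, high, label in AGE_BINS:
        if low <= age < high:
            return label
    raise ValueError(f"age out of range: {age}")


def split_by_group(users):
    """Group users by (gender, age bin)."""
    groups = defaultdict(list)
    for user in users:
        groups[(user["gender"], age_bin(user["age"]))].append(user)
    return dict(groups)


# Made-up users with a per-user mean rating.
users = [
    {"id": 1, "gender": "F", "age": 16, "mean_rating": 4.1},
    {"id": 2, "gender": "F", "age": 17, "mean_rating": 2.0},
    {"id": 3, "gender": "M", "age": 40, "mean_rating": 3.5},
    {"id": 4, "gender": "M", "age": 52, "mean_rating": 4.9},
]

groups = split_by_group(users)
for key, members in groups.items():
    # A clustering step on the ratings would run here, once per group.
    print(key, [u["id"] for u in members])
```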
If you want age, gender, and movie rating all to be used in clustering, make sure you have normalized your data first. But this is not going to give you discrete user categorizations (e.g., all females between 20-29) for defining your clusters as you have described above.
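For the second option, the sketch below shows why normalization matters and how a basic k-means could use all three attributes. It is a plain-Python version of Lloyd's algorithm, not RapidMiner's implementation; the 0/1 gender encoding, the min-max scaling, and the sample rows are assumptions.

```python
import random


def min_max_scale(rows):
    """Rescale every column to [0, 1] so that no attribute dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up rows: [age, gender encoded as 0/1, mean rating].
users = [
    [18, 0, 4.5], [20, 0, 4.8], [22, 0, 4.2],
    [50, 1, 2.0], [55, 1, 1.5], [60, 1, 2.2],
]

labels = kmeans(min_max_scale(users), k=2)
print(labels)
```

Without the scaling step, age (tens of years) would outweigh the rating (a few points) in the distance, so the clusters would mostly reflect age alone.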

shahab · May 2019

Hi, and thanks. Please help me do both of these operations.


Altair RapidMiner Community
