"Issues with processing data and clustering operators"

naman_sharma Member Posts: 4 Contributor I
edited June 2019 in Help

Hi,

I am working on a RapidMiner project with the data set from the Kaggle Walmart trip type classification competition. Instead of predicting the trip type, I want to use a clustering algorithm to find the days with the maximum and minimum sales and the departments that make the maximum and minimum sales.

I am new to data analytics and am trying to understand the operators, but I am unable to get any further with the process. Please have a look at the process flow in the attachment and let me know where I am going wrong.

Dataset: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification/data

Regards,

Naman

process.png 45.9K

Answers

  • dang Member Posts: 11 Contributor II

    @naman_sharma where is your clustering operator? Could you please share the process XML code or the .rmp file?

  • naman_sharma Member Posts: 4 Contributor I

    @dang Please see the attached .rmp file. I tried using k-means for clustering, but it's taking too long to complete. In 2 hours it got through just 14% of the process.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @naman_sharma the process you shared has no clustering operator attached. Please attach the dataset you used; I don't want to answer a survey from Walmart to unlock the dataset.

  • naman_sharma Member Posts: 4 Contributor I

    @Thomas_Ott I have attached the dataset.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @naman_sharma the process runs in about 40 seconds on my machine, so it might be a problem with memory or the type of license you have.

    I'm not familiar with this dataset but I noticed that the "visit numbers" attribute is on a huge scale (from 5 to 20,000 or so). That'll skew the results a bit and you might want to think about normalizing that if it makes sense.
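
To illustrate the scaling point, here is a minimal Python sketch (not part of the shared RapidMiner process) of how one large-scale attribute can dominate the Euclidean distance that k-means relies on, and how a z-score transformation evens things out. The three rows are invented for illustration, and the helper names `z_score` and `euclidean` are hypothetical. Inside RapidMiner the equivalent step would most likely be a Normalize operator placed before the clustering operator.

```python
import math


def z_score(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Invented example rows: (VisitNumber, ScanCount). Not taken from the dataset.
rows = [(5, 1), (20000, 1), (10, 4)]

# Raw distances: the visit number swamps the scan count.
raw_01 = euclidean(rows[0], rows[1])  # rows differ only in visit number
raw_02 = euclidean(rows[0], rows[2])  # rows differ mainly in scan count

# Normalize each column separately, then rebuild the rows.
columns = [list(col) for col in zip(*rows)]
scaled_rows = list(zip(*(z_score(col) for col in columns)))

scaled_01 = euclidean(scaled_rows[0], scaled_rows[1])
scaled_02 = euclidean(scaled_rows[0], scaled_rows[2])

print(f"raw:    {raw_01:.2f} vs {raw_02:.2f}")
print(f"scaled: {scaled_01:.2f} vs {scaled_02:.2f}")
```

Before scaling, the first distance is thousands of times larger than the second; after scaling, both attributes contribute on a comparable footing. Whether an identifier-like attribute such as the visit number belongs in the clustering at all is a separate question, which is the "if it makes sense" caveat above.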

    <parameter key="6" value="FinelineNumber.true.integer.attribute"/>