First steps. Need help in clustering

Antonios1 Member Posts: 9 Learner I
edited October 2020 in Help

Hi,

I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs with values ranging from 1 to 100. Column B contains a hypothetical expense amount between 0 and 50000 for each ID, except for ID 100, whose expense range in column B is narrower: between 48000 and 50000.

Suppose I didn’t know how the dataset was built and wanted to see whether one or more IDs show an anomalous concentration (that is, I would like the analysis to spot ID 100, whose expenses are concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means), but without success; I probably do not know the right steps to follow. Could somebody help me?
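For readers who want to reproduce the setup outside Excel, here is a minimal Python sketch of the dataset described above. It is only an approximation of the spreadsheet: it assumes the IDs were also drawn at random (as with RANDBETWEEN), and the function names `make_dataset` and `spread_by_id` and the fixed seed are hypothetical, chosen for illustration. The per-ID standard deviation at the end is just a sanity check that the data has the described property; it is not k-means and not a method suggested in this thread.

```python
import random
import statistics
from collections import defaultdict


def make_dataset(n_rows=18000, seed=0):
    """Build (id, expense) rows mimicking the spreadsheet described above."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    rows = []
    for _ in range(n_rows):
        id_ = rng.randint(1, 100)  # inclusive bounds, like RANDBETWEEN(1, 100)
        if id_ == 100:
            expense = rng.randint(48000, 50000)  # the narrow range for ID 100
        else:
            expense = rng.randint(0, 50000)
        rows.append((id_, expense))
    return rows


def spread_by_id(rows):
    """Return {id: population standard deviation of that id's expenses}."""
    groups = defaultdict(list)
    for id_, expense in rows:
        groups[id_].append(expense)
    return {id_: statistics.pstdev(values) for id_, values in groups.items()}


rows = make_dataset()
spread = spread_by_id(rows)
narrowest = min(spread, key=spread.get)  # ID whose expenses vary the least
print(narrowest, round(spread[narrowest]))
```

Because ID 100 can only take values in a 2000-wide band while every other ID spans the full 0 to 50000 range, its spread is far smaller than that of any other ID.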

Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    Try some of the operators available in the free Anomaly Detection extension. LOF (Local Outlier Factor) might be particularly useful in this type of context.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
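To give a feel for what an LOF score measures, here is a minimal brute-force sketch in plain Python. It is not the RapidMiner operator's implementation: the function name `lof_scores`, the toy points, and the choice `k=3` are assumptions made for illustration, ties at the k-th neighbour are simply cut off at k points, and duplicate points are not handled. A score near 1 means a point is about as densely surrounded as its neighbours; a score well above 1 marks a point in a much sparser region.

```python
import math


def lof_scores(points, k=3):
    """Return one Local Outlier Factor per point (brute force, O(n^2))."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []  # indices of the k nearest neighbours of each point
    k_dist = []     # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance of j, distance between i and j).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy data: five points close together and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

In this toy example the isolated point gets by far the highest score, while the five clustered points stay close to 1. The brute-force version computes every pairwise distance, so its cost grows quadratically with the number of rows.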

Answers

  • Antonios1 Member Posts: 9 Learner I
    Thanks for helping, Brian. I am really new to RapidMiner and AI, so forgive me if I do not use the right terms. Unfortunately, I have not been able to test the LOF operator successfully yet. I downloaded the Anomaly Detection extension and used the LOF operator: I connected my file's out port to the exa port of the LOF operator, and connected the operator's exa output port to the res port. The process was taking a long time to produce an output, so I stopped it after a few hours. I ran it again this morning before going to work and, when I got back at one, I found that the software had crashed. I have launched it again to see how it proceeds. It has now been running for about an hour and is still going. The PC is an i7 with 16 GB of RAM.


  • Antonios1 Member Posts: 9 Learner I
    Thank you, Brian. It works. I was able to run the operator on a different PC and it worked correctly. The result also seems quite straightforward to interpret.