"PCA operator taking too much time"

waqaskhan343 Member Posts: 11 Contributor I
edited June 2019 in Help

Hello, I am performing sentiment analysis on text data consisting of 1,700 tweets. After preprocessing the data, I want to visualize it using PCA to check the relationship between the different classes. After generating TF-IDF vectors, I am using the PCA operator with a fixed number of components (2), but it is taking a very long time, approximately 2 to 3 hours. I even put a Normalize operator before PCA, but that did not help.

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Did you apply any pruning when you generated your word vector? If not, then you probably have thousands of attributes, many of which have extremely low values, and that is why PCA is taking so long! You should definitely prune your wordlist first, since tokens that have only a handful of occurrences are not going to be meaningful, but they are causing a lot of computational effort on the part of the PCA operator.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    What @Telcontar120 said. Work on your wordlist first before you put it into PCA. Even just 50 attributes could chew up runtime if you don't have a large memory computer.
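
For readers who want to see what the pruning advice in the answers above amounts to, here is a minimal sketch in plain Python. It is not RapidMiner code and not the posters' own process. The function name `prune_vocabulary`, the thresholds `min_docs` and `max_doc_fraction`, and the toy tweets are all assumptions made up for illustration. The idea is to drop tokens that appear in too few (or nearly all) documents before building the TF-IDF matrix, so that PCA sees far fewer attributes.

```python
from collections import Counter


def prune_vocabulary(docs, min_docs=2, max_doc_fraction=0.9):
    """Return the sorted tokens that occur in at least `min_docs` documents
    and in at most `max_doc_fraction` of all documents.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: count each token at most once per document.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    max_docs = max_doc_fraction * len(docs)
    return sorted(t for t, df in doc_freq.items() if min_docs <= df <= max_docs)


# Hypothetical toy tweets, invented for this example only.
tweets = [
    "great phone love the camera",
    "terrible battery hate the phone",
    "love the battery great value",
    "the camera is terrible",
]

full_vocab = sorted({tok for t in tweets for tok in t.lower().split()})
pruned_vocab = prune_vocabulary(tweets, min_docs=2, max_doc_fraction=0.9)

print(len(full_vocab), "attributes before pruning")
print(len(pruned_vocab), "attributes after pruning:", pruned_vocab)
```

On a real corpus of 1,700 tweets the unpruned vocabulary would typically run to thousands of tokens, most of them occurring only once or twice, so a cut of this kind shrinks the matrix handed to PCA considerably. Suitable thresholds depend on the data and should be tuned rather than copied from this sketch.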
