Process that worked under RapidMiner 4.4 now throws a Java OutOfMemoryError

Roberto (Member, Contributor II)
edited November 2018 in Help
Hello,

I'm trying to use a feature selection with embedded validation and JSVMLearner to select the relevant features in a dataset. The dataset is a CSV file with 28 examples. Each example contains 2000 attributes with values between 0 and 1, plus a single label that is either true or false. In the previous version of RapidMiner (4.4) I had no problem doing this; it just took a lot of time. Now, with 4.5, Java throws an out-of-memory error within 45 minutes of starting the run.
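
For scale, here is a back-of-the-envelope estimate of the raw data size. It assumes each value is stored as an 8-byte double and ignores Java object overhead. Under that assumption the table itself needs well under 1 MB. That is consistent with the memory being consumed by the feature selection rather than by holding the data.

```python
# Back-of-the-envelope size of the raw data table.
# Assumption: one 8-byte double per value; JVM object overhead is ignored.
BYTES_PER_DOUBLE = 8


def raw_table_bytes(n_examples: int, n_attributes: int) -> int:
    """Bytes needed to store an n_examples x n_attributes matrix of doubles."""
    return n_examples * n_attributes * BYTES_PER_DOUBLE


size = raw_table_bytes(28, 2000)
print(f"{size} bytes = {size / 2**20:.2f} MiB")
```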

Here's my code:

<operator name="OperatorChain(2)" class="OperatorChain" expanded="yes">

Any help would be appreciated! Thanks!
Roberto

A second, less pertinent question: the wrapper validation takes forever to run. In the past, I ran just a Weighted Feature Selection on the dataset after an SVMWeighting operator, without nesting it like this, and got 100% accuracy within a couple of hours. Can I trust those results, or is the wrapper validation the way to go? Again, thanks so much!

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Unicorn)
    Hi,
    to answer the second question first: this depends on where you inserted the weighting. If it was inside the learning part of the XValidation, then it cannot have used data it is not supposed to know. In that case the XValidation should return a valid performance estimate.

    As for the first question: the FeatureSelection can use a very large amount of memory. It may now use a little more than before because some of the underlying data structures have changed. Perhaps you can increase the maximum heap size?

    Greetings,
    Sebastian
  • Roberto (Member, Contributor II)
    Thanks for the reply, Sebastian,

    As for the status of my problem: the computer died on me this morning, so I'll have to wait until I've built my new system to deal with the memory issue. The new computer will have 48 GB of RAM, so I don't think it will have a problem. In your opinion, will the 48 GB system be enough to handle a dataset with the same number of examples (28) but about 13,500 attributes? Our 12 GB machine maxed out at a 24 x 3000 matrix of real values to train on, and our 8 GB machine maxed out at 24 x 2000.
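
The two data points above allow a rough check. This sketch assumes that memory grows linearly with the number of attributes, which Sebastian is not sure of. It also ignores the change from 24 to 28 examples. If consumption grows faster than linearly, the estimate is too low.

```python
# Linear extrapolation of peak memory from two observed limits.
# Assumptions: memory grows linearly with the number of attributes;
# the difference between 24 and 28 examples is ignored.


def linear_fit(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """Return (slope, intercept) of the line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# Observed: 8 GB maxed out at 2000 attributes, 12 GB at 3000 attributes.
slope, intercept = linear_fit(2000, 8.0, 3000, 12.0)
estimate_gb = slope * 13500 + intercept
print(f"{slope * 1000:.1f} GB per 1000 attributes, about {estimate_gb:.1f} GB for 13,500")
```

With these two points the fitted line passes through the origin, at about 4 GB per 1000 attributes. The linear estimate is therefore roughly 54 GB, before accounting for the four extra examples.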

    As for my second question, this is the process I used for that analysis:

    Now, if I understand you correctly, you're telling me to place the SVM weighting within the XValidation? I'm a little confused about how to structure that. The weight-guided feature selection relies on an initial set of weights provided by the SVMWeighting operator. It is supposed to select the features that improve a model built by the LibSVMLearner. That model's performance at each step is monitored by the XValidation and ClassificationPerformance operators, correct? Then, when the maximal fitness or a certain number of generations without improvement is reached, the selected features are returned as a result along with the performance statistics? So where exactly would I put the SVMWeighting, or is it fine as is?
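
The procedure described above can be sketched generically. This is a simplified, hypothetical stand-in, not RapidMiner's WeightGuidedFeatureSelection operator:

- Features are tried in order of decreasing initial weight.
- A feature is kept only if it improves the score returned by an `evaluate` callback. In the real process that score would be the cross-validated SVM performance.
- The search stops after `patience` consecutive candidates fail to improve.

The names `weight_guided_forward_selection`, `evaluate`, `patience` and `toy_score` are illustrative.

```python
from typing import Callable, Sequence


def weight_guided_forward_selection(
    weights: Sequence[float],
    evaluate: Callable[[list[int]], float],
    patience: int = 2,
) -> tuple[list[int], float]:
    """Greedy forward selection that tries features in order of decreasing weight.

    weights  -- initial importance of each feature (e.g. from an SVM weighting)
    evaluate -- returns a performance score for a list of feature indices
    patience -- stop after this many consecutive candidates fail to improve the score
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    selected: list[int] = []
    best = float("-inf")
    misses = 0
    for feature in order:
        score = evaluate(selected + [feature])
        if score > best:
            # Keep the feature and reset the "no improvement" counter.
            selected.append(feature)
            best = score
            misses = 0
        else:
            misses += 1
            if misses >= patience:
                break
    return selected, best


# Hypothetical toy evaluator: features 0 and 2 help, every other feature costs 0.01.
useful = {0: 0.3, 2: 0.2}


def toy_score(features: list[int]) -> float:
    return 0.5 + sum(useful.get(f, -0.01) for f in features)


print(weight_guided_forward_selection([0.9, 0.1, 0.7, 0.4, 0.2], toy_score))
```

Here the search keeps features 0 and 2, then stops after features 3 and 4 both fail to improve the score.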

    Thanks so much!
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Unicorn)
    Hi Roberto,
    the problem with your approach is that the weighting is calculated on the complete data set. To estimate the real performance, you have to use unseen data. If you compute the weights from all your data, there is no unseen data left.
    So I would suggest wrapping the complete weight-guided feature selection in an XValidation. This will increase the runtime, but it will give you reliable results.
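
Here is a minimal sketch of this recommendation in plain Python rather than RapidMiner. The feature weighting and selection are recomputed inside every training fold, and the held-out fold is used only for scoring. Several pieces are simplifying assumptions, and all function names are hypothetical:

- The weights are absolute differences of class means, a stand-in for SVMWeighting.
- The selection keeps the `top_k` highest-weighted attributes.
- The learner is a nearest-centroid classifier instead of an SVM.

```python
import random
from typing import Sequence


def class_mean_weights(X: Sequence[Sequence[float]], y: Sequence[bool]) -> list[float]:
    """Hypothetical stand-in for SVM weighting: |mean(True) - mean(False)| per attribute."""
    pos = [row for row, label in zip(X, y) if label]
    neg = [row for row, label in zip(X, y) if not label]
    if not pos or not neg:
        raise ValueError("each training fold needs examples of both classes")
    return [abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
            for j in range(len(X[0]))]


def centroid(X, y, label, features):
    """Mean of the selected attributes over the training rows with the given label."""
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [sum(r[j] for r in rows) / len(rows) for j in features]


def sq_dist(x, features, c):
    """Squared Euclidean distance between x (restricted to features) and centroid c."""
    return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))


def cross_validated_accuracy(X, y, k=4, top_k=5):
    """k-fold CV in which weighting and selection are redone on each training fold."""
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Weighting and feature selection see only the training fold.
        w = class_mean_weights(X_tr, y_tr)
        features = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:top_k]
        cents = {label: centroid(X_tr, y_tr, label, features) for label in (True, False)}
        # The held-out fold is used only for scoring.
        for i in test_idx:
            pred = min(cents, key=lambda label: sq_dist(X[i], features, cents[label]))
            correct += pred == y[i]
    return correct / len(X)


# Hypothetical toy data: 28 examples, 20 attributes in [0, 1);
# attribute 3 carries the label, the rest are random noise.
rng = random.Random(0)
y = [i < 14 for i in range(28)]
X = [[rng.random() for _ in range(20)] for _ in y]
for row, label in zip(X, y):
    row[3] = 1.0 if label else 0.0
print(cross_validated_accuracy(X, y, k=4, top_k=1))
```

The cost of this design is that the whole selection runs once per fold, which is the extra runtime mentioned above.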

    On your computer problem: I'm not sure whether 48 GB will suffice, because I don't know whether memory consumption increases linearly. But I could provide you with a plugin for forward and backward feature selection that supports a nearly infinite number of attributes, for less than a tenth of the price of that computer :) Probably for less than the price of 32 GB of that computer's RAM...

    Greetings,
    Sebastian
  • Roberto (Member, Contributor II)
    The problem with using the weight-guided feature selection within an XValidation operator is that it does not deliver a Model as output, so the process fails. Even if I save the resulting model with ModelWriter and then load it with ModelUploader in the operator chain within the XValidation, I still get an error. If I use WrapperXValidation instead, I get an error saying that the attribute weights are not being passed to the weight-guided feature selection operator. Can you suggest a workaround?

    This is the code I used to get the first error:

    <operator name="OperatorChain(2)" class="OperatorChain" expanded="yes">

    And here's the code that doesn't pass the attribute weights to the feature selection:

    <operator name="OperatorChain(2)" class="OperatorChain" expanded="yes">

    As for the plugin, we'll have to see how the new computer performs; it's already ordered and half of the parts are here. If we can't do what we want with it, though, my boss may just take you up on that offer. If I understand how the feature selection operator works, though, memory consumption should be linear.

    Thanks for all your help Sebastian!
  • Roberto (Member, Contributor II)
    Sebastian,

    I talked to my boss about the plugin. Could you send me information on it? How much is a single-seat license? And how much would it cost to host it on a local server for up to 10 users?
    If you could send me that info, that would be great!

    Roberto