I am working with a large dataset of over 15 million rows. I have created the dataset on the server and am trying to use Auto Model. The processes are getting killed because only 4096M is assigned to each job process. How do I change this on the Linux Docker image?
I change this setting on Windows; you can check whether it works the same way on Linux. I don't use Docker, but based on some reading, the Docker configuration lives in volume/configuration and the parameter is JOBAGENT_CONTAINER_MEMORYLIMIT. You can change the memory limit there. I also think the amount of memory you can allocate depends on your license; if it is unlimited, there is no problem. You can see your limit on the server page under Administration --> Manage License.
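If the setting is stored as a plain KEY=VALUE line, editing it could be scripted roughly like this. This is only a sketch: the helper `set_env_value`, the sample file content, the `OTHER_SETTING` line and the value format (a bare number, with no unit) are all assumptions for illustration, so check your own configuration file for the actual syntax before changing anything. The job agent container most likely has to be restarted for a new limit to take effect.

```python
def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with KEY=value set.

    Replaces an existing uncommented KEY=... line, or appends one
    if the key is not present yet.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comment lines untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical file content; the real file may look different.
example = "OTHER_SETTING=unchanged\nJOBAGENT_CONTAINER_MEMORYLIMIT=4096\n"

# 16384 is an arbitrary illustrative value, not a recommendation.
updated = set_env_value(example, "JOBAGENT_CONTAINER_MEMORYLIMIT", "16384")
print(updated)
```

To apply it to a real file, you would read the file's text, pass it through `set_env_value`, and write the result back, ideally after making a backup copy.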
One more thing about Auto Model: it downsamples the data, and it uses absolute downsampling; you can check that in the model. If you want to run all 15 million rows, you need to remove this sample operator inside the process. I am not sure how feasible it is to run such a large dataset in AM.
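To illustrate what "absolute" downsampling means here (a fixed number of rows rather than a fraction of the data), here is a minimal Python sketch. It is not Auto Model's actual implementation; the function name `absolute_sample`, the `max_rows` parameter and the numbers used are made up for illustration.

```python
import random


def absolute_sample(rows, max_rows, seed=42):
    """Return at most max_rows rows, drawn uniformly at random.

    The target is a fixed row count, so the result has the same size
    for 1 million or 15 million input rows.
    """
    rows = list(rows)
    if len(rows) <= max_rows:
        return rows  # already small enough, nothing to sample
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(rows, max_rows)


data = range(1_000_000)  # stand-in for a large example set
sample = absolute_sample(data, 1000)
print(len(sample))  # 1000
```

Removing the sample operator corresponds to skipping this step entirely, which means every later step has to handle the full dataset in memory.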
Be Safe. Follow precautions and Maintain Social Distancing
IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor; Posts: 1,751; RM Founder)
Just a quick correction ;-)
...Auto model will downsample data to 250,000 samples...
AM actually does not always sample down to that number. The sampling depends on the learning algorithm and can be even less (for slower algorithms like SVM) or much more (for faster ones like Naive Bayes). Just wanted to throw this out here :-)
Those sample rates are chosen to keep AM within reasonable time limits. As @Varun has mentioned, you can change them manually in the process, but for most data sets that does not make a lot of sense, so I would be careful that you do not blow up the whole process...
Answers
@Marco_Boeck or @Edin_Klapic might have something here about memory change.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing