How can I obtain the accuracy list of my process?
fiddinyusfida
Member Posts: 12 Contributor II
Hi everyone,
I am very new to RapidMiner and am having some difficulty here. I am running a loop process for a model, say 10 iterations, and calculating the accuracy performance. However, the result shows only the averaged accuracy or the final accuracy. I need the list of accuracies (containing all 10 values) so that I can check them further with statistical software such as SPSS.
Is it possible to obtain the accuracy list of my process using RapidMiner?
Below is the averaged accuracy sample. Thanks for your kind response.
Best Answer
varunm1 Moderator, Member Posts: 1,207 Unicorn
Hello @fiddinyusfida
Did you add a Performance operator inside the Loop operator? Here is sample XML code (click Show) for the Titanic dataset. To use it, copy the XML, open a new process in RapidMiner, paste the code into the XML window, and click the green tick mark. You will then see the process in the Process window.
<operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="187">
<operator activated="true" class="performance_classification" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="648" y="136">
Please inform if you need more information.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
In your previous solution, I cannot define the number of iterations. I have attached the loop with the Average function here.
When I calculate the average manually, I get a different value. Why does this process produce a different averaged result?
Thanks for the process; I checked it. My understanding is that the change in accuracy comes from how the data is split. Each time you split, the training and test sets can change, and that changes the accuracy. I fixed it by using the "local random seed" option in the Split Data operator. Can you check the modified process below and see whether it works for you?
<operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="514" y="85">
<operator activated="true" class="performance_classification" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="715" y="85">
<operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="514" y="187">
<operator activated="true" class="performance_classification" compatibility="9.3.001" expanded="true" height="82" name="Performance (2)" width="90" x="648" y="136">
Varun
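To illustrate why a fixed seed makes the split repeatable, here is a minimal Python sketch. It is not RapidMiner code: `split_indices` is a hypothetical helper, and the 70/30 ratio and the seed values are assumptions chosen only for illustration. It mimics, in spirit, what a seeded split does: with the same seed the shuffle is identical, so the test set (and therefore the accuracy) stays the same from run to run.

```python
import random

def split_indices(n_records, train_ratio, seed=None):
    """Shuffle record indices and split them into train and test parts."""
    rng = random.Random(seed)  # seed=None gives a different shuffle on each run
    indices = list(range(n_records))
    rng.shuffle(indices)
    cut = round(n_records * train_ratio)
    return indices[:cut], indices[cut:]

# Same seed: the test set is identical every time.
_, test_a = split_indices(100, 0.7, seed=1992)
_, test_b = split_indices(100, 0.7, seed=1992)
print(test_a == test_b)

# A different seed generally selects different test records,
# which is why the measured accuracy can change between runs.
_, test_c = split_indices(100, 0.7, seed=1)
print(sorted(test_a) == sorted(test_c))
```

Without a seed, every run of the loop sees a different split, so an average computed in one run need not match numbers copied down from another run.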
I am just curious:
Is there any way to make this local random seed increase with each iteration?
Something like this pseudocode
@hughesfleming68 Thanks for the response. What I actually want to do is repeat the process 30 times (based on the Central Limit Theorem), using a different random seed each time.
After I obtain the 30 accuracies (from random seeds 1 to 30), I want to run a statistical hypothesis test to find out whether the difference between my proposed method and another method is significant.
Or do you have any other suggestions?
I took the Central Limit Theorem reference from this link:
(https://www.investopedia.com/terms/c/central_limit_theorem.asp)
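The plan described above (one run per seed from 1 to 30, then a paired comparison) could be sketched in plain Python as follows. Everything here is an assumption for illustration: the toy dataset, the two toy classifiers standing in for the "proposed" and "other" methods, and the 70/30 split. It is not RapidMiner code. The paired t statistic is the standard formula, mean of the differences divided by their standard error.

```python
import math
import random
import statistics

def make_toy_data(n=100, seed=0):
    """Hypothetical stand-in for the 100-record dataset: (feature, label) pairs."""
    rng = random.Random(seed)
    return [(rng.gauss(float(i % 2), 1.0), i % 2) for i in range(n)]

def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    ones = sum(y for _, y in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority

def threshold_classifier(train):
    """Predict 1 when the feature exceeds the midpoint of the two class means."""
    mean1 = statistics.mean(x for x, y in train if y == 1)
    mean0 = statistics.mean(x for x, y in train if y == 0)
    cut = (mean0 + mean1) / 2
    return lambda x: 1 if x > cut else 0

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

def run_once(data, seed, train_ratio=0.7):
    """One iteration: the seed controls the split, both methods see the same split."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    return (accuracy(threshold_classifier(train), test),
            accuracy(majority_classifier(train), test))

def paired_t_statistic(a, b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        return math.nan  # undefined when all differences are identical
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

data = make_toy_data()
results = [run_once(data, seed) for seed in range(1, 31)]  # seeds 1..30
proposed = [a for a, _ in results]
baseline = [b for _, b in results]

print(proposed)  # the full list of 30 accuracies, not just the average
print(paired_t_statistic(proposed, baseline))
```

One caution: the 30 splits are drawn from the same 100 records, so the 30 accuracies are not independent samples, and a t-test on them tends to overstate significance.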
In your case, the data splitting is the weak link: whether you change the split ratio, the sampling type, or the random seed, you could still get wildly different results. It is something I would do as a last resort. It is better to use as much data as you can and then use cross-validation (or sliding window validation in the case of a time series) to get a result you can start to trust. In the end, only testing on out-of-sample data will tell you whether your testing was valid. If your data is very random (sometimes we can't control this part), then even averaging 30 times might not be helpful. It all depends on how stable your data is.
I have only 100 records, and it seems hard to add more data since I obtained it from a public dataset repository.
So, based on your tips, would it be better to use cross-validation (for instance, k=10) and just average the accuracy instead of splitting the data by ratio?
I really appreciate your advice, thank you.
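For reference, the 10-fold idea discussed above can be sketched in plain Python like this. It is only an illustration under stated assumptions: `k_fold_indices`, `cross_validate`, and `majority_label_model` are hypothetical helpers, the dataset is a made-up 100-record list, and the learner is a trivial majority-label model. In cross-validation each record is used for testing exactly once, and the per-fold accuracies are kept as a list before averaging.

```python
import random
import statistics

def k_fold_indices(n_records, k, seed):
    """Shuffle indices once, then deal them into k nearly equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(data, train_fn, k=10, seed=1):
    """Return the list of k per-fold accuracies."""
    scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = train_fn(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return scores

def majority_label_model(train):
    """Toy learner (assumption): always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

# Hypothetical 100-record dataset: 34 records labelled 1, 66 labelled 0.
toy = [(i, 1 if i % 3 == 0 else 0) for i in range(100)]
fold_scores = cross_validate(toy, majority_label_model)

print(fold_scores)                   # 10 accuracies, one per fold
print(statistics.mean(fold_scores))  # the averaged accuracy
```

Because every record lands in exactly one test fold, the averaged accuracy does not depend on which single 70/30 split happened to be drawn, which is the main advantage with only 100 records.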