RM 9.1 feedback: Auto-Model / Calculation of the std-dev of the performance
lionelderkrikor
Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi,
There is an inconsistency between the standard deviation of the performance delivered by the Performance Average (Robust) operator
and the standard deviation of the logged performance (inside the CV) calculated via the Aggregate operator.
We can see that the average of the performance is the same in both cases.
How can this difference in the results be explained?
Regards,
Lionel
NB: The process is in the attached file.
Best Answers
IngoRM
Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi Lionel,
Sorry, you are right, I was a bit on the wrong track here :-) And thanks for your persistence, since it indeed looks like the std dev calculation in the new average operator is broken. After all, it does not seem too robust then :-) We will have a look into this asap.
Best,
Ingo
IngoRM
Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi Lionel,
Ok, we checked a bit deeper here. The difference in the numbers (0.044 vs. 0.05) is a result of the Bessel correction, which is performed as part of the std dev calculation in the Aggregate operator but not when the std dev is calculated on the performance vectors. Here is more on the Bessel correction in case you are interested in the details:
https://en.wikipedia.org/wiki/Bessel's_correction
If you are familiar with Excel, the two functions for this are "STDEV.P" vs. "STDEV.S". It is important to notice that there is no real "right" or "wrong" here, although I typically would apply the correction (or, in Excel-speak, use the function STDEV.S).
The reason why the Aggregate calculation performs the correction (i.e. using N-1 as the denominator instead of N) is that the Aggregate function is working on a data table which is typically a sample of the complete population. Therefore, the correction should be applied following this logic.
You could argue that the same could hold true for the averaging of the performance vectors (which I could easily agree with). However, the original implementation assumed that the values which are averaged are not a sample but the complete known population, which I can also follow to some degree.
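To make the two denominators concrete, here is a minimal sketch in Python. The performance values are made up for illustration and are not the ones from the attached process; the variable names are hypothetical as well.

```python
import math

# Hypothetical accuracies from five validation runs (not the values from this thread).
performances = [0.80, 0.85, 0.90, 0.75, 0.95]

n = len(performances)
mean = sum(performances) / n
squared_deviations = sum((p - mean) ** 2 for p in performances)

# Denominator N: treats the runs as the complete population (Excel's STDEV.P).
std_population = math.sqrt(squared_deviations / n)

# Denominator N-1: Bessel correction, treats the runs as a sample (Excel's STDEV.S).
std_sample = math.sqrt(squared_deviations / (n - 1))

print(f"mean      = {mean:.4f}")
print(f"std (N)   = {std_population:.4f}")
print(f"std (N-1) = {std_sample:.4f}")
```

The mean is unaffected by the choice; only the standard deviation changes, and the corrected value is always the larger of the two.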
By the way, this whole phenomenon can also be observed if you run a cross validation and compare the std dev there to the one calculated by yourself or by Excel. Depending on which function you use, i.e. whether you apply the correction or not, you would get either the result from the cross-validation or the one from the Aggregate operator.
This is a tough one to be honest. I see arguments for both sides and I am somewhat inclined to change the calculation of the cross validation to a version where the Bessel correction is applied. But as I said, I can also see the argument for the other side where it should not.
Finally, I would like to add that the cross validation operator (and the other validation loop operators) have been around for about 15 years now and nobody ever wanted us to apply the Bessel correction so far. This could be a pointer that either (a) nobody cared or (b) some people did care but agreed that the correction may not need to be applied here. In any case the differences are typically relatively small anyway.
So here you have the reason, but where to go from here? What do you think we should do? And others? I would like to understand your views a bit better before I push this into the product management process. After all, the validation operators are pretty central and changing their behavior should not be done lightly...
Best,
Ingo
Answers
Ingo,
Thanks for answering me... but...
Allow me to insist, and to be clearer.
In the process I shared, I compare the results of the same validation method (the multiple hold-out set validation) obtained via 2 different methods of calculation:
- the first is the result provided by the Performance Average (Robust) operator
- the second is the result of the calculation of the average (after removing the 2 outliers via the Filter Example Range operator), via the Aggregate operator, of the logged performance(s) inside the Performance for Hold-Out Sets (Loop) operator.
From my point of view, the results of these 2 different methods of calculation must be strictly equal (this is the case for the average of the performance but not for its standard deviation).
I sincerely hope you can take some time to look at this process and the associated results, because I think
there is something weird in the standard deviation of the performance associated with the multiple hold-out set validation.
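The comparison described above can be mimicked outside RapidMiner. The following is only a rough sketch under stated assumptions: the logged values are invented, and dropping the lowest and the highest value is an assumed stand-in for the outlier removal, not necessarily what Performance Average (Robust) or the Filter Example Range step does in the attached process.

```python
import math
import statistics


def drop_extremes(values):
    """Drop the lowest and the highest value (an assumed outlier rule)."""
    return sorted(values)[1:-1]


# Hypothetical logged performances from seven hold-out runs.
logged = [0.62, 0.81, 0.84, 0.86, 0.88, 0.90, 0.99]
kept = drop_extremes(logged)
n = len(kept)

average = statistics.fmean(kept)                  # same under both conventions
std_without_correction = statistics.pstdev(kept)  # denominator N
std_with_correction = statistics.stdev(kept)      # denominator N-1

# The two standard deviations differ by a factor that depends only on n.
factor = math.sqrt(n / (n - 1))

print(f"runs kept       = {n}")
print(f"average         = {average:.4f}")
print(f"std (N)         = {std_without_correction:.4f}")
print(f"std (N-1)       = {std_with_correction:.4f}")
print(f"std (N) * factor = {std_without_correction * factor:.4f}")
```

If two tools agree on the average but disagree on the standard deviation by exactly this factor, the data are the same and only the denominator differs.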
Regards,
Lionel
NB: I must admit that I did not express myself correctly in my first post. Indeed, I equated the "multiple hold-out set validation"
with the "cross-validation": I think that misled you...
You're welcome, Ingo. Thank you for spending time to take a look at this.
Regards,
Lionel
Interesting topic, it reminds me of the statistics courses at my engineering school...
I agree with you, it's a difficult choice: there are relevant arguments on both sides.
Personally, the first argument that came to my mind is that the continuity of the method of calculation of the performance must be ensured.
Indeed, for example, in order to compare the past performance of a model (created many years ago) to future version(s) of this same model,
the method of calculation of the performance must be the same in order to compare "apples with apples".
Since RapidMiner originally computes the std-dev of the performance associated with cross-validation without the Bessel correction, I would be tempted to keep this method...
I hope I have contributed a little to the debate,
Regards,
Lionel