What is the logic regarding the impact of weights on prediction?
MiguelHH98
Hi!
I am working on a project in which I need to predict the value of a variable based on other variables from a database, which I am using as the input data set. For this, I'm using Auto Model in RapidMiner. The model ran without problems, and the best algorithm was Gradient Boosted Trees, so I focused on that one. In the "Weights" tab, certain variables (let's call them "a", "b" and "c") came out as the most influential. Then I went to the "Simulator" tab to see how these variables affect the value of my target variable (say "y"). However, the value of "y" stays the same. I then tried modifying the less influential variables to see whether any of them had an impact on "y". While doing this, I came across two variables ("m" and "n") that did change the value of "y", which seemed strange to me because neither of them was as influential as "a", "b" or "c". Another thing I found curious was that, in the "Production Model" tab, most of the trees have these two variables "m" and "n" at the top (root) split, but I don't know what to conclude from that. Could someone explain why this happens, what the real logic regarding the impact of weights on prediction is, and why variables that are barely influential do have an impact? I hope you can help me. Thanks in advance.
Regards,
Miguel Hinostroza
Best Answer
varunm1 (Moderator)
Oh yeah, the weights shown in your image correspond to number 2.
"If I wanted to test the effects of the attributes on the prediction's value on the Simulator tab, should I take into account the weights of number 2? Because as I showed you, in this case specifically, they seem to have nothing to do with it, unlike the weights of numbers 3 and 4, which make more sense."
I recommend you use the weights shown in 3 and 4. In your case, 2 gives only a limited explanation of the predictions and doesn't correspond to the model-based weights.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
The weights that you are seeing in Auto Model are "local prediction weights". These are different from global model weights. They are calculated by an operator called "Explain Predictions", which is based on a modified version of the LIME method (Local Interpretable Model-agnostic Explanations).
These weights explain which attributes are important locally, for individual predictions, rather than on a global scale. Let me explain with an example process.
I created a process that applies split validation (70:30, train:test) to the Deals data from the Community Samples.
If you run the attached process, you will get two windows in the Results view, one for the model and another for the weights. If you take a close look at the model's "Description", you can see that Age has a much higher importance than the other two attributes. These importances are calculated while training the model (on the training data), so they are global variable importances.
On the other hand, if you look at the attribute weights calculated by the "Explain Predictions" operator, you can see that "Age" has a lower weight.
Why? Because the two methods, and their interpretations, are different. Explain Predictions calculates weights based on the predictions (on the testing data). It explains which attributes are important for the predictions rather than for the global model, as in the previous case.
Why do we need this? Not every model can provide global importances, and it is hard to explain the importance of attributes for individual predictions from global importances alone, as this is much more complex. There are cases where you need to explain each prediction (e.g., medical diagnosis). Remember, a given prediction might be right or wrong. So RapidMiner uses a correlation-based LIME method to calculate attribute weights based on correct and wrong predictions.
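To make the idea of local weights more concrete, here is a simplified Python sketch of a correlation-based local explanation. It is not RapidMiner's actual Explain Predictions implementation: the neighbourhood sampling (Gaussian noise with a hypothetical `scale`), the function names and the toy model are all assumptions for illustration. It perturbs one example, scores the perturbed copies with the model, and uses the absolute correlation between each attribute and the prediction as that attribute's local weight.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def local_weights(predict, example, scale=1.0, n_samples=500, seed=0):
    """Hypothetical correlation-based local weights for a single example.

    1. Sample neighbours of `example` by adding Gaussian noise to each attribute.
    2. Score every neighbour with the model.
    3. Weight each attribute by |correlation(attribute value, prediction)|.
    """
    rng = random.Random(seed)
    names = list(example)
    neighbours = [
        {name: example[name] + rng.gauss(0.0, scale) for name in names}
        for _ in range(n_samples)
    ]
    predictions = [predict(row) for row in neighbours]
    return {
        name: abs(pearson([row[name] for row in neighbours], predictions))
        for name in names
    }


def toy_predict(row):
    # Hypothetical model: "a" only matters above 10, "m" always matters.
    return (100.0 if row["a"] > 10 else 0.0) + 2.0 * row["m"]


print(local_weights(toy_predict, {"a": 0.0, "m": 1.0}))
```

In this toy model "a" can dominate globally (any example with a > 10 gets +100), yet for the example at a = 0 its local weight is close to zero, because small changes around that point never cross the threshold, while "m" gets a local weight close to 1.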
@Joanneyu I guess this explanation might help you as well.
Varun
I think I now understand the difference between these two types of weights better. But I still don't get why the predicted value doesn't change when I modify the values of the attributes which have the highest weights (for prediction).
I mean, this attribute has the highest weight:
but it doesn't cause an impact on the prediction:
And these attributes, which have lower weights...
do affect the prediction, for example this:
I don't understand why it happens, or maybe I'm getting something wrong. I hope you can help me. Thanks!
Regards,
MH
Can you open the process and run it to check the global attribute importances of the GBT model? To do this, click "Open Process" in Auto Model and then run the process. You will get multiple windows in the results; go to the GradientBoosted(Model PO) tab and then to Description, where you will find the variable importances of the GBT.
If you want me to check and confirm, I need your data to do that.
Sometimes the global importances do not match the local attribute weights, which might cause this sort of behavior. I am also tagging @IngoRM in case he can add something here.
Varun
Sorry, I'd like you to check the data yourself, but I'm not allowed to share this information. Nevertheless, I did what you asked, and this makes more sense:
The first three attributes do change the prediction. So I conclude that I should pay attention to these weights instead of the AttributeWeights when I want to test the sensitivity of the prediction. Please correct me if I'm wrong.
My goal in this project is to figure out the combination of values of all these attributes that maximizes the predicted value. That's why I'm using the Simulator. I was also using the attribute weights to explain how much each attribute affects the prediction, but now I'm not sure whether these weights are useful in this case, and, if they are, how they could help.
I remain attentive to your answer. Thanks!
Regards,
MH
In the simulator, there is another effect which can make things confusing sometimes. While some attributes are important globally (i.e. for most examples), they may have little influence for a specific example. This can lead to situations where you change an attribute which is supposed to be somewhat important with little or no impact on the prediction because it is just not that important for the example at hand. What is currently important is shown by the local weights (number 4) in the bottom right of the Simulator.
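A tiny what-if check, similar in spirit to moving one slider in the Simulator, shows this effect. The sketch below assumes a hypothetical toy model written as a sum of decision stumps (a stand-in for a trained tree ensemble, not the actual GBT from this thread): it changes one attribute at a time, keeps the rest fixed, and reports the smallest and largest change in the prediction.

```python
def toy_model(row):
    """Hypothetical stand-in for a tree ensemble: a sum of decision stumps."""
    y = 0.0
    y += 50.0 if row["a"] > 10 else 0.0  # large effect, but only above a = 10
    y += 5.0 if row["m"] > 2 else -5.0   # smaller effect, threshold at m = 2
    y += 3.0 if row["n"] > 0.5 else 0.0  # smaller effect, threshold at n = 0.5
    return y


def what_if(model, example, attribute, values):
    """Change one attribute (others fixed); return (min, max) change in prediction."""
    base = model(example)
    deltas = [model({**example, attribute: v}) - base for v in values]
    return min(deltas), max(deltas)


example = {"a": 3.0, "m": 1.0, "n": 0.2}  # the example shown in the "Simulator"
print("a:", what_if(toy_model, example, "a", [0, 2, 4, 6, 8]))
print("m:", what_if(toy_model, example, "m", [0, 1, 2, 3, 4]))
print("n:", what_if(toy_model, example, "n", [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Here "a" has the largest single effect in the toy model, yet moving it anywhere between 0 and 8 leaves the prediction unchanged for this example, because no split threshold is crossed; only a value above 10 would change it. "m" and "n" sit close to their thresholds, so small changes do move the prediction.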
Hope this helps,
Ingo
Just to confirm:
Do the weights I showed you in my last comment correspond to number 3?
And do these weights correspond to number 2?
I await your comments
Regards,
MH
"And do these weights correspond to number 2?"
They may or may not. These are model-based weights, and they sometimes differ from the global weights calculated in 2. The reason is the test data used to calculate the global weights in method 2: variations in the test set influence method 2.
Varun
Please, I also hope you can answer this question: If I wanted to test the effects of the attributes on the prediction's value on the Simulator tab, should I take into account the weights of number 2? Because as I showed you, in this case specifically, they seem to have nothing to do with it, unlike the weights of numbers 3 and 4, which make more sense. Thanks in advance.
Regards,
MH
One more question: is there any way to know why the software determines those attributes as the most important (according to number 3)? Thanks.
Regards,
MH
The GBT algorithm in Auto Model is based on H2O. You can look at the link below to understand how variable importance is calculated for tree-based algorithms.
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/variable-importance.html
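As a rough illustration of the approach described on that page: for tree models, each variable is credited with the reduction in squared error achieved by the splits that use it, summed over all trees, and the result is reported as relative, scaled (divided by the maximum) and percentage (divided by the total) importance. The sketch below assumes a hypothetical list of (variable, squared-error reduction) pairs, one per split node; extracting those from a real H2O model is not shown. It also suggests why variables that appear at the top of many trees, like "m" and "n" in this thread, can rank high in the model-based importance: root splits usually give the largest error reductions.

```python
from collections import defaultdict


def variable_importance(splits):
    """Sketch of relative / scaled / percentage importance for a tree ensemble.

    `splits` is a hypothetical input: one (variable, squared_error_reduction)
    pair per split node, collected over all trees in the ensemble.
    """
    relative = defaultdict(float)
    for variable, reduction in splits:
        relative[variable] += reduction  # sum the improvement over all trees

    top = max(relative.values())
    total = sum(relative.values())
    # Sort from most to least important, like a variable importance table.
    ranked = sorted(relative.items(), key=lambda kv: kv[1], reverse=True)
    return {variable: (rel, rel / top, rel / total) for variable, rel in ranked}


# Hypothetical split records; "m" and "n" play the role of frequent root splits.
splits = [("m", 40.0), ("n", 25.0), ("m", 20.0), ("a", 10.0), ("n", 5.0)]
for variable, (rel, scaled, pct) in variable_importance(splits).items():
    print(f"{variable}: relative={rel:.1f} scaled={scaled:.2f} percentage={pct:.0%}")
```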
Hope this helps.
Varun