Splitting data
Hi,
I have another question; hopefully someone can point me in the right direction.
I am using the knn process and looping through a lot of data based on the name.
I want to split my data but by different amounts depending on where I am through the data.
My decisions are made on a time basis: the top part of my data holds the earliest observations and the bottom the latest. I will try to show a simple example below.
Example: Name 1 is in the data set 10 times. The first time it appears it has no previous results, so a KNN cannot be done and I would discard this example.
The second time the name appears I want to base the KNN on the example that happened before, so the top 10% of the data for Name 1 would go into creating the model. Then the current example (the next 10%) would go into Apply Model and I would discard the other 80% (from this example's point of view it has not happened yet, so it is information I would not have at the time).
The third time the name appears I would base the KNN on the two examples above, so the top 20% of the data would go into creating the model. Then the current example would go into Apply Model and I would discard 70%.
I want to carry on doing this as per the below table.
I should also note that names occur a different number of times: sometimes just once, other times over 20.
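To make the intended behaviour concrete, here is a minimal sketch in Python (not RapidMiner) of the expanding split described above. The function name `expanding_splits` and the `name` key are hypothetical, and the sketch assumes the rows are already sorted from earliest to latest.

```python
from collections import defaultdict


def expanding_splits(rows, name_key="name"):
    """Yield (name, history, current) for every row that has earlier rows
    with the same name. Rows must be sorted from earliest to latest."""
    seen = defaultdict(list)  # name -> rows seen so far for that name
    for row in rows:
        name = row[name_key]
        history = seen[name]
        if history:
            # Earlier rows exist: they would train the k-NN model,
            # and the current row would go to Apply Model.
            yield name, list(history), row
        # First occurrences fall through without a split (discarded),
        # and later rows are never visible to earlier ones.
        history.append(row)
```

A name that occurs only once never yields a split, and a name that occurs 10 times yields 9 splits, with the history growing by one row each time.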
I was hoping to use the Split Data operator with a macro to split the data. I have the percentages in my data, but I am struggling to get the figures into the Split Data operator.
This is my current process. I tried to use macros but took them out because they did not work, and replaced them with some random ratios.
Any help would be much appreciated.
Thanks,
Oli
| Occurrence | Make Model | Apply Model | Discard |
| 4th | 30% | 10% | 60% |
| 5th | 40% | 10% | 50% |
| 6th | 50% | 10% | 40% |
| 7th | 60% | 10% | 30% |
| 8th | 70% | 10% | 20% |
| 9th | 80% | 10% | 10% |
| 10th | 90% | 10% | 0% |
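The percentages for a name that occurs 10 times follow directly from the occurrence number, so they could be computed rather than stored. A rough sketch in Python, assuming a hypothetical helper `split_ratios(k, n)` where `k` is the current occurrence (1-based) and `n` is the total number of occurrences of the name:

```python
from fractions import Fraction


def split_ratios(k, n):
    """Return (make_model, apply_model, discard) as exact fractions of the
    n rows for one name, when the k-th row (1-based) is the current one."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    make_model = Fraction(k - 1, n)   # everything before the current row
    apply_model = Fraction(1, n)      # the current row itself
    discard = Fraction(n - k, n)      # everything that has not happened yet
    return make_model, apply_model, discard
```

The three values always sum to 1, which is what a three-way split with ratios would need. For `k = 1` the model share is 0, which matches discarding the first occurrence.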
<macros/>
<connect from_op="Performance (2)" from_port="example set" to_op="Materialize Data" to_port="example set input"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
Answers
Just wondered if anyone had any suggestions on this; all help is very much appreciated.
Thanks,
Oli