Split Data

olioli Member Posts: 6 Contributor II
edited September 2019 in Help
Hi,

I have another question; hopefully someone can point me in the right direction.

I am using a KNN process and looping through a lot of data, grouped by name.

I want to split my data, but by different amounts depending on how far through the data I am.

My decisions are made on a time basis: the top part of my data holds the earliest observations and the bottom the later ones. I will try to show a simple example below.

For example, Name 1 appears in the data set 10 times. The first time it appears it has no previous results, so a KNN cannot be done and I would discard this example.

The second time the name appears, I want to base the KNN on the example that happened before, so the top 10% of the data for Name 1 would go into creating the model. The current example would then go into Apply Model, and I would discard the other 80% (from this example's point of view those rows have not happened yet, so it is information I would not have at the time).

The third time the name appears, I would base the KNN on the two examples above, so the top 20% of the data would go into creating the model. Then the current example would go into Apply Model and I would discard 70%.

I want to carry on doing this as per the table below.
Occurrence   Make Model   Apply Model   Discard
4th          30%          10%           60%
5th          40%          10%           50%
6th          50%          10%           40%
7th          60%          10%           30%
8th          70%          10%           20%
9th          80%          10%           10%
10th         90%          10%           0%
I should also note that names can occur a different number of times: sometimes just once, other times more than 20.
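To make the scheme above concrete, here is a minimal Python sketch of the same idea outside RapidMiner. It treats the split as an expanding window within each name: every row is paired with the earlier rows of the same name, and nothing later is ever visible. The function name `expanding_splits` and the `(name, value)` row layout are hypothetical, and the rows are assumed to be sorted from earliest to latest already.

```python
from collections import defaultdict


def expanding_splits(rows):
    """Yield (name, history, current) for each row that has earlier rows of the same name.

    rows: iterable of (name, value) tuples, assumed sorted earliest first.
    The first occurrence of a name has no history, so it is skipped.
    """
    seen = defaultdict(list)  # name -> values observed so far
    for name, value in rows:
        history = seen[name]
        if history:
            # Copy the history so later appends do not alter what was yielded.
            yield name, list(history), value
        history.append(value)


if __name__ == "__main__":
    data = [("A", 1), ("B", 10), ("A", 2), ("A", 3)]
    for name, history, current in expanding_splits(data):
        print(name, history, current)
```

In this sketch, `history` plays the role of the "Make Model" portion and `current` the "Apply Model" portion; the "Discard" portion is simply every row not yet reached. A name that occurs only once never yields anything.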

I was hoping to use the Split Data operator with a macro to split the data. I have the percentages in my data, but I am struggling to get the figures into the Split Data operator.
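As a rough illustration of the percentages involved, the three ratios for the k-th of n occurrences of a name can be written as (k - 1)/n, 1/n and (n - k)/n. The Python sketch below only computes those numbers; the function name `split_ratios` is hypothetical, and how the values would be passed into the Split Data operator (for example through macros) is not something this sketch covers.

```python
def split_ratios(k, n):
    """Return (model, apply, discard) ratios for the k-th of n occurrences.

    k is 1-based. For k == 1 the model ratio is 0, which matches the case
    where the first occurrence has no history and would be discarded.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    model = (k - 1) / n    # rows that happened before the current one
    apply_ = 1 / n         # the current row
    discard = (n - k) / n  # rows that have not happened yet
    return model, apply_, discard


if __name__ == "__main__":
    # Print the ratios for every occurrence of a name seen 10 times.
    for k in range(1, 11):
        print(k, split_ratios(k, 10))
```

With n = 10 this gives the same 10% steps as the table, and it also handles names that occur a different number of times, since n is taken per name.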

This is my current process. I tried to use macros, but they did not work, so I took them out and replaced them with an arbitrary ratio.

Any help would be much appreciated.

Thanks,

Oli





<macros/>
<connect from_op="Performance (2)" from_port="example set" to_op="Materialize Data" to_port="example set input"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>



Answers

  • olioli Member Posts: 6 Contributor II
    Hi,

    Just wondering if anyone has any suggestions on this. All help is very much appreciated.

    Thanks,

    Oli