ALL FEATURE REQUESTS HERE ARE MONITORED BY OUR PRODUCT TEAM.
VOTING MATTERS!
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
FeatureSet converter
christos_karras
Member Posts: 50 Guru
I generated some features using the Automatic Feature Engineering Operator. Now I would like to manipulate the FeatureSet as an example set, but I can't find any converter between FeatureSetIOObject and ExampleSet (including in the Converters Extension). Would it be possible to create operators for the following (if they don't already exist):
- FeatureSet to ExampleSet
- ExampleSet to FeatureSet
And until this is available, would I be able to implement something myself using the scripting operator?
I would need this for various reasons, for example:
- Remove some of the generated features because I know they don't make sense and were probably selected just by coincidence based on the provided data. Example: exp(exp(exp([SourceFeature])))
- Combine multiple feature sets generated using different methods
- Distinguish "raw" features from generated features, to exclude them from some specific operators, without assigning a special role to the generated features (because I still want them to be considered by most operators including models)
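To make the first two use cases concrete, here is a minimal sketch in plain Python. It assumes a feature set can be viewed as a simple mapping from feature name to expression string; this is a hypothetical stand-in, not the actual FeatureSetIOObject API, and the feature names and the depth threshold are made up for illustration.

```python
# Hypothetical stand-in for a feature set: feature name -> expression string.
# This is NOT the RapidMiner FeatureSetIOObject API, only an illustration.
linear_regression_set = {
    "ratio_a_b": "[A]/[B]",
    "triple_exp": "exp(exp(exp([SourceFeature])))",
}
decision_tree_set = {
    "ratio_a_b": "[A]/[B]",
    "sum_b_c": "[B]+[C]",
}


def nesting_depth(expression):
    """Return the maximum depth of nested parentheses in an expression."""
    depth = 0
    deepest = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


def combine(*feature_sets, max_depth=2):
    """Merge feature sets, dropping expressions nested deeper than max_depth.

    When the same name appears in several sets, the first occurrence wins.
    """
    combined = {}
    for feature_set in feature_sets:
        for name, expression in feature_set.items():
            if nesting_depth(expression) <= max_depth:
                combined.setdefault(name, expression)
    return combined


combined = combine(linear_regression_set, decision_tree_set)
for name, expression in combined.items():
    print(name, "=", expression)
```

With a table-like view of this kind, the triple-nested exp feature is dropped by a simple rule and the two sets are merged into one, which is the sort of manipulation the requested converters would allow on an ExampleSet.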
Comments
Thanks for sharing the feedback! We will create feature requests for the internal product/dev team.
Have you tried the "Apply Feature Set" operator on the "raw" data with the FeatureSetIOObject to do feature selection/generation? You can then pick and remove the generated features with "Select Attributes" (invert selection).
If you convert the feature set object to a data set, how would you use the data set afterwards?
Thanks again for your inputs!
YY
Yes, I'm using Apply Feature Set on the raw data to generate the same features on new data. However, I generated features based on different models (for example Linear Regression, Decision Tree) and want to test them all in the whole data preparation pipeline as inputs to the different machine learning models I'm testing (Boosted Trees, Random Forest, Generalized Linear Model). So I'm applying multiple feature sets in a loop, using Apply Feature Set at each iteration.
But what I would actually like to do is build a single feature set that contains the best features found by the different methods. I would like to be able to analyze each generated feature to see if it makes sense, then build a feature set that contains only the ones that make sense according to the analysis, which may come from different generated feature sets. I may also want to modify some of the generated features. For example, if I have a generated feature = A/B, but from domain knowledge I know that B should be replaced by an average of B and C (because both B and C have the same impact on the results), then I would want to replace A/B by A/(0.5*(B+C)).
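The A/B edit described above can be sketched as follows. This assumes the feature set has been exported as a table with one row per generated feature; the row layout, the `replace_expression` and `evaluate` helpers, and the sample values are all hypothetical and are not part of RapidMiner.

```python
# Hypothetical table form of a feature set: one row per generated feature.
feature_rows = [
    {"name": "ratio", "expression": "A/B"},
    {"name": "total", "expression": "A+C"},
]


def replace_expression(rows, name, new_expression):
    """Return a copy of rows with the named feature's expression replaced."""
    return [
        {**row, "expression": new_expression} if row["name"] == name else dict(row)
        for row in rows
    ]


def evaluate(expression, example):
    """Evaluate an arithmetic expression against one example (row of values).

    eval is acceptable here only because the expressions are trusted toy input.
    """
    return eval(expression, {"__builtins__": {}}, dict(example))


# Replace B by the average of B and C, based on domain knowledge.
edited_rows = replace_expression(feature_rows, "ratio", "A/(0.5*(B+C))")

example = {"A": 6.0, "B": 2.0, "C": 4.0}
before = evaluate(feature_rows[0]["expression"], example)
after = evaluate(edited_rows[0]["expression"], example)
print(before, after)
```

The original rows are left untouched, so the edited table could be saved as a separate feature set and compared against the generated one.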
I would do this kind of manipulation by storing the ExampleSet in the RapidMiner repository, using the data editor, saving the ExampleSet back and then converting it back to a FeatureSet.
The easiest workaround, which we'll probably end up doing in the short term, is to use Generate Attributes to reimplement the same expressions as those found in the feature sets. However, this is not as convenient as having a FeatureSet object that can be reused at different stages in the process. For example, another thing I might want to do is to use the Feature Set with something similar to the "Work on Subset" operator, which would allow working either only on the features that are in the feature set, or only on the features that are *not* in the feature set. With the "manual" approach using Generate Attributes, I would have to re-enter the list of generated features at each "Work on Subset" operator.
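As a rough illustration of the subset idea, the sketch below splits a column list into the features named by a feature set and everything else. It is plain Python with made-up column names; `split_columns` is a hypothetical helper, not an existing RapidMiner operator or API.

```python
def split_columns(all_columns, feature_set_names):
    """Split columns into those named by the feature set and the rest.

    The original column order is preserved in both lists.
    """
    in_set = [c for c in all_columns if c in feature_set_names]
    not_in_set = [c for c in all_columns if c not in feature_set_names]
    return in_set, not_in_set


# Hypothetical column names: three raw attributes and two generated features.
columns = ["A", "B", "C", "ratio_a_b", "sum_b_c"]
generated_names = {"ratio_a_b", "sum_b_c"}

in_set, not_in_set = split_columns(columns, generated_names)
print("in feature set:", in_set)
print("not in feature set:", not_in_set)
```

Because the list of generated names comes from a single reusable object, it would not need to be re-entered at every place where the subset is used.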
Thanks