"[SOLVED] Rapidminer / Excel Missing Value editing"

dramaticlook · Member · Posts: 5 · Contributor II
edited June 2019 in Help
Hi all,
I'm learning how to use RapidMiner for a project and I'm stuck at one point. I have a dataset as follows: there are countries, and for each country I'm keeping track of some values (medals, let's say) for the years 1990-2012. As an example:


Country  Year  Gold  Silver  Bronze
-----------------------------------
USA      1990    10       5       7
...
USA      2012    12       3       8
Spain    1990     8      12       9
...
Spain    1992     7       ?       8
...
Spain    2012     4      11      12
...goes on...

What I want to do is replace the missing values. For example, Spain has a missing value for Silver medals in 1992. I want to compute the average of the Silver values available for Spain and replace the missing value with it. How can I do this? If the existing operators in RapidMiner can't do this, is there some kind of macro etc.? I can also use Excel to preprocess the data (but how?).
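For reference, the per-country average described above can be sketched outside RapidMiner in a few lines of Python. This is a minimal sketch, assuming the data is already loaded as a list of dicts with `None` in place of `?`; the function name `fill_with_group_mean` is hypothetical, and only the rows actually shown in the example are used (the elided `...` rows are left out).

```python
from collections import defaultdict


def fill_with_group_mean(rows, group_key, columns):
    """Return a copy of rows where each None in `columns` is replaced by the mean
    of that column over the non-missing rows of the same group (e.g. the same country)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate per-(group, column) sums and counts of observed values.
    for row in rows:
        for col in columns:
            if row[col] is not None:
                sums[(row[group_key], col)] += row[col]
                counts[(row[group_key], col)] += 1
    # Second pass: fill the gaps; a group with no observed values keeps None.
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            key = (row[group_key], col)
            if new_row[col] is None and counts[key] > 0:
                new_row[col] = sums[key] / counts[key]
        filled.append(new_row)
    return filled


# Only the example rows shown in the question; None stands for "?".
rows = [
    {"Country": "USA", "Year": 1990, "Gold": 10, "Silver": 5, "Bronze": 7},
    {"Country": "USA", "Year": 2012, "Gold": 12, "Silver": 3, "Bronze": 8},
    {"Country": "Spain", "Year": 1990, "Gold": 8, "Silver": 12, "Bronze": 9},
    {"Country": "Spain", "Year": 1992, "Gold": 7, "Silver": None, "Bronze": 8},
    {"Country": "Spain", "Year": 2012, "Gold": 4, "Silver": 11, "Bronze": 12},
]
filled = fill_with_group_mean(rows, "Country", ["Gold", "Silver", "Bronze"])
print(filled[3]["Silver"])  # 11.5
```

Spain's missing Silver value becomes (12 + 11) / 2 = 11.5; the USA rows do not influence it.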

Answers

  • MariusHelf · RapidMiner Certified Expert, Member · Posts: 1,869 · Unicorn
    Hey, easy exercise with RapidMiner;)

    Did you manage to load the data, and does RapidMiner correctly recognize the missing values? Then you can simply use the Replace Missing Values operator and define "average" as the replacement strategy.

    Best, Marius
  • dramaticlook · Member · Posts: 5 · Contributor II
    Hey, thanks for replying. I have loaded the data and I can clearly see the missing values. I used Replace Missing Values, but it's not exactly what I want: I want to use the average of that country's data instead of the average of the whole attribute.
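To make the difference concrete, here is a minimal Python sketch (not RapidMiner's implementation) comparing the whole-attribute average with the per-country average, using the Silver values from the example rows shown in the question. The helper name `mean_of_present` is hypothetical.

```python
def mean_of_present(values):
    """Average of the non-missing (non-None) values."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


# (country, silver) pairs from the example rows shown in the question; None marks "?".
silver = [("USA", 5), ("USA", 3), ("Spain", 12), ("Spain", None), ("Spain", 11)]

# Whole-attribute average: every country's Silver values are pooled.
whole_attribute = mean_of_present([s for _, s in silver])
# Per-country average: only Spain's own Silver values are used.
spain_only = mean_of_present([s for c, s in silver if c == "Spain"])

print(whole_attribute)  # 7.75
print(spain_only)       # 11.5
```

Filling Spain's gap with 7.75 would pull in the USA's much lower Silver counts, which is why the per-country average is wanted here.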
  • RLorrig · Member · Posts: 4 · Contributor I
    I'd suggest sorting your data by the country attribute and then using the Filter Example Range operator to separate your data by country. Afterwards you can use the Replace Missing Values operator. As long as you don't have too many countries, it's not that much work ;)
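The sort-then-split idea can be sketched in Python as follows, assuming rows are `(country, year, silver)` tuples with `None` for `?`. Here `itertools.groupby` finds the per-country ranges on the sorted data, which is the part that would otherwise be set by hand in Filter Example Range; `impute_sorted_ranges` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def impute_sorted_ranges(rows, key_index, value_index):
    """Sort rows by the key column, split them into contiguous ranges of equal keys,
    and fill None at value_index with the mean of that range."""
    ordered = sorted(rows, key=itemgetter(key_index))  # sorted() is stable
    result = []
    for _, run in groupby(ordered, key=itemgetter(key_index)):
        block = [list(r) for r in run]  # copy each tuple so the input is not modified
        present = [r[value_index] for r in block if r[value_index] is not None]
        if present:
            mean = sum(present) / len(present)
            for r in block:
                if r[value_index] is None:
                    r[value_index] = mean
        result.extend(block)
    return result


# Example rows from the question, deliberately unsorted.
rows = [
    ("USA", 1990, 5),
    ("Spain", 1992, None),
    ("USA", 2012, 3),
    ("Spain", 1990, 12),
    ("Spain", 2012, 11),
]
for row in impute_sorted_ranges(rows, key_index=0, value_index=2):
    print(row)
```

`groupby` only merges adjacent equal keys, which is why the sort has to come first, exactly as in the suggestion above.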
  • dramaticlook · Member · Posts: 5 · Contributor II
    ;D Actually, what I presented here was a small example. I actually have a large dataset with 35 financial attributes, two of which list the country codes and years in a similar structure. There are more than 150 country codes.
  • MariusHelf · RapidMiner Certified Expert, Member · Posts: 1,869 · Unicorn
    Heya, I would iterate over all countries with the Loop Values operator. In each iteration, it creates a macro, which you can use to filter the example set in the inner process with Filter Examples and the attribute_value_filter condition. Then you can apply the Replace Missing Values operator to the filtered data, Append the output of Loop Values and you're done. There should be some examples of how to use Loop Values here in the forums.

    Happy Mining!

    ~Marius
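The Loop Values chain described above can be mirrored step by step in Python; each comment names the RapidMiner operator the line stands in for. This illustrates the control flow only, assuming rows are dicts with `None` for missing values; it is not how RapidMiner executes the process, and `loop_values_replace` is a hypothetical name.

```python
def loop_values_replace(rows, loop_attribute, columns):
    """For each distinct value of loop_attribute: filter the rows, fill None in each
    column with that subset's mean, and append the subset to the output."""
    output = []
    for loop_value in sorted({r[loop_attribute] for r in rows}):  # Loop Values
        subset = [dict(r) for r in rows if r[loop_attribute] == loop_value]  # Filter Examples
        for col in columns:  # Replace Missing Values ("average"), on the subset only
            present = [r[col] for r in subset if r[col] is not None]
            if not present:
                continue  # nothing to average for this value; leave None
            mean = sum(present) / len(present)
            for r in subset:
                if r[col] is None:
                    r[col] = mean
        output.extend(subset)  # Append
    return output


rows = [
    {"CountryCode": "USA", "Year": 1990, "Gold": 10, "Silver": 5},
    {"CountryCode": "USA", "Year": 2012, "Gold": 12, "Silver": 3},
    {"CountryCode": "Spain", "Year": 1990, "Gold": 8, "Silver": 12},
    {"CountryCode": "Spain", "Year": 1992, "Gold": 7, "Silver": None},
    {"CountryCode": "Spain", "Year": 2012, "Gold": 4, "Silver": 11},
]
output = loop_values_replace(rows, "CountryCode", ["Gold", "Silver"])
for row in output:
    print(row)
```

The number of operators stays fixed no matter how many country codes there are; only the number of loop iterations grows.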
  • dramaticlook · Member · Posts: 5 · Contributor II
    Hi, I just had a chance to apply your advice. I can loop through the country codes, but I can't get the attribute filter to work. Do I have to create a filter for each country code value? When I use Filter Examples inside the loop with the attribute_value_filter option, there is a string parameter. It seems I would type something like CountryCode=USA there and then apply Replace Missing Values in the next operator. However, that would mean creating around 150 operators manually. How can I automate it?
    [Screenshot: the first layer]
    [Screenshot: the second layer]
  • Nils_Woehler · Member · Posts: 463 · Maven
    Hi,

    please post your process setup as described in Marius's signature.

    Best,
    Nils
  • dramaticlook · Member · Posts: 5 · Contributor II
    <parameter key="20" value="Industry value added (annual % growth).true.real.attribute"/>
    <parameter key="attributes" value="|Telephone lines|Services etc. value added (annual % growth)|Secure Internet servers|Scientific and technical journal articles|Researchers in R&D (per million people)|Research and development expenditure (% of GDP)|Patent applications residents|Patent applications nonresidents|OECD membership|Mobile cellular subscriptions|Manufacturing value added (annual % growth)|Internet users|Industry value added (annual % growth)|Imports of goods and services (annual % growth)|Household final consumption expenditure per capita growth (annual %)|Household final consumption expenditure (annual % growth)|Household final consumption expenditure etc. (annual % growth)|High-technology exports (current US$)|High-technology exports (% of manufactured exports)|Gross fixed capital formation (annual % growth)|Gross capital formation (annual % growth)|General government final consumption expenditure (annual % growth)|GNI per capita growth (annual %)|GNI growth (annual %)|GDP per capita growth (annual %)|GDP growth (annual %)|Fixed broadband Internet subscribers|Final consumption expenditure etc. (annual % growth)|Exports of goods and services (annual % growth)|Daily newspapers (per 1000 people)|Agriculture value added (annual % growth)|Adjusted net savings including particulate emission damage (current US$)|Adjusted net savings excluding particulate emission damage (current US$)|Adjusted net national income (annual % growth)"/>
    <parameter key="attributes" value="|Telephone lines|Services etc. value added (annual % growth)|Secure Internet servers|Scientific and technical journal articles|Researchers in R&D (per million people)|Research and development expenditure (% of GDP)|Patent applications residents|Patent applications nonresidents|OECD membership|Mobile cellular subscriptions|Manufacturing value added (annual % growth)|Internet users|Industry value added (annual % growth)|Imports of goods and services (annual % growth)|Household final consumption expenditure per capita growth (annual %)|Household final consumption expenditure (annual % growth)|Household final consumption expenditure etc. (annual % growth)|High-technology exports (current US$)|High-technology exports (% of manufactured exports)|Gross fixed capital formation (annual % growth)|Gross capital formation (annual % growth)|General government final consumption expenditure (annual % growth)|GNI per capita growth (annual %)|GNI growth (annual %)|GDP per capita growth (annual %)|GDP growth (annual %)|Fixed broadband Internet subscribers|Final consumption expenditure etc. (annual % growth)|Exports of goods and services (annual % growth)|Daily newspapers (per 1000 people)|CountryCode|Agriculture value added (annual % growth)|Adjusted net savings including particulate emission damage (current US$)|Adjusted net savings excluding particulate emission damage (current US$)|Adjusted net national income (annual % growth)"/>
    <portSpacing port="source_exampleset" spacing="0"/>
    Here is my process. It contains an extra decision tree, which you can ignore. The problem is in the Loop Values subprocess, which I use to replace the missing values with the average.
  • Nils_Woehler · Member · Posts: 463 · Maven
    You can use the macro provided by Loop Values and set the parameter of Filter Examples like this: 'CountryCode=%{loop_value}'

    Best,
    Nils
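The point of this answer is that the filter string is not typed once per country: Filter Examples receives `CountryCode=%{loop_value}`, and the macro is replaced with the current value in each iteration, so a single operator serves all 150+ countries. A rough Python sketch of that substitution, assuming a simple `attribute=value` condition; `expand_macros` and `filter_equals` are hypothetical helpers, the sample rows (including the code `ESP`) are made up for illustration, and RapidMiner's own condition syntax supports more than the `=` handled here.

```python
import re


def expand_macros(template, macros):
    """Replace each %{name} with macros[name]; unknown macros are left as they are."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros.get(m.group(1), m.group(0)), template)


def filter_equals(rows, condition):
    """Keep rows matching a simple 'attribute=value' condition (only '=' is supported)."""
    attribute, value = condition.split("=", 1)
    return [r for r in rows if str(r[attribute]) == value]


# Hypothetical sample rows.
rows = [
    {"CountryCode": "USA", "Year": 1990},
    {"CountryCode": "ESP", "Year": 1992},
]

template = "CountryCode=%{loop_value}"  # typed once, as in the answer above
for value in sorted({r["CountryCode"] for r in rows}):
    condition = expand_macros(template, {"loop_value": value})
    print(condition, len(filter_equals(rows, condition)))
```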