"Read Excel without number parsing?"
Hello everybody,
I'm currently building a process to extend some data I collected from the web earlier. The previous processes eventually produced an Excel file containing all the relevant data. Some of the values were numbers with leading zeros (postal codes, area codes), which were extracted and written as text (presumably that is why Excel kept the leading zeros). Now I want to load that data into my process again via "Read Excel". Guess what happens? All those numbers are parsed as integers, the leading zeros are removed, and when the data is written back via "Write Excel", one fraction digit is added to every one of those numbers (although they were displayed as integers before). The "Read CSV" operator lets you disable this unwanted parsing. Do you have any suggestions on how best to handle this case?
Thanks for all your hints and help.
Regards,
Matthias
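For anyone wondering why the leading zeros disappear and a ".0" shows up: the sketch below is purely illustrative and is not RapidMiner's actual code. It assumes the reader converts every numeric-looking cell to a floating-point number, which matches the behaviour described above. The helper names `parse_cell` and `restore_code` and the sample value "01067" are hypothetical.

```python
# Illustration only: assumes numeric-looking text cells are converted to floats.
# This is NOT RapidMiner's implementation; the helper names are hypothetical.

def parse_cell(text: str):
    """Hypothetical type-guessing step: numeric-looking text becomes a float."""
    try:
        return float(text)  # "01067" -> 1067.0, so the leading zero is lost
    except ValueError:
        return text  # non-numeric cells stay text (nominal)


def restore_code(value, width: int) -> str:
    """Hypothetical repair: left-pad a parsed whole number back to `width` digits.

    Only safe when every code in the column has exactly `width` digits.
    """
    if isinstance(value, float) and value.is_integer():
        return f"{int(value):0{width}d}"
    return str(value)


if __name__ == "__main__":
    parsed = parse_cell("01067")
    print(parsed)                   # 1067.0 (leading zero gone, ".0" appended)
    print(restore_code(parsed, 5))  # 01067
```

Re-padding only works for fixed-width codes such as 5-digit postal codes. Area codes vary in length, so keeping such columns as text (nominal) at import time is the safer fix.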
Answers
There have been many changes to the read operators in the last few versions of RapidMiner. Which version exactly are you using?
Greetings,
Sebastian
I always use the newest version available from Subversion (building and running RapidMiner from Eclipse), since there have been some relevant fixes. This should be a (still mislabeled
Regards,
Matthias
OK then. Have you tried configuring the Read Excel operator using the wizard? It offers more settings than the parameters themselves.
Greetings,
Sebastian
Thanks for this suggestion. In the wizard, even the type guessing results in "nominal" for my postal codes, and otherwise I could change the type manually; this is great. However, the wizard doesn't seem to offer a way to use the first row (containing the column headings) as attribute names, and setting the corresponding parameter afterwards has no effect. This is a bit confusing: I usually think of a wizard as an easy guide through setting parameters, but here the wizard and the parameters seem to be independent of each other. Decisions made in the wizard cannot be revised later using the parameters. Perhaps usability could be improved here?
Regards,
Matthias
Inside the wizard you can select the usage of each row. In the second step, the first column lets you choose the "Name" usage.
Greetings,
Sebastian
Thank you for that hint. I indeed didn't notice that the first column was clickable, since this possibility isn't mentioned there (the column heading "Use as" is the only clue, and that was not enough for me).
I'm using the latest version from SVN; could someone perhaps verify this problem?
Thanks,
Matthias