"read from Excel/CSV"

CaptainChaos · September 2011

Hi Guys,

Can somebody explain to me how I can tell RapidMiner to take each line under "A" as a separate document and each line under "B" as its ID?
I would like to apply a "Data to Similarity" operator to it, but for that each line has to be classified as a document. Does anybody know an operator that can do this?

Thanks

MariusHelf · September 2011

Hello CaptainChaos,

did you try the wizards in the Read Excel/Read CSV operators? There you can define the role of each column, so you can set the id role for column B. Hope this helps; if not, please tell me what exactly a "document" in your files looks like.
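For comparison, the "one row, one document, column B as its ID" idea can be sketched in plain Python outside RapidMiner. This assumes the sheet has been exported to CSV with a header row; the sample rows and the helper name read_documents are hypothetical.

```python
import csv
import io

# Hypothetical sample export: column A holds the text, column B holds its ID.
SAMPLE = """text,id
The quick brown fox,1
jumps over the lazy dog,2
"""

def read_documents(source):
    """Return a dict mapping each row's ID (column B) to its text (column A)."""
    reader = csv.reader(source)
    next(reader)  # skip the header row
    docs = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        text, doc_id = row[0], row[1]
        docs[doc_id] = text  # one row becomes one document, keyed by its ID
    return docs

documents = read_documents(io.StringIO(SAMPLE))
print(documents)
```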

Cheers,
Marius

CaptainChaos · September 2011

Hi Marius,

I tried all the wizards, but they don't help me do what I want. I know I can choose the attribute for a column there, but so far this doesn't help me.

At the moment I just have one column (I changed it): Excel column "A".
Each row of "A" contains some kind of text. I would just like RapidMiner to treat each of them as its own document.

Thanks
Regards

JEdward · September 2011

So you have a document that splits the data across two rows?
There may be an easier way, but you could do it by converting into XML and then back again.

For example:
I created a test CSV file with the following structure:


Data
1
Record
2
Information
3

Then I built the following process to convert it to XML with the following structure:

Data1Record2Information3

The process then reads in the XML file and changes it into data.
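The pairing step can be sketched in Python with the standard library's xml.etree.ElementTree. The tag names records, record, text and id are hypothetical, since the tags of the original XML did not survive in this post; the sketch simply pairs each text row with the number on the following row, writes XML, and reads the pairs back.

```python
import xml.etree.ElementTree as ET

# Hypothetical input in the layout above: each text row is followed by its number.
ROWS = ["Data", "1", "Record", "2", "Information", "3"]

def rows_to_xml(rows):
    """Pair every text row with the row after it and wrap each pair in a <record>."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (text, id pairs)")
    root = ET.Element("records")
    for text, doc_id in zip(rows[0::2], rows[1::2]):
        record = ET.SubElement(root, "record")
        ET.SubElement(record, "text").text = text
        ET.SubElement(record, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")

def xml_to_pairs(xml_string):
    """Read the XML back into (id, text) pairs, one per document."""
    root = ET.fromstring(xml_string)
    return [(rec.findtext("id"), rec.findtext("text")) for rec in root.iter("record")]

xml_string = rows_to_xml(ROWS)
print(xml_string)
print(xml_to_pairs(xml_string))
```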

Probably not at all what you were after, but it was a fun process to build & might be useful for other tasks.

Best regards,
JEdward.

colo · September 2011

Hi,

it seems hard to understand what you're after... If you have an example set, each line is an example, and this is usually the correct format for most operators. If you want to do something with each individual example, then the "Loop Examples" operator is probably the right tool. You can give examples IDs either by creating new ones via "Generate ID" or by assigning the ID role to an existing column using "Set Role".
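As a rough analogy in Python (not RapidMiner's API), generating an ID for every row and then handling each example on its own could look like this. The starting value of 1 and the names rows, generate_ids and examples are assumptions for illustration.

```python
# Hypothetical rows from a single-column sheet (column A only).
rows = ["First text", "Second text", "Third text"]

def generate_ids(texts, start=1):
    """Attach a sequential ID to every row, similar in spirit to "Generate ID"."""
    return [{"id": i, "text": t} for i, t in enumerate(texts, start=start)]

examples = generate_ids(rows)

# Handle the examples one at a time, like the body of "Loop Examples".
for example in examples:
    print(example["id"], example["text"])
```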

When talking about documents, this usually refers to the document data type of the Text Processing extension, which is only used in a text and web mining context.

I am not familiar with the "Data to Similarity" operator, but it requires an example set as input, so your data should already be in the right format. If you want to process a single example in isolation from all the others, use "Loop Examples" and put the example processing inside this operator.

For further support, it might be useful if you post your process as far as you have built it, and describe where things are not working and what you would like to do differently.

Regards
Matthias

P.S. Please don't post similar questions to other forums if they are not answered immediately. Specific questions like yours in particular should be posted here rather than in the general data mining forum.

CaptainChaos · September 2011

Hi,

Look, I have an Excel file with data only in column A (A1:A3000).
Its structure looks like this:

A
Text1........
Text2..........
Text3.......
..
...
Text3000

I know that I can loop through the file, but when I want to work with the data later on, the problem is that the operator takes the whole text of one row and compares it against another row as if it were a single term. What I want is for each row to be recognized as a single document, so that the words inside one row/document can be compared to those of another row/document. At the moment my "Process Documents" operator just takes the whole row as one term and compares it against another row.
I hope I have made it a bit clearer what I want. I'll post my process here; maybe one of you can then understand what my problem is.
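A small Python sketch of the symptom described above, assuming a plain Jaccard overlap of term sets (not necessarily the measure RapidMiner uses): if a whole row is one term, two rows only match when they are identical, while splitting on whitespace exposes the shared words. The sample rows and helper names are hypothetical.

```python
def whole_row_terms(text):
    """Treat the entire row as one term, i.e. no tokenization at all."""
    return {text}

def overlap(a, b):
    """Jaccard overlap of two term sets (shared terms / all terms)."""
    return len(a & b) / len(a | b)

row1 = "the cat sat on the mat"
row2 = "the cat lay on the mat"

print(overlap(whole_row_terms(row1), whole_row_terms(row2)))  # 0.0
print(overlap(set(row1.split()), set(row2.split())))
```

The second call shares "the", "cat", "on" and "mat" out of six distinct words, so it returns roughly 0.667 instead of 0.0.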

Thanks again, it seems you all have a hard time with me :P

colo · September 2011

Hi,

try adding the operator "Tokenize" inside the "Process Documents" operator. Otherwise the word vector consists of only one word (the whole text). You can also add other preprocessing operators at this place, e.g. "Transform Cases" or "Filter Stopwords".
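The effect of Tokenize, Transform Cases and Filter Stopwords can be approximated in Python as below. This is a sketch under assumptions: the tiny stopword list, the regex tokenizer that splits on anything other than ASCII letters and digits, and cosine similarity over raw term counts are all illustrative choices; Process Documents may weight terms differently (for example with TF-IDF).

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; real stopword filters use much longer lists.
STOPWORDS = {"the", "a", "an", "on", "of", "and", "is"}

def process_document(text):
    """Lower-case, split into word tokens, drop stopwords, then count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())     # Transform Cases + Tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]  # Filter Stopwords
    return Counter(tokens)

def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

rows = [
    "The cat sat on the mat",
    "The Cat lay on the mat",
    "Stock prices rose sharply",
]
vectors = [process_document(r) for r in rows]

# Compare every pair of rows, each treated as its own document.
for i in range(len(rows)):
    for j in range(i + 1, len(rows)):
        print(i, j, round(cosine_similarity(vectors[i], vectors[j]), 3))
```

With tokenization, the first two rows share "cat" and "mat" and get a similarity of about 0.667, whereas the unrelated third row scores 0.0 against both.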

Hope this is what you are looking for...

Regards
Matthias

Altair RapidMiner Community