"read from Excel/CSV"

CaptainChaos Member Posts: 17 Maven
edited June 2019 in Help
Hi Guys,

Can somebody explain how I can tell RapidMiner to treat each line under "A" as a separate document and each line under "B" as its ID?
I would like to feed the result into a Data to Similarity operator, but for that each line has to be classified as a document. Does anybody know an operator that can do this?

Thanks

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hello CaptainChaos,

    did you try the wizards in the Read Excel/Read CSV operators? There you can define the role of each column, so you can set the id role for column B. Hope this helps; if not, please tell me exactly what a "document" in your files looks like.

    Cheers,
    Marius
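
For readers who think more easily in code, here is a rough Python sketch of what setting the id role on column B amounts to. It is only an analogy and not how RapidMiner works internally. The sample data and the names `SAMPLE` and `read_documents` are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for the spreadsheet: column A holds the text,
# column B holds the ID of that row.
SAMPLE = "A,B\nfirst text,id-1\nsecond text,id-2\n"


def read_documents(handle, text_column="A", id_column="B"):
    """Return {id: text}, treating every row as one document."""
    documents = {}
    for row in csv.DictReader(handle):
        # The ID column only labels the row; it is not part of the text.
        documents[row[id_column]] = row[text_column]
    return documents


documents = read_documents(io.StringIO(SAMPLE))
```

The point is that the ID column identifies a row and is kept out of the content that later gets compared.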
  • CaptainChaos Member Posts: 17 Maven
    Hi Marius,

    I tried all the wizards, but they don't help me do what I want. I know I can choose the attribute role for a column there, but this hasn't helped me so far.

    At the moment I have just one column in Excel (I changed it), column "A".
    Each row of "A" contains some text. I would just like RapidMiner to treat each of them as its own document.

    Thanks
    Regards
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    So you have a document that splits the data across two rows?
    There is probably an easier way, but you could do it by converting into XML and then back again.

    For example:
    I created a CSV file called Test CSV with the following structure:

    Data
    1
    Record
    2
    Information
    3
    I then built a process that converts it to XML with the following structure:
    Data1Record2Information3
    The process then reads in the XML file and changes it into data.

    Probably not at all what you were after, but it was a fun process to build & might be useful for other tasks.

    Best regards,
    JEdward.
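
The process itself is not reproduced above, so here is a minimal Python sketch of the same round trip, assuming the file alternates between a text line and a number line as in the example. The element names `records`, `record`, `text` and `id` are invented for illustration and are not taken from the original process.

```python
import xml.etree.ElementTree as ET

# The six lines of the example file, alternating text and number.
LINES = ["Data", "1", "Record", "2", "Information", "3"]


def lines_to_xml(lines):
    """Pair each text line with the number line that follows it."""
    root = ET.Element("records")
    for text, number in zip(lines[0::2], lines[1::2]):
        record = ET.SubElement(root, "record")
        ET.SubElement(record, "text").text = text
        ET.SubElement(record, "id").text = number
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_string):
    """Read the XML back into one (id, text) tuple per record."""
    root = ET.fromstring(xml_string)
    return [(r.findtext("id"), r.findtext("text")) for r in root.findall("record")]


xml_string = lines_to_xml(LINES)
rows = xml_to_rows(xml_string)
```

After the round trip, every record is a single row with its ID next to its text, which is the shape most downstream operators expect.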
  • colo Member Posts: 236 Maven
    Hi,

    it seems hard to understand what you're after... If you have an example set, each line is an example, and this is usually the correct format for most operators. If you want to do something with each single example, the "Loop Examples" operator is probably the right tool. You can give examples IDs either by creating new ones via "Generate ID" or by assigning the ID role to an existing column using "Set Role".

    When talking about documents this usually refers to the document datatype of the text processing extension and is only used in text and web mining context.

    I am not familiar with the "Data to Similarity" operator, but this one requires an example set as input. So your data should already have the right format. If you want to do something for only one example isolated from all the others, use "Loop Examples" and put the example processing inside this operator.

    For further support, it would help if you posted your process as far as you have built it and described where things are not working and what you would like to do differently.

    Regards
    Matthias

    P.S. Please don't post similar questions to other forums if they are not answered immediately. Specific questions like yours in particular should be posted here instead of in the general data mining forum.
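
As a loose Python analogy for "Generate ID" followed by "Loop Examples" (not RapidMiner code): number the rows, then handle each row on its own. The function name `generate_ids`, the sample texts, and the start value of 1 are assumptions made for this sketch.

```python
def generate_ids(texts, start=1):
    """Attach a running ID to every row, one example per row."""
    return [{"id": i, "text": t} for i, t in enumerate(texts, start=start)]


examples = generate_ids(["Text1", "Text2", "Text3"])

# Rough counterpart of looping over examples: each one is processed
# in isolation from the others. Here we only measure the text length.
lengths = {}
for example in examples:
    lengths[example["id"]] = len(example["text"])
```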
  • CaptainChaos Member Posts: 17 Maven
    Hi,

    Look, I have an Excel file with data only in column A (A1:A3000).
    The structure looks like this:

    A
    Text1........
    Text2..........
    Text3.......
    ..
    ...
    Text3000

    I know that I can loop through the file, but when I want to work with the data later on, the problem is that the operator takes the whole text of one row and compares it against another row as if it were a single term. What I want is for each row to be recognized as a single document, so that the words inside one row/document can be compared with those of another row/document. At the moment my Process Documents operator just takes the whole row as one term and compares it against another row.
    I hope that makes it a bit clearer what I want. I am posting my process here; maybe one of you can then understand what my problem is.

    Thanks again, it seems you are all having a hard time with me :P
  • colo Member Posts: 236 Maven
    Hi,

    try adding the operator "Tokenize" inside the "Process Documents" operator. Otherwise the word vector consists of only one word (the whole text). You can also add other preprocessing operators at this place, e.g. "Transform Cases" or "Filter Stopwords".

    Hope this is what you are looking for...;)

    Regards
    Matthias
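
To see why tokenizing matters, here is a small plain-Python sketch, offered as an illustration and not as RapidMiner's implementation. It compares two rows once as whole strings and once as word vectors after tokenizing, lower-casing and stopword filtering. The stopword list, the sample sentences and the use of cosine similarity are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stopword list, just enough for the two sample rows.
STOPWORDS = {"the", "a", "is", "on"}


def tokenize(text):
    """Lower-case, split on non-letters, and drop stopwords."""
    tokens = re.split(r"[^A-Za-z]+", text.lower())
    return [t for t in tokens if t and t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


row1 = "The cat sat on the mat"
row2 = "The cat is on the sofa"

# Without tokenizing, each row is one giant "term", so two different
# rows share nothing and the similarity is zero.
whole_row_similarity = cosine(Counter([row1]), Counter([row2]))

# With tokenizing, the rows share the word "cat" and become comparable.
token_similarity = cosine(Counter(tokenize(row1)), Counter(tokenize(row2)))
```

Here `whole_row_similarity` is zero because the two rows are different strings, while `token_similarity` is positive because both rows contain "cat". That is the difference the missing tokenization step makes.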