"Writing a collection of documents"

mohammadreza Member Posts: 23 Contributor I
edited June 2019 in Help
Hi all,

I read an XML file in my process and convert it to a collection of documents in memory. Now I need to write each document as a separate file. Is there any way to do that? (I can think of using "Write Document" in a loop, but I can't figure out the right way to do that.)

Best

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,438 RM Data Scientist
    Hi mohammadreza,

    What about using either Document to Data or Combine Documents first?
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • mohammadreza Member Posts: 23 Contributor I
    Hi Martin,

    The data is already combined in one big XML file, so I am trying to break it down into several files and write them out. The only remaining part is writing the document collection (which is in memory) to the hard drive. Here is my process so far:

    [Process XML not preserved; surviving fragments:]
    @id"/>;
    <parameter key="regular_expression" value="</text><text>"/>

    Thanks in advance
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    This is a very simple example. The trick is to pass the Write Document operator a filename and to set that filename using macros.

    I call it simple because it just uses the iteration of the loop operator as the filename. I would recommend using either Extract Macro or Extract Macro from Annotation to get the name under which you'd like each file saved.
    You might want to try the ID, the Author, or a combination of the two.

    [Process XML not preserved; surviving fragments:]
    @id"/>;
    <parameter key="regular_expression" value="</text><text>"/>

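The macro trick described in this reply amounts to building each filename from the loop counter. Outside RapidMiner, the same idea can be sketched in Python. This is only an illustration and not the process that was attached to the post: the function name `write_collection`, the `out_dir` argument, the `doc_` prefix, and the 1-based counter are all assumptions.

```python
from pathlib import Path
import tempfile


def write_collection(documents, out_dir, prefix="doc_"):
    """Write each document to its own file, named after the loop iteration."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # The counter stands in for the iteration macro; starting at 1 is an assumption.
    for iteration, text in enumerate(documents, start=1):
        path = out_dir / f"{prefix}{iteration}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_collection(["first text", "second text"], tmp):
            print(p.name)
```

Numbering by iteration guarantees unique names but says nothing about the content of each file, which is why a name taken from the document itself (such as its id) is usually preferable.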
  • mohammadreza Member Posts: 23 Contributor I
    Hi Edward,

    Thank you. As you correctly mentioned, I need to save each file with its own name (id). Following your explanation (using Extract Macro), I came up with the following process, but I do not know what to choose for the "example index" parameter of the Extract Macro operator so that it yields the "id" of each file.

    [Process XML not preserved; surviving fragments:]
    @id"/>;
    <parameter key="regular_expression" value="</text><text>"/>

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,438 RM Data Scientist
    Hi,

    Extract Macro can only be applied to example sets. So you might wrap everything in one big Loop Examples and extract the macro from each example before converting it to a document.

    Best
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
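The order Martin suggests (read the id while the data is still in per-record form, then produce and write the document) can be sketched in Python as follows. The element name `text` and the attribute `id` are guesses based on the fragments that survive in this thread; the sample XML, the function name `write_by_id`, and the filename sanitising are invented for illustration.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Invented sample input; the real file from the thread is not available.
SAMPLE = """<corpus>
  <text id="a1">First document.</text>
  <text id="b2">Second document.</text>
</corpus>"""


def write_by_id(xml_string, out_dir, tag="text", id_attr="id"):
    """Write the text of every <tag> element to a file named after its id."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    root = ET.fromstring(xml_string)
    for index, elem in enumerate(root.iter(tag), start=1):
        # Read the id first, per record (the analogue of extracting the macro).
        doc_id = elem.get(id_attr) or f"doc_{index}"
        # Replace characters that are unsafe in filenames.
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", doc_id)
        path = out_dir / f"{safe}.txt"
        # Only now turn the record into a document and write it.
        path.write_text("".join(elem.itertext()), encoding="utf-8")
        written[doc_id] = path
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for doc_id, path in write_by_id(SAMPLE, tmp).items():
            print(doc_id, path.name)
```

Records without an id fall back to a counter-based name, so a missing attribute does not stop the loop.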
  • In777 Member Posts: 29 Contributor II