How to use the Read XML operator

dimons91 Member Posts: 3 Contributor I
edited November 2018 in Help

Hello.

I am having trouble using the Read XML operator.

I am reading this XML file:




id1

kw1
kw2
kw3



ID2

kw4
kw5
kw6


and want to get this table:

ID name KEYWORD
id1 text1 kw1
id1 text1 kw2
id1 text1 kw3
ID2 text2 kw4
ID2 text2 kw5
ID2 text2 kw6

How can I do it?

The XML version of my process is here:


new 4.xml 374B

Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist

    Hi dimons91,

    Have you tried the Import Wizard for Read XML? It generates the XPaths for the attributes automatically. All you need to do is select the beans for the attributes you want in step 4 of the configuration wizard.

    [Image: select beans read xml.png. Import Configuration Wizard, Step 4: select beans]

    I have a sample process here for you:


    Tidy up the column names











    Drop the unwanted ID












    Make sure the path of the saved file is exactly the same as the file path in 'Open File'.



    HTH,

    YY

  • dimons91 Member Posts: 3 Contributor I

    Hi, yyhuang

    Thank you! That was really helpful.

    But I made the XML file more complex, and ran into difficulties again.

    I added some levels of nesting.

    Because of this, an error occurs in the De-Pivot operator.

    Maybe I need to use more than one Read XML operator and then join the tables? When I try it this way, I have no attribute on which to join them correctly.

    Here is the new XML:




    20160101


    AreaName1



    3456


    1287


    4565




    1234


    1345


    1232





    AreaName2



    1343


    <value>6745</value>


    8767




    5455


    2345


    1234




    This is the table I need to produce:

    day name name2 desc start end value
    20160101 AreaName1 Device1 active 0 30 3456
    20160101 AreaName1 Device1 active 30 100 1287
    20160101 AreaName1 Device1 active 100 130 4565
    20160101 AreaName1 Device1 passive 0 30 1234
    20160101 AreaName1 Device1 passive 30 100 1345
    20160101 AreaName1 Device1 passive 100 130 1232
    20160101 AreaName2 Device2 active 0 30 1343
    20160101 AreaName2 Device2 active 30 100 6745
    20160101 AreaName2 Device2 active 100 130 8767
    20160101 AreaName2 Device2 passive 0 30 5455
    20160101 AreaName2 Device2 passive 30 100 2345
    20160101 AreaName2 Device2 passive 100 130 1234
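
For comparison, the target table above can be sketched outside RapidMiner by walking the nesting directly. This is only an illustration under stated assumptions, not the Read XML operator's behaviour: the element names `measuringpoint`, `measuringchannel`, `period`, `value` and the attributes `name`, `desc`, `start`, `end` come from the XPaths in this thread, while `report`, `day`, `area` and `areaname` are placeholder tags, because the original tags for the day and area were lost in this copy.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. <report>, <day>, <area> and <areaname> are
# placeholder tag names; the other tags and attributes follow the XPaths
# quoted in the thread.
SAMPLE = """
<report>
  <day>20160101</day>
  <area>
    <areaname>AreaName1</areaname>
    <measuringpoint name="Device1">
      <measuringchannel desc="active">
        <period start="0" end="30"><value>3456</value></period>
        <period start="30" end="100"><value>1287</value></period>
      </measuringchannel>
      <measuringchannel desc="passive">
        <period start="0" end="30"><value>1234</value></period>
      </measuringchannel>
    </measuringpoint>
  </area>
</report>
"""


def flatten(xml_text):
    """Return one tuple per <period>, repeating the values of its ancestors."""
    root = ET.fromstring(xml_text)
    day = root.findtext("day")
    rows = []
    for area in root.iter("area"):
        area_name = area.findtext("areaname")
        for point in area.iter("measuringpoint"):
            for channel in point.iter("measuringchannel"):
                for period in channel.iter("period"):
                    rows.append((
                        day,
                        area_name,
                        point.get("name"),
                        channel.get("desc"),
                        int(period.get("start")),
                        int(period.get("end")),
                        int(period.findtext("value")),
                    ))
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

Because the loops visit whatever elements are present, a document with more areas, measuring points, channels or periods would simply yield more rows.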

    This is the process I made:














    <parameter key="xpath_for_attribute" value="measuringpoint[1]/attribute::name"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/attribute::desc"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/attribute::desc"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/value[1]/text()"/>







    <parameter key="2" value="measuringpoint[1]/measuringchannel[1]/attribute::desc.true.attribute_value.attribute"/>
    Tidy up the column names






















    Maybe I need to use more than one Read XML operator to load the file?
    Something is wrong here.
    What if a measuringpoint with name="Device3" is added to the file?


    new 10.xml 1.5K
  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist

    Thanks for giving me the new XML data. I fixed some of the function expressions for 'de-pivot'. It is always tricky to make the regular expressions work for that.














    <parameter key="xpath_for_attribute" value="measuringpoint[1]/attribute::name"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/attribute::desc"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[1]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[2]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[1]/period[3]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/attribute::desc"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[1]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[2]/value[1]/text()"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/attribute::end"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/attribute::start"/>
    <parameter key="xpath_for_attribute" value="measuringpoint[1]/measuringchannel[2]/period[3]/value[1]/text()"/>







    <parameter key="2" value="measuringpoint[1]/measuringchannel[1]/attribute::desc.true.attribute_value.attribute"/>
    Tidy up the column names















    First de-pivot: focus on the first digit, the index in channel[d]








    Second de-pivot: focus on the second digit, the index in period[d]
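
The two-stage idea described in these comments can be illustrated in plain Python. This is a rough sketch of the technique, not the De-Pivot operator itself and not the regular expressions used in the process: it assumes one wide row whose column names are the wizard-generated XPaths listed earlier, groups columns first by the `measuringchannel[d]` index and then by the `period[d]` index, and emits one long row per period. The function name `depivot` and the output keys are made up for the example.

```python
import re

# One wide row, keyed by column names shaped like the XPaths in this thread.
wide_row = {
    "measuringpoint[1]/attribute::name": "Device1",
    "measuringpoint[1]/measuringchannel[1]/attribute::desc": "active",
    "measuringpoint[1]/measuringchannel[1]/period[1]/attribute::start": "0",
    "measuringpoint[1]/measuringchannel[1]/period[1]/attribute::end": "30",
    "measuringpoint[1]/measuringchannel[1]/period[1]/value[1]/text()": "3456",
    "measuringpoint[1]/measuringchannel[1]/period[2]/attribute::start": "30",
    "measuringpoint[1]/measuringchannel[1]/period[2]/attribute::end": "100",
    "measuringpoint[1]/measuringchannel[1]/period[2]/value[1]/text()": "1287",
    "measuringpoint[1]/measuringchannel[2]/attribute::desc": "passive",
    "measuringpoint[1]/measuringchannel[2]/period[1]/attribute::start": "0",
    "measuringpoint[1]/measuringchannel[2]/period[1]/attribute::end": "30",
    "measuringpoint[1]/measuringchannel[2]/period[1]/value[1]/text()": "1234",
}

CHANNEL = re.compile(r"measuringchannel\[(\d+)\]/(.*)")  # first digit
PERIOD = re.compile(r"period\[(\d+)\]/(.*)")             # second digit


def depivot(row):
    """Turn one wide row into one long row per (channel, period) pair."""
    name = row.get("measuringpoint[1]/attribute::name")

    # Stage 1: group columns by channel index, keeping the rest of the path.
    channels = {}
    for column, value in row.items():
        match = CHANNEL.search(column)
        if match:
            channels.setdefault(int(match.group(1)), {})[match.group(2)] = value

    long_rows = []
    for channel_index in sorted(channels):
        fields = channels[channel_index]

        # Stage 2: within a channel, group the remaining columns by period index.
        periods = {}
        for rest, value in fields.items():
            match = PERIOD.match(rest)
            if match:
                periods.setdefault(int(match.group(1)), {})[match.group(2)] = value

        for period_index in sorted(periods):
            period = periods[period_index]
            long_rows.append({
                "name2": name,
                "desc": fields.get("attribute::desc"),
                "start": period["attribute::start"],
                "end": period["attribute::end"],
                "value": period["value[1]/text()"],
            })
    return long_rows


long_rows = depivot(wide_row)
for long_row in long_rows:
    print(long_row)
```

Grouping by index in two passes keeps each regular expression small, which is the main reason to split the de-pivot into two steps.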











    Happy RapidMining!

    YY

  • michael_jess Member Posts: 1 Contributor I

    Hi yyhuang,

    How do I open the wizard? I have seen it referenced several times, but I cannot find any documentation. There is the "File" > "Add Data" import dialog, but that one only allows me to import CSV and Excel. Is it part of some plugin?

    Thanks,

    Michael

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    The wizard is available directly inside the Read XML operator. See the attached parameter view. [Image: read xml.PNG]

    You'll need to save a local copy of the XML file to run the wizard, but afterwards you can point the resulting operator back to a web address if you want.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts