"Extract Rows from Text Value (JSON Array)"

megmedmegmed MemberPosts:3Contributor I
edited June 2019 inHelp
Hi,

Is there any solution to extract rows from text (JSON)? I couldn't find an appropriate rapid miner operater.

From a database table I get text data (JSON Array) from a column of the type BLOB.
Data like this:
[{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"671","cursor_y":"452"},
{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"657","cursor_y":"462"},
{...},
{...}]


I now want to extract each {...} as a row with columns split by the "," char. At the and I want to get a table like this:
timestamp event_type cursor_x cursor_y
1368525704 dom.movement 671 452
1368525704 dom.movement 657 462
... ... ... ...
How can I transform the JSON Array into rows?

我试着“提取文档”- >”Document" with RegEx\{(.*?)\}-> "Documents To Data". This works, but "Extract Document" extracts only one doc/row specified by the "example index" param.

Best,
Micha

Answers

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, MemberPosts:578Unicorn
    Have you tried JSON to XML then connecting that to a Write Document operator & connecting the file output port to a Read XML operator?

    That's my normal method.
    Thomas_Ott
  • megmedmegmed MemberPosts:3Contributor I
    Thanks for your reply.

    I already tried this before but the problem is JSON to XML need a doc as an input. I have to start with Data Table type which is the output of the Read Database operator. Thus it doesn't work for me.
  • awchisholmawchisholm RapidMiner Certified Expert, MemberPosts:458Unicorn
    Hello Micha

    After a bit of gymnastics I got the attached basically working. The input JSON is probably not quite what you read from your data - I had to add something around it - but it's a start and will give you some ideas.















    <运营商激活= " true " class = " nominal_to_text”compatibility="5.3.013" expanded="true" height="76" name="Nominal to Text" width="90" x="112" y="165"/>





































































































    The tricks are

    - Use Nominal to Text so that Data To Documents can be used
    - Combine Documents makes a single document for JSON to XML to work on
    - Use Cut Document to chop the XML document into chunks - the nifty bit is using Tokenize with the power of XPath to determine the number of rows automatically
    - Use various Extract Information operators with XPath inside the Cut Document operator to obtain values from the cut documents individually. This creates meta data in the document that will automatically become attributes in the final example set when the Documents to Data operator is used.

    You'll have to modify it for your needs. Hope it helps.

    regards

    Andrew
    Thomas_Ott neginz
  • megmedmegmed MemberPosts:3Contributor I
    Hi Andrew,

    非常感谢你的建议。是的我t works, but only with one JSON array of one column field of one row (like your generated data) in my database. Now I have the problem, that the "Combine Documents" operator cannot handle that much documents it is getting from the "Data to Documents" operator. Rapid Miner does not respond when I activate this operator.

    Let me say I have 5000 rows in my database, follows that I will have 5000 JSON arrays from the BLOB column, follows 5000 documents. This might be a problem for the "Combine Documents" operator?

    I would generate some data for you, but I don't know how to generate more than one row of user specific data with the generate data operator.

    Many thanks in advance
    Micha

  • awchisholmawchisholm RapidMiner Certified Expert, MemberPosts:458Unicorn
    Hello Micha

    You can use the Append operator to join together the output from multiple Generate Data operators.

    I did this with the original process but the XML that gets created is not valid and because I don't understand the format of the data, it's difficult to know what to suggest.

    regards

    Andrew
  • rachel_lomaskyrachel_lomasky MemberPosts:52Guru

    Has there been any update to this method in the past years? The JSON I have doesn't make properly formed XML (for example, some of the elements are numbers). It would be nice to parse the JSON directly.

Sign InorRegisterto comment.