"Extract Rows from Text Value (JSON Array)"
Hi,
Is there any solution to extract rows from text (JSON)? I couldn't find an appropriate rapid miner operater.
From a database table I get text data (JSON Array) from a column of the type BLOB.
Data like this:
[{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"671","cursor_y":"452"},
{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"657","cursor_y":"462"},
{...},
{...}]
I now want to extract each {...} as a row with columns split by the "," char. At the and I want to get a table like this:
How can I transform the JSON Array into rows?
我试着“提取文档”- >”Document" with RegEx\{(.*?)\}-> "Documents To Data". This works, but "Extract Document" extracts only one doc/row specified by the "example index" param.
Best,
Micha
Is there any solution to extract rows from text (JSON)? I couldn't find an appropriate rapid miner operater.
From a database table I get text data (JSON Array) from a column of the type BLOB.
Data like this:
[{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"671","cursor_y":"452"},
{"timestamp":1368525704,"event_type":"dom.movement","cursor_x":"657","cursor_y":"462"},
{...},
{...}]
I now want to extract each {...} as a row with columns split by the "," char. At the and I want to get a table like this:
timestamp | event_type | cursor_x | cursor_y |
1368525704 | dom.movement | 671 | 452 |
1368525704 | dom.movement | 657 | 462 |
... | ... | ... | ... |
我试着“提取文档”- >”Document" with RegEx\{(.*?)\}-> "Documents To Data". This works, but "Extract Document" extracts only one doc/row specified by the "example index" param.
Best,
Micha
Tagged:
0
Answers
That's my normal method.
I already tried this before but the problem is JSON to XML need a doc as an input. I have to start with Data Table type which is the output of the Read Database operator. Thus it doesn't work for me.
After a bit of gymnastics I got the attached basically working. The input JSON is probably not quite what you read from your data - I had to add something around it - but it's a start and will give you some ideas. The tricks are
- Use Nominal to Text so that Data To Documents can be used
- Combine Documents makes a single document for JSON to XML to work on
- Use Cut Document to chop the XML document into chunks - the nifty bit is using Tokenize with the power of XPath to determine the number of rows automatically
- Use various Extract Information operators with XPath inside the Cut Document operator to obtain values from the cut documents individually. This creates meta data in the document that will automatically become attributes in the final example set when the Documents to Data operator is used.
You'll have to modify it for your needs. Hope it helps.
regards
Andrew
非常感谢你的建议。是的我t works, but only with one JSON array of one column field of one row (like your generated data) in my database. Now I have the problem, that the "Combine Documents" operator cannot handle that much documents it is getting from the "Data to Documents" operator. Rapid Miner does not respond when I activate this operator.
Let me say I have 5000 rows in my database, follows that I will have 5000 JSON arrays from the BLOB column, follows 5000 documents. This might be a problem for the "Combine Documents" operator?
I would generate some data for you, but I don't know how to generate more than one row of user specific data with the generate data operator.
Many thanks in advance
Micha
You can use the Append operator to join together the output from multiple Generate Data operators.
I did this with the original process but the XML that gets created is not valid and because I don't understand the format of the data, it's difficult to know what to suggest.
regards
Andrew
Has there been any update to this method in the past years? The JSON I have doesn't make properly formed XML (for example, some of the elements are numbers). It would be nice to parse the JSON directly.