Bug with Loop Values?
Hi RM team, I'm trying to call the Facebook Graph API using the Enrich Data by Webservice operator, which I've placed inside a Loop Values operator that outputs a collection of documents. The input data is a csv with a bunch of Facebook business page usernames. As far as I understand, the Loop Values operator is supposed to grab each username and return some Facebook content for each handle, but:
- it is only doing that partially
- each document in the collection from Loop Values should contain data for only one username; instead, it contains all the usernames and only one row of data per user.
Attached:
- RM process
- Input Excel file
- JSON output from the Facebook API, captured with an API testing platform.
Any help will be greatly appreciated as I'm kind of on a deadline for this. Thank you.
PROCESS
<parameter key="excel_file" value="C:\Users\Pari\Documents\BDC\Socials\Facebook Scrapper\Test\TestHandles.xlsx"/>
<parameter key="imported_cell_range" value="A1:A5"/>
<parameter key="first_row_as_names" value="false"/>
<parameter key="0" value="Name"/>
<parameter key="0" value="Username.true.polynominal.attribute"/>
<parameter key="attribute" value="Username"/>
<parameter key="query_type" value="JsonPath"/>
<parameter key="message" value="$..message"/>
<parameter key="post id" value="$..id"/>
<parameter key="url" value="https://graph.facebook.com/v2.10/<%Username%>/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
<parameter key="encoding" value="UTF-8"/>
JSON Output
{
"data": [
{
"created_time": "2017-10-31T12:01:32+0000",
"message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
"id": "1563861787269208_1910035019318548"
},
{
"created_time": "2017-10-30T22:02:02+0000",
"message": "\"South Australia is about to get “Big Doctor”, cloud-based artificial intelligence that analyses our health and intervenes when it spots something amiss.\"-Brad Crouch",
"id": "1563861787269208_1909800592675324"
},
{
"created_time": "2017-10-30T21:21:00+0000",
"message": "Why you should welcome Artificial Intelligence with open arms",
"id": "1563861787269208_1909790786009638"
},
{
"created_time": "2017-10-30T12:00:59+0000",
"message": "\"AI will put bankers out of work? Some people think these advances will boost productivity, enabling industries to actually increase the number of jobs\"",
"id": "1563861787269208_1909600706028646"
},
{
"created_time": "2017-10-27T12:01:38+0000",
"message": "What's Elon Musks stance on Artificial Intelligence?",
"id": "1563861787269208_1908177749504275"
}
],
"paging": {
"cursors": {
"before": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5TXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9qazFPVEE1TURrME56STJOelkzTlRZAeU9ROE1ZAWEJwWDNOMGIzSjVYMmxrRHlFeE5UWXpPRFl4TnpnM01qWTVNakE0WHpFNU1UQXdNelV3TVRrek1UZAzFORGdQQkhScGJXVUdXZAmhtSEFFPQZDZD",
"after": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
},
"next": "https://graph.facebook.com/v2.10/1563861787269208/posts?pretty=1&limit=5&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
}
}
Best Answer
sgenzer (Community Manager)
Ah, I see. Sorry about that. This is a common challenge that we are currently working on: parsing JSON arrays returned by a webservice. There are a couple of workarounds you can use in the meantime; converting to XML is probably the easiest. RapidMiner handles XML much, much better than JSON in its current version.
<parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
<parameter key="first_row_as_names" value="false"/>
<parameter key="0" value="Name"/>
<parameter key="encoding" value="UTF-8"/>
<parameter key="0" value="Username.true.polynominal.attribute"/>
<parameter key="query_type" value="Regular Expression"/>
<parameter key="jsonResponse" value=".*"/>
<parameter key="message" value="$..message"/>
<parameter key="post id" value="$..id"/>
<parameter key="url" value="https://graph.facebook.com/v2.10/<%Username%>/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
<parameter key="encoding" value="UTF-8"/>
<parameter key="first_example" value="%{example}"/>
<parameter key="last_example" value="%{example}"/>
<parameter key="select_attributes_and_weights" value="true"/>
<parameter key="jsonResponse" value="1.0"/>
<parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
<parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
<parameter key="xpath_for_examples" value="//json/data"/>
<parameter key="xpath_for_attribute" value="created_time[1]/text()"/>
<parameter key="xpath_for_attribute" value="id[1]/text()"/>
<parameter key="xpath_for_attribute" value="message[1]/text()"/>
<parameter key="use_default_namespace" value="false"/>
<parameter key="0" value="created_time[1]/text().true.attribute_value.attribute"/>
<parameter key="1" value="id[1]/text().true.attribute_value.attribute"/>
<parameter key="2" value="message[1]/text().true.attribute_value.attribute"/>
<operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="index" value="%{iteration}"/>
<parameter key="condition_type" value="expression"/>
<parameter key="expression" value="%{iteration}==1"/>
<parameter key="name" value="LoopData"/>
<parameter key="name" value="LoopData"/>
<parameter key="index" value="%{iteration}"/>
Is this better?
Scott
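For anyone who wants to see the JSON-to-XML idea outside RapidMiner, here is a rough Python sketch. It is not what the process above does internally; it only illustrates the shape of the workaround: wrap the response in a `json` root with one `data` element per post, so that an XPath like `//json/data` yields one row per post. The function name `json_to_xml` is made up for this example, and the sample posts are taken from the JSON output in the question.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(response_text):
    """Build a <json> root with one <data> child per entry of the "data" array.

    Assumes every key in a post is a valid XML element name
    (true for created_time, message, and id).
    """
    payload = json.loads(response_text)
    root = ET.Element("json")
    for post in payload.get("data", []):
        data = ET.SubElement(root, "data")
        for key, value in post.items():
            ET.SubElement(data, key).text = str(value)
    return root


# Two posts copied from the sample JSON output in the question.
sample = json.dumps({
    "data": [
        {"created_time": "2017-10-31T12:01:32+0000",
         "message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
         "id": "1563861787269208_1910035019318548"},
        {"created_time": "2017-10-30T21:21:00+0000",
         "message": "Why you should welcome Artificial Intelligence with open arms",
         "id": "1563861787269208_1909790786009638"},
    ]
})

root = json_to_xml(sample)
xml_text = ET.tostring(root, encoding="unicode")

# One dict per <data> element, i.e. one row per post.
rows = [{child.tag: child.text for child in data} for data in root.findall("./data")]
```

Each repeated `data` element becomes its own row, which is the part that the JsonPath query on the raw array was not delivering.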
Answers
Hi,
Are you sure that this is not caused by a limit on the API? Have you tried deactivating the parallelism of Loop Values and adding a delay (with a Delay operator)?
Edit: Never mind, that was out of scope.
Cheers,
Martin
Dortmund, Germany
Hi @pari1234, yes, I understand what you're trying to do. You're working too hard. "Enrich Data by Webservice" already goes through your values for Username one by one, feeding each one to your API and getting a response. You don't need Loop Values. That is also why there's a "delay" parameter in Enrich Data: it is good practice to put a delay of 200 ms or more between queries (to avoid overloading the server).
(FYI it's probably not a good idea to post your token in an open forum like this)
Scott
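To make the "one by one, with a delay" behaviour concrete, here is a minimal Python sketch of the same pattern. It is only an illustration and says nothing about how the operator is implemented. `enrich`, `fake_fetch`, and the page names are hypothetical; `fake_fetch` stands in for the real HTTP call so the example runs without network access or a token.

```python
import time


def enrich(usernames, fetch, delay_seconds=0.2):
    """Call fetch once per username, pausing between consecutive calls."""
    results = {}
    for i, username in enumerate(usernames):
        if i > 0:
            # Pause before every call except the first to go easy on the server.
            time.sleep(delay_seconds)
        results[username] = fetch(username)
    return results


def fake_fetch(username):
    # Hypothetical stand-in for the webservice call; returns a canned response.
    return {"data": [{"id": f"{username}_1", "message": "example post"}]}


responses = enrich(["pageA", "pageB"], fake_fetch, delay_seconds=0.2)
```

The loop over usernames lives inside `enrich`, so wrapping it in a second loop over the same values would only repeat the work.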
Thanks @sgenzer, I appreciate your response. The token I'm using is for an unpublished app on FB, and I will change it once I have exhausted RM community resources :) . In the XML process that you replied with, the key problem still remains: I only get one row of data per username, i.e. one post and one post id. However, I want to get all the posts (up to whatever pagination limit Facebook has) and post ids per username. If you look at the sample JSON output from the Graph API, it has 5 posts with ids and a timestamp, or in other words 5 rows of data for the given username. That is why I thought using a loop might solve it for me. I hope that helps you understand it better. Thank you.
Pari
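The desired result described above (one row per post, following pagination) can be sketched in Python as below. This is an assumption-laden illustration, not a RapidMiner process: `collect_posts` and `fetch_page` are hypothetical names, the access token is left out of the URL, and the pages are canned dictionaries shaped like the sample JSON output (a `data` array plus an optional `paging.next` link).

```python
def collect_posts(username, fetch_page, max_pages=10):
    """Return one row per post for a username, following paging.next links."""
    rows = []
    # Access token deliberately omitted; a real call would need one.
    url = f"https://graph.facebook.com/v2.10/{username}/posts"
    for _ in range(max_pages):
        page = fetch_page(url)
        for post in page.get("data", []):
            rows.append({
                "username": username,
                "post_id": post.get("id"),
                "message": post.get("message"),
                "created_time": post.get("created_time"),
            })
        # Stop when the response carries no link to a further page.
        url = page.get("paging", {}).get("next")
        if not url:
            break
    return rows


# Canned pages keyed by URL; "page-2" is a placeholder for a real "next" link.
first_url = "https://graph.facebook.com/v2.10/examplepage/posts"
pages = {
    first_url: {
        "data": [
            {"id": "p1", "message": "first", "created_time": "2017-10-31T12:01:32+0000"},
            {"id": "p2", "message": "second", "created_time": "2017-10-30T22:02:02+0000"},
        ],
        "paging": {"next": "page-2"},
    },
    "page-2": {
        "data": [
            {"id": "p3", "message": "third", "created_time": "2017-10-30T21:21:00+0000"},
        ],
    },
}

rows = collect_posts("examplepage", pages.__getitem__)
```

The `max_pages` cap is a safety limit chosen for the example so a long feed cannot loop indefinitely.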
Thank you VERY much @sgenzer. This one helps. There was a minor hiccup with the file encoding during "Read XML", but I changed the encoding for "Write Document" from SYSTEM to UTF-16 and it seems to be working perfectly! Thank you!