Bug with Loop Values?

pari1234 · Member · Posts: 26 · Maven
edited November 2018 in Help

Hi RM team, I'm trying to call the Facebook Graph API with the Enrich Data by Webservice operator, which I'm running inside a Loop Values operator that outputs a collection of documents. The input data is an Excel sheet with a list of Facebook business page usernames. As I understand it, Loop Values should take each username in turn and return some Facebook content for each handle, but:

  • it only does this partially, and
  • each document in the collection from Loop Values should contain data for only one username; instead, it contains all the usernames, with only one row of data per user.

Attached:

  1. RM process
  2. Input Excel file
  3. JSON output from the Facebook API, captured with an API testing platform.

Any help would be greatly appreciated, as I'm on a bit of a deadline for this. Thank you.

PROCESS










<parameter key="excel_file" value="C:\Users\Pari\Documents\BDC\Socials\Facebook Scrapper\Test\TestHandles.xlsx"/>
<parameter key="imported_cell_range" value="A1:A5"/>
<parameter key="first_row_as_names" value="false"/>
<parameter key="0" value="Name"/>
<parameter key="0" value="Username.true.polynominal.attribute"/>

<parameter key="attribute" value="Username"/>

<parameter key="query_type" value="JsonPath"/>
<parameter key="message" value="$..message"/>
<parameter key="post id" value="$..id"/>
<parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
<parameter key="encoding" value="UTF-8"/>

















JSON Output

{
"data": [
{
"created_time": "2017-10-31T12:01:32+0000",
"message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
"id": "1563861787269208_1910035019318548"
},
{
"created_time": "2017-10-30T22:02:02+0000",
"message": "\"South Australia is about to get “Big Doctor”, cloud-based artificial intelligence that analyses our health and intervenes when it spots something amiss.\"-Brad Crouch",
"id": "1563861787269208_1909800592675324"
},
{
"created_time": "2017-10-30T21:21:00+0000",
"message": "Why you should welcome Artificial Intelligence with open arms",
"id": "1563861787269208_1909790786009638"
},
{
"created_time": "2017-10-30T12:00:59+0000",
"message": "\"AI will put bankers out of work? Some people think these advances will boost productivity, enabling industries to actually increase the number of jobs\"",
"id": "1563861787269208_1909600706028646"
},
{
"created_time": "2017-10-27T12:01:38+0000",
"message": "What's Elon Musks stance on Artificial Intelligence?",
"id": "1563861787269208_1908177749504275"
}
],
"paging": {
"cursors": {
"before": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5TXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9qazFPVEE1TURrME56STJOelkzTlRZAeU9ROE1ZAWEJwWDNOMGIzSjVYMmxrRHlFeE5UWXpPRFl4TnpnM01qWTVNakE0WHpFNU1UQXdNelV3TVRrek1UZAzFORGdQQkhScGJXVUdXZAmhtSEFFPQZDZD",
"after": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
},
"next": "https://graph.facebook.com/v2.10/1563861787269208/posts?pretty=1&limit=5&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
}
}
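For reference, the result being asked for (one row per post, rather than one row per username) can be sketched in plain Python. This is not RapidMiner code and says nothing about how the operators work internally; it just flattens the "data" array of a response shaped like the one above. The helper name posts_to_rows and the username "example_page" are hypothetical, and the response is abbreviated to two of the five posts.

```python
import json

# Abbreviated copy of the sample response above: two of the five posts,
# without the "paging" object.
RESPONSE = """
{
  "data": [
    {
      "created_time": "2017-10-31T12:01:32+0000",
      "message": "Click to read news on #Tableau latest conference.\\n#BigData #Tech",
      "id": "1563861787269208_1910035019318548"
    },
    {
      "created_time": "2017-10-30T21:21:00+0000",
      "message": "Why you should welcome Artificial Intelligence with open arms",
      "id": "1563861787269208_1909790786009638"
    }
  ]
}
"""


def posts_to_rows(username, payload):
    """Return one row (dict) per post in a Graph API /posts response."""
    rows = []
    for post in payload.get("data", []):
        rows.append({
            "username": username,
            # .get() tolerates a post that lacks one of these fields.
            "post_id": post.get("id"),
            "created_time": post.get("created_time"),
            "message": post.get("message"),
        })
    return rows


rows = posts_to_rows("example_page", json.loads(RESPONSE))
for row in rows:
    print(row["username"], row["post_id"], row["created_time"])
```

Each username then contributes as many rows as its response has posts, which is the shape the thread is trying to reach.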

Best Answer

  • sgenzer · Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator · Posts: 2,959 · Community Manager
    Solution Accepted

    Ah, I see. Sorry about that. :) This is a common challenge that we are currently working on: parsing JSON arrays returned by a web service. There are a couple of workarounds you can use in the meantime; converting the JSON to XML is probably the easiest, because RapidMiner handles XML much, much better than JSON in its current version.
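The idea behind the XML workaround can be illustrated in Python: turn the JSON into an XML tree in which every item of the "data" array becomes its own data element, then select those elements and read created_time, id and message from each, one row per element. The element layout (a json root with one data child per array item) is an assumption inferred from the xpath_for_examples value //json/data in the process that follows; RapidMiner's own JSON-to-XML conversion may differ in detail. json_to_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def json_to_xml(value, tag):
    """Convert parsed JSON into an Element named `tag`.

    Assumed layout: every item of a JSON list becomes a repeated element
    named after the list's key, so {"data": [a, b]} -> <data>a</data><data>b</data>.
    Lists nested directly inside lists are stringified, not expanded.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            children = child if isinstance(child, list) else [child]
            for item in children:
                elem.append(json_to_xml(item, key))
    elif value is not None:
        elem.text = str(value)
    return elem


payload = {
    "data": [
        {"created_time": "2017-10-31T12:01:32+0000",
         "message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
         "id": "1563861787269208_1910035019318548"},
        {"created_time": "2017-10-30T21:21:00+0000",
         "message": "Why you should welcome Artificial Intelligence with open arms",
         "id": "1563861787269208_1909790786009638"},
    ]
}

root = json_to_xml(payload, "json")

# With <json> as the root, root.findall("data") selects the same nodes as the
# XPath //json/data; findtext() returns the first child's text, like tag[1]/text().
rows = [
    (d.findtext("created_time"), d.findtext("id"), d.findtext("message"))
    for d in root.findall("data")
]
for row in rows:
    print(row)
```

Because each post is a separate data element, the example-level XPath yields one example per post instead of one per username.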










    <parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <parameter key="0" value="Name"/>
    <parameter key="encoding" value="UTF-8"/>
    <parameter key="0" value="Username.true.polynominal.attribute"/>

    <parameter key="query_type" value="Regular Expression"/>
    <parameter key="jsonResponse" value=".*"/>

    <parameter key="message" value="$..message"/>
    <parameter key="post id" value="$..id"/>
    <parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
    <parameter key="encoding" value="UTF-8"/>

    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>

    <parameter key="select_attributes_and_weights" value="true"/>
    <parameter key="jsonResponse" value="1.0"/>

    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>

    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
    <parameter key="xpath_for_examples" value="//json/data"/>
    <parameter key="xpath_for_attribute" value="created_time[1]/text()"/>
    <parameter key="xpath_for_attribute" value="id[1]/text()"/>
    <parameter key="xpath_for_attribute" value="message[1]/text()"/>
    <parameter key="use_default_namespace" value="false"/>
    <parameter key="0" value="created_time[1]/text().true.attribute_value.attribute"/>
    <parameter key="1" value="id[1]/text().true.attribute_value.attribute"/>
    <parameter key="2" value="message[1]/text().true.attribute_value.attribute"/>

















    <operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
    <parameter key="set_iteration_macro" value="true"/>

    <parameter key="index" value="%{iteration}"/>

    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{iteration}==1"/>

    <parameter key="name" value="LoopData"/>

    <parameter key="name" value="LoopData"/>

    <parameter key="index" value="%{iteration}"/>




















    Is this better?


    Scott


Answers

  • MartinLiebig · Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor · Posts: 3,314 · RM Data Scientist

    Hi,

    Are you sure this is not caused by a limit on the API side? Have you tried deactivating parallel execution in Loop Values and adding a pause between calls with a Delay operator?

    Edit: Never mind, that was beside the point.

    Cheers,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • sgenzer · Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator · Posts: 2,959 · Community Manager

    Hi @pari1234, yes, I understand what you're trying to do. You're working too hard. :) "Enrich Data by Webservice" already goes through your Username values one by one, feeding each one to your API and collecting the response, so you don't need Loop Values. That is also why Enrich Data has a "delay" parameter: it is good practice to put a delay of 200 ms or more between queries to avoid overloading the server.
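What "one request per row, with a delay" means can be sketched in Python: substitute each username into the URL (the role of the &lt;%Username%&gt; placeholder in the process), call the API, and pause between calls. This is an illustrative sketch, not the operator's implementation; enrich, http_fetch, fake_fetch and the YOUR_ACCESS_TOKEN placeholder are hypothetical names, and the v2.10 URL is simply the one used in this thread.

```python
import json
import time
import urllib.parse
import urllib.request

# Placeholder token: keep real tokens out of shared code and forum posts.
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
URL_TEMPLATE = "https://graph.facebook.com/v2.10/{username}/posts?access_token={token}"


def http_fetch(url):
    """Fetch a URL and parse the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def enrich(usernames, fetch=http_fetch, delay_s=0.2):
    """Call the API once per username, pausing delay_s seconds between calls."""
    results = {}
    for i, username in enumerate(usernames):
        if i > 0:
            time.sleep(delay_s)  # roughly the 200 ms spacing recommended above
        url = URL_TEMPLATE.format(
            username=urllib.parse.quote(username, safe=""), token=ACCESS_TOKEN
        )
        results[username] = fetch(url)
    return results


# Offline usage: a stand-in fetch that just echoes the URL it was given.
def fake_fetch(url):
    return {"data": [], "requested": url}


out = enrich(["page_a", "page_b"], fetch=fake_fetch, delay_s=0.0)
print(out["page_b"]["requested"])
```

Passing the fetch function in keeps the looping and throttling logic testable without touching the network.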










    <parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <parameter key="0" value="Name"/>
    <parameter key="encoding" value="UTF-8"/>
    <parameter key="0" value="Username.true.polynominal.attribute"/>

    <parameter key="query_type" value="JsonPath"/>

    <parameter key="message" value="$..message"/>
    <parameter key="post id" value="$..id"/>
    <parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
    <parameter key="encoding" value="UTF-8"/>









    (FYI, it's probably not a good idea to post your access token in an open forum like this. :) )


    Scott

  • pari1234 · Member · Posts: 26 · Maven

    Thanks @sgenzer, I appreciate your response. The token I'm using is for an unpublished app on Facebook, and I will change it once I have exhausted the RM community resources. :) In the XML process you replied with, the key problem remains: I only get one row of data per username, i.e. one post and one post ID. I want all the posts (up to whatever pagination limit Facebook applies) and their post IDs for each username. The sample JSON output from the Graph API has 5 posts, each with an ID and a timestamp; in other words, 5 rows of data for the given username. That is why I thought a loop might solve this for me. I hope that makes it clearer. Thank you.

    Pari
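Getting every post for a username, not just the first page, means following the "paging" → "next" link that the sample response includes until it is no longer present. A minimal Python sketch, assuming the last page carries no "next" link; iter_all_posts, the max_pages safety cap and the FAKE_PAGES data are hypothetical, and fetch stands for any function that takes a URL and returns the parsed JSON.

```python
def iter_all_posts(first_url, fetch, max_pages=10):
    """Yield every post, following paging.next until it is missing.

    Assumes the last page has no "paging" -> "next" entry; max_pages is a
    safety cap so a misbehaving response cannot loop forever.
    """
    url = first_url
    pages = 0
    while url and pages < max_pages:
        payload = fetch(url)
        yield from payload.get("data", [])
        url = payload.get("paging", {}).get("next")
        pages += 1


# Offline usage with two fake pages standing in for the Graph API.
FAKE_PAGES = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}]},
}
ids = [post["id"] for post in iter_all_posts("page1", FAKE_PAGES.__getitem__)]
print(ids)
```

The cap is a design choice for unattended runs: it bounds the number of requests per username even if the API keeps returning a "next" link.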

  • pari1234 · Member · Posts: 26 · Maven

    Thank you VERY much, @sgenzer! This one helps. There was a minor hiccup with the file encoding in "Read XML", but I changed the encoding of "Write Document" from SYSTEM to UTF-16 and it seems to be working perfectly. Thank you!
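The thread does not say exactly what went wrong, but a common cause of this kind of hiccup is the writer and the reader disagreeing about the file's encoding. The Python sketch below only illustrates that general effect, not what RapidMiner did here: an XML document written and parsed as UTF-16 keeps the curly quotes from one of the sample posts intact, while UTF-8 bytes read as Windows-1252 turn them into mojibake.

```python
import xml.etree.ElementTree as ET

text = "\u201cBig Doctor\u201d"  # the curly quotes from one of the sample posts

# Writer and reader agree on UTF-16: the characters survive the round trip.
root = ET.Element("json")
ET.SubElement(root, "message").text = text
data = ET.tostring(root, encoding="utf-16")  # bytes, with an XML declaration
parsed = ET.fromstring(data)
print(parsed.findtext("message") == text)

# Writer uses UTF-8, reader assumes Windows-1252: classic mojibake.
garbled = text.encode("utf-8").decode("cp1252", errors="replace")
print(garbled)
```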
