Passing examples to an operator (API) incrementally

batstache611 Member Posts: 45 Guru
edited December 2018 in Help

Hi everybody,

I'm sorry, I tried to find a solution to this but wasn't successful. I'm trying to pass examples to one of the Twitter operators, which GETs tweets via one of Twitter's APIs. However, it has a rate limit of 450 tweets per 15 minutes. I have a list of Twitter handles whose tweets I want to collect, and I'm using Loop Values to iterate through them. In the Twitter operator, I can configure how many tweets per handle to GET. Right now my process gets 30 tweets per handle with a delay of 1 minute between handles, which works out to approximately 450 tweets every 15 minutes. If I wanted more tweets per handle, I'd have to increase the delay between handles so that the process never exceeds the rate limit. This applies not only to Twitter's API; most APIs have limits of this kind.
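The delay arithmetic described above can be written down once instead of being recalculated by hand. The following is only a minimal sketch in Python, outside RapidMiner; the function name `delay_between_handles` and its parameters are made up for illustration, and the 450-per-15-minutes figure is taken from the question as stated.

```python
def delay_between_handles(tweets_per_handle: int,
                          limit: int = 450,
                          window_seconds: int = 15 * 60) -> int:
    """Return the seconds to wait after each handle so that the running
    total stays within `limit` tweets per `window_seconds`."""
    if tweets_per_handle <= 0:
        raise ValueError("tweets_per_handle must be positive")
    if tweets_per_handle > limit:
        raise ValueError("a single handle would already exceed the limit")
    # Integer ceiling division: round the delay up, never down.
    return -(-window_seconds * tweets_per_handle // limit)


print(delay_between_handles(30))   # 60 seconds, the current setup
print(delay_between_handles(150))  # 300 seconds
```

With 30 tweets per handle this reproduces the 1-minute delay already in use; the result could be fed into the Delay operator, for example through a macro.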

But instead of recalculating the delay every time I want to increase the number of tweets per handle, I'd like a way to grab 4 handles from the ExampleSet at a time, get 150 tweets for each, wait 15 minutes, and then move on to the next 4 handles. What would be the simplest way to do this? Attached is my process. Thank you.
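The batching idea can be sketched in Python as follows. This is an illustration of the logic only, not a RapidMiner process: `fetch` is a hypothetical stand-in for the Search Twitter operator, and all other names are assumptions. Note that 4 handles at 150 tweets each is 600 tweets, which is above the 450 figure quoted earlier, so the sketch derives the batch size from the limit (3 handles at 150 tweets) rather than hard-coding 4.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence


def batches(handles: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` handles."""
    for start in range(0, len(handles), size):
        yield list(handles[start:start + size])


def collect_in_batches(
    handles: Sequence[str],
    fetch: Callable[[str, int], List[str]],  # hypothetical stand-in for the API call
    tweets_per_handle: int = 150,
    limit: int = 450,
    window_seconds: int = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Fetch tweets batch by batch, pausing one full window between batches."""
    if tweets_per_handle <= 0:
        raise ValueError("tweets_per_handle must be positive")
    batch_size = limit // tweets_per_handle  # handles that fit into one window
    if batch_size < 1:
        raise ValueError("a single handle would already exceed the limit")
    groups = list(batches(handles, batch_size))
    results: Dict[str, List[str]] = {}
    for index, group in enumerate(groups):
        for handle in group:
            results[handle] = fetch(handle, tweets_per_handle)
        if index < len(groups) - 1:  # no need to wait after the last batch
            sleep(window_seconds)
    return results


def fake_fetch(handle: str, count: int) -> List[str]:
    """Placeholder that fabricates strings instead of calling any API."""
    return [f"{handle} tweet {i}" for i in range(count)]


waits: List[float] = []  # record the pauses instead of actually sleeping
tweets = collect_in_batches([f"h{i}" for i in range(1, 8)], fake_fetch,
                            sleep=waits.append)
print(len(tweets), waits)
```

Passing `sleep=waits.append` keeps the demonstration instant; leaving the default `time.sleep` in place would make the loop actually wait 15 minutes between batches.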







<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">

<operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">












<operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">



<operator activated="true" class="handle_exception" compatibility="7.5.001" expanded="true" height="82" name="Handle Exception" width="90" x="179" y="340">



<operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="179" y="187">


















<operator activated="true" class="store" compatibility="7.5.001" expanded="true" height="68" name="Store" width="90" x="380" y="340">


<operator activated="true" class="delay" compatibility="7.5.001" expanded="true" height="82" name="Delay" width="90" x="581" y="340">









<portSpacing port="sink_output 2" spacing="0"/>


<operator activated="true" class="subprocess" compatibility="7.5.001" expanded="true" height="82" name="Union Append" width="90" x="648" y="34">

<operator activated="true" class="loop_collection" compatibility="7.5.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">





<operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">










<operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">


<operator activated="true" class="union" compatibility="7.5.001" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>









<operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">







<portSpacing port="sink_output 2" spacing="0"/>


<operator activated="true" class="select" compatibility="7.5.001" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">




















Answers

  • FBT Member Posts: 106 Unicorn

    I think one way of solving this would be to use the "Multiply" and "Filter Examples" operators. If you need to run this process only once, and if the number of handles is manageable, the solution is fairly simple:

    1. Generate a rank attribute (it may be quicker to do this directly in your source file). This simply assigns a unique number from 1 to X (where X is the total number of handles) to each handle. It is then used for filtering.

    2. Multiply your dataset as many times as required. From the information you gave, you would need X/4 copies of your dataset (rounded up).

    3. Filter examples by rank: 1 - 4; 5 - 8; 9 - 12; etc., with each range handled in a different thread from your Multiply operator.

    4. Run the process as you have it now (just adapt the delay accordingly).

    General note: make sure that the process order (within a multiply thread) is correct.
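The rank-and-filter steps above boil down to a small piece of arithmetic, sketched here in Python rather than as RapidMiner operators. The function name `assign_groups` and the sample handles are invented for illustration; the group size of 4 follows the question.

```python
import math
from typing import Dict, List, Sequence, Tuple


def assign_groups(handles: Sequence[str],
                  group_size: int = 4) -> Dict[int, List[Tuple[int, str]]]:
    """Give each handle a rank starting at 1 and bucket the ranks into
    consecutive ranges: 1-4 -> group 1, 5-8 -> group 2, and so on."""
    groups: Dict[int, List[Tuple[int, str]]] = {}
    for rank, handle in enumerate(handles, start=1):
        group = (rank - 1) // group_size + 1
        groups.setdefault(group, []).append((rank, handle))
    return groups


handles = [f"handle_{i}" for i in range(1, 11)]  # made-up sample of X = 10 handles
groups = assign_groups(handles)

# Number of copies the Multiply step would need: X / 4, rounded up.
copies_needed = math.ceil(len(handles) / 4)
print(copies_needed, [[rank for rank, _ in members] for members in groups.values()])
```

Each group corresponds to one filtered copy of the dataset; the last group is simply shorter when X is not a multiple of 4.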

    If you have a huge list of handles, the filtering can be solved more elegantly in a different loop, but it requires some slightly more elaborate logic to make sure the correct handles are selected.

    If the process is meant to run constantly, you would need to put everything in yet another loop, making sure to configure the delay in such a way that it doesn't exceed the API limit.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi, another option is to use the Twitter Streaming API instead of the out-of-the-box search operator. I have not used it myself, but my understanding is that it may be a better option for use cases such as yours: https://dev.twitter.com/streaming/public

    Scott
