Creative Misuse of RapidMiner

IngoRM (RM Founder)
edited November 2018 in Knowledge Base

One of the most fun events at the RapidMiner Wisdom conference is the live predictive analytics process design competition "Who Wants to be a Data Miner?" In this competition, participants must design RapidMiner processes for a given goal within a few minutes. The tasks are related to predictive analytics and data analysis in general, but are rather uncommon. In fact, most of the challenges ask for things RapidMiner was never supposed to do.

During RapidMiner Wisdom 2016 in New York City, we again had two tasks prepared for the audience. Three brave contestants battled against each other and the clock to find the right solution (or at least something close enough). The first task this year was:

Create the full lyrics to “99 Bottles of Beer on the Wall”

According to Wikipedia, “99 Bottles of Beer is an anonymous United States folk song dating to the mid-20th century. It is a traditional song in both the United States and Canada. It is popular to sing on long trips, as it has a very repetitive format which is easy to memorize, and can take a long time to sing.”

Well, yeah. Some say that there are numerous problems with this song, but that is, although an interesting thought, not the subject of this post. (By the way, the song has appeared many times in popular culture as well: maybe most notably, at least for some, in the game Monkey Island.)

Anyway, here is how the song goes:

99 bottles of beer on the wall, 99 bottles of beer.

Take one down and pass it around, 98 bottles of beer on the wall.

98 bottles of beer on the wall, 98 bottles of beer.

Take one down and pass it around, 97 bottles of beer on the wall.

97 bottles of beer on the wall, 97 bottles of beer.

Take one down and pass it around, 96 bottles of beer on the wall.

1 bottle of beer on the wall, 1 bottle of beer.

Take one down and pass it around, 0 bottles of beer on the wall.

Full lyrics can be found here, but I think you get the idea.

So how can we solve the task above with RapidMiner?

Let’s start with a screenshot of the solution first:

ingo1.png

We start with the operator “Generate Data” and generate a random data set with only 1 column and 100 examples (make the appropriate settings in the parameters of the operator). This is maybe not the most elegant way, but it is one of the easiest ways in RapidMiner to get a data set with a specific structure and size. As a next step, we need the numbers from 1 to 100 in an extra column. Again, there are multiple ways to achieve this, but the simplest is probably to use the operator “Generate ID”, which does exactly that. We can now use “Select Attributes” to remove the column originally generated by “Generate Data”, i.e. we only keep our new “id” column. The result is a data set with 100 rows and the numbers 1 to 100 in one column named “id”.
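For readers who think in code rather than operators, the three preparation steps can be sketched roughly as follows. This is only an illustration in plain Python, not what RapidMiner does internally; the column name "att1" is an assumption made up for this sketch.

```python
import random

# Step 1 ("Generate Data"): 100 rows with a single random column.
# The column name "att1" is an assumption for illustration only.
rows = [{"att1": random.random()} for _ in range(100)]

# Step 2 ("Generate ID"): add a running number from 1 to 100.
for i, row in enumerate(rows, start=1):
    row["id"] = i

# Step 3 ("Select Attributes"): keep only the "id" column.
rows = [{"id": row["id"]} for row in rows]
```

After these steps, `rows` holds 100 entries whose only field is "id", counting from 1 to 100.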

Now all the logic happens in the next operator: “Generate Attributes”. The main problem to solve is how to transform the sequence of numbers from 1 to 100 into a sequence from 99 to 0. Well, that is easy: we can just generate a new value by subtracting the current “id” in each row from 100. At the same time, we add the rest of the lyrics around those numbers. Here is how you need to set the parameters of “Generate Attributes” to achieve this:

ingo2.png
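The same idea, subtracting the “id” from 100 and wrapping text around the result, can be sketched in Python. The function name and the column names "line1" and "line2" are hypothetical and chosen just for this sketch; the actual expressions used in the process are the ones shown in the screenshot.

```python
def generate_attributes(row_id):
    """Build the lyric columns for one row (column names are hypothetical)."""
    bottles = 100 - row_id  # turns id 1..100 into 99..0
    return {
        "id": row_id,
        "line1": f"{bottles} bottles of beer on the wall, {bottles} bottles of beer.",
        "line2": f"Take one down and pass it around, {bottles - 1} bottles of beer on the wall.",
    }

# One row per id, just like the data set with 100 examples.
table = [generate_attributes(row_id) for row_id in range(1, 101)]
```

Note that this sketch is deliberately as naive as the description above: it makes no attempt to handle the singular “1 bottle” or the end of the song.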

Now you could even concatenate all these new columns into a single one if you want to. I leave it to you to figure out how. The final result after executing the process then looks like the following screenshot (only showing the beginning):

ingo3.png

If you run the process yourself, check out the last line as well. I admit that we could handle this a bit better since the created lyrics end on: “0 bottles of beer on the wall, 0 bottles of beer. Take one down and pass it around. -1 bottles of beer on the wall.” Well, there is nothing wrong with -1 bottles of beer for mathematicians and physicists but some IT systems might not like negative numbers of objects.
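One possible way to avoid the negative bottle count is to stop one row earlier, so that the count only runs from 99 down to 1. The following Python sketch illustrates that option under this assumption; it is not part of the original process, and the helper names are invented for the example. It also takes care of the singular “1 bottle”.

```python
def bottles(n):
    """Return the count with the right singular or plural noun."""
    return f"{n} bottle" + ("" if n == 1 else "s")

def verse(n):
    """Build one verse for n bottles currently on the wall."""
    return (
        f"{bottles(n)} of beer on the wall, {bottles(n)} of beer.\n"
        f"Take one down and pass it around, {bottles(n - 1)} of beer on the wall."
    )

# Only ids 1..99 are used, so the count runs from 99 down to 1
# and the very last line ends on 0 bottles instead of -1.
lyrics = [verse(100 - row_id) for row_id in range(1, 100)]
```

In terms of the process described above, this would presumably correspond to generating 99 examples instead of 100.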

Using RapidMiner for tasks like this is of course a bit, well, strange. But it also shows how flexible and powerful the visual approach of RapidMiner actually is. Others have created solutions in practically every programming language on earth, some shorter and some longer than others. But I would always prefer the RapidMiner solution over the code of most of them.

Below is the XML of the complete process. You can save it into an arbitrary file on your system and use “File -> Import Process…” to get it into RapidMiner.

Have fun trying this out!

XML of the Process:
