"Classloader problem integrating Hadoop to Rapidminer"
First of all, many thanks for your amazing job with Rapidminer. As everybody says, the Rapidminer team is composed of super heroes.
I'm creating an extension to integrate Rapidminer with Hadoop, Mahout, Hive and so on, and I'm getting the following exception when I try to submit a job:
Find below my Operator.doWork() code:
Do you have any idea how to fix it?
Thanks in advance,
But in fact, the class org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule is inside the extension jar, together with the other dependencies, and the same code runs fine when launched from a public static void main.
java.lang.RuntimeException: java.io.IOException: failure to login
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at com.rapidminer.operator.rmahout.clustering.KMeans.doWork(KMeans.java:116)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.rmahout.configuration.MastersNode.doWork(MastersNode.java:51)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.Process.run(Process.java:925)
at com.rapidminer.Process.run(Process.java:848)
at com.rapidminer.Process.run(Process.java:807)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:792)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Caused by: java.io.IOException: failure to login
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
... 18 more
Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org.apache.hadoop.security.UserGroupInformati
at javax.security.auth.login.LoginContext.invoke(Unknown Source)
at javax.security.auth.login.LoginContext.access$000(Unknown Source)
at javax.security.auth.login.LoginContext$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokeCreatorPriv(Unknown Source)
at javax.security.auth.login.LoginContext.login(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471)
As far as I could check, this is a class loader problem. Even if I put the dependencies inside the Rapidminer\lib directory, things still go wrong.
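One way to check a suspicion like this is to ask each class loader in play whether it can see the class named in the exception. The following is only a minimal diagnostic sketch: the class name ClassVisibilityCheck and the helper canLoad are made up for illustration, and it assumes the check is run on the same thread (and from a class inside the extension) as the failing Hadoop call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Rapidminer or Hadoop.
final class ClassVisibilityCheck {

    /** Returns true if the given loader can find the named class (without initializing it). */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the exception above; pass another name as the first argument if needed.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule";

        // The loaders that typically differ between a plain main() and a plugin environment.
        Map<String, ClassLoader> loaders = new LinkedHashMap<>();
        loaders.put("thread context", Thread.currentThread().getContextClassLoader());
        loaders.put("this class", ClassVisibilityCheck.class.getClassLoader());
        loaders.put("system", ClassLoader.getSystemClassLoader());

        for (Map.Entry<String, ClassLoader> entry : loaders.entrySet()) {
            System.out.println(entry.getKey() + " loader can see " + name + ": "
                    + canLoad(name, entry.getValue()));
        }
    }
}
```

If the loader of the extension's own classes can see the class but the thread context loader cannot, that would point to the lookup going through the context class loader rather than to a missing jar.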
public void doWork() throws OperatorException {
    ...
    Configuration config = new Configuration();
    config.set("fs.default.name", "hdfs://" + host + ":" + hdfsPort);
    config.set("mapred.job.tracker", host + ":" + mapredPort);
    JobConf job = new JobConf(config);
    FileInputFormat.setInputPaths(job, new Path("/user/beckmann/testdata"));
    FileOutputFormat.setOutputPath(job, new Path("b"));
I figured out that this problem is not related to class loading, and in fact it is not a Rapidminer problem.
The problem lies in a reported bug (https://issues.apache.org/jira/browse/HADOOP-7982), which was fixed in Hadoop 1.1.2.
When I moved from Hadoop 1.0.4 to 1.1.2, the problem disappeared, and my work is going ahead.
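For anyone who cannot upgrade: a workaround often tried for this kind of "unable to find LoginModule class" failure is to temporarily set the thread's context class loader to the loader that holds the Hadoop classes while the Hadoop call runs. The sketch below assumes that the JAAS lookup goes through the thread's context class loader and that the extension's own class loader can see the Hadoop jars; the class ContextClassLoaderSwap and the method callWith are hypothetical names, not Rapidminer or Hadoop API.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Rapidminer or Hadoop.
final class ContextClassLoaderSwap {

    /**
     * Runs the action with the given loader installed as the thread context
     * class loader, and always restores the previous loader afterwards.
     */
    public static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the plugin class loader: here simply the loader of this class.
        ClassLoader pluginLoader = ContextClassLoaderSwap.class.getClassLoader();
        boolean swapped = callWith(pluginLoader,
                () -> Thread.currentThread().getContextClassLoader() == pluginLoader);
        System.out.println("context loader swapped during call: " + swapped);
    }
}
```

Inside doWork(), the Hadoop calls (for example the FileInputFormat.setInputPaths(...) line) would go inside the lambda, with the operator's getClass().getClassLoader() as the loader. Whether this is enough for all of Hadoop's security code is not something this sketch can confirm.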
I'll let you know when everything is done,
Best regards,
The work on the "Rapidminer Hadoop extension" is going ahead, and it will certainly be a 100% open source extension, like the other Hadoop-related components before it.
Unfortunately, it will not be ready in time for RCOMM 2013.
Just to let you know and to help others avoid pitfalls: the Hadoop-related components have several security constraints,
and some class definitions and security contexts must be in the main class loader, not in the plugin class loader; otherwise we'll see strange behavior
during plugin execution.
To work around this, for now I have put all of Hadoop's dependency jars inside rapidminer.jar (as other components did),
but I would like to do this the right way and avoid creating a "proprietary" rapidminer.jar.
Does anyone know how to do this?
Best regards,