Troubleshooting

This article outlines common problems that can occur while upgrading RapidMiner Server.

Timeout during RapidMiner Server start

You might see the following log lines in the server.log file within the RapidMiner Server home directory:

ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[ ("core-service" => "management"), ("management-interface" => "native-interface") ]'
ERROR [org.jboss.as.controller.client] (Controller Boot Thread) JBAS014781: Step handler org.jboss.as.server.DeployerChainAddHandler$FinalRuntimeStepHandler@2821a6c1 for operation {"operation" => "add-deployer-chains","address" => []} at address [] failed handling operation rollback -- java.util.concurrent.TimeoutException

Explanation: After the initial upgrade, JBoss can take so long to start that it times out. This can happen when the new version adds a column and an existing table has to be migrated. If such a table is large, the migration can exceed the JBoss deployment timeout, which is 300 seconds by default.

Solution: The property jboss.as.management.blocking.timeout determines how long a deployment may take before JBoss aborts it. To work around the problem, temporarily increase the timeout by starting RapidMiner Server with the following command (use the .bat script on Windows): ./bin/standalone.sh -Djboss.as.management.blocking.timeout=3600. Once the upgrade has completed successfully, the increased timeout is no longer required.

Overlapping Job Container ports on a single host

With RapidMiner Server 9.5, the Job Container architecture changed fundamentally. Communication between a Job Agent and its Job Containers now requires system ports, which are used only locally on the machine on which the Job Agent is deployed.

If multiple Job Agents are hosted on a single shared machine, duplicate port definitions can result in the following log lines:

Job container '1' cannot be spawned, because port '10000' is not available
Job container '1' started successfully with PID 'null'.

This scenario occurs when multiple Job Agents hosted on a single machine define the same value for the jobagent.container.listenPortRangeStart property. To resolve the problem, define a distinct port range start for each Job Agent deployed on the same machine so that the ports of their Job Containers do not overlap.

Job Archive contains pending or running jobs

If you upgraded to 9.10.4 before all executions had finished (see the instructions on the changelog page), non-final executions appear in the Only archived executions view. This is expected because the underlying database migration only renamed the tables to have the a_ prefix.

Those jobs are also not picked up by the Job Cleanup because their state is not final.

To resolve this, you need to manually delete the archived jobs that are not in a final state from the Job Archive tables.

Remember to always back up your database before executing any destructive database operation.

Here's an example for Postgres of how to view all jobs in the archive table that are still in a non-final state:

SELECT * FROM a_jobservice_job WHERE state IN ('PENDING', 'STARTING', 'RUNNING');

If you execute a proper DELETE statement, ensure that it also cascades to referenced tables such as a_jobservice_job_error or a_jobservice_job_log.
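As a rough illustration, the cleanup could look like the following Postgres sketch. It is not an official script: the column names id and job_id are assumptions made for this example and may differ in your schema, and other tables may reference the archived jobs as well. Check the actual foreign keys first (for example with \d a_jobservice_job_error in psql) and run it only against a database you have backed up.

```sql
-- Sketch only. Assumed (hypothetical) column names:
--   a_jobservice_job.id            primary key of the archived job
--   a_jobservice_job_error.job_id  reference to the archived job
--   a_jobservice_job_log.job_id    reference to the archived job
BEGIN;

-- Delete dependent rows first so that no foreign key is violated.
DELETE FROM a_jobservice_job_error
WHERE job_id IN (
    SELECT id FROM a_jobservice_job
    WHERE state IN ('PENDING', 'STARTING', 'RUNNING')
);

DELETE FROM a_jobservice_job_log
WHERE job_id IN (
    SELECT id FROM a_jobservice_job
    WHERE state IN ('PENDING', 'STARTING', 'RUNNING')
);

-- Then delete the archived jobs that never reached a final state.
DELETE FROM a_jobservice_job
WHERE state IN ('PENDING', 'STARTING', 'RUNNING');

-- Compare the reported row counts with the SELECT result before committing.
-- Replace COMMIT with ROLLBACK to discard the changes instead.
COMMIT;
```

Wrapping the statements in a single transaction keeps the archive tables consistent: either all related rows are removed or none are.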