Automatic Job Cleanup

RapidMiner Server automatically saves information related to recently executed jobs. This includes the user who triggered the execution, the job's state, the queue on which the job was executed, and the date on which the process was executed. This information can be reviewed on the Executions page within RapidMiner Server. Additionally, the Job Agent responsible for executing the job creates a dedicated working directory for it.

This stored information and the working directories can grow large. To avoid this, RapidMiner Server provides a job cleanup mechanism that wipes old jobs. To configure it, set the following environment variables in your configuration for the aihub-backend container.

With a docker-compose based deployment, add these environment variables to the aihub-backend block, under environment:.

  1. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_ENABLED: (true or false) Enable the job cleanup.

  2. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CRON_EXPRESSION: A cron expression that defines when the automatic job cleanup is executed. By default, the cleanup task runs hourly, with the cron expression 0 0 * * * *. The expression follows a six-field cron pattern, so 0 */30 * * * * would run the job cleanup every 30 minutes, whereas 0 0 0 * * * would run it daily at midnight. You can use the scheduling dialog in RapidMiner Studio to create cron expressions graphically.

  3. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CONTEXT_CRON_EXPRESSION: The cron expression for cleaning up the job context files, such as logs. This cleanup is independent of the database cleanup, and it is a good idea to select slightly different execution times for the two processes.

  4. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_MAX_AGE: The maximum age of jobs, in seconds. Jobs older than this value will be cleaned up. Set it to any number greater than zero. Determine a retention interval for the job execution data according to your organization's policies and the available disk space.

  5. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_BATCH_SIZE: (number) This property defines the number of job entries to be cleaned up in the database in one execution of the cleanup process. The number should be a bit higher than the number of executions in the job cleanup interval. For example, if you run the job cleanup every 10 minutes and the AI Hub executes 100 jobs in that period, this could be something like 120 or 150. If you are just activating the job cleanup after a long time of not using it, and you want to clean up your data faster, choose higher values. The number shouldn't be higher than the possible cleanup actions in the interval between the job cleanup executions.

  6. JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CONTEXT_BATCH_SIZE: (number) The number of job execution results to be cleaned up in the file system in one execution of the cleanup process. Choose a value that is similar to the job batch size (previous entry) based on the number of executions in the cleanup period.
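
As an illustration of how the six variables above might fit together, here is a minimal docker-compose sketch. It is not a recommended configuration: every value is an assumption chosen for the example (hourly cleanup, a hypothetical load of roughly 100 jobs per hour, a 30-day retention period), and the offset for the context cleanup assumes that the first field of the cron expression is seconds and the second is minutes, as the examples in the text suggest. Values are quoted so that YAML passes them to the container as plain strings.

```yaml
# Sketch of the aihub-backend block in docker-compose.yml.
# All values below are illustrative assumptions, not defaults or recommendations.
services:
  aihub-backend:
    environment:
      # Turn the job cleanup on.
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_ENABLED: "true"
      # Database cleanup: hourly (the default expression quoted in the text).
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CRON_EXPRESSION: "0 0 * * * *"
      # Job context (file system) cleanup: also hourly, but offset to minute 15
      # so the two processes do not start at the same time.
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CONTEXT_CRON_EXPRESSION: "0 15 * * * *"
      # Retention period in seconds: 30 days * 24 h * 3600 s = 2592000.
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_MAX_AGE: "2592000"
      # Assuming about 100 jobs per hour, a batch size slightly above that.
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_BATCH_SIZE: "150"
      # Similar value for the file system cleanup.
      JOBSERVICE_SCHEDULED_ARCHIVE_JOB_CLEANUP_JOB_CONTEXT_BATCH_SIZE: "150"
```

Adjust the retention period and batch sizes to your own job volume and available disk space before using a fragment like this.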