Project Versioning

AI Hub data storage is backed by integrated Git and Large File System (LFS) servers for keeping all data of your Projects.

The integrated git server keeps track of smaller files, and the integrated LFS server handles larger files. By default, if LFS is enabled for a Project, files with the extensions .ioo, .rmhdf5table, .collection and .conninfo are stored in the LFS server.

It is recommended to always store binary data, such as Excel sheets or pictures, in LFS and to have LFS enabled for all Projects by default.

You can define additional file extensions to be tracked by LFS by adapting the .gitattributes file in a Project.
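As a minimal sketch, assuming the integrated LFS server follows the standard Git LFS attribute syntax (the same line that `git lfs track` writes), the Python helper below builds the entries one would append to .gitattributes. The helper name `lfs_attribute_line` and the example extensions are illustrative and not part of AI Hub.

```python
def lfs_attribute_line(extension: str) -> str:
    """Return a .gitattributes line that routes files with this extension through LFS."""
    # Accept ".xlsx", "xlsx" or "*.xlsx" and normalize to the bare extension.
    ext = extension.strip().lstrip("*").lstrip(".")
    if not ext:
        raise ValueError("extension must not be empty")
    # Standard Git LFS attribute set, as written by `git lfs track`.
    return f"*.{ext} filter=lfs diff=lfs merge=lfs -text"


if __name__ == "__main__":
    # Hypothetical extra extensions for binary data such as spreadsheets and images.
    for extension in (".xlsx", "png"):
        print(lfs_attribute_line(extension))
```

The printed lines can be appended to the Project's .gitattributes file and committed like any other change.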

Storage backend

The integrated git and LFS servers store data inside the RapidMiner AI Hub home directory; all files reside in $persistent-rapidminer-home/data/repositories/git_server and $persistent-rapidminer-home/data/repositories/git_lfs_server, respectively. In git terminology, the git data is stored inside bare git repositories. In the integrated LFS server, file names always match their respective SHA-256 checksum.
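Because stored LFS file names match the SHA-256 checksum of their content, a local file can be matched to its stored counterpart by computing that checksum. The following Python sketch streams a file and returns the hex digest. The function name `sha256_of_file` is hypothetical, and the sketch makes no assumption about the directory layout below git_lfs_server, which is not described here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file and return its SHA-256 checksum as lowercase hex."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is 64 hexadecimal characters long and can be compared with a file name found in the LFS storage directory.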

Advanced configuration for upload, disk space availability and consistency checks

The integrated git and LFS servers, which store their data inside the RapidMiner AI Hub home directory, depend on enough disk space being available.

To avoid corrupted files after upload, they require a certain amount of free disk space regardless of the size of the uploaded files. In addition, when large files are uploaded to a Project, their expected size and SHA-256 checksum are verified by the integrated LFS server.
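The kind of consistency check described above can be illustrated with a short Python sketch. It is not the server's implementation: the function name `verify_upload` and its return convention are assumptions chosen for illustration. It only shows what comparing an upload against its expected size and SHA-256 checksum involves.

```python
import hashlib


def verify_upload(data: bytes, expected_size: int, expected_sha256: str) -> list[str]:
    """Return a list of consistency problems; an empty list means the upload passed."""
    problems = []
    # Size check: the received byte count must equal the announced size.
    if len(data) != expected_size:
        problems.append(f"size mismatch: expected {expected_size}, got {len(data)}")
    # Checksum check: the SHA-256 of the received bytes must equal the announced digest.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        problems.append(f"checksum mismatch: expected {expected_sha256}, got {actual}")
    return problems
```

Reporting both problems separately mirrors the fact that the size check and the checksum check are independent of each other.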

The following table outlines important properties which can be changed for disk space and consistency checks with environment variables.

| Property | Description | Availability |
| --- | --- | --- |
| REPOSITORIES_MAX_UPLOAD_SIZE | By default, the LFS server only allows uploads of files smaller than 5 gigabytes. Identifiers like Gb or Mb for gigabytes and megabytes are supported. | Any version supporting Projects |
| REPOSITORIES_GIT_ENABLE_DISKSPACE_CHECK_HOOK | Verifies that at least REPOSITORIES_GIT_DISKSPACE_CHECK_THRESHOLD is available inside the RapidMiner AI Hub home directory. | >= 9.10.4 |
| REPOSITORIES_GIT_DISKSPACE_CHECK_THRESHOLD | Defaults to 5120M. Identifiers like G or M for gigabytes and megabytes are supported. | >= 9.10.4 |
| REPOSITORIES_LFS_ENABLE_DISKSPACE_CHECK | Verifies that at least REPOSITORIES_MIN_LFS_DISKSPACE_CHECK_THRESHOLD is available inside the RapidMiner AI Hub home directory. | >= 9.10.4 |
| REPOSITORIES_MIN_LFS_DISKSPACE_CHECK_THRESHOLD | Defaults to 5120M and is doubled when REPOSITORIES_LFS_REMOVE_UNSUCCESSFUL_UPLOADS is enabled. Identifiers like G or M for gigabytes and megabytes are supported. | >= 9.10.4 |
| REPOSITORIES_LFS_REMOVE_UNSUCCESSFUL_UPLOADS | Defaults to true. When a consistency check fails during upload (checksum or size), the affected files are removed immediately afterwards to avoid keeping failed uploads. | >= 9.10.4 |
| REPOSITORIES_LFS_ENBLE_UPLOAD_SIZE_CHECK | Defaults to true. Enables the size check of LFS files being uploaded. | >= 9.10.4 |
| REPOSITORIES_LFS_ENBLE_UPLOAD_CHECKSUM_CHECK | Defaults to true. Enables checksum verification of LFS files being uploaded. | >= 9.10.4 |
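
To make the threshold arithmetic in the table concrete, the following Python sketch parses size values such as 5120M or 5G and computes the effective LFS threshold, which doubles when removal of unsuccessful uploads is enabled. How AI Hub parses these values internally is not documented here. The function names and the use of binary units (1M = 1024 * 1024 bytes) are assumptions made for illustration.

```python
import shutil

# Assumption: binary units; AI Hub's actual interpretation may differ.
_UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a value such as '5120M' or '5G' to bytes."""
    text = value.strip().upper()
    # Accept both the 'G'/'M' and the 'Gb'/'Mb' spellings.
    if text.endswith("B"):
        text = text[:-1]
    if not text or text[-1] not in _UNITS:
        raise ValueError(f"unsupported size value: {value!r}")
    return int(text[:-1]) * _UNITS[text[-1]]


def effective_lfs_threshold(threshold: str = "5120M",
                            remove_unsuccessful_uploads: bool = True) -> int:
    """Minimum free bytes; doubled when failed uploads are removed afterwards."""
    base = parse_size(threshold)
    return base * 2 if remove_unsuccessful_uploads else base


def has_enough_space(directory: str, required_bytes: int) -> bool:
    """Check whether the file system holding `directory` has enough free bytes."""
    return shutil.disk_usage(directory).free >= required_bytes
```

Under these assumptions, the default settings require twice 5120M of free space in the home directory before an LFS upload is accepted.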