You are viewing the RapidMiner Python documentation for version 9.10. Check here for the latest version.

RapidMiner Notebooks

RapidMiner Notebooks is available as part of the RapidMiner AI Hub. This product enables data science teams of all disciplines (both coders and non-coders) across the enterprise to effectively collaborate on data science projects.

On this page, you will learn how to get started with RapidMiner Notebooks, and gain an understanding of its more advanced features.

Get started with RapidMiner Notebooks

RapidMiner Notebooks ships as part of the RapidMiner AI Hub. This ensures a tight integration with RapidMiner repositories and projects, as well as a single sign-on experience across the platform.

To access RapidMiner Notebooks, navigate to the AI Hub landing page and click on RapidMiner Notebooks.

To start a new notebook, on the Launcher tab, click on the tile representing the kernel of your choice in the Notebook section. All new notebooks you start include a link to our tutorial notebook, which explains how to use the key features of RapidMiner Notebooks. We only cover some of these features on this page, as the tutorial notebook should provide all the help and context needed.

Environments and kernels

RapidMiner Notebooks comes with a pre-provisioned Jupyter kernel based on a Python environment containing the most commonly used Python libraries for data science projects (e.g. pandas, numpy, scipy, sklearn), as well as the rapidminer library, which implements the integration with other parts of the RapidMiner AI Hub. This environment is centrally managed and is also available for execution in RapidMiner AI Hub (when Python code is embedded into RapidMiner processes; see the chapter Call Python from RapidMiner for more details).
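
For a quick sanity check of what the pre-provisioned kernel offers, you can import the bundled libraries from a notebook cell and print their versions:

```python
# Run this in a notebook cell using the default Python kernel to check which
# versions of the bundled libraries are available.
import importlib

for name in ["pandas", "numpy", "scipy", "sklearn", "rapidminer"]:
    module = importlib.import_module(name)
    # Not every package exposes __version__, so fall back gracefully.
    print(f"{name}: {getattr(module, '__version__', 'version attribute not available')}")
```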

The centrally managed environments cannot be modified in your notebook instance. To extend the list of kernels available to you, you have two options, depending on your needs:

  • If you are in the early stages of development and only need a private kernel for development and experimentation purposes, you should create a local custom kernel.
  • If you want a kernel that others on your team will use, or that will go into production, you should create a centrally managed environment.

Creating a centrally managed environment

To create a centrally managed environment, you need the right privileges for the Platform Administration tool in RapidMiner AI Hub. Follow the steps there to learn how to manage coding environments.

To be able to use a centrally managed environment as a Jupyter kernel inside RapidMiner Notebooks, it has to contain the relevant kernel library (i.e. ipykernel for Python-based kernels, irkernel for R-based kernels).

Once the coding environment is installed, it will show up in RapidMiner Notebooks. No restart is necessary; just allow a few minutes for the environment to be synced and picked up by your notebook instance.
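
One way to verify this from a notebook is to list the kernel specifications your instance currently knows about, for example via jupyter_client (the name assigned to a synced environment may differ from its display name):

```python
# List the kernel specs known to this Jupyter installation; a newly synced
# coding environment should eventually appear in this list.
from jupyter_client.kernelspec import KernelSpecManager

specs = KernelSpecManager().find_kernel_specs()  # {kernel name: kernel spec directory}
for name, path in sorted(specs.items()):
    print(f"{name}: {path}")
```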

Creating a local custom kernel

As stated above, local custom kernels are only available to the user who created them. The only way to share them is to export their definition file and create a centrally managed environment based on it (see the instructions above).

To create a local custom kernel, open a new Terminal from the Launcher in RapidMiner Notebooks. The terminal contains instructions on how to clone the existing active environment and go from there, but you can also create a new, blank environment if you need to start from scratch.

To be able to use your new environment as a kernel inside RapidMiner Notebooks, it has to contain the relevant kernel library (i.e. ipykernel for Python-based kernels, irkernel for R-based kernels).
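
As a minimal check for a Python-based kernel, you can run the new environment's interpreter and confirm that ipykernel is importable (the equivalent check for an R-based kernel would look for irkernel):

```python
# Run with the new environment's Python interpreter to confirm that the kernel
# library required by RapidMiner Notebooks is present.
import importlib.util

if importlib.util.find_spec("ipykernel") is None:
    raise SystemExit("ipykernel is missing - install it into this environment "
                     "before expecting it to show up as a kernel")
print("ipykernel found - this environment can serve as a Jupyter kernel")
```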

The kernel will be picked up automatically and become available for selection in your notebooks.

Collaboration

The primary method of collaboration we offer in RapidMiner AI Hub is via projects. When using RapidMiner Notebooks, you can access all your projects via the built-in Git integration. (RapidMiner projects use Git as their version control and storage system.)

You start by cloning the project into your Notebook workspace. This can be done by clicking the Clone a Git repository in the current directory button, located on the Git panel (you can find it on the left side). On the Clone a repo dialog, you can choose from a list of projects available in your RapidMiner AI Hub, or alternatively, provide the clone URI of the repository.

Once the clone operation is completed, the project's contents will be available as a local copy in your Notebook workspace. When you have prepared a change that you would like to share with others and store it in the project's history, you need to:

  • click on the Git panel
  • select any Untracked files you may have (these would be new files you added), hover over them and click on the plus icon to track them
  • stage all changed files by hovering over them and clicking on the plus icon to Stage changes. You can also click the Stage all changes button next to the Changed dropdown.
  • double-check that all the changes you wish to share with others now show up in the Staged dropdown
  • add a message describing your changes to the Summary textbox located at the bottom of the Git panel, and optionally a Description
  • click Commit

At this point, your changes are stored as a commit in the local copy of your project. To share them with others on your team, you need to push these changes by clicking on the Push committed changes button located in the top right corner of the Git panel.

To refresh your local copy with the newest changes others have made, click on the Pull latest changes button located in the top right corner of the Git panel.
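
Besides the Git panel, the local copy can also be used directly from your notebook code. The sketch below assumes the rapidminer library's Project class with read and write methods for datasets stored in a project clone; the project path and dataset names are placeholders, so check the library reference for the exact signatures in your version:

```python
# Hedged sketch: read a dataset from a cloned project, transform it, and write
# the result back so it shows up as a change on the Git panel.
import rapidminer

project = rapidminer.Project("my-project")    # path of the local project clone (placeholder)
df = project.read("data/mydata")              # load a project dataset as a pandas DataFrame
df["processed"] = True                        # any transformation you like
project.write(df, "data/mydata_processed")    # store the result back into the project
```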

Deployment

Once you're happy with what your code is doing, you will want to deploy it in some way, for example as a scheduled execution or a web service.

Currently, we only support deploying code via RapidMiner processes: embed your code into a process, then use RapidMiner AI Hub's deployment features to manage the scheduling or the web service deployment.

If you haven't already done so, you will need to alter your code to adhere to the necessary conventions to get executed in an Execute Python operator.
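
As a rough illustration of the shape such code typically takes (the linked chapter is authoritative), the Execute Python operator calls an rm_main entry point, passing its input ports as pandas DataFrames and delivering whatever rm_main returns to its output ports:

```python
# Notebook logic restructured for the Execute Python operator: keep the actual
# work inside rm_main so the same code runs interactively and inside a process.
import pandas as pd

def rm_main(data: pd.DataFrame) -> pd.DataFrame:
    result = data.copy()
    result["row_count"] = len(result)  # example transformation
    return result
```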

Architecture

This section describes the underlying architecture of RapidMiner Notebooks. This can prove useful in understanding its inner workings and limitations.

Under the hood, in each RapidMiner AI Hub deployment, there is a JupyterHub instance running. JupyterHub is responsible for managing the lifecycle of each user's notebook container, as well as authentication and user management. User containers are configured to run with JupyterLab by default, but a fallback to classic Jupyter Notebooks is also available.

To provide a Single Sign-On experience across RapidMiner AI Hub, the deployment of RapidMiner Notebooks is pre-configured to use the deployed Keycloak instance as its identity store. This means that users never need to authenticate a second time when starting RapidMiner Notebooks.

User notebook containers are only available during interactive sessions, meaning they are started when the user opens RapidMiner Notebooks and stopped when they log out.

There is a single notebook container image used as a template to start each user's own notebook container. The home folder of that container is persisted to a volume, meaning all code, data and private kernels can be stored there and will be available when the user logs back in for another session. These volumes are private to the user and are not shared across multiple users.

Each user's notebook container is allowed to consume a preset amount of CPU cores and memory on the host or cluster which is running the RapidMiner AI Hub. See the image reference on how to change the resource limit for users. The setting applies to all users of RapidMiner Notebooks in that RapidMiner AI Hub deployment, and cannot be changed per user.

We ship our images with some Jupyter plugins installed and enabled. Currently, users cannot change what plugins are installed in their notebook containers. If you need additional plugins enabled, please contact our support team.