How to Analyze Time Data Per Person

n_alkassab · November 2018

Hey there,

I am very new to RapidMiner and have the following task to do:
I have data collected from activity trackers for different individuals. The trackers record step count, heart rate, and blood pressure and how they change every second. I want to use the step count data to predict blood pressure using different machine learning models. However, I am struggling to set up the data because of the millions of time-stamped records for each person (I have a total of 20 people). Any suggestions?

rfuentealba · November 2018

Hi @n_alkassab, and welcome to the RapidMiner Community!

To help you with this, let's break the problem down into some steps first:

Getting the repository prepared.
Adding data to the repository.
Training your models with time series data.

Why? Because every person is different, training a single model for everyone might be a bit overkill, so I train one model per person. That is the approach I used, at least.

Getting the repository prepared.

To begin, you need your data split into two example sets: one for the people in your study and the other for the measurements. The plan is to iterate over the patient example set, read the measurements example set, filter it by patient ID, and train a single model for that patient.

I would create a new repository with this shape:

Figure 1: Data, Processes and Models, because we will have one model per person.

Once you have these, you can import your data. I created a simple CSV with Patient ID, Patient Name, Date, Systolic, Diastolic, Pulse. You can find that example attached to this answer. Of course, that's not the same data you have, but it will help us set up the rest of the example.

Adding data to the repository.

I imported my data to the repository under the name Original Patient Data. You can use the Read CSV or Read Excel operators, but for this little example, I wanted my data inside the RapidMiner repository.

Then you should obtain a list of patients and a list of measurements separately. I built a process for this, named it Processes/01 Prepare Patient Data, and saved it.

Figure 2: How to prepare data. The process is called "01 Prepare Patient Data" and is also attached.
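
For readers who want to see the same split outside RapidMiner, here is a minimal Python sketch of the idea, using only the standard library. The column names follow the CSV described above; the sample rows, the `RAW_CSV` string, and the `split_patient_data` helper are made up for illustration and are not the attached file or process.

```python
import csv
import io

# Hypothetical sample in the shape described above (not the attached file).
RAW_CSV = """Patient ID,Patient Name,Date,Systolic,Diastolic,Pulse
1,Alice,2018-11-01,120,80,70
1,Alice,2018-11-02,122,81,72
2,Bob,2018-11-01,135,88,65
"""

def split_patient_data(rows):
    """Split combined rows into a patient list and a measurement list."""
    patients = {}       # Patient ID -> Patient Name, one entry per person
    measurements = []   # one entry per reading, linked by Patient ID only
    for row in rows:
        pid = row["Patient ID"]
        patients.setdefault(pid, row["Patient Name"])
        measurements.append({
            "Patient ID": pid,
            "Date": row["Date"],
            "Systolic": float(row["Systolic"]),
            "Diastolic": float(row["Diastolic"]),
            "Pulse": float(row["Pulse"]),
        })
    return patients, measurements

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
patients, measurements = split_patient_data(rows)
```

The patient list ends up with one entry per person, and the measurement list keeps only the ID as a link back to it, which is what lets a later loop filter the measurements per patient.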

Training your models

Finally, to train your models, you should make use of the Loop Examples operator in combination with the Extract Macro operator. Here is a picture:

Inside the Loop Examples operator, I have this:

Basically, what I do is extract the Patient ID and Patient Name from a macro, read all the measurements, filter the examples for each patient, select only the data I need for my model, train my model with that data, and store the results. In this case, I save each cluster model visualization generated from clustering the data. I wouldn't want to take from you the joy of building stuff.

This process I made is called 02 Train Models, and the result is that it saves the models for each patient in the Models directory of your newly created repository.
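
The loop itself (iterate over patients, filter their measurements, train, store) can be sketched in plain Python as well. This is only an illustration under assumptions: the sample process clusters the data, while here I swapped in a one-variable least-squares line from step count to systolic pressure, since that is closer to the original question. The readings and the `fit_line` and `train_per_patient` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical readings: (patient_id, steps, systolic). Not the thread's data.
READINGS = [
    ("p1", 0, 110.0), ("p1", 10, 115.0), ("p1", 20, 120.0),
    ("p2", 0, 130.0), ("p2", 10, 132.0), ("p2", 20, 134.0),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Assumes xs contains at least two distinct values; otherwise the
    division below fails.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def train_per_patient(readings):
    """Group readings by patient, then fit one model per patient."""
    by_patient = defaultdict(list)
    for pid, steps, systolic in readings:
        by_patient[pid].append((steps, systolic))
    models = {}
    for pid, pairs in by_patient.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        models[pid] = fit_line(xs, ys)  # stored under the patient ID
    return models

models = train_per_patient(READINGS)
```

Each entry in `models` plays the role of one stored model in the Models directory: one (slope, intercept) pair per patient, trained only on that patient's rows.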

From this, you should be able to train your own model and make the corrections you need, but at least you have a working sample. I attached the repository too, so you can see how things work there.

Hope this helps,

Rodrigo.

rfuentealba · November 2018

Hi @n_alkassab,

BTW, further adjustments you can make:

Store your data once it is filtered, so you don't have to work with millions of records, just the ones you need each time.
Store your data in a relational database so you don't have to redo everything every single time. I always recommend PostgreSQL for these things.
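
To give an idea of what the database route buys you, here is a small Python sketch. It uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL so it runs anywhere; the table name, column names, sample rows, and the `load_patient` helper are all assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "patient_id TEXT, measured_at TEXT, steps INTEGER, systolic REAL)"
)
# An index on patient_id avoids scanning every row for each patient lookup.
conn.execute("CREATE INDEX idx_patient ON measurements (patient_id)")

# Hypothetical rows; real data would come from the tracker export.
rows = [
    ("p1", "2018-11-01T10:00:00", 3, 118.0),
    ("p1", "2018-11-01T10:00:01", 4, 119.0),
    ("p2", "2018-11-01T10:00:00", 0, 131.0),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
conn.commit()

def load_patient(conn, patient_id):
    """Fetch only one patient's rows, ordered by time."""
    cur = conn.execute(
        "SELECT measured_at, steps, systolic FROM measurements "
        "WHERE patient_id = ? ORDER BY measured_at",
        (patient_id,),
    )
    return cur.fetchall()

p1_rows = load_patient(conn, "p1")
```

The point is that the filtering happens in the database, so each training run only pulls the rows for one patient instead of reading all the records again.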

This process I built for you looks complex, but it has a dozen modifications you can make to get it done properly. I encourage you to experiment with these!

All the best,

Rodrigo.

n_alkassab · November 2018

Thank you so much! You saved me a ton of time. I really appreciate it.
