How to Analyze Time Data Per Person
I am very new to RapidMiner and have the following task to do:
I have data collected from activity trackers for different individuals. The trackers record step count, heart rate and blood pressure every second. I want to use the step count data to predict blood pressure with different machine learning models. However, I am struggling to set up the data because there are millions of time-stamped records for each person (I have a total of 20 people). Any suggestions?
Best Answers
rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hi @n_alkassab, and welcome to the RapidMiner Community!
To help you with this, let's first break the problem down into a few steps:
- Getting the repository prepared.
- Adding data to the repository.
- Training your models with time series data.
Why? Because every person is different, a single model trained on everyone might not fit anyone well, so I train one model per person instead. That is the approach I used, at least.
Getting the repository prepared.
To begin, you need your data split into two example sets: one for the people in your study and another for the measurements. The plan is to iterate over the patient example set and, for each patient, read the measurements example set, filter it by patient ID and train one model.
I would create a new repository with this shape:
Figure 1: Repository folders Data, Processes and Models (a Models folder because we will have one model per person).
Once you have these, you can import your data. I created a simple CSV with the columns Patient ID, Patient Name, Date, Systolic, Diastolic and Pulse; it is attached to this answer. Of course, that's not the same data you have, but it will help us set up the rest of the example.
Adding data to the repository.
I imported my data into the repository under the name Original Patient Data. You can use the Read CSV or Read Excel operators instead, but for this little example I wanted my data inside the RapidMiner repository.
Then you should obtain a list of patients and a list of measurements separately. I built a process for this, named it Processes/01 Prepare Patient Data, and saved it.
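For anyone who wants to see the idea outside RapidMiner, here is a rough plain-Python sketch of what such a preparation step does, assuming the combined CSV shape described above. The sample rows, the function name prepare_patient_data and the dict keys are hypothetical; this is not the attached process itself.

```python
import csv
import io

# Hypothetical sample rows in the shape of the CSV described above; values are invented.
SAMPLE_CSV = """Patient ID,Patient Name,Date,Systolic,Diastolic,Pulse
1,Patient A,2020-01-01 08:00:00,120,80,70
1,Patient A,2020-01-01 08:00:01,122,81,72
2,Patient B,2020-01-01 08:00:00,130,85,75
"""


def prepare_patient_data(csv_text):
    """Split one combined table into a patient list and a measurement list."""
    patients = {}      # Patient ID -> Patient Name, one entry per patient
    measurements = []  # one entry per reading, linked to its patient by ID
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = int(row["Patient ID"])
        patients.setdefault(pid, row["Patient Name"])
        measurements.append({
            "patient_id": pid,
            "date": row["Date"],
            "systolic": float(row["Systolic"]),
            "diastolic": float(row["Diastolic"]),
            "pulse": float(row["Pulse"]),
        })
    patient_list = [{"patient_id": pid, "name": name} for pid, name in sorted(patients.items())]
    return patient_list, measurements


if __name__ == "__main__":
    patients, measurements = prepare_patient_data(SAMPLE_CSV)
    print(patients)
    print(len(measurements), "measurements")
```

The patient list stays tiny (20 rows in your case) while the measurement list holds the millions of readings, which is what makes looping over patients cheap.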
Figure 2: How to prepare the data. The process is called "01 Prepare Patient Data" and is also attached.
Training your models.
Finally, to train your models, use the Loop Examples operator in combination with the Extract Macro operator. Here is a picture:
Inside the Loop Examples operator, I have this:
Basically, I extract the Patient ID and Patient Name into macros, read all the measurements, filter the examples for each patient, select only the data I need for my model, train the model on that data and store the results. In this example I store the cluster model generated from each patient's data rather than a blood-pressure prediction model; I wouldn't want to take away the joy of building that part yourself.
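The same per-patient loop can be sketched in plain Python. It assumes hypothetical steps and systolic fields on each measurement (matching the question, not the attached CSV) and swaps in a simple least-squares line, systolic ≈ intercept + slope × steps, in place of the cluster model used in the example. The comments only roughly map each step to the operators mentioned above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (hypothetical stand-in model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values are identical, so no slope can be estimated; predict the mean.
        return {"intercept": y_bar, "slope": 0.0}
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


def train_per_patient(patients, measurements):
    """Return {patient_id: model}, training one model per patient."""
    models = {}
    for patient in patients:                    # roughly: Loop Examples over the patient set
        pid = patient["patient_id"]             # roughly: Extract Macro for the current ID
        rows = [m for m in measurements if m["patient_id"] == pid]  # filter this patient's rows
        if len(rows) < 2:
            continue                            # too few rows to fit a line; skip this patient
        xs = [r["steps"] for r in rows]         # keep only the columns the model needs
        ys = [r["systolic"] for r in rows]
        models[pid] = fit_line(xs, ys)
    return models


if __name__ == "__main__":
    patients = [{"patient_id": 1, "name": "Patient A"}]
    measurements = [
        {"patient_id": 1, "steps": 0, "systolic": 120.0},
        {"patient_id": 1, "steps": 10, "systolic": 125.0},
        {"patient_id": 1, "steps": 20, "systolic": 130.0},
    ]
    print(train_per_patient(patients, measurements))
```

Any other regressor could replace fit_line; the point is that each model only ever sees one patient's rows.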
This process is called 02 Train Models, and it saves one model per patient in the Models directory of your newly created repository.
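Outside RapidMiner, keeping one model per patient in a Models folder could look like the sketch below. The JSON format, the patient_<id>.json file names and both function names are assumptions for illustration, not how RapidMiner stores its models.

```python
import json
import tempfile
from pathlib import Path


def store_models(models, base_dir):
    """Write each model to <base_dir>/Models/patient_<id>.json and return the paths."""
    models_dir = Path(base_dir) / "Models"
    models_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for pid, model in models.items():
        path = models_dir / f"patient_{pid}.json"
        path.write_text(json.dumps(model))
        paths.append(path)
    return paths


def load_model(base_dir, pid):
    """Read back one patient's model from the Models folder."""
    path = Path(base_dir) / "Models" / f"patient_{pid}.json"
    return json.loads(path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store_models({1: {"intercept": 120.0, "slope": 0.5}}, tmp)
        print(load_model(tmp, 1))
```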
From here, you should be able to train your own model and make whatever changes you need, starting from a working sample. I attached the repository too, so you can see how things work there.
Hope this helps,
Rodrigo.
rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hi @n_alkassab,
BTW, further adjustments you can make:
- Store your data once it is filtered, so that each run works with only the records it needs instead of millions of them.
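One way to apply the first tip, sketched in Python under the assumption that measurements are dicts with a patient_id key: group the rows in a single pass (rather than re-filtering every row once per patient) and write each patient's rows to its own CSV, so later runs read only the file they need. The function names, file names and columns are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_patient(measurements):
    """Group rows by patient_id in one pass over the data."""
    groups = defaultdict(list)
    for row in measurements:
        groups[row["patient_id"]].append(row)
    return dict(groups)


def store_filtered(groups, out_dir, fieldnames):
    """Write each patient's rows to <out_dir>/patient_<id>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for pid, rows in groups.items():
        path = out / f"patient_{pid}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        paths[pid] = path
    return paths


if __name__ == "__main__":
    rows = [
        {"patient_id": 1, "steps": 0, "systolic": 120.0},
        {"patient_id": 2, "steps": 5, "systolic": 110.0},
        {"patient_id": 1, "steps": 10, "systolic": 125.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = store_filtered(split_by_patient(rows), tmp, ["patient_id", "steps", "systolic"])
        print(sorted(paths))
```

With 20 people, one grouping pass replaces 20 full scans of the measurement data.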
- Store your data in a relational database so you don't have to redo everything every single time. I always recommend PostgreSQL for these things.
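The database tip can be sketched with Python's built-in sqlite3 module, used here only as a runnable stand-in for PostgreSQL (with PostgreSQL you would connect through a driver such as psycopg2, which uses %s placeholders instead of ?). The table layout and function names are hypothetical; the composite primary key on (patient_id, ts) lets the per-patient query use an index instead of scanning every row.

```python
import sqlite3

# Hypothetical table layout: one row per patient per timestamp.
SCHEMA = """
CREATE TABLE IF NOT EXISTS measurements (
    patient_id INTEGER NOT NULL,
    ts         TEXT    NOT NULL,
    steps      INTEGER,
    systolic   REAL,
    PRIMARY KEY (patient_id, ts)
);
"""


def load_measurements(conn, rows):
    """Create the table if needed and insert (patient_id, ts, steps, systolic) tuples."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO measurements (patient_id, ts, steps, systolic) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def measurements_for(conn, patient_id):
    """Fetch one patient's rows in time order; the primary key index covers this lookup."""
    cur = conn.execute(
        "SELECT ts, steps, systolic FROM measurements WHERE patient_id = ? ORDER BY ts",
        (patient_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_measurements(conn, [
        (1, "2020-01-01 08:00:01", 10, 125.0),
        (1, "2020-01-01 08:00:00", 0, 120.0),
        (2, "2020-01-01 08:00:00", 5, 110.0),
    ])
    print(measurements_for(conn, 1))
    conn.close()
```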
All the best,
Rodrigo.