Last week I posted: "Is AutoModel on break, or is it just me?"
The post can be seen here:
https://community.www.turtlecreekpls.com/discussion/58722/is-automodel-on-break-or-is-it-just-me#latest
I tried the same "churn" problem dataset, this time using RapidMiner Go, which will cost you $10 per month.
I split the original dataset 70/30: 70% to train, 30% to test.
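For anyone who wants to reproduce the 70/30 split outside RapidMiner, here is a minimal sketch in plain Python. The function name, the seed, and the toy rows are my own assumptions for illustration; they are not the real churn data, and this is not how Go splits data internally.

```python
import random

def train_test_split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Made-up stand-in rows (hypothetical column names), not the actual dataset.
rows = [{"customer_id": i, "Churn": "Yes" if i % 4 == 0 else "No"} for i in range(100)]

train, test = train_test_split_rows(rows)
print(len(train), len(test))
```

Shuffling before cutting matters if the file happens to be sorted, for example by signup date or by the Churn column itself.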
This is the preview of the train dataset.
I build a new predictive model. I choose the column to predict: Churn.
Go removes the columns that have quality issues from my data. Next, I select my models and choose higher accuracy.
I run the analysis. Go produces a Comparison table with Accuracy rates across nine separate algorithms.
I click on Generalized Linear Model, which has both the highest precision and the highest AUC. I get results for predicting churn on the train dataset.
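In case it helps to see what the columns in a comparison table like this measure, here is a rough sketch of accuracy, precision, and AUC in plain Python. The labels and scores are invented for illustration, the 0.5 threshold is an assumption, and I am not claiming this is how Go computes its numbers.

```python
def accuracy(y_true, y_pred):
    """Share of rows where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the rows predicted positive, the share that really are positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted_pos:
        return 0.0
    return sum(t == positive for t in truths_of_predicted_pos) / len(truths_of_predicted_pos)

def auc(y_true, scores, positive=1):
    """Probability that a random positive row scores above a random negative row."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = churned, 0 = stayed; scores are model confidences for churn.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 cut-off

print(accuracy(y_true, y_pred), precision(y_true, y_pred), auc(y_true, scores))
```

Accuracy and precision depend on the cut-off, while AUC only depends on how the scores rank the rows, which is one reason two models can swap places depending on which column you sort by.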
I click to apply the model to the test dataset. I get the Check (test) dataset. The model ignores the target attribute Churn when generating predictions.
I calculate predictions, inspect predictions.
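The idea of scoring a test set while ignoring the target column can be sketched like this in plain Python. Everything here is hypothetical: `toy_rule` is a stand-in for a trained model, `InternetService` is an assumed column name, and the three rows are made up.

```python
def predict_without_target(rows, predict_one, target="Churn"):
    """Score each row on its features only; the target is kept just for inspection."""
    results = []
    for row in rows:
        # Hide the target from the model so it cannot leak into the prediction.
        features = {k: v for k, v in row.items() if k != target}
        results.append({**row, "prediction": predict_one(features)})
    return results

def toy_rule(features):
    """Hypothetical stand-in for a trained model, not the GLM that Go built."""
    return "Yes" if features.get("InternetService") == "Fiber optic" else "No"

# Made-up test rows for illustration only.
test_rows = [
    {"InternetService": "Fiber optic", "Churn": "Yes"},
    {"InternetService": "DSL", "Churn": "No"},
    {"InternetService": "No", "Churn": "No"},
]

scored = predict_without_target(test_rows, toy_rule)
for row in scored:
    print(row)
```

Keeping the true Churn value next to the prediction in the output is what makes the "inspect predictions" step possible.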
This problem was written about here:
https://medium.com/@ODSC/data-driven-artificial-intelligence-ai-for-churn-reduction-90232c1a0c4
The author chose the logistic regression algorithm, shown as slightly less accurate by RM Go.
Her AUC is slightly smaller, maybe because she started with the less accurate algorithm.
She came to the same predictions as I did with RM Go: telco customers will be churning mainly because of problems with Fiber optic and DSL services.
She probably used up hours of resource time writing then debugging Python.
I am not being smug here, but if RapidMiner Go can work with my datasets no problem, can you agree RapidMiner Auto Model needs more work?
Any helpful suggestions are appreciated. If I'm to use RapidMiner Studio on the job, I need to know beforehand when I'm going wrong.
Thank you for plodding your way through this whole thing.
Tony
Best Answer
BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi Tony,
thank you for this detailed description of your successful model building.
RapidMiner Go is great for these kinds of problems when you have one example set and a clearly defined modeling task.
RapidMiner Studio, which AutoModel is a part of, is a complete data science environment. AutoModel mostly works great and has a few more options and algorithms it can execute. Because of this complexity, it might be better or worse than Go on one given data mining task.
I use both Go and Studio, depending on the task at hand. Very often I need the preprocessing and data wrangling capabilities of Studio because about 70 % of a data science project is data preprocessing. Deploying a standard model is very easy with Go, but more complex environments and requirements might depend on the advanced functionality in Studio or even AI Hub.
I'm glad you're happy with Go. If it solves all your problems, it's the way to Go.
Best regards,
Balázs