Preparing data for mining

grafikbg Member Posts: 14 Contributor I
edited December 2018 in Help

Hello, I am wondering if someone would be willing to help an absolute novice with data preparation. We have 124 electrical controllers, numbered from 1 to 124 for convenience. On each shift some of them switch off and cause trouble. Would you help me through the process of creating an Excel sheet and running a prediction of which of them are most likely to switch off on the next shift? I can correct the output after each shift to improve the results, but I will need help. Thank you in advance... I voted for RM :)

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @grafikbg,

    Could you describe exactly what data you have available? What is the structure of this data?

    For example, something like this:

    Controllers_predictive_maintenance.png

    Regards,

    Lionel

  • grafikbg Member Posts: 14 Contributor I

    Thank you very much, Lionel, your support is extremely valuable to me. At the moment the data looks like this (the first column holds the controllers, numbered from 1 to 124, followed by the info from 25 shifts, with an "x" marking each controller that switched off), but I can transform it in any way that will work.

    data.jpg 1.2M
  • grafikbg Member Posts: 14 Contributor I

    data.jpg

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @grafikbg,

    I'm not an electronics expert, so to better understand: when there is one (or more) "x" in a row, is the controller associated with that row switched off?

    For example, in your first row there is an "x" in column 12 ==> does that mean controller 1 is switched off?

    Is a controller on if and only if all the values in its row are "on"?

    Is "x" a binary variable ("on" or "x"), or a real variable x in the range [1,124]?

    Regards,

    Lionel

  • grafikbg Member Posts: 14 Contributor I

    Thank you Lionel, I will try to explain:

    1. Yes, you are right: an "x" in column 12 means that on the 12th shift controller number 1 switched off; likewise, controller number 2 switched off on the 4th, 6th, 14th and 23rd shifts. We need to predict which controllers are most likely to switch off on the next shift /27/. I probably made a mistake with that "x"; I am using it as a check mark, not as a math symbol. I could easily change it to "off" or whatever is easier for RM to interpret as data.

    2. If all the values in its row are "on", this means that the controller never switched off during the observed period.

    3. "x" is a binary variable ("on" or "x") or a real variable x in the range [1,124]? Unfortunately I am not good with math :)... At the beginning of each shift all controllers are supposed to be "on", but some of them are accidentally "off"...

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @grafikbg,

    Difficult problem...

    This is how I see things:

    I think it's impossible to forecast which controllers will switch off based only on your data at time t (your Excel file).

    You have to de-pivot and transpose your Excel file to get 3224 attributes. The result looks like this:

    Controllers_predictive_maintenance_2.png

    It's a time series problem, so you now have to build a database with the history of the status of your 3224 attributes (124 x 26), with a time step of 15 min for example.

    After that, based on the historical data and using a time series process, you can train a model and then apply it to forecast the controllers that will switch off.

    I hope it helps,

    Regards,

    Lionel
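
The de-pivot step described above can be sketched outside RapidMiner as well. The following is a minimal illustration in plain Python, assuming the sheet has one row per controller and one column per shift with "x" marking a switch-off; the tiny `wide` table and the function name `depivot` are made up for the example and are not part of the original data or of any RapidMiner process.

```python
# Hypothetical wide layout: one row per controller, one column per shift,
# "x" marks a switch-off. The values below are made up for illustration.
wide = {
    1: ["on", "on", "x", "on"],
    2: ["x", "on", "x", "on"],
    3: ["on", "on", "on", "on"],
}


def depivot(wide_rows, off_mark="x"):
    """Turn {controller: [status per shift]} into (shift, controller, off) rows."""
    long_rows = []
    for controller, statuses in wide_rows.items():
        for shift, status in enumerate(statuses, start=1):
            long_rows.append((shift, controller, 1 if status == off_mark else 0))
    # Sort by shift first so the table reads as a history over time.
    long_rows.sort()
    return long_rows


long_table = depivot(wide)
for row in long_table[:4]:
    print(row)
```

With the real sheet (124 controllers, 26 shifts) the same loop would yield 124 x 26 = 3224 rows, one per controller and shift.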

  • grafikbg Member Posts: 14 Contributor I

    Thank you Lionel... time doesn't matter. Each shift begins with a restart of the system, a few controllers switch off, we go and switch them on manually, and then they work flawlessly until the next restart. It only happens once, at the beginning of each work shift, during the restart.

  • grafikbg Member Posts: 14 Contributor I

    With your valuable support, we are hoping to achieve an RM output like:

    ...during the next work shift there is an 85% probability that controller number 23 will switch off, 84% that controller number 49 will switch off, etc. Even 50% accuracy will cut our delay time in half... We can fill in the data after each shift so that the algorithm gets smarter and the accuracy increases...

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Dear all,

    To introduce this post, I would quote the French humorist Pierre Dac:

    "Forecasts are difficult, especially when they relate to the future."

    After some thought, I do not see how to predict (with an associated probability) which controller(s) will switch off during the next work shift with only the provided dataset.

    For a while I considered working with "association rules", but it's not conclusive.

    So if a predictive maintenance guru has an idea of which method to apply to this case study, I would be curious to hear it.

    Regards,

    Lionel

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    So first of all, I thank @lionelderkrikor for both his insights and for bringing a sense of culture to this community. I have not heard that quotation before and it is quite apropos!

    "Time" is an interesting question. I was just working with a customer last week on this exact same issue. Let me poke around and see what I can find.


    Scott

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn

    Hi @grafikbg @lionelderkrikor @sgenzer

    I've been following this interesting thread for some time and haven't found an answer to one crucial (at least in my opinion) question:

    Are those controllers independent, or are they part of some kind of electrical chain that makes the whole system connected?

    For example, does a failure in controller #1 directly cause a failure in another controller #X?

    Or does each controller actually work independently of all the others?

  • grafikbg Member Posts: 14 Contributor I

    Thank you Vladimir, the controllers are independent... I forgot to mention that there is an obvious pattern... For example, during the 27 work shifts observed so far, controller number 16 switched off 9 times, but controller 42 never did... Many others switched off five or six times, while others did so once or twice or not at all... No fewer than six and no more than eleven of the 124 controllers have switched off during any one restart.
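
Since the controllers are described as independent, one very simple baseline is to estimate each controller's switch-off chance from how often it has switched off so far. The sketch below is only an illustration of that arithmetic, not a RapidMiner process and not a validated model; the function name and the add-one smoothing constant `alpha` are assumptions introduced here, and only the counts for controllers 16 and 42 come from the thread.

```python
def switch_off_probability(off_count, shifts_observed, alpha=1.0):
    """Smoothed share of shifts in which a controller switched off.

    alpha is an assumed add-one (Laplace) smoothing constant, so that a
    controller with zero recorded switch-offs still gets a small estimate.
    """
    return (off_count + alpha) / (shifts_observed + 2 * alpha)


# Counts quoted in the thread for 27 observed shifts; the remaining
# controllers would be filled in from the spreadsheet.
off_counts = {16: 9, 42: 0}
shifts_observed = 27

ranking = sorted(
    ((switch_off_probability(n, shifts_observed), c) for c, n in off_counts.items()),
    reverse=True,
)
for prob, controller in ranking:
    print(f"controller {controller}: {prob:.1%}")
```

With six to eleven of 124 controllers switching off per restart, the average rate per controller is roughly 5 to 9 percent, so estimates from this kind of baseline will sit far below figures such as 85%; it is mainly useful for ranking which controllers to check first.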

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn

    Hi @grafikbg

    Well, I am not an electrical engineer by any means, but I'll try to reason using just common sense :)

    Theoretically, if all the controllers were dependent (and connected into some sort of circuit), you could predict one controller's state based on the other 123 controllers, most likely using a time series approach.

    But it seems that if the controllers are independent, then controller #1 switching off at some point in no way causes controller #N (any other) to switch off. That said, the whole task breaks down into 124 separate tasks of predicting the next state of each controller independently. For this kind of prediction, there is definitely not enough data. I don't think you can efficiently predict each controller's state based ONLY on an observed pattern; at least it won't make practical sense: if controller #X switches off every week, you could expect it to switch off next week as well, but this doesn't take into account the factors affecting it; if controller #Y has never switched off, you might expect it to continue working flawlessly... but again, this is not true in real life.

    To get a meaningful prediction, for each controller you would need at least a few meaningful data points that may directly or indirectly affect its state, such as:

    • total time in service
    • total number of repairs
    • average / max electrical load
    • time since last fail
    • some runtime characteristics (current voltage, resistance, whatever else)
    • etc etc etc

    Hope this reasoning helps.
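
Of the data points listed above, only the switch-off history is present in the spreadsheet, but a couple of the suggested features (number of failures, time since last failure) can be derived from it. The following is a rough sketch in plain Python; the function name, the dictionary keys and the sample history are invented for illustration, and measurements such as load or voltage would still have to be collected separately.

```python
def history_features(off_flags):
    """Derive simple features from one controller's 0/1 switch-off history.

    off_flags is ordered from the oldest shift to the most recent one.
    """
    total_offs = sum(off_flags)
    # Shifts elapsed since the most recent switch-off; None if it never failed.
    shifts_since_last_off = None
    for age, flag in enumerate(reversed(off_flags)):
        if flag == 1:
            shifts_since_last_off = age
            break
    return {
        "total_offs": total_offs,
        "off_rate": total_offs / len(off_flags) if off_flags else 0.0,
        "shifts_since_last_off": shifts_since_last_off,
    }


# Made-up history for one controller over 8 shifts.
features = history_features([0, 1, 0, 0, 1, 0, 0, 0])
print(features)
```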

  • grafikbg Member Posts: 14 Contributor I

    Thank you Vladimir, you are probably right... But as I mentioned before, even 50% accuracy will cut our delay time in half; even two or three out of 10 will give us 20 valuable minutes... So I was thinking of starting from here and adding new data after each shift, to slowly achieve more.
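
Whether "two or three out of 10" is reachable can be checked on the existing history before trusting any prediction. A minimal sketch, assuming a simple guess that ranks controllers by how often they have switched off in the past: for each shift it uses only the earlier shifts, picks the top k controllers, and counts how many of the actual switch-offs were caught. The function name, the `min_shifts` warm-up parameter and the sample history are hypothetical.

```python
def backtest_top_k(history, k, min_shifts=3):
    """Walk forward through the shifts and score a frequency-based top-k guess.

    history maps controller -> list of 0/1 flags (1 = switched off), one per shift.
    For each shift t, only shifts before t are used to rank the controllers.
    Returns (hits, actual_offs) summed over all evaluated shifts.
    """
    n_shifts = len(next(iter(history.values())))
    hits = 0
    actual_offs = 0
    for t in range(min_shifts, n_shifts):
        # Rank by number of past switch-offs; ties broken by controller id.
        ranked = sorted(history, key=lambda c: (-sum(history[c][:t]), c))
        predicted = set(ranked[:k])
        failed = {c for c, flags in history.items() if flags[t] == 1}
        hits += len(predicted & failed)
        actual_offs += len(failed)
    return hits, actual_offs


# Made-up history: 4 controllers, 6 shifts.
history = {
    1: [1, 0, 1, 1, 0, 1],
    2: [0, 0, 0, 0, 0, 0],
    3: [0, 1, 0, 0, 1, 0],
    4: [0, 0, 0, 0, 0, 1],
}
hits, actual = backtest_top_k(history, k=1)
print(f"caught {hits} of {actual} switch-offs")
```

Run on the real sheet with k around 10, the ratio of hits to actual switch-offs gives a rough idea of how much delay such a ranking could have saved on past shifts.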

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist

    Hi @grafikbg,

    Regardless of the dependency between the switches, you can still model it with simple windowing. The difference would be the format going into the windowing. In case of independence you would create a data set like this:

    TimeStamp Id OffIndicator

    ...

    ...

    Use Group into Collection and group by Id. Inside it you use a windowing operator.

    Is there any chance of getting more data than just "died"? E.g. amplitudes etc.?

    Attached is a process showing the idea. It needs value series and toolbox to run.

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
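
The windowing idea from the post above (a TimeStamp / Id / OffIndicator table, grouped by Id, then windowed) can be illustrated in plain Python. This is a stand-in sketch, not the attached RapidMiner process: the function name, the window length of 3 and the sample rows are assumptions made for the example.

```python
from collections import defaultdict


def window_examples(rows, window=3):
    """Build (controller, last `window` flags, next flag) training examples.

    rows are (timestamp, controller_id, off_indicator) tuples in any order.
    Each controller's history is windowed separately, which mirrors grouping
    by Id before applying a windowing step.
    """
    by_id = defaultdict(list)
    for timestamp, controller_id, off in sorted(rows):
        by_id[controller_id].append(off)

    examples = []
    for controller_id, flags in sorted(by_id.items()):
        for start in range(len(flags) - window):
            features = tuple(flags[start:start + window])
            label = flags[start + window]
            examples.append((controller_id, features, label))
    return examples


# Made-up rows for two controllers over five shifts.
rows = [
    (1, 7, 0), (2, 7, 1), (3, 7, 0), (4, 7, 0), (5, 7, 1),
    (1, 8, 0), (2, 8, 0), (3, 8, 0), (4, 8, 0), (5, 8, 0),
]
examples = window_examples(rows)
for example in examples:
    print(example)
```

Each example pairs the last few shifts of one controller with what happened on the following shift, which is the shape a classifier needs to learn "next shift off or not" per controller.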
  • grafikbg Member Posts: 14 Contributor I

    Thank you very much for your support, Martin... I am not good enough to figure out all that info, but it's encouraging...
