PCA (kernel) RM vs Python: Different results

lionelderkrikor · Moderator, RapidMiner Certified Analyst, Member · Posts: 1,195 · Unicorn

Hi,

Sorry in advance if I made a mistake, but I found significant differences between RapidMiner and Python in the kpc_i values computed by PCA (kernel).

1. But first, why does PCA (kernel) not provide, like the "classic" PCA operator:

- a dimensionality reduction parameter?

- eigenvector and eigenvalue result tables (with standard deviation, proportion of variance, etc.)?

How can this operator be used in practice?
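Both missing pieces can be approximated by hand from the eigenvalues of the centered kernel matrix: the variance table directly, and a dimensionality reduction by keeping the smallest number of components that reaches a cumulative-variance threshold, similar in spirit to the classic operator's variance-based setting. This is a minimal NumPy sketch under stated assumptions: the data are synthetic (not the attached dataset), the kernel is assumed to be (<x, y> + 1)^3, and the helper names `kernel_pca_variance_table` and `components_to_keep` are hypothetical.

```python
import numpy as np


def kernel_pca_variance_table(K):
    """PCA-style summary for kernel PCA: eigenvalue, standard deviation,
    proportion of variance and cumulative proportion of each component."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)       # centering matrix
    eigvals = np.linalg.eigvalsh(H @ K @ H)[::-1]  # eigvalsh sorts ascending; flip to largest first
    eigvals = np.clip(eigvals, 0.0, None)          # zero out tiny negative round-off
    std_dev = np.sqrt(eigvals / n)                 # population std. dev. of each kpc
    proportion = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion)
    return eigvals, std_dev, proportion, cumulative


def components_to_keep(cumulative, variance_threshold=0.95):
    """Smallest number of components whose cumulative proportion of variance
    reaches the threshold (hypothetical 'keep variance' style reduction)."""
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return min(k, len(cumulative))


rng = np.random.default_rng(0)
X = rng.normal(size=(77, 4))   # synthetic stand-in for the attached dataset
K = (X @ X.T + 1.0) ** 3       # assumed polynomial kernel (<x, y> + 1)^3

eigvals, std_dev, proportion, cumulative = kernel_pca_variance_table(K)
for i in range(5):
    print(f"kpc_{i + 1}: std={std_dev[i]:.3g}  "
          f"proportion={proportion[i]:.3f}  cumulative={cumulative[i]:.3f}")
print("components for 95% of the variance:", components_to_keep(cumulative))
```

The standard deviations use the population convention (divide by n); divide by n - 1 instead if you want sample standard deviations.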

2. As mentioned above, the kpc_i values differ by several orders of magnitude (I used kernel = "polynomial" and degree = 3):

RM: kpc_i ~10e12 / Python: kpc_i ~10e5
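Before comparing magnitudes, it is worth making sure both tools use the same polynomial kernel, because the kernel's scale carries straight into the scores: multiplying K by c multiplies every eigenvalue by c and leaves the eigenvectors unchanged, so kpc_i grows by sqrt(c). scikit-learn's polynomial kernel is (gamma * <x, y> + coef0)^degree, and KernelPCA defaults to gamma = 1/n_features and coef0 = 1. I have not checked which form the RapidMiner operator uses, so the unscaled (<x, y> + 1)^3 below is only an assumption for comparison; the data are synthetic and deliberately left unscaled, and `poly_kernel` and `first_kpc` are hypothetical helpers.

```python
import numpy as np


def poly_kernel(X, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree between all rows of X."""
    return (gamma * (X @ X.T) + coef0) ** degree


def first_kpc(K):
    """Training scores on the first kernel principal component, v_1 * sqrt(lambda_1)."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)     # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ K @ H)  # ascending order
    return eigvecs[:, -1] * np.sqrt(eigvals[-1])  # last column = largest eigenvalue


rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(77, 4))  # synthetic, deliberately unscaled

K_plain = poly_kernel(X, gamma=1.0)                  # assumed form (<x, y> + 1)^3
K_default = poly_kernel(X, gamma=1.0 / X.shape[1])   # scikit-learn default gamma = 1/n_features
kpc_plain, kpc_default = first_kpc(K_plain), first_kpc(K_default)

print(f"max K:       {K_plain.max():.2e} vs {K_default.max():.2e}")
print(f"max |kpc_1|: {np.abs(kpc_plain).max():.2e} vs {np.abs(kpc_default).max():.2e}")
```

On this data the two conventions differ by a factor of about 4^3 = 64 in the kernel values and therefore close to 8 in the scores. That is far smaller than the gap reported above, so a kernel mismatch alone would not explain it; it only has to be ruled out first.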

After some research, it seems that kpc_i = eigenvectors × sqrt(eigenvalues). Perhaps RapidMiner does not take the square root into account.
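The formula can be checked numerically. In kernel PCA the training-set scores are kpc_i = v_i * sqrt(lambda_i), where (lambda_i, v_i) are the eigenpairs of the centered kernel matrix Kc; equivalently, projecting through the kernel with coefficients alpha_i = v_i / sqrt(lambda_i) gives Kc alpha_i = sqrt(lambda_i) v_i. This NumPy sketch uses a linear kernel on synthetic data only because ordinary PCA then gives a known reference answer; the same formula applies unchanged to a polynomial kernel. As far as I know, scikit-learn's KernelPCA computes its fit_transform scores the same way, eigenvectors scaled by the square root of the eigenvalues. The helper names are hypothetical.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space: Kc = H K H with H = I - 1/n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def kernel_pca_scores(K, n_components):
    """Training-set kernel PCA scores kpc_i = v_i * sqrt(lambda_i), where
    (lambda_i, v_i) are the largest eigenpairs of the centered kernel matrix."""
    eigvals, eigvecs = np.linalg.eigh(center_kernel(K))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    lambdas, V = eigvals[order], eigvecs[:, order]
    return V * np.sqrt(lambdas), V, lambdas


rng = np.random.default_rng(0)
X = rng.normal(size=(77, 4))   # synthetic stand-in for the attached dataset
K_lin = X @ X.T                # linear kernel: ordinary PCA is the reference answer

scores, V, lambdas = kernel_pca_scores(K_lin, n_components=4)

# Reference: ordinary PCA scores from the SVD of the centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = Xc @ Vt.T
print("matches ordinary PCA (up to sign):", np.allclose(np.abs(scores), np.abs(pca_scores)))

# Same numbers from projecting through the kernel with alpha_i = v_i / sqrt(lambda_i):
# Kc @ alpha_i = lambda_i * v_i / sqrt(lambda_i) = sqrt(lambda_i) * v_i
alphas = V / np.sqrt(lambdas)
print("kernel projection agrees:", np.allclose(center_kernel(K_lin) @ alphas, scores))

# Scaling by lambda_i instead of sqrt(lambda_i) adds an extra factor sqrt(lambda_i)
no_sqrt = V * lambdas
print("extra factor per component:", np.sqrt(lambdas))
```

If an implementation returned V * lambdas instead, component i would be inflated by an extra factor of sqrt(lambda_i); this sketch does not show that RapidMiner does this, only what the difference would look like.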

You can find the process here and the dataset in the attached file:

<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
...
<portSpacing port="sink_result 6" spacing="0"/>

Can you shed some light on these points?

Thank you,

Best regards,

Lionel

Fixed and Released · 9.5.0 · DC-378

Comments

  • SGolbert · RapidMiner Certified Analyst, Member · Posts: 344 · Unicorn

    Hi Lionel,

    I checked your results and noticed that with the Kernel PCA operator the number of principal components is 77 (equal to the number of examples)! I also tried the Kernel PCA tutorial process, and it goes from 5 attributes to 200 PCs (again, 200 examples). Furthermore, all PCs have the same variance (I calculated it with the Covariance Matrix operator). This is surely incorrect.

    It pains me to say this, but I would use the Python script for your task.

    Best,

    Sebastian

  • lionelderkrikor · Moderator, RapidMiner Certified Analyst, Member · Posts: 1,195 · Unicorn

    Hi Sebastian,

    Thank you for your feedback and your analysis.

    Best regards,

    Lionel.

    NB: I suppose there will be a fix in an upcoming release of RapidMiner?

  • sgenzer · Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator · Posts: 2,959 · Community Manager

    Moving to Product Feedback.

    Scott

  • SGolbert · RapidMiner Certified Analyst, Member · Posts: 344 · Unicorn

    I have already forwarded the problem to development. Can you confirm my observations on your end?

  • sgenzer · Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator · Posts: 2,959 · Community Manager
  • Gottfried · Member · Posts: 17 · Maven

    I noticed the same issue. The output of PCA (Kernel) always has as many principal components as there are records in the example set. This is certainly a bug. Please let me know when this gets fixed.
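The component count and the equal variances reported in this thread can be checked against what kernel PCA should produce. The variance of kpc_i over the training examples is lambda_i / n, so two components share a variance only if their eigenvalues coincide. A degree-3 polynomial kernel on 5 attributes has a feature space spanned by the C(5 + 3, 3) = 56 monomials of degree at most 3, so at most 56 components can have non-zero variance however many examples there are. This is a minimal NumPy sketch on synthetic data with the tutorial's shape (200 examples, 5 attributes); the kernel form (<x, y> + 1)^3, the 1e-8 relative cutoff for "zero" eigenvalues and the helper `centered_kernel_eigen` are assumptions. For comparison, scikit-learn's KernelPCA drops zero-eigenvalue components when n_components is None.

```python
import numpy as np
from math import comb


def centered_kernel_eigen(K):
    """Eigenpairs of the feature-space-centered kernel matrix, largest first."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)     # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ K @ H)  # ascending order
    return eigvals[::-1], eigvecs[:, ::-1]


rng = np.random.default_rng(1)
n, d, degree = 200, 5, 3             # same shape as the tutorial: 200 examples, 5 attributes
X = rng.normal(size=(n, d))          # synthetic data, not the tutorial's
K = (X @ X.T + 1.0) ** degree        # assumed kernel form (<x, y> + 1)^3

eigvals, eigvecs = centered_kernel_eigen(K)
keep = eigvals > 1e-8 * eigvals[0]   # treat numerically zero eigenvalues as absent
print("non-zero components:", keep.sum(), "of", n)
print("upper bound from the kernel's feature space:", comb(d + degree, degree))

lambdas, V = eigvals[keep], eigvecs[:, keep]
scores = V * np.sqrt(lambdas)        # kpc_i = v_i * sqrt(lambda_i)
variances = scores.var(axis=0)       # population variance of each kpc
print("variance of kpc_i equals lambda_i / n:", np.allclose(variances, lambdas / n))
print("first five variances:", variances[:5])
```

The variances come out in decreasing order rather than all equal, and the number of non-zero components stays well below the number of examples.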
