"Extract decision tree from Bray-curtis heatmap dendrogram"

jamie_slk Member Posts: 1 Learner I
edited June 2019 in Help

I am performing a microbiome study and have already generated (using another program) a heatmap with dendrograms that cluster samples by bacterial genus using Bray-Curtis dissimilarity, but I'd like to get the decision tree. I know RapidMiner has a decision tree model, but it must use k-means, which is different from Bray-Curtis, and I want to preserve the Bray-Curtis clustering. Is it possible to load my dendrogram into RapidMiner and have it extract the Bray-Curtis decision tree? Thank you very much.

Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist

    Hi @jamie_slk,

    If you are doing clustering analysis with microbiome data, can you please share some test data?

    First, the 'tree' in the heatmap may NOT be a 'decision tree'. It is a visualization of your hierarchical clustering model. If you can export the cluster labels from the other program, you can build predictive models (e.g. decision tree, random forest, or SVM) on those labels to find the splits and decision rules that reproduce the clustering.
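To illustrate the idea of fitting a tree to cluster labels, here is a minimal sketch in R. The abundance table, the genus names, and the `bray_curtis` helper are all hypothetical and only stand in for real data; `rpart` is used as one possible decision tree learner, not necessarily the one used in the process below.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)

# Hypothetical abundance table: 40 samples x 3 genera, two groups that
# differ mainly in genus_a and genus_b.
n <- 20
abund <- data.frame(
  genus_a = c(rpois(n, 50), rpois(n, 5)),
  genus_b = c(rpois(n, 5),  rpois(n, 50)),
  genus_c = rpois(2 * n, 20)
)

# Hypothetical helper: Bray-Curtis dissimilarity in base R,
# sum(|x - y|) / sum(x + y) for every pair of samples (rows).
bray_curtis <- function(m) {
  m <- as.matrix(m)
  k <- nrow(m)
  d <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(d)
}

# Hierarchical clustering, then cut the dendrogram into two clusters.
clust_res <- hclust(bray_curtis(abund))
abund$cluster <- factor(cutree(clust_res, k = 2))

# Surrogate decision tree: predict the cluster label from the abundances.
tree <- rpart(cluster ~ ., data = abund, method = "class")
print(tree)

# Share of samples whose cluster label the tree reproduces.
agreement <- mean(predict(tree, abund, type = "class") == abund$cluster)
print(agreement)
```

The printed tree lists the abundance thresholds that separate the clusters. These rules approximate the clustering; they are not extracted from the dendrogram itself.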

    Regarding the dissimilarity measure, would you consider Jaccard instead of Bray-Curtis? The Jaccard index is computed as 2B/(1+B), where B is the Bray–Curtis dissimilarity [ref]. The Bray–Curtis and Jaccard indices are rank-order similar, but the Jaccard index is metric and should probably be preferred over the default Bray–Curtis, which is semimetric [ref]. RapidMiner core has an operator for hierarchical clustering (Agglomerative Clustering) that supports Jaccard similarity on numerical data.
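As a small worked check of the 2B/(1+B) relation, the sketch below computes both dissimilarities for two made-up abundance vectors (`x` and `y` are illustrative values, not from the peerj32 data). It assumes the quantitative (abundance-based) form of the Jaccard dissimilarity.

```r
# Two hypothetical samples with abundances for four genera.
x <- c(5, 0, 3, 2)
y <- c(1, 4, 3, 0)

# Bray-Curtis dissimilarity: sum(|x - y|) / sum(x + y).
B <- sum(abs(x - y)) / sum(x + y)

# Quantitative Jaccard dissimilarity computed directly from min and max.
J_direct <- 1 - sum(pmin(x, y)) / sum(pmax(x, y))

# The same value obtained from Bray-Curtis via J = 2B / (1 + B).
J_from_B <- 2 * B / (1 + B)

print(c(B = B, J_direct = J_direct, J_from_B = J_from_B))
stopifnot(isTRUE(all.equal(J_direct, J_from_B)))
```

Here B = 10/18 ≈ 0.556 and J = 10/14 ≈ 0.714. Because 2B/(1+B) increases monotonically with B, both measures rank pairs of samples in the same order.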

    My process used the peerj32 data from https://peerj.com/articles/32/#supplemental-information

    [Attached images: bacteria.PNG, decision-tree-rules.PNG, tree.PNG]

    You need to install the R Scripting extension and the Operator Toolbox extension from the Marketplace to run it.

    The process calls R to compute the Bray-Curtis (BC) dissimilarities and the clustering:

    library(vegan)  # provides vegdist()
    # Bray-Curtis dissimilarities between samples (rows); or use method = "jaccard"
    dist.mat <- vegdist(dataframe, method = "bray", diag = TRUE, upper = TRUE)
    clust.res <- hclust(dist.mat)  # hierarchical clustering of the samples
    cluster.label <- cutree(clust.res, k = 4)  # cut the tree into four clusters

    Process code:







    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">

    <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve peerj32_microbes" width="90" x="45" y="34">

    <operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">

    <operator activated="true" class="agglomerative_clustering" compatibility="8.1.001" expanded="true" height="82" name="Clustering" width="90" x="313" y="34">

    use Jaccard similarity for hierarchical clustering

    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="187">

    <parameter key="value_type" value="numeric"/>

    <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="581" y="187">

    run R scripts for Bray-Curtis distances and clustering, return the clustering labels

    <operator activated="true" class="operator_toolbox:merge" compatibility="0.9.000" expanded="true" height="103" name="Merge" width="90" x="715" y="187">

    <operator activated="true" class="numerical_to_polynominal" compatibility="8.1.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="849" y="187">

    <operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role (2)" width="90" x="983" y="187">

    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.001" expanded="true" height="103" name="Decision Tree" width="90" x="1117" y="187"/>
    <operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="1251" y="187">

    <operator activated="true" class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="1385" y="238">

    <operator activated="true" class="converters:dectree_2_example_set" compatibility="0.3.001" expanded="true" height="82" name="Decision Tree to ExampleSet" width="90" x="1385" y="85"/>

    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>



















    Cheers,

    YY
