new features in the 4.4 release

ema Member Posts: 33 Guru
edited November 2018 in Help
Hi,
I got an email saying that the new RapidMiner 4.4 will be released soon.

I can't wait ...

What are the new features,
especially in clustering and classification?

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    here's a snapshot from the changes.txt in the current developer repository:

    Changes from RapidMiner 4.3.2 to RapidMiner 4.4 [2009/??/??]
    ---------------------------------------------------------------

    * New operators:

    - ExampleSetSuperset
    - ExampleSetUnion
    - MacroConstruction
    - CumulateSeries
    - FastLargeMargin


    * Parameters will now be adapted according to an operator
    rename, for example the settings of operators like
    the ProcessLog or the parameter optimization operators
    are automatically corrected to the new operator names

    * Graphs like the similarity graph display the strengths
    of the edges now by their color

    * Added new tree layout algorithm for the decision trees
    preventing most overlapping, the old tighter version
    is available as layout type "Tree (Tight)"

    * Decision trees now show the subtree size as tool tip
    for the inner nodes, the edges are now darker for
    larger subtrees and brighter for smaller ones

    * Tables like the (meta) data view now support a new
    context menu for common table operations like column
    sorting or row / column selection

    * The New Operator dialog now also supports full text
    search in the description texts of the operators

    * RapidMiner now stores all parameter values in the
    process files including the default values which ensures
    a better compatibility with future versions. The XML tab,
    however, only shows the values differing from the default

    * Univariate and multivariate series windowing operators
    now also support nominal attributes and even mixed
    types in cases where the series is represented by
    the examples (rows) of the data set

    * The range statistics of nominal attributes in the
    meta data view now show the values with highest and
    lowest occurrence counts, sort the values according
    to the counts, and displays only an excerpt of the
    occurring values if large amounts of different values
    exist

    * List of recent files is now directly saved after opening
    a new process and not only during shutdown

    * Changes in the process setup are now allowed even during
    process runtime, e.g. when waiting at a breakpoint

    * Updated to latest version of Weka (as of February 26th, 2009)

    * Bugfixes:

    - Fixed bug in accuracy criterion for the revised decision
    tree learner
    - Fixed bug in parameter list of ValueSubgroupIterator
    - Fixed bug in ExceptionHandling which sometimes led to
    doubled outputs
    - Fixed bug in ProcessBranch which sometimes led to
    doubled outputs
    - ViewAttributes did not add min and max statistics
    so that those statistics were not calculated on
    data table views


    Changes from RapidMiner 4.3.1 to RapidMiner 4.3.2 [2009/02/17]
    ---------------------------------------------------------------

    * New operators:

    - LinearDiscriminantAnalysis
    - QuadraticDiscriminantAnalysis
    - RegularizedDiscriminantAnalysis
    - DasyLabExampleSource
    - FileIterator
    - ExceptionHandling
    - ChangeAttributeNamesReplace
    - ChangeAttributeNames2Generic
    - DateAdjust
    - MinMaxBinDiscretization
    - RainflowMatrix


    * Deprecated operators:

    - DirectoryIterator (use FileIterator instead)

    * Renamed parameters:

    - ExampleSetWriter:
    quote_whitespace is now named quote_nominal_values


    * ExampleSetMerge can now handle missing values

    * RapidMiner now better supports counts for the in-
    and output types which should considerably reduce the
    amount of warnings if operators like IOConsumer,
    IOMultiplier or ExampleSetMerge (reducing several objects
    of the same type to one of the same) are used

    * FileIterator replaces DirectoryIterator and adds many
    new features like recursive iteration, file name based
    filtering, and a new macro for the parent path

    * Centroid based clusterings now support assigning unseen
    examples to the nearest cluster on apply time

    * ProcessBranch now supports a branching with respect
    to the existence of an input object

    * ClearProcessLog now also allows removing the complete
    logging table

    * The logging tables of the ProcessLog operator will now
    not be generated during start up but during the first
    operator usage (and also during the following if the
    table was deleted in the meantime, e.g. in a loop)

    * Added support for different time zones, users can now
    define the preferred time zone in the settings dialog
    and time conversion operators are now able to respect
    this setting

    * Date and times are now displayed in the system's local
    settings

    * New plotter: Block

    * Added support for applying a log scale for the color
    column for the Scatter plot and the new Block plotter

    * Data tables like those generated by the process log
    are now de-coupled from the table used for plotting
    preventing that the rows will be sampled and rows
    would be removed from the data table

    * A double click on the region between two columns in
    the table header now automatically resizes the left
    column to a fitting size (known from Windows programs)

    * A double click on the same region while pressing CTRL
    will resize all table columns according to the contents

    * GuessValueTypes now only works on regular attributes
    and provides a parameter for extending it on the special
    attributes (work_on_special)

    * AttributeFilter now also provides a new parameter
    work_on_special

    * The operator Replace now also allows empty replace_by
    values

    * The ExampleSetJoin operator now also works if the
    id of the first example set is not part of the second

    * Guess value types can now handle missing values

    * CSVExampleSetWriter now supports the parameter quote_nominal

    * All feature selection and weighting operators now also
    provide the possibility to log the names of the features
    of the current generation's best individual

    * The Replace operator now supports capturing groups

    * The file based example source operators (ExampleSource,
    SimpleExampleSource, CSVExampleSource...) now better
    support quoted strings and also escaped quotes (escaping
    with \")

    * Implementation details:

    - The method Tools.quotedSplit(...) should now be used
    instead of a regular split followed by the method
    Tools.mergeQuotedSplits(...)
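
The quote handling mentioned in the two entries above can be illustrated with a minimal Python sketch. This is not the code behind Tools.quotedSplit(...); the function name `quoted_split`, its parameters, and its exact behavior are assumptions chosen only to show the idea of splitting on a separator while respecting quoted strings and quotes escaped with \".

```python
def quoted_split(line, sep=",", quote='"', escape="\\"):
    """Split `line` at `sep`, ignoring separators inside quoted parts.

    Hypothetical helper for illustration; an escaped quote (\\") is kept
    as a literal quote character in the resulting field.
    """
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        # An escape character directly before a quote yields a literal quote.
        if ch == escape and i + 1 < len(line) and line[i + 1] == quote:
            current.append(quote)
            i += 2
            continue
        if ch == quote:
            # Toggle the quoted state and drop the quote character itself.
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            # A separator outside quotes closes the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(quoted_split('a,"b,c",d'))
    print(quoted_split(r'1,"say \"hi\", ok",3'))
```

Doing the split in a single pass avoids the two-step approach of splitting on every separator first and merging the quoted pieces afterwards.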


    * Bugfixes:

    - fixed bug in DBScan for empty cluster models
    - fixed bug in simple sampling in cases where a local
    random seed was used
    - fixed bug in process logging to files which prevented
    the writing of the first logged result
    - fixed bug in PSO optimization for cases where the fitness
    should be minimized instead of maximized
    - fixed bug in binary performance measure which was not
    delivering the fitness for specificity, sensitivity,
    and youden index
    - fixed bug in meta data table viewer in cases where huge
    numbers of long nominal values existed which caused a
    crash of the Java Virtual Machine in some cases


    Changes from RapidMiner 4.3 to RapidMiner 4.3.1 [2009/01/12]
    ---------------------------------------------------------------

    * New operators:

    - RemoveDuplicates
    - Cluster2Prediction
    - DirectoryIterator
    - TextObjectWriter
    - TextObjectLoader
    - TextExtractor
    - SingleTextObjectInput
    - TextCleaner
    - TextObject2ExampleSet
    - TextSegmenter
    - AddAttribute
    - SetData
    - EMClustering
    - AttributeWeights2ExampleSet
    - TransitionGraph
    - DatabaseExampleVisualizationOperator


    * Revised decision tree learning which led to drastically
    reduced runtimes and better tree models in terms of
    generalization capabilities

    * The bar chart now displays the category as label in the
    domain axis

    * Removed plotter: Bars 3D

    * The IOObjectReader now allows the definition of the expected
    output type

    * The LiftParetoChart no longer re-applies the input model if
    a predicted label already exists

    * Added the ability to "explode" tiles of pie and ring charts

    * Added several new options for the reporting operators of the
    RapidMiner Enterprise Edition as well as true parameter handling
    including type checks

    * Updated to latest release of Jung

    * Fixed GUI related memory leaks


    * Implementation details:

    - The class AttributeWeightsCreator was renamed to
    ExampleSet2AttributeWeights


    * Bugfixes:

    - Fixed a combination of GUI and process thread related
    memory leaks
    - Fixed bug in Series Multiple Plotter which prevented
    rescaling
    - Pie and Bar charts used class limit instead of legend
    limit in order to decide if the legend should be shown
    - special format in ExampleSetWriter ignored quote
    whitespace setting
    - bug in XVPrediction fixed


    Hope that satisfies your needs :P


    Greetings,
    Sebastian
  • ema Member Posts: 33 Guru
    Thank you very much... can not wait
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Ema,
    then you could check out the developer version from the developer branch in CVS. A guide for checking out with Eclipse is on our website.

    Greetings,
    Sebastian
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    The new version 4.4 will be released this week. So only a few days left for waiting ;D

    Cheers,
    Ingo
  • ema Member Posts: 33 Guru
    Hi,
    I downloaded the new RapidMiner...

    I am wondering how to use Cluster2Prediction?

    Thank you
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Ema,
    Cluster2Prediction enables you to use classification performance measures for clustering if label information is available. For example, think of a situation where you know, for a subset of your data, which examples have to be in the same cluster. You might then use any flat clustering algorithm and test whether it discovers your cluster structure. To achieve this, the operator matches the given cluster labels with the class labels in the best-fitting way and converts the cluster attribute into a prediction attribute. You can then use the standard performance operators for classification to calculate the performance.

    Greetings,
    Sebastian
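
The matching step described above can be illustrated with a small Python sketch. It is not RapidMiner's implementation: the function name `clusters_to_predictions`, the brute-force search over one-to-one assignments, and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def clusters_to_predictions(cluster_ids, labels):
    """Map each cluster to a class label so that agreement is maximal.

    Hypothetical helper: tries every one-to-one assignment of clusters
    to classes, which is only practical for a handful of clusters.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(labels))
    # Count how often each (cluster, class) pair occurs together.
    overlap = Counter(zip(cluster_ids, labels))
    # Pad with None so that surplus clusters can stay unmapped.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best_mapping, best_hits = {}, -1
    for perm in permutations(padded, len(clusters)):
        hits = sum(overlap[(c, k)] for c, k in zip(clusters, perm))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(clusters, perm)), hits
    # Replace every cluster id by the class label it was matched with.
    predictions = [best_mapping[c] for c in cluster_ids]
    return predictions, best_mapping


if __name__ == "__main__":
    cluster_ids = ["c0", "c0", "c1", "c1", "c1", "c2"]
    labels = ["b", "b", "a", "a", "b", "c"]
    predictions, mapping = clusters_to_predictions(cluster_ids, labels)
    # The predictions can now be scored like any classifier output.
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    print(mapping, predictions, accuracy)
```

Once the cluster ids are turned into predicted labels, any classification measure (accuracy here) can be computed against the known labels.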
  • ema Member Posts: 33 Guru
    Hi,
    Thank you very much.

    It works great,

    but when I tried to use it with agglomerative clustering,
    it did not work.

    I tried to flatten the clustering and then use example2cluster,

    but it still does not work ...


    Thank you in advance
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    there seems to be a problem during the flattening of the agglomerative clustering. I will forward this topic to Sebastian, who is our clustering expert.

    Cheers,
    Ingo