Compare and analyze text documents

TobiasNehrig · December 2017

Hi Experts,

I'm experimenting with text mining and analysis. I've created a neighborhood co-occurrence table from one text and am trying to analyze it and compare it with a larger corpus.

My ExampleSet looks like this:

Row No. | Document | Word1 | Word2 | n
1 | 1 | aaa | bbb | 2
2 | 1 | bbb | ddd | 3
3 | 1 | aaa | bbb | 4
4 | 2 | aaa | ccc | 3
5 | 2 | aaa | bbb | 4
6 | 2 | ccc | aaa | 3
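For readers unfamiliar with the term, a neighborhood co-occurrence table like the one above can be sketched in plain Python. This is a minimal sketch, not the RapidMiner operators used in the process: it assumes "neighborhood" means that word2 appears within `window` tokens after word1, and the function name `neighborhood_cooccurrence` and the `window` parameter are hypothetical.

```python
from collections import Counter

def neighborhood_cooccurrence(tokens, window=1):
    """Count ordered (word1, word2) pairs where word2 follows word1
    within `window` positions (hypothetical definition of "neighborhood")."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # Look only at the next `window` tokens, stopping at the end of the text.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(w1, tokens[j])] += 1
    return counts

tokens = "aaa bbb ddd aaa bbb".split()
pairs = neighborhood_cooccurrence(tokens)
for (w1, w2), n in sorted(pairs.items()):
    print(w1, w2, n)
```

A larger `window` counts more distant neighbors as co-occurring, which makes the table denser.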

This is my process:

<运营商激活= " true " class = "过程”兼容ibility="8.0.001" expanded="true" name="Process">
<参数键= " prune_above_absolute " value = " 3000 "/>
<操作符= " true " class = " select_attribute激活s" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
http://www.spiegel.de"/>
<参数键= " prune_above_absolute " value = " 3000 "/>
<操作符= " true " class = " select_attribute激活s" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="45" y="34">
<connect from_op="Spon Process Documents from Data" from_port="example set" to_op="Splitting" to_port="in 1"/>
<connect from_op="Spon Process Documents from Data (2)" from_port="example set" to_op="Splitting (2)" to_port="in 1"/>

I'm out of ideas on how to compare and analyze them.

Does anyone have an idea how I can do this?

Regards

Tobias

MartinLiebig · December 2017

Hi @TobiasNehrig,

Are these texts or tuples you are working on? And does the order matter? I guess the solution is something like Pivot + Cross Distance or Aggregate + Cross Distance, but the precise solution depends on your use case.
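To illustrate why the order question matters before the Aggregate step, here is a minimal Python sketch (not RapidMiner's Aggregate operator) that sums n per document and word pair, treating pairs either as ordered or as unordered. The `aggregate` function and its `ordered` flag are hypothetical names; the rows are copied from the example set in the question.

```python
from collections import defaultdict

# Rows from the example set in the question: (document, word1, word2, n)
rows = [
    (1, "aaa", "bbb", 2),
    (1, "bbb", "ddd", 3),
    (1, "aaa", "bbb", 4),
    (2, "aaa", "ccc", 3),
    (2, "aaa", "bbb", 4),
    (2, "ccc", "aaa", 3),
]

def aggregate(rows, ordered=True):
    """Sum n per (document, pair). With ordered=False the two words are
    sorted, so (a, b) and (b, a) count as the same pair."""
    totals = defaultdict(int)
    for doc, w1, w2, n in rows:
        pair = (w1, w2) if ordered else tuple(sorted((w1, w2)))
        totals[(doc, pair)] += n
    return dict(totals)

print(aggregate(rows, ordered=True))
print(aggregate(rows, ordered=False))
```

With `ordered=False`, "aaa ccc" and "ccc aaa" in document 2 merge into one pair with n = 6; with `ordered=True` they stay separate. Either way, the two "aaa bbb" rows in document 1 are summed to 6.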

Cheers,

Martin

TobiasNehrig · December 2017

Hi @mschmitz,

In my understanding, these should be tuples.

Regards

Tobias

MartinLiebig · December 2017

Ok,

I would concatenate the two words, then use Pivot, replace missing values with 0 (Replace Missing Values), and use Cross Distance.
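The recipe above (concatenate, Pivot, replace missing values with 0, Cross Distance) can be sketched in plain Python to show what each step does to the example set. This is an illustration under assumptions, not the RapidMiner operators themselves: repeated pairs within a document are summed during the pivot, the `_` separator and all function names are hypothetical, and both Euclidean and cosine distance are shown because the thread does not say which measure to use.

```python
import math
from collections import defaultdict

# Rows from the example set in the question: (document, word1, word2, n)
rows = [
    (1, "aaa", "bbb", 2),
    (1, "bbb", "ddd", 3),
    (1, "aaa", "bbb", 4),
    (2, "aaa", "ccc", 3),
    (2, "aaa", "bbb", 4),
    (2, "ccc", "aaa", 3),
]

def pivot(rows, sep="_"):
    """Concatenate word1 and word2 into one key, then pivot to one vector per
    document with one column per pair. Repeated pairs are summed (assumption)."""
    table = defaultdict(lambda: defaultdict(int))
    for doc, w1, w2, n in rows:
        table[doc][w1 + sep + w2] += n
    columns = sorted({key for counts in table.values() for key in counts})
    # Replace missing values with 0: a pair absent from a document gets 0.
    vectors = {doc: [counts.get(c, 0) for c in columns]
               for doc, counts in table.items()}
    return columns, vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as maximally distant instead of dividing by 0.
    return 1.0 - dot / norm if norm else 1.0

def cross_distances(requests, references, dist):
    """Distance from every request vector to every reference vector."""
    return {(r, s): dist(rv, sv)
            for r, rv in requests.items()
            for s, sv in references.items()}

columns, vectors = pivot(rows)
requests = {1: vectors[1]}    # the single text
references = {2: vectors[2]}  # the larger corpus (one document here)
print(columns)
print(cross_distances(requests, references, euclidean))
print(cross_distances(requests, references, cosine_distance))
```

Here document 1 stands in for the single text and document 2 for the corpus. The key design point is that both sides are pivoted together, so they share one column space; pivoting them separately would produce vectors whose columns do not line up.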

Best,

Martin

Altair RapidMiner Community