Compare and analyse text documents
TobiasNehrig
Hi Experts,
I'm experimenting with text mining and analysis. I've created a neighborhood co-occurrence table from one text and am trying to analyse it and compare it with a larger corpus.
My ExampleSet looks like this:
Row No. | Document | Word1 | Word2 | n
1 | 1 | aaa | bbb | 2
2 | 1 | bbb | ddd | 3
3 | 1 | aaa | bbb | 4
4 | 2 | aaa | ccc | 3
5 | 2 | aaa | bbb | 4
6 | 2 | ccc | aaa | 3
This is my process:
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<parameter key="prune_above_absolute" value="3000"/>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">http://www.spiegel.de"/>
<parameter key="prune_above_absolute" value="3000"/>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="45" y="34">
<connect from_op="Spon Process Documents from Data" from_port="example set" to_op="Splitting" to_port="in 1"/>
<connect from_op="Spon Process Documents from Data (2)" from_port="example set" to_op="Splitting (2)" to_port="in 1"/>
I'm out of ideas on how to compare and analyse them.
Does anyone have an idea how I can do this?
Regards
Tobias
Answers
Hi @TobiasNehrig,
are these texts or tuples you are working on? And does the order matter? I guess the solution is something like Pivot + Cross Distance or Aggregate + Cross Distance. But the precise solution depends on your use case.
Cheers,
Martin
Dortmund, Germany
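To illustrate why the order question matters, here is a minimal Python sketch (outside RapidMiner, not a process from this thread) that aggregates the example rows from the question in two ways: once treating (aaa, ccc) and (ccc, aaa) as different pairs, and once treating them as the same pair. The function name `aggregate` and its `ordered` flag are made up for this illustration.

```python
from collections import Counter

# Rows copied from the ExampleSet in the question: (document, word1, word2, n)
rows = [
    (1, "aaa", "bbb", 2),
    (1, "bbb", "ddd", 3),
    (1, "aaa", "bbb", 4),
    (2, "aaa", "ccc", 3),
    (2, "aaa", "bbb", 4),
    (2, "ccc", "aaa", 3),
]

def aggregate(rows, ordered=True):
    """Sum n per (document, word pair).

    With ordered=False the two words are sorted first, so (a, b) and
    (b, a) end up under the same key. (Hypothetical helper for this sketch.)
    """
    totals = Counter()
    for doc, w1, w2, n in rows:
        pair = (w1, w2) if ordered else tuple(sorted((w1, w2)))
        totals[(doc, pair)] += n
    return totals

ordered_counts = aggregate(rows, ordered=True)
unordered_counts = aggregate(rows, ordered=False)
```

If order matters, document 2 keeps two separate entries of 3 for aaa/ccc and ccc/aaa; if it does not, they merge into a single entry of 6. Either way, the duplicate aaa/bbb rows in document 1 are summed, which is the "Aggregate" part of the suggestion.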
Hi @mschmitz,
as I understand it, these should be tuples.
Regards
Tobias
Ok,
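I would concatenate the two words into one attribute, Pivot so that each word pair becomes its own column, replace the missing values with 0, and then use Cross Distance to compare the documents.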
Best,
Martin
Dortmund, Germany
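The steps suggested above can be sketched in plain Python to show what the data looks like at each stage. This is only an illustration under assumptions, not a RapidMiner process: the underscore used to join the words, the summing of duplicate pairs, and the choice of Euclidean distance and cosine similarity are all picked for this example, and the helper names are hypothetical.

```python
import math
from collections import defaultdict

# Rows copied from the ExampleSet in the question: (document, word1, word2, n)
rows = [
    (1, "aaa", "bbb", 2),
    (1, "bbb", "ddd", 3),
    (1, "aaa", "bbb", 4),
    (2, "aaa", "ccc", 3),
    (2, "aaa", "bbb", 4),
    (2, "ccc", "aaa", 3),
]

# Step 1: concatenate the two words into one key and sum n per document.
# (Summing duplicates is an assumption; the thread does not say how to merge them.)
counts = defaultdict(lambda: defaultdict(int))
for doc, w1, w2, n in rows:
    counts[doc][f"{w1}_{w2}"] += n

# Step 2: pivot. One column per word pair, one row per document,
# with 0 wherever a document does not contain the pair.
columns = sorted({pair for per_doc in counts.values() for pair in per_doc})
vectors = {
    doc: [per_doc.get(pair, 0) for pair in columns]
    for doc, per_doc in counts.items()
}

# Step 3: compare the document vectors pairwise.
def euclidean(a, b):
    """Euclidean distance between two equally long count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

distance = euclidean(vectors[1], vectors[2])
similarity = cosine_similarity(vectors[1], vectors[2])
```

With the example rows, the columns are aaa_bbb, aaa_ccc, bbb_ddd and ccc_aaa, so document 1 becomes [6, 0, 3, 0] and document 2 becomes [4, 3, 0, 3]. For a large corpus, cosine similarity is often easier to interpret than Euclidean distance because it is not dominated by document length.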