Set Class Label to The Dataset

Fatin_Fezarudin · September 2018

Hi All,

I have a question regarding my dataset. I have a bunch of text, and I want to extract only the relevant words and use them as the class label.

Is this related to text mining? How can I set the class label?

Here the example of text in my dataset:

(The red color is the class label that I want to set)

Plan, lead, organize production schedule.Conduct necessary checking of all raw materials, packaging materials and supervise production process to ensure quality assurance. Handling production documentation filing and monitoring company safety and quality programs in accordance with standard of HACCP, ISO, JAKIM Halal and etc.Responsible for inventory management to ensure supply always available.Implementing safe work environment, maintain good housekeeping and ensure compliance with safety standard.Assist in production planning by coordinate production process improvement, raw materials, packaging, storage and manpower to minimize production downline and wastage.Maintain great communication at all level in the organization.

Thank You for your help.

rfuentealba · September 2018

Hello @Fatin_Fezarudin,

Let me see if I get it:

- There is text.

- There should be a text classification based on certain words.

True?

It all depends on how you want to determine what words are important, and there are at least three ways (that I know of) to determine such a thing:

1.- Having a collection of words.

Just:

- Have a list of words somewhere.

- Use Loop Examples to walk through that list of words, and inside the loop:

----> Filter Examples and use a "contains" filter.

----> Add your word as an attribute.

----> Join and save (you can use a Remember/Retrieve operator so you can handle what is saved and how)

- Retrieve the final results. Add Set Role to create your labels.

Except for the Remember/Retrieve part, this is the easiest thing you can do, but for that you should already know which words are important.
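
Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an illustration of the loop / "contains" filter / join logic described above, not a RapidMiner process; the keyword list and the example texts are assumptions made up for the sketch.

```python
# Hypothetical keyword list; in practice this is the word list you already have.
KEYWORDS = ["production", "inventory", "safety"]


def label_by_keywords(texts, keywords):
    """Return one row per (text, keyword) match.

    Mimics: loop over the word list, keep texts that contain the word,
    add the word as an attribute, and collect everything in one table.
    """
    rows = []
    for keyword in keywords:                       # loop over the word list
        for text in texts:
            if keyword.lower() in text.lower():    # case-insensitive "contains" filter
                rows.append({"text": text, "label": keyword})
    return rows


# Example texts taken from the job description in the question.
texts = [
    "Plan, lead, organize production schedule.",
    "Responsible for inventory management to ensure supply always available.",
    "Maintain great communication at all level in the organization.",
]

labeled = label_by_keywords(texts, KEYWORDS)
for row in labeled:
    print(row["label"], "<-", row["text"])
```

A text that matches several keywords ends up with several rows, and a text that matches none gets no label at all, so you would still need to decide how to handle both cases.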

2.- Creating a collection of words.

Our great community manager @sgenzer posted this solution a while ago. I'm using bold to indicate the names of the operators you should use on each step. Unfortunately I'm abroad and don

- (This is a suggestion) Filter Stopwords before doing the rest. Stopwords are words that connect other words but don't add meaning by themselves.

- Take your text and use the Split operator to create a large number of attributes.

- Transpose this mess so that your text is listed word by word in one attribute and a ton of examples.

- Use the Join operator with your keyword database list to see overlap.

- Aggregate to see word frequencies.

- (This is my addition) Filter Examples to get the most important words, Select Attributes to get a good grasp of your data, and then Label by the word list, and you will have many classes for each doc.

Now, since you have many classes here, I wouldn't save the result of the Join in a dataset, because that will end up in a huge file.

This is not difficult either, but since you don't have control over what words appear, you should work a lot with adding or removing breakpoints to get an idea of how things go.
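
As a rough sketch of the split / join / aggregate steps above in Python (not the RapidMiner operators themselves): the stopword list, the keyword "database", and the frequency threshold of 2 are all assumptions chosen for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list and a hypothetical keyword database.
STOPWORDS = {"and", "of", "to", "the", "all", "in", "with"}
KEYWORD_DB = {"production", "quality", "safety", "inventory"}


def keyword_frequencies(text, keyword_db, stopwords):
    """Count how often each known keyword occurs in one text."""
    words = re.findall(r"[a-z]+", text.lower())         # split: one entry per word
    words = [w for w in words if w not in stopwords]    # filter stopwords
    matched = [w for w in words if w in keyword_db]     # inner join with the keyword list
    return Counter(matched)                             # aggregate: frequency per word


text = (
    "Handling production documentation filing and monitoring company safety "
    "and quality programs. Supervise production process to ensure quality assurance."
)

freq = keyword_frequencies(text, KEYWORD_DB, STOPWORDS)
# Keep only the words that appear at least twice (threshold is an assumption).
labels = sorted(w for w, n in freq.items() if n >= 2)
print(freq)
print(labels)
```

Because one text can keep several words, this gives you a multi-label result per document, which matches the "many classes for each doc" remark above.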

3.- Analyze the text with text mining operators.

The usual process is:

- Use Process Documents from Files or one of the other appropriate text mining tools to:

----> create TF-IDF vectors,

----> Tokenize,

----> Lowercase,

----> Filter Stopwords,

----> Generate N-Grams if you need associations of words.

----> Or Filter Tokens by POS to get only verbs, nouns, adjectives...

----> Or Filter Tokens by one of the others.

----> Or Lemmatize to create some meaning.

- Once you get your results, you can apply some kind of segmentation operator (it's up to you, I'm running out of knowledge here) to define which words are important.

- Once you get that segmentation, you can do some magic to associate these important words to the original texts.
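
To make the last two steps a bit more concrete, here is a minimal Python sketch. It tokenizes, lowercases, and filters stopwords, then computes a plain tf * log(N / df) weight by hand. Instead of a segmentation (clustering) operator, it simply picks the highest-weighted token of each text as a candidate label, which is a much simpler technique swapped in for illustration. The stopword list and example texts are assumptions, and RapidMiner's own TF-IDF normalization may differ from this formula.

```python
import math
import re
from collections import Counter

STOPWORDS = {"and", "of", "to", "the", "all", "in", "with", "for"}  # tiny illustrative list


def tokenize(text):
    """Tokenize, lowercase, and filter stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {token: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_term(weight_map):
    """Pick the highest-weighted token; ties are broken alphabetically.

    Assumes the document has at least one token left after filtering.
    """
    return min(weight_map, key=lambda t: (-weight_map[t], t))


docs = [
    "Supervise production process and production schedule.",
    "Responsible for inventory management and inventory supply.",
    "Ensure compliance with safety standard and safety programs.",
]

labels = [top_term(w) for w in tfidf(docs)]
print(labels)
```

Each entry in `labels` lines up with the text at the same position in `docs`, which is one simple way to associate the important words with the original texts.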

That said, I consider text mining and natural language processing to be a complete area inside Machine Learning. There is so much to know regarding how languages work, sentences and all that... But as a first step, this should become your initial guide.

All the best,

Mary61 · June 2020

@rfuentealba Hi, I have the same problem, and thank you for your answer. I also have text, and with "Process Documents" I split each text into words. I have 100 texts that I need to classify. Would you please let me know how I can choose one word for each row as a class so I can use a clustering operator?

Thanks in advance for the reply

Altair RapidMiner Community