Filter Tokens by POS Tags slow
I have a Filter Tokens by POS Tags operator inside a loop and it's slow. My guess is that on each iteration the tagger loads some data (a dictionary?) from the hard drive. Any tips on how to improve the performance? I see that the same question was asked 4 years ago and was never answered.
Answers
Try to pre-process the data as much as possible so the filter operation doesn't have to work as hard.
I'm experimenting with disabling CPU hyper-threading; maybe you could try that? Another tip is to set the amount of usable memory in the settings to a higher value.
Otherwise, I dunno, some processes are SLOW!
Use Python's NLTK instead if that's an option. It's much more flexible with regard to POS tagging and muuuuuch faster.
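For anyone who wants to try the NLTK route, here is a rough sketch of filtering tokens by POS tag where the tagger is created once and reused for every document, instead of being set up again on each loop iteration. The function names (`filter_tokens_by_pos`, `filter_documents`, `make_nltk_tagger`) are made up for this post, and it assumes NLTK is installed and its perceptron tagger data has already been downloaded. I haven't benchmarked it against the RapidMiner operator, so treat the speed-up as something to measure rather than a promise.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A tagger is any callable that maps a token list to (token, tag) pairs.
Tagger = Callable[[Sequence[str]], List[Tuple[str, str]]]


def make_nltk_tagger() -> Tagger:
    """Build the tagger ONCE; the model is loaded here, not per document.

    Assumes NLTK is installed and the perceptron tagger data is available
    (for example via nltk.download).
    """
    from nltk.tag import PerceptronTagger  # imported lazily on purpose

    return PerceptronTagger().tag


def filter_tokens_by_pos(
    tokens: Sequence[str], keep_prefixes: Iterable[str], tagger: Tagger
) -> List[str]:
    """Keep only tokens whose tag starts with one of keep_prefixes.

    With Penn Treebank tags, the prefix "NN" matches NN, NNS, NNP and NNPS.
    """
    prefixes = tuple(keep_prefixes)
    return [tok for tok, tag in tagger(tokens) if tag.startswith(prefixes)]


def filter_documents(
    docs: Iterable[Sequence[str]], keep_prefixes: Iterable[str], tagger: Tagger
) -> List[List[str]]:
    """Apply the same, already-loaded tagger to every tokenized document."""
    prefixes = tuple(keep_prefixes)
    return [filter_tokens_by_pos(doc, prefixes, tagger) for doc in docs]


# Hypothetical usage (needs NLTK and its tagger data):
#   tagger = make_nltk_tagger()              # outside the loop
#   nouns = filter_documents(tokenized_docs, ["NN"], tagger)
```

The point of passing the tagger in as an argument is that the expensive part (loading the model) happens exactly once, outside the loop, which is the thing the original question suspects is going wrong.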
Or use R, that's also an option
Below you can find something I created a while ago to give me different outputs based on POS combinations, maybe it can help you further.
That's what I do for my research, but for teaching I use RapidMiner...