Text mining in Spanish

MarlaBot (Community Manager)
edited March 2020 in Help
A RapidMiner user wants to know the answer to this question: "I'm trying to see if there is anything in the app for text mining in Spanish? I know about Rosette extension but I wonder if there's an operator for that from RapidMiner. Thank you."

Answers

  • sgenzer (Community Manager)
    @MarlaBot that's a good question. You can use the normal text processing tools for Spanish, just as you can for any other language. There is no built-in Spanish stopword dictionary, but many are available online.

    I am tagging our resident Spanish-speaking unicorn @rfuentealba and tagging this to his "RapidMiner en Castellano" group in case he has more suggestions.

    Scott
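
As a rough illustration of the point about supplying your own stopword dictionary, here is a minimal Python sketch of the same idea outside RapidMiner. The stop word set, the function names and the sample sentence are all made up for this example; a real list would be downloaded or curated for your own corpus.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop word list (an assumption for this
# sketch); replace it with a list found online or built from your corpus.
SPANISH_STOPWORDS = {"de", "la", "el", "en", "y", "que", "los", "un", "una", "es", "para"}


def tokenize(text):
    """Lowercase the text and return its runs of letters, keeping accents."""
    # NFC normalization keeps accented letters as single code points, so
    # "á" is matched as one letter rather than "a" plus a combining mark.
    normalized = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches Unicode letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", normalized)


def remove_stopwords(tokens, stopwords=SPANISH_STOPWORDS):
    """Drop every token that appears in the stop word set."""
    return [token for token in tokens if token not in stopwords]


sentence = "El análisis de texto en español es útil para la minería de datos."
tokens = remove_stopwords(tokenize(sentence))
print(tokens)
```

Inside RapidMiner the equivalent step would be a stopword filter fed from a dictionary file, so the list itself is the only Spanish-specific part.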

  • rfuentealba (Moderator, Unicorn)

    I do have suggestions, but not many beyond the ones you already gave. The only two things I could not achieve with RapidMiner alone (and this was before I knew about the Rosette extension) were POS tagging and word vectors, but for these you can use Python and a package named "pattern".

    By the way, having applied NLP in several languages (Spanish, Portuguese, German and Southern Chilean Spanish), I recommend that you build your own list of stop words rather than relying on a predefined one.

    If there is anything I can do to help, just drop us a line here and we'll see how to solve it :)

    Best regards,

    Rodrigo.
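
One possible way to act on the advice about building your own stop word list is to start from document frequency: words that occur in nearly every document carry little information for that corpus. The sketch below is only an assumption about how such a list might be bootstrapped, not the method used in the thread; the function name, the `min_doc_fraction` parameter and the sample documents are invented for illustration.

```python
import re
from collections import Counter


def candidate_stopwords(documents, min_doc_fraction=0.8):
    """Return the words that occur in at least min_doc_fraction of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency, not term frequency).
        doc_freq.update(set(re.findall(r"[^\W\d_]+", doc.lower())))
    threshold = min_doc_fraction * len(documents)
    return sorted(word for word, count in doc_freq.items() if count >= threshold)


# Made-up sample corpus for this sketch.
docs = [
    "El cliente pidió la factura de la compra",
    "La factura de la tienda llegó tarde",
    "El envío de la compra llegó a la tienda",
]

# With a fraction of 1.0 only words present in every document are kept.
stopwords = candidate_stopwords(docs, min_doc_fraction=1.0)
print(stopwords)
```

Lowering the fraction also pulls in frequent domain words, so the output is best treated as a list of candidates to review by hand before it is used as a dictionary.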