Closest encoding to utf8mb4
I am working with social media data, and the emojis are driving me crazy: when I import them, they get converted to the system encoding and show up as a bunch of squiggles. Which encoding is closest to utf8mb4, so that I can preserve the characters when reading from a CSV?
Best Answer
Robi_Me Member Posts: 32 Maven
@jwpfau when I import into the DB, it fails, saying the character is not UTF8, with the error message: Incorrect string value: '\xE2 \x94 \x82....'
These were basically all of the emojis that were being rejected. I was under the impression that I needed to set the encoding inside RapidMiner, but the change was actually needed on the DB. Changing the free-text field to TEXT and setting its encoding to utf8mb4 sorted the issue out.
Answers
utf8mb4 is a workaround for the broken utf8 type in MySQL, which only supports characters of up to 3 bytes. Most emojis need 4 bytes in UTF-8.
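To make the byte-length point concrete, here is a minimal Python sketch that measures how many bytes a few characters take in UTF-8. The helper names `utf8_len` and `needs_utf8mb4` are made up for illustration and are not part of RapidMiner or MySQL; the check simply assumes that any code point above U+FFFF is the kind that needs a 4-byte encoding.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character lies above U+FFFF,
    i.e. it takes 4 bytes in UTF-8 and will not fit a 3-byte column."""
    return any(ord(ch) > 0xFFFF for ch in text)


# ASCII letter, accented letter, euro sign, emoji
samples = ["a", "é", "€", "😀"]
byte_lengths = {ch: utf8_len(ch) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

Running a check like `needs_utf8mb4` over a text column before loading it would show which rows a 3-byte column is likely to reject.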
In the CSV export it should just be regular UTF-8.
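Outside RapidMiner, one way to confirm that a CSV file really carries its emojis as plain UTF-8 is a quick round trip in Python. This is only a sketch under assumptions: the file name `posts.csv` and the sample rows are invented for the example, and the file is written to a temporary directory so nothing real is touched.

```python
import csv
import os
import tempfile

# Invented sample data containing 4-byte emoji characters
rows = [["id", "text"], ["1", "Great day 😀🎉"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "posts.csv")  # hypothetical file name

    # Write with an explicit encoding instead of the system default
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

    # Read back with the same explicit encoding
    with open(path, "r", encoding="utf-8", newline="") as f:
        loaded = list(csv.reader(f))

print(loaded == rows)
```

Leaving out `encoding="utf-8"` makes Python fall back to a platform-dependent default, which is the same kind of "system encoding" substitution described in the question.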
Maybe the selected RapidMiner Studio font doesn't contain all the smileys and is displaying squares instead?
Greetings,
Jonas