"redundant UTF-8 sequence" in String#to_json
Reported by Jamis Buck | September 25th, 2008 @ 04:41 PM | in 2.x
Certain strings (which are otherwise valid UTF-8 sequences) cause String#to_json to raise an ArgumentError (redundant UTF-8 sequence). Upon investigation, this turns out to be due to String#to_json's use of String#unpack:
s.unpack("U*")...
Further investigation showed that any two-byte sequence beginning with 0xC0 or 0xC1, with the second byte in the range 0x80..0xBF, causes String#unpack("U*") to raise that exception. Because String#to_json explicitly includes 0xC0 and 0xC1 in its gsub regex, it seems simplest to have it check only for 0xC2 and up, to avoid the ArgumentError. (The alternative would be to find some way to normalize the redundant sequences to their shorter equivalents... but I'm not clear on how to make that happen.)
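For illustration, here is a minimal sketch of the behaviour described above. It is not the attached test script. Both helper names (redundant_utf8? and two_byte_code_point) are hypothetical and are not part of ActiveSupport. The sketch assumes a Ruby whose String#unpack raises ArgumentError with the message "redundant UTF-8 sequence" for these inputs.

```ruby
# Hypothetical helper (not part of ActiveSupport): reports whether
# String#unpack("U*") rejects the given bytes as a redundant sequence.
def redundant_utf8?(bytes)
  bytes.pack("C*").unpack("U*")
  false
rescue ArgumentError => e
  # Assumption: the interpreter uses this exact wording in the message.
  e.message.include?("redundant UTF-8 sequence")
end

# Hypothetical helper: the code point a two-byte sequence encodes,
# computed by hand from the payload bits (5 from the lead byte,
# 6 from the trail byte).
def two_byte_code_point(lead, trail)
  ((lead & 0x1F) << 6) | (trail & 0x3F)
end

# Probe the boundaries of the ranges mentioned in the report.
[0xC0, 0xC1, 0xC2].each do |lead|
  [0x80, 0xBF].each do |trail|
    printf("%02X %02X  code point U+%04X  redundant: %s\n",
           lead, trail,
           two_byte_code_point(lead, trail),
           redundant_utf8?([lead, trail]))
  end
end
```

Every sequence with a 0xC0 or 0xC1 lead byte decodes to a code point below U+0080, which already fits in a single byte. That is why unpack treats the two-byte form as redundant, while sequences starting at 0xC2 are accepted.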
I've attached a patch making this (trivial) change, as well as a script that can be used to demonstrate the error for the ranges mentioned above.
Comments and changes to this ticket
-
Jamis Buck September 25th, 2008 @ 04:42 PM
Here's the test script; I couldn't see a way to attach more than a single file to a ticket without using the comments. :(
-
Bira January 22nd, 2010 @ 02:03 PM
I've updated the patch - the test attached above should still work for it, I believe.
-
Damien MATHIEU May 17th, 2010 @ 04:14 PM
- Tag changed from activesupport, bug, patch to activesupport, bug, bugmash, patch
-
Damien MATHIEU May 17th, 2010 @ 04:53 PM
I wouldn't be so sure it's stale. Or at least, there's a bug somewhere else.
-
Julien Sanchez May 18th, 2010 @ 09:51 AM
Could the stale status be removed because it appears as closed whereas this bug is still valid?
-
Damien MATHIEU May 18th, 2010 @ 02:26 PM
The patch won't work. It breaks the encoding on some characters (including è). The bug remains however.