
#4350 ✓resolved
Norman Clarke

tidy_bytes fails on 1.9.x

Reported by Norman Clarke | April 8th, 2010 @ 09:26 PM

ActiveSupport::Multibyte::Chars#tidy_bytes does not work on Ruby 1.9.x. Its implementation uses a Unicode regular expression that, in 1.9.x, can only operate on valid UTF-8 strings, and therefore always raises an error.
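As a rough illustration of the failure condition (not the ActiveSupport code itself), the snippet below builds a string that is tagged as UTF-8 but contains a stray ISO-8859-1 byte, then attempts a Unicode-aware match against it. The sample string and the variable names are assumptions made for this example, and the exact outcome of the match may differ between Ruby versions, so the error is rescued rather than assumed.

```ruby
# Assumption for illustration: 0xE9 is "é" in ISO-8859-1, but on its own
# it is not a valid UTF-8 byte sequence.
broken = "caf\xE9".dup.force_encoding(Encoding::UTF_8)

# The string claims to be UTF-8 but its bytes are not valid UTF-8.
puts broken.valid_encoding?

begin
  # A Unicode-aware regular expression applied to the invalid string.
  # This is the kind of operation the ticket describes as failing on 1.9.x.
  broken =~ /\p{L}+/u
  outcome = "matched without error"
rescue ArgumentError => e
  outcome = "raised: #{e.message}"
end

puts outcome
```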

There's already a test named test_tidy_bytes_is_broken_on_1_9_0 in ActiveSupport, which shows that this is a known issue.

After opening this ticket I'll attach a patch which resolves the issue, and also doubles the performance of this method. Additionally, I've added more test cases, a few of which fail on the current master branch and are fixed by this patch.

The patch also adds a force option to tidy_bytes, because some sequences of ISO-8859-1 or CP-1252 characters form a single valid UTF-8 character, and end up transformed to a single unprintable character without this option.
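To make the ambiguity concrete, here is a minimal sketch of the idea, not the code from the attached patch. The helper names `cp1252_to_utf8` and `tidy_sketch` are hypothetical, and the sketch assumes that any byte which is invalid as UTF-8 should be read as CP-1252. Without `force`, bytes that already form valid UTF-8 are left alone; with `force`, every byte is reinterpreted as CP-1252.

```ruby
# Hypothetical helper: reinterpret raw bytes as CP-1252 and transcode to UTF-8.
# Bytes that CP-1252 leaves undefined are replaced rather than raising.
def cp1252_to_utf8(bytes)
  source = bytes.dup.force_encoding(Encoding::Windows_1252)
  source.encode(Encoding::UTF_8, undef: :replace)
end

# Hypothetical sketch of a tidy operation with a force flag.
def tidy_sketch(string, force = false)
  # With force, treat every byte as CP-1252, even valid UTF-8 sequences.
  return cp1252_to_utf8(string) if force

  utf8 = string.dup.force_encoding(Encoding::UTF_8)
  # each_char yields invalid bytes as single-byte strings, so only those
  # bytes are reinterpreted; valid characters pass through unchanged.
  repaired = utf8.each_char.map do |char|
    char.valid_encoding? ? char : cp1252_to_utf8(char)
  end
  repaired.join
end

ambiguous = "\xC3\xA9"                # two CP-1252 characters, or one UTF-8 "é"
kept      = tidy_sketch(ambiguous)        # valid UTF-8, so left as one character
forced    = tidy_sketch(ambiguous, true)  # each byte transcoded separately
repaired  = tidy_sketch("caf\xE9")        # the lone 0xE9 byte is read as CP-1252
```

The trade-off is that `force` is only appropriate when the input is known to be ISO-8859-1 or CP-1252, since it will also re-encode text that was genuine UTF-8.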

If you want to see this code in isolation, I've also packaged it as a separate library on GitHub.
