
#4075 ✓invalid

Incorrect behavior of truncate on utf-8 strings

Reported by tulskiy | March 1st, 2010 @ 07:38 AM

I have a problem with truncate on Russian strings. For example, if product.description is "Словарь", then

s = truncate(product.description, :length => 9)

returns a correct string, with the bytes
208 161 208 187 208 190 46 46 46

But if I truncate to 10 characters, it returns the following byte sequence:
208 161 208 187 208 190 208 46 46 46
where the 208 before the three trailing dots is the first byte of the next UTF-8 character. So it looks like truncate cuts by byte length rather than character length, and splits a multibyte character in two.

This results in an 'invalid byte sequence in UTF-8' exception.
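
The byte sequences above can be reproduced in plain Ruby, without Rails. This is only a sketch: truncate_bytes and truncate_chars are hypothetical helpers written for illustration, not the Rails truncate helper, and String#byteslice assumes Ruby 1.9.3 or newer (it is not available on 1.9.1).

```ruby
# encoding: utf-8

s = "Словарь" # 7 characters, 14 bytes in UTF-8

# Hypothetical helper: cuts by byte count, which is the behavior
# the byte sequences in this report suggest.
def truncate_bytes(str, length, omission = "...")
  return str if str.bytesize <= length
  str.byteslice(0, length - omission.bytesize) + omission
end

# Hypothetical helper: cuts by character count instead.
def truncate_chars(str, length, omission = "...")
  return str if str.length <= length
  str[0, length - omission.length] + omission
end

nine = truncate_bytes(s, 9)   # keeps 6 bytes, i.e. three whole characters
ten  = truncate_bytes(s, 10)  # keeps 7 bytes, splitting the fourth character

puts nine.bytes.to_a.join(" ") # 208 161 208 187 208 190 46 46 46
puts ten.bytes.to_a.join(" ")  # 208 161 208 187 208 190 208 46 46 46
puts nine.valid_encoding?      # true
puts ten.valid_encoding?       # false

# Counting characters never splits a multibyte sequence.
by_chars = truncate_chars(s, 6)
puts by_chars.valid_encoding?  # true
```

An odd byte budget lands in the middle of a two-byte Cyrillic character, which is why a length of 9 works and a length of 10 does not.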

I'm using Rails 2.3.5 and Ruby 1.9.1 on Ubuntu 9.10.

Comments and changes to this ticket

  • tulskiy

    tulskiy March 1st, 2010 @ 07:39 AM

    Forgot to add, I'm using Rails 2.3.5 and Ruby 1.9.1 on Ubuntu 9.10.

  • Jeremy Kemper

    Jeremy Kemper March 1st, 2010 @ 07:32 PM

    • Assigned user set to “Jeremy Kemper”
    • State changed from “new” to “invalid”

    What is product.description.encoding? I bet it's ASCII, not UTF-8. You need to update your mysql/pg/sqlite driver to a newer version that supports Ruby 1.9 string encodings.
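
    A minimal way to check this hypothesis in Ruby 1.9 or later is sketched below. It assumes, for illustration only, that the driver hands back UTF-8 bytes tagged as binary (ASCII-8BIT); the variable names are made up, and force_encoding here only simulates what an old driver might do.

```ruby
# encoding: utf-8

# Simulate a driver that returns UTF-8 bytes tagged as binary (an assumption).
raw = "Словарь".dup.force_encoding(Encoding::ASCII_8BIT)

puts raw.encoding # the binary encoding, not UTF-8
puts raw.length   # 14, because every byte counts as one "character"

# Re-tagging the same bytes as UTF-8 restores character semantics.
fixed = raw.dup.force_encoding(Encoding::UTF_8)

puts fixed.encoding        # UTF-8
puts fixed.length          # 7
puts fixed.valid_encoding? # true
```

    If the string reports a non-UTF-8 encoding, any length-based slicing works on bytes, which matches the output in the report. Re-tagging with force_encoding is a workaround; updating the driver, as suggested above, addresses the cause.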
