#2476 ✓ stale
Hector E. Gomez Morales

ASCII-8BIT encoding of query results in rails 2.3.2 and ruby 1.9.1

Reported by Hector E. Gomez Morales | April 10th, 2009 @ 04:36 PM | in 2.3.10

From #2188:

Hello! We've got the same problem, except that the error occurs when we fetch data from the database. We're using MySQL and the charset is UTF-8, but Active Record returns ASCII-8BIT. Is it possible to make changes to activerecord similar to the ones you made to actionpack? It seems we're not the only ones with this problem.

Problem

Fetching data from any database (MySQL, PostgreSQL, SQLite 2 and 3), all configured to use UTF-8 as their character set, returns the data tagged as ASCII-8BIT in ruby 1.9.1 and rails 2.3.2.1.

This has been reported in #2188 and in the rails talk group (1).

Possible Solution

Again, as in #2188, rails is not the culprit here. The only problem with rails is its inherent trust that all the data it gets is UTF-8; the problems arise when the data carries another encoding.

The real problem is that all the current adapters use native C extensions as glue, and these call the rb_str_new function, which in ruby 1.9.1 creates a String with ASCII-8BIT encoding (2). That is why all the data is returned with this encoding.
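The effect can be reproduced in plain Ruby without a database. This is only a sketch: the string below stands in for a value handed back by an adapter, and the sample text is made up for illustration.

```ruby
# Simulate what an adapter built on rb_str_new hands back:
# valid UTF-8 bytes, but tagged as ASCII-8BIT (binary).
raw = "caf\u00e9".dup.force_encoding(Encoding::ASCII_8BIT)

# Mixing it with a genuine non-ASCII UTF-8 string is where things break.
mix_error =
  begin
    raw + " \u00e0 la carte"
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end

# Relabeling the same bytes as UTF-8 (no transcoding) repairs the value.
fixed  = raw.dup.force_encoding(Encoding::UTF_8)
joined = fixed + " \u00e0 la carte"
```

Note that force_encoding only changes the label on the bytes. It is the right tool here because the bytes are already UTF-8; it would be wrong for a column stored in some other character set.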

Because the initial problems were detected in MySQL, I made the needed modifications and created a fork of mysql-ruby on github (3). This fork is compatible only with 1.9.1; it returns ASCII-8BIT for binary fields and UTF-8 for all other fields.

With this modified mysql-ruby gem, all activerecord tests for mysql pass except test_validate_case_sensitive_uniqueness. This test will fail for every adapter that is encoding aware, because the validation converts the value that needs to be unique to lowercase and then executes a query using LOWER(#{field}) on the unique field. In ruby 1.8.1 the downcase of non-ASCII strings is done with Multibyte. Since in ruby 1.9.1 downcase still leaves non-ASCII characters unchanged, I use Multibyte#downcase to do the conversion.

I attach a patch so that validates_uniqueness uses Multibyte#downcase on the string when running under ruby 1.9.1. With this patch all tests pass for test_mysql in activerecord.
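The mismatch behind the failing test can be illustrated in plain Ruby. This is a sketch and not the attached patch. It assumes Ruby 2.4 or later, where String#downcase accepts an :ascii option that reproduces the ASCII-only behavior described above for 1.9.1, and where the default downcase stands in for a Unicode-aware conversion such as Multibyte#downcase. The sample value is made up.

```ruby
value = "\u00C9COLE" # "ÉCOLE"

# ASCII-only case mapping, as described for 1.9.1:
# the accented capital is left untouched.
ascii_only = value.downcase(:ascii)

# Unicode-aware case mapping, standing in for Multibyte-style downcasing.
unicode_aware = value.downcase

# The two results differ, so a comparison against a fully lowercased
# column value would not match the ASCII-only result.
same = (ascii_only == unicode_aware)
```

If the database's LOWER() lowercases the accented letter as well (which depends on its character set and collation), only the Unicode-aware result will match it.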

TODO

  • Make all the other adapters 1.9.1 compatible AND encoding aware.
  • Remove hardcoded encoding use of UTF-8 and use the character set used by the DB.
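One possible shape for the second item, sketched in Ruby rather than in the C extension. The lookup table and the tag_with_db_charset helper are hypothetical names for illustration; a real adapter would need to cover every character set the database supports.

```ruby
# Hypothetical lookup table: MySQL character set names on the left,
# Ruby encodings on the right. MySQL's latin1 corresponds to cp1252.
CHARSET_TO_ENCODING = {
  "utf8"   => Encoding::UTF_8,
  "latin1" => Encoding::Windows_1252,
  "binary" => Encoding::ASCII_8BIT
}

# Hypothetical helper: label raw bytes with the encoding the DB reports,
# falling back to binary when the character set is unknown.
def tag_with_db_charset(bytes, charset)
  encoding = CHARSET_TO_ENCODING.fetch(charset) { Encoding::ASCII_8BIT }
  bytes.dup.force_encoding(encoding)
end

latin   = tag_with_db_charset("caf\xE9", "latin1")
as_utf8 = latin.encode(Encoding::UTF_8)
unknown = tag_with_db_charset("abc", "no_such_charset")
```

Labeling the value first and transcoding afterwards keeps the two steps separate, so binary columns can skip the transcoding entirely.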

Links

  1. Rails Group Post
  2. ASCII-8BIT default in rb_str_new
  3. mysql-ruby fork

Comments and changes to this ticket