Add magic encoding comment to generated files
Reported by Erik Dahlstrand | June 9th, 2010 @ 10:29 AM | in 3.0.2
An exception is thrown when posting non-English text (å, ä, ö, etc.) in a form.
ruby -v
ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-darwin10.3.0]
rails -v
Rails 3.0.0.beta4
rails new r3_test && cd r3_test
rails g scaffold post title:string && rake db:migrate && rails s
Visit http://0.0.0.0:3000/posts/new
Title: This will fail åäö
Create Post
Comments and changes to this ticket
-
DHH June 9th, 2010 @ 05:10 PM
- Milestone cleared.
-
ihower June 9th, 2010 @ 07:49 PM
I encounter the problem too, and my error log is http://gist.github.com/431959
Here is my patch for /activesupport/lib/active_support/buffered_logger.rb
-
matdrewin June 9th, 2010 @ 11:25 PM
I'm having this problem as well. Seems to affect accented characters.
-
Elise Huard June 10th, 2010 @ 04:54 PM
+1 just had it while migrating an app with rails-07b08721a226ff01f983e61d99ab4da96e296c97-master and ruby 1.9.2-preview3
-
Yehuda Katz (wycats) June 11th, 2010 @ 12:23 AM
- State changed from new to open
Yep. I have a solution for this. I'll get it in tomorrow!
-
netloner June 13th, 2010 @ 08:07 AM
To correct ihower's patch file: patch-logger-on-19.diff
old_buffer = buffer.map { |b| b.force_encoding(Encoding::UTF_8).encode if b.encoding == Encoding::ASCII_8BIT }
is going to empty all your logs when encodings are already in UTF-8 ....
I fixed it as this:
old_buffer = buffer.map { |b| (b.encoding == Encoding::ASCII_8BIT)? b.force_encoding(Encoding::UTF_8).encode : b }
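The difference between the two one-liners above can be seen in isolation. This is only a sketch: the `buffer` contents are made up for illustration, and `dup` is added so the sample strings are not modified in place.

```ruby
# Two buffered log entries: one already UTF-8, one tagged ASCII-8BIT
# (the bytes C3 A9 are the UTF-8 encoding of U+00E9).
buffer = ["Processing PostsController#create", "caf\xC3\xA9".b]

# First version: the modifier `if` makes the block return nil for every
# entry that is not ASCII-8BIT, so those entries are lost.
lossy = buffer.map do |b|
  b.dup.force_encoding(Encoding::UTF_8).encode if b.encoding == Encoding::ASCII_8BIT
end

# Corrected version: the ternary passes all other entries through unchanged.
kept = buffer.map do |b|
  b.encoding == Encoding::ASCII_8BIT ? b.dup.force_encoding(Encoding::UTF_8).encode : b
end

p lossy
p kept
```

With the first version, every entry that was already fine comes back as `nil`, which is why the log ends up empty.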
-
Cyrille June 15th, 2010 @ 03:43 PM
Wondering if it's the same problem as "incompatible character encodings: UTF-8 and ASCII-8BIT", which I get when trying to display a string in a haml template from an ActiveRecord model backed by a MySQL database (encoding utf-8, collation utf8_general_ci).
Maybe linked with #3947 and #2188 ?
env:
ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-darwin10.3.0]
rails 3.0.0.beta4
haml 3.0.12
-
Cyrille June 15th, 2010 @ 04:13 PM
After reading this brilliant post twice (http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-th...), I found that the answer is a database driver problem. Installing http://github.com/brianmario/mysql2 solves the "incompatible character encodings: UTF-8 and ASCII-8BIT" error.
-
Alexandre de Oliveira June 25th, 2010 @ 06:04 PM
I tested it by entering Portuguese words with accents and got the same errors. Normal words without accents ran smoothly, though.
-
Sam Ruby June 28th, 2010 @ 01:51 PM
- Importance changed from to High
-
Yehuda Katz (wycats) June 29th, 2010 @ 12:43 AM
- State changed from open to committed
http://github.com/rails/rails/commit/25215d7285db10e2c04d903f251b79... should fix this issue. Please see if you can reproduce your issue on the latest Rails master :)
-
Erik Dahlstrand June 29th, 2010 @ 07:21 AM
Works for me. Thanks Yehuda!
Question: The csrf-token is included in both the meta tag and a hidden input. Is that right?
-
ihower June 29th, 2010 @ 07:50 AM
Works for me with sqlite3-ruby and ruby-mysql gem.
But it does not work with the mysql 2.8.1 gem: its SQL log string is ASCII-8BIT, which causes the logger to crash :(
-
Jonas Nielsen June 29th, 2010 @ 04:06 PM
Appears not to work with characters (danish: æ, ø and å) in file names with file_field.
-
Serge Balyuk July 4th, 2010 @ 06:13 PM
I wonder if the root cause of incorrectly tagged params resides in rack rather than in ActionDispatch.
I have a rack patch (http://rack.lighthouseapp.com/projects/22435/tickets/100-patch-post...) that addresses the same thing as encode_params, but is slightly more configurable (see naruse's comment here: http://rack.lighthouseapp.com/projects/22435/tickets/48-rackutilsun...).
I like the form accept encoding and snowman changes for sure, but I'm wondering if encode_params better fits somewhere in rack. Does every rack framework have to do its own encode_params thing?
@Jonas: I think I've addressed file name encodings in that rack patch. I also have a rack 1.0.1 backported version of it (should better fit 2-3 stable).
-
Rafael Schaer July 8th, 2010 @ 02:45 AM
Adding posts works fine, but template files still break.
I can still reproduce the bug by adding special characters to the layout file (application.html.erb):
    <!DOCTYPE html>
    <html>
    <head>
      <title>R3Test</title>
      <%= stylesheet_link_tag :all %>
      <%= javascript_include_tag :defaults %>
      <%= csrf_meta_tag %>
    </head>
    <body>
    <p>Here come some special characters:</p>
    <ul>
      <li>Ä</li>
      <li>ö</li>
      <li>Ü</li>
      <li>é</li>
      <li>É</li>
      <li>À</li>
    </ul>
    <%= yield %>
    </body>
    </html>
This results in the following error:
    Encoding::CompatibilityError in Posts#index
    incompatible character encodings: UTF-8 and ASCII-8BIT
    Extracted source (around line #16):
    13: <li>É</li>
    14: <li>À</li>
    15: </ul>
    16: <%= yield %>
    17:
    18: </body>
    19: </html>
-
Marius Seritan July 12th, 2010 @ 05:45 PM
The logger patch also works for me. I am creating records from a rake task and I am not using any of the view code. This is a show stopper for me, +1 for adding this patch to activesupport/lib/active_support/buffered_logger.rb
Thanks!
Marius
-
Serge Balyuk July 12th, 2010 @ 06:22 PM
Hi Marius, did you try adding the #encoding: utf-8 comment as the first line of your rake file? Can you please show an example of the failing rake task? And which DB adapter are you using?
-
James Tippett July 13th, 2010 @ 01:48 AM
While Yehuda's commit fixed the problem under normal circumstances, I wonder if the patch to buffered_logger shouldn't be left in. Without it, visibility into encoding errors is very limited. For example, I spent some time debugging, moving to rails master, etc. and still got an encoding error. Only when I patched the logger could I see that I had a Unicode character in a helper file, so I needed to declare the encoding for that file. Without the patched logger, the helper file did not appear in the trace at all and the problem was completely opaque.
+1 to netloner's patch in addition to yehuda's more general improvements.
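Since this ticket is about adding the magic comment to generated files, here is a rough sketch of what that could look like. `with_magic_comment`, `MAGIC_COMMENT` and `CODING_LINE` are hypothetical names, not part of Rails or its generators, and the check only looks at the first line (a leading shebang line is not handled).

```ruby
# Hypothetical helper (not part of Rails): prepend a magic encoding comment
# to generated Ruby source unless its first line already declares one.
MAGIC_COMMENT = "# encoding: utf-8\n"
CODING_LINE = /\A#.*coding[:=]\s*[\w.-]+/

def with_magic_comment(source)
  # Leave the source alone if the first line already carries a coding comment.
  return source if source =~ CODING_LINE
  MAGIC_COMMENT + source
end

helper_source = "module PostsHelper\nend\n"
tagged = with_magic_comment(helper_source)
puts tagged
```

On Ruby 1.9 a source file without this comment is read as US-ASCII, so a non-ASCII literal in a helper or rake file needs it. Ruby 2.0 and later default to UTF-8 source encoding.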
-
Ian Terrell July 13th, 2010 @ 11:53 PM
I've also encountered this error in a scripted data migration from an old MySQL database. Naturally Yehuda's patch doesn't affect this, but netloner's patch gets the job done.
-
Nicholas Clark July 15th, 2010 @ 01:22 AM
I had a similar experience to Ian. I had a rake task that moved data from an old database. I was stuck but netloner's patch fixed it. +1 to netloner's patch.
-
ihower July 15th, 2010 @ 03:31 AM
I think #3836 is the same issue, and UVSoft patched BufferedLogger#add from the input side.
+1 for fixing BufferedLogger either from the input side (#add) or the output side (#flush).
-
marcos.neves (at gmail) July 15th, 2010 @ 08:37 PM
My rake db:seed does not work if I put any non-ASCII character in the db/seed.rb file.
I tried putting #encoding: utf-8 on the first line of many files, but it did not work.
-
Kouhei Sutou July 17th, 2010 @ 08:12 AM
It seems that Yehuda's commit assumes that all logging code uses the same encoding. But we can't assume that. Some logging code is not Rails-related and may use a different encoding for its log messages. In the above comments, the "mysql 2.8.1 gem" is such a case.
I think that BufferedLogger should accept any log messages that are accepted by the wrapped logger (@log). I'll attach a patch for it. It uses StringIO to concatenate all buffered log messages instead of Array#join, because StringIO supports concatenating strings with different encodings.
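To illustrate the idea of concatenating through StringIO rather than Array#join, here is a minimal sketch. It is not the attached patch: the `fragments` are invented, and the buffer is pinned to a binary string as an assumption, so that the result does not depend on how StringIO treats its own buffer encoding.

```ruby
require "stringio"

# Three log fragments in different encodings, as unrelated libraries might produce.
fragments = [
  "SELECT \u00E5".encode(Encoding::UTF_8),
  "caf\u00E9".encode(Encoding::ISO_8859_1),
  "\xC3\xA4".b
]

# Array#join refuses to mix non-ASCII strings whose encodings are incompatible.
join_failed = begin
  fragments.join
  false
rescue Encoding::CompatibilityError
  true
end

# Writing to a StringIO backed by a binary string appends the raw bytes
# of each fragment without any encoding check.
out = StringIO.new("".b)
fragments.each { |fragment| out << fragment }
combined = out.string

puts join_failed
puts combined.bytesize
```

The combined string is binary, so the bytes of each message reach the log file exactly as they were produced.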
-
gucki July 18th, 2010 @ 08:54 AM
- Patch by Kouhei Sutou works perfectly for me with latest rails 3 edge.
-
Repository July 19th, 2010 @ 11:44 PM
(from [a6e95ba55401ddcaf9ef867a080b30c2d07c56ac]) fix mixed encoding logs can't be logged.
[#4807 state:committed]
Signed-off-by: Kouhei Sutou kou@cozmixng.org
Signed-off-by: Jeremy Kemper jeremy@bitsweat.net
http://github.com/rails/rails/commit/a6e95ba55401ddcaf9ef867a080b30...
-
Serge Balyuk July 20th, 2010 @ 01:44 PM
Hey Kouhei,
can you please give an example where Array#join can't mix encodings properly? I thought it does this kind of work well:
    > __ENCODING__
    => #<Encoding:UTF-8>
    > s_utf = 'привет'
    => "привет"
    > s_ascii = 'whatever'.force_encoding('ascii')
    => "whatever"
    > [s_ascii, s_utf].join.encoding
    => #<Encoding:UTF-8>
    > s_utf = 'hello'
    => "hello"
    > [s_ascii, s_utf].join.encoding
    => #<Encoding:US-ASCII>
It seems like there are two kinds of problems: incompatible encodings (addressed by your patch), and strings that are tagged with incorrect encodings.
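The second kind of problem, a string carrying the wrong encoding tag, might look like the sketch below. The byte values are illustrative only. The point is that a mis-tagged string needs `force_encoding` (same bytes, new label), while `encode` (transcoding) either fails or is the wrong tool.

```ruby
# Bytes C3 A5 are the UTF-8 form of U+00E5, but the string is tagged
# ASCII-8BIT, as a driver or parser might return it.
raw = "\xC3\xA5".b

# Transcoding fails: a high byte in ASCII-8BIT has no defined mapping to UTF-8.
transcode_error = begin
  raw.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Retagging keeps the bytes and only changes the label.
retagged = raw.dup.force_encoding(Encoding::UTF_8)

# The opposite mistake: a single Latin-1 byte (E5) labelled as UTF-8 is invalid,
# and has to be retagged as ISO-8859-1 before it can be transcoded.
mislabelled = "\xE5".b.force_encoding(Encoding::UTF_8)
repaired = mislabelled.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts transcode_error.class
puts retagged.valid_encoding?
puts mislabelled.valid_encoding?
```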
-
Norman Clarke July 20th, 2010 @ 02:06 PM
@Serge
UTF-8 and ASCII are compatible encodings: ASCII is a proper subset of UTF-8. Try it with, for example, ISO-8859-1 and UTF-8 and you'll get a different result.
-
Norman Clarke July 20th, 2010 @ 02:16 PM
You also need to make sure you go outside the 7-bit range to see the problem as well. Here's a trivial example that shows the problem:
["ü".force_encoding("UTF-8"), "ü".force_encoding("ISO-8859-1")].join
-
Serge Balyuk July 20th, 2010 @ 04:14 PM
Hi Norman, thanks for the great explanation. I guess a more valid example would be:
["ü".encode("UTF-8"), "ü".encode("ISO-8859-1")].join
which still proves your valid point.
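For anyone following along, the two variants and the resulting failure can be checked as below. This is only a sketch: the variable names are made up, and converting everything to UTF-8 before joining is one possible workaround, not something proposed in this ticket.

```ruby
u_umlaut = "\u00FC" # U+00FC, stored as the UTF-8 bytes C3 BC

# force_encoding only relabels: the bytes stay C3 BC.
retagged = u_umlaut.dup.force_encoding(Encoding::ISO_8859_1)
# encode transcodes: the bytes become the single Latin-1 byte FC.
transcoded = u_umlaut.encode(Encoding::ISO_8859_1)

# Joining two non-ASCII strings with different encodings raises.
join_error = begin
  [u_umlaut, transcoded].join
  nil
rescue Encoding::CompatibilityError => e
  e
end

# A string that stays within 7-bit ASCII joins fine despite its ISO-8859-1 tag.
ascii_ok = ["hello".encode(Encoding::ISO_8859_1), u_umlaut].join

# Possible workaround: transcode every part to one encoding before joining.
normalized = [u_umlaut, transcoded].map { |s| s.encode(Encoding::UTF_8) }.join

p retagged.bytes
p transcoded.bytes
p join_error.class
```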
BTW found a pretty extensive document which describes ruby 1.9 encodings aspects:
Tickets have moved to Github
The new ticket tracker is available at https://github.com/rails/rails/issues
Referenced by
- 4683 ASCII-8BIT and UTF-8 in hell I believe this is a duplicate of #4807 now. Yehuda is wor...
- 4336 Ruby1.9: submitted string form parameters with non-ASCII characters cause encoding errors I think this issue was resolved in this ticket https://ra...
- 4807 ERROR Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8 [#4807 state:committed]