This project is archived and is in read-only mode.

#5514 ✓stale

Weird string encoding problem inside Rails that doesn't happen outside of Rails

Reported by Brandon | August 31st, 2010 @ 05:58 PM

First off, I'm using Ruby 1.9.2-p0 and Rails 3.0.0. The script I wrote is this:

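require 'rubygems'
require 'open-uri'

# Fetch the feed (the URL is elided in the original report)
file = open('')
# Craigslist labels the feed ISO-8859-1, but treat the bytes as Windows-1252
file = file.read.force_encoding('windows-1252').encode('utf-8')
File.open('craigslist.xml', 'w') { |f| f.write(file) }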

All it does is fetch an XML feed from Craigslist, convert it to UTF-8, and write it to disk. I wrote it to debug some encoding issues I ran into. People occasionally write their postings in Word and then copy and paste them into Craigslist's new-posting form, which puts characters like … (horizontal ellipsis) into the body or title of a post. For example, this one: "HUGE 2 Bedroom LOFT in HOT River West! Perfect Price & Amenities…".

Craigslist labels its feeds as ISO-8859-1, but converting from that to UTF-8 didn't work: I got either an error or the wrong characters. Eventually I tried forcing the encoding to Windows-1252 before converting to UTF-8, and that did the trick. When I run the script while that posting (which gets re-posted several times a day) is among the latest 25 posts (the number of posts the XML feed returns), the ellipsis comes through correctly and everything is great.
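The byte-level reason the ISO-8859-1 label fails can be sketched in plain Ruby. This assumes, as the report suggests, that Word's ellipsis arrives as the single Windows-1252 byte 0x85. In ISO-8859-1 that same byte is the invisible C1 control character U+0085, not an ellipsis, so converting "correctly" from the declared charset still gives the wrong character.

```ruby
# Assumption: the feed carries Word's ellipsis as the single byte 0x85.
raw = [0x85].pack('C')  # ASCII-8BIT (binary) string

# Decoding as ISO-8859-1 (the feed's declared charset) yields U+0085,
# a C1 control character, not an ellipsis.
as_latin1 = raw.dup.force_encoding('iso-8859-1').encode('utf-8')

# Decoding as Windows-1252 maps 0x85 to U+2026 HORIZONTAL ELLIPSIS.
as_cp1252 = raw.dup.force_encoding('windows-1252').encode('utf-8')

p as_latin1.codepoints  # => [133]
p as_cp1252.codepoints  # => [8230]
```

Windows-1252 is a superset of ISO-8859-1 for printable characters, which is why forcing it is a common workaround for feeds mislabelled this way.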

However, this morning I did exactly the same thing inside a brand-new Rails application, and it doesn't work. I do see the ellipsis, but it's preceded by another, incorrect character. I pared the code down as much as possible to rule out other factors, but for some reason the code above works fine outside of Rails, while inside Rails the string comes out encoded incorrectly.
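One way to get exactly this symptom, an ellipsis with a stray character in front of it, is to apply the Windows-1252 step to bytes that were already converted from ISO-8859-1 to UTF-8 earlier. The ticket does not establish that this is what happens inside Rails; the sketch below only shows, under that assumption, that the double conversion turns 0x85 into "Â…".

```ruby
raw = [0x85].pack('C')  # Word's ellipsis as a single Windows-1252 byte

# Hypothetical earlier step: something transcodes the body from the declared
# ISO-8859-1 to UTF-8, turning 0x85 into the two bytes C2 85 (U+0085).
already_utf8 = raw.dup.force_encoding('iso-8859-1').encode('utf-8')

# The script's own step then reinterprets those two bytes as Windows-1252:
# 0xC2 becomes "Â" (U+00C2) and 0x85 becomes "…" (U+2026).
result = already_utf8.dup.force_encoding('windows-1252').encode('utf-8')

p already_utf8.bytes  # => [194, 133]
p result.codepoints   # => [194, 8230]
```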

Unfortunately, I don't know nearly enough about character encodings or Rails internals to figure out what's going on. I also realize that a single post that shows up a couple of times a day on Craigslist isn't ideal test data, but I don't know how else to get a horizontal ellipsis encoded in Windows-1252 so I can reproduce the problem more easily.
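A self-contained reproduction that doesn't depend on the Craigslist post can be sketched by writing the 0x85 byte to a temporary file. One hypothesis worth testing: Rails 3 sets `Encoding.default_internal` to UTF-8 (via `config.encoding`), and Ruby transcodes text read from an IO whose external encoding is ISO-8859-1 into that internal encoding before the script's `force_encoding` runs. The sketch simulates that with a plain file rather than open-uri, so it doesn't prove open-uri behaves the same way; `fixed_body`, `with_default_internal`, and `feed.xml` are hypothetical names.

```ruby
require 'tmpdir'

# Temporarily set Encoding.default_internal, restoring the old value afterwards.
def with_default_internal(enc)
  old = Encoding.default_internal
  Encoding.default_internal = enc
  yield
ensure
  Encoding.default_internal = old
end

# Read the file as if it were labelled ISO-8859-1, then apply the report's fix.
def fixed_body(path)
  File.open(path, 'r:iso-8859-1') do |f|
    f.read.force_encoding('windows-1252').encode('utf-8')
  end
end

plain = railsish = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'feed.xml')
  File.binwrite(path, [0x85].pack('C'))  # one Windows-1252 ellipsis byte

  # No internal encoding: read returns the raw 0x85 byte, so the fix works.
  plain = with_default_internal(nil) { fixed_body(path) }
  # Rails-like setting (assumption): read already transcodes 0x85 to C2 85.
  railsish = with_default_internal(Encoding::UTF_8) { fixed_body(path) }
end

p plain.codepoints     # => [8230]
p railsish.codepoints  # => [194, 8230]
```

If the real feed behaves like `railsish`, reading the body as binary before forcing Windows-1252 would be the thing to try.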

If anybody has any idea what might be wrong, or could point me in the right direction, that would be great. Thanks!

Comments and changes to this ticket

  • Brandon | August 31st, 2010 @ 08:18 PM

    Well, I've played around with it a bit more and figured out how to set the bytes of a string by hand so they form a horizontal ellipsis encoded as Windows-1252. When I transcoded that string to UTF-8 inside Rails, everything was fine. I'm not exactly sure what that means, but I think it probably means this isn't a bug in Rails.

  • Santiago Pastorino | February 2nd, 2011 @ 04:49 PM

    • State changed from “new” to “open”
    • Tag changed from encoding utf8, encoding, rails3, ruby1.9.2 to encoding utf8, encoding, rails3, ruby192

    This issue has been automatically marked as stale because it has not been commented on for at least three months.

    The resources of the Rails core team are limited, and so we are asking for your help. If you can still reproduce this error on the 3-0-stable branch or on master, please reply with all of the information you have about it and add "[state:open]" to your comment. This will reopen the ticket for review. Likewise, if you feel that this is a very important feature for Rails to include, please reply with your explanation so we can consider it.

    Thank you for all your contributions, and we hope you will understand this step to focus our efforts where they are most helpful.

  • Santiago Pastorino | February 2nd, 2011 @ 04:49 PM

    • State changed from “open” to “stale”
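Brandon's August 31st follow-up test, building the Windows-1252 bytes by hand and transcoding them inside Rails, can be sketched as below. `Encoding.default_internal` is set to UTF-8 here only to mimic a Rails 3 app (an assumption); it affects strings read from IO objects, not strings built in memory with `pack`, so this conversion succeeds either way. That matches his result without ruling out a conversion during the feed fetch.

```ruby
# Build the Windows-1252 ellipsis byte by hand, as in the follow-up comment.
manual = [0x85].pack('C').force_encoding('windows-1252')

old_internal = Encoding.default_internal
begin
  # Mimic a Rails 3 app (assumption: Rails sets default_internal to UTF-8).
  Encoding.default_internal = Encoding::UTF_8
  converted = manual.encode('utf-8')
ensure
  Encoding.default_internal = old_internal
end

p converted.codepoints  # => [8230]
```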


