This project is archived and is in readonly mode.

#951 ✓resolved
Manfred Stienstra

Multibyte revisited

Reported by Manfred Stienstra | September 1st, 2008 @ 02:59 PM

Multibyte Revisited is an attempt to clean up the current implementation of ActiveSupport::Multibyte and make it compatible with Ruby 1.9.

Comments and changes to this ticket

  • Michael Koziarski

    Michael Koziarski September 3rd, 2008 @ 03:27 PM

    • Assigned user set to “Jeremy Kemper”

    Could you write up the changes in a little more detail? We also need to think about how we handle the migration from .chars to .mb_chars for people using chars, and which release streams need it.

  • Manfred Stienstra

    Manfred Stienstra September 4th, 2008 @ 06:01 PM

    The most significant change in the API is the change from String#chars to String#mb_chars. Because Ruby 1.8.7 and Ruby 1.9 use String#chars as an iterator over the characters in a string, we can no longer use that method for our purposes.

    All uses of String#chars in other parts of Rails have been changed to use the new method. In 1.8.6 String#chars is aliased to String#mb_chars.

    |                 | <= 1.8.6           | 1.8.7              | 1.9                |
    | String#chars    | multibyte accessor | iterator           | iterator           |
    | String#mb_chars | multibyte accessor | multibyte accessor | multibyte accessor |
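The table above can be sketched roughly as follows. This is a minimal illustration, not the ActiveSupport code: `SketchChars` and `mb_chars_sketch` are hypothetical names standing in for the real proxy class and accessor, and the proxy only implements `upcase`.

```ruby
# Hypothetical stand-in for the multibyte proxy; not ActiveSupport::Multibyte::Chars.
class SketchChars
  def initialize(string)
    @wrapped_string = string
  end

  # Return a new proxy so calls can be chained.
  def upcase
    SketchChars.new(@wrapped_string.upcase)
  end

  def to_s
    @wrapped_string
  end
end

class String
  # The multibyte accessor is defined on every Ruby version.
  def mb_chars_sketch
    SketchChars.new(self)
  end

  # Only claim String#chars when Ruby does not already define it
  # (the "<= 1.8.6" column); on newer Rubies the built-in iterator stays.
  alias_method :chars, :mb_chars_sketch unless method_defined?(:chars)
end
```

On a Ruby that already ships String#chars the alias is skipped, so `"Hello".chars` keeps its built-in behaviour while `"Hello".mb_chars_sketch.upcase.to_s` goes through the proxy.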

    The other big change is that a level of indirection has been removed. In the old version you would define a handler and the Chars proxy class would delegate methods to the handler.

    String#chars --> ActiveSupport::Multibyte::Chars --> ActiveSupport::Multibyte::Handlers::UTF8Handler

    In the new version methods are called directly on the proxy class.

    String#mb_chars --> ActiveSupport::Multibyte::Chars
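To make the difference between the two call chains concrete, here is a simplified sketch. It assumes the old proxy forwarded calls through `method_missing`; `SketchUTF8Handler`, `DelegatingChars` and `DirectChars` are hypothetical names and the only method shown is `reverse`.

```ruby
# Hypothetical handler holding the encoding-specific logic (old layout).
module SketchUTF8Handler
  def self.reverse(string)
    # Reverse by code point rather than by byte.
    string.unpack("U*").reverse.pack("U*")
  end
end

# Old layout: the proxy forwards unknown methods to the handler.
class DelegatingChars
  def initialize(string)
    @wrapped_string = string
  end

  def method_missing(name, *args, &block)
    if SketchUTF8Handler.singleton_methods.include?(name)
      SketchUTF8Handler.send(name, @wrapped_string, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    SketchUTF8Handler.singleton_methods.include?(name) || super
  end
end

# New layout: the proxy implements the method itself, with no extra hop.
class DirectChars
  def initialize(string)
    @wrapped_string = string
  end

  def reverse
    @wrapped_string.unpack("U*").reverse.pack("U*")
  end
end

old_result = DelegatingChars.new("h\u00e9llo").reverse
new_result = DirectChars.new("h\u00e9llo").reverse
```

Both variables hold the same reversed string; the second version simply skips the `method_missing` lookup and the extra method call.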

    Of course you can still define your own proxy class if you want to add support for another encoding:

    class UTF32Chars < ActiveSupport::Multibyte::Chars
      # UTF-32 stores every character in four bytes.
      def size
        @wrapped_string.length / 4
      end
    end

    ActiveSupport::Multibyte.proxy_class = UTF32Chars
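For readers who want to try the idea without Rails, here is a self-contained sketch of a swappable proxy class. `SketchMultibyte`, its `wrap` helper and `SketchUTF32Chars` are hypothetical names, and the sketch uses `bytesize` (Ruby 1.8.7 and later) so the byte count does not depend on how `length` behaves in a given Ruby version.

```ruby
# Hypothetical miniature of a module with a configurable proxy class.
module SketchMultibyte
  # Default proxy: assumes the wrapped string is UTF-8.
  class Chars
    def initialize(string)
      @wrapped_string = string
    end

    # Count code points rather than bytes.
    def size
      @wrapped_string.unpack("U*").length
    end
  end

  class << self
    attr_writer :proxy_class

    def proxy_class
      @proxy_class ||= Chars
    end

    # Wrap a string in whichever proxy class is currently configured.
    def wrap(string)
      proxy_class.new(string)
    end
  end
end

# Custom proxy for UTF-32, where every character takes four bytes.
class SketchUTF32Chars < SketchMultibyte::Chars
  def size
    @wrapped_string.bytesize / 4
  end
end

utf8_size = SketchMultibyte.wrap("h\u00e9llo").size

SketchMultibyte.proxy_class = SketchUTF32Chars
utf32_size = SketchMultibyte.wrap("h\u00e9llo".encode("UTF-32BE")).size
```

Both sizes come out as five characters, even though the UTF-32 string is twenty bytes long.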

    This change means less code and less indirection during execution, so the new implementation is faster than the old one. The speedup varies between zero and 25% depending on the method.

    We can choose to deprecate the call to String#chars on 1.8.6, but there is no pressing reason why people should stop using it. When they decide to run their application on 1.8.7 or 1.9, this is probably going to be a conscious decision anyway, and other stuff will break because the Ruby API changed. I'm not sure if we need to do anything besides warn people about it.

    > "Hello".chars.upcase
    NoMethodError: undefined method `upcase' for #<Enumerator:0x3ab600>
      from (irb):1
      from /opt/ruby19/bin/irb:12:in `<main>'

  • Michael Koziarski

    Michael Koziarski September 11th, 2008 @ 02:53 PM

    This seems good to me, though there's a bunch of commented-out code in the parsing code.

    As for handling the deprecation of #chars, I think that for 1.8.6 and earlier we define a #chars method which is deprecated and delegates to mb_chars; for 1.8.7 and up we just define the .mb_chars method.

    Am I missing anything?
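The proposal above could look roughly like this. It is a sketch under assumptions, not the patch itself: `DeprecatedCharsSketch`, `needs_chars_shim?` and `install` are hypothetical names, the target class is passed in rather than patching String, and the warning text is made up.

```ruby
# Hypothetical helper module illustrating the version-dependent shim.
module DeprecatedCharsSketch
  # True for Ruby versions older than 1.8.7, where String#chars is not built in.
  def self.needs_chars_shim?(ruby_version = RUBY_VERSION)
    parts = ruby_version.split(".").map { |part| part.to_i }
    (parts <=> [1, 8, 7]) < 0
  end

  # Define a deprecated #chars on klass that delegates to #mb_chars,
  # but only on Rubies that need it. Returns whether the shim was installed.
  def self.install(klass, ruby_version = RUBY_VERSION)
    return false unless needs_chars_shim?(ruby_version)

    klass.class_eval do
      def chars
        warn "chars is deprecated, use mb_chars instead"
        mb_chars
      end
    end
    true
  end
end
```

Passing the version in as an argument is only there to make the behaviour easy to exercise on a modern Ruby; a call such as `DeprecatedCharsSketch.install(SomeClass, "1.8.6")` installs the shim, while `"1.8.7"` or later leaves the class untouched.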

  • Manfred Stienstra

    Manfred Stienstra September 11th, 2008 @ 03:20 PM

    The commented lines in the parsing code are for Unicode character properties we don't use. They're also in the current parsing code. I can remove them if they really bother you.

    I'm fine with deprecating #chars. I guess it'll save people some trouble if they move over to Ruby 1.9.

  • Michael Koziarski

    Michael Koziarski September 11th, 2008 @ 03:22 PM

    Nah, I don't mind about the commented code, was just wondering if it was deliberate or not. That stuff's only used when generating the tables anyway, right?

    Ironically we chose chars so people could easily move to 1.9... So can you update the patch with the deprecation behaviour? Then I think we're probably good to go.

    Jeremy, any thoughts?

  • Manfred Stienstra

    Manfred Stienstra September 11th, 2008 @ 03:31 PM

    Yes, the parsing code is only used to generate the unicode tables.

  • Michael Koziarski

    Michael Koziarski September 12th, 2008 @ 02:36 PM

    I don't see any tests which do assert_deprecated('mb_chars') { "something".chars }

    Apart from that this is looking great. Thanks again for all your awesome work on this stuff.

    Feel free to push this to a rebased branch on github if it's easier for you.
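A check of the kind asked for above can be approximated without the Rails test helpers. The sketch below is a plain-Ruby stand-in, not `assert_deprecated` itself: `capture_stderr` and `SketchDeprecatedText` are hypothetical names, and the warning is written directly to `$stderr`.

```ruby
require "stringio"

# Hypothetical helper: run the block and return whatever it wrote to $stderr.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  # Always restore the real stream, even if the block raises.
  $stderr = original
end

# Hypothetical class with a deprecated accessor that delegates to the new one.
class SketchDeprecatedText
  def initialize(string)
    @string = string
  end

  def mb_chars
    @string.dup
  end

  def chars
    $stderr.puts "chars is deprecated, use mb_chars instead"
    mb_chars
  end
end

output = capture_stderr { SketchDeprecatedText.new("something").chars }
raise "expected a deprecation warning mentioning mb_chars" unless output.include?("mb_chars")
```

The final line plays the role of the assertion: it fails loudly if calling the deprecated method does not produce a message that points at `mb_chars`.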

  • Manfred Stienstra

    Manfred Stienstra September 12th, 2008 @ 02:46 PM

    Hmm, I must have messed up the last merge. I'll fix it this weekend.

  • Manfred Stienstra

    Manfred Stienstra September 21st, 2008 @ 05:06 PM

    • Tag changed from multibyte, patch to multibyte, patch

    Michael, I pushed my changes to Manfred/rails:multibyte-revisited.

    Do you guys have any idea where this fits into the Rails timeline? Is this 2.1 or 2.2 material?

  • Michael Koziarski

    Michael Koziarski September 22nd, 2008 @ 07:51 AM

    • Milestone cleared.

    I think we should probably merge this for 2.2 as it helps with the 1.9 / 1.8.7 compatibility stuff.

    Any objections, Jeremy? Otherwise I'll get to it this week.

  • Michael Koziarski

    Michael Koziarski September 22nd, 2008 @ 08:58 PM

    Applied and merged and all that.

    Really nice work!

  • Pratik

    Pratik October 17th, 2008 @ 05:16 PM

    • Assigned user changed from “Jeremy Kemper” to “Michael Koziarski”

    @koz : This is already resolved afaik ?

  • Michael Koziarski

    Michael Koziarski October 17th, 2008 @ 05:17 PM

    • State changed from “new” to “resolved”

