This project is archived and is in readonly mode.

#1034 ✓committed
Adam Cigánek

ActiveSuport::Inflector#parameterize should strip leading/trailing dashes

Reported by Adam Cigánek | September 12th, 2008 @ 02:03 PM

Attached is the patch that modifies ActiveSupport::Inflector#parameterize to remove leading and trailing dashes from a string.

Before patch:

"check this out!".parameterize
=> "check-this-out-" # note the trailing dash - not cool

After patch:

"check this out!".parameterize
=> "check-this-out" # nice and clean

Comments and changes to this ticket

  • Henrik Nyh

    Henrik Nyh September 12th, 2008 @ 04:49 PM

    Great initiative.

    But consider that the separator can be configured, so hard-coding it for dashes is not optimal. So /...#{Regexp.escape(sep)}.../ might be a better idea.

    Also, while this method is improved, squeezing of the separator should probably be added – so "foo - bar" becomes "foo-bar" and not "foo---bar".

    http://github.com/henrik/slugalizer has all this implemented, with tests.

    The squeezing part won't translate directly to the Rails code because Slugalizer limits the separator to a single character, and Rails parameterize doesn't. So either Rails could be made to limit it to one character, or the squeeze could be rewritten as a regexp.

  • Adam Cigánek

    Adam Cigánek September 12th, 2008 @ 05:09 PM

    You are right about the separator. I'll post another patch soon.

    The squeezing is already there, but it seems that there are no tests for it. I'll add some.

    I didn't know about slugalizer. What about making it part of rails and doing all this stuff with it?

  • Henrik Nyh

    Henrik Nyh September 12th, 2008 @ 05:15 PM

    Cool, thanks. Slugalizer is very similar to the current version of parameterize. Copying the Slugalizer wholesale would save some work -- the Rails team are free to do so.

    On Fri, Sep 12, 2008 at 6:09 PM, Lighthouse support@lighthouseapp.comwrote:

  • Adam Cigánek

    Adam Cigánek September 12th, 2008 @ 05:42 PM

    Ok, great,

    So this is a new patch that replaces the original code of parameterize with code from slugalizer (just slightly modified for readability). It also adds few more tests.

  • Adam Cigánek
  • Michael Koziarski

    Michael Koziarski September 22nd, 2008 @ 09:04 PM

    • Assigned user set to “Michael Koziarski”
    • Tag changed from activesupport, patch to activesupport, patch
    • Milestone cleared.

    Hey there,

    This doesn't apply cleanly anymore as I've merged all manfred's MB changes. Can you rebase and reupload as a single diff, then I'll apply it.

  • Adam Cigánek
  • Repository

    Repository September 23rd, 2008 @ 07:10 AM

    • State changed from “new” to “committed”

    (from [a4f2ba8fb3c46ef9f7e31725849efdcb1a22c72d]) Modified ActiveSupport::Inflector#parameterize with code from slugalizer (http://github.com/henrik/slugali...

    Handles trailing and leading slashes, and squashes repeated separators into a single character.

    Signed-off-by: Michael Koziarski michael@koziarski.com [#1034 state:committed] http://github.com/rails/rails/co...

  • Rick

    Rick September 25th, 2008 @ 07:59 AM

    • State changed from “committed” to “open”

    I've got a slight modification to the patch to that parameterizes something like Fool's Errand to fools-errand, instead of fool-s-errand. I personally prefer that over the current behavior.

    I also noticed that PermalinkFu was about 10x faster. I did what I could to speed it up, but it all came down to the #mb_chars method. Replacing .mb_chars.normalize(:kd) with Iconv.iconv('ascii//translit//IGNORE', 'utf-8', string).to_s helped things immensely. I left it out of this patch though, I didn't know if there was some reason why the rails multibyte support was preferred over Iconv.

  • Rick
  • Michael Koziarski

    Michael Koziarski September 25th, 2008 @ 08:21 AM

    Is iconv available on all platforms we support?

    I also believe that iconv had some strange behaviour with cases Adam's aware of?

  • Henrik Nyh

    Henrik Nyh September 25th, 2008 @ 09:10 AM

    On Linux (at least some Linuxes, but not OS X), iconv will not properly decompose characters, so "föö" will become "foo" on OS X but "f" on Linux.

    Have noticed this myself, and also seen it mentioned here: http://blade.nagaokaut.ac.jp/cgi...

  • Michael Koziarski

    Michael Koziarski September 25th, 2008 @ 09:52 AM

    a 10x performance difference is quite considerable though, so the iconv version could be worth considering...

  • Henrik Nyh

    Henrik Nyh September 25th, 2008 @ 10:05 AM

    One possibility is to sniff for OS or something else (if someone can figure out exactly where the issue lies) and use different normalizers based on that. Obviously this would make the code more convoluted.

    Might not be a huge win in practice, though, since I would guess a lot of people use OS X (which can use iconv) for dev only, and Linux (which would not use iconv) for production. Then again faster dev is always nice, if you would use this method enough to notice.

    Another possibility is to match the input string against some regexp and run it through iconv if it only contains characters iconv can handle. So people who only ever do unaccented strings get speed, and people who work with the full Unicode palette get their decomposition. Also more convoluted, and not sure it would make for a net speedup, but seems likely.

  • Adam Cigánek

    Adam Cigánek September 25th, 2008 @ 10:18 AM

    Iconv has more robust transliteration. For example, the German character "ß" is transliterated as "ss" in Iconv (which is correct), but is simply removed if mb_chars.normalize(:kd)... is used. There are several more characters like this, that does not contain any diacritics, but are still outside of 7-bit ascii set.

    This can be fixed by creating custom transliteration table for these characters, but that would be messy.

    So Iconv is better solution, but has issues on some platforms as Henrik pointed out (i've also noticed this in the past). Not sure what to do with it though...

  • Michael Koziarski

    Michael Koziarski September 25th, 2008 @ 10:23 AM

    We could also extract the normlization and transliteration into another inflector method, and isolate the optimisations in there?

    We already do something similar to this with sanitize where if there's no "<" chars we return the string unmodified.

  • Henrik Nyh

    Henrik Nyh September 25th, 2008 @ 03:25 PM

    Sounds like a good idea.

  • Michael Koziarski

    Michael Koziarski September 25th, 2008 @ 05:00 PM

    OK, so we need a name for the 'asciify' method (not asciify ;)) and a suggested test for choosing between the two options.

    We can do it when the file is required, so it doesn't have to be fast.

  • Adam Cigánek

    Adam Cigánek September 25th, 2008 @ 05:35 PM

    OK, so we need a name for the 'asciify' method (not asciify ;))

    What about transliterate?

  • Michael Koziarski

    Michael Koziarski September 29th, 2008 @ 04:40 PM

    Quick proof of concept for transliterate. Needs tidying up before it's applied but hopefully it doesn't break anyone's stuff.

    Can you guys tests with your platforms where iconv is broken.

    Plus, will require 'iconv' always work? if not what situations are there where it's broken?

  • Repository

    Repository October 7th, 2008 @ 08:02 PM

    • State changed from “open” to “committed”

    (from [5556db22c57294a9f4e2ee4e633834ec6a200242]) Reduce memory usage slightly in String#parameterize

    [#1034 state:committed] http://github.com/rails/rails/co...

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Referenced by

Pages