#1034 ActiveSuport::Inflector#parameterize should strip leading/trailing dashes - Ruby on Rails

Type	To find
responsible:me	tickets assigned to you
tagged:"@high"	tickets tagged @high
milestone:next	tickets in the upcoming milestone
state:invalid	tickets with the state invalid
created:"last week"	tickets created last week
sort:number, importance, updated	tickets sorted by #, importance or updated
Combine keywords for powerful searching.
Use advanced searching »

This project is archived and is in readonly mode.

#1034 ✓committed

ActiveSuport::Inflector#parameterize should strip leading/trailing dashes

Reported by Adam Cigánek | September 12th, 2008 @ 02:03 PM

Attached is the patch that modifies ActiveSupport::Inflector#parameterize to remove leading and trailing dashes from a string.

Before patch:

"check this out!".parameterize
=> "check-this-out-" # note the trailing dash - not cool

After patch:

"check this out!".parameterize
=> "check-this-out" # nice and clean

inflector_hacks.diff 2.5 KB

Comments and changes to this ticket

You flagged this item as spam.
Henrik Nyh September 12th, 2008 @ 04:49 PM
Great initiative.

But consider that the separator can be configured, so hard-coding it for dashes is not optimal. So /...#{Regexp.escape(sep)}.../ might be a better idea.

Also, while this method is improved, squeezing of the separator should probably be added – so "foo - bar" becomes "foo-bar" and not "foo---bar".

http://github.com/henrik/slugalizer has all this implemented, with tests.

The squeezing part won't translate directly to the Rails code because Slugalizer limits the separator to a single character, and Rails parameterize doesn't. So either Rails could be made to limit it to one character, or the squeeze could be rewritten as a regexp.
You flagged this item as spam.
Adam Cigánek September 12th, 2008 @ 05:09 PM
You are right about the separator. I'll post another patch soon.

The squeezing is already there, but it seems that there are no tests for it. I'll add some.

I didn't know about slugalizer. What about making it part of rails and doing all this stuff with it?
You flagged this item as spam.
Henrik Nyh September 12th, 2008 @ 05:15 PM
Cool, thanks. Slugalizer is very similar to the current version of parameterize. Copying the Slugalizer wholesale would save some work -- the Rails team are free to do so.

On Fri, Sep 12, 2008 at 6:09 PM, Lighthouse support@lighthouseapp.comwrote:
You flagged this item as spam.
Adam Cigánek September 12th, 2008 @ 05:42 PM
Ok, great,

So this is a new patch that replaces the original code of parameterize with code from slugalizer (just slightly modified for readability). It also adds few more tests.
You flagged this item as spam.
Adam Cigánek September 12th, 2008 @ 05:48 PM
just fixed error in commit message
- inflector-parameterize-with-slugalizer.diff 5.5 KB
Michael Koziarski September 22nd, 2008 @ 09:04 PM
- Assigned user set to “Michael Koziarski”
- Tag changed from “activesupport, patch” to “activesupport, patch”
- Milestone cleared.
Hey there,

This doesn't apply cleanly anymore as I've merged all manfred's MB changes. Can you rebase and reupload as a single diff, then I'll apply it.
You flagged this item as spam.
Adam Cigánek September 22nd, 2008 @ 10:34 PM
Hello,

here is the updated patch.
- inflector-parameterize-with-slugalizer-and-mb.diff 5.6 KB
Repository September 23rd, 2008 @ 07:10 AM
- State changed from “new” to “committed”
(from [a4f2ba8fb3c46ef9f7e31725849efdcb1a22c72d]) Modified ActiveSupport::Inflector#parameterize with code from slugalizer (http://github.com/henrik/slugali...

Handles trailing and leading slashes, and squashes repeated separators into a single character.

Signed-off-by: Michael Koziarski michael@koziarski.com [#1034 state:committed] http://github.com/rails/rails/co...
Rick September 25th, 2008 @ 07:59 AM
- State changed from “committed” to “open”
I've got a slight modification to the patch to that parameterizes something like Fool's Errand to fools-errand, instead of fool-s-errand. I personally prefer that over the current behavior.

I also noticed that PermalinkFu was about 10x faster. I did what I could to speed it up, but it all came down to the #mb_chars method. Replacing .mb_chars.normalize(:kd) with Iconv.iconv('ascii//translit//IGNORE', 'utf-8', string).to_s helped things immensely. I left it out of this patch though, I didn't know if there was some reason why the rails multibyte support was preferred over Iconv.
Rick September 25th, 2008 @ 08:03 AM
- no changes were found...
- 0001-change-ActiveSupport-Inflector-parameterize-to-stri.patch 3 KB
Michael Koziarski September 25th, 2008 @ 08:21 AM
Is iconv available on all platforms we support?

I also believe that iconv had some strange behaviour with cases Adam's aware of?
You flagged this item as spam.
Henrik Nyh September 25th, 2008 @ 09:10 AM
On Linux (at least some Linuxes, but not OS X), iconv will not properly decompose characters, so "föö" will become "foo" on OS X but "f" on Linux.

Have noticed this myself, and also seen it mentioned here: http://blade.nagaokaut.ac.jp/cgi...
Michael Koziarski September 25th, 2008 @ 09:52 AM
a 10x performance difference is quite considerable though, so the iconv version could be worth considering...
You flagged this item as spam.
Henrik Nyh September 25th, 2008 @ 10:05 AM
One possibility is to sniff for OS or something else (if someone can figure out exactly where the issue lies) and use different normalizers based on that. Obviously this would make the code more convoluted.

Might not be a huge win in practice, though, since I would guess a lot of people use OS X (which can use iconv) for dev only, and Linux (which would not use iconv) for production. Then again faster dev is always nice, if you would use this method enough to notice.

Another possibility is to match the input string against some regexp and run it through iconv if it only contains characters iconv can handle. So people who only ever do unaccented strings get speed, and people who work with the full Unicode palette get their decomposition. Also more convoluted, and not sure it would make for a net speedup, but seems likely.
You flagged this item as spam.
Adam Cigánek September 25th, 2008 @ 10:18 AM
Iconv has more robust transliteration. For example, the German character "ß" is transliterated as "ss" in Iconv (which is correct), but is simply removed if mb_chars.normalize(:kd)... is used. There are several more characters like this, that does not contain any diacritics, but are still outside of 7-bit ascii set.

This can be fixed by creating custom transliteration table for these characters, but that would be messy.

So Iconv is better solution, but has issues on some platforms as Henrik pointed out (i've also noticed this in the past). Not sure what to do with it though...
Michael Koziarski September 25th, 2008 @ 10:23 AM
We could also extract the normlization and transliteration into another inflector method, and isolate the optimisations in there?

We already do something similar to this with sanitize where if there's no "<" chars we return the string unmodified.
You flagged this item as spam.
Henrik Nyh September 25th, 2008 @ 03:25 PM
Sounds like a good idea.
Michael Koziarski September 25th, 2008 @ 05:00 PM
OK, so we need a name for the 'asciify' method (not asciify ;)) and a suggested test for choosing between the two options.

We can do it when the file is required, so it doesn't have to be fast.
You flagged this item as spam.
Adam Cigánek September 25th, 2008 @ 05:35 PM
OK, so we need a name for the 'asciify' method (not asciify ;))

What about transliterate?
Michael Koziarski September 29th, 2008 @ 04:40 PM
Quick proof of concept for transliterate. Needs tidying up before it's applied but hopefully it doesn't break anyone's stuff.

Can you guys tests with your platforms where iconv is broken.

Plus, will require 'iconv' always work? if not what situations are there where it's broken?
- transliterate.patch 2.2 KB
Repository October 7th, 2008 @ 08:02 PM
- State changed from “open” to “committed”
(from [5556db22c57294a9f4e2ee4e633834ec6a200242]) Reduce memory usage slightly in String#parameterize

[#1034 state:committed] http://github.com/rails/rails/co...