This project is archived and is in readonly mode.

#507 ✓resolved
David Lowenfels

patch to add :tokenizer option to validates_length_of

Reported by David Lowenfels | June 29th, 2008 @ 01:47 AM | in 2.x

See ticket #422 for background information.

:tokenizer - Specifies how to split up the attribute string (e.g. :tokenizer => lambda { |str| str.scan(/\w+/) } counts words, as in the test below).

Defaults to lambda { |value| value.split(//) }, which counts individual characters.

  def test_validates_length_of_with_block
    # Count words instead of characters: each \w+ match is one token.
    Topic.validates_length_of :content, :minimum => 5, :too_short => "Your essay must be at least %d words.",
                              :tokenizer => lambda { |str| str.scan(/\w+/) }
    t = Topic.create!(:content => "this content should be long enough")  # 6 words
    assert t.valid?
    t.content = "not long enough"                                          # 3 words
    assert !t.valid?
    assert t.errors.on(:content)
    assert_equal "Your essay must be at least 5 words.", t.errors[:content]
  end
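
A minimal stand-alone Ruby sketch of the counting rule described above, assuming the validator tokenizes a String value and then compares the number of tokens with :minimum. The two lambdas are the ones from this ticket; length_at_least? is a hypothetical helper for illustration, not the Rails implementation.

```ruby
# Tokenizers taken from this ticket.
CHARACTERS = lambda { |value| value.split(//) }  # default: one token per character
WORDS      = lambda { |str| str.scan(/\w+/) }    # one token per run of word characters

# Hypothetical helper (not Rails API): tokenize a String, then compare the
# token count against a minimum. Non-String values are used as-is.
def length_at_least?(value, minimum, tokenizer = CHARACTERS)
  tokens = value.is_a?(String) ? tokenizer.call(value) : value
  !tokens.nil? && tokens.size >= minimum
end

sentence = "this content should be long enough"
p CHARACTERS.call("abc")                         # => ["a", "b", "c"]
p WORDS.call(sentence).size                      # => 6
p length_at_least?(sentence, 5, WORDS)           # => true
p length_at_least?("not long enough", 5, WORDS)  # => false (3 words)
p length_at_least?("not long enough", 5)         # => true (15 characters)
```

The same string passes or fails depending only on the tokenizer, which is the behavior the test above checks.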

Comments and changes to this ticket

  • David Lowenfels

    David Lowenfels June 29th, 2008 @ 01:48 AM

    • Tag changed from activerecord to activerecord, patch

    I decided to go with the convention of passing a proc as an option value (:tokenizer => ...) instead of an anonymous block.

  • José Valim
  • Clemens Kofler

    Clemens Kofler July 1st, 2008 @ 12:12 AM

    +1 - love it.

    How about also adding shortcuts for the most commonly used options (e.g. :characters and :words)? I think that would make it even better!
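
    A rough sketch of how such shortcuts might be resolved, assuming a symbol is looked up in a table and anything that responds to #call is used as-is. TOKENIZER_SHORTCUTS and resolve_tokenizer are hypothetical names; as described in this ticket, :tokenizer takes a lambda.

```ruby
# Hypothetical shortcut table (not part of Rails): symbols expand to tokenizers.
TOKENIZER_SHORTCUTS = {
  :characters => lambda { |str| str.split(//) },
  :words      => lambda { |str| str.scan(/\w+/) }
}

# Accept either a shortcut symbol or any callable (proc/lambda) unchanged.
def resolve_tokenizer(option)
  return option if option.respond_to?(:call)
  TOKENIZER_SHORTCUTS.fetch(option) do
    raise ArgumentError, "unknown tokenizer shortcut: #{option.inspect}"
  end
end

p resolve_tokenizer(:words).call("not long enough").size       # => 3
p resolve_tokenizer(:characters).call("not long enough").size  # => 15
custom = resolve_tokenizer(lambda { |str| str.split(",") })
p custom.call("a,b,c").size                                     # => 3
```

    Unknown symbols raise instead of silently falling back to character counting, so a typo in the option surfaces immediately.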

  • Pratik

    Pratik July 1st, 2008 @ 04:44 PM

    • Assigned user set to “Jeremy Kemper”
  • Repository

    Repository July 4th, 2008 @ 02:13 AM

    • State changed from “new” to “resolved”

    (from [dd8946231c1d8a8f0ac4e299c66d87226a157a1a]) Add :tokenizer option to validates_length_of. [#507 state:resolved]

    Signed-off-by: Pratik Naik

  • Repository

    Repository July 4th, 2008 @ 02:13 AM

    (from [87fbcaa6229e9073095fb8d77c7a536c9466fbce]) Add :tokenizer option to validates_length_of. [#507 state:resolved]

    Signed-off-by: Pratik Naik

  • Tarmo Tänav

    Tarmo Tänav July 7th, 2008 @ 05:31 PM

    Did anyone notice that in ticket #422 Jeremy suggested a block that would return "str.scan(/\w+/).size", but in this ticket that somehow turned into a tokenizer, "str.scan(/\w+/)", which is much more limiting? Is there a reason for that?

    Returning a size seems sufficient for validates_length_of; there should be no real need to generate all the tokens, especially in cases where there is no sensible way to generate them (for example, if for some reason you wanted to count capitalized letters as two characters).
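
    A hypothetical sketch of the count-returning alternative: the option returns an Integer, so a rule like counting capital letters twice needs no token array at all. WORD_COUNTER, WEIGHTED_CHAR_COUNTER and length_at_least_by_count? are illustrative names, not an option added by this ticket.

```ruby
# Hypothetical counters: each returns an Integer length, not an array of tokens.
WORD_COUNTER = lambda { |str| str.scan(/\w+/).size }

# Capital letters A-Z count as two characters, everything else as one.
WEIGHTED_CHAR_COUNTER = lambda do |str|
  str.each_char.inject(0) { |total, ch| total + (ch =~ /[A-Z]/ ? 2 : 1) }
end

# Compare the counter's result directly against the minimum.
def length_at_least_by_count?(value, minimum, counter)
  counter.call(value) >= minimum
end

p WORD_COUNTER.call("this content should be long enough")    # => 6
p WEIGHTED_CHAR_COUNTER.call("Abc")                           # => 4
p length_at_least_by_count?("Abc", 4, WEIGHTED_CHAR_COUNTER)  # => true
p length_at_least_by_count?("abc", 4, WEIGHTED_CHAR_COUNTER)  # => false
```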

  • David Lowenfels

    David Lowenfels July 7th, 2008 @ 07:53 PM

    @Tarmo: yes, there was a reason. If you look at the diff, you will see that it was less messy to substitute a proc that returns an array, so that the size method is only called once. Go ahead and submit an additional patch if you think it should behave differently.

    Personally, I think counting the number of tokens is the most semantic approach. Otherwise, the validation code in the model becomes rather cryptic.



Tickets have moved to GitHub.

