This project is archived and is in readonly mode.

#2503 open
Dave Grijalva

validates_uniqueness_of is horribly inefficient in mysql

Reported by Dave Grijalva | April 16th, 2009 @ 12:09 AM | in 3.x

mysql is case insensitive by default. for case insensitive comparisions, the following is efficient.

WHERE users.username = 'myUsername'

validates_uniqueness_of currently generates

WHERE LOWER(users.username) = BINARY 'myusername'

this is redundant and completely clobbers performance. If you have a unique index on the column (and you should, because validates_uniqueness_of cannot be trusted), you don't get any of the benefit of the index because the system has to transform the column.

Comments and changes to this ticket

  • CancelProfileIsBroken

    CancelProfileIsBroken May 17th, 2009 @ 01:54 PM

    • State changed from “new” to “incomplete”

    Please submit a patch if you want to propose a change here. You'd want to look in activerecord/lib/active_record/validations.rb. What's happening is that the code first requests a case-sensitive equality operator, then later notices that the database is case-insensitive but doesn't change equality operators. So you'd need to fix the logic and add a method to the adapter to return a case-insensitive equality operator as well.

  • Phil Ross

    Phil Ross May 31st, 2009 @ 03:55 PM

    • Tag set to activerecord, patch, validate_uniqueness_of

    I'm using the attached patch to avoid the full table scan. This adds a :case_sensitive => :db option to validates_uniqueness_of, which causes the normal equality operator to be used. The validation will therefore use the database's case-sensitivity mode.

  • Scott Storck

    Scott Storck June 16th, 2009 @ 12:45 AM

    I am using this patch because I had a lot of problems with the SQL generated by validate_uniqueness_of not matching an existing row which would cause a duplicate row error.

    Removing the BINARY keyword from the sql results in the match working correctly.
    The data being tested for uniqueness which fails to match has trailing spaces.
    Mysql doesn't seem to interested in the trailing space when checking for uniqueness.
    The BINARY keyword however causes mysql to do a byte for byte check.

    I would think it usually would make sense to keep the uniqueness test as close to the test from the underlying database that is used as possible.

    I also have never used a database that doesn't use an index to enforce a unique field.
    I suspect that other databases might not be able to use the index if a modifying clause like BINARY is used.
    It would be interesting to hear from users of the other databases about this.

    I have applied the patch manually as it doesn't apply to my ActiveRecord version (2.3.2).

  • David Stevenson

    David Stevenson June 20th, 2009 @ 06:49 PM

    I've had this problem before too. It's a tricky situation because it depends on the setup of MySQL (and its case sensitivity, which is different by default depending on your platform). I'm completely in favor of a :db option that puts that responsibility on the person who sets up the DB, rather than on rails itself.

  • chad.ingram (at me)

    chad.ingram (at me) June 29th, 2009 @ 11:05 PM

    • Assigned user set to “CancelProfileIsBroken”

    This now has a patch. Assigning so maybe it can move forward since it's no longer 'incomplete' but I can't change that...

  • CancelProfileIsBroken

    CancelProfileIsBroken August 2nd, 2009 @ 11:13 AM

    • State changed from “incomplete” to “open”
    • Tag changed from activerecord, patch, validate_uniqueness_of to activerecord, bugmash, patch, validate_uniqueness_of
  • Renaud Kern

    Renaud Kern August 8th, 2009 @ 10:07 PM

    The patch doesn't pass. The file activerecord/test/cases/validations/uniqueness_validation_test.rb doesn't exit anymore. Test are now in activerecord/test/cases/validations_test.rb

  • Phil Ross

    Phil Ross August 8th, 2009 @ 10:47 PM

    @Renaud Kern I still see activerecord/test/cases/validations/uniqueness_validation_test.rb in the master branch and my patch still applies cleanly.

    There is an activerecord/test/cases/validations_test.rb file, but this doesn't contain any tests for validates_uniqueness_of.

  • Elad Meidar

    Elad Meidar August 8th, 2009 @ 11:06 PM

    +1 on idea, -1 on patch. Patch does not apply on 2-3-stable.

  • Hugo Peixoto

    Hugo Peixoto August 8th, 2009 @ 11:34 PM

    +1 on the idea. But the patch does not apply to 2-3-stable. I've attached a patch that ports Phil's code to this branch.

  • Jeremy Kemper

    Jeremy Kemper August 9th, 2009 @ 09:16 AM

    :case_sensitive => :db is a strange option to add to the public API.

    Ideas for clarifying, or perhaps sidestepping, this issue?

  • Hugo Peixoto

    Hugo Peixoto August 9th, 2009 @ 02:16 PM

    I agree, Jeremy. But sidestepping doesn't seem like the right thing to do.

    How about ":case_sensitive => :use_database_default" ?

  • Dan Croak

    Dan Croak August 10th, 2009 @ 03:01 AM

    -1 on changing case_insensitive to non-boolean.

    My understanding is that the problem here is the BINARY conversion for MySQL, correct? When running a 2.3.3 app, I do not see the reported LOWER().

    So, the fix needs to simply be:

    make MySQL act like SQLite3, for instance, and simply not add the BINARY conversion?

  • Rizwan Reza

    Rizwan Reza August 10th, 2009 @ 03:10 AM

    -1 on changing case_sensitive. Agreed with Dan.

    Hugo: The patch doesn't work anymore.

  • Elad Meidar

    Elad Meidar August 13th, 2009 @ 10:13 PM

    +1 on feature, although not exactly fits into the current boolean convention, it's a well required addition, i would not scope it only for mysql, since FTS are a cross RDBMS issue and there is a need to help developers deal with them before it getting to be a problem.

    since the patch for 2-3-stable did work as Rizwan said, i took the liberty to fix it.

  • CancelProfileIsBroken

    CancelProfileIsBroken August 19th, 2009 @ 12:38 PM

    • Assigned user changed from “CancelProfileIsBroken” to “Pratik”
  • Pratik

    Pratik August 19th, 2009 @ 12:38 PM

    • Assigned user changed from “Pratik” to “Jeremy Kemper”
  • Blue Box Jesse

    Blue Box Jesse September 27th, 2009 @ 01:38 AM

    We've seen this issue bite a number of customers, so this is a very valuable commit.

    BugMash: +1

    I agree that the best naming convention would be :non-boolean but using the same description field.

    I've confirmed the patch applies on 2-3-stable.

  • Jordan Brough

    Jordan Brough November 12th, 2009 @ 10:33 PM

    I think this is worth fixing either way, but for anyone interested here's a ticket I've added for a way to sidestep the issue -- #3486 (update rails to handle db unique-constraints and omit the validates_uniqueness_of)

  • Ryan Angilly

    Ryan Angilly November 17th, 2009 @ 02:52 PM

    Just wanted to throw my $0.02 to keep the discussion going. I do think it's a necessary fix, although I'm not crazy about the :case_sensitive => :db option. I feel like the model should reflect whether or not the intent is for case sensitivity or not. Using :case_sensitive => :db leaves it ambiguous, and would force someone to go check the database engine to find out if case sensitivity was actually enabled. I'm thinking maybe an api like:

    validates_uniqueness_of :email, :db_case_sensitive => :false

    And that option would turn off the LOWER and the BINARY, and let the db do its magic.

  • Ken Miller

    Ken Miller December 30th, 2009 @ 10:14 PM

    I don't at present have a patch, but doesn't this seem like the sort of thing the connection adapter should be able to report on? Either by introspecting the database, or explicitly declared in the connection spec. That would require editing all the connection adapters, but that seems preferable to having to declare it in each model.

  • Rizwan Reza

    Rizwan Reza February 12th, 2010 @ 12:46 PM

    • Tag changed from activerecord, bugmash, patch, validate_uniqueness_of to activerecord, patch, validate_uniqueness_of
  • Neeraj Singh

    Neeraj Singh April 20th, 2010 @ 05:04 PM

    In my test index is not used when I use Lower(user.email). Otherwise index is being used. That means BINARY is not the issue as Dan Croak suggested. Please correct me if I am wrong.

    I will say that current code is okay. However it should be documented that if users want better performance then they need to do two things.

    • set case_sensitive to true
    • add before_validation callback to lowercase the model attribute
  • Jeremy Kemper

    Jeremy Kemper May 4th, 2010 @ 06:48 PM

    • Milestone changed from 2.x to 3.x
  • metavida

    metavida August 12th, 2010 @ 09:19 PM

    • Importance changed from “” to “”

    I wanted to agree with Neeraj that BINARY isn't the problem. On my MySQL 5.0.89 install I get the following results.

    # Using the SQL generated by Rails the index is NOT used.
    EXPLAIN SELECT * FROM `users` WHERE LOWER(login) = BINARY 'test';
    +-------------+------+------+------+------+-------------+
    | select_type | type | key  | ref  | rows | Extra       |
    +-------------+------+------+------+------+-------------+
    | SIMPLE      | ALL  | NULL | NULL |   94 | Using where | 
    +-------------+------+------+------+------+-------------+
    
    # Using neither LOWER nor BINARY the index IS used.
    EXPLAIN SELECT * FROM `users` WHERE login = 'test';
    +-------------+------+-------------------+-------+------+-------------+
    | select_type | type | key               | ref   | rows | Extra       |
    +-------------+------+-------------------+-------+------+-------------+
    | SIMPLE      | ref  | users_login_index | const |    2 | Using where | 
    +-------------+------+-------------------+-------+------+-------------+
    
    # Using only BINARY the index IS used, and the comparison IS case sensitive.
    EXPLAIN SELECT * FROM `users` WHERE login = BINARY 'test';
    +-------------+-------+-------------------+------+------+-------------+
    | select_type | type  | key               | ref  | rows | Extra       |
    +-------------+-------+-------------------+------+------+-------------+
    | SIMPLE      | range | users_login_index | NULL |    2 | Using where | 
    +-------------+-------+-------------------+------+------+-------------+
    

    Seems like a bug in the default behavior to me. With :case_sensitive=>true we should be adding BINARY to the conditions to make the assertion work as it claims it does.

  • Rohit Arondekar
  • Jeff Kreeftmeijer

    Jeff Kreeftmeijer November 1st, 2010 @ 05:07 PM

    • Tag cleared.

    Automatic cleanup of spam.

  • viktor tron (strawberry)
  • Grant Hollingworth

    Grant Hollingworth January 19th, 2011 @ 04:36 PM

    • Tag set to patch

    Here's a patch I've been using, rebased against the current HEAD. It adds a case_insensitive_equality_operator method to ActiveRecord::ConnectionAdapters::DatabaseStatements, following the existing case_sensitive_equality_operator. MysqlAdapter overrides it.

    There are no tests because it changes performance but not behaviour.

  • Raphael Sofaer

    Raphael Sofaer February 17th, 2011 @ 05:11 AM

    That patch (Grant's patch) looks like it will break case-insensitive comparison in databases with a collation like utf8_bin. I'm definitely having this problem too, though. We spend a huge amount of time in those queries.

  • Joseph Palermo

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Tags

Referenced by

Pages