This project is archived and is in readonly mode.

#5060 ✓committed
Wincent Colaiuta

ActiveRelation "offset().empty?" produces unexpected result

Reported by Wincent Colaiuta | July 7th, 2010 @ 11:00 AM | in 3.x

Calling "empty?" on an ActiveRelation instance returned by "offset()" (eg. "Article.scoped.limit(10).offset(10).empty?") reports "true" even when the result set is non-empty.

Demonstration

Given a bunch of sample records:

>> Article.count
  SQL (3.3ms)  SELECT COUNT(*) AS count_id FROM `articles`
=> 38

Expected behavior of "length" and "empty?" methods on ActiveRelation instances:

First, "where":

  • Note how calling "length" actually retrieves the records and the "length" returned is the number of records.
  • Calling "empty?" does not retrieve the records and instead executes a "COUNT" query
>> Article.where(:public => true).length
  Article Load (1.3ms)  SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1)
=> 38

>> Article.where(:public => true).empty?
  SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1)
=> false

Now for "limit":

  • Again, "length" retrieves the records and reports the actual number
  • And again, "empty?" does not retrieve the records but instead does a "COUNT" (although note here that the "LIMIT" has absolutely no effect because only 1 row, the count value, will ever be returned)
>> Article.where(:public => true).limit(10).length
  Article Load (0.3ms)  SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10
=> 10

>> Article.where(:public => true).limit(10).empty?
  SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10
=> false

Finally, let's look at "offset":

  • "length" retrieves the records and reports the actual number
  • "empty?" does not retrieve the records, does a "COUNT" query, and returns "true" instead of the expected "false"
>> Article.where(:public => true).limit(10).offset(10).length
  Article Load (0.3ms)  SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10, 10
=> 10
>> Article.where(:public => true).limit(10).offset(10).empty?
  SQL (0.4ms)  SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10, 10
=> true

If we do offset(0) then the result is the opposite:

>> Article.where(:public => true).limit(10).offset(0).empty?
  SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 0, 10
=> false

Analysis

So this is happening because "empty?" is executing a COUNT query with a LIMIT clause on it.

The COUNT query will always return exactly 1 row, containing the number of matching rows.

In the case where the LIMIT clause is something like "LIMIT 10" this has no effect.

In the case where "offset()" is used and the LIMIT clause becomes something like "LIMIT 10, 10", the COUNT query returns nothing at all because the one row that it would normally return is pruned. This in turn is interpreted as being zero, and so "empty?" returns true.

Implications

The behavior might catch out people upgrading from the older query syntax. For example, I discovered this issue when I converted an old call like this:

@articles = Article.find :all, :conditions => "...", :order => ..., :limit => ..., :offset => ...

To use scopes:

scope :published, where(:public => true)
scope :recent, published.order('updated_at DESC').limit(10)

@articles = Article.recent.offset(...)

All of this works fine, until we hit the view code:

unless @articles.empty?
  ...

All of a sudden, @articles.empty? was returning "true" even though @articles wasn't really empty, thus breaking the view.

So basically this could affect anyone who is using "offset()" and expecting "empty?" to work on it.

Workaround

Must force the execution of the query by using "all()" or some other means before calling "empty?"; eg:

Article.where(:public => true).limit(10).offset(10).all.empty?

Solution

I suspect that the behavior of making "empty?" trigger a COUNT query is probably the wrong thing to do.

Just as "length" triggers record retrieval, "empty?" probably should too.

If, however, it is decided that the different behavior of "length" (triggering SELECT) and "empty?" (triggering SELECT COUNT) is seen as a desirable thing, we should at least modify the behavior of the "empty?" method when it is used on an offset() relation, because the currently constructed query doesn't make any sense.

ie. it doesn't make sense for any query that starts with "SELECT COUNT" to include any "LIMIT" clause at all, because it will either be a no-op, or will prune away the actual result row.

So at the very least, if a relation obtained via "offset()" receives an "empty?" message then it should trigger a "SELECT COUNT" query without the LIMIT clause on it.

Comments and changes to this ticket

  • Wincent Colaiuta

    Wincent Colaiuta July 7th, 2010 @ 12:22 PM

    More digging:

    Example 1:

    >> Article.scoped.limit(10).length
      Article Load (0.3ms)  SELECT `articles`.* FROM `articles` LIMIT 10
    => 10
    >> Article.scoped.limit(10).size
      SQL (0.4ms)  SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10
    => 38
    >> Article.scoped.limit(10).empty?
      SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10
    => false
    

    "length" delegates to "to_a" (in activerecord/lib/active_record/relation.rb), so it returns the expected result.

    "size" calls "count" (in activerecord/lib/active_record/relation/calculations.rb), so it constructs a COUNT query which returns a surprising result.

    "empty?" calls the same "count" method under the hood ("count.zero?"), so it behaves in the same way that "size" does.

    Example 2:

    >> Article.scoped.limit(10).offset(10).length
      Article Load (0.4ms)  SELECT `articles`.* FROM `articles` LIMIT 10, 10
    => 10
    >> Article.scoped.limit(10).offset(10).size
      SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10, 10
    => 0
    >> Article.scoped.limit(10).offset(10).empty?
      SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10, 10
    => true
    

    Here "length" again works because it delegates to "to_a".

    "size" here also calls "count", and returns 0 because it's using a COUNT query with a LIMIT clause. This is a surprising result.

    "empty?" calls "count" under the hood, so returns "true" here, which is a surprising result.

    Example 3:

    >> Article.scoped.size
      SQL (0.3ms)  SELECT COUNT(*) AS count_id FROM `articles`
    => 38
    >> Article.scoped.length
      Article Load (0.4ms)  SELECT `articles`.* FROM `articles`
    => 38
    >> Article.scoped.empty?
      SQL (0.4ms)  SELECT COUNT(*) AS count_id FROM `articles`
    => false
    

    Only here do "size", "length" and "empty?" agree, because there is no LIMIT component.

  • Wincent Colaiuta

    Wincent Colaiuta July 15th, 2010 @ 08:40 PM

    Can somebody from the Rails team comment on this one? I'm happy to work on a patch to fix this, but I'd like to hear beforehand if I'm correcting in stating that these kinds of results are considered incorrect and therefore a bug:

    Article.scoped.limit(10).offset(5).length # 50 (correct)
    Article.scoped.limit(10).offset(5).size   # zero (incorrect)
    Article.scoped.limit(10).offset(5).empty? # true (incorrect)
    
  • Neeraj Singh

    Neeraj Singh July 15th, 2010 @ 09:59 PM

    • State changed from “new” to “open”
    • Importance changed from “” to “Low”

    @Wincent Earlier you mentioned that you might be leaning towards making #size work same as #length; meaning load all the records. I would say that first attempt should be to have sql as efficient as possible and both #size and #empty? should make a COUNT sql.

  • Wincent Colaiuta

    Wincent Colaiuta July 15th, 2010 @ 10:04 PM

    Yes, agreed that it should be efficient if possible. The trouble is that the query that is generated makes no sense:

    SELECT COUNT(*) FROM foo WHERE thing = 'bar' LIMIT 10, 10

    will always return zero rows, because whatever the COUNT is gets lopped off by the LIMIT clause.

    Ideally what .empty? and .size would do is strip off the LIMIT clause and actually do this:

    SELECT COUNT(*) FROM foo WHERE thing = 'bar'

    Which is both correct and efficient.

  • Neeraj Singh

    Neeraj Singh July 16th, 2010 @ 02:39 AM

    • Tag set to activerecord

    The problem with dropping limit is that I can have a condition like

    Article.scoped.limit(10).offset(5).size
    

    and the output could be a number > 10. And that does not make sense.

    I would say for operations like

    Article.scoped.limit(10).offset(5).size   # zero (incorrect)
    Article.scoped.limit(10).offset(5).empty? # true (incorrect)
    

    ActiveRecord should blow up saying that use .length instead.

  • Wincent Colaiuta

    Wincent Colaiuta July 16th, 2010 @ 06:41 AM

    Earlier you said you wanted it to be efficient and use a COUNT query if possible, but if you tell users to use length (which under the hood does to_a first) you're forcing the immediate execution of a non-COUNT query at that moment.

    Isn't it a reasonable expectation for a user to be able to do:

    Model.some_active_record_relation_query.empty?
    

    And get a boolean that says if any results are returned or not? And that this should be done via a COUNT rather than actually retrieving the records and counting them?

    So what I am saying is given the example you cite:

    Article.scoped.limit(10).offset(5).size
    

    ActiveRecord should drop the LIMIT clause from the actual COUNT query (seeing as that would cause us to always return 0), and then when returning the results, compare the returned number to the values passed in via limit and/or offset.

    ie. if the query that would be sent to the database is:

    SELECT COUNT(*) FROM `articles` LIMIT 5, 10
    

    the actual query should be:

    SELECT COUNT(*) FROM `articles`
    

    and if the result is say, 38, ActiveRecord knows that the limit requested by the user was 10, so it should return 10 (ie. the number of rows that would be returned by executing the actual retrieval query).

    What do you think?

  • Neeraj Singh

    Neeraj Singh July 16th, 2010 @ 12:28 PM

    This will fix #empty? but what about #size? How would you get accurate result for #size?

  • Wincent Colaiuta

    Wincent Colaiuta July 16th, 2010 @ 02:06 PM

    Well, that depends on what you define as accurate. Does "accurate" mean the total number of records that match the WHERE condition, or the total number that would be returned if you did the query with a LIMIT applied to the WHERE? I imagine that either behavior can be implemented; it's just we need to decide which is the correct one to proceed with. The same technique (stripping the LIMIT and then comparing it to the result before returning) would work, if that's what we decide to do.

    The way I see it is this:

    If the user writes a broken SQL query like:

    SELECT COUNT(*) FROM `articles` LIMIT 10, 10
    

    Then the database will give them a broken answer (ie. always zero rows returned). It seems like the right thing to do: a broken query should give a broken answer.

    So, if the user uses ActiveRecord::Relation to recreate that same broken query:

    Article.limit(10).offset(10).count
    

    Then perhaps they should get a broken answer back too, for consistency with the analogous case written in raw SQL. Although the count method is defined on Array, in this case it happens to be an SQL keyword too, so it doesn't seem too unreasonable to give it this kind of SQL-centric interpretation when called on an ActiveRecord::Relation instance.

    On the other hand, we tell users that ActiveRecord::Relation objects are basically supposed to be Array-like. You get lazy-evaluation for free, and when you actually try to iterate over the objects with each or for or whatever the actual query gets evaluated.

    So, if the user has a controller like this:

    @articles = Article.find :all, :limit => 10, :offset => 10
    

    And a view like this:

    <%- unless @articles.empty? %>
      <h1>Articles...</h1>
    

    Then they would expect that changing their controller to:

    @articles = Article.limit(10).offset(10)
    

    Would continue to provide them with an Array-like object that they can use in their view in exactly the same way.

    If they call length on it it behaves in an unsurprising way, because to_a is getting called behind the scenes. No lazy evaluation, and no SQL COUNT query either.

    If they call count on it they might be surprised to get a broken answer back, but as I mentioned above, there is a case for this being the right thing to do, seeing as a similar COUNT query written in raw SQL would give them the same broken answer.

    If they call size on it they do get the surprising behavior, because it shares the behavior of count. I'm inclined to think that it probably shouldn't. The fix is either to call to_a just like we do with length, or do the special handling of the LIMIT clause already described. Which one you choose probably depends on whether you want a COUNT query implemented behind the scenes or not.

    If they call empty? they also get the surprising answer. There is no EMPTY keyword in SQL, but there evidently is an empty? method on Array, so it is reasonable for the user to expect Array-like semantics. Again there are two possible fixes: either drop the LIMIT clause, or do a to_a first.

    If you think about the most common use case for empty? I'd say that preferring to do a COUNT query probably isn't right. ie. it is usually used like this:

    # in controller
    @models = ... # model query
    
    # in view
    <%- if @models.empty? %>
      <h1>Models!</h1>
      <%- @models.each |m| %>
        # do something with the models...
    

    So in this case preferring a COUNT query gives a false sense of economy because you end up doing a COUNT and if it succeeds you repeat the query, this time as a normal SELECT.

    If you forget about the COUNT query and just do a to_a, you do only one query, and if no rows are returned (ie. when empty? is true) you haven't wasted time doing a heavyweight query because you got nothing back anyway.

    I guess it is important to get this right, as queries of this time (with LIMIT and offset components) are extremely common in web applications, due to the ubiquity of pagination. At least for me what we should do with the empty? method is clear, and length already works; what I am not sure about is what we should do with size.

  • Neeraj Singh

    Neeraj Singh July 16th, 2010 @ 05:05 PM

    not sure if all the databases sqlite3, mysql, postgresql and oracle produce IMIT 10, 10 in the end.

    Given the hard work that needs to go in and given that it will only solve .empty but not .size I would still say ( a bit reluctantly though) that it is better to blow up than silently provide wrong result. :-(

  • Wincent Colaiuta

    Wincent Colaiuta July 16th, 2010 @ 05:32 PM

    Not sure what you mean about it not solving the size case. It would:

    • drop LIMIT clause
    • do query
    • if result is > limit value, return limit value
    • otherwise, return result obtained via query

    ie.

    • if 42 rows counted without limit clause, and user passed in limit of 10, the result would be 10
    • if 4 rows counted without limit clause, and user passed in limit of 10, the result would be 4

    But as I noted above, I have no idea whether this is more correct than the existing behavior of returning 0. In a way, I like the existing behavior in the sense that it matches up with what you would get if you wrote an SQL COUNT query with a LIMIT and offset. (Definitely for the count method, although not so sure about size, although I seem to recall that size has always mapped to count and therefore COUNTin Rails, and length as always had "Array semantics".) I also don't really like the complexity here of both stripping something off before submitting the query, and also intercepting the result and potentially modifying it.

    In any case, don't think blowing up is a good idea at all. The database doesn't complain about invalid SQL if you do something ill-advised like SELECT COUNT(*) FROM thing LIMIT 10, 10, so Rails shouldn't really be throwing an exception either. It's not invalid, it's just not what the user might think it is.

    It would be better to just leave things as they are and have users scratching their heads about unexpected results when they do empty?. (Although I would rather fix it, to be honest.)

    As for the difficulty of the fix and the question about different database adapters, I think I can see that there is already code in activerecord/lib/active_record/relation/calculations.rb that drops clauses off of queries before firing them.

    If we just want to unbreak empty? the fix could be as minimal as removing:

        def empty?
          loaded? ? @records.empty? : count.zero?
        end
    

    from activerecord/lib/active_record/relation.rb and adding empty? to its list of delegated methods; ie:

    delegate :to_xml, :to_yaml, :length, :collect, :map, :each, :empty?, :all?, :include?, :to => :to_a
    

    Personally that's what I'd do at a minimum. I would probably leave the rest as is, and address the other possible surprises in the documentation by making it clear that:

    • length is equivalent to to_a.length, so causes the query to be executed immediately
    • size maps to count
    • count will do an SQL COUNT query if the relation is not yet loaded
    • empty? is equivalent to to_a.empty?, so again causes the query to be executed immediately
    • mixing size or count with limit() and offset() will always produce 0 counts for any non-zero offset, because the LIMIT clause effectively removes the row containing the COUNT from the result

    What do you think?

  • Neeraj Singh

    Neeraj Singh July 16th, 2010 @ 05:39 PM

    The intent of the author here is to find how many records are there after the offset limiting upto 10. By dropping the LIMIT clause that information is lost.

    Article.limit(10).offset(10).size?
    

    Second look at the query above makes me think it is a meaningless request; valid but meaningless. What the user wants here is confusing because of the usage of both size and offset and limit?

  • Wincent Colaiuta

    Wincent Colaiuta July 16th, 2010 @ 05:51 PM

    This scenario comes up all the time with pagination. Your example:

    @articles = Article.limit(10).offset(10)
    

    That's 10 articles per page, in groups of 10, starting at offset 10 (ie. second page). Isn't it perfectly reasonable to want to know how many articles are in @articles? Or to know if there are any results to be rendered (eg. not empty?).

    So yes, the author does want to know how many records there are after offset 10, up to a limit of 10, but you're wrong when you say that dropping the LIMIT clause loses that information. If you don't drop the LIMIT clause then Rails will issue a query like SELECT COUNT(*) FROM articles LIMIT 10, 10 and the result will be 0, no matter how many rows in the table. This is because a COUNT query always returns 1 row, a row containing the count, and if you use a LIMIT with an offset component, you throw the result away.

    The only way to get an accurate answer is to either:

    • actually retrieve the records and then count them (ie. to_a.count, to_a.size, to_a.length)
    • or exclude the LIMIT clause from the query and then return MAX(result, limit) to the user

    See what I'm saying?

    You're right though that it looks confusing if a user writes a query like Article.limit(10).offset(10).size. But in practice users won't do that. This issue is going to crop up because a user will take a query from a Rails 2 app:

    # in controller
    @articles = Article.find :all, :limit => 10, :offset => 10
    
    # in view
    if @articles.empty? # or similar...
    

    And convert it to use the new API:

    # in controller
    @articles = Article.limit(10).offset(10)
    
    # in view
    if @articles.empty?
    

    And wonder why it's not displaying any records all of a sudden.

  • Chris Hapgood

    Chris Hapgood September 13th, 2010 @ 08:39 PM

    For what it's worth, I've been following/fighting this issue for a while and I agree with Wincent: the OFFSET clause needs to be conditionally stripped from scoped queries.

    This is meaningless in the DBs I'm familiar with:

    SELECT COUNT(*) FROM widgets LIMIT 10 OFFSET 10

    It always returns zero. The only time I see value in including the OFFSET clause in a COUNT is when results are aggregated and grouped -but that's already outside the AR sweet spot.

    My experience matches Wincent's: pagination code applies LIMIT/OFFSET in a scope (very useful), but counting is then screwed up. So if my pagination page size is 10 and I need to count how many records are being shown (perhaps I do something radically different for count == 1 versus count > 1), things get ugly quickly. And stripping the LIMIT during DB counts seems like a painless solution.

    Anybody know the status of this?

    -Chris

  • Chris Hapgood

    Chris Hapgood September 13th, 2010 @ 08:40 PM

    • Tag changed from activerecord to activerecord, scope
  • Santiago Pastorino

    Santiago Pastorino February 2nd, 2011 @ 04:31 PM

    This issue has been automatically marked as stale because it has not been commented on for at least three months.

    The resources of the Rails core team are limited, and so we are asking for your help. If you can still reproduce this error on the 3-0-stable branch or on master, please reply with all of the information you have about it and add "[state:open]" to your comment. This will reopen the ticket for review. Likewise, if you feel that this is a very important feature for Rails to include, please reply with your explanation so we can consider it.

    Thank you for all your contributions, and we hope you will understand this step to focus our efforts where they are most helpful.

  • Santiago Pastorino

    Santiago Pastorino February 2nd, 2011 @ 04:31 PM

    • State changed from “open” to “stale”
  • Wincent Colaiuta

    Wincent Colaiuta February 2nd, 2011 @ 07:01 PM

    • State changed from “stale” to “open”

    Just checked and the problematic behavior is still there in Rails 3.0.4.rc1, so [state:open]

  • Santiago Pastorino
  • Santiago Pastorino
  • atrofast

    atrofast February 13th, 2011 @ 02:55 AM

    This issue caught my attention so I wrote a little patch for it. This is my first attempt at Rails hacking so hopefully it's up to standards. I've also included test cases to verify that it works. All activerecord tests passed for SQLite, Postgres and Mysql. Thanks for taking a look!

  • Santiago Pastorino

    Santiago Pastorino February 27th, 2011 @ 03:15 AM

    • Milestone changed from 3.0.5 to 3.0.6
  • John Mileham

    John Mileham March 4th, 2011 @ 05:04 AM

    Hello,

    I attempted to submit a patch here but got shut down by Lighthouse's spam guards. I tried submitting as a pull request instead:

    https://github.com/rails/rails/pull/201

    Also pertains to #6268.

    Thanks!

    -john

  • Santiago Pastorino

    Santiago Pastorino March 4th, 2011 @ 02:31 PM

    • Assigned user set to “Aaron Patterson”
  • Aaron Patterson

    Aaron Patterson March 23rd, 2011 @ 10:06 PM

    • Milestone changed from 3.0.6 to 3.x

    I'm liking John's ideas, but his changes should not go in a bugfix release. So I'm moving to 3.x.

  • Aaron Patterson

    Aaron Patterson March 30th, 2011 @ 05:55 PM

    • State changed from “open” to “committed”

    I've merged John's stuff to master, so I'm closing this.

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Attachments

Referenced by

Pages