This project is archived and is in readonly mode.
ActiveRelation "offset().empty?" produces unexpected result
Reported by Wincent Colaiuta | July 7th, 2010 @ 11:00 AM | in 3.x
Calling "empty?" on an ActiveRelation instance returned by "offset()" (eg. "Article.scoped.limit(10).offset(10).empty?") reports "true" even when the result set is non-empty.
Demonstration
Given a bunch of sample records:
>> Article.count
SQL (3.3ms) SELECT COUNT(*) AS count_id FROM `articles`
=> 38
Expected behavior of "length" and "empty?" methods on ActiveRelation instances:
First, "where":
- Note how calling "length" actually retrieves the records and the "length" returned is the number of records.
- Calling "empty?" does not retrieve the records and instead executes a "COUNT" query
>> Article.where(:public => true).length
Article Load (1.3ms) SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1)
=> 38
>> Article.where(:public => true).empty?
SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1)
=> false
Now for "limit":
- Again, "length" retrieves the records and reports the actual number
- And again, "empty?" does not retrieve the records but instead does a "COUNT" (although note here that the "LIMIT" has absolutely no effect because only 1 row, the count value, will ever be returned)
>> Article.where(:public => true).limit(10).length
Article Load (0.3ms) SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10
=> 10
>> Article.where(:public => true).limit(10).empty?
SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10
=> false
Finally, let's look at "offset":
- "length" retrieves the records and reports the actual number
- "empty?" does not retrieve the records, does a "COUNT" query, and returns "true" instead of the expected "false"
>> Article.where(:public => true).limit(10).offset(10).length
Article Load (0.3ms) SELECT `articles`.* FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10, 10
=> 10
>> Article.where(:public => true).limit(10).offset(10).empty?
SQL (0.4ms) SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 10, 10
=> true
If we do offset(0) then the result is the opposite:
>> Article.where(:public => true).limit(10).offset(0).empty?
SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` WHERE (`articles`.`public` = 1) LIMIT 0, 10
=> false
Analysis
So this is happening because "empty?" is executing a COUNT query with a LIMIT clause on it.
The COUNT query will always return exactly 1 row, containing the number of matching rows.
In the case where the LIMIT clause is something like "LIMIT 10" this has no effect.
In the case where "offset()" is used and the LIMIT clause becomes something like "LIMIT 10, 10", the COUNT query returns nothing at all because the one row that it would normally return is pruned. This in turn is interpreted as being zero, and so "empty?" returns true.
Implications
The behavior might catch out people upgrading from the older query syntax. For example, I discovered this issue when I converted an old call like this:
@articles = Article.find :all, :conditions => "...", :order => ..., :limit => ..., :offset => ...
To use scopes:
scope :published, where(:public => true)
scope :recent, published.order('updated_at DESC').limit(10)
@articles = Article.recent.offset(...)
All of this works fine, until we hit the view code:
unless @articles.empty?
...
All of a sudden, @articles.empty? was returning "true" even though @articles wasn't really empty, thus breaking the view.
So basically this could affect anyone who is using "offset()" and expecting "empty?" to work on it.
Workaround
Must force the execution of the query by using "all()" or some other means before calling "empty?"; eg:
Article.where(:public => true).limit(10).offset(10).all.empty?
Solution
I suspect that the behavior of making "empty?" trigger a COUNT query is probably the wrong thing to do.
Just as "length" triggers record retrieval, "empty?" probably should too.
If, however, it is decided that the different behavior of "length" (triggering SELECT) and "empty?" (triggering SELECT COUNT) is seen as a desirable thing, we should at least modify the behavior of the "empty?" method when it is used on an offset() relation, because the currently constructed query doesn't make any sense.
ie. it doesn't make sense for any query that starts with "SELECT COUNT" to include any "LIMIT" clause at all, because it will either be a no-op, or will prune away the actual result row.
So at the very least, if a relation obtained via "offset()" receives an "empty?" message then it should trigger a "SELECT COUNT" query without the LIMIT clause on it.
Comments and changes to this ticket
-
Wincent Colaiuta July 7th, 2010 @ 12:22 PM
More digging:
Example 1:
>> Article.scoped.limit(10).length Article Load (0.3ms) SELECT `articles`.* FROM `articles` LIMIT 10 => 10 >> Article.scoped.limit(10).size SQL (0.4ms) SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10 => 38 >> Article.scoped.limit(10).empty? SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10 => false
"length" delegates to "to_a" (in activerecord/lib/active_record/relation.rb), so it returns the expected result.
"size" calls "count" (in activerecord/lib/active_record/relation/calculations.rb), so it constructs a COUNT query which returns a surprising result.
"empty?" calls the same "count" method under the hood ("count.zero?"), so it behaves in the same way that "size" does.
Example 2:
>> Article.scoped.limit(10).offset(10).length Article Load (0.4ms) SELECT `articles`.* FROM `articles` LIMIT 10, 10 => 10 >> Article.scoped.limit(10).offset(10).size SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10, 10 => 0 >> Article.scoped.limit(10).offset(10).empty? SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` LIMIT 10, 10 => true
Here "length" again works because it delegates to "to_a".
"size" here also calls "count", and returns 0 because it's using a COUNT query with a LIMIT clause. This is a surprising result.
"empty?" calls "count" under the hood, so returns "true" here, which is a surprising result.
Example 3:
>> Article.scoped.size SQL (0.3ms) SELECT COUNT(*) AS count_id FROM `articles` => 38 >> Article.scoped.length Article Load (0.4ms) SELECT `articles`.* FROM `articles` => 38 >> Article.scoped.empty? SQL (0.4ms) SELECT COUNT(*) AS count_id FROM `articles` => false
Only here do "size", "length" and "empty?" agree, because there is no LIMIT component.
-
Wincent Colaiuta July 15th, 2010 @ 08:40 PM
Can somebody from the Rails team comment on this one? I'm happy to work on a patch to fix this, but I'd like to hear beforehand if I'm correcting in stating that these kinds of results are considered incorrect and therefore a bug:
Article.scoped.limit(10).offset(5).length # 50 (correct) Article.scoped.limit(10).offset(5).size # zero (incorrect) Article.scoped.limit(10).offset(5).empty? # true (incorrect)
-
Neeraj Singh July 15th, 2010 @ 09:59 PM
- State changed from new to open
- Importance changed from to Low
@Wincent Earlier you mentioned that you might be leaning towards making #size work same as #length; meaning load all the records. I would say that first attempt should be to have sql as efficient as possible and both #size and #empty? should make a COUNT sql.
-
Wincent Colaiuta July 15th, 2010 @ 10:04 PM
Yes, agreed that it should be efficient if possible. The trouble is that the query that is generated makes no sense:
SELECT COUNT(*) FROM foo WHERE thing = 'bar' LIMIT 10, 10
will always return zero rows, because whatever the
COUNT
is gets lopped off by theLIMIT
clause.Ideally what
.empty?
and.size
would do is strip off theLIMIT
clause and actually do this:SELECT COUNT(*) FROM foo WHERE thing = 'bar'
Which is both correct and efficient.
-
Neeraj Singh July 16th, 2010 @ 02:39 AM
- Tag set to activerecord
The problem with dropping limit is that I can have a condition like
Article.scoped.limit(10).offset(5).size
and the output could be a number > 10. And that does not make sense.
I would say for operations like
Article.scoped.limit(10).offset(5).size # zero (incorrect) Article.scoped.limit(10).offset(5).empty? # true (incorrect)
ActiveRecord should blow up saying that use .length instead.
-
Wincent Colaiuta July 16th, 2010 @ 06:41 AM
Earlier you said you wanted it to be efficient and use a
COUNT
query if possible, but if you tell users to uselength
(which under the hood doesto_a
first) you're forcing the immediate execution of a non-COUNT
query at that moment.Isn't it a reasonable expectation for a user to be able to do:
Model.some_active_record_relation_query.empty?
And get a boolean that says if any results are returned or not? And that this should be done via a
COUNT
rather than actually retrieving the records and counting them?So what I am saying is given the example you cite:
Article.scoped.limit(10).offset(5).size
ActiveRecord should drop the
LIMIT
clause from the actualCOUNT
query (seeing as that would cause us to always return 0), and then when returning the results, compare the returned number to the values passed in vialimit
and/oroffset
.ie. if the query that would be sent to the database is:
SELECT COUNT(*) FROM `articles` LIMIT 5, 10
the actual query should be:
SELECT COUNT(*) FROM `articles`
and if the result is say, 38, ActiveRecord knows that the limit requested by the user was 10, so it should return 10 (ie. the number of rows that would be returned by executing the actual retrieval query).
What do you think?
-
Neeraj Singh July 16th, 2010 @ 12:28 PM
This will fix #empty? but what about #size? How would you get accurate result for #size?
-
Wincent Colaiuta July 16th, 2010 @ 02:06 PM
Well, that depends on what you define as accurate. Does "accurate" mean the total number of records that match the
WHERE
condition, or the total number that would be returned if you did the query with aLIMIT
applied to theWHERE
? I imagine that either behavior can be implemented; it's just we need to decide which is the correct one to proceed with. The same technique (stripping theLIMIT
and then comparing it to the result before returning) would work, if that's what we decide to do.The way I see it is this:
If the user writes a broken SQL query like:
SELECT COUNT(*) FROM `articles` LIMIT 10, 10
Then the database will give them a broken answer (ie. always zero rows returned). It seems like the right thing to do: a broken query should give a broken answer.
So, if the user uses ActiveRecord::Relation to recreate that same broken query:
Article.limit(10).offset(10).count
Then perhaps they should get a broken answer back too, for consistency with the analogous case written in raw SQL. Although the
count
method is defined onArray
, in this case it happens to be an SQL keyword too, so it doesn't seem too unreasonable to give it this kind of SQL-centric interpretation when called on an ActiveRecord::Relation instance.On the other hand, we tell users that ActiveRecord::Relation objects are basically supposed to be Array-like. You get lazy-evaluation for free, and when you actually try to iterate over the objects with
each
orfor
or whatever the actual query gets evaluated.So, if the user has a controller like this:
@articles = Article.find :all, :limit => 10, :offset => 10
And a view like this:
<%- unless @articles.empty? %> <h1>Articles...</h1>
Then they would expect that changing their controller to:
@articles = Article.limit(10).offset(10)
Would continue to provide them with an Array-like object that they can use in their view in exactly the same way.
If they call
length
on it it behaves in an unsurprising way, becauseto_a
is getting called behind the scenes. No lazy evaluation, and no SQLCOUNT
query either.If they call
count
on it they might be surprised to get a broken answer back, but as I mentioned above, there is a case for this being the right thing to do, seeing as a similarCOUNT
query written in raw SQL would give them the same broken answer.If they call
size
on it they do get the surprising behavior, because it shares the behavior ofcount
. I'm inclined to think that it probably shouldn't. The fix is either to callto_a
just like we do withlength
, or do the special handling of theLIMIT
clause already described. Which one you choose probably depends on whether you want aCOUNT
query implemented behind the scenes or not.If they call
empty?
they also get the surprising answer. There is noEMPTY
keyword in SQL, but there evidently is anempty?
method on Array, so it is reasonable for the user to expect Array-like semantics. Again there are two possible fixes: either drop theLIMIT
clause, or do ato_a
first.If you think about the most common use case for
empty?
I'd say that preferring to do aCOUNT
query probably isn't right. ie. it is usually used like this:# in controller @models = ... # model query # in view <%- if @models.empty? %> <h1>Models!</h1> <%- @models.each |m| %> # do something with the models...
So in this case preferring a
COUNT
query gives a false sense of economy because you end up doing aCOUNT
and if it succeeds you repeat the query, this time as a normalSELECT
.If you forget about the
COUNT
query and just do ato_a
, you do only one query, and if no rows are returned (ie. whenempty?
istrue
) you haven't wasted time doing a heavyweight query because you got nothing back anyway.I guess it is important to get this right, as queries of this time (with
LIMIT
and offset components) are extremely common in web applications, due to the ubiquity of pagination. At least for me what we should do with theempty?
method is clear, andlength
already works; what I am not sure about is what we should do withsize
. -
Neeraj Singh July 16th, 2010 @ 05:05 PM
not sure if all the databases sqlite3, mysql, postgresql and oracle produce IMIT 10, 10 in the end.
Given the hard work that needs to go in and given that it will only solve .empty but not .size I would still say ( a bit reluctantly though) that it is better to blow up than silently provide wrong result. :-(
-
Wincent Colaiuta July 16th, 2010 @ 05:32 PM
Not sure what you mean about it not solving the
size
case. It would:- drop
LIMIT
clause - do query
- if result is > limit value, return limit value
- otherwise, return result obtained via query
ie.
- if 42 rows counted without limit clause, and user passed in limit of 10, the result would be 10
- if 4 rows counted without limit clause, and user passed in limit of 10, the result would be 4
But as I noted above, I have no idea whether this is more correct than the existing behavior of returning 0. In a way, I like the existing behavior in the sense that it matches up with what you would get if you wrote an SQL
COUNT
query with aLIMIT
and offset. (Definitely for thecount
method, although not so sure aboutsize
, although I seem to recall thatsize
has always mapped tocount
and thereforeCOUNT
in Rails, andlength
as always had "Array semantics".) I also don't really like the complexity here of both stripping something off before submitting the query, and also intercepting the result and potentially modifying it.In any case, don't think blowing up is a good idea at all. The database doesn't complain about invalid SQL if you do something ill-advised like
SELECT COUNT(*) FROM thing LIMIT 10, 10
, so Rails shouldn't really be throwing an exception either. It's not invalid, it's just not what the user might think it is.It would be better to just leave things as they are and have users scratching their heads about unexpected results when they do
empty?
. (Although I would rather fix it, to be honest.)As for the difficulty of the fix and the question about different database adapters, I think I can see that there is already code in
activerecord/lib/active_record/relation/calculations.rb
that drops clauses off of queries before firing them.If we just want to unbreak
empty?
the fix could be as minimal as removing:def empty? loaded? ? @records.empty? : count.zero? end
from
activerecord/lib/active_record/relation.rb
and addingempty?
to its list of delegated methods; ie:delegate :to_xml, :to_yaml, :length, :collect, :map, :each, :empty?, :all?, :include?, :to => :to_a
Personally that's what I'd do at a minimum. I would probably leave the rest as is, and address the other possible surprises in the documentation by making it clear that:
length
is equivalent toto_a.length
, so causes the query to be executed immediatelysize
maps tocount
count
will do an SQLCOUNT
query if the relation is not yet loadedempty?
is equivalent toto_a.empty?
, so again causes the query to be executed immediately- mixing
size
orcount
withlimit()
andoffset()
will always produce 0 counts for any non-zero offset, because theLIMIT
clause effectively removes the row containing theCOUNT
from the result
What do you think?
- drop
-
Neeraj Singh July 16th, 2010 @ 05:39 PM
The intent of the author here is to find how many records are there after the offset limiting upto 10. By dropping the LIMIT clause that information is lost.
Article.limit(10).offset(10).size?
Second look at the query above makes me think it is a meaningless request; valid but meaningless. What the user wants here is confusing because of the usage of both size and offset and limit?
-
Wincent Colaiuta July 16th, 2010 @ 05:51 PM
This scenario comes up all the time with pagination. Your example:
@articles = Article.limit(10).offset(10)
That's 10 articles per page, in groups of 10, starting at offset 10 (ie. second page). Isn't it perfectly reasonable to want to know how many articles are in
@articles
? Or to know if there are any results to be rendered (eg.not empty?
).So yes, the author does want to know how many records there are after offset 10, up to a limit of 10, but you're wrong when you say that dropping the
LIMIT
clause loses that information. If you don't drop theLIMIT
clause then Rails will issue a query likeSELECT COUNT(*) FROM articles LIMIT 10, 10
and the result will be 0, no matter how many rows in the table. This is because aCOUNT
query always returns 1 row, a row containing the count, and if you use aLIMIT
with an offset component, you throw the result away.The only way to get an accurate answer is to either:
- actually retrieve the records and then count them (ie.
to_a.count
,to_a.size
,to_a.length
) - or exclude the
LIMIT
clause from the query and then returnMAX(result, limit)
to the user
See what I'm saying?
You're right though that it looks confusing if a user writes a query like
Article.limit(10).offset(10).size
. But in practice users won't do that. This issue is going to crop up because a user will take a query from a Rails 2 app:# in controller @articles = Article.find :all, :limit => 10, :offset => 10 # in view if @articles.empty? # or similar...
And convert it to use the new API:
# in controller @articles = Article.limit(10).offset(10) # in view if @articles.empty?
And wonder why it's not displaying any records all of a sudden.
- actually retrieve the records and then count them (ie.
-
Chris Hapgood September 13th, 2010 @ 08:39 PM
For what it's worth, I've been following/fighting this issue for a while and I agree with Wincent: the OFFSET clause needs to be conditionally stripped from scoped queries.
This is meaningless in the DBs I'm familiar with:
SELECT COUNT(*) FROM widgets LIMIT 10 OFFSET 10
It always returns zero. The only time I see value in including the OFFSET clause in a COUNT is when results are aggregated and grouped -but that's already outside the AR sweet spot.
My experience matches Wincent's: pagination code applies LIMIT/OFFSET in a scope (very useful), but counting is then screwed up. So if my pagination page size is 10 and I need to count how many records are being shown (perhaps I do something radically different for count == 1 versus count > 1), things get ugly quickly. And stripping the LIMIT during DB counts seems like a painless solution.
Anybody know the status of this?
-Chris
-
Chris Hapgood September 13th, 2010 @ 08:40 PM
- Tag changed from activerecord to activerecord, scope
-
Santiago Pastorino February 2nd, 2011 @ 04:31 PM
This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails core team are limited, and so we are asking for your help. If you can still reproduce this error on the 3-0-stable branch or on master, please reply with all of the information you have about it and add "[state:open]" to your comment. This will reopen the ticket for review. Likewise, if you feel that this is a very important feature for Rails to include, please reply with your explanation so we can consider it.
Thank you for all your contributions, and we hope you will understand this step to focus our efforts where they are most helpful.
-
Santiago Pastorino February 2nd, 2011 @ 04:31 PM
- State changed from open to stale
-
Wincent Colaiuta February 2nd, 2011 @ 07:01 PM
- State changed from stale to open
Just checked and the problematic behavior is still there in Rails 3.0.4.rc1, so [state:open]
-
Santiago Pastorino February 2nd, 2011 @ 08:30 PM
- Milestone cleared.
-
atrofast February 13th, 2011 @ 02:55 AM
This issue caught my attention so I wrote a little patch for it. This is my first attempt at Rails hacking so hopefully it's up to standards. I've also included test cases to verify that it works. All activerecord tests passed for SQLite, Postgres and Mysql. Thanks for taking a look!
-
Santiago Pastorino February 27th, 2011 @ 03:15 AM
- Milestone changed from 3.0.5 to 3.0.6
-
John Mileham March 4th, 2011 @ 05:04 AM
Hello,
I attempted to submit a patch here but got shut down by Lighthouse's spam guards. I tried submitting as a pull request instead:
https://github.com/rails/rails/pull/201
Also pertains to #6268.
Thanks!
-john
-
Santiago Pastorino March 4th, 2011 @ 02:31 PM
- Assigned user set to Aaron Patterson
-
Aaron Patterson March 23rd, 2011 @ 10:06 PM
- Milestone changed from 3.0.6 to 3.x
I'm liking John's ideas, but his changes should not go in a bugfix release. So I'm moving to 3.x.
-
Aaron Patterson March 30th, 2011 @ 05:55 PM
- State changed from open to committed
I've merged John's stuff to master, so I'm closing this.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
<h2 style="font-size: 14px">Tickets have moved to Github</h2>
The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>
People watching this ticket
Attachments
Tags
Referenced by
- 6268 [Arel] Offset + Count != good Just wanted to note here that I submitted a patch for #50...
- 6268 [Arel] Offset + Count != good https://rails.lighthouseapp.com/projects/8994/tickets/50...