find_in_batches is slow on large datasets
Reported by Chris | October 8th, 2009 @ 12:07 AM
Using find_in_batches on a table of ~1 million rows generates queries that take more than 3s each, even for a very simple query, because the ORDER BY clause has to consider every row in the table past the start row.
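For readers unfamiliar with the query shape being described, here is a rough sketch of it. The helper name `keyset_batch_sql`, the `posts` table, and the `id` column are illustrative assumptions, not Rails internals; the helper only builds a SQL string so the pattern is visible.

```ruby
# Hypothetical helper: builds the kind of SQL a "start after the last id" batch
# loop issues. Table, column, and method names are illustrative assumptions.
def keyset_batch_sql(table, last_id, batch_size)
  "SELECT * FROM #{table} WHERE #{table}.id > #{Integer(last_id)} " \
    "ORDER BY #{table}.id ASC LIMIT #{Integer(batch_size)}"
end

# First batch, then the batch that starts after id 1000.
puts keyset_batch_sql("posts", 0, 1000)
puts keyset_batch_sql("posts", 1000, 1000)
```

Each query has only a lower bound on the id, so the ordering applies to every remaining row rather than to a bounded slice.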
http://github.com/chriseppstein/rails/commit/0a726f1b1447ddf105a3e5...
is a patch that changes the implementation of find_in_batches such that it selects a section of ids and only sorts that section.
With this patch, my queries take about 3ms on my local system, a 1000x speed improvement. More importantly, the query time stays constant with respect to the number of records instead of growing slower over time.
To do this, I made the assumption that the primary key is an integer -- I'm not sure if that's a valid assumption, but it's certainly the most common scenario.
This patch also adds support for :limit, because there's really no good reason to not support it in code.
Lastly, this patch breaks the behavior that only the last batch will have fewer records than the batch size. However, it does make sure that every batch has at least one record. This doesn't seem like a terribly important aspect of the API, but it's worth noting.
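The id-section idea and its effect on batch sizes can be sketched in plain Ruby. This is not the code from the linked commit (the URL above is truncated); it is a minimal illustration that assumes an integer primary key and uses an in-memory array in place of a table. `Row` and `each_id_window` are hypothetical names. Against a real database, the `select` below would correspond to a query bounded on both sides, such as `WHERE id >= low AND id < high ORDER BY id`.

```ruby
# Illustrative only: an in-memory stand-in for a table with an integer primary key.
Row = Struct.new(:id)

# Hypothetical sketch of id-window batching. Each window covers ids in
# [low, low + batch_size), so only that slice needs sorting. Empty windows
# are skipped, so every yielded batch has at least one row, but a batch may
# hold fewer than batch_size rows when the ids have gaps.
def each_id_window(rows, batch_size)
  return if rows.empty?
  min_id, max_id = rows.map(&:id).minmax
  low = min_id
  while low <= max_id
    high = low + batch_size
    batch = rows.select { |r| r.id >= low && r.id < high }.sort_by(&:id)
    yield batch unless batch.empty?
    low = high
  end
end

rows = [1, 2, 3, 7, 8, 25].map { |id| Row.new(id) }
each_id_window(rows, 5) { |batch| p batch.map(&:id) }
# prints [1, 2, 3], then [7, 8], then [25]
```

With a window of 5, the gaps in the ids produce batches of 3, 2, and 1 rows, which is the behavior change noted above: batches other than the last can be smaller than the batch size, but none is empty.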
Comments and changes to this ticket
-
Rohit Arondekar October 6th, 2010 @ 06:38 AM
- State changed from new to stale
- Importance changed from to
Marking ticket as stale. If this is still an issue, please leave a comment with suggested changes, create a patch with tests, rebase an existing patch, or just confirm the issue on the latest release or master/branches.
-
Yong Bakos February 8th, 2011 @ 07:26 PM
FWIW, I've confirmed the same slow behavior with Rails 2.3.10.
Tickets have moved to GitHub
The new ticket tracker is available at https://github.com/rails/rails/issues