This project is archived and is in readonly mode.

#3346 ✓stale

find_in_batches is slow on large datasets

Reported by Chris | October 8th, 2009 @ 12:07 AM

Using find_in_batches on a table of roughly 1 million rows generates queries that take more than 3 seconds each, even for a very simple query, because the ORDER BY clause has to consider every row in the table with an id greater than the start row.
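For context, the per-batch query the current implementation issues has roughly the shape shown below (the SQL is an approximation of what the 2.3-era batches.rb produces, and Payment is just an illustrative model):

    # Stock behavior: each batch is fetched with a query shaped roughly like
    #
    #   SELECT * FROM payments WHERE (payments.id > 123456)
    #   ORDER BY payments.id ASC LIMIT 1000
    #
    # The WHERE/ORDER BY pair has to consider every id above the start row,
    # which is what gets slow on a table this size.
    Payment.find_in_batches(:batch_size => 1000) do |batch|
      # process batch ...
    end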

http://github.com/chriseppstein/rails/commit/0a726f1b1447ddf105a3e5...

is a patch that changes the implementation of find_in_batches such that it selects a section of ids and only sorts that section.
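I won't paste the whole diff here, but the idea is roughly the following sketch (illustrative only, using 2.3-era finder syntax; the names, option handling, and exact windowing logic differ from the actual patch, and it assumes an integer primary key):

    # Illustrative sketch of the id-window idea -- not the actual patch.
    # Intended as a class method on ActiveRecord::Base.
    def self.find_in_batches_by_id_window(options = {})
      batch_size = options[:batch_size] || 1000
      start_id   = (options[:start] || 0).to_i
      max_id     = maximum(primary_key).to_i
      pk         = "#{quoted_table_name}.#{connection.quote_column_name(primary_key)}"

      while start_id < max_id
        # Only ids inside the current window are scanned and sorted, so the
        # ORDER BY never has to consider the whole tail of the table.
        records = find(:all,
                       :conditions => ["#{pk} > ? AND #{pk} <= ?", start_id, start_id + batch_size],
                       :order      => "#{pk} ASC")
        # Skip empty windows so that every yielded batch has at least one record.
        yield records unless records.empty?
        start_id += batch_size
      end
    end

The key property is that each query only ever looks at a window of batch_size ids, which is where the constant-time behavior described below comes from.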

With this patch, my queries take about 3ms on my local system - roughly a 1000x speedup - and, more importantly, they run in constant time with respect to the number of records instead of growing slower over time.

To do this, I made the assumption that the primary key is an integer -- I'm not sure if that's a valid assumption, but it's certainly the most common scenario.

This patch also adds support for :limit, because there's really no good reason not to support it.
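For reference, a call against the patched API would look something like this (hypothetical usage - the option name comes from the description above, and Payment is again just an example model):

    # Hypothetical usage of the patched API: stop after 10,000 records total,
    # fetching 500 at a time.
    Payment.find_in_batches(:batch_size => 500, :limit => 10_000) do |batch|
      # process batch ...
    end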

Lastly, this patch breaks the guarantee that only the last batch can contain fewer records than the batch size (for example, a batch whose id window spans a run of deleted rows can come up short even though it isn't the last one). It does, however, ensure that every batch contains at least one record. This doesn't seem like a terribly important aspect of the API, but it's worth noting.

Comments and changes to this ticket


Tickets have moved to Github

The new ticket tracker is available at https://github.com/rails/rails/issues
