This project is archived and is in readonly mode.

#2137 ✓stale
Brian Durand

Allow find_in_batches to use :order, :limit, and :offset

Reported by Brian Durand | March 5th, 2009 @ 05:53 PM | in 2.3.10

Attached is a patch for ActiveRecord::Batches which removes the restriction on not being able to specify :order, :offset, and :limit on batch iterations. The code is base on the code from the pseudo_cursors plugin which provides the same functionality (http://agilewebdevelopment.com/p...>

The :start option is removed since :limit and :offset are more normal for use with ActiveRecord and because it seemed a little confusing since it didn't behave like offset.

Allowing :order clauses is especially important since it lets you control the order records are processed. For instance, a long batch job might want to process the most recently updated records first since they may be the most active ones and give a greater appearance of freshness.

Comments and changes to this ticket

  • Brian Durand

    Brian Durand March 5th, 2009 @ 05:57 PM

    • Title changed from “Allow find_batches to use :order, :limit, and :offset” to “Allow find_in_batches to use :order, :limit, and :offset”

    Also allows iterating over records with non-integer primary keys

  • Ryan Angilly

    Ryan Angilly March 5th, 2009 @ 09:18 PM

    Seems a lot like this one:

    http://rails.lighthouseapp.com/p...

    Even with the order, offset, and limit stuff set up you still have the problem of 'missing' records that are being inserted while find_in_batches is doing it's thing.

    Not to sound like a jerk, but even though this sounds nice to have, I can't think of a place where I would need the batch done in a certain order.

  • Xavier Noria

    Xavier Noria March 5th, 2009 @ 09:55 PM

    Ryan, you could need to do some report, most recent first, ordered by surname, whatever.

    In my view this method should allow the user to do that. It is the user responsability to know whether it is safe, or to decide whether a missing record is a big deal.

    In addition, AR is a standalone ORM, you could even be certain that no other program is touching the database.

  • Ryan Angilly

    Ryan Angilly March 5th, 2009 @ 10:12 PM

    I guess when I think of batch processing I think of sending notifications and carrying out async processes. Things that inherently do not require ordering. If you're doing a report that requires ordering, sort the report when you're done.

    I really have an issue with the second line. find returns all records matching the query. find_in_batches shouldn't have different behavior.

    I dont understand part three. You could be certain that no other program is touching the database, but with cronjobs and multi-mongrel setups it's pretty unlikely that would ever be the case.

  • Brian Durand

    Brian Durand March 5th, 2009 @ 11:29 PM

    Data cleanup or munging is an example where record order might make sense. You may need to process all records, but some records are more active than others. Some may be months old and no one cares when they are fixed. Allowing the order to be set means you can handle this case.

    By selecting all the ids first, you can at least be certain the records you are iterating over. The other method will result in an indeterminate number of records processed if inserts are happening. If the iteration loop updates other records of the same class, it could even change the results since updates to other records may affect whether or not they match the conditions.

  • Brian Durand

    Brian Durand March 5th, 2009 @ 11:37 PM

    To clarify just a little bit, I think the functionality here should be equivalent:

    @@@ruby Post.all(options).each do |post| # do something end

    
    
    and
    
    @@@ruby
    Post.each(options) do |post|
      # do something
    end
    

    By allowing find_in_batches to select new records after it starts, the two statements iterate over different sets of data. With the patch, they can take the same options and will always iterate over the same set.

  • Brian Durand

    Brian Durand March 5th, 2009 @ 11:39 PM

    Sorry, that should have been (my formatting skills are off)

    
    Post.all(options).each do |post|
      # do something end
    and
    
    
    Post.each(options) do |post|
      # do something
    end
    
  • Xavier Noria

    Xavier Noria March 6th, 2009 @ 08:00 AM

    Ryan, I don't think telling the user to order his report makes any sense. Just give him the API to let SQL do his job.

    cronjobs have ways to control they run only once at a time, and in the end is the responsability of the user to decide whether it is safe for him or not.

    It is certainly not the case that it is always nonsense or always dangerous. You just have no idea a priori, the user has.

    Well it is clear that I am +1 here.

  • Brian Durand

    Brian Durand March 6th, 2009 @ 03:28 PM

    One other thing I'd like to add is that this code makes the functionality more consistent with the behavior of database cursors where once a cursor is opened (at least in the implementations I'm familiar with), the data set is static and won't change as you fetch the rows. Cursors don't prohibit you from specifying an ORDER BY clause.

    If we can make the functionality closer to a concept people already understand, I thinking that's a good thing.

  • Ryan Angilly

    Ryan Angilly March 6th, 2009 @ 03:35 PM

    If I understand what you just said, then I didn't know that, and I may now agree with you. I'll change my -1 to a 0 while I mull over it :-)

  • Steve St. Martin

    Steve St. Martin April 15th, 2010 @ 09:00 PM

    • Assigned user set to “Ryan Bigg”

    This ticket essentially duplicates #2135 which was marked as wontfix. This one should be marked the same

  • Ryan Bigg

    Ryan Bigg April 15th, 2010 @ 10:46 PM

    • State changed from “new” to “duplicate”

    Duplicate of #2135

  • Repository

    Repository April 29th, 2010 @ 12:33 PM

    • State changed from “duplicate” to “resolved”

    (from [dcf0f97514851face2dd24b1f39eabee413fcfc2]) making rake:migrate VERSION=0 a noop called in succession. [#2137 state:resolved]

    Signed-off-by: José Valim jose.valim@gmail.com
    http://github.com/rails/rails/commit/dcf0f97514851face2dd24b1f39eab...

  • Repository
  • José Valim

    José Valim April 29th, 2010 @ 01:07 PM

    • State changed from “resolved” to “duplicate”

    Ops, I referenced the wrong ticket in the commit message. :)

  • Jeremy Kemper

    Jeremy Kemper April 30th, 2010 @ 02:33 AM

    • Milestone changed from 2.x to 2.3.6
    • State changed from “duplicate” to “open”
    • Assigned user changed from “Ryan Bigg” to “Jeremy Kemper”

    This isn't really a dupe since it doesn't layer order on top of the pk-order assumption. Agreed that matching normal AR options is desirable, though :start should be deprecated.

    Needs rebase to master + 2-3-stable backport.

  • Tanel Suurhans

    Tanel Suurhans May 5th, 2010 @ 10:44 PM

    Had to rework the functionality a bit, ported test provided in the previous patch and added some new ones. Also made a new example in the doc comments for usage of :order and :limit.
    This all was done against the current master (patch should apply).

  • Rizwan Reza

    Rizwan Reza May 16th, 2010 @ 02:41 AM

    • Tag changed from 2.3.0, active_record, batches to 2.3.0, active_record, batches, bugmash
  • Jeremy Kemper

    Jeremy Kemper May 23rd, 2010 @ 05:54 PM

    • Milestone changed from 2.3.6 to 2.3.7
  • Jeremy Kemper

    Jeremy Kemper May 24th, 2010 @ 09:40 AM

    • Milestone changed from 2.3.7 to 2.3.8
  • Jeremy Kemper

    Jeremy Kemper May 25th, 2010 @ 11:45 PM

    • Milestone changed from 2.3.8 to 2.3.9
  • Jeremy Kemper

    Jeremy Kemper August 30th, 2010 @ 02:28 AM

    • Milestone changed from 2.3.9 to 2.3.10
    • Importance changed from “” to “Medium”
  • rails

    rails February 26th, 2011 @ 12:00 AM

    • Tag changed from 2.3.0, active_record, batches, bugmash to 230, active_record, batches, bugmash

    This issue has been automatically marked as stale because it has not been commented on for at least three months.

    The resources of the Rails core team are limited, and so we are asking for your help. If you can still reproduce this error on the 3-0-stable branch or on master, please reply with all of the information you have about it and add "[state:open]" to your comment. This will reopen the ticket for review. Likewise, if you feel that this is a very important feature for Rails to include, please reply with your explanation so we can consider it.

    Thank you for all your contributions, and we hope you will understand this step to focus our efforts where they are most helpful.

  • rails

    rails February 26th, 2011 @ 12:00 AM

    • State changed from “open” to “stale”

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Referenced by

Pages