This project is archived and is in readonly mode.

#6129 open
Brian Durand

Reduce memory bloat in ActiveRecord transactions

Reported by Brian Durand | December 7th, 2010 @ 11:52 PM

This patch reduces memory bloat in ActiveRecord transactions. With the introduction after_commit and after_rollback callbacks references are kept to all records updated in a transaction until the transaction completes. This can lead to memory bloat with large transactions.

The code change replaces the references to weak references unless the objects implement these callbacks. This allows the garbage collector to reclaim objects that won't be executing a callback.

Comments and changes to this ticket

  • Piotr Sarnacki

    Piotr Sarnacki December 11th, 2010 @ 10:47 AM

    • Importance changed from “” to “Medium”

    I can't remember correctly what was wrong with WeakRef, but during development of IdentitiyMap, something new was implemented (ActiveSupport::WeakHash) and Weakling was used for JRuby: https://github.com/miloops/rails/commit/ada014929b01af5ce8ca5e6fdd1...

    Also you might want to talk to Emilio Tagua, he probably has some more details on that.

  • Piotr Sarnacki

    Piotr Sarnacki December 11th, 2010 @ 10:53 AM

    Just to be clear about IM stuff - it hasn't been merged into master yet, here is the pull request for that: https://github.com/rails/rails/pull/76

  • Jacek Becela

    Jacek Becela December 11th, 2010 @ 11:30 AM

    I like the WeakRef solution very much - works for me(tm) but I don't use anything other than MRI. +1.

  • Brian Durand

    Brian Durand December 13th, 2010 @ 04:48 PM

    It looks to me like the identity map stuff is implementing a weak HashMap and in the jruby case utilizing the existing Java weak reference libraries. I don't see anything in the ActiveSupport::WeakHashMap or the Weakling code that would indicate there was a problem with how WeakRef itself is implemented. They both seem to be doing the same thing internally the WeakRef is doing.

    This patch doesn't need a weak hash map or a reference queue so I think it should be fine with WeakRef. The weak reference exist only to support the callbacks that reset the state of new or deleted objects if the transaction fails. If the objects have been garbage collected because there are no hard references to them, then there is no need to invoke these callbacks.

  • Aaron Patterson

    Aaron Patterson December 13th, 2010 @ 06:43 PM

    The problem is that WeakRef uses _id2ref to look up an object. _id2ref takes an object id, and returns the object corresponding with that id. The problem is that object ids are not unique in MRI, the get reused. That means that you could get back an object that isn't the object you were actually looking for.

    We can easily demonstrate id reuse like so:

    class Foo; end
    class Bar; end
    
    id_to_class = {}
    
    loop do
      obj = [Foo, Bar].sort_by { rand }.first.new
    
      if id_to_class.key? obj.object_id
        puts "obj id: #{obj.object_id} was reused"
    
        if id_to_class[obj.object_id] != obj.class
          puts "omg! they aren't even the same class!"
        end
      end
    
      id_to_class[obj.object_id] = obj.class
    end
    

    Now imagine that two AR objects had object_ids that collided.

    I hesitate to apply this patch because of this problem. Maybe we can find a different way?

  • Brian Durand

    Brian Durand December 16th, 2010 @ 03:16 AM

    I've thoroughly looked into the WeakRef implementation and how it works across the various Ruby runtimes and this is what I've found: it sucks.

    1. Object ids are not reused in Jruby, Rubinius, or IronRuby so WeakRefs do work properly (makes sense since these all run on a VM).
    2. Object ids are reused in MRI 1.8.7 and REE 1.8.7 do reuse object ids, but the WeakRef implementation handles that case properly because of the single system thread the process is running on.
    3. Object ids are reused in MRI 1.9.2 but the WeakRef implementation does not handle this properly since the garbage collector can run concurrently with threads allocating new objects. This messes up cleanup logic since finalizers attached to an object id can be run after that object id has been assigned to a new object.
    4. The WeakRef implementation does not scale on most runtimes. Performance on MRI 1.8.7, REE 1.8.7, and IronRuby is especially bad (performance on Rubinius is actually good, though). For some reason it extends Delegator which redefines all of the wrapped object's methods on initialization. This is both expensive on the CPU and can take up a lot of memory. On MRI 1.8.7 creating an array of 50,000 WeakRefs that wrap Object.new will balloon the heap to well over 1GB and creating each WeakRef takes 15-100 times longer to do. I have no idea why you would want to use a Delegator on a WeakRef unless you want your code to randomly break when the garbage collector kicks in, but it just doesn't seem worth it.
    5. The weakling gem provides a better WeakRef implementation for Jruby that uses the native Java weak references.

    Since WeakRef is either broken or performs horribly on 5 of the 6 Ruby runtimes I tested, and it is a really useful feature to have, I think it makes sense to add it to ActiveSupport so we can have a consistent working interface for weak references. Unfortunately, easier said than done. I started out simply reimplementing WeakRef to make it not extend Delegator and fix the bugs with 1.9. However, Jruby has it's own better implementation and since Rubinius has a very efficient implementation of WeakRef, but doesn't perform at all well with finalizers. Thus, I ended up with three different implementations of WeakReference and the runtime picks the best choice:

    1. Jruby uses one backed by Weakling if it is available otherwise it uses one backed by WeakRef
    2. Rubinius uses the implementation backed by WeakRef
    3. All other runtimes use my reimplementation of WeakRef

    The interface is very simple:

    obj = Object.new
    ref = ActiveSupport::WeakReference.new(obj)
    ref.object                # obj or nil if the reference has been reclaimed
    ref.referenced_object_id  # obj.object_id
    

    Further, I used this implementation to create a WeakHash class where the values are weak references. This could be used to replace the WeakHash shown in the commit linked to above for the identity map stuff.

    Attached is a new patch with the original ActiveRecord changes along with the WeakReference implementations. Also attached is my test script in case anyone wants to verify the bizarre weak reference hell I've stumbled into. To run the test script, simply run it with your favorite flavor of ruby and pass one of object, weakref, or weakreference as the argument:

    ruby weakref_test.rb object         # baseline object creation without weak references
    ruby weakref_test.rb weakref        # uses WeakRef
    ruby weakref_test.rb weakreference  # uses new ActiveSupport::WeakReference
    

    Each test will report on how long it takes to create 1000 instances of the specified class and then will run 100,000 iterations trying to find reused object id's. Be careful if you run the weakref tests under MRI, REE, or IronRuby because your process heap will grow to 2+GB and your computer may become severely unresponsive.

  • Aaron Patterson

    Aaron Patterson December 16th, 2010 @ 03:43 AM

    @Brian Excellent work! The implementation looks great. Is there a reason this should be in ActiveSupport? IMO, a weakref / weakhash library would be useful outside rails.

    Also, if you create a gem outside rails, you can create JRuby specific versions that depend on weakling, which guarantees that weakling is available.

    Finally, shouldn't we be filing bug reports / patches against MRI?

    What do you think?

  • Brian Durand

    Brian Durand December 16th, 2010 @ 03:47 PM

    @Aaron Totally agree that WeakRef needs to be fixed in Ruby 1.9 branch and I will submit a patch for MRI. As for usefulness outside of Rails I also totally concur.

    My original thought on adding it to ActiveSupport is that it fits in as some generally useful functionality that is either missing from core Ruby or not universally available like Base64, Gzip, OrderedHash, TimeWithZone, etc. I do, however, agree that the best method for sharing code like this in this day in age that has nothing to do with Rails (other than I need it to fix the ActiveRecord transaction issue) is as a separate gem. I can work on turning the patch code gem into a gem.

    Before I do that, though, what are the thoughts on adding another gem dependency to Rails for this purpose? Is that the preferred direction for adding new generic functionality vs. updating ActiveSupport with new features?

  • Aaron Patterson

    Aaron Patterson December 16th, 2010 @ 07:34 PM

    @Brian Excellent. Ping me when you submit the patch to MRI and I will help convince the other ruby-core members.

    I prefer that this library would be in a gem, and I think I can convince the rails-core people that it is necessary. Releasing the weakref code as a gem would be beneficial because we can release the gem out of band with rails. Also, we can change dependencies depending on the target platform (like make this gem depend on weakling if the gem is installing on jruby).

    Have you thought about folding this code in to weakling?

    If you need any help, let me know. I think weakrefs are the correct way to fix this issue, so I'm happy to help.

  • Brian Durand
  • Brian Durand

    Brian Durand January 11th, 2011 @ 04:41 PM

    I have released a new "ref" gem (http://rubygems.org/gems/ref) that provides working weak references for all the major ruby implementations. I have tested it on MRI, REE, YARV, Rubinius, Jruby, and IronRuby.

    I improved and expanded on my previous code to provide support for soft references, reference queues, and four different kinds of reference maps. Since someone previously mentioned that work going on to implement an identity map looked at weak references, they might find this code useful.

    Attached is an updated ActiveRecord patch that uses the new gem.

  • rails

    rails April 12th, 2011 @ 01:00 AM

    • State changed from “new” to “open”

    This issue has been automatically marked as stale because it has not been commented on for at least three months.

    The resources of the Rails core team are limited, and so we are asking for your help. If you can still reproduce this error on the 3-0-stable branch or on master, please reply with all of the information you have about it and add "[state:open]" to your comment. This will reopen the ticket for review. Likewise, if you feel that this is a very important feature for Rails to include, please reply with your explanation so we can consider it.

    Thank you for all your contributions, and we hope you will understand this step to focus our efforts where they are most helpful.

  • rails

    rails April 12th, 2011 @ 01:00 AM

    • State changed from “open” to “stale”
  • Brian Durand

    Brian Durand April 13th, 2011 @ 05:32 PM

    • State changed from “stale” to “open”

    Still an issue. This can be a serious issue where many records are updated or deleted within a transaction since the ActiveRecord objects tend to have large memory footprints and can quickly bloat the memory beyond what is available.

    [state:open]

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Pages