This project is archived and is in readonly mode.

#1352 ✓committed
Pete Yandell

Enumerable#group_by is O(n^2)

Reported by Pete Yandell | November 11th, 2008 @ 03:39 AM

Enumerable#group_by is O(n^2) in Rails 2.1 and up, making it extremely slow for big data sets.

Some quick tests show that group_by in Rails 2.1 takes 90 times as long as group_by in Rails 2.0 on an array of 10000 items.

This change:

http://github.com/rails/rails/co...

makes group_by use OrderedHash rather than Hash, which introduces the problem.

It's really the naive implementation of OrderedHash, on which most operations are O(n), that causes the problem.

Comments and changes to this ticket

  • Frederick Cheung

    Frederick Cheung December 8th, 2008 @ 11:26 AM

    As a very quick test I did this (in a rails 2.2 app)

    
    module Enumerable
      def old_group_by
        groups = []
        inject({}) do |grouped, element|
          index = yield(element)
          if group = grouped[index]
            group << element
          else
            group = [element]
            groups << [index, group]
            grouped[index] = group
          end
          grouped
        end
        groups
      end
    end
    
    
    require 'benchmark'
    include Benchmark
    array = (1..1000000).to_a; false
    
    bmbm(5) do |x|
      x.report("current implementation") { array.group_by {|i| i % 13}}
      x.report("old implementation") { array.old_group_by {|i| i % 13}}
    end
    

    I get something like

    current implementation 3.340000 0.030000 3.370000 ( 3.419125) old implementation 1.960000 0.020000 1.980000 ( 1.997561)

    Varying the size of array the ratio of the two seems pretty constant, so to me at least the newer implementation is slower, but only by a constant factor. What testcase did you have that showed much worse results?

  • Pete Yandell

    Pete Yandell December 9th, 2008 @ 06:00 AM

    Hi Frederick. Thanks for picking this up!

    Try it with more groups, by bumping up the i % 13 in your test code to a bigger value.

    For i % 130:

    current implementation 14.610000 0.100000 14.710000 ( 15.687624) old implementation 4.990000 0.040000 5.030000 ( 5.300712)

    For i % 1300:

    current implementation 128.240000 0.460000 128.700000 (130.583804) old implementation 5.410000 0.030000 5.440000 ( 5.639136)

    You can see that it degrades very quickly as the number of groups goes up. This is because OrderedHash (as used by the current implementation) is O(n) for finding elements or inserting new elements.

  • Frederick Cheung

    Frederick Cheung December 9th, 2008 @ 08:36 AM

    Ah, O(n^2) in the number of groups - I'd interpreted the question as being O(n^2) in the size of the array

  • Jeremy Kemper

    Jeremy Kemper December 9th, 2008 @ 06:47 PM

    • Assigned user set to “Jeremy Kemper”
    • State changed from “new” to “open”
    • Milestone cleared.

    Let's swap out OrderedHash implementation, then.

  • Frederick Cheung
  • Repository

    Repository December 10th, 2008 @ 05:08 PM

    • State changed from “open” to “committed”

    (from [355f41d8aafd75d76db25cdda4736e0052b0605c]) Rework ActiveSupport::OrderedHash to make lookups faster

    [#1352 state:committed]

    Signed-off-by: Jeremy Kemper jeremy@bitsweat.net http://github.com/rails/rails/co...

  • klkk

    klkk May 23rd, 2011 @ 03:09 AM

    • Importance changed from “” to “”

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

People watching this ticket

Attachments

Referenced by

Pages