This project is archived and is in readonly mode.
Enumerable#group_by is O(n^2)
Reported by Pete Yandell | November 11th, 2008 @ 03:39 AM
Enumerable#group_by is O(n^2) in Rails 2.1 and up, making it extremely slow for big data sets.
Some quick tests show that group_by in Rails 2.1 takes 90 times as long as group_by in Rails 2.0 on an array of 10000 items.
This change:
http://github.com/rails/rails/co...
makes group_by use OrderedHash rather than Hash, which introduces the problem.
It's really the naive implementation of OrderedHash, on which most operations are O(n), that causes the problem.
Comments and changes to this ticket
-
Frederick Cheung December 8th, 2008 @ 11:26 AM
As a very quick test I did this (in a rails 2.2 app)
module Enumerable def old_group_by groups = [] inject({}) do |grouped, element| index = yield(element) if group = grouped[index] group << element else group = [element] groups << [index, group] grouped[index] = group end grouped end groups end end require 'benchmark' include Benchmark array = (1..1000000).to_a; false bmbm(5) do |x| x.report("current implementation") { array.group_by {|i| i % 13}} x.report("old implementation") { array.old_group_by {|i| i % 13}} end
I get something like
current implementation 3.340000 0.030000 3.370000 ( 3.419125) old implementation 1.960000 0.020000 1.980000 ( 1.997561)
Varying the size of array the ratio of the two seems pretty constant, so to me at least the newer implementation is slower, but only by a constant factor. What testcase did you have that showed much worse results?
-
Pete Yandell December 9th, 2008 @ 06:00 AM
Hi Frederick. Thanks for picking this up!
Try it with more groups, by bumping up the i % 13 in your test code to a bigger value.
For i % 130:
current implementation 14.610000 0.100000 14.710000 ( 15.687624) old implementation 4.990000 0.040000 5.030000 ( 5.300712)
For i % 1300:
current implementation 128.240000 0.460000 128.700000 (130.583804) old implementation 5.410000 0.030000 5.440000 ( 5.639136)
You can see that it degrades very quickly as the number of groups goes up. This is because OrderedHash (as used by the current implementation) is O(n) for finding elements or inserting new elements.
-
Frederick Cheung December 9th, 2008 @ 08:36 AM
Ah, O(n^2) in the number of groups - I'd interpreted the question as being O(n^2) in the size of the array
-
Jeremy Kemper December 9th, 2008 @ 06:47 PM
- Assigned user set to Jeremy Kemper
- State changed from new to open
- Milestone cleared.
Let's swap out OrderedHash implementation, then.
-
Repository December 10th, 2008 @ 05:08 PM
- State changed from open to committed
(from [355f41d8aafd75d76db25cdda4736e0052b0605c]) Rework ActiveSupport::OrderedHash to make lookups faster
[#1352 state:committed]
Signed-off-by: Jeremy Kemper jeremy@bitsweat.net http://github.com/rails/rails/co...
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
<h2 style="font-size: 14px">Tickets have moved to Github</h2>
The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>
People watching this ticket
Attachments
Tags
Referenced by
- 1352 Enumerable#group_by is O(n^2) [#1352 state:committed]