This project is archived and is in readonly mode.

#78 ✓wontfix
vlad.zar (at gmail)

Fixtures.identify should generate unique IDs

Reported by vlad.zar (at gmail) | May 1st, 2008 @ 03:36 PM

"Foxy fixtures are great! Manually entered IDs no more required at all" (c) somebody.

Well, that's not actually true (yet).

I tried to generate a large number of records for upcoming load testing, using fixtures and a rake task similar to fixtures:load. Here is a simplified example of my records.yml file:

<% 1.upto(1_000) do |type_id| %>
<% 1.upto(1_100) do |record_id| %>

type_<%= type_id %>_record_<%= record_id %>:
  name: $LABEL

<% end %>
<% end %>

And I've got an error from MySQL which looks like:

Mysql::Error: #23000Duplicate entry '813474051' for key 1: INSERT INTO `records` (`id`, `name`) VALUES ('813474051', 'type_894_record_204')

Ok, it's clear what is going on here. See fixtures.rb source code.

class Fixtures

  # Returns a consistent identifier for +label+. This will always
  # be a positive integer, and will always be the same for a given
  # label, assuming the same OS, platform, and version of Ruby.
  def self.identify(label)
    label.to_s.hash.abs
  end

end

So, String#hash returns a Fixnum, and abs shrinks the range of possible values even further by folding negative values onto positive ones. That's OK, but now look at this:

'type_304_record_1070'.hash.abs # => 813474051
'type_894_record_204'.hash.abs # => 813474051

How bad is this behavior? For me, it is bad enough.

I think Fixtures.identify should return not only a "consistent identifier, the same for a given label" but, no less importantly, a unique identifier too (unique at least within one yml file?).

Don't think this error is too rare to bother other people. Look at these nice fixture keys for a users.yml file:

'olga_boyer'.hash.abs # => 603279717
'demond_jacobi'.hash.abs # => 603279717

'chadrick_conroy'.hash.abs # => 786185615
'ardella_vonrueden'.hash.abs # => 786185615

'margarett_jakubowski'.hash.abs # => 35381745
'alexanne_volkman'.hash.abs # => 35381745

'rafael_harris'.hash.abs # => 546356202
'jamar_schimmel'.hash.abs # => 546356202

'carissa_schoen'.hash.abs # => 892394396
'dasia_gusikowski'.hash.abs # => 892394396

'adriana_skiles'.hash.abs # => 31972130
'hester_murazik'.hash.abs # => 31972130

'tatyana_kessler'.hash.abs # => 711293022
'nigel_kuhlman'.hash.abs # => 711293022

'christy_brown'.hash.abs # => 182416424
'clark_weber'.hash.abs # => 182416424

'esmeralda_bauch'.hash.abs # => 206812768
'marcelina_moore'.hash.abs # => 206812768

These nice little keys won't work either!

Ok, ok, I'm cheating a bit. I generated these keys with the faker gem.

Nevertheless, I'm looking forward to a new implementation of Fixtures.identify.

Comments and changes to this ticket

  • Frederick Cheung

    Frederick Cheung May 1st, 2008 @ 06:53 PM

    I've always wondered whether this was going to be a problem for anyone. Typically the number of things in a fixtures file is quite small, so a collision does seem quite unlikely unless there are many fixtures.

    Fundamentally the problem (as I understand it) is that we need to be able to generate the ID for people(:bob) without having read people.yml. Perhaps Fixtures.identify could stash away all the IDs it has generated so that it knows not to reuse certain values?
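
The "stash away the IDs" idea can be sketched roughly as follows. This is only an illustration: IdRegistry and its Collision error are hypothetical names, not part of Rails, and the hash function is passed in as an assumption rather than taken from fixtures.rb. Note that such a registry can only detect a collision once both labels have been seen; it does not by itself solve the problem of computing an ID without reading the file.

```ruby
# Hypothetical helper (not part of Rails): remembers which label produced
# each ID and complains when a second, different label maps to the same ID.
class IdRegistry
  class Collision < StandardError; end

  # id_for: any callable that maps a label string to an integer.
  def initialize(id_for)
    @id_for = id_for
    @seen = {} # id => label that first claimed it
  end

  def identify(label)
    label = label.to_s
    id = @id_for.call(label)
    owner = (@seen[id] ||= label)
    unless owner == label
      raise Collision, "#{label.inspect} and #{owner.inspect} both map to #{id}"
    end
    id
  end
end

# Usage with the hash from the ticket (String#hash values differ between
# Ruby versions and, on modern Rubies, between processes).
registry = IdRegistry.new(->(label) { label.hash.abs })
registry.identify("olga_boyer")
registry.identify("olga_boyer") # asking for the same label again is fine
```

Raising instead of silently picking another value keeps the IDs deterministic, which is what lets people(:bob) be resolved without loading the fixture file first.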

  • Frederick Cheung

    Frederick Cheung May 1st, 2008 @ 06:55 PM

    Another thought: is there any evidence that String#hash is generating particularly bad values (i.e. poor uniformity, etc.)?
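
On the uniformity question, a rough birthday-problem estimate suggests that even a perfectly uniform hash would collide at the scale described in the ticket. The sketch below assumes IDs are drawn uniformly from about 2**30 values; that range is an assumption (the ticket does not state it, although every value quoted above is below 2**30), and the method names are made up for illustration.

```ruby
# Approximate probability that at least two of n labels share an ID,
# assuming IDs are uniform over `space` values (birthday approximation).
def collision_probability(n, space)
  1.0 - Math.exp(-n * (n - 1) / (2.0 * space))
end

# Expected number of colliding pairs under the same assumption.
def expected_colliding_pairs(n, space)
  n * (n - 1) / (2.0 * space)
end

SPACE = 2**30 # assumed ID range, not stated in the ticket

# 1_100_000 corresponds to the 1_000 x 1_100 records in the example file.
[2, 1_000, 40_000, 1_100_000].each do |n|
  printf("%9d labels: P(collision) ~ %.6f, expected pairs ~ %.1f\n",
         n, collision_probability(n, SPACE), expected_colliding_pairs(n, SPACE))
end
```

Under these assumptions the chance of a collision is tiny for a handful of fixtures and approaches certainty long before a million rows, regardless of how good the hash is.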

  • vlad.zar (at gmail)

    vlad.zar (at gmail) May 2nd, 2008 @ 08:06 AM

    String#hash doesn't need to generate a unique value; it generates a hash and does it pretty well :) However, if I see a method named *identify*, I expect a unique value to be generated.

    I've gotten used to this nice new feature in edge Rails. It lets me have easy-to-read relations in fixtures.

    Having big non-sequential IDs for records is also a small advantage in tests. It gives me more realistic data, which looks as if the application has been in use for some years :) I mean, as if some records had been deleted. Btw, I've found a bug in a project just because I had such big IDs.

  • josh

    josh July 17th, 2008 @ 01:02 AM

    • State changed from “new” to “wontfix”
    • Tag set to activerecord, bug, fixtures

    Fixtures were not meant for HUGE data stores. 1000 records is going to slow your tests down a ton, since they get reloaded for every test case. Probably not a good idea. I'd use a factory pattern to create this large data set for the tests that need it. I don't think fixture collision tests are necessary.

  • roobnoob

    roobnoob July 29th, 2008 @ 01:27 AM

    It's really a bad idea to use a hash as a synonym for uniqueness. A collision can happen at any time, whether you have 2 entries or millions. Of course, the probability of a collision with 2 entries is microscopic, but it's there; it's like a mine waiting to be stepped on.

    And fixtures aren't just used for testing. They can be used as seed data sources (as you deploy, you run a rake task that loads up the db with seed data from fixtures). Or as mentioned above, to have a bunch of data for stress testing.

    Once you get into this realm, you might have millions of rows, and now you're hitting real probabilities of a collision.

    There should at least be a big huge disclaimer "WARNING!.." in the foxy fixtures documentation.

    Now, as to how to fix the problem, that's a tough one. A brute-force solution that loads everything in and tracks the IDs works, but then you start running into potential memory issues if you keep these hashes in memory when dealing with huge data sets.

    I would probably leave it as "wontfix", since there are 2 cases: a) either somebody is generating the fixtures by hand and runs into the collision, or b) huge data sets are being generated via some program, as in the above.

    For a), just edit the file and change one of the colliding symbols. For b), don't use foxy fixtures: since the data is being generated by code, simply generate the ID as well.
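
For case b), generating the ID alongside each record might look like the sketch below. It reuses the loop from the records.yml example with much smaller counts, and the helper name render_fixtures is hypothetical. It renders the template with plain ERB and YAML outside Rails, so $LABEL stays a literal string here; Rails would substitute the label when loading the fixture.

```ruby
require "erb"
require "yaml"

# Hypothetical helper: renders a fixture file whose ids are computed from
# the loop counters, so they are unique by construction.
def render_fixtures(types, records_per_type)
  template = <<~FIXTURE
    <% 1.upto(types) do |type_id| %>
    <% 1.upto(records_per_type) do |record_id| %>
    type_<%= type_id %>_record_<%= record_id %>:
      id: <%= (type_id - 1) * records_per_type + record_id %>
      name: $LABEL
    <% end %>
    <% end %>
  FIXTURE
  # The template sees `types` and `records_per_type` through this binding.
  ERB.new(template).result(binding)
end

rows = YAML.safe_load(render_fixtures(3, 4))
ids = rows.values.map { |row| row["id"] }
raise "duplicate ids" unless ids.uniq.size == ids.size
```

The formula (type_id - 1) * records_per_type + record_id numbers the records sequentially, which gives up the "big non-sequential IDs" mentioned earlier in exchange for guaranteed uniqueness.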


Tickets have moved to Github

The new ticket tracker is available at https://github.com/rails/rails/issues
