This project is archived and is in readonly mode.
Add magic encoding comment to generated files
Reported by Norman Clarke | October 26th, 2008 @ 09:49 PM | in 2.x
This patch adds support for :order => :random in ActiveRecord queries.
MySQL has a different syntax for ordering by random than SQLite and Postgres (RAND() as opposed to RANDOM()). If you develop and test on SQLite and deploy to MySQL, you usually need to include some kind of workaround if you want to do a query that selects random records.
Various Ruby solutions to this problem exist but have drawbacks:
-
Using a randomly generated offset only works well when you are pulling one random record and can require a query to count rows first. Ordering by random in the database is just as easy with one record or many, and can always be done in one query.
-
Selecting all the records into an array and shuffling it can be expensive if you want to randomly select against a large table, even if you just pull the ids. Using the database to do this is much faster and generates less garbage.
Conceptually this tiny patch makes :order work a little more like :conditions hash arguments, in that it abstracts away the underlying SQL. It makes ActiveRecord::Base delegate the formatting of the "ORDER BY" arguments to the various adapters. At the moment only the argument ":random" is handled specially; all other arguments are simply passed back unmodified.
Comments and changes to this ticket
-
Lee Hambley January 17th, 2009 @ 10:20 PM
- Tag changed from :order, activerecord, adapters, random to :order, activerecord, adapters, patch, random
+1
-
Norman Clarke February 27th, 2009 @ 05:12 PM
Attaching new patch against latest master, with documentation.
-
Pratik March 9th, 2009 @ 05:59 PM
- Assigned user set to Pratik
- State changed from new to wontfix
- Title changed from [PATCH] add support for :order => :random in queries to Add support for :order => :random in queries
ORDER BY RAND() usually have a performance implication. Finding random records doesn't really have a single solution for all the possible scenarios. Also, for the patch to be accepted, it'll need to mess with query caching, which starts to get dirty. It's best to leave this job to a plugin.
Thanks.
-
Norman Clarke March 9th, 2009 @ 06:24 PM
"Performance" is probably the single worst reason to reject this patch; if you have a table with a million users, what's the "higher performance" way of pulling out 10 random records? Have you ever done this? Database engines are optimized for this sort of thing. I have a hard time imagining how you'd do this faster and simpler in straight Ruby.
This isn't just some border case either, lots of websites do this. The fact that there is no single solution to finding random records should not impede AR offering a solution to a very common usage scenario. Rails never purports to elegantly solve every problem, only the most common ones.
As regards caching, I don't see the problem with a query that includes ORDER BY RAND() returning the same dataset if it is performed two times during the same request. This is what would happen if you just enter the query in manually right now without my patch. What's the problem?
In active record there's an existing pattern of deferring adapter-specific syntax to the adapters. This patch simply extends this to cover the "ORDER BY" clause as well. In that sense it actually removes something exceptional from the existing code.
Maybe I'm misunderstading something but I find your explantion puzzing.
-
Pratik March 9th, 2009 @ 07:00 PM
Here are some benchmarks with 100k rows ( forget millions ).
[lifo@null]$ script/performance/benchmarker 1 'Item.all(:limit => 5, :order => "rand()")' user system total real #1 0.000000 0.000000 0.000000 ( 4.020854)
Here's a relevant thread - http://groups.google.com/group/r...
-
Norman Clarke March 9th, 2009 @ 07:14 PM
Limiting that to selecting only an indexed column would cut down your query time significantly. On my system, I get these times querying against a table with lots of columns and about 100k rows:
SELECT * FROM hits ORDER BY RAND() LIMIT 10;
-> 2.19 seconds
SELECT id FROM hits ORDER BY RAND() LIMIT 10;
-> 0.27 seconds
Let's say you have a table with not so many rows, then you can simply use :order => :random without thinking about it.
If you have a table with lots of rows, then OK, you will at some point have to optimize your queries. This is the same as many of other things you do with SQL, not just ORDER BY RANDOM.
-
Michael Koziarski March 9th, 2009 @ 08:36 PM
I agree with pratik here, the RAND() in the order forces a table scan for EVERY query, doing this by default is far too dangerous.
If you want to efficiently select random records something like this will work:
module ActiveRecord class Base def self.random if (c = count) != 0 find(:first, :offset =>rand(c)) end end end end
Even then the performance with large offsets sucks.
-
Norman Clarke March 9th, 2009 @ 08:45 PM
Koz, that's not true when you select only indexed columns:
mysql> explain select id from hits order by rand()\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: hits type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 122869 Extra: Using index; Using temporary; Using filesort 1 row in set (0.00 sec)
If you you are querying against a small dataset you could simply use the :order => :random as is. This would likely cover a very significant percentage of the actual use cases, should you put this patch in Rails.
Otherwise for large datasets an easy optimization would be to build your result set by :select-ing only the primary key, then loop through the result set and use reload to get the model's other fields. Something like:
@users = User.all(:select => :id, :order => :random, :limit => 5) @users.each { |u| u.reload}
Pulling out only the fields you need using :select is usually one of the first database optimizations people tend to make anyway.
-
Michael Koziarski March 9th, 2009 @ 09:17 PM
With a sufficiently large table even an index scan will be painful. And to add a feature that's only safe to use if you have a :select with a covering index or a tiny table just doesn't seem worth the risk.
However having said all this, it's just about adding syntactic sugar to turn
:order=>"rand()"
Into
:order=>:random
The risks here outweigh the benefits for now.
-
Norman Clarke March 9th, 2009 @ 09:22 PM
I'm really surprised you would reject something because "with large datasets and no optimization it could create slow queries." I mean, isn't that true for a lot of things? And isn't this going to be used by people who are just going to use RAND() or RANDOM() anyway? What exactly are you protecting people from?
And it's NOT just syntactic sugar. The whole point of the patch is to make it possible to use the same syntax across database adapters. MySQL uses a different syntax than SQLite and PostgreSQL, which means if you develop and test on SQLite and deploy on MySQL then you experience annoyance and pain.
Anyway it's your guys' baby so no problem - I guess I just don't see things the same way.
-
Max Lapshin April 20th, 2009 @ 05:22 PM
Norman. I advise you to add function RAND into postgresql and RAND into sqlite. And I'm also against including this fix into Rails. Use Postgres to make workaround.
-
Norman Clarke April 20th, 2009 @ 06:19 PM
Max, I think most people working with Rails would rather install a plugin than mess around with PG's stored procedure languages or program SQLite extensions in C.
-
grosser May 18th, 2010 @ 06:19 AM
another alternative:
random_records http://github.com/grosser/random_records makes multiple random records available, by fetching them in user-defined random chunks -
Norman Clarke May 18th, 2010 @ 02:59 PM
@grosser
Well, performance seems to be the thing that keeps coming up with this issue. None of the Ruby solutions I've ever seen for this problem perform as well as an optimized query. Which, if you stop and think about it for a second, is not in the least bit surprising. I guess I have to shout it from a rooftop to make people believe it, but, MySQL's RAND is fast when selecting a single indexed column. Basically, something like:
ids = User.find(:select => :id, :order => "RANDOM()" ...).to_s(:db) users = User.find(ids)
Will perform well with both large and small datasets; no need for any plugin. The example given higher up in this thread ran in .27 seconds against 120k rows on my ancient Macbook.
If this patch had been accepted, you'd have an easy way to make it work across any database adapter too, so if you test on SQLite and deploy on MySQL, it would work.
As it stands, if you want to use it everywhere you have to look at
ActiveRecord::Base.connection.adapter_name
in order to use
RAND
rather thanRANDOM
for MySQL.I'm still surprised that this patch was never added because "with a sufficiently large table even an index scan will be painful" - that can be said of any query - or that it was called "syntactic sugar", when in fact its point was to add to the adapter a needed abstraction for a syntax which is in wide use.
-
mikeymicrophone June 19th, 2010 @ 04:23 AM
Norman, I wish I could use this! For us mere mortals who want to use databases, and not marry them, this is a major issue.
Koz, in your first comment you indicated that this patch would slow down queries that were not querying for random content (as well as those that were). Am I interpreting you correctly?
I want to submit for consideration the possibility that randomization is a very important issue for many reasons - partly the fun and usability, and even the efficiency and arresting quality, of apps, and partly the perennial dumbing-down of our culture that has resulted from just reading the first page of google results, and just reading the front page of nytimes instead of browsing through the entire issue. A persistent ordering not only obscures many good answers and feeds back into itself to bury them even deeper; it also gives lay-people a false sense of authority and helps them forget to ask questions.
A big part of why Rails has always been a high-productivity environment is the instant gratification. As in, things change. I believe it is easier to stay motivated when you make small accommodations in your environment that add to the veracity of your sample data. You just don't want to see the same five things pop up every day - it feels like no progress is being made.
That said, here is a 10-line solution for others to use.
module Random def db_random case ActiveRecord::Base.connection.adapter_name when 'MySQL' 'rand()' when 'SQLite' 'random()' end end end
Then you can do this.
extend Random named_scope :randomized, :order => db_random
-
Norman Clarke June 19th, 2010 @ 04:46 AM
To be fair to Koz and Pratik, I've looked into this issue some more and despite my earlier insistence, I now think they're more right than I was giving them credit for. With tables of 100,000 records or so, RAND/RANDOM is "probably fast enough" but it does break down quite a bit as tables grow.
I still do think that this would be a useful patch, but I just want to back off on my earlier claims that this was "fast", because I was mistaken.
-
yooo April 15th, 2011 @ 09:33 AM
For those of you who are purchasing Titanium Jewelry or Titanium Necklaces, we realize that this is one of the most important purchasing decisions you will make.
-
dswllxy April 26th, 2011 @ 09:44 AM
To be fair to Koz and Pratik, beats by dre I've looked into this issue some more and despite my earlier insistence, monster beatsI now think they're more right than I was giving them credit for. beats headphoneWith tables of 100,000 records or so, beats studioRAND/RANDOM is "probably fast enough" but it does break down quite a bit as tables grow.
I still do think that this would be a useful patch, but I just want to back off on my earlier claims that this was "fast", powerbeatsbecause I was mistaken.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
<h2 style="font-size: 14px">Tickets have moved to Github</h2>
The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>