This project is archived and is in readonly mode.
nested includes in AR causes duplicates in to_json / to_xml
Reported by David | June 25th, 2010 @ 05:15 PM | in 3.x
In Rails edge using #to_xml or #to_json on an ActiveRecord::Relation with includes causes duplicate entries.
The model is: Projects has_many Tasks habtm Users
Example:
ActiveRecord::Schema.define(:version => 20100625013546) do
create_table "projects", :force => true do |t|
t.string "name"
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "tasks", :force => true do |t|
t.string "name"
t.integer "project_id"
end
create_table "tasks_users", :id => false, :force => true do |t|
t.integer "task_id", :null => false
t.integer "user_id", :null => false
end
create_table "users", :force => true do |t|
t.string "name"
end
end
class Project < ActiveRecord::Base
has_many :tasks
end
class Task < ActiveRecord::Base
belongs_to :project
has_and_belongs_to_many :users
end
class User < ActiveRecord::Base
has_and_belongs_to_many :tasks
end
seeds.rb
task1 = Task.create!(:name => 'task1')
task2 = Task.create!(:name => 'task2')
task3 = Task.create!(:name => 'task3')
pr1 = Project.create!(:name => 'pr1')
pr2 = Project.create!(:name => 'pr2')
task2.project_id = pr2.id
task2.save!
task2.users.create(:name => 'Peter')
task2.users.create(:name => 'David')
task2.users.create(:name => 'Bernd')
Code that causes duplicates:
puts Project.includes(:tasks => :users).where('tasks.id is not null')
.to_xml(:include => {:tasks => {:include => :users}})
This will return 3 times the same task (task2) in the xml output:
<project>
<name>Pr2</name>
<tasks>
<task>
<name>task2</name>
<users>
<user> <name>Peter</name> </user>
<user> <name>David</name> </user>
<user> <name>Bernd</name> </user>
</users>
</task>
<task>
<name>task2</name>
<users>
<user> <name>Peter</name> </user>
<user> <name>David</name> </user>
<user> <name>Bernd</name> </user>
</users>
</task>
<task>
<name>task2</name>
<users>
<user> <name>Peter</name> </user>
<user> <name>David</name> </user>
<user> <name>Bernd</name> </user>
</users>
</task>
</tasks>
</project>
This code works fine:
puts Project.includes(:tasks).where('tasks.id is not null')
.to_xml(:include => {:tasks => {:include => :users}})
This issue has been discussed on the core group:
http://groups.google.de/group/rubyonrails-core/browse_thread/thread...
Comments and changes to this ticket
-
David June 25th, 2010 @ 06:04 PM
- Tag set to activerecord rails3, activerelation, to_json, to_xml
-
Neeraj Singh June 25th, 2010 @ 07:31 PM
- State changed from new to open
-
Santiago Pastorino June 25th, 2010 @ 10:10 PM
David can you provide a failing test case following http://rails.lighthouseapp.com/projects/8994/sending-patches
-
Neeraj Singh July 1st, 2010 @ 06:13 PM
- Importance changed from to Low
I spent some time and it seems issue is not with to_xml. Check this out
# case 1 projects = Project.where('tasks.id is not null').includes(:tasks) p = projects.first count1 = p.tasks.size puts "count1 is #{count1}" #=> 1 # case 2 projects = Project.where('tasks.id is not null').includes(:tasks => :users) p = projects.first count2 = p.tasks.size puts "count2 is #{count2}" #=> 3
Notice that the count result is 3 in case2. In case2 p has three tasks and all the three tasks are the same task but it is repeated three times.
Rails allows eager loading to preload some data. In this case the count is changing just because I am doing eager loading. And that sounds like a bug to me.
sql generated in case 1 is
Project Load (35.0ms) SELECT DISTINCT "projects".id FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" WHERE (tasks.id is not null) LIMIT 1 Project Load (0.3ms) SELECT "projects"."id" AS t0_r0, "projects"."name" AS t0_r1, "projects"."created_at" AS t0_r2, "projects"."updated_at" AS t0_r3, "tasks"."id" AS t1_r0, "tasks"."name" AS t1_r1, "tasks"."project_id" AS t1_r2 FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" WHERE (tasks.id is not null) AND ("projects"."id" IN (4))
sql geenerated in case 2 is
Project Load (0.2ms) SELECT DISTINCT "projects".id FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" LEFT OUTER JOIN "tasks_users" ON "tasks_users"."task_id" = "tasks"."id" LEFT OUTER JOIN "users" ON "users"."id" = "tasks_users"."user_id" WHERE (tasks.id is not null) LIMIT 1 Project Load (0.3ms) SELECT "projects"."id" AS t0_r0, "projects"."name" AS t0_r1, "projects"."created_at" AS t0_r2, "projects"."updated_at" AS t0_r3, "tasks"."id" AS t1_r0, "tasks"."name" AS t1_r1, "tasks"."project_id" AS t1_r2, "users"."id" AS t2_r0, "users"."name" AS t2_r1 FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" LEFT OUTER JOIN "tasks_users" ON "tasks_users"."task_id" = "tasks"."id" LEFT OUTER JOIN "users" ON "users"."id" = "tasks_users"."user_id" WHERE (tasks.id is not null) AND ("projects"."id" IN (4))
If someone wants to recreate this case then the full dump is
ActiveRecord::Schema.define(:version => 20100625013546) do create_table "projects", :force => true do |t| t.string "name" t.datetime "created_at" t.datetime "updated_at" end create_table "tasks", :force => true do |t| t.string "name" t.integer "project_id" end create_table "tasks_users", :id => false, :force => true do |t| t.integer "task_id", :null => false t.integer "user_id", :null => false end create_table "users", :force => true do |t| t.string "name" end end class Project < ActiveRecord::Base has_many :tasks def self.setup Project.delete_all Task.delete_all User.delete_all task1 = Task.create!(:name => 'task1') task2 = Task.create!(:name => 'task2') task3 = Task.create!(:name => 'task3') pr1 = Project.create!(:name => 'pr1') pr2 = Project.create!(:name => 'pr2') task2.project_id = pr2.id task2.save! task2.users.create(:name => 'Peter') task2.users.create(:name => 'David') task2.users.create(:name => 'Bernd') end def self.lab self.setup projects = Project.where('tasks.id is not null').includes(:tasks) p = projects.first count1 = p.tasks.size puts "count1 is #{count1}" projects = Project.where('tasks.id is not null').includes(:tasks => :users) p = projects.first count2 = p.tasks.size puts "count2 is #{count2}" end end class Task < ActiveRecord::Base belongs_to :project has_and_belongs_to_many :users end class User < ActiveRecord::Base has_and_belongs_to_many :tasks end
-
José Valim July 2nd, 2010 @ 11:20 AM
Hey Neeraj, thanks for the rock-solid investigation and I completely agree with the conclusion. The issue is, instead of having, one project, three tasks and three users as result, we should have one project, only one task and three users.
Not sure how hard would be to fix that but, since Rails is already nesting the users properly inside tasks, it seems to be mainly an issue of guaranteeing the uniqueness of the task object (for example, we could check the primary key uniqueness before adding each task to the tasks array).
-
Neeraj Singh July 14th, 2010 @ 10:57 PM
While working on some other ticket I came across :uniq => true option and that seems to solve this problem.
In the comment that above posted if I change from
class Project < ActiveRecord::Base has_many :tasks end
to
class Project < ActiveRecord::Base has_many :tasks, :uniq => true end
then I get only one record where previously I used to get 3 records.
@ David can you try :uniq => true and see if you are getting to_xml output in the desired format?
-
David July 16th, 2010 @ 02:17 PM
Hey Neeraj, I still get 3 records in the to_xml / to_json output. The count for count2 = p.tasks.size is 1 though.
-
Neeraj Singh July 26th, 2010 @ 05:26 PM
@David can you try this
~/activerecord/lib/active_record/associations.rb
Change the following method from
def remove_duplicate_results!(base, records, associations) case associations when Symbol, String reflection = base.reflections[associations] if reflection && reflection.collection? records.each { |record| record.send(reflection.name).target.uniq! } end when Array associations.each do |association| remove_duplicate_results!(base, records, association) end when Hash associations.keys.each do |name| reflection = base.reflections[name] parent_records = [] records.each do |record| descendant = record.send(reflection.name) if descendant if reflection.collection? parent_records.concat descendant.target.uniq else parent_records << descendant end end end remove_duplicate_results!(reflection.klass, parent_records, associations[name]) unless parent_records.empty? end end end
To
def remove_duplicate_results!(base, records, associations) case associations when Symbol, String reflection = base.reflections[associations] if reflection && reflection.collection? records.each { |record| record.send(reflection.name).target.uniq! } end when Array associations.each do |association| remove_duplicate_results!(base, records, association) end when Hash associations.keys.each do |name| reflection = base.reflections[name] if records.any? && reflection.options && reflection.options[:uniq] records.each { |record| record.send(reflection.name).target.uniq! } end parent_records = [] records.each do |record| descendant = record.send(reflection.name) if descendant if reflection.collection? parent_records.concat descendant.target.uniq else parent_records << descendant end end end remove_duplicate_results!(reflection.klass, parent_records, associations[name]) unless parent_records.empty? end end end
Before that fix I was getting
<projects type="array"> <project> <name>pr2</name> <created-at type="datetime">2010-07-26T16:22:50Z</created-at> <updated-at type="datetime">2010-07-26T16:22:50Z</updated-at> <id type="integer">104</id> <tasks type="array"> <task> <name>task2</name> <project-id type="integer">104</project-id> <id type="integer">105</id> <users type="array"> <user> <name>John1</name> <id type="integer">298</id> </user> <user> <name>John2</name> <id type="integer">299</id> </user> <user> <name>John3</name> <id type="integer">300</id> </user> </users> </task> <task> <name>task2</name> <project-id type="integer">104</project-id> <id type="integer">105</id> <users type="array"> <user> <name>John1</name> <id type="integer">298</id> </user> <user> <name>John2</name> <id type="integer">299</id> </user> <user> <name>John3</name> <id type="integer">300</id> </user> </users> </task> <task> <name>task2</name> <project-id type="integer">104</project-id> <id type="integer">105</id> <users type="array"> <user> <name>John1</name> <id type="integer">298</id> </user> <user> <name>John2</name> <id type="integer">299</id> </user> <user> <name>John3</name> <id type="integer">300</id> </user> </users> </task> </tasks> </project> </projects>
Now I am getting
<projects type="array"> <project> <name>pr2</name> <created-at type="datetime">2010-07-26T16:23:31Z</created-at> <updated-at type="datetime">2010-07-26T16:23:31Z</updated-at> <id type="integer">105</id> <tasks type="array"> <task> <name>task2</name> <project-id type="integer">105</project-id> <id type="integer">106</id> <users type="array"> <user> <name>John1</name> <id type="integer">301</id> </user> <user> <name>John2</name> <id type="integer">302</id> </user> <user> <name>John3</name> <id type="integer">303</id> </user> </users> </task> </tasks> </project> </projects>
Can you verify if the proposed solution works for you or not.
All the active_record tests are passing after this fix. If it works for you then I will write a test.
-
David July 26th, 2010 @ 06:27 PM
Yes it works now! With :uniq => true and your changes i'm getting the desired results, thanks!
-
Neeraj Singh July 26th, 2010 @ 08:15 PM
- Assigned user set to José Valim
- Milestone set to 3.x
Attached is code fix with tests.
-
Repository August 2nd, 2010 @ 04:20 PM
- State changed from open to resolved
(from [009aa8825b6932b006f005ac351b82ad8100d7f1]) Eager loading an association should not change the count of children
[#4971 state:resolved]
Signed-off-by: José Valim jose.valim@gmail.com
http://github.com/rails/rails/commit/009aa8825b6932b006f005ac351b82...
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
<h2 style="font-size: 14px">Tickets have moved to Github</h2>
The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>
People watching this ticket
Attachments
Referenced by
- 4971 nested includes in AR causes duplicates in to_json / to_xml [#4971 state:resolved]