This project is archived and is in readonly mode.

#4971 ✓resolved
David

nested includes in AR causes duplicates in to_json / to_xml

Reported by David | June 25th, 2010 @ 05:15 PM | in 3.x

In Rails edge using #to_xml or #to_json on an ActiveRecord::Relation with includes causes duplicate entries.

The model is: Projects has_many Tasks habtm Users

Example:

ActiveRecord::Schema.define(:version => 20100625013546) do 
  create_table "projects", :force => true do |t| 
    t.string   "name" 
    t.datetime "created_at" 
    t.datetime "updated_at" 
  end 
  create_table "tasks", :force => true do |t| 
    t.string  "name" 
    t.integer "project_id" 
  end 
  create_table "tasks_users", :id => false, :force => true do |t| 
    t.integer "task_id", :null => false 
    t.integer "user_id", :null => false 
  end 
  create_table "users", :force => true do |t| 
    t.string "name" 
  end 
end 

class Project < ActiveRecord::Base 
  has_many :tasks 
end 
class Task < ActiveRecord::Base 
  belongs_to :project 
  has_and_belongs_to_many :users 
end 
class User < ActiveRecord::Base 
  has_and_belongs_to_many :tasks 
end

seeds.rb

task1 = Task.create!(:name => 'task1') 
task2 = Task.create!(:name => 'task2') 
task3 = Task.create!(:name => 'task3') 
pr1 = Project.create!(:name => 'pr1') 
pr2 = Project.create!(:name => 'pr2') 
task2.project_id = pr2.id 
task2.save! 
task2.users.create(:name => 'Peter') 
task2.users.create(:name => 'David') 
task2.users.create(:name => 'Bernd')

Code that causes duplicates:

puts Project.includes(:tasks => :users).where('tasks.id is not null')
.to_xml(:include => {:tasks => {:include => :users}})

This will return 3 times the same task (task2) in the xml output:

<project>
  <name>Pr2</name>
    <tasks>
      <task>
        <name>task2</name>
          <users>
            <user>  <name>Peter</name> </user>
            <user>  <name>David</name> </user>
            <user>  <name>Bernd</name> </user>
          </users>
      </task>
      <task>
        <name>task2</name>
          <users>
            <user>  <name>Peter</name> </user>
            <user>  <name>David</name> </user>
            <user>  <name>Bernd</name> </user>
          </users>
      </task>
      <task>
        <name>task2</name>
          <users>
            <user>  <name>Peter</name> </user>
            <user>  <name>David</name> </user>
            <user>  <name>Bernd</name> </user>
          </users>
      </task>
    </tasks>
  </project>

This code works fine:

puts Project.includes(:tasks).where('tasks.id is not null')
.to_xml(:include => {:tasks => {:include => :users}})

This issue has been discussed on the core group:
http://groups.google.de/group/rubyonrails-core/browse_thread/thread...

Comments and changes to this ticket

  • David

    David June 25th, 2010 @ 06:04 PM

    • Tag set to activerecord rails3, activerelation, to_json, to_xml
  • Neeraj Singh

    Neeraj Singh June 25th, 2010 @ 07:31 PM

    • State changed from “new” to “open”
  • Santiago Pastorino
  • Neeraj Singh

    Neeraj Singh June 25th, 2010 @ 10:13 PM

    I have a test case. Will upload it soon.

  • Santiago Pastorino
  • Neeraj Singh

    Neeraj Singh July 1st, 2010 @ 06:13 PM

    • Importance changed from “” to “Low”

    I spent some time and it seems issue is not with to_xml. Check this out

        # case 1
        projects = Project.where('tasks.id is not null').includes(:tasks)
        p = projects.first
        count1 =  p.tasks.size
        puts "count1 is #{count1}"  #=> 1
    
        # case 2
        projects = Project.where('tasks.id is not null').includes(:tasks => :users)
        p = projects.first
        count2 =  p.tasks.size
        puts "count2 is #{count2}"  #=> 3
    

    Notice that the count result is 3 in case2. In case2 p has three tasks and all the three tasks are the same task but it is repeated three times.

    Rails allows eager loading to preload some data. In this case the count is changing just because I am doing eager loading. And that sounds like a bug to me.

    sql generated in case 1 is

     Project Load (35.0ms)  SELECT DISTINCT "projects".id FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" WHERE (tasks.id is not null) LIMIT 1
      Project Load (0.3ms)  SELECT "projects"."id" AS t0_r0, "projects"."name" AS t0_r1, "projects"."created_at" AS t0_r2, "projects"."updated_at" AS t0_r3, "tasks"."id" AS t1_r0, "tasks"."name" AS t1_r1, "tasks"."project_id" AS t1_r2 FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" WHERE (tasks.id is not null) AND ("projects"."id" IN (4))
    

    sql geenerated in case 2 is

    Project Load (0.2ms)  SELECT DISTINCT "projects".id FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" LEFT OUTER JOIN "tasks_users" ON "tasks_users"."task_id" = "tasks"."id" LEFT OUTER JOIN "users" ON "users"."id" = "tasks_users"."user_id" WHERE (tasks.id is not null) LIMIT 1
      Project Load (0.3ms)  SELECT "projects"."id" AS t0_r0, "projects"."name" AS t0_r1, "projects"."created_at" AS t0_r2, "projects"."updated_at" AS t0_r3, "tasks"."id" AS t1_r0, "tasks"."name" AS t1_r1, "tasks"."project_id" AS t1_r2, "users"."id" AS t2_r0, "users"."name" AS t2_r1 FROM "projects" LEFT OUTER JOIN "tasks" ON "tasks"."project_id" = "projects"."id" LEFT OUTER JOIN "tasks_users" ON "tasks_users"."task_id" = "tasks"."id" LEFT OUTER JOIN "users" ON "users"."id" = "tasks_users"."user_id" WHERE (tasks.id is not null) AND ("projects"."id" IN (4))
    

    If someone wants to recreate this case then the full dump is

    ActiveRecord::Schema.define(:version => 20100625013546) do
    
      create_table "projects", :force => true do |t|
        t.string   "name"
        t.datetime "created_at"
        t.datetime "updated_at"
      end
    
      create_table "tasks", :force => true do |t|
        t.string  "name"
        t.integer "project_id"
      end
    
      create_table "tasks_users", :id => false, :force => true do |t|
        t.integer "task_id", :null => false
        t.integer "user_id", :null => false
      end
    
      create_table "users", :force => true do |t|
        t.string "name"
      end
    
    end
    class Project < ActiveRecord::Base
      has_many :tasks
    
      def self.setup
        Project.delete_all
        Task.delete_all
        User.delete_all
        task1 = Task.create!(:name => 'task1')
        task2 = Task.create!(:name => 'task2')
        task3 = Task.create!(:name => 'task3')
        pr1 = Project.create!(:name => 'pr1')
        pr2 = Project.create!(:name => 'pr2')
        task2.project_id = pr2.id
        task2.save!
        task2.users.create(:name => 'Peter')
        task2.users.create(:name => 'David')
        task2.users.create(:name => 'Bernd')
      end
    
      def self.lab
        self.setup
        projects = Project.where('tasks.id is not null').includes(:tasks)
        p = projects.first
        count1 =  p.tasks.size
        puts "count1 is #{count1}"
    
        projects = Project.where('tasks.id is not null').includes(:tasks => :users)
        p = projects.first
        count2 =  p.tasks.size
        puts "count2 is #{count2}"
      end
    end
    
    class Task < ActiveRecord::Base
      belongs_to :project
      has_and_belongs_to_many :users
    end
    class User < ActiveRecord::Base
      has_and_belongs_to_many :tasks
    end
    
  • José Valim

    José Valim July 2nd, 2010 @ 11:20 AM

    Hey Neeraj, thanks for the rock-solid investigation and I completely agree with the conclusion. The issue is, instead of having, one project, three tasks and three users as result, we should have one project, only one task and three users.

    Not sure how hard would be to fix that but, since Rails is already nesting the users properly inside tasks, it seems to be mainly an issue of guaranteeing the uniqueness of the task object (for example, we could check the primary key uniqueness before adding each task to the tasks array).

  • Neeraj Singh

    Neeraj Singh July 14th, 2010 @ 10:57 PM

    While working on some other ticket I came across :uniq => true option and that seems to solve this problem.

    In the comment that above posted if I change from

    class Project < ActiveRecord::Base
      has_many :tasks
    end
    

    to

    class Project < ActiveRecord::Base
      has_many :tasks, :uniq => true
    end
    

    then I get only one record where previously I used to get 3 records.

    @ David can you try :uniq => true and see if you are getting to_xml output in the desired format?

  • David

    David July 16th, 2010 @ 02:17 PM

    Hey Neeraj, I still get 3 records in the to_xml / to_json output. The count for count2 = p.tasks.size is 1 though.

  • Neeraj Singh

    Neeraj Singh July 26th, 2010 @ 05:26 PM

    @David can you try this

    ~/activerecord/lib/active_record/associations.rb

    Change the following method from

    def remove_duplicate_results!(base, records, associations)
       case associations
         when Symbol, String
           reflection = base.reflections[associations]
           if reflection && reflection.collection?
             records.each { |record| record.send(reflection.name).target.uniq! }
           end
         when Array
           associations.each do |association|
             remove_duplicate_results!(base, records, association)
           end
         when Hash
           associations.keys.each do |name|
             reflection = base.reflections[name]
    
             parent_records = []
             records.each do |record|
               descendant = record.send(reflection.name)
               if descendant
                 if reflection.collection?
                   parent_records.concat descendant.target.uniq
                 else
                   parent_records << descendant
                 end
               end
             end
    
             remove_duplicate_results!(reflection.klass, parent_records, associations[name]) unless parent_records.empty?
           end
       end
     end
    

    To

     def remove_duplicate_results!(base, records, associations)
        case associations
          when Symbol, String
            reflection = base.reflections[associations]
            if reflection && reflection.collection?
              records.each { |record| record.send(reflection.name).target.uniq! }
            end
          when Array
            associations.each do |association|
              remove_duplicate_results!(base, records, association)
            end
          when Hash
            associations.keys.each do |name|
              reflection = base.reflections[name]
    
              if records.any? && reflection.options && reflection.options[:uniq]
                records.each { |record| record.send(reflection.name).target.uniq! }
              end
    
              parent_records = []
              records.each do |record|
                descendant = record.send(reflection.name)
                if descendant
                  if reflection.collection?
                    parent_records.concat descendant.target.uniq
                  else
                    parent_records << descendant
                  end
                end
              end
    
              remove_duplicate_results!(reflection.klass, parent_records, associations[name]) unless parent_records.empty?
            end
        end
      end
    

    Before that fix I was getting

    <projects type="array">
      <project>
        <name>pr2</name>
        <created-at type="datetime">2010-07-26T16:22:50Z</created-at>
        <updated-at type="datetime">2010-07-26T16:22:50Z</updated-at>
        <id type="integer">104</id>
        <tasks type="array">
          <task>
            <name>task2</name>
            <project-id type="integer">104</project-id>
            <id type="integer">105</id>
            <users type="array">
              <user>
                <name>John1</name>
                <id type="integer">298</id>
              </user>
              <user>
                <name>John2</name>
                <id type="integer">299</id>
              </user>
              <user>
                <name>John3</name>
                <id type="integer">300</id>
              </user>
            </users>
          </task>
          <task>
            <name>task2</name>
            <project-id type="integer">104</project-id>
            <id type="integer">105</id>
            <users type="array">
              <user>
                <name>John1</name>
                <id type="integer">298</id>
              </user>
              <user>
                <name>John2</name>
                <id type="integer">299</id>
              </user>
              <user>
                <name>John3</name>
                <id type="integer">300</id>
              </user>
            </users>
          </task>
          <task>
            <name>task2</name>
            <project-id type="integer">104</project-id>
            <id type="integer">105</id>
            <users type="array">
              <user>
                <name>John1</name>
                <id type="integer">298</id>
              </user>
              <user>
                <name>John2</name>
                <id type="integer">299</id>
              </user>
              <user>
                <name>John3</name>
                <id type="integer">300</id>
              </user>
            </users>
          </task>
        </tasks>
      </project>
    </projects>
    

    Now I am getting

    <projects type="array">
      <project>
        <name>pr2</name>
        <created-at type="datetime">2010-07-26T16:23:31Z</created-at>
        <updated-at type="datetime">2010-07-26T16:23:31Z</updated-at>
        <id type="integer">105</id>
        <tasks type="array">
          <task>
            <name>task2</name>
            <project-id type="integer">105</project-id>
            <id type="integer">106</id>
            <users type="array">
              <user>
                <name>John1</name>
                <id type="integer">301</id>
              </user>
              <user>
                <name>John2</name>
                <id type="integer">302</id>
              </user>
              <user>
                <name>John3</name>
                <id type="integer">303</id>
              </user>
            </users>
          </task>
        </tasks>
      </project>
    </projects>
    

    Can you verify if the proposed solution works for you or not.

    All the active_record tests are passing after this fix. If it works for you then I will write a test.

  • David

    David July 26th, 2010 @ 06:27 PM

    Yes it works now! With :uniq => true and your changes i'm getting the desired results, thanks!

  • Neeraj Singh

    Neeraj Singh July 26th, 2010 @ 08:15 PM

    • Assigned user set to “José Valim”
    • Milestone set to 3.x

    Attached is code fix with tests.

  • Repository

    Repository August 2nd, 2010 @ 04:20 PM

    • State changed from “open” to “resolved”

    (from [009aa8825b6932b006f005ac351b82ad8100d7f1]) Eager loading an association should not change the count of children

    [#4971 state:resolved]

    Signed-off-by: José Valim jose.valim@gmail.com
    http://github.com/rails/rails/commit/009aa8825b6932b006f005ac351b82...

  • Jeff Kreeftmeijer

    Jeff Kreeftmeijer November 8th, 2010 @ 08:52 AM

    • Tag cleared.

    Automatic cleanup of spam.

  • bingbing

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Attachments

Referenced by

Pages