This project is archived and is in readonly mode.

#2084 ✓resolved
Bart ten Brinke

Alternative XML parsers support in ActiveSupport for ActiveResource

Reported by Bart ten Brinke | February 26th, 2009 @ 09:59 AM

The XMLmini implementation now switches between a LibXML or REXML implementation, depending on weather the LibXML library is available. As the implementation of the XMLmini is now separated out, it is very easy to implement Nokogiri or JREXML. The speed increase is quite stunning (up to 10 times as fast, probably more for large XML files).

It falls back to the old ruby implementation (REXML) if no other parsers are available.

Comments and changes to this ticket

  • José Valim

    José Valim February 26th, 2009 @ 12:42 PM

    Really great! +1 Just noticed that the patch file contains two patches:

    [PATCH] Added deflate, gzip support to active resource

    And:

    [PATCH] xml mini libxml support

    I think you want to send just the second one!

    Cheers!

  • Alex Coles
  • Bart ten Brinke

    Bart ten Brinke February 26th, 2009 @ 01:26 PM

    Sorry, squashed the correct tickets into a new diff.

  • Willem van Bergen

    Willem van Bergen February 26th, 2009 @ 01:30 PM

    +1, this would be really helpful for our API needs as well.

  • Michael Koziarski

    Michael Koziarski March 2nd, 2009 @ 05:49 AM

    • Tag changed from activeresource, activesupport, patch, xml to activeresource, activesupport, patch, verified, xml

    I'd be nice to be able to choose between the different parsers rather than just relying on the presence of libxml, though that'd be a sensible default.

    Also, the new non-derivative work code (your changes) shouldn't have the copyright header as they can safely be MIT licensed.

  • Michael Koziarski

    Michael Koziarski March 2nd, 2009 @ 05:49 AM

    Also, how would you suggest the jruby guys could plug in an alternative implementation with this patch? Could be a good thought experiment to make sure the modularity works.

  • Bart ten Brinke

    Bart ten Brinke March 3rd, 2009 @ 09:08 PM

    Well, the selection code currently is located in the XMLmini Module. You could easily have a fall through there, but I could also set something up like a hook. Which would you prefer?

  • Bart ten Brinke

    Bart ten Brinke March 3rd, 2009 @ 10:49 PM

    Let me rephrase that, as editing my comment seems to be impossible. This should be viewed as the base for alternative XML to hash parser implementations. XMLMini will be the front end and all alternative parsers should just implement ActiveSupport::XmlMini.parse

    In the future alternative parsers can be added to Rails core (and placed in the xml_mini sub folder). XMLmini can then implement a fall through that will decide which implementation is available.

    In the mean time people can write their own mixin in config/initializers that implements XmlMini.parse. The JRuby/Nokogiri people can either do the mix in on require or have a special function like Nokogiri.enable_active_resource.

    If you have a better suggestion to do this, please post it or post an example.

    You are right about the licenses. I have attached a new diff in which they are removed.

  • Repository

    Repository March 8th, 2009 @ 08:42 PM

    • State changed from “new” to “committed”

    (from [822c41d69d9228c9912d29ac45155d3a16bb5c50]) XmlMini supports different backend parsers, starting with libxml

    [#2084 state:committed]

    Signed-off-by: Jeremy Kemper jeremy@bitsweat.net http://github.com/rails/rails/co...

  • Michael Koziarski

    Michael Koziarski March 9th, 2009 @ 08:08 AM

    • State changed from “committed” to “open”
    • Milestone cleared.

    I'm seeing lots of failures now with:

    
    test_serialization_of_nested_resource(FormatTest):
    NoMethodError: undefined method `default_keep_blanks=' for XML:Module
    

    Do we need an additional check on the version of libxml that's installed? This is with a stock ruby install from 10.5.6

  • Eloy Duran

    Eloy Duran March 9th, 2009 @ 09:31 AM

    Do we need an additional check on the version of libxml that's installed? This is with a stock ruby install from 10.5.6

    Yes I think we do. The version that ships with OSX 10.5.x is 0.3.8.4, after installing 1.0.0 it works as expected.

  • CancelProfileIsBroken

    CancelProfileIsBroken March 9th, 2009 @ 12:37 PM

    I'm seeing a failure on a Debian box with libxml-ruby 1.0.0:

    
    
    Name: test_expansion_count_is_limited(QueryTest) Type: Failure Message: <RuntimeError> exception expected but was Class: <LibXML::XML::Error> Message: <"Fatal error: Detected an entity reference loop at :12."> ---Backtrace--- ./test/../lib/active_support/xml_mini/libxml.rb:16:in `parse' ./test/../lib/active_support/xml_mini/libxml.rb:16:in `parse' ./test/../lib/active_support/core_ext/hash/conversions.rb:153:in `from_xml' ./test/core_ext/hash_ext_test.rb:903:in `test_expansion_count_is_limited' ./test/core_ext/hash_ext_test.rb:887:in `test_expansion_count_is_limited' /usr/local/lib/ruby/gems/1.8/gems/mocha-0.9.5/lib/mocha/test_case_adapter.rb:69:in `__send__' /usr/local/lib/ruby/gems/1.8/gems/mocha-0.9.5/lib/mocha/test_case_adapter.rb:69:in `run'
     
    

    This is using the standard Rails CI setup with just a few changes (like installing libxml-ruby) so it's a pretty clean test environment.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 12:44 PM

    Thats weird. How can the wrapper throw a different exception on Debian as on OSX? Currently I can't reproduce this because I had to send my Debian machine RMA :(. If I build fix, could you test it for me, Mike?

  • CancelProfileIsBroken

    CancelProfileIsBroken March 9th, 2009 @ 12:58 PM

    Sure. Probably best to post any proposed patch here as I've heard a couple other reports of failure on os x this morning.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 01:01 PM

    Isn't that the libxml 1.0.0 version issue?

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 01:53 PM

    I am using 0.9.7 on OSX BTW. I thought I was default. My bad.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 03:39 PM

    It seems that libxml-ruby 1.0.0 is pretty broken on OSX. Well at least it gives me a segfault on the recursion test ([BUG] Bus Error). I will make the patch libxml 0.9.7 specific if I can, but at the moment I haven't really got a good idea as how to do that.

  • CancelProfileIsBroken

    CancelProfileIsBroken March 9th, 2009 @ 03:47 PM

    Seems like patch that ends up needing to target one specific version of the library should probably just be reverted until things are stabilized.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 03:59 PM

    I would think 1.0 would be stabilized :X. I think we should keep the XMLmini and rexml separation, but drop the LibXML implementation. It would be a shame as it really is a LOT faster.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 04:12 PM

    This limits it to libxml 0.9.4 and 0.9.7. Could you check it, Mike?

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 08:00 PM

    I really feel strongly about the fact that ActiveResource needs a fast XML parser if it is to be considered as an option in any real world application.

  • Michael Koziarski

    Michael Koziarski March 9th, 2009 @ 08:05 PM

    We could just adjust the defaults to always be rexml. Then let people do something like XmlMini.parser=:libxml to switch to the libxml implementation.

    As for the test failure for the ci box, that's actually fine. the test is really just meant to be testing "won't die when passed a loop" so the fact that libxml is raising a different error, is no big thing. Just need to adjust the test.

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 08:15 PM

    You misunderstood. It raises something else in Debian 1.0.0, which is fine, but it segfaults in OSX, which is NOT fine.

    Wow, Jeremy fixed everything perfectly: http://github.com/rails/rails/co...

    Nice work!

  • Michael Koziarski

    Michael Koziarski March 9th, 2009 @ 08:18 PM

    He has a habit of doing that :)

    If it's only our million-laughs test that causes a segfault, perhaps that should be included in a test case to upstream? Or are there additional cases which segfault?

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 08:29 PM

    His fix however, is not working in my environment:

    ruby script/console Loading development environment (Rails 2.3.1)

    >> gem 'libxml-ruby', '=0.9.4', '=0.9.7' Gem::LoadError: RubyGem version error: libxml-ruby(0.9.4 not = 0.9.4, = 0.9.7)

    >> Gem.loaded_specs['libxml-ruby'].version.version => "0.9.7"

  • Jeremy Kemper

    Jeremy Kemper March 9th, 2009 @ 08:36 PM

    What version of rubygems?

  • Bart ten Brinke
  • Jeremy Kemper

    Jeremy Kemper March 9th, 2009 @ 08:44 PM

    • State changed from “open” to “resolved”
  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 08:56 PM

    That probably is the best solution :). Nice!

  • Bart ten Brinke

    Bart ten Brinke March 9th, 2009 @ 09:04 PM

    The delegation seems to break if you switch it to libxml :(

  • Bart ten Brinke

    Bart ten Brinke March 10th, 2009 @ 07:35 AM

    E(DELEGATION):2: warning: instance variable @backend not initialized. The libxml.rb does not return to the Xml_mini.rb flow after the mixin. I have no clue why not.

  • Repository

    Repository March 10th, 2009 @ 06:58 PM

    (from [d4091d3bc79731f55491cfb51c604a66502c944f]) Properly set up libxml includes. Don't include LibXML in toplevel.

    [#2084 state:resolved] http://github.com/rails/rails/co...

  • cfis (at savagexi)

    cfis (at savagexi) March 12th, 2009 @ 04:52 PM

    Hey guys,

    Integrating libxml-ruby bindings in Rails would be great. But you positively, absolutely want to use libxml 1.1.1 and higher. Its the first version that works with ruby 1.9.1 and compiles cleanly on OS X. Let me know if you run into any issues and we'll get them cleaned up on our side.

    Charlie Savage (maintainer libxml-ruby).

  • Jeremy Kemper

    Jeremy Kemper March 12th, 2009 @ 05:07 PM

    Hey Charlie, was 1.1.1 just released? The latest release segfaulted. 0.9.7 works and compiles fine on OSX (though not on 1.9).

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Referenced by

Pages