This project is archived and is in readonly mode.

#1142 ✓invalid
Thomas Lee

Improved parameter parsing for complex forms

Reported by Thomas Lee | September 30th, 2008 @ 11:16 AM | in 2.x

  • This patch was written with the intent of simplifying the implementation of UrlEncodedPairParser, as well as making its behaviour somewhat more predictable/logical.
  • Changes to semantics are minimal: only breaks seven actionpack unit tests which are either testing broken inputs or strange behaviour in the old parser.
  • Of those seven unit tests, five can be made to pass with some minimal changes to the parser.
  • New semantics would make nested forms and/or manipulation of collections in forms much easier to deal with -- think editing has_many associations in the same form as the parent.

The semantic changes are as follows:

INPUT: "a[][x]=5&a[][y]=10"
OLD PARSER OUTPUT: {"a" => [{"x" => "5", "y" => "10"}]}
NEW PARSER OUTPUT: {"a" => [{"x" => "5"}, {"y" => "10"}]}

INPUT: "a[0][x]=5&a[0][y]=10"
OLD PARSER OUTPUT: {"a" => {"0" => {"x" => "5", "y" => "10"}}}
NEW PARSER OUTPUT: {"a" => [{"x" => "5", "y" => "10"}]}

INPUT: "a[0][x]=5&a[1][y]=10"
OLD PARSER OUTPUT: {"a" => {"0" => {"x" => "5"}, "1" => {"y" => "10"}}}
NEW PARSER OUTPUT: {"a" => [{"x" => "5"}, {"y" => "10"}]}


Issues to be resolved:

  1. Should the parser be modified to pass the old tests for invalid inputs?
  2. "a[0]=1&a[999]=2" currently yields a 1000 element array with 998 nil values. Should we implement a SparseArray class that minimizes memory usage for such a situation, or force sequential indexes?

Comments and changes to this ticket

  • Hongli Lai

    Hongli Lai October 1st, 2008 @ 10:37 AM

    INPUT: "a[][x]=5&a[][y]=10"

    OLD PARSER OUTPUT: {"a" => [{"x" => "5", "y" => "10"}]}
    NEW PARSER OUTPUT: {"a" => [{"x" => "5"}, {"y" => "10"}]}
    
    

    I'm not sure I agree with this new parser output. With this behavior, how is one supposed to pass an array of hashes to the web application, where each hash has more than 1 element? Would one be forced to write "a[0][x]=5&a[0][y]=10"?

  • Thomas Lee

    Thomas Lee October 2nd, 2008 @ 12:18 AM

    That's correct. And the extra characters really won't kill you. Hell, you probably won't even notice them with the right helpers in place. Call me crazy, but IMO correctness and clarity trump query string length. :)

    Explicit indices actually make a lot of sense in practice for this particular use-case. Take this example:

    a[][b][][x]=5&a[][b][][y]=10&a[][b][][y]=15&a[][b][][x]=20
    
    

    Now tell me: at a glance, what does this do?

    With explicit indices, it becomes slightly more verbose but it also eliminates the ambiguity:

    a[0][b][0][x]=5&a[0][b][0][y]=10&a[0][b][1][y]=15&a[0][b][1][x]=20
    
    
  • Hongli Lai

    Hongli Lai October 2nd, 2008 @ 12:41 AM

    I see. Your format does make more sense for complex forms. I'm a bit worried about backwards compatibility but I guess the people at the mailing list have already told you that. :)

    A backwards compatibility mode, perhaps?

  • Thomas Lee

    Thomas Lee October 3rd, 2008 @ 01:59 AM

    I can probably include backport the "[]" syntax so that it behaves like the old parser, although I'm not sure how difficult that would be off the top of my head. In all honesty I'd prefer to avoid it, but perhaps the backwards compatibility argument is compelling enough to warrant this.

    The other option is switching between the old and the new parser using a configuration parameter in the environment.rb Rails::Initializer config block. config.parser or something like that?

    I'd probably prefer the latter to ensure maximum backwards compatibility for those who need/want it, but I'm open to suggestion.

  • Thomas Lee

    Thomas Lee October 3rd, 2008 @ 08:58 AM

    • Title changed from “Simplified parameter parsing code” to “Improved parameter parsing for complex forms”

    Changing the description of this ticket to better reflect what I'm trying to achieve with this.

  • Hongli Lai

    Hongli Lai October 3rd, 2008 @ 12:00 PM

    The latter sounds good to me.

  • Thomas Lee

    Thomas Lee October 4th, 2008 @ 12:11 AM

    Here's a patch implementing pluggable UrlEncodedPairParsers. You can switch between parsers using the config.action_controller.pair_parser configuration setting in environment.rb. e.g.

    config.action_controller.pair_parser = ActionController::UrlEncodedPairParser
    
    

    This revision probably needs more tests. Most notably the individual quirks of the old and new parsers need to be tested independently.

  • Michael Koziarski

    Michael Koziarski October 4th, 2008 @ 12:31 PM

    We actually already have pluggable parameter parsing. Is there a reason that you couldn't use that?

    
    ActionController::Base.param_parsers[Mime::XML] = Proc.new { |data| XmlSimple.xml_in(data, 'ForceArray' => false) }
    
  • Thomas Lee

    Thomas Lee October 6th, 2008 @ 09:22 AM

    Ah, I wasn't aware of that one.

    I can't directly use it myself, but if I actually remove my changes to add the pair_parser option, this would provide a suitable way for people to revert back to the legacy parser if they needed to without any additional effort on my part.

    
    ActionController::Base.param_parsers[Mime::URL_ENCODED_FORM] = Proc.new { |data| ... }
    
    ActionController::Base.param_parsers[Mime::MULTIPART_ENCODED_FORM] = Proc.new { |data| ... }
    

    For this to be truly useful, the code implementing the :url_encoded_form and :multipart_encoded_form strategies should be extracted out so as to be more easily reused by people wishing to do just that, but I think that might be a follow-up patch. Any strong feelings about that?

    I'll upload an updated patch later tonight.

  • Michael Koziarski

    Michael Koziarski October 6th, 2008 @ 01:07 PM

    How about you make whatever changes you need to make your new parser work as a drop in plugin option.

    That way we can push that change into 2.2 and encourage people to experiement with the new parser via plugins and edge changes.

  • Thomas Lee

    Thomas Lee October 6th, 2008 @ 01:23 PM

    Sure thing. Before I submit another patch, should the old parser still be the default?

  • Michael Koziarski

    Michael Koziarski October 6th, 2008 @ 01:25 PM

    Yes, I think that changing the parser is a risky prospect and I'd want it to be fully bedded-in over a 2.3 release cycle before we made the switch.

    So I see two patches:

    Make it easy to change the parser so the new one can be a plugin without much hacking

    Integrate the new parser

    The first one can go in now, and the second change is something that can happen alongside the work on replacing :accessible=>true to make sure we have a really well integrated way of dealing with those more complicated cases.

  • Thomas Lee

    Thomas Lee October 6th, 2008 @ 01:29 PM

    Sounds good. I'll shuffle things around a bit and submit the changes either tonight or tomorrow. Thanks for the support.

  • Thomas Lee

    Thomas Lee October 6th, 2008 @ 02:24 PM

    Alright, here's patch #1 which restructures the guts of AbstractRequest to make it a little easier to reuse the code in there for custom UrlEncodedPairParsers. It might seem like overkill to provide a parse#{foo}with_parser method for all those methods, but it will come in handy when it's time to test new parser implementations (most of the tests use parse_query_parameters and parse_request_parameters

    Also, the strategy.arity == 4 check feels a little bit dirty, but I couldn't think of a nicer way to pass through this information to Procs that want to override the parser for Mime::MULTIPART_FORM short of forcing them to parse that info for themselves. Which IMHO would suck.

    Pending acceptance of this patch, I'll put together a patch to integrate a new parameter parser as you suggest. Should I get in touch with anybody else on the mailing list regarding these patches? The folks responsible for the :accessible => true stuff maybe?

  • Thomas Lee

    Thomas Lee October 7th, 2008 @ 08:19 AM

    In case it isn't immediately obvious from the patch, this will allow folks to override the default parsers for URL and MULTIPART encoded form data as follows:

    
    ActionController::Base.param_parsers[Mime::URL_ENCODED_FORM] = Proc.new do |data|
      ar = ActionController::AbstractRequest
      ar.parse_url_encoded_form_parameters_with_parser(MyCustomParser, data)
    end
    
    ActionController::Base.param_parsers[Mime::MULTIPART_FORM] = Proc.new do |data, boundary, content_length, env|
      ar = ActionController::AbstractRequest
      ar.parse_multipart_encoded_form_parameters_with_parser(
        MyCustomParser, data, boundary, content_length, env)
    end
    
  • Thomas Lee

    Thomas Lee October 8th, 2008 @ 10:48 AM

    Any feedback on this patch Michael?

  • Michael Koziarski

    Michael Koziarski October 12th, 2008 @ 05:45 PM

    I think we should just hold off on the changes till post 2.2, it doesn't seem this will make it too much easier for user's to test, and if it turns out I'm wrong we can include the hooks in a point release.

  • Michael Johnston

    Michael Johnston December 4th, 2008 @ 09:46 AM

    are you guys aware of http://dev.rubyonrails.org/ticke...?

    this is a parameter parser currently in use in an app I work on for Sage Software. The last patch I uploaded may have a bug...not sure if I uploaded the fix.

    I've simply never been given time to maintain this patch against rails edge.

    we use it to manage deeply nested forms.

    The existing parser semantics can't work for more than 1 level of nesting (think checkboxes).

    problem is, prototype also needs to be patched to preserve the param order when marshalling a form (if it hasn't been in the last eight months)

  • Michael Johnston

    Michael Johnston December 4th, 2008 @ 09:49 AM

    btw, how does your parser do with nesting + checkboxes?

  • Thomas Lee

    Thomas Lee December 4th, 2008 @ 11:37 AM

    @Michael Johnston: does just fine with nesting and checkboxes, but ActionView::Helpers::FormHelper::InstanceTag#to_check_box_tag needs to be modified so that the hidden field is returned after the actual checkbox due to some assumptions the old code makes about the parser behavior (the parameter order thing, which it sounds like you encountered).

    Unfortunately this is not yet in the latest patch -- I have an updated patch sitting in my local repo, but it needs a few unit tests.

  • Pratik

    Pratik March 6th, 2009 @ 07:51 PM

    • Assigned user changed from “Michael Koziarski” to “josh”
  • Thomas Lee

    Thomas Lee March 6th, 2009 @ 10:09 PM

    FWIW, the patch(es) provided to date are no longer relevant with the move to Rack in 2.3 and beyond.

    I'm not sure if anybody's still interested in the semantics or features provided by this patch, but if there's sufficient interest I can rework it for the newer versions of Rails. I'm still not sure I agree that the implemented semantics are ideal (using hashes for numerical indexes is not sane IMO), but at a glance it looks better than the old UrlEncodedPairParser stuff. Having said that, my concerns will probably mostly go away with Ruby 1.9 anyway.

    Anyway, feel free to close this one off if there's no further interest in this stuff.

  • Pratik

    Pratik March 6th, 2009 @ 10:14 PM

    • State changed from “new” to “invalid”

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Pages