This project is archived and is in readonly mode.

#4825 ✓committed
Wincent Colaiuta

Some text helper methods inappropriately calling "sanitize"

Reported by Wincent Colaiuta | June 10th, 2010 @ 08:39 PM | in 3.0.2

Just noticed a behavior change that was introduced on 6 June by commit ab764ec ("Makes text_helper methods sanitize the input if the input is not safe or :safe => true option is not provided").

Specifically I noticed the change in the "truncate" method, but I see that the same change has been applied to other methods as well.

Under Rails 2, and under Rails 3 up until the recent commit, the "truncate" method did exactly what its name and documentation imply: it returned a possibly shortened version of the string.

After this commit, it returns a string which has not only been marked HTML safe, but potentially transformed in other ways after having been run through the "sanitize" method. The commit message doesn't explain why this is a desirable behavior change, and to be honest, I can't really see how it could be desirable...

Here's my use case: issue tracker listing with "summary" field in a table column, truncated to 30 characters.

Before commit ab764ec, a user could create issues with summaries like:

<script> tags not working in admin section

or

missing <em> tag in blog title

When displayed in the listing, these would end up getting HTML escaped (explicitly via h() in Rails 2, and automatically in Rails 3) and would appear correctly in the table.

After commit ab8764ec we now get erratic behavior. A summary like:

<script> tags not working in admin section

now gets sanitized away to nothing (an empty string) due to the invocation of the sanitize method which deletes the "script" tag and everything after it as well (now that's what I call truncation!), and a summary like

missing <em> tag in blog title

is allowed through unchanged, but is marked as HTML safe and so the "em" tag now appears unescaped and the presentation of the table is broken (unwanted italics).

My main concern with this commit is that the name of the method "truncate" doesn't in any way imply that this kind of action is going to be taken (possible mutilation of string, and automatic marking as HTML safe).

So I think perhaps the commit should be reverted, at least the part of it that touches the "truncate" method and possibly the rest as well. Or at least an explanation of what it does, in addition to truncating, be added to the documentation.

If it's decided that the change stays in, at least the ":safe" option which the commit adds should be documented, seeing as this provides a way for callers to get the old behavior back if desired.

Comments and changes to this ticket

  • Wincent Colaiuta

    Wincent Colaiuta June 10th, 2010 @ 11:58 PM

    Bah, Lighthouse ate the two example summaries from my post above. Trying to repost them using code tags:

    The first example was:

    <script> tags not working in admin section
    

    The second example was:

    missing <em> tag in blog title
    

    So, in Rails 2 and Rails 3 up to commit ab8764ec, these would be truncated and then safely displayed in the issue listing (using the h() method in Rails 2, and automatically in Rails 3) with HTML entities for the less-than and greater-than characters.

    Now, with commit ab8764ec, the first example gets sanitized away to nothing (ie. "", an empty string) because the script tag and everything after it gets blown away.

    And the second example is allowed through as is (ie. the "em" tag remains intact) but it is now marked as HTML safe and so the display of the item is messed up: the issues table now appears partially in italics and the summary reads as "missing tag in blog title".

    Sorry about the confusion. I hope Lighthouse doesn't chew up the examples in code tags above this time...

  • Wincent Colaiuta

    Wincent Colaiuta June 12th, 2010 @ 08:13 PM

    So I spotted the "Edit ticket" link at the top of the page and fixed the damage in my initial post. Can't see a "delete" or "edit" link though for my follow-up comment though, which is now redundant. (Can't stand this Lighthouse UI.) Sorry about the noise.

  • Rohit Arondekar

    Rohit Arondekar June 13th, 2010 @ 01:47 AM

    • Assigned user set to “Santiago Pastorino”
  • Santiago Pastorino

    Santiago Pastorino June 13th, 2010 @ 01:58 AM

    • Milestone cleared.
    • State changed from “new” to “resolved”

    Confirm!, now uses html_escape. Sorry about the commit message my english skills are not good enough, also i have in my todo list to write the docs about :safe => true and safe input.
    BTW if you don't want your input to be escaped now you need to mark the string as safe or use :safe => true.

  • Wincent Colaiuta

    Wincent Colaiuta June 13th, 2010 @ 09:14 AM

    This is certainly an improvement on what was there before, at least for my use case (ie. it not longer mutilates the string in surprising ways) but I still wonder why you're wanting to sanitize or escape in a "truncate" method.

    The method is called "truncate", not "truncate_and_escape". At least for me, following the "principal of least surprise", I would expect a "truncate" method to shorten a string, but not modify or transform it in other ways.

    If there's something I'm missing here (like some rule that absolutely all Rails helper methods are supposed to produce "safe" output or something), then let me know and I'll shut up.

    One thing about ":safe => true" though, I think it could be named better as it is a little ambiguous. ie. without looking at the source code, a reader can't know if you are saying "this string is safe, so don't touch it" or you are saying "I want you to make this string safe, so please touch it". Ironically, in my use case, this API obliges me to say ":safe => true" even when my input is quite explicitly not safe at all, just because I don't want the "truncate" method to mess with it at that point.

    ":escape => true" would be clearer (ie. "please escape this string"), with ":escape => false" meaning "don't touch this string". ":escape => true" could be the default if you want to maintain the behavior that you've introduced in these two commits. You could also have ":escape => :sanitize" for those wanting to sanitize the input rather than just escape it. (Once again, no idea why people would want this in a "truncate" method...)

  • Santiago Pastorino

    Santiago Pastorino June 13th, 2010 @ 10:06 AM

    Rails helpers methods are supposed to produce safe, i you want to improve the docs you are welcome.
    Remember that in http://github.com/lifo/docrails everyone can push ... so go ahead ;)

  • Wincent Colaiuta

    Wincent Colaiuta June 15th, 2010 @ 08:25 AM

    Created a "ticket4825" branch to demonstrate the API change I'm proposing to resolve the ambiguity with the option name:

    http://github.com/wincent/rails/tree/ticket4825

    Only one commit on the branch right now:

    http://github.com/wincent/rails/commit/cf7eb5868a8023e5b4b88086ce01...

    This implements ":escape => false" as well as ":escape => :sanitize".

    If you want the same handling applied to the other helpers which currently use ":safe", let me know.

  • Santiago Pastorino

    Santiago Pastorino June 15th, 2010 @ 11:27 PM

    Wincent agree with :escape options but :escape => true or :escape => false and this always do html_escape over the param, so send a patch for this

  • Santiago Pastorino

    Santiago Pastorino June 15th, 2010 @ 11:28 PM

    I mean :escape => true do html_escape, and :escape => false do nothing.

  • José Valim

    José Valim June 15th, 2010 @ 11:28 PM

    I agree with renaming :safe to :escape (please do so in other methods for consistency) but I'm not ok with :escape => :sanitize. It is confusing and it is easier to call sanitize() yourself.

  • Rohit Arondekar

    Rohit Arondekar June 16th, 2010 @ 02:05 AM

    To be honest, I don't see why truncate("some text", :safe) is so confusing.

    You are explicitly telling the method that the input you are passing is safe. And not telling the method what to do with the input. Internally what the method does with safe input should be left to the method. So :escape => :sanitize is definitely not nice.

    Now, by default all text helpers in Rails 3 escape their inputs. This is done so that the output can be flagged with confidence as html_safe. If you want to skip escaping, then you need to tell the method that the input is indeed safe. You can do that by either passing the :safe option or by calling html_safe on the input text, like "truncate me".html_safe

  • Neeraj Singh

    Neeraj Singh June 16th, 2010 @ 05:06 AM

    'safe' is an implementation detail. 'escape' is what user interface.

    I would admit that I find truncate("some text", :safe) a bit confusing.

  • Ryan Bigg

    Ryan Bigg June 16th, 2010 @ 06:10 AM

    "some text".html_safe is the best implementation. I don't see why it can't be like this. If the string input is html safe, then why would the truncated string change?

  • Ryan Bigg

    Ryan Bigg June 16th, 2010 @ 06:10 AM

    • State changed from “resolved” to “open”
  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 06:51 AM

    :escape => :sanitize
    

    is evidently not very popular, so forget about it. (I only added it because a commenter above requested it.)

    If the default is for all Rails helpers to produce safe output, then the presence of this option (either as ':safe => true' or ':escape => false' or whatever) is a bit of an anomaly. Why would only these helpers and not all the others implement the same option?

    I think it really should be as Ryan says: there should be no ':safe' or ':escape' option at all, and if you want to turn off escaping you should use it as:

    truncate foo.html_safe
    

    I am out for a few hours now but will make a proposed patch when I get back later today.

  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 03:35 PM

    Ok, new version just pushed to my "ticket4825-2" branch:

    http://github.com/wincent/rails/tree/ticket4825-2

    The specific commit is:

    http://github.com/wincent/rails/commit/9027ceefb4fdff0629bf0ed4b9e2...

    This drops the ":safe" option in favor of marking the input with html_safe.

  • Santiago Pastorino

    Santiago Pastorino June 16th, 2010 @ 10:12 PM

    • Tag set to patch
    • Assigned user changed from “Santiago Pastorino” to “José Valim”

    We can just rely on input being html_safe

  • Michael Koziarski

    Michael Koziarski June 16th, 2010 @ 10:21 PM

    • Tag cleared.
    • Assigned user changed from “José Valim” to “Santiago Pastorino”

    the question of whether the input to these helpers is html_safe is not enough to determine whether the output will be.

    I'd like to suggest we split this discussion in two. What to do with truncate, and what to do with all the others.

    truncate should simply return unsafe strings to avoid breaking things as wincent mentions at the top of this ticket., if you want them marked safe you use raw. The reason not to rely on the safety of the input is this case:

      <%= truncate(h("wtf&"), 4) %>
    

    This will return a raw ampersand to the document body, that string is not safe. In this situation returning a raw string (in all cases) should be enough to ensure that there are no attack vectors, and users wanting to put raw truncated strings in their document bodies can do so with an explicit call to raw().

    As for textilize and the other html generating functions, there are various attack vectors that are similarly surprising and could leave apps open if we rely on html safety of the input. Consider this case:

      <%= textilize(h("p{-moz-binding:url('http://golem.ph.u&#x74;exas.edu/&#x7E;distler/blog/files/warning.xml#xss')}")) %>
    

    h marks the input string as 'safe', however that only means that the string itself is safe, not that you can apply transformations to that string and have the result be safe. The only safe way to handle the HTML generating functions is to use sanitize (as we do at present) or to return raw strings (as we used to do, to much dismay). The option should probably be called :santize and default to true, users who want to shoot themselves in the foot can pass false for those options.

  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 10:24 PM

    • Tag set to patch
    • Assigned user changed from “Santiago Pastorino” to “José Valim”

    Comparing the patch you just attached to the commit I posted earlier, it looks like your patch is missing some things, like removing references to ":safe" in the documentation, as well as reverting the changes to the "auto_link_" methods that you originally introduced when you added the ":safe" option in commit ab764ec.

  • Michael Koziarski

    Michael Koziarski June 16th, 2010 @ 10:27 PM

    • Tag cleared.
    • Assigned user changed from “José Valim” to “Michael Koziarski”

    I've added #4878 to handle the truncate case, please use that ticket for any further discussions of that particular helper.

    As for the remaining helpers, any takers for a patch to switch the option name over to :sanitize, and default it to true?

  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 10:29 PM

    • Assigned user changed from “Michael Koziarski” to “Santiago Pastorino”

    Bah, sorry about messing with the "tag" and "assigned user" fields... all I did was type a reply, but it conflicted with the reply that Michael just sent...

    As for your comments, Michael, interesting points. Too sleepy to reply right now though, so best not to say anything.

  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 10:29 PM

    • Assigned user changed from “Santiago Pastorino” to “Michael Koziarski”

    Doh, there we go again... another mid-air collision...

  • Michael Koziarski

    Michael Koziarski June 16th, 2010 @ 10:37 PM

    Lighthouse needs to learn about optimistic locking ;)

    I'll be on irc all day my time if you want to chat further, but really I think these are the only reasonable and safe options to proceed.

  • Wincent Colaiuta

    Wincent Colaiuta June 16th, 2010 @ 10:45 PM

    Ok, one more post before bed.

    Seeing as I'm the one who originally posted this, I'd obviously be happy if the change to truncate() that you're advocating goes through.

    About the other methods, I've never actually used them so my interest in them is purely academic. But the comments here have made me wonder if the blanket statement that "all Rails helpers return safe strings" is probably a little too, er, emphatic.

    I mean, while it makes perfect sense for a method like "link_to()" (and many others) to return an HTML-safe string, methods like "truncate()" should probably be used just like they always were:

    ie. if you didn't trust the input, you did h(truncate(input))...

    And sometimes even if you did "trust" it, you never knew if it could cut a "safe" tag in half and produce invalid HTML, so if you wanted to cover your butt you had to act accordingly. Feed HTML fragments into it at your own risk.

    Cause "truncate" someone seems lower level than that of a typical helper... it shouldn't even know about HTML. It should just do one thing: chop strings.

    I guess what I'm saying is that while it is a noble goal to be "safe by default", there really are some methods where the user needs to be aware that they are transforming the input, and unless you know and control the input exactly, that transformation might produce unexpected results (not only maliciously unsafe HTML, but unmaliciously invalid HTML). So this and other methods like it, I wonder if they fall into the "shoot yourself in the foot" category. For these the sanitize-by-default idea is probably sensible, as long as there is a way of opting out of it like you say.

  • Michael Koziarski

    Michael Koziarski June 16th, 2010 @ 11:08 PM

    yes, I agree with that statement. The goal shouldn't be 'all helpers are safe' but rather 'all helpers that can be are safe'.

    The truncate case being a great case in point, there's just not really any way to guarantee that for all possible offsets, the truncated string will be safe. In those cases we just have to be sure that the result won't destroy your app, and the safest way to do that is to leave it as unsafe so it gets escaped. Ideally those cases can be limited to a few helpers, the previous situation where you called textilize and got escaped html was shitty.

  • DHH

    DHH June 17th, 2010 @ 03:42 PM

    • State changed from “open” to “resolved”

    Solved in #4878

  • Santiago Pastorino

    Santiago Pastorino June 17th, 2010 @ 05:00 PM

    • Tag set to patch
    • State changed from “resolved” to “open”

    I've made text_helpers methods which return html to return it as safe and sanitize the input always unless :sanitize => false is set as Koz requested :)

  • Repository

    Repository June 17th, 2010 @ 07:34 PM

    • State changed from “open” to “committed”

    (from [84d387bc0f3f3f6641b08d0ce40e924f09105c19]) Make text_helpers methods which return valid html to return it as safe and sanitize the input always unless :sanitize => false is set

    [#4825 state:committed]

    Signed-off-by: David Heinemeier Hansson david@loudthinking.com
    http://github.com/rails/rails/commit/84d387bc0f3f3f6641b08d0ce40e92...

  • Jeremy Kemper

    Jeremy Kemper October 15th, 2010 @ 11:01 PM

    • Milestone set to 3.0.2
    • Importance changed from “” to “Low”

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

<h2 style="font-size: 14px">Tickets have moved to Github</h2>

The new ticket tracker is available at <a href="https://github.com/rails/rails/issues">https://github.com/rails/rails/issues</a>

Tags

Referenced by

Pages