
#683 ✓committed
sauce

Problem with RailsSanitize.white_list_sanitizer.sanitize

Reported by sauce | July 23rd, 2008 @ 03:14 PM | in 2.x

Problem with RailsSanitize.white_list_sanitizer.sanitize.

Example:

RailsSanitize.white_list_sanitizer.sanitize("<a href=\"http://www.domain.com?var1=1&var2=2\">my link</a>") is OK and gives "<a href=\"http://www.domain.com?var1=1&amp;var2=2\">my link</a>"

RailsSanitize.white_list_sanitizer.sanitize("<a href=\"http://www.domain.com?var1=1&amp;var2=2\">my link</a>") is BAD and gives "<a href=\"http://www.domain.com?var1=1&amp;amp;var2=2\">my link</a>"

Because I sanitize every time I save my model, Rails escapes the "&" again on each save, so the string grows every time (&amp;amp;amp;amp;...).

Sorry for my English, I am French!
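The behavior the report asks for is idempotence: sanitizing an already sanitized string should leave it unchanged. A minimal sketch of that idea follows, assuming a hypothetical helper named `escape_bare_ampersands`. It is not the Rails implementation and not the committed patch. It only illustrates escaping an "&" when it does not already start a well-formed entity.

```ruby
# Hypothetical helper for illustration only; not part of Rails.
# Matches an "&" that is NOT followed by a well-formed named entity
# (e.g. "amp;"), a decimal reference ("#171;") or a hex reference ("#xAB;").
BARE_AMPERSAND = /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#[xX][0-9a-fA-F]+);)/

def escape_bare_ampersands(text)
  # Only bare ampersands are rewritten, so existing entities survive.
  text.gsub(BARE_AMPERSAND, "&amp;")
end

url   = "http://www.domain.com?var1=1&var2=2"
once  = escape_bare_ampersands(url)
twice = escape_bare_ampersands(once)

puts once           # http://www.domain.com?var1=1&amp;var2=2
puts once == twice  # true
```

Note that this sketch treats anything shaped like an entity as an entity, whether or not the name is actually defined in HTML.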

Comments and changes to this ticket

  • antonmos

    antonmos August 27th, 2008 @ 07:50 PM

    • Tag set to 2.0-stable, 2.1, patch, sanitize

    This change set fixes this issue

  • Tietew

    Tietew September 2nd, 2008 @ 06:11 AM

    It does not seem to be fixed. CGI.unescape does not know many named entities, nor character references above U+FFFF.

    &amp;#131072; is a &amp;laquo;kanji&amp;raquo; gives &amp;amp;#131072; is a &amp;amp;laquo;kanji&amp;amp;raquo;

    U+20000 is valid character.

    In category "CJK Unified Ideographs Extension B"

    GOOD result is: &amp;#131072; is a &amp;laquo;kanji&amp;raquo; or &amp;#131072; is a &amp;#171;kanji&amp;#187; or ? is a &laquo;kanji&raquo;

  • Tietew

    Tietew September 2nd, 2008 @ 06:15 AM

    Oops, tags are stripped...

    It does not seem to be fixed. CGI.unescape does not know many named entities, nor character references above U+FFFF.

    
    <img alt="&#131072; is a &laquo;kanji&raquo;" ...>
    gives
    <img alt="&amp;#131072; is a &amp;laquo;kanji&amp;raquo;" ...>
    

    U+20000 is a valid character, in the category "CJK Unified Ideographs Extension B".

    
    GOOD result is:
    <img alt="&#131072; is a &laquo;kanji&raquo;" ...>
    or
    <img alt="&#131072; is a &#171;kanji&#187;" ...>
    or
    <img alt="? is a «kanji»" ...>
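
The third "GOOD result" above corresponds to decoding the numeric references into real characters. As a rough sketch of that step only, assuming a hypothetical helper named `decode_numeric_references` (not a Rails or CGI method), decimal and hexadecimal references can be decoded in plain Ruby, including code points above U+FFFF such as 131072 (U+20000). Named entities like &laquo; are deliberately left alone here.

```ruby
# Hypothetical helper for illustration only; not part of Rails or CGI.
# Decodes decimal (&#171;) and hexadecimal (&#xAB;) character references.
def decode_numeric_references(text)
  text.gsub(/&#(?:(\d+)|[xX]([0-9a-fA-F]+));/) do
    codepoint = $1 ? $1.to_i : $2.to_i(16)
    # pack("U") builds a UTF-8 string from a code point, including
    # code points outside the Basic Multilingual Plane.
    # It raises RangeError for values beyond U+10FFFF.
    [codepoint].pack("U")
  end
end

decoded = decode_numeric_references("&#131072; is a &#171;kanji&#187;")
puts decoded == "\u{20000} is a \u00ABkanji\u00BB"  # true
```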
    
  • Ryan McGeary

    Ryan McGeary October 17th, 2008 @ 03:41 AM

    +1 on antonmos's changes. Here's a formatted patch that incorporates the changes to comply with the Rails contribution guidelines.

  • Christopher Murphy

    Christopher Murphy October 17th, 2008 @ 03:55 AM

    +1 This bug has burned me in my app.

    Patch works, tests passed.

  • Coderifous

    Coderifous October 23rd, 2008 @ 01:47 AM

    +1 This bug has ruined my life.

  • theflow

    theflow November 6th, 2008 @ 11:17 AM

    +1

    Patch works great for me. The problem Tietew mentions is a general problem with sanitize and not caused by the proposed patch.

    Any chance to get this into 2.2?

  • Repository

    Repository November 6th, 2008 @ 12:09 PM

    • State changed from “new” to “committed”

    (from [a358d87e16fa876de29286b69474ab6aaee4a80b]) Fixed the sanitize helper to avoid double escaping already properly escaped entities [#683 state:committed] http://github.com/rails/rails/co...

  • Repository

    Repository November 6th, 2008 @ 12:09 PM

    (from [6406a87eedb74a41f19f5ad21ea1b8f97dd45755]) Fixed the sanitize helper to avoid double escaping already properly escaped entities [#683 state:committed] http://github.com/rails/rails/co...

  • iGEL

    iGEL November 14th, 2008 @ 03:15 PM

    Sorry, I haven't tried it yet, but it sounds like a bad idea. Maybe it could be an option, disabled by default.

    But what if the user wants to write & or ä as a comment? As far as I understand, Rails will now ignore these entities, and they will be displayed as & or ä.

  • iGEL

    iGEL November 14th, 2008 @ 03:20 PM

    If you want a demonstration of the problem, look above. Of course, I wanted to write & a m p ; and & a u m l ; without the spaces (trying double escaping here: &amp; &auml;). What the user enters should be what is displayed; they shouldn't have to care about entities and so on.

  • Ryan McGeary

    Ryan McGeary November 14th, 2008 @ 03:32 PM

    iGEL, I disagree. The purpose of the sanitize method is to sanitize HTML input. The ampersand entities are valid HTML. When displayed, they should display the entities they represent, not the literal codes.

    In other words, you should expect [& a m p ;] to be displayed as an ampersand. Just as an HTML tag shouldn't be escaped to its lt and gt entities, entities themselves should not be escaped either.
