Spooky Sanitization Stories: Analyzing the XSS Flaw in Reddit Enhancement Suite

TL;DR

The library that Reddit Enhancement Suite (a browser extension for reddit users) used for HTML sanitization had a bug that bit them pretty hard, enabling DOM-based XSS of 1.5~ million reddit users. RES pushed out a fixed version, and reddit deployed a script to prevent users of the old version from accidentally getting exploited; thus preventing an XSS worm.

Introduction

If you’re a user of Reddit Enhancement Suite, chances are you recently saw this big scary alert() box when you tried to click an expando button:

For those who aren’t familiar with RES, an expando is an inline preview of offsite content, or content that would normally require a clickthrough, that can be viewed by pressing an “expando” button:

A few people have asked questions like “why am I getting that alert?”, “what exactly is this bug?”, “why can’t I just use the vulnerable version anyways?”. Rather than respond to each question separately, I decided to write something that would hopefully answer everyone’s questions at once.

Unusual Beginnings

Interestingly, the most important part of the RES exploit wasn’t found by looking at RES at all. I actually found it by poking around reddit’s markdown library, Snudown. Snudown is mostly written in C, and is a fork of the Sundown markdown library. Snudown departs from Sundown in a number of ways, the most important to us being that it adds the ability to include a restricted subset of HTML alongside your markdown. On reddit, markdown with inline HTML may only be used on the wiki, as it’s intended to allow using HTML tables instead of Sundown’s unwieldly markdown tables.

Taking Apart the Sanitizer

So let’s go into a simplified view of how Snudown did attribute whitelisting. Snudown scanned everything after the tag name, and before a > for valid attr=value pairs, reading everything into the attr buffer as it went. Once Snudown realized it was not dealing with a valid valid/value pair, it would clear the attr buffer and start looking for the next valid pair. Once it decided it had hit the end of the value (by encountering a space outside the quotes, or a > anywhere), it would output everything in the attr buffer, clear it, then continue parsing attributes. Some interesting consequences of the process, <table scope== scope=bar> was sanitized to <table scope=bar>, and <table bad=scope="bar"> was sanitized to <table scope="bar">.

Those outputs aren’t consistent with most HTML parsers, but the biggest issue was how it handled quotes: Snudown saw <table scope=a' bar=baz'> as a table with a single scope attribute, but every mainstream browser sees this as a table with both scope and bar attributes. Quotes are only treated as attribute delimiters when they occur at the beginning of the value, otherwise whitespace is the delimiter. Since Snudown was outputting every validated attr/value pair verbatim, we could abuse this behaviour to sneak attributes like onmouseover by the whitelist!

So wait, this was an XSS on reddit’s wikis?

No. See, even though Snudown performs its own sanitization on inline HTML, Snudown’s output generally isn’t trusted within reddit’s codebase. All of the HTML that comes out of Snudown gets put through a SAX / BeautifulSoup-based sanitizer that validates the HTML and reserializes it in a way that’s unambiguous across browsers. For example, the ambiguous:

<table scope=`bar cellspacing='` baz=heyIE ignore=a'>

passes both validation steps, but becomes the unambiguous:

<table scope="`bar" cellspacing="` baz=heyIE ignore=a'">

when reserialized by reddit’s SAX sanitizer.

To reiterate, though reddit used Snudown’s wiki rendering mode, reddit was never vulnerable to XSS due to additional precautions taken with its output.

But what does any of this have to do with RES’ image expandos?

I knew that reddit itself wasn’t vulnerable, so before I did anything, I wanted to check if anyone else was using Snudown’s wiki rendering mode in production, outside of users of the reddit codebase. One thing that kept popping up was SnuOwnd, a faithful port of Snudown (with all its quirks) to JS. As some of you may have noticed from the RES changelogs, the Reddit Enhancement Suite also includes SnuOwnd. RES actually uses SnuOwnd for a number of things, and that used to include HTML sanitization:

RESUtils.sanitizeHTML = function(htmlStr) {
    if (!this.sanitizer) {
        var SnuOwnd = window.SnuOwnd;
        var redditCallbacks = SnuOwnd.getRedditCallbacks();
        var callbacks = SnuOwnd.createCustomCallbacks({
            paragraph: function(out, text, options) {
                if (text) out.s += text.s;
            },
            autolink: redditCallbacks.autolink,
            raw_html_tag: redditCallbacks.raw_html_tag
        });
        var rendererConfig = SnuOwnd.defaultRenderState();
        rendererConfig.flags = SnuOwnd.DEFAULT_WIKI_FLAGS;
        rendererConfig.html_element_whitelist = [
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span', 'div', 'code',
            'br', 'hr', 'p', 'a', 'img', 'pre', 'blockquote', 'table',
            'thead', 'tbody', 'tfoot', 'tr', 'th', 'td', 'strong', 'em',
            'i', 'b', 'u', 'ul', 'ol', 'li', 'dl', 'dt', 'dd',
            'font', 'center', 'small', 's', 'q', 'sub', 'sup', 'del'
        ];
        rendererConfig.html_attr_whitelist = [
            'href', 'title', 'src', 'alt', 'colspan',
            'rowspan', 'cellspacing', 'cellpadding', 'scope',
            'face', 'color', 'size', 'bgcolor', 'align'
        ];
        this.sanitizer = SnuOwnd.getParser({
            callbacks: callbacks,
            context: rendererConfig
        });
    }
    return this.sanitizer.render(htmlStr);
};

//...snip

$.fn.safeHtml = function(string) {
    if (!string) return '';
    else return $(this).html(RESUtils.sanitizeHTML(string));
}

Eep.

Even if we can’t get an XSS on reddit.com proper, RES is still a pretty juicy target. With an install base of 1.5~ million users — which includes a good chunk of reddit’s moderators — an XSS in RES could do a lot of damage.

Finding the Attack Vector

Now all that’s left is to find where safeHTML or sanitizeHTML are passed untrusted data, and we’ve got ourselves an XSS via extension. If it wasn’t apparent from the alert dialog, that injection point is in RES’ expandos:

generateTextExpando: function(expandoButton) {
    var imageLink = expandoButton.imageLink;
    var wrapperDiv = document.createElement('div');
    wrapperDiv.className = 'usertext';

    var imgDiv = document.createElement('div');
    imgDiv.className = 'madeVisible usertext-body';

    var header = document.createElement('h3');
    header.className = 'imgTitle';
    $(header).safeHtml(imageLink.imageTitle);
    imgDiv.appendChild(header);

    //...snip
    expandoButton.expandoBox = imgDiv;
    //...snip
}

imageLink.imageTitle is completely controlled by the attacker, and provided we can get one of RES’ supported sites to serve our Snudown-evading payload, RES will inject it into the DOM.

RES supports expanding text posts from Tumblr inline, and Tumblr allows us to use valid HTML in post titles, so if we made a post with the title Foobar <img src=foo' onerror="alert(1)" ' />, alert(1) would be called when they expanded our link:

Worst-case Scenario

This was about as bad as it gets without having a zero-interaction XSS. Comment pages have a “Show Images” button that expands all images on the page, and those get used a lot for picture threads. Had someone posted a child comment to one of the top comments with a link to the payload, they could have easily started an XSS worm that spammed links to the payload in other threads. Once it spread, the worm could have done things like upvote specific submissions, nuke a subreddit if a moderator got exploited, etc.

The Fix

This bug required fixes in a number of places to keep it fully closed. First, Snudown’s HTML sanitization was changed to first parse the attributes, then unambiguously reserialize its interpretation. That fix was then ported to SnuOwnd’s JS implementation.

Secondly, RES was changed to use a custom HTML sanitizer based on DOMParser since things like href sanitization were out of scope for Snudown. I’m not super happy with this filter, and I think Google Caja should be used in the future, but this one had to go in due to time constraints.

Third, since the issue was so trivial to exploit, and had such high impact, it was necessary to block users still on vulnerable versions of RES from opening expandos to prevent an XSS worm from spreading. reddit ended up doing this on their end by detecting attempts to open the expandos and blocking it based on a version number RES places in the DOM:

r.resAdvisory = {};
r.resAdvisory.minResVersion = [4, 3, 2, 1];
r.resAdvisory.checkRESClick = function (e) {
  if (e.target.id == "viewImagesButton" || $(e.target).hasClass("expando-button")) {
      if(r.resAdvisory.checkRESVersion()) {
          document.body.removeEventListener("click", r.resAdvisory.checkRESClick, true)
      } else {
          e.preventDefault();
          e.stopPropagation();
          alert("The version of Reddit Enhancement Suite you are using has a bug which makes expanding posts insecure to use. Please update Reddit Enhancement Suite to continue using post expandos.  Please visit /r/Enhancement for more information."));
      }
  }
};
r.resAdvisory.checkRESVersion = _.memoize(function () {
    if (!$("#RESMainGearOverlay").length) return true;
    var version = $("#RESConsoleVersion").text();
    if (!version)
      return false;
    version = version.substring(1).split(".");
    version = _.map(ver, function (num) {
        return parseInt(num);
    });
    return version >= r.resAdvisory.minResVersion;
});
r.resAdvisory.init = function () {
    if(document.body.addEventListener)
      document.body.addEventListener("click", r.resAdvisory.checkRESClick, true);
}

Closing Notes

If you use a markup library that generates HTML, putting an HTML filter in front of it as a failsafe is a good idea.
If you don’t have to insert untrusted HTML into the DOM, it’s best not to. For example, most of the image titles and captions RES handles aren’t even meant to be interpreted as HTML. Putting the rest of the content in a sandboxed iframe would also have mitigated the issue, and I’ve recommended doing that in a future version of RES.
Be careful about messing with the page’s DOM at all if you don’t have to, the page can see everything that you add or remove.
Sanitizing untrusted HTML is tricky business. If you need to roll your own sanitizer (and please don’t), make sure that you fully parse the document, then reserialize it so it’s unambiguous across parsers. Checking that it looks sane in your parser is not enough.
For developers of popular web platforms: It might make sense to audit extensions / third party clients targeted at your users to reduce the amount of cleanup needed should they get exploited.

Defined Misbehaviour

Web security, programming, reverse-engineering, and everything related.