Anti-XSS .NET libraries in 2013
One of the more important parts in Roadkill Wiki is removing malicious HTML from the markup that’s entered, even when the markup (Creole and Markdown) is controlled.
The markup you enter is converted into HTML and then sanitized internally to remove any bad HTML. The markup can also contain HTML itself - the Creole parser allows this, as does the Markdown parser to an extent.
For the most part Roadkill Wiki doesn’t need this level of sophistication in its text parsing, as its target audience is mostly internal wikis for businesses or personal use. I keep the extra protection in because I prefer to think that the one or two wikis that are public won’t run the risk of having scripts injected into the page. One small incident like that and Roadkill is forever known as the .NET wiki with a security vulnerability, something phpBB has suffered from over the years.
ASP.NET takes a sledgehammer approach to the problem out of the box: request validation, which disallows any HTML being posted from a page. Once you turn this feature off, your options besides basic HTML encoding are restricted to a couple of libraries: the Microsoft Web Protection Library and the OWASP AntiSamy project.
The first library is a bit of an embarrassment to Microsoft - it’s broken and looks like it will stay that way for a bit. It stripped away some safe HTML of mine and, judging by the feedback, is doing the same for everyone else. After playing with it in prototype apps for a few hours I gave up.
The second library is an OWASP project and is a lot better thought out and more thorough. In its own words, “The Open Web Application Security Project (OWASP) is a 501(c)(3) worldwide not-for-profit charitable organization focused on improving the security of software”, and it is where most of the security advice that Roadkill incorporates comes from. However, the AntiSamy project was real overkill for what I was after, and would’ve involved a lot of work to integrate into Roadkill.
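For comparison, the basic HTML encoding option mentioned earlier is a one-liner in .NET. The sketch below uses `System.Net.WebUtility.HtmlEncode`; the class and method names are made up for illustration and are not Roadkill code. It shows why encoding alone doesn't suit a wiki: harmless tags are neutralised along with the dangerous ones.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only; not part of Roadkill.
public static class BasicEncodingSketch
{
    // Encodes every HTML-significant character, so no tag survives,
    // whether it is a harmless <b> or a malicious <script>.
    public static string Encode(string userHtml)
    {
        return WebUtility.HtmlEncode(userHtml);
    }
}
```

With this approach `<b>bold</b>` comes out as `&lt;b&gt;bold&lt;/b&gt;`, which is safe but shows up as literal text on the page. That is the gap a whitelist-based sanitizer is meant to fill.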
I couldn’t find any other web security frameworks out there - I’m guessing there are quite a few in-house ones floating about for .NET-based websites, but nothing on the open source market. So my solution to the problem was to take the sanitizer from the AJAX Control Toolkit and re-purpose it for what Roadkill needed. The project is only vaguely related to web security, but the sanitizer code is in fact a hidden gem: it’s comprehensive and uses the reliable Html Agility Pack for its HTML parsing.
After a few iterations, this is what Roadkill does/can do with the HTML:
1) Uses an HTML whitelist of elements and attributes. This is stored in an XML file, and any tag or attribute not present in the list is removed. The default list of allowed HTML tags is customizable.
I’ve turned this off by default as it was affecting some people’s URLs that contained ‘script’ in them - the AJAX Control Toolkit’s source code needs some tweaking in this area.
3) Any characters in the HTML that are not alphanumeric are encoded into their hex equivalent. This is in the original AJAX Control Toolkit’s source and is the advice from OWASP.
\n, \r and \t are ignored, and any control characters (<32 in ASCII) or characters in the range 127 to 159 are encoded into the Unicode 0xFFFD. In Unicode, this code point is “used to replace a character whose value is unknown or unrepresentable in Unicode”.
This step is on by default in Roadkill. It makes for bulkier HTML being sent down to the browser, as every non-alphanumeric character is hex encoded, but also ensures that Roadkill sticks fairly closely to the OWASP recommendations.
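Steps 1 and 3 above can be sketched roughly as follows. This is not the AJAX Control Toolkit's or Roadkill's actual code: the class, method names and whitelist contents are hypothetical, and `System.Xml.Linq` stands in for Html Agility Pack so the example needs no extra packages (which means it only accepts well-formed markup and throws on anything else). It also assumes that "removed" means an element is dropped along with its content, that "ignored" means \n, \r and \t pass through unchanged, and that the hex form is an HTML numeric character reference such as `&#x3c;`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Illustrative sketch only; all names and whitelist entries are hypothetical.
public static class SanitizerSketch
{
    // Tag name -> attribute names allowed on that tag (normally read from an XML file).
    private static readonly Dictionary<string, string[]> Whitelist =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "p", new string[0] },
            { "b", new string[0] },
            { "a", new[] { "href" } }
        };

    // Step 1: drop any element or attribute that is not in the whitelist.
    public static string StripDisallowed(string html)
    {
        // A wrapper element lets a fragment with several top-level nodes parse.
        XElement root = XElement.Parse("<root>" + html + "</root>", LoadOptions.PreserveWhitespace);

        // ToList() takes a snapshot so nodes can be removed while looping.
        foreach (XElement element in root.Descendants().ToList())
        {
            string[] allowedAttributes;
            if (!Whitelist.TryGetValue(element.Name.LocalName, out allowedAttributes))
            {
                element.Remove(); // assumption: the element goes together with its content
                continue;
            }

            foreach (XAttribute attribute in element.Attributes().ToList())
            {
                if (!allowedAttributes.Contains(attribute.Name.LocalName, StringComparer.OrdinalIgnoreCase))
                {
                    attribute.Remove();
                }
            }
        }

        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    // Step 3: hex-encode everything that is not an ASCII letter or digit.
    // Works per UTF-16 code unit; characters outside the BMP would need char.ConvertToUtf32.
    public static string HexEncode(string text)
    {
        var output = new StringBuilder(text.Length * 4);
        foreach (char c in text)
        {
            bool alphanumeric = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (alphanumeric || c == '\n' || c == '\r' || c == '\t')
            {
                output.Append(c); // assumption: "ignored" means left as-is
            }
            else if (c < 32 || (c >= 127 && c <= 159))
            {
                output.Append("&#xfffd;"); // control characters become the replacement character
            }
            else
            {
                output.Append("&#x").Append(((int)c).ToString("x")).Append(';');
            }
        }
        return output.ToString();
    }
}
```

For example, `SanitizerSketch.HexEncode("a<b")` gives `a&#x3c;b`, and `StripDisallowed` drops both a `<script>` element and an `onclick` attribute because neither is in the sample whitelist. A real sanitizer has to cope with malformed HTML as well, which is where a forgiving parser like Html Agility Pack earns its keep.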
The AJAX Control Toolkit gets the credit for 99% of the work, and it also includes a very comprehensive set of unit tests - 118 in total! The official Microsoft Web Protection Library would be better off as a rewrite based on the AJAX Control Toolkit’s source, drawing on the OWASP AntiSamy project or at least referencing OWASP’s recommendations.