Writing secure applications with HTML

HTML is used to create interactive sites, care needs to be taken to avoid introducing vulnerabilities through which attackers can compromise the integrity of the site itself or of the site’s users.

A comprehensive study of this matter is beyond the scope of this document, and the authors are strongly encouraged to study the matter in more detail. However, this section attempts to provide a quick introduction to some common pitfalls in HTML application development.

The security model of the Web is based on the concept of “origins”, and correspondingly many of the potential attacks on the Web involve cross-origin actions.

When accepting untrusted input, e.g. user-generated content such as text comments, values in URL parameters, messages from third-party sites, etc, it is imperative that the data be validated before use, and properly escaped when displayed. Failing to do this can allow a hostile user to perform a variety of attacks, ranging from the potentially benign, such as providing bogus user information like a negative age, to the serious, such as running scripts every time a user looks at a page that includes the information, potentially propagating the attack in the process, to the catastrophic, such as deleting all data in the server.

When writing filters to validate user input, it is imperative that filters always be whitelist-based, allowing known-safe constructs and disallowing all other input. Blacklist-based filters that disallow known-bad inputs and allow everything else is not secure, as not everything that is bad is yet known (for example, because it might be invented in the future).

If the attacker then convinced a victim user to visit this page, a script of the attacker’s choosing would run on the page. Such a script could do any number of hostile actions, limited only by what the site offers: if the site is an e-commerce shop, for instance, such a script could cause the user to unknowingly make arbitrarily many unwanted purchases.

This is called a cross-site scripting attack.

There are many constructs of HTML that can be used to try to trick a site into executing code. Here are some that authors are encouraged to consider when writing whitelist filters:

When allowing harmless-seeming elements like img, it is important to whitelist any provided attributes as well. If one allowed all attributes then an attacker could, for instance, use the onload attribute to run an arbitrary script.
When allowing URLs to be provided (e.g. for links), the scheme of each URL also needs to be explicitly whitelisted, as there are many schemes that can be abused. The most prominent example is “javascript:”, but user agents can implement (and indeed, have historically implemented) others.
Allowing a base element to be inserted means any script elements in the page with relative links can be hijacked, and similarly that any form submissions can get redirected to a hostile site.

Visit our blog and check out more articles like this one!