How to Sanitize HTML Using JavaScript: Prevent XSS Attacks

Robin
Updated on May 5, 2023

Hackers can inject malicious code into your website through HTML and steal sensitive information from your users. This is known as a cross-site scripting (XSS) attack.

To prevent XSS attacks, you have to sanitize user-generated HTML before rendering it on your web pages. This protects your website and keeps your users' information from being stolen.

HTML sanitization involves removing or escaping any dangerous or unwanted HTML tags, attributes, and JavaScript code from user-generated HTML content.
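To make the "escaping" half of that definition concrete, here is a minimal sketch. The helper name escapeHtml is hypothetical and not part of any library discussed in this post. It turns the characters that matter in HTML into entities, so the input shows up as plain text instead of being parsed as markup.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// so user input is displayed as text rather than interpreted as markup.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
}

function escapeHtml(text) {
  // String() lets the helper accept numbers and other non-string values.
  return String(text).replace(/[&<>"']/g, (char) => HTML_ESCAPES[char])
}

const escaped = escapeHtml('<script>alert("hello")</script>')
console.log(escaped)
// &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
```

Escaping like this lets no HTML through at all. When users are allowed to submit formatted HTML, you need a sanitizer that keeps the safe markup and drops the rest.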

There are many ways to sanitize HTML using JavaScript in the front end of your website. In this blog post, I will focus on DOMPurify and sanitize-html, two of the most popular libraries for sanitizing HTML, and show how to use them.

Using DOMPurify to Sanitize HTML in JavaScript

The DOMPurify package is a DOM-only sanitizer for HTML. It is very fast and works in all modern browsers. You don't need Node.js to use this package.

Download the minified JavaScript file (purify.min.js) from the dist folder of its GitHub repository, or copy the file's contents and save them to a JavaScript file of your own.

Then link that file to your HTML page with a script tag. Any script loaded after it on the same page can use the global DOMPurify object.

const dirty = '<h2>Hello World!</h2><script>alert("hello")</script>'

const clean = DOMPurify.sanitize(dirty)

console.log(clean)
// <h2>Hello World!</h2>

Suppose you have HTML generated by a user. Anyone can inject JavaScript into it, and if you render it without sanitization, the browser will execute that JavaScript as well.

Because you have linked the purify.min.js file to your HTML page, you have access to the sanitize() method of the DOMPurify object.

Pass your user-generated HTML code to this method as its argument and it will remove or escape any potentially dangerous or unwanted HTML tags, attributes, and JavaScript code.
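As a minimal sketch of where that call might sit in a page: the function name renderUserHtml and the 'comments' element id are hypothetical. The purify parameter stands for the global DOMPurify object, passed in as an argument so the helper does not depend on how the library was loaded.

```javascript
// Hypothetical helper: sanitize user HTML, then insert it into the page.
// `purify` is the DOMPurify object (anything with a compatible sanitize()).
function renderUserHtml(container, dirty, purify) {
  const clean = purify.sanitize(dirty)
  // Only the sanitized string ever reaches innerHTML.
  container.innerHTML = clean
  return clean
}

// In a browser page that has loaded purify.min.js (element id is an assumption):
// renderUserHtml(document.getElementById('comments'), userHtml, DOMPurify)
```

Keeping the sanitize() call and the innerHTML assignment in one place makes it harder to render the raw string by accident somewhere else.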


How Does DOMPurify Sanitize HTML?

The DOMPurify library uses a whitelist approach: it allows only a predefined set of safe HTML tags and attributes and removes anything else that is potentially dangerous or unwanted.

By default, it allows common safe HTML tags such as <a>, <b>, <i>, and <img>, and safe attributes like href, src, alt, and title.
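The whitelist idea can be illustrated with a toy sketch. This is not DOMPurify's actual implementation, and the two lists are far shorter than its real defaults. The element objects are a hypothetical simplified representation, not real DOM nodes.

```javascript
// Toy illustration of whitelist filtering. NOT DOMPurify's real code.
const ALLOWED_TAGS = new Set(['a', 'b', 'i', 'img'])
const ALLOWED_ATTRS = new Set(['href', 'src', 'alt', 'title'])

// `element` is a hypothetical plain object: { tag: 'a', attrs: { href: '...' } }
function filterElement(element) {
  const tag = element.tag.toLowerCase()
  if (!ALLOWED_TAGS.has(tag)) {
    return null // tag is not on the list, so the element is dropped
  }
  const attrs = {}
  for (const [name, value] of Object.entries(element.attrs)) {
    if (ALLOWED_ATTRS.has(name.toLowerCase())) {
      attrs[name] = value // keep only attributes that are on the list
    }
  }
  return { tag, attrs }
}

console.log(filterElement({ tag: 'script', attrs: {} }))
// null
console.log(filterElement({ tag: 'img', attrs: { src: 'cat.png', onerror: 'alert(1)' } }))
// { tag: 'img', attrs: { src: 'cat.png' } }
```

A real sanitizer also has to inspect attribute values (for example, a javascript: URL inside href) and parse the HTML correctly, which is why it is safer to rely on a maintained library than on a sketch like this.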

You can also customize the whitelist to include or exclude specific tags or attributes, and the library provides options to control the level of sanitization. Pass an object with your customization options as the second argument to the sanitize() method.

const clean = DOMPurify.sanitize(dirty, {
    // customization options
})

You can learn all the customization options this library supports in their official documentation.
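As a hedged example of what such an options object might contain: ALLOWED_TAGS and ALLOWED_ATTR are DOMPurify options, but the particular lists below are only an illustration, and the names commentConfig and sanitizeComment are hypothetical. The DOMPurify object is passed in as a parameter.

```javascript
// Options object for DOMPurify.sanitize(). ALLOWED_TAGS and ALLOWED_ATTR
// are DOMPurify options; the specific lists here are only an example.
const commentConfig = {
  ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a'],
  ALLOWED_ATTR: ['href', 'title'],
}

// Hypothetical wrapper; `purify` is the DOMPurify object.
function sanitizeComment(dirty, purify) {
  return purify.sanitize(dirty, commentConfig)
}

// In a browser page that has loaded purify.min.js:
// const clean = sanitizeComment('<b>Hi</b><img src="x.png">', DOMPurify)
```

Keeping the options in one named object means every place that sanitizes comments applies the same rules.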



Using sanitize-html For Sanitizing HTML in JavaScript

If you are using a module bundler like Webpack or Vite on your website, you can install the sanitize-html package and use it to clean your HTML code. Unlike DOMPurify, it is not a DOM-only library.

This package is built for Node.js, which is why you need a module bundler to use it in the browser.

Install the sanitize-html package with the following command:

# With NPM
npm install sanitize-html

# Or with Yarn
yarn add sanitize-html

Import this package into your JavaScript file. It will give you a function to sanitize the HTML string.

import sanitizeHtml from 'sanitize-html'

const dirty = '<h2>Hello World!</h2><script>alert("hello")</script>'

const clean = sanitizeHtml(dirty)

console.log(clean)
// <h2>Hello World!</h2>

Call the sanitizeHtml() function with your HTML string and it will return clean HTML with all unwanted tags and attributes removed.

This function comes with a default set of allowed tags and attributes. You can override them by passing an object with the options your application needs as the second argument.

const clean = sanitizeHtml(dirty, {
    // Customization options
})

You will find all the customization options for this package in the official documentation.
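For illustration, an options object might look like the sketch below. allowedTags and allowedAttributes are sanitize-html options, but the lists are only an example, and the names commentOptions and cleanComment are hypothetical. The sanitizeHtml function is passed in as a parameter, standing for the function imported from 'sanitize-html'.

```javascript
// allowedTags and allowedAttributes are sanitize-html options;
// the specific lists here are only an example.
const commentOptions = {
  allowedTags: ['b', 'i', 'em', 'strong', 'a'],
  allowedAttributes: {
    a: ['href'], // attributes are allowed per tag
  },
}

// Hypothetical wrapper; `sanitizeHtml` is the function imported from 'sanitize-html'.
function cleanComment(dirty, sanitizeHtml) {
  return sanitizeHtml(dirty, commentOptions)
}

// In a bundled project:
// import sanitizeHtml from 'sanitize-html'
// const clean = cleanComment('<b>Hi</b><img src="x.png">', sanitizeHtml)
```

Note that allowedAttributes is keyed by tag name, so an attribute allowed on one tag is not automatically allowed on another.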


Conclusion

You already know why HTML sanitization is important for a website. If you don't sanitize your user-generated HTML code properly, your website will remain at risk.

Writing your own function for this purpose is difficult and time-consuming. That's why libraries like DOMPurify and sanitize-html provide you with easy-to-use solutions for sanitizing HTML input.

DOMPurify is a powerful, DOM-only JavaScript library for sanitizing HTML strings. You can easily add it to your HTML pages without Node.js or a module bundler.

On the other hand, sanitize-html is a lightweight package for HTML sanitization. However, it targets Node.js and requires a module bundler to run in browsers.

Both libraries have their own strengths and weaknesses. Choose whichever fits your setup to sanitize HTML using JavaScript and protect your website from cross-site scripting (XSS) attacks.
