How to Sanitize HTML in Node.js Express Server Properly

Robin
Updated on May 9, 2023

Cross-site scripting (XSS) is one of the most common vulnerabilities in web applications. It happens when an attacker injects malicious code into a website through user input, and that code later runs in another visitor's browser.

You can prevent this kind of attack in your Node.js server by sanitizing user data such as HTML. In an Express server, user data usually arrives through req.body, req.params, and the query string (req.query).

That's why it is very important to sanitize HTML or any other user inputs in Node.js before using them in our application or saving them in our database.

I will show you how to correctly sanitize user-generated HTML strings in a Node.js Express server. We will use the sanitize-html package, a powerful and lightweight package that works well in an Express server.

Sanitize HTML in Node.js Using sanitize-html Package

The sanitize-html package provides a function that takes a string and sanitizes it by removing any unwanted HTML tags and potentially dangerous code.

Therefore, if anyone tries to inject any dangerous code, this function will remove it from the string. Now it is safe to use this string in your application.

Install the sanitize-html package by running the following command in your Node.js project directory:

npm install sanitize-html
        

You can require this module in your file. It will give you access to the function necessary for HTML sanitization.

const sanitizeHtml = require('sanitize-html');

const dirtyHtml = '<h2>Hello World!</h2><script>alert("hello")</script>';

const cleanHtml = sanitizeHtml(dirtyHtml);

console.log(cleanHtml);
// <h2>Hello World!</h2>
        

Here I have an HTML string that contains a <script> tag. An attacker can place any JavaScript code inside this tag, which is dangerous. That's why it is necessary to remove it from our string.

When you call the sanitizeHtml() function and pass it this string, it removes the <script> tag and other risky HTML tags from the string.

By default, this function allows many HTML tags and attributes. You can also pass options to the sanitizeHtml() function as its second argument to customize the sanitization process.

const cleanHtml = sanitizeHtml(dirtyHtml, {
    // Customization options
});
        

This options object has many properties, such as allowedTags, allowedAttributes, allowedClasses, allowedStyles, and selfClosing. You can use these options to override the default behavior and get custom results according to your requirements.
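As a rough illustration, an options object for something like a comment field might look like the sketch below. The name commentOptions and the particular tags chosen are assumptions made for this example, not defaults of the package. The three option names used here (allowedTags, allowedAttributes, and allowedSchemes) are ones that sanitize-html documents.

```javascript
// Hypothetical options for a comment field: a small set of formatting tags,
// links restricted to web and mail addresses, and no other attributes.
const commentOptions = {
    // Tags that are kept; tags not listed here are stripped
    allowedTags: ['b', 'i', 'em', 'strong', 'p', 'a'],

    // Attributes allowed per tag; here only href on <a>
    allowedAttributes: {
        a: ['href'],
    },

    // URL schemes accepted in href values
    allowedSchemes: ['http', 'https', 'mailto'],
}
```

Passing it as the second argument, as in sanitizeHtml(dirtyHtml, commentOptions), would then keep only those tags and attributes.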

You will find all the customization options for this package in its official documentation.



Implementing HTML Sanitization in Node.js Express Server

Besides sanitizing HTML strings in plain Node.js, it is also easy to do the same in an Express server. The process is the same; the only difference is where we call the package on our server.

When we submit a form, it sends a POST request to our server. You can access the form data from the req.body property. You don't know what a user is sending to your server. That's why you should sanitize it before doing anything with it.

Sanitizing req.body in Express:

const express = require('express')
const bodyParser = require('body-parser')
const sanitizeHtml = require('sanitize-html')

const app = express()

app.use(bodyParser.json())

// ...

app.post('/message', (req, res) => {
    const { message } = req.body

    const clean = sanitizeHtml(message)

    // Save it to database

    res.json({
        success: true,
        data: 'Message received',
    })
})

// ...

app.listen(3000, () => {
    console.log('Server listening on port 3000...')
})
        

Suppose you have a POST route called /message where users can submit messages. First, get the message text from the req.body property.

Now pass the text to the sanitizeHtml() function. It will return clean and safe user input. If anyone tries to inject a script into the message, this package will remove it.
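Since the body can hold anything, it can help to check the type of a field before sanitizing it. The sketch below shows one way to do that. The helper name readMessage and the 2000-character limit are hypothetical choices for this example, not part of Express or sanitize-html.

```javascript
// Hypothetical helper: pulls `message` out of a parsed request body and
// rejects anything that is not a non-empty string of reasonable length.
function readMessage(body, maxLength = 2000) {
    // body is undefined when no body parser ran or nothing was sent
    if (!body || typeof body.message !== 'string') {
        return { ok: false, error: 'message must be a string' }
    }

    const message = body.message.trim()

    if (message.length === 0 || message.length > maxLength) {
        return { ok: false, error: `message must be 1 to ${maxLength} characters` }
    }

    return { ok: true, message }
}

console.log(readMessage({ message: '  <b>Hello</b>  ' }))
// { ok: true, message: '<b>Hello</b>' }
console.log(readMessage({ message: ['not', 'a', 'string'] }))
// { ok: false, error: 'message must be a string' }
```

In the /message route, the handler could respond with status 400 when ok is false and pass message to sanitizeHtml() otherwise.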


Sanitizing req.params and req.query in Express:

app.get('/posts/:id', (req, res) => {
    // Sanitizing the route parameter
    let { id } = req.params;
    id = sanitizeHtml(id);

    // Sanitizing the query string value
    let { action } = req.query;
    action = sanitizeHtml(action);

    // Get the post from database

    res.json({
        success: true,
        data: 'Send post data',
    })
});
        

Here I have a GET route to access a single post. This route has a parameter called id and accepts an action query string parameter. You can read both of these values from the req.params and req.query properties.

Now pass them to the sanitizeHtml() function for sanitization. This way you can sanitize both request parameters and query strings in the Express server.
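One detail worth keeping in mind is that a query string value is not always a string. A repeated key such as ?action=a&action=b reaches the handler as an array. The helper below is a small sketch for reducing such a value to a single string before sanitizing it. The name firstString is hypothetical and is not an Express API.

```javascript
// Hypothetical helper: reduce a req.query value to one string.
// Returns the value itself if it is a string, the first string element
// if it is an array, and an empty string for anything else.
function firstString(value) {
    if (typeof value === 'string') {
        return value
    }

    if (Array.isArray(value)) {
        const found = value.find((item) => typeof item === 'string')
        return found === undefined ? '' : found
    }

    return ''
}

console.log(firstString('edit'))              // edit
console.log(firstString(['edit', 'delete']))  // edit
console.log(firstString({ nested: 'edit' }))  // prints an empty line
console.log(firstString(undefined))           // prints an empty line
```

Inside the route, action = sanitizeHtml(firstString(req.query.action)) would then always hand a string to the sanitizer.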



Create an Express Middleware Function for Sanitizing Data

Sanitizing req.body, req.params, and req.query by hand in every route quickly becomes painful, because you have to call the sanitize-html package for each value one by one.

But we can simplify this process by creating a middleware function. This function will automatically sanitize everything from incoming requests for every route.

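const sanitizeHtml = require('sanitize-html')

// Recursively sanitize every string inside a value.
// Walking the data avoids a JSON.stringify/JSON.parse round trip, which can
// break when a value contains markup with quoted attributes.
const clean = (data) => {
    if (typeof data === 'string') {
        return sanitizeHtml(data, {
            // Configuration options
        })
    }

    if (Array.isArray(data)) {
        return data.map((item) => clean(item))
    }

    if (data !== null && typeof data === 'object') {
        return Object.fromEntries(
            Object.entries(data).map(([key, value]) => [key, clean(value)])
        )
    }

    // Numbers, booleans, null, and undefined pass through unchanged
    return data
}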

        

First, create this utility function called clean() in your project. This function takes one parameter and sanitizes the strings inside it using the sanitizeHtml() function.

Finally, it returns the clean data. You will call this function from the middleware.

Creating middleware function:

const sanitize = () => {
    return (req, res, next) => {
        // Only clean plain, non-empty objects. The null check comes first
        // because req.body is undefined when no body parser has run.
        const shouldClean = (value) =>
            value != null &&
            value.constructor === Object &&
            Object.keys(value).length > 0

        if (shouldClean(req.body)) {
            req.body = clean(req.body)
        }

        if (shouldClean(req.query)) {
            req.query = clean(req.query)
        }

        if (shouldClean(req.params)) {
            req.params = clean(req.params)
        }

        next()
    }
}

        

Now define a middleware function called sanitize that sanitizes the request body, query parameters, and route parameters.

Check if the req.body, req.query, and req.params objects exist and are not empty. If they pass the if check, call the clean() function with each object.

Finally, call the next function to pass control to the next middleware or route handler in the Express middleware chain.

Registering the custom middleware function in Express:

const app = express()

// ...

app.use(sanitize())

// ...
        

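To add the sanitize() middleware to your Express server, call the app.use() method. Register it after the body parser so that req.body is already populated. From then on, the middleware runs for every incoming request to the app.

Therefore, you don't have to sanitize req.body and req.query manually in every route. One caveat: Express fills in req.params only when a route matches, so middleware registered with app.use() sees an empty req.params object. To sanitize route parameters as well, add sanitize() to the route itself, for example app.get('/posts/:id', sanitize(), handler), where handler is your route handler.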
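A middleware like this can be tried out without starting a server by calling it with stand-in request objects. The sketch below does that with a simplified factory. The names makeSanitizer and escapeAngleBrackets and the fake req object are hypothetical, and the escaping function is only a dependency-free placeholder for the clean() function described earlier.

```javascript
// Placeholder for clean(): escapes angle brackets in every string of a
// flat object. It only keeps this sketch free of dependencies and is not
// a substitute for sanitize-html.
const escapeAngleBrackets = (data) =>
    Object.fromEntries(
        Object.entries(data).map(([key, value]) => [
            key,
            typeof value === 'string'
                ? value.replace(/</g, '&lt;').replace(/>/g, '&gt;')
                : value,
        ])
    )

// Simplified middleware factory that takes the cleaning function as an argument
const makeSanitizer = (cleanFn) => (req, res, next) => {
    for (const key of ['body', 'query', 'params']) {
        const value = req[key]
        if (value != null && value.constructor === Object) {
            req[key] = cleanFn(value)
        }
    }
    next()
}

// Fake request: body is filled in, query is empty, params is missing
const req = { body: { message: '<script>alert(1)</script>' }, query: {} }
let nextCalled = false

makeSanitizer(escapeAngleBrackets)(req, {}, () => {
    nextCalled = true
})

console.log(req.body.message) // &lt;script&gt;alert(1)&lt;/script&gt;
console.log(nextCalled)       // true
```

Taking the cleaning function as an argument is a design choice made for this sketch: it lets the same check run with a placeholder here and with the real clean() function in the application.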



Conclusion

You now know how important sanitization is for your application's security. If you want to protect your server from cross-site scripting (XSS) attacks, you must clean every user input.

The sanitize-html package makes the whole process very easy and simple. You can install and use it in Node.js applications. You can also use it in your Express server.

A convenient way to implement sanitization in an Express server is to create a middleware function. Registered with app.use(), this middleware sanitizes HTML and user data from req.body and req.query automatically, and it can be added to individual routes to cover req.params as well.
