Cross-site scripting (XSS) is one of the most common web application vulnerabilities. It happens when an attacker can inject malicious code into a page through user input.
You can reduce the risk of this attack in your Node.js server by sanitizing user-supplied data such as HTML. In an Express server, user data usually arrives through req.body, req.params, and the query string (req.query).
That's why it is very important to sanitize HTML or any other user input in Node.js before using it in your application or saving it in your database.
I will show you how to correctly sanitize a user-generated HTML string in a Node.js Express server. We will use the sanitize-html package. It is a powerful and lightweight package that works well in an Express server.
Sanitize HTML in Node.js Using sanitize-html Package
The sanitize-html package provides a function that takes a string and sanitizes it by removing unwanted HTML tags and potentially dangerous code.
Therefore, if anyone tries to inject dangerous code, this function will remove it from the string, and the returned string is safe to use in your application.
Install the sanitize-html package by running the following command in your Node.js project directory:
npm install sanitize-html
You can require this module in your file. It will give you access to the function necessary for HTML sanitization.
const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<h2>Hello World!</h2><script>alert("hello")</script>';
const cleanHtml = sanitizeHtml(dirtyHtml);
console.log(cleanHtml);
// <h2>Hello World!</h2>
Here I have an HTML string that contains a <script> tag. An attacker can use this tag to run arbitrary JavaScript, which is dangerous. That's why it is necessary to remove it from our string.
When you call the sanitizeHtml() function and pass this string, it removes the <script> tag and other risky HTML tags from your string.
By default, this function allows many HTML tags and attributes. You can also pass an options object to the sanitizeHtml() function as its second argument to customize the sanitization process.
const cleanHtml = sanitizeHtml(dirtyHtml, {
  // Customization options
});
This options object has many properties, such as allowedTags, allowedAttributes, allowedClasses, allowedStyles, and selfClosing. You can use these options to override the default behavior and get custom results according to your requirements.
You will find all the customization options for this package in its official documentation.
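As a rough illustration, an options object for a comment field might look like the sketch below. The name commentOptions and the specific tag list are assumptions made for this example, not recommendations from the package. allowedSchemes is another documented option that is not listed above.

```javascript
// Hypothetical options for a comment field: a few inline tags and links only
const commentOptions = {
  allowedTags: ['b', 'i', 'em', 'strong', 'a'],
  allowedAttributes: {
    a: ['href'], // no other attributes survive on <a>
  },
  allowedSchemes: ['http', 'https', 'mailto'], // other URL schemes are rejected
};

// With the package installed, pass it as the second argument:
// const cleanHtml = sanitizeHtml(dirtyHtml, commentOptions);
```

Anything not listed in allowedTags or allowedAttributes is dropped, so it is usually easier to start from a short list and add to it than to start from the defaults and remove things.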
Implementing HTML Sanitization in Node.js Express Server
Besides sanitizing HTML strings in plain Node.js, it is also easy to do this in an Express server. The process is the same. The only difference is where we call the package on our server.
When we submit a form, it sends a POST request to our server. You can access the form data from the req.body property. You don't know what a user is sending to your server, so you should sanitize it before doing anything with it.
Sanitizing req.body in Express:
const express = require('express')
const bodyParser = require('body-parser')
const sanitizeHtml = require('sanitize-html')

const app = express()

// Parse JSON request bodies so that req.body is populated
app.use(bodyParser.json())

// ...

app.post('/message', (req, res) => {
  const { message } = req.body

  // Strip disallowed tags and attributes from the submitted message
  const clean = sanitizeHtml(message)

  // Save `clean` (not the raw message) to the database

  res.json({
    success: true,
    data: 'Message received',
  })
})

// ...

app.listen(3000, () => {
  console.log('Server listening on port 3000...')
})
Suppose you have a POST route called /message where users can submit messages. First, get the message text from the req.body property.
Now pass the text to the sanitizeHtml() function. It returns clean, safe user input. If anyone tries to inject a script into the message, the package removes it.
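The /message route takes for granted that message is a string, but a JSON body can carry any type. A minimal sketch of a guard that could run before sanitizeHtml() is shown below. The helper name readMessage and the 2000-character limit are assumptions made for this example.

```javascript
// Hypothetical helper: returns the message only if it is a usable string
const readMessage = (body, maxLength = 2000) => {
  const message = body ? body.message : undefined;

  if (typeof message !== 'string') {
    return { ok: false, error: 'message must be a string' };
  }

  const trimmed = message.trim();
  if (trimmed.length === 0 || trimmed.length > maxLength) {
    return { ok: false, error: `message must be 1-${maxLength} characters` };
  }

  return { ok: true, value: trimmed };
};

console.log(readMessage({ message: '  <b>Hi</b>  ' }));
// { ok: true, value: '<b>Hi</b>' }
console.log(readMessage({ message: 42 }));
// { ok: false, error: 'message must be a string' }
```

In the route, you could respond with a 400 status when ok is false and pass value to sanitizeHtml() otherwise.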
Sanitizing req.params and req.query in Express:
app.get('/posts/:id', (req, res) => {
  // Sanitize the route parameter
  let { id } = req.params;
  id = sanitizeHtml(id);

  // Sanitize the query string value
  let { action } = req.query;
  action = sanitizeHtml(action);

  // Get the post from the database

  res.json({
    success: true,
    data: 'Send post data',
  })
});
Here I have a GET route to access a single post. This route has a parameter called id and accepts an action query string. You can read both values from the req.params and req.query properties.
Now pass them to the sanitizeHtml() function for sanitization. This way you can sanitize both request parameters and query strings in the Express server.
Create an Express Middleware Function for Sanitizing Data
Sanitizing req.body, req.params, and req.query in every route quickly becomes painful, because you have to call the sanitize-html package manually each time.
We can simplify this by creating a middleware function that automatically sanitizes incoming request data for every route.
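const sanitizeHtml = require('sanitize-html')

// Sanitize every string inside a value; other types pass through unchanged
const clean = (data) => {
  if (typeof data === 'string') {
    return sanitizeHtml(data, {
      // Configuration options
    })
  }
  if (Array.isArray(data)) return data.map(clean)
  if (data !== null && typeof data === 'object') {
    return Object.fromEntries(
      Object.entries(data).map(([key, value]) => [key, clean(value)])
    )
  }
  return data
}
First, create this utility function called clean() in your project. It takes one parameter and runs the sanitizeHtml() function on every string it finds, walking through arrays and nested objects.
Each string is sanitized on its own rather than as part of the JSON text of the whole object. The sanitizer escapes characters such as & and rewrites quoted attributes, which can corrupt JSON and make JSON.parse() throw.
Finally, it returns the clean data. You will call this function from the middleware.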
Creating middleware function:
const sanitize = () => {
  return (req, res, next) => {
    // Clean each request object only if it exists and is a non-empty plain object
    if (
      req.body &&
      req.body.constructor === Object &&
      Object.keys(req.body).length > 0
    ) {
      req.body = clean(req.body)
    }

    if (
      req.query &&
      req.query.constructor === Object &&
      Object.keys(req.query).length > 0
    ) {
      req.query = clean(req.query)
    }

    if (
      req.params &&
      req.params.constructor === Object &&
      Object.keys(req.params).length > 0
    ) {
      req.params = clean(req.params)
    }

    next()
  }
}
Now define a middleware function called sanitize that sanitizes the request body, query parameters, and route parameters.
Check that the req.body, req.query, and req.params objects exist and are not empty. For each one that passes the if check, call the clean() function with that object.
Finally, call the next function to pass control to the next middleware or route handler in the Express middleware chain.
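To see these checks in action without starting a server, the middleware can be called with plain objects standing in for req and next. The sketch below uses a hypothetical makeSanitize() factory that performs similar checks but receives the cleaning function as an argument, plus a stand-in fakeClean() that only upper-cases strings so the effect is easy to spot. Neither name comes from Express or sanitize-html.

```javascript
// Hypothetical factory: similar checks to sanitize(), but the cleaning
// function is injected so the example runs without any packages installed.
const makeSanitize = (clean) => (req, res, next) => {
  for (const key of ['body', 'query', 'params']) {
    const value = req[key];
    // Only clean plain, non-empty objects; skip undefined or empty ones
    if (value && value.constructor === Object && Object.keys(value).length > 0) {
      req[key] = clean(value);
    }
  }
  next();
};

// Stand-in for clean(): upper-cases string values (NOT a real sanitizer)
const fakeClean = (obj) =>
  Object.fromEntries(
    Object.entries(obj).map(([k, v]) => [k, typeof v === 'string' ? v.toUpperCase() : v])
  );

// Plain objects standing in for Express's req and next
const req = { body: { message: 'hello' }, query: {}, params: undefined };
let nextCalled = false;

makeSanitize(fakeClean)(req, {}, () => {
  nextCalled = true;
});

console.log(req.body); // { message: 'HELLO' }
console.log(req.query); // {}
console.log(nextCalled); // true
```

Only the non-empty body is passed to the cleaning function. The empty query object and the missing params are skipped, and next is still called so the request continues down the chain.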
Registering the custom middleware function in Express:
const app = express()
// ...
app.use(sanitize())
// ...
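To add the sanitize() middleware function to your Express server, call the app.use() method after the body parser, so that req.body is already populated. This means that the sanitize() middleware function runs for every incoming request to the app.
Therefore, you don't have to sanitize req.body and req.query manually in every route. One caveat: Express fills in req.params only when a route is matched, so middleware registered with app.use() sees an empty req.params. To cover route parameters, pass sanitize() to the route itself, for example app.get('/posts/:id', sanitize(), handler), where handler is your route handler.
Also note that in Express 5 req.query is exposed through a getter, so reassigning it in the middleware may not take effect. Check this against the Express version you use.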
Conclusion
You now know how important sanitization is for your application's security. If you want to protect your application from cross-site scripting (XSS) attacks, you must clean every user input.
The sanitize-html package makes the whole process simple. You can use it in any Node.js application, including an Express server.
A convenient way to implement sanitization in an Express server is a middleware function. This middleware sanitizes HTML and other user data in the req.body, req.params, and req.query objects automatically.