We often want to split a paragraph into individual sentences for analysis or formatting purposes. You can use built-in JavaScript methods or a regular expression (RegEx) to break your string into sentences quickly and easily.
Today you will learn two ways to do this:
- Using only built-in JavaScript methods.
- Using a regular expression with the split() method.
In this blog post, we'll walk through a step-by-step guide on splitting paragraphs into an array of sentences using JavaScript with complete code examples and explanations of those methods.
Let's dive in!
Split Paragraphs into Sentences Using JavaScript
To split a paragraph into sentences using only built-in JavaScript methods, you need to combine a few of them. The idea is to identify the start and end position of each sentence and extract it with the substring() method.
This process requires a few steps to complete. These steps are:
- Set the starting position of the sentence (it will be 0 in the beginning).
- Loop through each character of a paragraph or string.
- Check if the current character is a period, question mark, or exclamation point inside the loop.
- Extract the sentence from the paragraph using the substring() method.
- Update the starting position to the index just after the current character.
Let's see an example and understand each step in detail.
const paragraph = `JavaScript is a powerful programming language used to create dynamic and interactive websites. Did you know that it is also used for server-side programming? That's right, you can now use it for both client-side and server-side programming! It's a great time to learn this programming language.`
const sentences = []
let start = 0
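for (let i = 0; i < paragraph.length; i++) {
  // A period, question mark, or exclamation point marks the end of a sentence
  if (paragraph[i] === '.' || paragraph[i] === '?' || paragraph[i] === '!') {
    // substring() excludes the end index, so i + 1 keeps the punctuation mark
    const sentence = paragraph.substring(start, i + 1).trim()
    sentences.push(sentence)
    // The next sentence starts right after the punctuation mark
    start = i + 1
  }
}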
console.log(sentences)
Output:
[
"JavaScript is a powerful programming language used to create dynamic and interactive websites.",
"Did you know that it is also used for server-side programming?",
"That's right, you can now use it for both client-side and server-side programming!",
"It's a great time to learn this programming language."
]
Here I have a paragraph of text that I want to split into individual sentences and an empty array to store those sentences.
Define a variable start to track the starting position of each sentence in the paragraph. Its initial value is 0.
Then use a for loop to iterate through each character in the paragraph. Inside the loop, check with an if statement whether the current character is a period ('.'), question mark ('?'), or exclamation point ('!').
If the current character is one of these punctuation marks, you know that you have reached the end of a sentence.
You can use the substring() method to extract the sentence from the paragraph, starting at the start position and ending at i + 1. The end index of substring() is exclusive, so passing i + 1 keeps the punctuation mark at position i in the sentence.
You also use the trim() method to remove any extra whitespace from the beginning or end of the sentence. Once you have extracted the sentence, you can add it to the sentences array using the push() method.
Finally, you need to update the start position to the current position plus 1, so that the next sentence starts from the correct position.
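After the for loop completes, the sentences array contains every sentence from the original paragraph that ends with one of these punctuation marks. Any text after the last punctuation mark is not added, because the loop only pushes a sentence when it reaches a period, question mark, or exclamation point.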
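If your text might end without a punctuation mark, the same loop can be wrapped in a function that also keeps the leftover text. This is only a minimal sketch: the function name splitIntoSentences is made up for illustration, and treating the trailing text as a sentence is an assumption you may not want in every case.

```javascript
// Hypothetical helper (the name is not part of the original example).
function splitIntoSentences(text) {
  const sentences = []
  let start = 0

  for (let i = 0; i < text.length; i++) {
    const ch = text[i]
    if (ch === '.' || ch === '?' || ch === '!') {
      sentences.push(text.substring(start, i + 1).trim())
      start = i + 1
    }
  }

  // Assumption: text left after the last punctuation mark still counts as a sentence.
  const rest = text.substring(start).trim()
  if (rest.length > 0) {
    sentences.push(rest)
  }

  return sentences
}

console.log(splitIntoSentences('First one. Second one? No ending here'))
// [ 'First one.', 'Second one?', 'No ending here' ]
```

Keep in mind that this character-by-character check still treats every period as the end of a sentence, so a decimal number such as 3.14 would be cut in two.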
Use RegEx to Split Strings into Sentences in JavaScript
As you can see, the previous technique requires quite a bit of code and looks a little complex. If you want to split the text into sentences with a single line of code, you can use a regular expression with the split() method.
const paragraph = `JavaScript is a powerful programming language used to create dynamic and interactive websites. Did you know that it is also used for server-side programming? That's right, you can now use it for both client-side and server-side programming! It's a great time to learn this programming language.`
const sentences = paragraph.split(/(?<=[.!?])\s+/)
console.log(sentences)
Output:
[
"JavaScript is a powerful programming language used to create dynamic and interactive websites.",
"Did you know that it is also used for server-side programming?",
"That's right, you can now use it for both client-side and server-side programming!",
"It's a great time to learn this programming language."
]
The regular expression /(?<=[.!?])\s+/ matches one or more whitespace characters that come right after a period ('.'), question mark ('?'), or exclamation point ('!').
The (?<=pattern) syntax is known as a positive lookbehind assertion. It checks that the pattern appears just before the current position without making it part of the match, so the punctuation mark stays attached to its sentence.
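To see why the lookbehind matters, here is a small comparison on a made-up sample string. Without the lookbehind, the punctuation mark becomes part of the separator and split() removes it.

```javascript
// Sample text chosen only for this illustration.
const text = 'Is it done? Yes. Great!'

// Without lookbehind: the punctuation is part of the separator, so it is dropped.
const withoutLookbehind = text.split(/[.!?]\s+/)
console.log(withoutLookbehind) // [ 'Is it done', 'Yes', 'Great!' ]

// With lookbehind: only the whitespace is the separator, so the punctuation stays.
const withLookbehind = text.split(/(?<=[.!?])\s+/)
console.log(withLookbehind) // [ 'Is it done?', 'Yes.', 'Great!' ]
```

The last item keeps its exclamation point in both cases because no whitespace follows it, so nothing matches there.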
When we apply this regular expression to the paragraph string using the split() method, it splits the string at each run of whitespace that follows a period, question mark, or exclamation point.
This effectively splits the paragraph into an array of individual sentences.
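For text that comes from user input, it may help to wrap the one-liner in a small function. The sketch below is one possible way to do it, not the only one: the name splitWithRegex is hypothetical, and trimming the input and dropping empty items are assumptions about what you want from the result.

```javascript
// Hypothetical helper name; the regular expression is the one used above.
function splitWithRegex(text) {
  return text
    .trim() // avoids an empty last item when the text ends with whitespace
    .split(/(?<=[.!?])\s+/)
    .filter((sentence) => sentence.length > 0) // an empty input would otherwise give ['']
}

console.log(splitWithRegex('  Pi is about 3.14. Nice!  '))
// [ 'Pi is about 3.14.', 'Nice!' ]
console.log(splitWithRegex(''))
// []
```

Because the pattern only splits where whitespace follows the punctuation mark, the decimal 3.14 stays in one piece. Abbreviations followed by a space, such as "Dr. Smith", would still be split after the period.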
Conclusion
Splitting a paragraph into individual sentences can be a useful task in many JavaScript applications. By leveraging built-in string methods, we can easily extract sentences and format them as needed.
Additionally, you have seen how to use a regular expression with the split() method to split a string into sentences at periods, question marks, and exclamation points. This approach is a lot cleaner and needs less code.