How to Tokenize a String in JavaScript

Anika Tabassum Era Feb 02, 2024

Tokenizing is the process of scanning input such as characters, expressions, and strings, and breaking it into small, individually meaningful units called tokens.

Take the expression 10+5 as an example. The lexer (the process that identifies each valid piece of input as a token) will classify 10 as a Number token, + as a Plus token, and 5 as a Number token.
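The lexing step described above can be sketched in a few lines. This is only a minimal illustration, not a full lexer: the function name tokenize and the token shape { type, value } are assumptions made for this example, and only digits, the + sign, and whitespace are handled.

```javascript
// Hypothetical helper: scans an arithmetic string and returns typed tokens.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\d/.test(ch)) {
      // Consume the whole run of digits as a single Number token.
      let value = '';
      while (i < input.length && /\d/.test(input[i])) {
        value += input[i];
        i += 1;
      }
      tokens.push({ type: 'Number', value: value });
    } else if (ch === '+') {
      tokens.push({ type: 'Plus', value: ch });
      i += 1;
    } else if (/\s/.test(ch)) {
      // Whitespace separates tokens but is not a token itself.
      i += 1;
    } else {
      throw new Error('Unexpected character: ' + ch);
    }
  }
  return tokens;
}

// Yields a Number token (10), a Plus token (+), and a Number token (5).
console.log(tokenize('10+5'));
```

Each token keeps both its category and the original text, which is what a later parsing step would need.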

Once every character has been tokenized, that is, categorized, the tokens are passed to the parser. The parser then applies its rules to the tokens to determine the structure of the expression.
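As a rough sketch of the parsing step, the snippet below takes a list of already categorized tokens and builds a small tree for an addition. The names parseSum, NumberLiteral, and Addition are hypothetical, and the only rule supported is "Number, optionally followed by Plus Number, repeated".

```javascript
// Hypothetical parser: accepts tokens of the form Number (Plus Number)*.
function parseSum(tokens) {
  let position = 0;

  // Reads one Number token and turns it into a leaf node.
  function expectNumber() {
    const token = tokens[position];
    if (!token || token.type !== 'Number') {
      throw new Error('Expected a Number token at position ' + position);
    }
    position += 1;
    return { type: 'NumberLiteral', value: Number(token.value) };
  }

  let node = expectNumber();
  while (position < tokens.length) {
    if (tokens[position].type !== 'Plus') {
      throw new Error('Expected a Plus token at position ' + position);
    }
    position += 1;
    // Nest to the left so that 1+2+3 groups as (1+2)+3.
    node = { type: 'Addition', left: node, right: expectNumber() };
  }
  return node;
}

// Tokens for the expression 10+5, written out by hand for this example.
const sampleTokens = [
  { type: 'Number', value: '10' },
  { type: 'Plus', value: '+' },
  { type: 'Number', value: '5' },
];
const tree = parseSum(sampleTokens);
console.log(JSON.stringify(tree, null, 2));
```

The result is an Addition node with two NumberLiteral children, which is a very small abstract syntax tree.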

This article can be followed for a more detailed explanation. Here, we will walk through one example that covers the concept of tokens in JavaScript.

Use the split() Method to Tokenize a String in JavaScript

We will follow the lexer and parser idea to pick out each word in the following example. The full text is first scanned into individual words, separated wherever non-word characters such as spaces or punctuation appear.

The resulting group of tokens is then processed further, which plays the role of parsing. This gives a step-by-step path for splitting a string or any expression.

An abstract syntax tree is the usual way to visualize the structure of an expression. Let's dive into the code for a demonstration.

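const text = 'Is not it weird to live in a world like this? It is a 42';
// Normalize case so that tokens compare consistently.
const words = text.toLowerCase();
// Split on runs of non-word characters, then keep only two-character tokens.
const okay = words.split(/\W+/).filter(function (token) {
  return token.length === 2;
});
console.log(okay);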

Output:

[ 'is', 'it', 'to', 'in', 'it', 'is', '42' ]

So, the text string is converted to lowercase, and then the split() method, called with the regular expression /\W+/, completes the task of tokenizing by cutting the string at every run of non-word characters.

The procedure is abstracted away inside split(), so we cannot see its internal steps. Finally, filter() keeps only the tokens that are exactly two characters long.
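To make those hidden steps visible, here is a hand-written loop that produces the same tokens as split(/\W+/) for the sample sentence. It is a sketch rather than a drop-in replacement: the name splitIntoWords is made up for this example, and unlike split() it never returns empty strings when the text starts or ends with punctuation.

```javascript
// Hypothetical helper: collects runs of word characters one character at a time.
function splitIntoWords(text) {
  const tokens = [];
  let current = '';
  for (const ch of text) {
    if (/\w/.test(ch)) {
      // Letters, digits, and underscores extend the current token.
      current += ch;
    } else if (current !== '') {
      // Any other character ends the token that was being built.
      tokens.push(current);
      current = '';
    }
  }
  // Flush the last token if the text did not end with a separator.
  if (current !== '') {
    tokens.push(current);
  }
  return tokens;
}

const sample = 'is not it weird to live in a world like this? it is a 42';
const allTokens = splitIntoWords(sample);
const twoCharTokens = allTokens.filter((token) => token.length === 2);

console.log(allTokens);
console.log(twoCharTokens);
```

Logging allTokens before filtering shows every word the scanner found, which makes it easier to check why a given token did or did not survive the length filter.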


Era is an observer who loves cracking ambiguous barriers, and an AI enthusiast driven to help others and build a stronger community.
