Using BiLSTM for Named Entity Recognition


Named Entity Recognition (NER) is a natural language processing task that categorizes words into named-entity classes, with numerous applications (e.g., search, text classification). The following is a demo of how I trained a Bi-directional Long Short-Term Memory (BiLSTM) model to predict named entities with an accuracy above .95 on the CoNLL-2003 dataset.

Quick Summary

This project demonstrates the workflow of building a recurrent neural network model in TensorFlow to predict named-entity labels. With a simple architecture of one embedding layer, one bidirectional LSTM layer, and one dense layer, prediction accuracy on the held-out CoNLL-2003 test set exceeds .95. Below is a brief walkthrough of the workflow (tokenization, padding, model buildup, hyperparameter tuning, and testing).

Dataset

The CoNLL-2003 dataset is split into training, validation, and test sets. All three contain sentences along with a named-entity tag for each word in the text.

Step 1: Build corpus & tags dictionaries

To convert words to numbers, we first read in the sentences and named entities from the training set, then break the sentences into lists of words (tokenization) and lists of tags. We then create a dictionary, denoted “Corpus,” that maps every word to a number, reserving a UNK token for unknown words. Another dictionary, denoted “Tags,” stores the mapping between all named-entity tags and their corresponding numbers.
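
As a rough sketch, the two dictionaries can be built as below. The toy sentences and the variable names (`train_sentences`, `train_tags`) are illustrative stand-ins for the parsed CoNLL-2003 training data, not the project's actual parsing code.

```python
# Toy, already-tokenized training data standing in for the parsed
# CoNLL-2003 files (the real dataset is much larger).
train_sentences = [["EU", "rejects", "German", "call"],
                   ["Peter", "Blackburn"]]
train_tags = [["B-ORG", "O", "B-MISC", "O"],
              ["B-PER", "I-PER"]]

UNK = "UNK"            # reserved token for words not seen in training
corpus = {UNK: 1}      # word -> integer id; id 0 is kept for padding
tags = {}              # named-entity tag -> integer id; 0 kept for padding

for sentence, sentence_tags in zip(train_sentences, train_tags):
    for word in sentence:
        corpus.setdefault(word, len(corpus) + 1)
    for tag in sentence_tags:
        tags.setdefault(tag, len(tags) + 1)

print(corpus)   # {'UNK': 1, 'EU': 2, 'rejects': 3, 'German': 4, ...}
print(tags)     # {'B-ORG': 1, 'O': 2, 'B-MISC': 3, 'B-PER': 4, 'I-PER': 5}
```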

Step 2: Turn sentences into lists of tokens and named-entity tags

To convert words and named-entity tags into numbers, we first need to break each sentence into a list of words (tokens).
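
A minimal sketch of this step, assuming each raw sentence and its tag line are plain whitespace-separated strings (the actual CoNLL-2003 files store one token per line, so the real parsing code differs):

```python
# Hypothetical raw inputs: one string per sentence and one per tag line.
raw_sentences = ["EU rejects German call", "Peter Blackburn"]
raw_tag_lines = ["B-ORG O B-MISC O", "B-PER I-PER"]

# Whitespace tokenization: each sentence becomes a list of tokens,
# each tag line a list of named-entity tags.
train_sentences = [line.split() for line in raw_sentences]
train_tags = [line.split() for line in raw_tag_lines]

print(train_sentences[0])   # ['EU', 'rejects', 'German', 'call']
print(train_tags[0])        # ['B-ORG', 'O', 'B-MISC', 'O']
```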

Step 3: Map words and tags to numbers

With the token-to-number and tag-to-number mappings in place, we can convert the dataset sentences into number sequences for later model buildup.
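
Reusing the `corpus` and `tags` dictionaries and the UNK token from Step 1, the mapping might look like this; the helper names `encode_words` and `encode_tags` are hypothetical:

```python
def encode_words(sentence, corpus):
    # Words missing from the corpus fall back to the UNK id.
    return [corpus.get(word, corpus["UNK"]) for word in sentence]

def encode_tags(sentence_tags, tags):
    return [tags[tag] for tag in sentence_tags]

X_train = [encode_words(s, corpus) for s in train_sentences]
y_train = [encode_tags(t, tags) for t in train_tags]

print(X_train[0])   # e.g. [2, 3, 4, 5] with the toy corpus from Step 1
print(y_train[0])   # e.g. [1, 2, 3, 2]
```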

Step 4: Padding

To give all sentences in the dataset the same length for model building, zeros are appended to the end of each sentence (padding) so that every sequence matches the length of the longest sentence.
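
One way to do this is with Keras' `pad_sequences`; the sketch below assumes `X_train` and `y_train` are the encoded sequences from Step 3:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = max(len(seq) for seq in X_train)   # length of the longest sentence

# padding="post" appends zeros to the end of each sequence.
X_train_padded = pad_sequences(X_train, maxlen=max_len, padding="post", value=0)
y_train_padded = pad_sequences(y_train, maxlen=max_len, padding="post", value=0)

print(X_train_padded.shape)   # (number of sentences, max_len)
```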

Step 5: Model buildup

I used TensorFlow to build a sequential model containing an embedding layer, a Bi-directional Long Short-Term Memory (BiLSTM) layer, and a dense layer. The architecture (model layers) and other hyperparameters (e.g., learning rate, activation function, dropout rate, regularization) can all be tuned on the training and validation sets to optimize the evaluation metrics.
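
A minimal sketch of such a model is shown below; the layer sizes and dropout rate are placeholder values rather than the tuned hyperparameters.

```python
import tensorflow as tf

vocab_size = len(corpus) + 1   # +1 because id 0 is reserved for padding
num_tags = len(tags) + 1       # +1 because label 0 is reserved for padding

model = tf.keras.Sequential([
    # mask_zero=True tells downstream layers to ignore the padded positions.
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=64, mask_zero=True),
    # return_sequences=True keeps one output per token for per-word tagging.
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True, dropout=0.2)),
    # Per-token probability distribution over the named-entity tags.
    tf.keras.layers.Dense(num_tags, activation="softmax"),
])
```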

Step 6: Model training

After defining the model structure, we can feed in the preprocessed (tokenized, mapped, and padded) training data to train the model. The number of epochs (full passes the model makes over the training data) and the batch size (during an epoch, training data are processed one batch at a time until the whole training set is covered) are adjusted to optimize performance.
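
A minimal training sketch, with illustrative (not tuned) values for the optimizer, epochs, and batch size:

```python
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # labels are integer tag ids
    metrics=["accuracy"],
)

history = model.fit(
    X_train_padded, y_train_padded,
    epochs=10,        # full passes over the training data
    batch_size=32,    # sentences processed per gradient update within an epoch
)
```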

Step 7: Validation

Hyperparameters can be tuned to optimize model performance on the validation set. The performance on both the training and validation sets serves as a reference for hyperparameter tuning (e.g., diagnosing bias-variance issues).
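
A simple way to compare the two is to evaluate the trained model on both sets; `X_val_padded` and `y_val_padded` are assumed to have been built from the validation file with the same Step 2-4 preprocessing.

```python
# X_val_padded / y_val_padded: validation data preprocessed as in Steps 2-4 (assumed).
train_loss, train_acc = model.evaluate(X_train_padded, y_train_padded, verbose=0)
val_loss, val_acc = model.evaluate(X_val_padded, y_val_padded, verbose=0)

# Training accuracy far above validation accuracy suggests overfitting
# (high variance); low accuracy on both suggests underfitting (high bias).
print(f"train acc: {train_acc:.3f}  val acc: {val_acc:.3f}")
```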

Step 8: Test the model with unseen test data

The final model and hyperparameters should be evaluated on an unseen (held-out) dataset to guard against overfitting to the validation set during hyperparameter tuning. In this demo, the test accuracy exceeded .95 with the bidirectional LSTM on the CoNLL-2003 test data.
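
A sketch of the final evaluation, assuming `X_test_padded` and `y_test_padded` were preprocessed the same way as the training data:

```python
# X_test_padded / y_test_padded: held-out test data preprocessed as in Steps 2-4 (assumed).
test_loss, test_acc = model.evaluate(X_test_padded, y_test_padded, verbose=0)
print(f"test accuracy: {test_acc:.3f}")
```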
