Python NLP Library: Flair

Sebastian
9 min readSep 14, 2023

Flair is a modern NLP library. From text processing to document semantics, all core NLP tasks are supported. Flair uses modern transformer neural networks models for several tasks, and it incorporates other Python libraries which enables to choose specific models. Its clear API and data structures that annotate text, as well as multi-language support, makes it a good candidate for NLP projects.

This article helps you to get started with Flair. Following its installation, you will learn how to apply text processing and text syntax task, and then see the rich support for text and document semantics.

The technical context of this article is Python v3.11 and Flair v0.12.2. All examples should work with newer versions too.

This article originally appeared at my blog admantium.com.

Installation

The Flair library can be installed via pip:

python3 -m pip install flair

The installation can take up to 30 minutes because several other libraries need to be installed as well. Also, when working with sequencers, taggers or datasets, additional data will need to be downloaded.

NLP Tasks

Flair supports all core NLP tasks, and provides additional features to create word vectors and to train custom sequencers.

Text Processing

  • Tokenization
  • Lemmatization
  • Syntactic Chunking

Text Syntax

  • Part-of-Speech Tagging

Text Semantics

  • Semantic Frames Parsing
  • Named Entity Recognition

Document Semantics

  • Sentiment Analysis
  • Language Toxicity Analysis

Furthermore, Flair supports these additional features:

  • Datasets
  • Corpus Management
  • Text Vectors
  • Model Training

Text Processing

Tokenization

Tokenization is applied in Flair automatically. The basic data structure Sentence wraps any-length text and generates tokens.

from flair.data import Sentence

# Source: Wikipedia, Artificial Intelligence, https://en.wikipedia.org/wiki/Artificial_intelligence
paragraph = '''Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.'''

doc = Sentence(paragraph)
tokens = [token for token in doc]

print(tokens)
# [Token[0]: "Artificial", Token[1]: "intelligence", Token[2]: "was", Token[3]: "founded", Token[4]: "as", Token[5]: "an", Token[6]: "academic", Token[7]: "discipline",

Part-of-Speech Tagging

Detecting additional syntactic (and semantic) information in texts involved the usage of Classifiers, a Flair-specific data structure that incorporates pre-defined transformer models for specific tasks.

For POS tagging, Flair provides 14 different models, supporting English, German, Portuguese and more languages.

The basic, English language classifier pos defines these types:

The following snippet shows how to use the POS sequencer:

from flair.data import Sentence
from flair.nn import Classifier

pos = Classifier.load('pos')
# SequenceTagger predicts: Dictionary with 53 tags: <unk>, O, UH, ,, VBD, PRP, VB, PRP$, NN, RB, ., DT, JJ, VBP, VBG, IN, CD, NNS, NNP, WRB, VBZ, WDT, CC, TO, MD, VBN, WP, :, RP, EX, JJR, FW, XX, HYPH, POS, RBR, JJS, PDT, NNPS, RBS, AFX, WP$, -LRB-, -RRB-, ``, '', LS, $, SYM, ADD

doc = Sentence(paragraph)
pos.predict(doc)

print(doc.to_tagged_string())
# ["Artificial"/JJ,
# "intelligence"/NN,
# "was"/VBD,
# "founded"/VBN,
# "as"/IN,
# "an"/DT,
# "academic"/JJ,

Syntactic Chunking

Chunking is the process of extracting coherent sets of tokens that have a distinct meaning, for example noun phrases, prepositional phrases, adjective phrases and more.

The chunk classifier is used for this task. Here is an example:

from flair.data import Sentence
from flair.nn import Classifier

chunk = Classifier.load('chunk')
# SequenceTagger predicts: Dictionary with 47 tags: O, S-NP, B-NP, E-NP, I-NP, S-VP, B-VP, E-VP, I-VP, S-PP, B-PP, E-PP, I-PP, S-ADVP, B-ADVP, E-ADVP, I-ADVP, S-SBAR, B-SBAR, E-SBAR, I-SBAR, S-ADJP, B-ADJP, E-ADJP, I-ADJP, S-PRT, B-PRT, E-PRT, I-PRT, S-CONJP, B-CONJP, E-CONJP, I-CONJP, S-INTJ, B-INTJ, E-INTJ, I-INTJ, S-LST, B-LST, E-LST, I-LST, S-UCP, B-UCP, E-UCP, I-UCP, <START>, <STOP>

doc = Sentence(paragraph)
chunk.predict(doc)

print(doc.to_tagged_string())
# ["Artificial intelligence"/NP,
# "was founded"/VP,
# "as"/PP,
# "an academic discipline"/NP,
# "in"/PP,
# "1956"/NP,
# "and"/PP,
# "in"/PP,
# "the years"/NP,

Text Semantics

Semantic Frames Parsing

Semantic framing is an NLP technique that marks sequences of tokens with its semantic meaning. This is helpful for determining the sentiments and the topic of a sentence.

Just as before, semantic framing is used by loading a specific classifier. Although this feature is marked as experimental, at the time of writing this article using flair v0.12.2, it worked fine.

frame = Classifier.load('frame')
# SequenceTagger predicts: Dictionary with 4852 tags: <unk>, be.01, be.03, have.01, say.01, do.01, have.03, do.02, be.02, know.01, think.01, come.01, see.01, want.01, go.02, ...

doc = Sentence(paragraph)
frame.predict(doc)
print(doc.to_tagged_string())
# ["was"/be.03, "founded"/found.01, "has"/have.01, "experienced"/experience.01, "waves"/wave.04, "followed"/follow.01, "disappointment"/disappoint.01,

Named Entity Recognition

Named entities are persons, places, or dates inside a sentence. Flair provides different NER models.

Let’s compare the default ner with the larger ner-ontonotes-fast.

#Source: Wikipedia, Artificial Intelligence, https://en.wikipedia.org/wiki/Artificial_intelligence
paragraph = '''
In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.
'''

ner = Classifier.load('ner')
# SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>

doc = Sentence(paragraph)
ner.predict(doc)

print(doc.get_spans('ner'))
# [Span[5:7]: "Jeopardy!" → MISC (0.5985)
# Span[12:13]: "IBM" → ORG (0.998)
# Span[18:19]: "Watson" → PER (1.0)
# Span[28:30]: "Brad Rutter" → PER (1.0)
# Span[31:33]: "Ken Jennings" → PER (0.9999)]

With the ner model, all persons and the organization is recognized.

ner = Classifier.load('ner-ontonotes-fast')
# SequenceTagger predicts: Dictionary with 75 tags: O, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-GPE, B-GPE, E-GPE, I-GPE, S-ORG, B-ORG, E-ORG, I-ORG, S-DATE, B-DATE, E-DATE, I-DATE, S-CARDINAL, B-CARDINAL, E-CARDINAL, I-CARDINAL, S-NORP, B-NORP, E-NORP, I-NORP, S-MONEY, B-MONEY, E-MONEY, I-MONEY, S-PERCENT, B-PERCENT, E-PERCENT, I-PERCENT, S-ORDINAL, B-ORDINAL, E-ORDINAL, I-ORDINAL, S-LOC, B-LOC, E-LOC, I-LOC, S-TIME, B-TIME, E-TIME, I-TIME, S-WORK_OF_ART, B-WORK_OF_ART, E-WORK_OF_ART, I-WORK_OF_ART, S-FAC

doc = Sentence(paragraph)
ner.predict(doc)

print(list(doc.get_labels()))
# [Span[1:2]: "2011"'/'DATE' (0.9984)
# Span[12:13]: "IBM"'/'ORG' (1.0)
# Span[18:19]: "Watson"'/'PERSON' (0.9913)
# Span[22:23]: "two"'/'CARDINAL' (0.9995)
# Span[24:25]: "Jeopardy"'/'WORK_OF_ART' (0.938)
# Span[28:30]: "Brad Rutter"'/'PERSON' (0.9939)
# Span[31:33]: "Ken Jennings"'/'PERSON' (0.9914)]

With ner-ontonotes-fast, numbers, date, and even Jeopardy are recognized.

Document Semantics

Sentiment Analysis

Flair’s sentiment analysis is typically applied to a sentence, but by wrapping a whole text in the Sentence data structure, it can be applied to a whole text as well. It will output a binary classification of positive or negative for a sentence.

#Source: Wikipedia, Artificial Intelligence,https://en.wikipedia.org/wiki/Artificial_intelligence
sentiment = Classifier.load('sentiment')

doc = Sentence(paragraph)
sentiment.predict(doc)
print(doc)
# Sentence[124]: "Artificial intelligence was founded ..." → POSITIVE (0.9992)

Language Toxicity Analysis

Flair provides a model for detecting language toxicity, but in German only. It was trained on a specific dataset downloadable from the Heidelberg university,

The following snippet detects the usage of offensive language.

paragraph = '''
Was für Bullshit.
'''

toxic_language = Classifier.load('de-offensive-language')
doc = Sentence(paragraph)
toxic_language.predict(doc)

print(list(doc.get_lables()))
# Sentence[16]: "Was für Bullshit." → OFFENSE (0.9772)

Additional Features

Datasets

Flair includes several datasets and corpus, see the complete list.

Some of these datasets were used to train models for the Flair-specific tasks, for example for NER or relation extraction. Other datasets are the GLUE language benchmark and text collections.

Here is an example how to load a text classification datasets for detecting emotions in Reddit posts.

import flair.datasets

data = flair.datasets.GO_EMOTIONS()

len(data.train)
# 43410

data.train[42000]
# This is quite common on people on such forums. I have a feeling they are a tad sarcastic." → APPROVAL (1.0); NEUTRAL (1.0)

Corpus Management

In Flair, the Corpus object represents documents prepared for training a new tagger or classifier. This object consists of tree distinct collections called train, dev and test, and each contains Sentence objects.

Text Vectors

Flair support different vectorization schemes: Pretrained word vectors, like glove, and word vectors from different transformer models, loaded via the transformers library, is enabled.

Let’s see how to tokenize a paragraph with these two methods.

from flair.embeddings import WordEmbeddings

embeddings = WordEmbeddings('glove')
doc = Sentence(paragraph)
embeddings.embed(doc)
for token in doc:
print(token)
print(token.embedding)

# Token[0]: "Artificial"
# tensor([ 0.3455, 0.3144, -0.0313, 0.6368, 0.2727, -0.6197, -0.5177, -0.2368,
# -0.0166, 0.0344, -0.1542, 0.0435, 0.7298, 0.1112, 1.3430, ...,
# Token[1]: "intelligence"
# tensor([-0.3110, -0.4329, 0.7773, -0.3112, 0.0529, -0.8502, -0.3537, -0.7053,
# 0.0845, 0.8877, 0.8353, -0.4164, 0.3670, 0.6083, 0.0085, ...,

And for the Transformer embeddings:

from flair.embeddings import TransformerWordEmbeddings

embedding = TransformerWordEmbeddings('bert-base-uncased')
doc = Sentence(paragraph)
embedding.embed(doc)
for token in doc:
print(token)
print(token.embedding)

# Token[0]: "Artificial"
# tensor([ 1.0723e-01, 9.7490e-02, -6.8251e-01, -6.4322e-02, 6.3791e-01,
# 3.8582e-01, -2.0940e-01, 1.4441e-01, 2.4147e-01, ...)
# Token[1]: "intelligence"
# tensor([-9.9221e-02, -1.9465e-01, -4.9403e-01, -4.1582e-01, 1.4902e+00,
# 3.6126e-01, 3.6648e-01, 3.7578e-01, -4.8785e-01, ...)

Furthermore, instead of individual tokens, complete documents can be vectorized as well with document embeddings.

Model Training

Flair includes functions to train new models that can be used as sequence taggers or text classifiers. It provides loading of datasets, model definition, training configuration and execution. For most of these steps, the transformers library is used.

Here is an example from the official documentation to train a model on the Cornell Corpus for Part-of-Speech tagging.

# Source: FlairNLP, How model Training works in Flair, https://flairnlp.github.io/docs/tutorial-training/how-model-training-works#example-training-a-part-of-speech-tagger

from flair.datasets import UD_ENGLISH
from flair.embeddings import WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. load the corpus
corpus = UD_ENGLISH().downsample(0.1)
print(corpus)
#Corpus: 1254 train + 200 dev + 208 test sentences

# 2. what label do we want to predict?
label_type = 'upos'

# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)
# Dictionary created for label 'upos' with 18 values: NOUN (seen 3642 times), VERB (seen 2375 times), PUNCT (seen 2359 times), ADP (seen 1865 times), PRON (seen 1852 times), DET (seen 1721 times), ADJ (seen 1321 times), AUX (seen 1269 times), PROPN (seen 1203 times), ADV (seen 1083 times), CCONJ (seen 700 times), PART (seen 611 times), SCONJ (seen 405 times), NUM (seen 398 times), INTJ (seen 75 times), X (seen 63 times), SYM (seen 60 times)

# 4. initialize embeddings
embeddings = WordEmbeddings('glove')

# 5. initialize sequence tagger
model = SequenceTagger(hidden_size=256,
embeddings=embeddings,
tag_dictionary=label_dict,
tag_type=label_type)
print(model)
# Model: "SequenceTagger(
# (embeddings): WordEmbeddings(
# 'glove'
# (embedding): Embedding(400001, 100)
# )
# (word_dropout): WordDropout(p=0.05)
# (locked_dropout): LockedDropout(p=0.5)
# (embedding2nn): Linear(in_features=100, out_features=100, bias=True)
# (rnn): LSTM(100, 256, batch_first=True, bidirectional=True)
# (linear): Linear(in_features=512, out_features=20, bias=True)
# (loss_function): ViterbiLoss()
# (crf): CRF()
# )"

# 6. initialize trainer
trainer = ModelTrainer(model, corpus)

# 7. start training
trainer.train('resources/taggers/example-upos',
learning_rate=0.1,
mini_batch_size=32,
max_epochs=10)
# Parameters:
# - learning_rate: "0.100000"
# - mini_batch_size: "32"
# - patience: "3"
# - anneal_factor: "0.5"
# - max_epochs: "10"
# - shuffle: "True"
# - train_with_dev: "False"
# epoch 1 - iter 4/40 - loss 3.12352573 - time (sec): 1.06 - samples/sec: 2397.20 - lr: 0.100000
# ...
# epoch 1 - iter 4/40 - loss 3.12352573 - time (sec): 1.06 - samples/sec: 2397.20 - lr: 0.100000
# Results:
# - F-score (micro) 0.7877
# - F-score (macro) 0.6621
# - Accuracy 0.7877
# By class:
# precision recall f1-score support
# NOUN 0.7231 0.8495 0.7812 412
# PUNCT 0.9082 0.9858 0.9454 281
# VERB 0.7048 0.7403 0.7221 258
# PRON 0.9070 0.8986 0.9028 217
# ADP 0.8377 0.8791 0.8579 182
# DET 1.0000 0.8757 0.9338 169
# ADJ 0.6087 0.6490 0.6282 151
# PROPN 0.7538 0.5568 0.6405 176
# AUX 0.8077 0.8678 0.8367 121
# ADV 0.5446 0.4661 0.5023 118
# CCONJ 0.9880 0.9425 0.9647 87
# PART 0.6825 0.8600 0.7611 50
# NUM 0.7368 0.5000 0.5957 56
# SCONJ 0.6667 0.3429 0.4528 35
# INTJ 1.0000 0.4167 0.5882 12
# SYM 0.5000 0.0833 0.1429 12
# X 0.0000 0.0000 0.0000 9
# accuracy 0.7877 2346
# macro avg 0.7276 0.6420 0.6621 2346
# weighted avg 0.7854 0.7877 0.7808 2346

Summary

Flair is a modern NLP library that supports all core NLP tasks. This article showed how to apply text processing, text syntax, text semantics and document semantics tasks. Distinguished features of Flair are its multi-language support for selected tasks, for example named-entity-recognition and part-of-speech tagging, and its usage of transformer neural networks. Also, a complete feature set for model training exists, from training data preparation, model and training configuration, to training execution and metrics computation.

--

--