Tutorial On Chunking Task - CRFsuite - Naoaki Okazaki

Task description

Text chunking divides a text into syntactically correlated groups of words. For example, the sentence "He reckons the current account deficit will narrow to only # 1.8 billion in September." can be divided as follows:

[NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ] .
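The bracketed notation above can be read into a list of chunks mechanically. The following is a minimal Python sketch that assumes whitespace-separated tokens, a chunk opened by a token such as `[NP`, and a chunk closed by a standalone `]`. The function name `parse_bracketed` is hypothetical and not part of CRFsuite.

```python
def parse_bracketed(text):
    """Parse '[NP He ] [VP reckons ] ... .' into (chunk_type, tokens) pairs.

    Hypothetical helper: tokens outside any bracket are returned as
    single-token pairs whose chunk_type is None (they belong to no chunk).
    """
    chunks = []
    chunk_type = None   # type of the chunk currently open, if any
    tokens = []         # tokens collected for the open chunk
    for tok in text.split():
        if tok.startswith("[") and len(tok) > 1:
            # Opening token such as "[NP": start a new chunk of that type.
            chunk_type, tokens = tok[1:], []
        elif tok == "]":
            # Closing bracket: emit the finished chunk.
            chunks.append((chunk_type, tokens))
            chunk_type, tokens = None, []
        elif chunk_type is not None:
            tokens.append(tok)
        else:
            chunks.append((None, [tok]))
    return chunks


print(parse_bracketed("[NP He ] [VP reckons ] [NP the current account deficit ] ."))
```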

In this example, NP stands for a noun phrase, VP for a verb phrase, and PP for a prepositional phrase. This task is formalized as sequential labeling, in which a sequence of tokens in a text is assigned a corresponding sequence of labels, one label per token. To represent a chunk (a span of tokens) with per-token labels, the IOB2 notation is commonly used. In IOB2, the first token of an NP chunk is labeled B-NP (beginning of the chunk), and each subsequent token inside it is labeled I-NP (inside of the chunk). Tokens that do not belong to any chunk are labeled O (outside). The example sentence is represented in IOB2 as follows (one label and token per line):

B-NP He
B-VP reckons
B-NP the
I-NP current
I-NP account
I-NP deficit
B-VP will
I-VP narrow
B-PP to
B-NP only
I-NP #
I-NP 1.8
I-NP billion
B-PP in
B-NP September
O .
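The IOB2 encoding rule can be expressed in a few lines. The following is a minimal Python sketch, assuming chunks are given as `(chunk_type, tokens)` pairs with `None` marking tokens outside any chunk; the name `to_iob2` and this input format are hypothetical and not part of CRFsuite.

```python
def to_iob2(chunks):
    """Convert (chunk_type, tokens) pairs to (IOB2 label, token) pairs.

    A chunk_type of None marks tokens outside any chunk; they get label O.
    """
    pairs = []
    for chunk_type, tokens in chunks:
        for i, token in enumerate(tokens):
            if chunk_type is None:
                pairs.append(("O", token))
            elif i == 0:
                pairs.append(("B-" + chunk_type, token))  # first token of the chunk
            else:
                pairs.append(("I-" + chunk_type, token))  # remaining tokens of the chunk
    return pairs


# The example sentence, already divided into chunks.
sentence = [
    ("NP", ["He"]), ("VP", ["reckons"]),
    ("NP", ["the", "current", "account", "deficit"]),
    ("VP", ["will", "narrow"]), ("PP", ["to"]),
    ("NP", ["only", "#", "1.8", "billion"]), ("PP", ["in"]),
    ("NP", ["September"]), (None, ["."]),
]
for label, token in to_iob2(sentence):
    print(label, token)
```

Running it prints one `label token` pair per line, reproducing the IOB2 sequence of the example sentence. Emitting B- at the start of every chunk is what distinguishes IOB2 from the older IOB1 scheme, where B- appears only when a chunk immediately follows another chunk of the same type.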

The goal of this tutorial is to use CRFsuite to build a model that predicts chunk labels for a given sentence (a sequence of tokens).
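A model of this kind outputs one label per token, so reading chunks off a prediction means decoding IOB2 labels back into spans. The following is a minimal Python sketch; the name `iob2_to_spans` is hypothetical and not part of CRFsuite. It assumes a lenient convention in which an I- label that does not continue an open chunk of the same type starts a new chunk, which matters for imperfect predicted sequences.

```python
def iob2_to_spans(labels):
    """Decode IOB2 labels into (chunk_type, start, end) spans, end exclusive."""
    spans = []
    chunk_type, start = None, 0
    for i, label in enumerate(labels):
        # B-X always opens a chunk; I-X opens one if it does not continue an open X chunk.
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != chunk_type
        )
        # An open chunk ends at O or wherever a new chunk starts.
        if chunk_type is not None and (label == "O" or starts_new):
            spans.append((chunk_type, start, i))
            chunk_type = None
        if starts_new:
            chunk_type, start = label[2:], i
    if chunk_type is not None:
        spans.append((chunk_type, start, len(labels)))
    return spans


labels = "B-NP B-VP B-NP I-NP I-NP I-NP B-VP I-VP B-PP B-NP I-NP I-NP I-NP B-PP B-NP O".split()
print(iob2_to_spans(labels))
# → [('NP', 0, 1), ('VP', 1, 2), ('NP', 2, 6), ('VP', 6, 8), ('PP', 8, 9), ('NP', 9, 13), ('PP', 13, 14), ('NP', 14, 15)]
```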
