![]() | ||
Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account; e.g., the linear chain CRF (which is popular in natural language processing) predicts sequences of labels for sequences of input samples.
Contents
- Description
- Inference
- Parameter Learning
- Examples
- Higher order CRFs and semi Markov CRFs
- Latent dynamic conditional random field
- Software
- References
CRFs are a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations. It is often used for labeling or parsing of sequential data, such as natural language text or biological sequences and in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition, gene finding and peptide critical functional region finding, among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation.
Description
Lafferty, McCallum and Pereira define a CRF on observations
Let
What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets
Inference
For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible:
If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include:
Parameter Learning
Learning the parameters
Examples
In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables
The conditional dependency of each
Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.
Notably in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence
Higher-order CRFs and semi-Markov CRFs
CRFs can be extended into higher order models by making each
There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence
Latent-dynamic conditional random field
Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively.
In an LDCRF, like in any sequence tagging task, given a sequence of observations x =
This allows capturing latent structure between the observations and labels. While LDCRFs can be trained using quasi-Newton methods, a specialized version of the perceptron algorithm called the latent-variable perceptron has been developed for them as well, based on Collins' structured perceptron algorithm. These models find applications in computer vision, specifically gesture recognition from video streams and shallow parsing.
Software
This is a partial list of software that implement generic CRF tools.
This is a partial list of software that implement CRF related tools.