Supriya Ghosh (Editor)

Automatic taxonomy construction


Automatic taxonomy construction (ATC) is the use of autonomous or semi-autonomous software programs to create hierarchical outlines or taxonomic classifications from a body of text (a corpus). It is a branch of natural language processing, which in turn is a branch of artificial intelligence. ATC programs are examples of software agents and intelligent agents, and they may also be autonomous agents.

Other names for ATC include taxonomy generation, taxonomy learning, taxonomy extraction, taxonomy building, and taxonomy induction. Any of these terms may be preceded by the word "automatic", as in automatic taxonomy induction. ATC is also referred to as semantic taxonomy induction.

A taxonomy is a tree structure with familial relationships (parent-child, sibling, and so on) built in, as in a family tree. For example, physics is a child of physical science, which in turn is a child of science.
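To make the tree structure concrete, here is a minimal Python sketch that stores a taxonomy as child-to-parent links and walks those links to find a term's ancestors and siblings. The `PARENT` table, the extra term `chemistry`, and the helper names are illustrative assumptions, not part of any particular ATC system.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy stored as child -> parent links; the root has no parent.
PARENT: Dict[str, Optional[str]] = {
    "science": None,
    "physical science": "science",
    "physics": "physical science",
    "chemistry": "physical science",  # assumed extra term, added to show a sibling
}

def ancestors(term: str) -> List[str]:
    """Return the chain of parents from `term` up to the root."""
    chain = []
    parent = PARENT.get(term)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain

def siblings(term: str) -> List[str]:
    """Return the terms that share `term`'s parent, excluding `term` itself."""
    parent = PARENT.get(term)
    return sorted(t for t, p in PARENT.items() if p == parent and t != term)

print(ancestors("physics"))  # ['physical science', 'science']
print(siblings("physics"))   # ['chemistry']
```

Because every term has at most one parent, the ancestor chain is unique, which is what distinguishes a tree-shaped taxonomy from a general graph of relations.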

As mentioned above, the process is also called taxonomy induction. This is because, to construct a taxonomy from a corpus (for example, Wikipedia, a web page, or the World Wide Web), a software program must induce which terms belong to the taxonomy and how they are related, for example by identifying hyponym-hypernym pairs, among other approaches. This is done with algorithms, including statistical ones. Deduction (deductive logic) is often employed as well: for example, if B is a sibling of A, then B has the same parent as A and is placed under that parent in the taxonomy.

The primary application of automatic taxonomy construction is ontology learning, a central activity within ontology engineering. In computer science and artificial intelligence, an ontology is a conceptual model of a subject domain, that is, a given subject area or specifically defined sphere of interest. An ontology of a domain includes the vocabulary of that domain, the concepts or entities that vocabulary names, and the relationships between them. The backbone of most ontologies is a taxonomy, and taxonomic structure may be used throughout an ontology.
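As a rough illustration of how a taxonomy serves as an ontology's backbone, the following hypothetical Python data structure keeps a vocabulary of concepts, an is-a (child-to-parent) taxonomy, and other, non-taxonomic relations stored as triples. The class name, the `studies` relation, and the `physicist` concept are assumptions for illustration only; real ontology languages define these notions far more richly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Ontology:
    """Hypothetical minimal ontology: vocabulary, is-a backbone, other relations."""
    concepts: Set[str] = field(default_factory=set)
    is_a: Dict[str, str] = field(default_factory=dict)  # child -> parent (the taxonomy)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)

    def add_is_a(self, child: str, parent: str) -> None:
        self.concepts.update((child, parent))
        self.is_a[child] = parent

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        self.concepts.update((subject, obj))
        self.relations.append((subject, relation, obj))

    def is_subclass_of(self, child: str, ancestor: str) -> bool:
        """Follow is-a links upward; assumes the taxonomy is acyclic (a tree)."""
        current = self.is_a.get(child)
        while current is not None:
            if current == ancestor:
                return True
            current = self.is_a.get(current)
        return False

onto = Ontology()
onto.add_is_a("physics", "physical science")
onto.add_is_a("physical science", "science")
onto.add_relation("physicist", "studies", "physics")
print(onto.is_subclass_of("physics", "science"))  # True
```

Keeping the is-a links separate from other relations lets inference such as `is_subclass_of` rely only on the tree-shaped backbone, while the remaining relations add domain knowledge on top of it.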

Because building taxonomies manually is extremely labor-intensive and time-consuming, there is strong motivation to automate the process.

References

Automatic taxonomy construction Wikipedia