![]() | ||
Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning.
Contents
Conceptual clustering vs. data clustering
Conceptual clustering is obviously closely related to data clustering; however, in conceptual clustering it is not only the inherent structure of the data that drives cluster formation, but also the Description language which is available to the learner. Thus, a statistically strong grouping in the data may fail to be extracted by the learner if the prevailing concept description language is incapable of describing that particular regularity. In most implementations, the description language has been limited to feature conjunction, although in COBWEB (see "COBWEB" below), the feature language is probabilistic.
List of published algorithms
A fair number of algorithms have been proposed for conceptual clustering. Some examples are given below:
More general discussions and reviews of conceptual clustering can be found in the following publications:
Example: A basic conceptual clustering algorithm
This section discusses the rudiments of the conceptual clustering algorithm COBWEB. There are many other algorithms using different heuristics and "category goodness" or category evaluation criteria, but COBWEB is one of the best known. The reader is referred to the bibliography for other methods.
Knowledge representation
The COBWEB data structure is a hierarchy (tree) wherein each node represents a given concept. Each concept represents a set (actually, a multiset or bag) of objects, each object being represented as a binary-valued property list. The data associated with each tree node (i.e., concept) are the integer property counts for the objects in that concept. For example, (see figure), let a concept
[1 0 1]
[0 1 1]
[0 1 0]
[0 1 1]
The three properties might be, for example, [is_male, has_wings, is_nocturnal]
. Then what is stored at this concept node is the property count [1 3 3]
, indicating that 1 of the objects in the concept is male, 3 of the objects have wings, and 3 of the objects are nocturnal. The concept description is the category-conditional probability (likelihood) of the properties at the node. Thus, given that an object is a member of category (concept) [.25 .75 .75]
, which corresponds to the
The figure to the right shows a concept tree with five concepts.
The COBWEB language
The description language of COBWEB is a "language" only in a loose sense, because being fully probabilistic it is capable of describing any concept. However, if constraints are placed on the probability ranges which concepts may represent, then a stronger language is obtained. For example, we might permit only concepts wherein at least one probability differs from 0.5 by more than [.6 .5 .7]
could not be constructed by the learner; however a concept such as [.6 .5 .9]
would be accessible because at least one probability differs from 0.5 by more than
Evaluation criterion
In Fisher's (1987) description of COBWEB, the measure he uses to evaluate the quality of the hierarchy is Gluck and Corter's (1985) category utility (CU) measure, which he re-derives in his paper. The motivation for the measure is highly similar to the "information gain" measure introduced by Quinlan for decision tree learning. It has previously been shown that the CU for feature-based classification is the same as the mutual information between the feature variables and the class variable (Gluck & Corter, 1985; Corter & Gluck, 1992), and since this measure is much better known, we proceed here with mutual information as the measure of category "goodness".
What we wish to evaluate is the overall utility of grouping the objects into a particular hierarchical categorization structure. Given a set of possible classification structures, we need to determine whether one is better than another.