In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set. In the standard version of rough set theory (Pawlak 1991), the lower- and upper-approximation sets are crisp sets, but in other variations, the approximating sets may be fuzzy sets.
Contents
- Definitions
- Information system framework
- Example equivalence class structure
- Definition of a rough set
- Lower approximation and positive region
- Upper approximation and negative region
- Boundary region
- The rough set
- Objective analysis
- Definability
- Reduct and core
- Attribute dependency
- Rule extraction
- Decision matrices
- LERS rule induction system
- Incomplete data
- Applications
- History
- Extensions and generalizations
- Rough membership
- Other generalizations
- References
Definitions
The following section contains an overview of the basic framework of rough set theory, as originally proposed by Zdzisław I. Pawlak, along with some of the key definitions. More formal properties and boundaries of rough sets can be found in Pawlak (1991) and cited references. The initial and basic theory of rough sets is sometimes referred to as "Pawlak Rough Sets" or "classical rough sets", as a means to distinguish from more recent extensions and generalizations.
Information system framework
Let
With any
The relation
If
The equivalence classes of the
Example: equivalence-class structure
For example, consider the following information table:
When the full set of attributes
Thus, the two objects within the first equivalence class,
It is apparent that different attribute subset selections will in general lead to different indiscernibility classes. For example, if attribute
Definition of a rough set
Let
For example, consider the target set
However, the target set
Lower approximation and positive region
The
Upper approximation and negative region
The
The set
Boundary region
The boundary region, given by set difference
In summary, the lower approximation of a target set is a conservative approximation consisting of only those objects which can positively be identified as members of the set. (These objects have no indiscernible "clones" which are excluded by the target set.) The upper approximation is a liberal approximation which includes all objects that might be members of target set. (Some objects in the upper approximation may not be members of the target set.) From the perspective of
The rough set
The tuple
The accuracy of the rough-set representation of the set
That is, the accuracy of the rough set representation of
Objective analysis
Rough set theory is one of many methods that can be employed to analyse uncertain (including vague) systems, although less common than more traditional methods of probability, statistics, entropy and Dempster–Shafer theory. However a key difference, and a unique strength, of using classical rough set theory is that it provides an objective form of analysis (Pawlak et al. 1995). Unlike other methods, as those given above, classical rough set analysis requires no additional information, external parameters, models, functions, grades or subjective interpretations to determine set membership – instead it only uses the information presented within the given data (Düntsch and Gediga 1995). More recent adaptations of rough set theory, such as dominance-based, decision-theoretic and fuzzy rough sets, have introduced more subjectivity to the analysis.
Definability
In general, the upper and lower approximations are not equal; in such cases, we say that target set
Reduct and core
An interesting question is whether there are attributes in the information system (attribute-value table) which are more important to the knowledge represented in the equivalence class structure than other attributes. Often, we wonder whether there is a subset of attributes which can, by itself, fully characterize the knowledge in the database; such an attribute set is called a reduct.
Formally, a reduct is a subset of attributes
A reduct can be thought of as a sufficient set of features – sufficient, that is, to represent the category structure. In the example table above, attribute set
Attribute set
The reduct of an information system is not unique: there may be many subsets of attributes which preserve the equivalence-class structure (i.e., the knowledge) expressed in the information system. In the example information system above, another reduct is
The set of attributes which is common to all reducts is called the core: the core is the set of attributes which is possessed by every legitimate reduct, and therefore consists of attributes which cannot be removed from the information system without causing collapse of the equivalence-class structure. The core may be thought of as the set of necessary attributes – necessary, that is, for the category structure to be represented. In the example, the only such attribute is
It is possible for the core to be empty, which means that there is no indispensable attribute: any single attribute in such an information system can be deleted without altering the equivalence-class structure. In such cases, there is no essential or necessary attribute which is required for the class structure to be represented.
Attribute dependency
One of the most important aspects of database analysis or data acquisition is the discovery of attribute dependencies; that is, we wish to discover which variables are strongly related to which other variables. Generally, it is these strong relationships that will warrant further investigation, and that will ultimately be of use in predictive modeling.
In rough set theory, the notion of dependency is defined very simply. Let us take two (disjoint) sets of attributes, set
Let
That is, for each equivalence class
Another, intuitive, way to consider dependency is to take the partition induced by Q as the target class C, and consider P as the attribute set we wish to use in order to "re-construct" the target class C. If P can completely reconstruct C, then Q depends totally upon P; if P results in a poor and perhaps a random reconstruction of C, then Q does not depend upon P at all.
Thus, this measure of dependency expresses the degree of functional (i.e., deterministic) dependency of attribute set
Rule extraction
The category representations discussed above are all extensional in nature; that is, a category or complex class is simply the sum of all its members. To represent a category is, then, just to be able to list or identify all the objects belonging to that category. However, extensional category representations have very limited practical use, because they provide no insight for deciding whether novel (never-before-seen) objects are members of the category.
What is generally desired is an intentional description of the category, a representation of the category based on a set of rules that describe the scope of the category. The choice of such rules is not unique, and therein lies the issue of inductive bias. See Version space and Model selection for more about this issue.
There are a few rule-extraction methods. We will start from a rule-extraction procedure based on Ziarko & Shan (1995).
Decision matrices
Let us say that we wish to find the minimal set of consistent rules (logical implications) that characterize our sample system. For a set of condition attributes
where
This is best explained by example (which also avoids a lot of notation). Consider the table above, and let
First, we look at the case
To read this decision matrix, look, for example, at the intersection of row
Next, from each decision matrix we form a set of Boolean expressions, one expression for each row of the matrix. The items within each cell are aggregated disjunctively, and the individuals cells are then aggregated conjunctively. Thus, for the above table we have the following five Boolean expressions:
Each statement here is essentially a highly specific (probably too specific) rule governing the membership in class
- Either
P 1 P 3 -
P 2 - Either
P 1 P 3 - Either
P 1 P 2 P 3 -
P 2
It is clear that there is a large amount of redundancy here, and the next step is to simplify using traditional Boolean algebra. The statement
Likewise, the statement
The above implications can also be written as the following rule set:
It can be noted that each of the first two rules has a support of 1 (i.e., the antecedent matches two objects), while each of the last two rules has a support of 2. To finish writing the rule set for this knowledge system, the same procedure as above (starting with writing a new decision matrix) should be followed for the case of
LERS rule induction system
The data system LERS (Learning from Examples based on Rough Sets) Grzymala-Busse (1997) may induce rules from inconsistent data, i.e., data with conflicting objects. Two objects are conflicting when they are characterized by the same values of all attributes, but they belong to different concepts (classes). LERS uses rough set theory to compute lower and upper approximations for concepts involved in conflicts with other concepts.
Rules induced from the lower approximation of the concept certainly describe the concept, hence such rules are called certain. On the other hand, rules induced from the upper approximation of the concept describe the concept possibly, so these rules are called possible. For rule induction LERS uses three algorithms: LEM1, LEM2, and IRIM.
The LEM2 algorithm of LERS is frequently used for rule induction and is used not only in LERS but also in other systems, e.g., in RSES (Bazan et al. (2004). LEM2 explores the search space of attribute-value pairs. Its input data set is a lower or upper approximation of a concept, so its input data set is always consistent. In general, LEM2 computes a local covering and then converts it into a rule set. We will quote a few definitions to describe the LEM2 algorithm.
The LEM2 algorithm is based on an idea of an attribute-value pair block. Let
Set
each member
For our sample information system, LEM2 will induce the following rules:
Other rule-learning methods can be found, e.g., in Pawlak (1991), Stefanowski (1998), Bazan et al. (2004), etc.
Incomplete data
Rough set theory is useful for rule induction from incomplete data sets. Using this approach we can distinguish between three types of missing attribute values: lost values (the values that were recorded but currently are unavailable), attribute-concept values (these missing attribute values may be replaced by any attribute value limited to the same concept), and "do not care" conditions (the original values were irrelevant). A concept (class) is a set of all objects classified (or diagnosed) the same way.
Two special data sets with missing attribute values were extensively studied: in the first case, all missing attribute values were lost (Stefanowski and Tsoukias, 2001), in the second case, all missing attribute values were "do not care" conditions (Kryszkiewicz, 1999).
In attribute-concept values interpretation of a missing attribute value, the missing attribute value may be replaced by any value of the attribute domain restricted to the concept to which the object with a missing attribute value belongs (Grzymala-Busse and Grzymala-Busse, 2007). For example, if for a patient the value of an attribute Temperature is missing, this patient is sick with flu, and all remaining patients sick with flu have values high or very-high for Temperature when using the interpretation of the missing attribute value as the attribute-concept value, we will replace the missing attribute value with high and very-high. Additionally, the characteristic relation, (see, e.g., Grzymala-Busse and Grzymala-Busse, 2007) enables to process data sets with all three kind of missing attribute values at the same time: lost, "do not care" conditions, and attribute-concept values.
Applications
Rough set methods can be applied as a component of hybrid solutions in machine learning and data mining. They have been found to be particularly useful for rule induction and feature selection (semantics-preserving dimensionality reduction). Rough set-based data analysis methods have been successfully applied in bioinformatics, economics and finance, medicine, multimedia, web and text mining, signal and image processing, software engineering, robotics, and engineering (e.g. power systems and control engineering). Recently the three regions of rough sets are interpreted as regions of acceptance, rejection and deferment. This leads to three-way decision making approach with the model which can potentially lead to interesting future applications.
History
The idea of rough set was proposed by Pawlak (1981) as a new mathematical tool to deal with vague concepts. Comer, Grzymala-Busse, Iwinski, Nieminen, Novotny, Pawlak, Obtulowicz, and Pomykala have studied algebraic properties of rough sets. Different algebraic semantics have been developed by P. Pagliani, I. Duntsch, M. K. Chakraborty, M. Banerjee and A. Mani; these have been extended to more generalized rough sets by D. Cattaneo and A. Mani, in particular. Rough sets can be used to represent ambiguity, vagueness and general uncertainty.
Extensions and generalizations
Since the development of rough sets, extensions and generalizations have continued to evolve. Initial developments focused on the relationship - both similarities and difference - with fuzzy sets. While some literature contends these concepts are different, other literature considers that rough sets are a generalization of fuzzy sets - as represented through either fuzzy rough sets or rough fuzzy sets. Pawlak (1995) considered that fuzzy and rough sets should be treated as being complementary to each other, addressing different aspects of uncertainty and vagueness.
Three notable extensions of classical rough sets are:
Rough membership
Rough sets can be also defined, as a generalisation, by employing a rough membership function instead of objective approximation. The rough membership function expresses a conditional probability that
Rough membership primarily differs from the fuzzy membership in that the membership of union and intersection of sets cannot, in general, be computed from their constituent membership as is the case of fuzzy sets. In this, rough membership is a generalization of fuzzy membership. Furthermore, the rough membership function is grounded more in probability than the conventionally held concepts of the fuzzy membership function.
Other generalizations
Several generalizations of rough sets have been introduced, studied and applied to solving problems. Here are some of these generalizations: