Data reduction is the transformation of data into a corrected, ordered, and simplified form.
Classification of Data Reduction techniques
Data cube aggregation
➢ Aggregation operations are applied to the data in the construction of a data cube.
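A minimal sketch of aggregation along one dimension, assuming hypothetical quarterly sales records of the form (year, quarter, branch, amount). The record layout, the figures, and the helper name roll_up are illustrative only. Summing out the quarter dimension leaves one value per (year, branch) cell, which is a smaller representation of the same measure.

```python
from collections import defaultdict

# Hypothetical quarterly sales records: (year, quarter, branch, amount)
sales = [
    (2022, "Q1", "A", 224), (2022, "Q2", "A", 408),
    (2022, "Q3", "A", 350), (2022, "Q4", "A", 586),
    (2023, "Q1", "A", 300), (2023, "Q2", "A", 416),
    (2023, "Q3", "A", 380), (2023, "Q4", "A", 600),
]

def roll_up(records, key_indices, value_index):
    """Sum the measure over every dimension not listed in key_indices."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in key_indices)
        totals[key] += rec[value_index]
    return dict(totals)

# Aggregate away the quarter dimension: 8 records become 2 cells.
annual = roll_up(sales, key_indices=(0, 2), value_index=3)
print(annual)  # {(2022, 'A'): 1568, (2023, 'A'): 1696}
```

Passing key_indices=(0,) would aggregate over branches as well, giving a coarser level of the same cube.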
Attribute subset selection – Irrelevant, weakly relevant, or redundant attributes (dimensions) are detected and removed. Common methods are:
➢ Stepwise forward selection
➢ Stepwise backward elimination
➢ Combination of forward selection and backward elimination
➢ Decision tree induction (ID3, C4.5, ASSISTANT, CART)
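Stepwise forward selection can be sketched as a greedy loop: start with an empty set and repeatedly add the attribute that improves a quality score the most, stopping when no attribute helps. The scoring function below is a toy stand-in with made-up usefulness points. In practice the score would come from something like cross-validated model accuracy or a statistical significance test. All names (forward_select, toy_score, USEFULNESS) are hypothetical.

```python
def forward_select(candidates, score, min_gain=0):
    """Greedy stepwise forward selection driven by a caller-supplied score."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Evaluate every remaining attribute when added to the current subset.
        top_score, top_feature = max(
            (score(selected + [f]), f) for f in remaining
        )
        if top_score - best <= min_gain:
            break  # no attribute improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected

# Toy scoring function (an assumption for illustration): each attribute has
# made-up usefulness points, "age_months" is redundant once "age" is present,
# and "row_id" is irrelevant.
USEFULNESS = {"age": 35, "age_months": 30, "income": 25, "row_id": 0}

def toy_score(subset):
    chosen = set(subset)
    if {"age", "age_months"} <= chosen:
        chosen.discard("age_months")  # redundant attribute adds nothing
    return sum(USEFULNESS[f] for f in chosen)

print(forward_select(list(USEFULNESS), toy_score))  # ['age', 'income']
```

Backward elimination is the mirror image: start from the full attribute set and repeatedly drop the attribute whose removal hurts the score least.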
Dimensionality reduction – Encoding or transformation mechanisms are applied to obtain a reduced (compressed) representation of the original data.
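One transformation commonly used for this purpose is the discrete wavelet transform (principal components analysis is another). The sketch below is a simplified, unnormalized Haar transform that assumes the input length is a power of two. The function names are illustrative. Keeping only the largest coefficients and zeroing the rest gives a lossy, more compact encoding from which an approximation of the data can be rebuilt.

```python
def haar_transform(values):
    """Unnormalized Haar decomposition; len(values) must be a power of 2."""
    data = list(values)
    details_all = []
    while len(data) > 1:
        pairs = range(0, len(data), 2)
        averages = [(data[i] + data[i + 1]) / 2 for i in pairs]
        details = [(data[i] - data[i + 1]) / 2 for i in pairs]
        details_all = details + details_all  # coarser details go first
        data = averages
    return data + details_all

def haar_inverse(coeffs):
    """Rebuild the data from the overall average plus detail coefficients."""
    data = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(data)]
        data = [x for a, d in zip(data, details) for x in (a + d, a - d)]
        pos += len(details)
    return data

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (lossy step)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

signal = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_transform(signal)
approx = haar_inverse(keep_largest(coeffs, 4))
print(coeffs)
print(approx)
```

With all coefficients kept, haar_inverse reproduces the input exactly. With only four of the eight kept, the result is an approximation.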
Numerosity reduction
➢ The data are replaced or estimated by alternative, smaller representations: parametric models (only the model parameters are stored instead of the actual data) or nonparametric methods such as clustering, sampling, and histograms.
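Two of the nonparametric methods mentioned above, histograms and sampling, can be sketched with the standard library alone. The price list is made-up example data, the bucket count and sample size are arbitrary choices, and equal_width_histogram is a hypothetical helper that assumes the values are not all identical.

```python
import random

def equal_width_histogram(values, num_buckets):
    """Summarize values as (low, high, count) buckets of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets  # assumes hi > lo
    counts = [0] * num_buckets
    for v in values:
        # The maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_buckets)]

# Hypothetical price data
prices = [0, 1, 5, 5, 8, 10, 12, 14, 15, 15, 18, 20, 21, 25, 28, 30]

# 16 values are replaced by 3 bucket summaries.
histogram = equal_width_histogram(prices, 3)
for low, high, count in histogram:
    print(f"{low:g}-{high:g}: {count}")

# Simple random sample without replacement; fixed seed for repeatability.
sample = random.Random(0).sample(prices, 4)
print(sample)
```

The histogram keeps only bucket boundaries and counts, while the sample keeps a few of the original values. Both trade detail for size.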
Discretization and concept hierarchy generation
➢ Raw data values for attributes are replaced by ranges or higher conceptual levels.
➢ Discretization and concept hierarchy generation are powerful tools for data mining, in that they allow the mining of data at multiple levels of abstraction.
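As a small illustration of both ideas, the sketch below replaces raw ages first by 10-year ranges (discretization) and then by a higher-level concept. The cut points (30 and 60), the labels, and the function names are assumptions chosen for the example, not part of any standard hierarchy.

```python
from collections import Counter

def to_range(age, width=10):
    """Discretize a raw age into a fixed-width range label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def to_concept(age):
    """Map a raw age to a higher conceptual level (hypothetical cut points)."""
    if age < 30:
        return "youth"
    if age < 60:
        return "middle_aged"
    return "senior"

ages = [23, 27, 35, 41, 48, 52, 67, 71]

# Level 1: raw values replaced by ranges
by_range = Counter(to_range(a) for a in ages)
# Level 2: raw values replaced by concepts
by_concept = Counter(to_concept(a) for a in ages)

print(dict(by_range))
print(dict(by_concept))  # {'youth': 2, 'middle_aged': 4, 'senior': 2}
```

The same records can then be mined at the raw, range, or concept level, depending on how much detail the analysis needs.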