Feature extraction

Updated on Sep 20, 2024

In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps and, in some cases, leading to better human interpretations. Feature extraction is related to dimensionality reduction.

When the input data to an algorithm is too large to be processed and is suspected to be redundant (e.g. the same measurement in both feet and meters, or the repetitiveness of images represented as pixels), it can be transformed into a reduced set of features (also called a feature vector). Determining a subset of the initial features is called feature selection. The selected features are expected to contain the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the complete initial data.
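
To make feature selection concrete, here is a minimal sketch in Python. It assumes the data are plain lists of numbers and uses a hypothetical helper, `drop_redundant_features`, that greedily keeps a column only if its absolute Pearson correlation with every column kept so far is below an illustrative threshold of 0.99. The feet-and-meters case described in this paragraph is exactly the kind of redundancy it catches.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant_features(columns, threshold=0.99):
    """Greedy feature selection (hypothetical helper): keep a column only if its
    |correlation| with every column kept so far is below `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: the same height recorded in feet and in meters.
data = {
    "height_ft": [5.0, 5.5, 6.0, 6.2],
    "height_m": [1.524, 1.6764, 1.8288, 1.88976],
    "weight_kg": [60.0, 72.0, 68.0, 90.0],
}
print(drop_redundant_features(data))  # ['height_ft', 'weight_kg']
```

The result is a subset of the original columns, so this is selection rather than extraction. Correlation only detects linear redundancy, which makes it a rough filter rather than a general method.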

General

Feature extraction involves reducing the amount of resources required to describe a large set of data. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, and it may also cause a classification algorithm to overfit to training samples and generalize poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy.

The best results are achieved when an expert constructs a set of application-dependent features, a process called feature engineering. Nevertheless, if no such expert knowledge is available, general dimensionality reduction techniques may help. These include:

Independent component analysis

Isomap

Kernel PCA

Latent semantic analysis

Partial least squares

Principal component analysis

Multifactor dimensionality reduction

Nonlinear dimensionality reduction

Multilinear principal component analysis

Multilinear subspace learning

Semidefinite embedding

Autoencoder

Deep feature synthesis
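
As an illustration of building a new feature as a combination of the original variables, the following sketch reduces two-dimensional data to a single feature with principal component analysis. It is a minimal, dependency-free version that finds only the first component by power iteration, assuming the leading eigenvalue of the covariance matrix is clearly dominant; the helper names are hypothetical, and practical code would call a library eigen-decomposition or SVD routine instead.

```python
from math import sqrt


def centre(rows):
    """Subtract the column means so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=100):
    """Power iteration: converges to the eigenvector of the largest eigenvalue,
    assuming it is strictly dominant and the start vector is not orthogonal to it."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_1d(rows):
    """Reduce each row to one feature: its projection on the first principal
    component, i.e. a weighted linear combination of the original variables."""
    centred = centre(rows)
    pc = first_component(covariance(centred))
    scores = [sum(a * b for a, b in zip(r, pc)) for r in centred]
    return pc, scores


# Two correlated measurements that vary mostly along one direction.
rows = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.0]]
pc, scores = pca_1d(rows)
print([round(x, 3) for x in pc])  # [0.452, 0.892]
```

Each score weights the two centred measurements by the entries of the component, so two strongly correlated columns collapse into one feature that retains almost all of the variance.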

Image processing

One very important area of application is image processing, in which algorithms are used to detect and isolate various desired portions or shapes (features) of a digitized image or video stream. It is particularly important in the area of optical character recognition.

Low-level

Edge detection

Corner detection

Blob detection

Ridge detection

Scale-invariant feature transform
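
Edge detection, the first item above, can be sketched with the Sobel operator, one common choice among several edge detectors. The example below is a minimal pure-Python version, assuming a grayscale image stored as a list of rows and an illustrative gradient-magnitude threshold; the helper names are hypothetical.

```python
from math import hypot

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_3x3(img, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c). The kernel is
    not flipped (cross-correlation); for Sobel this only changes the sign."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel_edges(img, threshold):
    """Binary edge map: 1 where the gradient magnitude reaches `threshold`.
    Border pixels, which lack a full neighbourhood, are left at 0."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = apply_3x3(img, SOBEL_X, r, c)
            gy = apply_3x3(img, SOBEL_Y, r, c)
            if hypot(gx, gy) >= threshold:
                edges[r][c] = 1
    return edges


# A 5x6 test image: dark left half, bright right half (one vertical edge).
img = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_edges(img, threshold=20)
for row in edges:
    print(row)  # the three middle rows print [0, 0, 1, 1, 0, 0]
```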

Curvature

Edge direction, changing intensity, autocorrelation.
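
The Harris corner detector is one well-known way to turn changing intensity and local autocorrelation into a feature, swapped in here for illustration: it sums products of image gradients over a small window into a 2x2 structure tensor M and scores each pixel with R = det(M) - k * trace(M)**2. The sketch below is simplified, assuming central-difference gradients, an unweighted 3x3 window, and the commonly used k = 0.04; the function name is hypothetical, and fuller implementations usually smooth with a Gaussian window and apply non-maximum suppression.

```python
def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)**2, where M sums the gradient
    products Ix*Ix, Ix*Iy, Iy*Iy over a 3x3 window (a local autocorrelation
    matrix). Pixels without a full window keep R = 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Central differences: intensity change along x (columns) and y (rows).
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2
    resp = [[0.0] * w for _ in range(h)]
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Intensity changing in every direction typically gives R > 0,
            # a straight edge R < 0, and a flat patch R = 0.
            resp[r][c] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return resp


# 9x9 image with a bright square whose top-left corner is at (row 4, col 4).
img = [[10 if r >= 4 and c >= 4 else 0 for c in range(9)] for r in range(9)]
resp = harris_response(img)
best = max(((r, c) for r in range(9) for c in range(9)),
           key=lambda rc: resp[rc[0]][rc[1]])
print(best)  # (4, 4)
```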

Image motion

Motion detection (area-based and differential approaches)

Optical flow
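
Area-based motion estimation can be sketched as block matching: a small block from one frame is compared against displaced copies in the next frame, and the displacement with the lowest dissimilarity is taken as the motion. The version below is a minimal sketch assuming grayscale frames stored as lists of rows, the sum of absolute differences (SAD) as the dissimilarity measure, and an exhaustive search window; the function name and its parameters are hypothetical.

```python
def block_motion(prev, curr, top, left, size=2, search=2):
    """Area-based motion estimate for one block: try every displacement
    (dy, dx) within +/-search pixels and return the one whose block in curr
    has the smallest sum of absolute differences (SAD) from the block in prev."""
    h, w = len(curr), len(curr[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + size > h or c0 + size > w:
                continue  # the displaced block would leave the frame
            sad = sum(abs(prev[top + i][left + j] - curr[r0 + i][c0 + j])
                      for i in range(size) for j in range(size))
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best


# Two hypothetical 8x8 frames: a bright 2x2 patch moves 1 row down, 2 columns right.
prev = [[9 if 2 <= r < 4 and 2 <= c < 4 else 0 for c in range(8)] for r in range(8)]
curr = [[9 if 3 <= r < 5 and 4 <= c < 6 else 0 for c in range(8)] for r in range(8)]
print(block_motion(prev, curr, top=2, left=2))  # (1, 2)
```

Differential approaches and optical flow instead estimate motion from spatial and temporal intensity derivatives, which avoids the exhaustive search but assumes small displacements.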

Shape based

Thresholding

Blob extraction

Template matching

Hough transform

Lines

Circles/ellipses

Arbitrary shapes (generalized Hough transform)

Works with any parameterizable feature (class variables, cluster detection, etc.)
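
The Hough transform for lines, the first case listed above, describes a line by the distance rho of its closest point to the origin and the angle theta of its normal, rho = x*cos(theta) + y*sin(theta). Each feature point votes for every (rho, theta) cell it could lie on, and peaks in the accumulator correspond to lines. The sketch below assumes 1-degree angle bins and 1-pixel rho bins and uses a `Counter` as a sparse accumulator; the function name is hypothetical.

```python
from collections import Counter
from math import cos, radians, sin


def hough_lines(points, theta_steps=180):
    """Accumulate votes in (rho, theta) space. Each point votes once per angle
    theta in [0, 180) degrees for rho = x*cos(theta) + y*sin(theta), with rho
    rounded to whole pixels; collinear points pile up in the same cell."""
    acc = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta_deg = t * 180 / theta_steps
            th = radians(theta_deg)
            rho = round(x * cos(th) + y * sin(th))
            acc[(rho, theta_deg)] += 1
    return acc


# Hypothetical edge pixels lying on the diagonal line y = x.
points = [(i, i) for i in range(40)]
(rho, theta), votes = hough_lines(points).most_common(1)[0]
print(rho, theta, votes)  # 0 135.0 40
```

All 40 points agree only at theta = 135 degrees (the normal direction of y = x) with rho = 0; neighbouring angles collect fewer votes because rounding spreads their rho values over several bins. Circles use the same voting idea with a three-parameter accumulator (center and radius).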

Flexible methods

Deformable, parameterized shapes

Active contours (snakes)

Feature extraction in software

Many data analysis software packages provide feature extraction and dimensionality reduction. Common numerical programming environments such as MATLAB, Scilab, NumPy and the R language provide some of the simpler feature extraction techniques (e.g. principal component analysis) via built-in commands or standard linear-algebra routines. More specialized algorithms are often available as public scripts or third-party add-ons.

References

Feature extraction Wikipedia

(Text) CC BY-SA