A time delay neural network (TDNN) is an artificial neural network architecture whose primary purpose is to work on sequential data. TDNN units recognise features independently of time shift (i.e. sequence position) and usually form part of a larger pattern recognition system. An example is converting continuous audio into a stream of classified phoneme labels for speech recognition.
The input signal is augmented with delayed copies of itself, which are presented to the network as additional inputs. Because the network has no internal state, it is time-shift invariant.
The original paper presented a perceptron network whose connection weights were trained with the back-propagation algorithm; training may be done in batch or online mode. The Stuttgart Neural Network Simulator implements that version.
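The delay augmentation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: the function name `delay_embed` and the parameter `num_delays` are hypothetical, and the sketch assumes a one-dimensional signal and keeps only the time steps that have a full history.

```python
from typing import List, Sequence


def delay_embed(signal: Sequence[float], num_delays: int) -> List[List[float]]:
    """Return one row per time step t: [x[t], x[t-1], ..., x[t-num_delays]].

    Only time steps with a complete history are kept, so the result has
    len(signal) - num_delays rows (or none if the signal is too short).
    """
    if num_delays < 0:
        raise ValueError("num_delays must be non-negative")
    rows = []
    for t in range(num_delays, len(signal)):
        # The current sample comes first, followed by progressively older samples.
        rows.append([signal[t - d] for d in range(num_delays + 1)])
    return rows


frames = delay_embed([1.0, 2.0, 3.0, 4.0, 5.0], num_delays=2)
# frames == [[3.0, 2.0, 1.0], [4.0, 3.0, 2.0], [5.0, 4.0, 3.0]]
```

Each row is what a single time-delay unit would see at one time step: the present sample alongside its delayed copies.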
Overview
The time delay neural network, like other neural networks, operates with multiple interconnected layers composed of clusters. These clusters are meant to represent neurons in a brain and, as in the brain, each cluster need only focus on a small region of the input. A prototypical TDNN has three layers of clusters: one for input, one for output, and a middle layer that manipulates the input through filters. Due to their sequential nature, TDNNs are implemented as feedforward neural networks rather than recurrent neural networks.
To achieve time-shift invariance, a set of delays is added to the input (audio file, image, etc.) so that the data is represented at different points in time. These delays are arbitrary and application specific, which generally means the input data is customized for a specific delay pattern. There has been work on adaptable time-delay TDNNs, in which this manual tuning is eliminated. The delays are an attempt to add a temporal dimension to the network which is not present in recurrent neural networks or multi-layer perceptrons with a sliding window. The combination of past inputs with present inputs makes the TDNN's approach unique.
A key feature of TDNNs is the ability to express a relation between inputs in time. This relation can be the result of a feature detector and is used within the TDNN to recognize patterns between the delayed inputs.
One of the main advantages of neural networks is that they do not depend on prior knowledge to set up the banks of filters at each layer. However, this means the network must learn the optimal values for these filters by processing numerous training inputs. Supervised learning is generally the learning approach associated with TDNNs, due to its strength in pattern recognition and function approximation. Supervised learning is commonly implemented with a back-propagation algorithm.
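As a rough sketch of how one feature detector relates delayed inputs, the Python example below applies the same weights at every time step and shows that shifting the input pattern only shifts the response. The weights, the sigmoid activation, the zero padding, and the function name `tdnn_unit` are all assumptions chosen for illustration, and taking the maximum over time is just one simple way to read out a shift-invariant score.

```python
import math
from typing import List, Sequence


def tdnn_unit(signal: Sequence[float], weights: Sequence[float], bias: float) -> List[float]:
    """Apply one time-delay unit at every time step with a full history.

    weights[d] multiplies the input delayed by d steps; the same weights
    are reused at every position in the sequence.
    """
    outputs = []
    for t in range(len(weights) - 1, len(signal)):
        s = bias + sum(weights[d] * signal[t - d] for d in range(len(weights)))
        outputs.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid activation
    return outputs


weights = [1.0, -2.0, 1.0]  # hypothetical feature detector over three time steps
pattern = [1.0, 3.0, 1.0]

# The same pattern placed early and late in an otherwise silent signal.
early = [0.0] * 3 + pattern + [0.0] * 7
late = [0.0] * 7 + pattern + [0.0] * 3

out_early = tdnn_unit(early, weights, bias=0.0)
out_late = tdnn_unit(late, weights, bias=0.0)

# The response to the late pattern is the early response shifted by 4 steps.
shifted_match = out_late[4:] == out_early[:7]

# Reading out the peak over time removes the dependence on position.
same_peak = max(out_early) == max(out_late)
```

Because the weights are shared across time, the unit itself never needs to know where in the sequence the pattern occurs.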
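To make the idea of learning filter values from training inputs concrete, here is a minimal sketch of online gradient descent on squared error for a single linear time-delay unit. With only one layer, back-propagation reduces to this simple update, so the example is a simplified stand-in and not the multi-layer network of the original paper. The function name `train_delay_filter`, the learning rate, the epoch count, and the synthetic data are all assumptions.

```python
import random
from typing import List, Sequence


def train_delay_filter(signal: Sequence[float], targets: Sequence[float],
                       n_taps: int, lr: float = 0.05, epochs: int = 50) -> List[float]:
    """Fit weights w so that sum_d w[d] * signal[t - d] approximates targets[t].

    Uses online (per-sample) gradient descent on the squared error.
    """
    w = [0.0] * n_taps
    for _ in range(epochs):
        for t in range(n_taps - 1, len(signal)):
            window = [signal[t - d] for d in range(n_taps)]
            pred = sum(wi * xi for wi, xi in zip(w, window))
            err = pred - targets[t]
            # Gradient of 0.5 * err**2 with respect to w[d] is err * window[d].
            for d in range(n_taps):
                w[d] -= lr * err * window[d]
    return w


# Synthetic supervised data: targets come from a known, hidden filter.
rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_w = [0.5, -0.3, 0.2]
targets = [0.0] * len(signal)
for t in range(len(true_w) - 1, len(signal)):
    targets[t] = sum(true_w[d] * signal[t - d] for d in range(len(true_w)))

learned = train_delay_filter(signal, targets, n_taps=3)
max_error = max(abs(a - b) for a, b in zip(learned, true_w))
```

On this noise-free toy problem the learned weights should end up very close to `true_w`; real TDNN training involves several layers, nonlinear activations, and noisy labelled data.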
Applications
Speech Recognition
TDNNs were introduced in 1989 to solve problems in speech recognition and initially focused on phoneme detection. Speech lends itself well to TDNNs, as spoken sounds are rarely of uniform length. By examining a sound shifted into the past and the future, the TDNN is able to construct a model of that sound that is time-invariant. This is especially helpful in speech recognition, as different dialects and languages pronounce the same sounds with different lengths. Spectral coefficients are used to describe the relation between the input samples.
Video Analysis
Video has a temporal dimension, which makes a TDNN well suited to analyzing motion patterns. An example of this analysis is a combination of vehicle detection and pedestrian recognition. When examining video, successive frames are fed into the TDNN as input. The strength of the TDNN comes from its ability to examine objects shifted forward and backward in time, so that an object remains detectable as time changes. If an object can be recognized in this manner, an application can anticipate where that object will be found in the future and choose an appropriate action.
Common Libraries
