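In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It is a special case of the more general backpropagation algorithm. For a neuron $j$ with activation function $g(x)$, the delta rule for the $i$-th weight $w_{ji}$ of neuron $j$ is given by

$$\Delta w_{ji} = \alpha (t_j - y_j)\, g'(h_j)\, x_i,$$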
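where $\alpha$ is a small constant called the learning rate, $g(x)$ is the neuron's activation function, $g'$ is the derivative of $g$, $t_j$ is the target output, $h_j$ is the weighted sum of the neuron's inputs, $y_j$ is the actual output, and $x_i$ is the $i$-th input.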
It holds that $h_j = \sum_i x_i w_{ji}$ and $y_j = g(h_j)$.
The delta rule is commonly stated in simplified form for a neuron with a linear activation function as

$$\Delta w_{ji} = \alpha (t_j - y_j)\, x_i.$$
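As an illustration of the simplified rule, the following minimal sketch performs one update for a single linear neuron: each weight changes by the learning rate times the output error times the corresponding input. The function name `delta_rule_step` and the sample values are hypothetical, chosen only for this example.

```python
def delta_rule_step(weights, inputs, target, alpha):
    """Return updated weights after one delta-rule step for a linear neuron."""
    # Linear activation: the output equals the weighted sum of the inputs.
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # Each weight moves by alpha * (t - y) * x_i.
    return [w + alpha * error * x for w, x in zip(weights, inputs)]


# Hypothetical example values, not taken from the article.
weights = [0.0, 0.0]
new_weights = delta_rule_step(weights, inputs=[1.0, 2.0], target=1.0, alpha=0.1)
print(new_weights)
```

With zero initial weights the output is 0 and the error is 1, so each weight moves by 0.1 times its input.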
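While the delta rule is similar to the perceptron's update rule, the derivation is different. The perceptron uses the Heaviside step function as the activation function $g(h)$, which means that $g'(h)$ does not exist at zero and equals zero everywhere else, so the delta rule cannot be applied directly.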
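The difficulty with a step activation can be seen numerically. The sketch below estimates the slope of a Heaviside step function at a few points away from zero and shows that a delta-rule update scaled by that slope vanishes regardless of the error. The helper names, the convention that the step equals 1 at zero, and the sample values are assumptions made for illustration.

```python
def heaviside(h):
    """Heaviside step: 1 for non-negative input, else 0 (one common convention)."""
    return 1.0 if h >= 0 else 0.0


def central_difference(f, h, eps=1e-6):
    """Numerically estimate f'(h) with a central difference."""
    return (f(h + eps) - f(h - eps)) / (2 * eps)


# Away from zero the step function is flat, so its estimated slope is zero.
slopes = [central_difference(heaviside, h) for h in (-2.0, -0.5, 0.5, 2.0)]
print(slopes)

# With a zero slope, alpha * (t - y) * g'(h) * x_i is zero even when the
# output is wrong, so the weights would never change.
alpha, target, x_i = 0.1, 1.0, 3.0  # hypothetical values
h = -0.5
update = alpha * (target - heaviside(h)) * central_difference(heaviside, h) * x_i
print(update)
```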
Derivation of the delta rule
The delta rule is derived by attempting to minimize the error in the output of the neural network through gradient descent. The error for a neural network with several outputs, indexed by $j$, can be measured as

$$E = \sum_j \tfrac{1}{2} (t_j - y_j)^2.$$
In this case, we wish to move through "weight space" of the neuron (the space of all possible values of all of the neuron's weights) in proportion to the gradient of the error function with respect to each weight. In order to do that, we calculate the partial derivative of the error with respect to each weight. For the $i$-th weight, this derivative can be written as

$$\frac{\partial E}{\partial w_{ji}}.$$
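Because we are only concerning ourselves with the $j$-th neuron, we can substitute the error formula above while omitting the summation:

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial}{\partial w_{ji}} \left[ \tfrac{1}{2} (t_j - y_j)^2 \right].$$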
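Next we use the chain rule to split this into two derivatives:

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial \left( \tfrac{1}{2} (t_j - y_j)^2 \right)}{\partial y_j} \, \frac{\partial y_j}{\partial w_{ji}}.$$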
To find the left derivative, we simply apply the general power rule:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - y_j) \, \frac{\partial y_j}{\partial w_{ji}}.$$
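To find the right derivative, we again apply the chain rule, this time differentiating with respect to the total input to neuron $j$, $h_j$:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - y_j) \, \frac{\partial y_j}{\partial h_j} \, \frac{\partial h_j}{\partial w_{ji}}.$$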
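Note that the output of the $j$-th neuron, $y_j$, is just the neuron's activation function $g$ applied to the neuron's input $h_j$. We can therefore write the derivative of $y_j$ with respect to $h_j$ simply as the first derivative of $g$:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - y_j) \, g'(h_j) \, \frac{\partial h_j}{\partial w_{ji}}.$$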
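Next we rewrite $h_j$ in the last term as the sum, over all weights indexed by $k$, of each weight $w_{jk}$ times its corresponding input $x_k$:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - y_j) \, g'(h_j) \, \frac{\partial}{\partial w_{ji}} \left[ \sum_k x_k w_{jk} \right].$$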
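Because we are only concerned with the $i$-th weight, the only term of the summation that is relevant is $x_i w_{ji}$. Clearly,

$$\frac{\partial (x_i w_{ji})}{\partial w_{ji}} = x_i,$$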
giving us our final equation for the gradient:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - y_j) \, g'(h_j) \, x_i.$$
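A gradient expression of this kind can be sanity-checked against finite differences. The sketch below assumes a single neuron with a logistic sigmoid activation (an illustrative choice, not one made in the derivation) and compares the analytic gradient, minus the error times the activation's derivative times the input, with a central-difference estimate of the squared error. All function names and numeric values are hypothetical.

```python
import math


def sigmoid(h):
    """Logistic sigmoid, used here as an example differentiable activation."""
    return 1.0 / (1.0 + math.exp(-h))


def error(weights, inputs, target):
    """Squared error E = 0.5 * (t - y)^2 for a single sigmoid neuron."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (target - sigmoid(h)) ** 2


def analytic_gradient(weights, inputs, target):
    """dE/dw_i = -(t - y) * g'(h) * x_i, with g'(h) = y * (1 - y) for the sigmoid."""
    h = sum(w * x for w, x in zip(weights, inputs))
    y = sigmoid(h)
    return [-(target - y) * y * (1.0 - y) * x for x in inputs]


def numeric_gradient(weights, inputs, target, eps=1e-6):
    """Central-difference estimate of dE/dw_i, perturbing one weight at a time."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grad.append((error(up, inputs, target) - error(down, inputs, target)) / (2 * eps))
    return grad


# Hypothetical weights, inputs, and target.
weights, inputs, target = [0.3, -0.2, 0.5], [1.0, 2.0, -1.5], 1.0
analytic = analytic_gradient(weights, inputs, target)
numeric = numeric_gradient(weights, inputs, target)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_diff)
```

The two gradients should agree up to the small error expected of a finite-difference estimate.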
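As noted above, gradient descent tells us that our change for each weight should be proportional to the gradient. Choosing a proportionality constant $\alpha$ and dropping the minus sign, so that the weight moves in the negative direction of the gradient to minimize the error, we arrive at our target equation:

$$\Delta w_{ji} = \alpha (t_j - y_j) \, g'(h_j) \, x_i.$$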
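To show the resulting update in use, here is a minimal sketch that trains one sigmoid neuron on the logical OR function by repeatedly adding the learning rate times the error times the activation's derivative times the input to each weight. The sigmoid activation, the constant bias input, the learning rate, the epoch count, and the dataset are all assumptions made for this example rather than part of the derivation.

```python
import math


def sigmoid(h):
    """Logistic sigmoid, used here as an example differentiable activation."""
    return 1.0 / (1.0 + math.exp(-h))


def train(samples, alpha=0.5, epochs=5000):
    """Apply w_i += alpha * (t - y) * g'(h) * x_i once per sample per epoch."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            h = sum(w * x for w, x in zip(weights, inputs))
            y = sigmoid(h)
            # For the sigmoid, g'(h) = y * (1 - y).
            scale = alpha * (target - y) * y * (1.0 - y)
            weights = [w + scale * x for w, x in zip(weights, inputs)]
    return weights


# Logical OR; the leading 1.0 in each input is a constant bias input (an assumption).
samples = [
    ([1.0, 0.0, 0.0], 0.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]
weights = train(samples)
predictions = [
    sigmoid(sum(w * x for w, x in zip(weights, inputs))) for inputs, _ in samples
]
print([round(p, 3) for p in predictions])
```

After training, the neuron's output should fall below 0.5 for the all-zero input and above 0.5 for the other three, since OR is linearly separable.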