Maximally informative dimensions is a dimensionality reduction technique used in the statistical analyses of neural responses. Specifically, it is a way of projecting a stimulus onto a low-dimensional subspace so that as much information as possible about the stimulus is preserved in the neural response. It is motivated by the fact that natural stimuli are typically confined by their statistics to a lower-dimensional space than that spanned by white noise. Within this subspace, however, stimulus-response functions may be either linear or nonlinear. The idea was originally developed by Tatyana Sharpee, Nicole Rust, and William Bialek in 2003.
Neural stimulus-response functions are typically given as the probability of a neuron generating an action potential, or spike, in response to a stimulus $\mathbf{s}$. The goal of maximally informative dimensions is to find a small relevant subspace of the much larger stimulus space that accurately captures the salient features of $\mathbf{s}$. Let $D$ denote the dimensionality of the entire stimulus space and $K$ denote the dimensionality of the relevant subspace, such that $K \ll D$. We let $\{\mathbf{v}^K\}$ denote the basis of the relevant subspace, and $\mathbf{s}^K$ the projection of $\mathbf{s}$ onto $\{\mathbf{v}^K\}$. Using Bayes' theorem we can write out the probability of a spike given a stimulus:

$$P(\text{spike} \mid \mathbf{s}^K) = P(\text{spike})\, f(\mathbf{s}^K)$$

where

$$f(\mathbf{s}^K) = \frac{P(\mathbf{s}^K \mid \text{spike})}{P(\mathbf{s}^K)}$$

is some nonlinear function of the projected stimulus.
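The intermediate step behind this factorization is a direct application of Bayes' theorem to the projected stimulus; written out, using only the symbols defined above:

$$P(\text{spike} \mid \mathbf{s}^K) = \frac{P(\mathbf{s}^K \mid \text{spike})\, P(\text{spike})}{P(\mathbf{s}^K)} = P(\text{spike})\, \frac{P(\mathbf{s}^K \mid \text{spike})}{P(\mathbf{s}^K)} = P(\text{spike})\, f(\mathbf{s}^K).$$

In this form $f(\mathbf{s}^K)$ is the ratio of the spike-conditional to the prior distribution of the projected stimulus, so it exceeds 1 for projections that make a spike more likely than average.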
In order to choose the optimal $\{\mathbf{v}^K\}$, we compare the prior stimulus distribution $P(\mathbf{s})$ with the spike-triggered stimulus distribution $P(\mathbf{s} \mid \text{spike})$ using the Shannon information. The average information (averaged across all presented stimuli) per spike is given by

$$I_{\text{spike}} = \sum_{\mathbf{s}} P(\mathbf{s} \mid \text{spike}) \log_2\!\left[\frac{P(\mathbf{s} \mid \text{spike})}{P(\mathbf{s})}\right].$$
Now consider a $K = 1$ dimensional subspace defined by a single direction $\mathbf{v}$. The average information conveyed by a single spike about the projection $x = \mathbf{s} \cdot \mathbf{v}$ is

$$I(\mathbf{v}) = \int dx\, P_{\mathbf{v}}(x \mid \text{spike}) \log_2\!\left[\frac{P_{\mathbf{v}}(x \mid \text{spike})}{P_{\mathbf{v}}(x)}\right],$$

where the probability distributions are approximated by a measured data set via $P_{\mathbf{v}}(x \mid \text{spike}) = \langle \delta(x - \mathbf{s} \cdot \mathbf{v}) \mid \text{spike} \rangle_{\mathbf{s}}$ and $P_{\mathbf{v}}(x) = \langle \delta(x - \mathbf{s} \cdot \mathbf{v}) \rangle_{\mathbf{s}}$, i.e., each presented stimulus is represented by a scaled Dirac delta function and the probability distributions are created by averaging over all spike-eliciting stimuli, in the former case, or the entire presented stimulus set, in the latter case. For a given dataset, the average information is a function only of the direction $\mathbf{v}$. Under this formulation, the relevant subspace of dimension $K = 1$ would be defined by the direction $\mathbf{v}$ that maximizes the average information $I(\mathbf{v})$.
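As an illustration of how the information along one direction might be estimated from data, the following is a minimal sketch that replaces the delta-function averages with histograms over binned projections. The function name `info_per_spike`, the bin count, and the equal-width binning are assumptions made for this example, not part of the original method description, and the sketch only evaluates the information for a given direction rather than performing the maximization.

```python
import numpy as np

def info_per_spike(stimuli, spikes, v, n_bins=20):
    """Histogram estimate of I(v), in bits per spike (illustrative sketch).

    stimuli: (N, D) array, one presented stimulus per row.
    spikes:  (N,) array of non-negative spike counts, one per stimulus.
    v:       (D,) direction; it is normalized here, so only orientation matters.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    spikes = np.asarray(spikes, dtype=float)
    v = np.asarray(v, dtype=float)

    # Projection x = s . v for every presented stimulus.
    x = stimuli @ (v / np.linalg.norm(v))

    # Shared bin edges so both distributions live on the same grid (assumed binning).
    edges = np.linspace(x.min(), x.max(), n_bins + 1)

    # P_v(x): average over the entire presented stimulus set.
    p_x, _ = np.histogram(x, bins=edges)
    # P_v(x | spike): average over spike-eliciting stimuli (weighted by spike count).
    p_x_spike, _ = np.histogram(x, bins=edges, weights=spikes)

    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()

    # Bins with no spikes contribute zero (0 * log 0 is taken as 0).
    mask = p_x_spike > 0
    return float(np.sum(p_x_spike[mask] * np.log2(p_x_spike[mask] / p_x[mask])))
```

A direction-finding procedure would call such a function repeatedly while varying `v`. Histogram estimates of this kind are biased upward for small spike counts, so the bin count is a trade-off between resolution and bias.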
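This procedure can readily be extended to a relevant subspace of dimension $K > 1$ by defining

$$P_{\mathbf{v}^K}(\mathbf{x} \mid \text{spike}) = \left\langle \prod_{i=1}^{K} \delta(x_i - \mathbf{s} \cdot \mathbf{v}_i) \,\Big|\, \text{spike} \right\rangle_{\mathbf{s}}$$

and

$$P_{\mathbf{v}^K}(\mathbf{x}) = \left\langle \prod_{i=1}^{K} \delta(x_i - \mathbf{s} \cdot \mathbf{v}_i) \right\rangle_{\mathbf{s}}$$

and maximizing $I(\mathbf{v}^K)$.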
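The multidimensional case can be sketched in the same spirit by binning the joint projections onto all $K$ directions at once. The example below is a rough illustration under assumed choices (the name `info_per_spike_kd`, equal-width bins, and a small default bin count); the number of bins grows exponentially with $K$, so a histogram estimate like this is only practical for small $K$.

```python
import numpy as np

def info_per_spike_kd(stimuli, spikes, V, n_bins=8):
    """Histogram estimate of I(v^K), in bits per spike (illustrative sketch).

    stimuli: (N, D) array, one presented stimulus per row.
    spikes:  (N,) array of non-negative spike counts, one per stimulus.
    V:       (K, D) array whose rows are the directions v_1, ..., v_K.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    spikes = np.asarray(spikes, dtype=float)
    V = np.atleast_2d(np.asarray(V, dtype=float))

    # Joint projections x_i = s . v_i, giving an (N, K) array.
    X = stimuli @ V.T

    # One set of equal-width bin edges per projected coordinate (assumed binning).
    edges = [np.linspace(X[:, i].min(), X[:, i].max(), n_bins + 1)
             for i in range(X.shape[1])]

    # Joint prior distribution and joint spike-triggered distribution.
    p_x, _ = np.histogramdd(X, bins=edges)
    p_x_spike, _ = np.histogramdd(X, bins=edges, weights=spikes)

    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()

    # Bins with no spikes contribute zero (0 * log 0 is taken as 0).
    mask = p_x_spike > 0
    return float(np.sum(p_x_spike[mask] * np.log2(p_x_spike[mask] / p_x[mask])))
```

With `V` containing a single row, this reduces to a one-dimensional histogram estimate along that direction.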
Maximally informative dimensions does not make any assumptions about the Gaussianity of the stimulus set, which is important, because naturalistic stimuli tend to have non-Gaussian statistics. In this way the technique is more robust than other dimensionality reduction techniques such as spike-triggered covariance analyses.