The block Wiedemann algorithm for computing kernel vectors of a matrix over a finite field is a generalisation of an algorithm due to Don Coppersmith.
Let $M$ be an $n \times n$ square matrix over some finite field $F$, let $x_{\mathrm{base}}$ be a random vector of length $n$, and let $x = M x_{\mathrm{base}}$. Consider the sequence of vectors $S = [x, Mx, M^2 x, \ldots]$ obtained by repeatedly multiplying the vector by the matrix $M$; let $y$ be any other vector of length $n$, and consider the sequence of finite-field elements $S_y = [y \cdot x,\; y \cdot Mx,\; y \cdot M^2 x, \ldots]$.
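As a concrete illustration, the sequences $S$ and $S_y$ can be generated over a small prime field; this is a toy sketch, and the prime, matrix, and vectors below are arbitrary choices for demonstration:

```python
# Toy illustration of the sequences S and S_y over GF(p).
# The prime p, the matrix M, and the vectors are arbitrary example choices.
p = 101

def mat_vec(M, v):
    """Multiply matrix M by vector v over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def dot(u, v):
    """Inner product of two vectors over GF(p)."""
    return sum(a * b for a, b in zip(u, v)) % p

M = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]  # example 3x3 matrix over GF(101)
x_base = [1, 5, 7]                     # stands in for the random vector
x = mat_vec(M, x_base)                 # x = M * x_base
y = [2, 0, 9]                          # any other vector

S = [x]                                # S = [x, Mx, M^2 x, ...]
for _ in range(6):
    S.append(mat_vec(M, S[-1]))
S_y = [dot(y, v) for v in S]           # scalar sequence [y.x, y.Mx, ...]
```

Each multiplication by $M$ extends both sequences by one term; $S_y$ is just the projection of $S$ onto the single coordinate picked out by $y$.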
We know that the matrix $M$ has a minimal polynomial; by the Cayley–Hamilton theorem we know that this polynomial is of degree (which we will call $n_0$) no more than $n$. Say $\sum_{r=0}^{n_0} p_r M^r = 0$. Then $\sum_{r=0}^{n_0} y \cdot (p_r (M^r x)) = 0$; so the minimal polynomial of the matrix annihilates the sequence $S$ and hence $S_y$.
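This annihilation step can be checked concretely. For a $2 \times 2$ matrix the characteristic polynomial $t^2 - \operatorname{tr}(M)\,t + \det(M)$ annihilates $M$ by Cayley–Hamilton, so its coefficients give a linear recurrence satisfied by $S_y$ at every shift; the field, matrix, and vectors below are illustrative choices:

```python
# Verify that a polynomial annihilating M also annihilates S_y.
# p, M, x, y are illustrative; for a 2x2 matrix, Cayley-Hamilton gives the
# annihilating polynomial t^2 - tr(M) t + det(M).
p = 101
M = [[3, 1], [4, 2]]
tr = (M[0][0] + M[1][1]) % p
det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % p
coeffs = [det, (-tr) % p, 1]           # p_0, p_1, p_2 with sum p_r M^r = 0

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

x, y = [5, 7], [2, 9]
S = [x]
for _ in range(6):
    S.append(mat_vec(M, S[-1]))
S_y = [sum(a * b for a, b in zip(y, v)) % p for v in S]

# the recurrence sum_r p_r * S_y[i + r] = 0 holds at every shift i
for i in range(len(S_y) - 2):
    assert sum(c * S_y[i + r] for r, c in enumerate(coeffs)) % p == 0
```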
But the Berlekamp–Massey algorithm allows us to calculate relatively efficiently some sequence $q_0 \ldots q_L$ with $\sum_{i=0}^{L} q_i S_y[i+r] = 0 \;\; \forall r$. Our hope is that this sequence, which by construction annihilates $S_y$, actually annihilates $S$; so we have $\sum_{i=0}^{L} q_i M^i x = 0$. We then take advantage of the initial definition of $x$ to say $M \sum_{i=0}^{L} q_i M^i x_{\mathrm{base}} = 0$, and so $\sum_{i=0}^{L} q_i M^i x_{\mathrm{base}}$ is a hopefully non-zero kernel vector of $M$.
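Putting the pieces together, here is a sketch of the whole scalar (non-blocked) procedure over a small prime field: Berlekamp–Massey recovers the shortest recurrence satisfied by $S_y$, its reversed coefficients give the $q_i$, and $\sum_i q_i M^i x_{\mathrm{base}}$ lands in the kernel. The singular matrix, prime, and vectors are illustrative choices, and the Berlekamp–Massey implementation is a textbook version, not tuned code:

```python
# Sketch of the scalar Wiedemann procedure over GF(p); p, M, x_base, y are
# illustrative choices (M is singular, so a non-trivial kernel exists).
p = 101
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # det = 0, so singular over GF(p) too
x_base, y = [1, 1, 0], [1, 0, 0]

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def berlekamp_massey(s, p):
    """Shortest C (monic, C[0] = 1) with
    s[n] + sum_{i=1..L} C[i] * s[n-i] == 0 (mod p) for all n >= L."""
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n in range(len(s)):
        d = s[n] % p                     # discrepancy at position n
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p  # d / b via Fermat inversion
        T = C[:]
        C += [0] * (len(B) + m - len(C))  # C(x) -= (d/b) * x^m * B(x)
        for i, bi in enumerate(B):
            C[i + m] = (C[i + m] - coef * bi) % p
        if 2 * L <= n:                   # the LFSR length must grow
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C, L

# build the scalar sequence S_y starting from x = M x_base
x = mat_vec(M, x_base)
v, S_y = x, []
for _ in range(2 * len(M) + 2):          # comfortably more than 2n terms
    S_y.append(sum(a * b for a, b in zip(y, v)) % p)
    v = mat_vec(M, v)

C, L = berlekamp_massey(S_y, p)
q = (C + [0] * (L + 1))[:L + 1][::-1]    # q_0 ... q_L annihilate the sequence

# w = sum_i q_i M^i x_base: accumulate while applying powers of M
w = [0] * len(M)
v = x_base
for qi in q:
    w = [(wi + qi * vi) % p for wi, vi in zip(w, v)]
    v = mat_vec(M, v)

assert mat_vec(M, w) == [0] * len(M)     # w is in the kernel of M
assert any(w)                            # and here it is non-zero
```

Note the index shift at the end: because the sequence starts at $x = M x_{\mathrm{base}}$ rather than at $x_{\mathrm{base}}$, the final sum is evaluated at $x_{\mathrm{base}}$, which is exactly what extracts the extra factor of $M$.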
The natural implementation of sparse matrix arithmetic on a computer makes it easy to compute the sequence S in parallel for a number of vectors equal to the width of a machine word; indeed, computing the sequence for that many vectors will normally take no longer than computing it for one. If you have several processors, each can compute the sequence S for a different set of random vectors in parallel.
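Over GF(2) the word-parallel trick can be sketched as follows: one machine word per coordinate holds the corresponding bit of 64 different vectors, so a single sparse pass of XORs advances all 64 sequences at once. The matrix size, sparsity, and word width here are illustrative choices:

```python
# Word-parallel sequence computation over GF(2): bit k of each packed word
# belongs to the k-th of W independent vectors.  Sizes are illustrative.
import random

n, W = 8, 64                  # matrix dimension and machine-word width
random.seed(1)
# sparse matrix over GF(2): for each row, the columns containing a 1
rows = [sorted(random.sample(range(n), 3)) for _ in range(n)]

def apply_M(X):
    """One sparse pass on W vectors at once: coordinate i of every vector
    is the XOR of the packed words at row i's non-zero columns."""
    out = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= X[j]
        out.append(acc)
    return out

# X[i] packs coordinate i of W random vectors into one integer
X = [random.getrandbits(W) for _ in range(n)]
seq = [X]
for _ in range(4):            # the first few terms of all W sequences
    seq.append(apply_M(seq[-1]))

# cross-check: bit 0 of the packed run equals the run for vector 0 alone
v = [(xw >> 0) & 1 for xw in X]
for X_t in seq:
    assert [(xw >> 0) & 1 for xw in X_t] == v
    v = [sum(v[j] for j in cols) % 2 for cols in rows]
```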
It turns out, by a generalisation of the Berlekamp–Massey algorithm that produces a sequence of small matrices, that you can take the sequences produced for a large number of vectors and generate a kernel vector of the original large matrix. You need to compute
$y_i \cdot M^t x_j$ for some $i = 0 \ldots i_{\max}$, $j = 0 \ldots j_{\max}$, $t = 0 \ldots t_{\max}$, where $i_{\max}$, $j_{\max}$ and $t_{\max}$ need to satisfy $t_{\max} > \frac{d}{i_{\max}} + \frac{d}{j_{\max}} + O(1)$ and the $y_i$ are a series of vectors of length $n$; but in practice you can take the $y_i$ as a sequence of unit vectors and simply write out the first $i_{\max}$ entries in your vectors at each time $t$.
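With the $y_i$ chosen as unit vectors, $y_i \cdot M^t x_j$ is simply entry $i$ of $M^t x_j$, so the bookkeeping reduces to saving the first $i_{\max}$ entries of each block vector at every step; a minimal sketch, with all sizes and values as illustrative choices:

```python
# Record y_i . M^t x_j with y_i the unit vectors: just save the first
# i_max entries of each block vector at every step.  All sizes and the
# matrix are illustrative choices.
p = 101
M = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
X = [[1, 5, 7], [2, 0, 9]]             # the block of starting vectors x_j
i_max, t_max = 2, 6

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

records = []                           # records[t][j][i] = y_i . M^t x_j
for t in range(t_max + 1):
    records.append([v[:i_max] for v in X])
    X = [mat_vec(M, v) for v in X]     # advance the whole block one step
```

The `records` table is exactly the input the matrix Berlekamp–Massey step consumes: one small $i_{\max} \times j_{\max}$ matrix of field elements per time step.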