Definition: A
t
×
n
matrix
M
is
d
-separable if and only if
∀
S
1
≠
S
2
⊆
[
n
]
where
|
S
1
|
,
|
S
2
|
≤
d
such that
⋃
j
∈
S
1
M
j
≠
⋃
i
∈
S
2
M
i
First we will describe another way to look at the problem of group testing and how to decode it from a different notation. We can give a new interpretation of how group testing works as follows:
Group testing: Given input
M
and
r
such that
r
=
M
x
output
x
Take
M
j
to be the
j
t
h
column of
M
Define
S
M
j
⊆
[
t
]
so that
M
j
(
i
)
=
1
if and only if
i
∈
S
M
j
This gives that
S
r
=
⋃
j
∈
[
n
]
,
x
j
=
1
S
M
j
This formalizes the relation between
x
and the columns of
M
and
r
in a way more suitable to the thinking of
d
-separable and
d
-disjunct matrices. The algorithm to decode a
d
-separable matrix is as follows:
Given a
t
×
n
matrix
M
such that
M
is
d
-separable:
- For each
T
⊆
[
n
]
such that
|
T
|
≤
d
check if
S
r
=
⋃
j
∈
T
S
M
j
This algorithm runs in time
n
O
(
d
)
.
In literature disjunct matrices are also called super-imposed codes and d-cover-free families.
Definition: A
t
x
n
matrix
M
is d-disjunct if
∀
S
⊆
[
n
]
such that
|
S
|
≤
d
,
∀
j
∉
S
∃
i
such that
M
i
,
j
=
1
but
∀
k
∈
S
,
M
i
,
k
=
0
. Denoting
M
a
is the
a
t
h
column of
M
and
S
M
a
⊆
[
t
]
where
M
a
(
b
)
=
1
if and only if
b
∈
S
M
a
gives that
M
is
d
-disjunct if and only if
S
M
j
⊄
∪
k
∈
S
S
M
k
Claim:
M
is
d
-disjunct implies
M
is
d
-separable
Proof: (by contradiction) Let
M
be a
t
x
n
d
-disjunct matrix. Assume for contradiction that
M
is not
d
-separable. Then there exists
T
1
,
T
2
∈
[
n
]
and
T
1
≠
T
2
with
|
T
1
|
,
|
T
2
|
≤
d
such that
⋃
i
∈
T
1
M
i
=
∪
i
∈
T
2
S
M
i
. This implies that
∃
j
∈
T
2
∖
T
1
such that
S
M
j
⊆
⋃
k
∈
T
1
T
M
k
. This contradicts the fact that
M
is
d
-disjunct. Therefore
M
is
d
-separable.
◻
The algorithm for
d
-separable matrices was still a polynomial in
n
. The following will give a nicer algorithm for
d
-disjunct matrices which will be a
d
multiple instead of raised to the power of
d
given our bounds for
t
. The algorithm is as follows in the proof of the following lemma:
Lemma 1: There exists an
O
(
n
t
)
time decoding for any
d
-disjunct
t
x
n
matrix.
Observation 1: For any matrix
M
and given
M
x
=
r
if
r
i
=
1
it implies
∃
j
such that
M
i
,
j
=
1
and
x
j
=
1
where
1
≤
i
≤
t
and
1
≤
j
≤
n
. The opposite is also true. If
r
i
=
0
it implies
∀
j
if
M
i
,
j
=
1
then
x
j
=
0
. This is the case because
r
is generated by taking all of the logical or of the
x
j
's where
M
i
,
j
=
1
.
Observation 2: For any
d
-disjunct matrix and every set
T
=
{
j
|
x
j
=
1
}
where
|
T
|
≤
d
and for each
j
∉
T
where
1
≤
j
≤
n
there exists some
i
where
1
≤
i
≤
t
such that
M
i
,
j
=
1
but
M
i
,
l
=
0
∀
l
∈
T
. Thus, if
r
i
=
0
then
x
j
=
0
.
Proof of Lemma 1: Given as input
r
∈
{
0
,
1
}
t
,
M
use the following algorithm:
- For each
j
∈
[
n
]
set
x
j
=
1
- For
i
=
1
…
t
, if
r
i
=
0
then for all
j
∈
[
n
]
, if
M
i
,
j
=
1
set
x
j
=
0
By Observation 1 we get that any position where
r
i
=
0
the appropriate
x
j
's will be set to 0 by step 2 of the algorithm. By Observation 2 we have that there is at least one
i
such that if
x
j
is supposed to be 1 then
M
i
,
j
=
1
and, if
x
j
is supposed to be 1, it can only be the case that
r
i
=
1
as well. Therefore step 2 will never assign
x
j
the value 0 leaving it as a 1 and solving for
x
. This takes time
O
(
n
t
)
overall.
◻
Definition:A matrix
M
is
d
e
-disjunct if for any
d
+
1
columns
M
0
′
,
M
1
′
,
⋯
,
M
d
′
of
M
there are at least
e
+
1
elements
1
in
M
0
′
−
∪
i
=
1
d
M
i
′
.
Definition:Let
M
be a
t
×
n
d
e
-disjunct matrix. The output
o
(
S
)
of
S
in
M
is the union of those columns indexed by
S
, where
S
is a subset of
{
1
,
2
,
⋯
,
n
}
with at most size
d
.
Proposition: Let
M
be a
d
e
-disjunct matrix. Let
S
,
T
⊆
{
1
,
2
,
⋯
,
n
}
be two distinct subsets with each at most
d
elements. Then the Hamming distance of the outputs
o
(
S
)
and
o
(
T
)
is at least
e
+
1
.
Proof: Without loss of generality, we may assume
∃
j
s.t.
j
∈
S
and
j
∉
T
. We consider the
j
-th column of
M
and those columns of
M
indexed by
T
, then we can find
e
+
1
different
ℓ
such that
M
ℓ
j
=
1
and
M
ℓ
k
=
0
for all
k
∈
T
, because the definition of
d
e
-disjunct. Hence we complete the proof.
Then we have the following corollary.
Corollary We can detect
e
errors and correct
⌊
e
2
⌋
errors on the outcome of
d
e
-disjunct matrix.
The results for these upper bounds rely mostly on the properties of
d
-disjunct matrices. Not only are the upper bounds nice, but from Lemma 1 we know that there is also a nice decoding algorithm for these bounds. First the following lemma will be proved since it is relied upon for both constructions:
Lemma 2: Given
1
≤
d
≤
n
let
M
be a
t
×
n
matrix and:
-
∀
j
∈
[
n
]
,
|
S
M
j
|
≥
w
min
-
∀
i
≠
j
∈
[
n
]
,
|
S
M
i
∩
S
M
j
|
≤
a
max
for some integers
a
max
≤
w
min
≤
t
then
M
is
≥
d
′
⌊
w
min
−
1
a
max
⌋
-disjunct.
Note: these conditions are stronger than simply having a subset of size
d
but rather applies to any pair of columns in a matrix. Therefore no matter what column
i
that is chosen in the matrix, that column will contain at least
w
min
1's and the total number of shared 1's by any two columns is
a
max
.
Proof of Lemma 2: Fix an arbitrary
S
⊆
[
n
]
,
|
S
|
≤
d
,
j
∉
S
and a matrix
M
. There exists a match between
i
∈
S
and
j
∉
S
if column
i
has a 1 in the same row position as in column
j
. Then the total number of matches is
≤
a
max
⋅
d
≤
a
max
⋅
(
w
min
−
1
a
max
)
=
w
min
−
1
<
w
min
, i.e. a column
j
has a fewer number of matches than the number of ones in it. Therefore there must be a row with all 0s in
S
but a 1 in
j
.
◻
We will now generate constructions for the bounds.
Constructions can be either randomized (brute-force), explicit or strongly explicit. This concerns the time in which the incidence matrix can be generated. An explicit construction for a
n
×
m
matrix has a complexity
O
(
poly
(
n
+
m
)
)
, whereas a randomized construction takes more than that. For a strongly explicit construction, one can find a single entry of the matrix in
poly
(
m
)
time.
Randomized construction
This first construction will use a probabilistic argument to show the property wanted, in particular the Chernoff bound. Using this randomized construction gives that
t
(
d
,
n
)
≤
O
(
d
2
log
n
)
. The following lemma will give the result needed.
Theorem 1: There exists a random
d
-disjunct matrix with
O
(
d
2
log
n
)
rows.
Proof of Theorem 1: Begin by building a random
t
×
n
matrix
M
with
t
=
c
d
2
log
n
(where
c
will be picked later). It will be shown that
M
is
Ω
(
d
)
-disjunct. First note that
M
i
,
j
∈
{
0
,
1
}
and let
M
i
,
j
=
1
independently with probability
1
d
for
i
∈
[
t
]
and
j
∈
[
n
]
. Now fix
j
∈
[
n
]
. Denote the
j
t
h
column of
M
as
T
j
⊆
[
t
]
. Then the expectancy is
E
[
|
T
j
|
]
=
t
d
. Using the Chernoff bound, with
μ
=
1
2
, gives
P
r
[
|
T
j
|
<
t
2
d
]
≤
e
−
t
12
d
=
e
−
c
d
log
n
12
≤
n
−
2
d
[
if
c
≥
24
]
. Taking the union bound over all columns gives
P
r
[
∃
j
,
|
T
j
|
<
t
2
d
]
≤
n
⋅
n
−
2
d
≤
n
−
d
. This gives
P
r
[
∀
j
,
|
T
j
|
≥
t
2
d
]
≥
1
−
n
−
d
. Therefore
w
min
≥
t
2
d
with probability
≥
1
−
n
−
d
.
Now suppose
j
≠
k
∈
[
n
]
and
i
∈
[
t
]
then
P
r
[
M
i
,
j
=
M
i
,
k
=
1
]
=
1
d
2
. So
E
[
|
T
j
∩
T
k
|
]
=
t
d
2
. Using the Chernoff bound on this gives
P
r
[
|
T
j
∩
T
k
|
>
2
t
d
2
]
≤
e
−
t
3
d
2
=
e
−
2
log
n
≤
n
−
4
[
if
c
≥
12
]
. By the union bound over
(
j
,
k
)
pairs
P
r
[
∃
(
j
,
k
)
such that
|
T
j
∩
T
k
|
>
2
t
d
2
]
≤
n
2
⋅
n
−
4
=
n
−
2
. This gives that
a
max
≤
2
t
d
2
and
w
min
≥
t
2
d
with probability
≥
1
−
n
−
d
−
n
−
2
≥
1
−
1
n
. Note that by changing
c
the probability
1
−
1
n
can be made to be
1
−
1
p
o
l
y
(
n
)
. Thus
d
′
=
⌊
t
2
d
−
1
2
t
d
2
⌋
≈
d
4
. By setting
d
to be
4
d
, the above argument shows that
M
is
d
-disjunct.
Note that in this proof
t
=
d
2
log
n
thus giving the upper bound of
t
(
d
,
n
)
≤
O
(
d
2
log
n
)
.
◻
It is possible to prove a bound of
t
(
d
,
n
)
≤
O
(
d
2
log
2
n
)
using a strongly explicit code. Although this bound is worse by a
log
n
factor, it is preferable because this produces a strongly explicit construction instead of a randomized one.
Theorem 2: There exists a strongly explicit
d
-disjunct matrix with
O
(
d
2
log
2
n
)
rows.
This proof will use the properties of concatenated codes along with the properties of disjunct matrices to construct a code that will satisfy the bound we are after.
Proof of Theorem 2: Let
C
⊆
{
0
,
1
}
t
,
|
C
|
=
n
such that
C
=
{
c
1
,
…
,
c
n
}
. Denote
M
C
as the matrix with its
i
t
h
column being
c
i
. If
C
∗
can be found such that
-
∀
i
∈
C
∗
,
|
c
i
|
≥
w
min
-
∀
c
1
≠
c
2
∈
C
∗
,
|
{
i
|
c
i
1
=
c
i
2
=
1
}
|
≤
a
max
,
then
M
C
∗
is
⌊
w
min
−
1
a
max
⌋
-disjunct. To complete the proof another concept must be introduced. This concept uses code concatenation to obtain the result we want.
Kautz-Singleton '64
Let
C
∗
=
C
o
u
t
∘
C
i
n
. Let
C
o
u
t
be a
[
q
,
k
]
q
-Reed–Solomon code. Let
C
i
n
=
[
q
]
→
{
0
,
1
}
q
such that for
i
∈
[
q
]
,
c
i
n
(
i
)
=
(
0
,
…
,
0
,
1
,
0
,
…
,
0
)
where the 1 is in the
i
t
h
position. Then
n
=
q
k
,
t
=
q
2
, and
w
min
=
q
.
---
Example: Let
k
=
1
,
q
=
3
,
C
o
u
t
=
{
(
0
,
0
,
0
)
,
(
1
,
1
,
1
)
,
(
2
,
2
,
2
)
}
. Below,
M
C
denotes the matrix of codewords for
C
o
u
t
and
M
C
∗
denotes the matrix of codewords for
C
∗
=
C
o
u
t
∘
C
i
n
, where each column is a codeword. The overall image shows the transition from the outer code to the concatenated code.
M
C
=
[
0
1
2
0
1
2
0
1
2
]
⇒
M
C
∗
=
[
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
0
1
0
1
0
0
]
---
Divide the rows of
M
C
∗
into sets of size
q
and number them as
(
i
,
j
)
∈
[
q
]
×
[
q
]
where
i
indexes the set of rows and
j
indexes the row in the set. If
M
(
i
,
j
)
,
k
1
=
M
(
i
,
j
)
,
k
2
=
1
then note that
c
k
1
(
i
)
=
c
k
2
(
i
)
=
j
where
c
k
1
,
c
k
2
∈
C
o
u
t
. So that means
|
M
k
1
∩
M
k
2
|
=
q
−
Δ
(
c
k
1
,
c
k
2
)
. Since
Δ
(
c
k
1
,
c
k
2
)
≥
q
−
k
+
1
it gives that
|
M
k
1
∩
M
k
2
|
≤
k
−
1
so let
a
max
=
k
−
1
. Since
t
=
q
2
, the entries in each column of
M
C
∗
can be looked at as
q
sets of
q
entries where only one of the entries is nonzero (by definition of
C
i
n
) which gives a total of
q
nonzero entries in each column. Therefore
w
min
=
q
and
d
=
d
e
f
⌊
w
min
−
1
a
max
⌋
(so
M
C
∗
is
d
-disjunct).
Now pick
q
and
k
such that
⌊
q
−
1
k
−
1
⌋
=
d
(so
⌊
q
k
⌋
≈
d
). Since
q
k
=
n
we have
k
=
log
n
log
q
≤
log
n
. Since
q
≈
k
d
and
t
=
q
2
it gives that
t
=
q
2
≈
(
k
d
)
2
≤
(
d
log
n
)
2
.
◻
Thus we have a strongly explicit construction for a code that can be used to form a group testing matrix and so
t
(
d
,
n
)
≤
(
d
log
n
)
2
.
For non-adaptive testing we have shown that
Ω
(
d
log
n
)
≤
t
(
d
,
n
)
and we have that (i)
t
(
d
,
n
)
≤
O
(
d
2
log
2
n
)
(strongly explicit) and (ii)
t
(
d
,
n
)
≤
O
(
d
2
log
n
)
(randomized). As of recent work by Porat and Rothscheld, they presented an explicit method construction (i.e. deterministic time but not strongly explicit) for
t
(
d
,
n
)
≤
O
(
d
2
log
n
)
, however it is not shown here. There is also a lower bound for disjunct matrices of
t
(
d
,
n
)
≥
Ω
(
d
2
log
d
log
n
)
which is not shown here either.
Here is the 2-disjunct matrix
M
9
×
12
:
M
9
×
12
=
[
0
0
0
0
0
0
1
1
1
1
0
0
0
0
0
1
1
1
0
0
0
1
0
0
1
1
1
0
0
0
0
0
0
1
0
0
0
0
1
0
0
1
0
0
1
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
1
0
0
1
0
0
1
0
0
0
1
0
0
1
0
1
0
0
0
0
1
0
0
1
0
0
1
0
1
0
1
0
0
0
0
1
1
0
0
0
0
1
0
1
0
0
0
1
]