In theoretical computer science, a small-bias sample space (also known as $\epsilon$-biased sample space, $\epsilon$-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.
The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong, since $\epsilon$-biased sample spaces are equivalent to $\epsilon$-balanced error-correcting codes.
Let $X$ be a probability distribution over $\{0,1\}^n$. The bias of $X$ with respect to a set of indices $I \subseteq \{1,\dots,n\}$ is defined as

$$\operatorname{bias}_I(X) = \left|\Pr_{x\sim X}\Big(\sum_{i\in I} x_i = 0\Big) - \Pr_{x\sim X}\Big(\sum_{i\in I} x_i = 1\Big)\right| = \left|2\cdot\Pr_{x\sim X}\Big(\sum_{i\in I} x_i = 0\Big) - 1\right|,$$

where the sum is taken over $\mathbb{F}_2$, the finite field with two elements. In other words, the sum $\sum_{i\in I} x_i$ equals $0$ if the number of ones in the sample $x \in \{0,1\}^n$ at the positions defined by $I$ is even, and otherwise, the sum equals $1$. For $I = \emptyset$, the empty sum is defined to be zero, and hence $\operatorname{bias}_{\emptyset}(X) = 1$.
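For a finite sample space given as a multiset of strings, this bias can be computed directly from the definition. The following sketch (with a hypothetical toy multiset, and 0-based indices rather than the 1-based indices used above) makes the quantity concrete:

```python
from fractions import Fraction

def bias(samples, I):
    """bias_I of the uniform distribution over `samples`, a multiset of
    0/1-tuples: |2 * Pr[sum_{i in I} x_i = 0 over F_2] - 1|."""
    even = sum(1 for x in samples if sum(x[i] for i in I) % 2 == 0)
    p_zero = Fraction(even, len(samples))
    return abs(2 * p_zero - 1)

# A toy multiset over {0,1}^3: the four strings of even Hamming weight.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(bias(samples, {0, 2}))     # 0: this parity is perfectly balanced
print(bias(samples, {0, 1, 2}))  # 1: the full parity is constant on the set
```

Note that a single set can have bias 0 with respect to some index sets and bias 1 with respect to others, as here.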
A probability distribution $X$ over $\{0,1\}^n$ is called an $\epsilon$-biased sample space if $\operatorname{bias}_I(X) \le \epsilon$ holds for all non-empty subsets $I \subseteq \{1,2,\dots,n\}$.
An $\epsilon$-biased sample space $X$ that is generated by picking a uniform element from a multiset $X \subseteq \{0,1\}^n$ is called an $\epsilon$-biased set. The size $s$ of an $\epsilon$-biased set $X$ is the size of the multiset that generates the sample space.
An $\epsilon$-biased generator $G : \{0,1\}^\ell \to \{0,1\}^n$ is a function that maps strings of length $\ell$ to strings of length $n$ such that the multiset $X_G = \{G(y) \mid y \in \{0,1\}^\ell\}$ is an $\epsilon$-biased set. The seed length of the generator is the number $\ell$ and is related to the size of the $\epsilon$-biased set $X_G$ via the equation $s = 2^\ell$.
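Whether a generator's output multiset is $\epsilon$-biased can be verified exhaustively for very small $n$ by enumerating all non-empty index sets. A minimal sketch (the generator table below is a hypothetical toy example with $\ell = 2$ and $n = 3$, so $s = 2^\ell = 4$):

```python
from itertools import product

def max_bias(samples, n):
    """Maximum bias over all non-empty I ⊆ {0, ..., n-1}; exponential in n,
    so only usable as a sanity check for tiny parameters."""
    worst = 0.0
    for mask in product((0, 1), repeat=n):
        if not any(mask):
            continue  # skip I = ∅, whose bias is always 1 by convention
        even = sum(1 for x in samples
                   if sum(a & b for a, b in zip(x, mask)) % 2 == 0)
        worst = max(worst, abs(2 * even / len(samples) - 1))
    return worst

# Hypothetical generator G: {0,1}^2 → {0,1}^3 given as a lookup table.
G = {(0, 0): (0, 0, 0), (0, 1): (0, 1, 1),
     (1, 0): (1, 0, 1), (1, 1): (1, 1, 0)}
X_G = list(G.values())
print(max_bias(X_G, 3))  # 1.0: the parity of all three bits is constant on
                         # this toy X_G, so it is not ε-biased for any ε < 1
```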
There is a close connection between $\epsilon$-biased sets and $\epsilon$-balanced linear error-correcting codes. A linear code $C : \{0,1\}^n \to \{0,1\}^s$ of message length $n$ and block length $s$ is $\epsilon$-balanced if the Hamming weight of every nonzero codeword $C(x)$ is between $\frac{1-\epsilon}{2}\,s$ and $\frac{1+\epsilon}{2}\,s$. Since $C$ is a linear code, its generator matrix is an $(n \times s)$-matrix $A$ over $\mathbb{F}_2$ with $C(x) = x \cdot A$.
Then it holds that a multiset $X \subseteq \{0,1\}^n$ is $\epsilon$-biased if and only if the linear code $C_X$, whose columns are exactly the elements of $X$, is $\epsilon$-balanced.
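The codeword $C_X(x)$ has one coordinate $\langle x, y \rangle$ per element $y \in X$, so its Hamming weight is exactly $s \cdot \Pr_{y \sim X}[\langle x, y\rangle = 1]$, which makes the equivalence easy to verify by brute force on toy instances. A minimal sketch (the multiset $X$ below is an arbitrary example):

```python
import itertools

def is_eps_biased(X, n, eps):
    """Check ε-bias of the uniform distribution over multiset X ⊆ {0,1}^n."""
    for x in itertools.product((0, 1), repeat=n):
        if not any(x):
            continue
        # <x, y> over F_2 for each y ∈ X; the number of ones equals the
        # Hamming weight of the codeword C_X(x) = x·A.
        wt = sum(sum(a & b for a, b in zip(x, y)) % 2 for y in X)
        if abs(2 * wt / len(X) - 1) > eps:  # bias of index set {i : x_i = 1}
            return False
    return True

def is_eps_balanced(X, n, eps):
    """Check ε-balance of the code C_X whose generator matrix has the
    elements of X as columns (block length s = |X|)."""
    s = len(X)
    for x in itertools.product((0, 1), repeat=n):
        if not any(x):
            continue
        wt = sum(sum(a & b for a, b in zip(x, y)) % 2 for y in X)
        if not (1 - eps) / 2 * s <= wt <= (1 + eps) / 2 * s:
            return False
    return True

X = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]  # toy multiset, n = 3
for eps in (0.25, 0.5, 1.0):
    assert is_eps_biased(X, 3, eps) == is_eps_balanced(X, 3, eps)
```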
Usually the goal is to find $\epsilon$-biased sets that have a small size $s$ relative to the parameters $n$ and $\epsilon$. This is because a smaller size $s$ means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.
The probabilistic method gives a non-explicit construction that achieves size $s = O(n/\epsilon^2)$. The construction is non-explicit in the sense that finding the $\epsilon$-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that such small $\epsilon$-biased sets exist. On the other hand, the best known lower bound for the size of $\epsilon$-biased sets is $s = \Omega(n/(\epsilon^2 \log(1/\epsilon)))$; that is, in order for a set to be $\epsilon$-biased, it must be at least that big.
There are many explicit (i.e., deterministic) constructions of $\epsilon$-biased sets with various parameter settings:
Naor & Naor (1990) achieve $s = \frac{n}{\operatorname{poly}(\epsilon)}$. The construction makes use of Justesen codes (which is a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
Alon et al. (1992) achieve $s = O\left(\left(\frac{n}{\epsilon \log(n/\epsilon)}\right)^2\right)$. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an $\epsilon$-balanced code, which gives rise to an $\epsilon$-biased sample space via the connection mentioned above.
Concatenating algebraic geometry codes with the Hadamard code gives an $\epsilon$-balanced code with $s = O\left(\frac{n}{\epsilon^3 \log(1/\epsilon)}\right)$.
Ben-Aroya & Ta-Shma (2009) achieve $s = O\left(\left(\frac{n}{\epsilon^2 \log(1/\epsilon)}\right)^{5/4}\right)$.
These bounds are mutually incomparable. In particular, none of these constructions yields the smallest $\epsilon$-biased sets for all settings of $\epsilon$ and $n$.
An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.
A random variable $Y$ over $\{0,1\}^n$ is a k-wise independent space if, for all index sets $I \subseteq \{1,\dots,n\}$ of size $k$, the marginal distribution $Y|_I$ is exactly equal to the uniform distribution over $\{0,1\}^k$. That is, for all such $I$ and all strings $z \in \{0,1\}^k$, the distribution $Y$ satisfies $\Pr_Y(Y|_I = z) = 2^{-k}$.
Constructions and bounds
k-wise independent spaces are fairly well understood.
A simple construction by Joffe (1974) achieves size $n^k$.
Alon, Babai & Itai (1986) construct a k-wise independent space whose size is $n^{k/2}$.
Chor et al. (1985) prove that no k-wise independent space can be significantly smaller than $n^{k/2}$.
Joffe (1974) constructs a $k$-wise independent space $Y$ over the finite field with some prime number $n > k$ of elements, i.e., $Y$ is a distribution over $\mathbb{F}_n^n$. The initial $k$ marginals of the distribution are drawn independently and uniformly at random: $(Y_0,\dots,Y_{k-1}) \sim \mathbb{F}_n^k$. For each $i$ with $k \le i < n$, the marginal distribution of $Y_i$ is then defined as

$$Y_i = Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \dots + Y_{k-1} \cdot i^{k-1},$$

where the calculation is done in $\mathbb{F}_n$. Joffe (1974) proves that the distribution $Y$ constructed in this way is $k$-wise independent as a distribution over $\mathbb{F}_n^n$. The distribution $Y$ is uniform on its support, and hence, the support of $Y$ forms a $k$-wise independent set. It contains all $n^k$ strings in $\mathbb{F}_n^k$ that have been extended to strings of length $n$ using the deterministic rule above.
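Joffe's construction is easy to implement and, for tiny parameters, to verify by brute force. A minimal sketch (the parameter choices $n = 5$, $k = 2$ are an arbitrary toy example):

```python
from itertools import product
from collections import Counter

def joffe_support(n, k):
    """Support of Joffe's k-wise independent space over F_n^n (n prime):
    each prefix (y_0, ..., y_{k-1}) is extended by the deterministic rule
    y_i = y_0 + y_1*i + ... + y_{k-1}*i^(k-1) mod n for k <= i < n."""
    support = []
    for prefix in product(range(n), repeat=k):
        row = list(prefix)
        for i in range(k, n):
            row.append(sum(prefix[j] * pow(i, j, n) for j in range(k)) % n)
        support.append(tuple(row))
    return support

# Tiny example: n = 5 (prime), k = 2; the support has n^k = 25 strings.
S = joffe_support(5, 2)
assert len(S) == 25

# Brute-force check of 2-wise independence: every pair of coordinates
# takes each of the 25 value pairs in F_5^2 exactly once.
for a in range(5):
    for b in range(a + 1, 5):
        counts = Counter((row[a], row[b]) for row in S)
        assert len(counts) == 25 and all(c == 1 for c in counts.values())
```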
A random variable $Y$ over $\{0,1\}^n$ is a $\delta$-almost k-wise independent space if, for all index sets $I \subseteq \{1,\dots,n\}$ of size $k$, the restricted distribution $Y|_I$ and the uniform distribution $U_k$ on $\{0,1\}^k$ are $\delta$-close in 1-norm, i.e., $\big\lVert\, Y|_I - U_k \,\big\rVert_1 \le \delta$.
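For small parameters, the smallest $\delta$ for which a given sample space is $\delta$-almost k-wise independent can be computed exhaustively. A sketch (the sample space below is a toy example: the even-weight strings of $\{0,1\}^3$, which are exactly 2-wise independent but maximally far from 3-wise independent):

```python
from itertools import combinations, product
from collections import Counter

def almost_kwise_delta(samples, n, k):
    """Smallest δ such that the uniform distribution over `samples` is
    δ-almost k-wise independent: max over |I| = k of ||Y|_I - U_k||_1."""
    worst = 0.0
    for I in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in I) for x in samples)
        dist = sum(abs(counts.get(z, 0) / len(samples) - 2 ** -k)
                   for z in product((0, 1), repeat=k))
        worst = max(worst, dist)
    return worst

Y = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(almost_kwise_delta(Y, 3, 2))  # 0.0: every pair of bits is uniform
print(almost_kwise_delta(Y, 3, 3))  # 1.0: all three bits together are not
```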
Naor & Naor (1990) give a general framework for combining small k-wise independent spaces with small $\epsilon$-biased spaces to obtain $\delta$-almost k-wise independent spaces of even smaller size. In particular, let $G_1 : \{0,1\}^h \to \{0,1\}^n$ be a linear mapping that generates a k-wise independent space and let $G_2 : \{0,1\}^\ell \to \{0,1\}^h$ be a generator of an $\epsilon$-biased set over $\{0,1\}^h$. That is, when given a uniformly random input, the output of $G_1$ is a k-wise independent space, and the output of $G_2$ is $\epsilon$-biased. Then $G : \{0,1\}^\ell \to \{0,1\}^n$ with $G(x) = G_1(G_2(x))$ is a generator of a $\delta$-almost $k$-wise independent space, where $\delta = 2^{k/2} \epsilon$.
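The composition itself is straightforward once $G_1$ is given as a matrix over $\mathbb{F}_2$ and $G_2$ as any seed-to-string map. A minimal sketch (both generators below are hypothetical toy stand-ins, not the actual constructions of Alon, Babai & Itai or Naor & Naor):

```python
def compose(G1_matrix, G2):
    """G(x) = G1(G2(x)): feed the ε-biased expansion G2(x) ∈ {0,1}^h through
    the linear generator G1, given as an h×n matrix over F_2."""
    def G(x):
        y = G2[x]  # y ∈ {0,1}^h
        return tuple(sum(y[j] & G1_matrix[j][i] for j in range(len(y))) % 2
                     for i in range(len(G1_matrix[0])))
    return G

# Toy instances with ℓ = 1, h = 2, n = 3.
G1_matrix = [(1, 0, 1),
             (0, 1, 1)]                # linear map {0,1}^2 → {0,1}^3
G2 = {(0,): (0, 1), (1,): (1, 1)}      # lookup table for {0,1}^1 → {0,1}^2
G = compose(G1_matrix, G2)
print(G((0,)), G((1,)))  # (0, 1, 1) (1, 1, 0)
```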
As mentioned above, Alon, Babai & Itai (1986) construct a generator $G_1$ with $h = \tfrac{k}{2} \log n$, and Naor & Naor (1990) construct a generator $G_2$ with $\ell = \log s = \log h + O(\log(\epsilon^{-1}))$. Hence, the concatenation $G$ of $G_1$ and $G_2$ has seed length $\ell = \log k + \log\log n + O(\log(\epsilon^{-1}))$. In order for $G$ to yield a $\delta$-almost k-wise independent space, we need to set $\epsilon = \delta 2^{-k/2}$, which leads to a seed length of $\ell = \log\log n + O(k + \log(\delta^{-1}))$ and a sample space of total size $2^\ell \le \log n \cdot \operatorname{poly}(2^k \cdot \delta^{-1})$.