In theoretical computer science, a small-bias sample space (also known as ϵ-biased sample space, ϵ-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.
The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong, since ϵ-biased sample spaces are equivalent to ϵ-balanced error-correcting codes.
Let X be a probability distribution over {0,1}^n. The bias of X with respect to a set of indices I ⊆ {1,…,n} is defined as

bias_I(X) = | Pr_{x∼X}( ∑_{i∈I} x_i = 0 ) − Pr_{x∼X}( ∑_{i∈I} x_i = 1 ) | = | 2·Pr_{x∼X}( ∑_{i∈I} x_i = 0 ) − 1 |,

where the sum is taken over F_2, the finite field with two elements. In other words, the sum ∑_{i∈I} x_i equals 0 if the number of ones in the sample x ∈ {0,1}^n at the positions defined by I is even, and otherwise, the sum equals 1. For I = ∅, the empty sum is defined to be zero, and hence bias_∅(X) = 1.
A probability distribution X over {0,1}^n is called an ϵ-biased sample space if bias_I(X) ≤ ϵ holds for all non-empty subsets I ⊆ {1,2,…,n}.
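For small n, these definitions can be checked by brute force. The following is a minimal Python sketch (the function names `bias` and `max_bias` are illustrative, not from the literature) that computes bias_I(X) for an explicitly given distribution and verifies the ϵ-biased property by enumerating all non-empty index sets:

```python
from itertools import product

def bias(dist, I):
    """bias_I(X) for a distribution given as a dict mapping bit-strings
    (tuples of 0/1) to probabilities; strings not listed have probability 0."""
    p_even = sum(p for x, p in dist.items()
                 if sum(x[i] for i in I) % 2 == 0)
    return abs(2 * p_even - 1)

def max_bias(dist, n):
    """Maximum bias over all non-empty index sets (encoded as bitmasks);
    the distribution is epsilon-biased iff this value is at most epsilon."""
    return max(bias(dist, [i for i in range(n) if mask >> i & 1])
               for mask in range(1, 2 ** n))

# The uniform distribution over {0,1}^3 has bias 0 for every non-empty I,
# while a point mass has bias 1 for every non-empty I.
n = 3
uniform = {x: 1 / 2 ** n for x in product((0, 1), repeat=n)}
print(max_bias(uniform, n), max_bias({(1, 1, 1): 1.0}, n))  # 0.0 1.0
```

The enumeration over all 2^n − 1 index sets is exponential in n; it is only meant to make the definition concrete, not to be efficient.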
An ϵ-biased sample space that is generated by picking a uniform element from a multiset X ⊆ {0,1}^n is called an ϵ-biased set. The size s of an ϵ-biased set is the size of the multiset that generates the sample space.
An ϵ-biased generator G : {0,1}^ℓ → {0,1}^n is a function that maps strings of length ℓ to strings of length n such that the multiset X_G = { G(y) | y ∈ {0,1}^ℓ } is an ϵ-biased set. The seed length of the generator is the number ℓ and is related to the size of the ϵ-biased set X_G via the equation s = 2^ℓ.
There is a close connection between ϵ-biased sets and ϵ-balanced linear error-correcting codes. A linear code C : {0,1}^n → {0,1}^s of message length n and block length s is ϵ-balanced if the Hamming weight of every nonzero codeword C(x) is between (1/2 − ϵ)·s and (1/2 + ϵ)·s. Since C is a linear code, its generator matrix is an (n × s)-matrix A over F_2 with C(x) = x·A.
Then a multiset X ⊆ {0,1}^n is ϵ-biased if and only if the linear code C_X, whose columns are exactly the elements of X, is ϵ-balanced.
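For small parameters the equivalence can be verified exhaustively: interpreting the elements of X as the columns of the generator matrix, the relative weight of the codeword x·A is determined by the bias of X with respect to the support of x. A small Python sketch (helper names are mine) checks both properties side by side:

```python
from itertools import product

def code_is_balanced(X, eps):
    """Treat the s elements of X as the columns of the generator matrix A
    of the linear code C(x) = x*A over F_2, and check that every nonzero
    codeword has Hamming weight in [(1/2 - eps)*s, (1/2 + eps)*s]."""
    s, n = len(X), len(X[0])
    for x in product((0, 1), repeat=n):
        if not any(x):
            continue
        weight = sum(sum(xi & ci for xi, ci in zip(x, col)) % 2 for col in X)
        if not (0.5 - eps) * s <= weight <= (0.5 + eps) * s:
            return False
    return True

def set_is_biased(X, eps):
    """Directly check that the uniform distribution over the multiset X
    has bias at most eps for every non-empty index set."""
    s, n = len(X), len(X[0])
    for mask in range(1, 2 ** n):
        even = sum(1 for col in X
                   if sum(col[i] for i in range(n) if mask >> i & 1) % 2 == 0)
        if abs(2 * even / s - 1) > eps:
            return False
    return True

# The two tests agree on every input, reflecting the equivalence:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all of {0,1}^2 is a 0-biased set
print(code_is_balanced(X, 0.0), set_is_biased(X, 0.0))  # True True
```

Here the multiset of all of {0,1}^2 is 0-biased, and correspondingly every nonzero codeword of the associated code has weight exactly s/2.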
Usually the goal is to find ϵ-biased sets that have a small size s relative to the parameters n and ϵ. This is because a smaller size s means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.
The probabilistic method gives a non-explicit construction that achieves size s = O(n/ϵ^2). The construction is non-explicit in the sense that finding the ϵ-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that such efficient codes exist. On the other hand, the best known lower bound for the size of ϵ-biased sets is s = Ω(n / (ϵ^2·log(1/ϵ))); that is, in order for a set to be ϵ-biased, it must have at least that size.
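The probabilistic argument can be illustrated empirically: a multiset of Θ(n/ϵ^2) uniformly random strings is ϵ-biased with high probability. A small Python experiment (the constant 4 in the set size is an arbitrary illustrative choice, not the constant from the proof):

```python
import random

def max_bias_of_set(X, n):
    """Maximum bias of the uniform distribution over the multiset X,
    taken over all non-empty index sets (encoded as bitmasks)."""
    worst = 0.0
    for mask in range(1, 2 ** n):
        even = sum(1 for x in X
                   if sum(x[i] for i in range(n) if mask >> i & 1) % 2 == 0)
        worst = max(worst, abs(2 * even / len(X) - 1))
    return worst

random.seed(0)  # fixed seed so the experiment is reproducible
n, eps = 8, 0.25
s = 4 * n * int(1 / eps ** 2)  # s = C*n/eps^2 with C = 4
X = [tuple(random.getrandbits(1) for _ in range(n)) for _ in range(s)]
print(max_bias_of_set(X, n) <= eps)
```

By a Chernoff and union bound over the 2^n − 1 parities, the check fails only with very small probability, which is exactly the existence argument behind the non-explicit construction.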
There are many explicit, i.e., deterministic, constructions of ϵ-biased sets for various parameter settings:
- Naor & Naor (1990) achieve s = n/poly(ϵ). The construction makes use of Justesen codes (a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
- Alon et al. (1992) achieve s = O(n / (ϵ·log(n/ϵ)))^2. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an ϵ-balanced code, which gives rise to an ϵ-biased sample space via the connection mentioned above.
- Concatenating algebraic geometric codes with the Hadamard code gives an ϵ-balanced code with s = O(n / (ϵ^3·log(1/ϵ))).
- Ben-Aroya & Ta-Shma (2009) achieve s = O(n / (ϵ^2·log(1/ϵ)))^{5/4}.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest ϵ-biased sets for all settings of ϵ and n.
An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.
A random variable Y over {0,1}^n is a k-wise independent space if, for all index sets I ⊆ {1,…,n} of size k, the marginal distribution Y|_I is exactly equal to the uniform distribution over {0,1}^k. That is, for all such I and all strings z ∈ {0,1}^k, the distribution Y satisfies Pr_Y( Y|_I = z ) = 2^{−k}.
Constructions and bounds
k-wise independent spaces are fairly well-understood.
- A simple construction by Joffe (1974) achieves size n^k.
- Alon, Babai & Itai (1986) construct a k-wise independent space whose size is n^{k/2}.
- Chor et al. (1985) prove that no k-wise independent space can be significantly smaller than n^{k/2}.

Joffe (1974) constructs a k-wise independent space Y over the finite field with some prime number n > k of elements, i.e., Y is a distribution over F_n^n. The initial k marginals of the distribution are drawn independently and uniformly at random:

(Y_0, …, Y_{k−1}) ∼ F_n^k.
For each i with k ≤ i < n, the marginal distribution of Y_i is then defined as

Y_i = Y_0 + Y_1·i + Y_2·i^2 + ⋯ + Y_{k−1}·i^{k−1},

where the calculation is done in F_n. Joffe (1974) proves that the distribution Y constructed in this way is k-wise independent as a distribution over F_n^n. The distribution Y is uniform on its support, and hence, the support of Y forms a k-wise independent set. It contains all n^k strings in F_n^k that have been extended to strings of length n using the deterministic rule above.
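For small parameters, Joffe's construction can be implemented and checked exhaustively. A minimal Python sketch (function names are mine, not from Joffe's paper):

```python
from itertools import combinations, product

def joffe_point(seed, n, k):
    """Extend a seed (Y_0, ..., Y_{k-1}) in F_n^k to a string of length n:
    for k <= i < n, coordinate i is Y_0 + Y_1*i + ... + Y_{k-1}*i^{k-1} mod n,
    where n is prime."""
    return tuple(seed) + tuple(
        sum(seed[j] * pow(i, j, n) for j in range(k)) % n
        for i in range(k, n)
    )

def is_k_wise_independent(points, n, k):
    """Check that every size-k marginal of the uniform distribution over
    `points` is uniform over F_n^k: all n^k values occur equally often."""
    for I in combinations(range(n), k):
        counts = {}
        for p in points:
            key = tuple(p[i] for i in I)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != n ** k or len(set(counts.values())) != 1:
            return False
    return True

n, k = 5, 2  # n must be a prime larger than k
support = [joffe_point(seed, n, k) for seed in product(range(n), repeat=k)]
print(len(support), is_k_wise_independent(support, n, k))  # 25 True
```

The support has exactly n^k = 25 strings, one for each seed, matching the size bound of Joffe's construction.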
A random variable Y over {0,1}^n is a δ-almost k-wise independent space if, for all index sets I ⊆ {1,…,n} of size k, the restricted distribution Y|_I and the uniform distribution U_k on {0,1}^k are δ-close in 1-norm, i.e., ‖ Y|_I − U_k ‖_1 ≤ δ.
Naor & Naor (1990) give a general framework for combining small k-wise independent spaces with small ϵ-biased spaces to obtain δ-almost k-wise independent spaces of even smaller size. In particular, let G_1 : {0,1}^h → {0,1}^n be a linear mapping that generates a k-wise independent space and let G_2 : {0,1}^ℓ → {0,1}^h be a generator of an ϵ-biased set over {0,1}^h. That is, when given a uniformly random input, the output of G_1 is a k-wise independent space, and the output of G_2 is ϵ-biased. Then G : {0,1}^ℓ → {0,1}^n with G(x) = G_1(G_2(x)) is a generator of a δ-almost k-wise independent space, where δ = 2^{k/2}·ϵ.
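The composition can be checked numerically for toy parameters. The sketch below is an illustration, not any of the cited constructions: G_1 is the standard pairwise-independence map whose columns are all nonzero vectors of {0,1}^h (any two distinct nonzero columns are linearly independent over F_2, so k = 2), and the role of G_2 is played by an explicit seed multiset whose bias ϵ is measured by brute force. The measured distance δ then satisfies the bound δ ≤ 2^{k/2}·ϵ:

```python
import random
from itertools import combinations, product

h = 3
# G_1: linear map {0,1}^h -> {0,1}^n whose columns are the 2^h - 1 nonzero
# vectors of {0,1}^h; its image of a uniform seed is 2-wise independent.
cols = [v for v in product((0, 1), repeat=h) if any(v)]
n, k = len(cols), 2

def G1(y):
    return tuple(sum(yi * ci for yi, ci in zip(y, c)) % 2 for c in cols)

def max_bias(X, m):
    """Maximum bias of the uniform distribution over the multiset X."""
    worst = 0.0
    for mask in range(1, 2 ** m):
        even = sum(1 for x in X
                   if sum(x[i] for i in range(m) if mask >> i & 1) % 2 == 0)
        worst = max(worst, abs(2 * even / len(X) - 1))
    return worst

def max_l1_to_uniform(X, m, k):
    """Largest 1-norm distance between a size-k marginal and uniform."""
    worst = 0.0
    for I in combinations(range(m), k):
        counts = {}
        for x in X:
            key = tuple(x[i] for i in I)
            counts[key] = counts.get(key, 0) + 1
        l1 = sum(abs(counts.get(z, 0) / len(X) - 2 ** -k)
                 for z in product((0, 1), repeat=k))
        worst = max(worst, l1)
    return worst

random.seed(1)  # reproducible stand-in for an epsilon-biased seed set
X_seed = [tuple(random.getrandbits(1) for _ in range(h)) for _ in range(12)]
eps = max_bias(X_seed, h)
X = [G1(y) for y in X_seed]          # the composed sample space over {0,1}^n
delta = max_l1_to_uniform(X, n, k)
print(delta <= 2 ** (k / 2) * eps + 1e-9)  # guaranteed by the theorem
```

The bound holds because every parity over at most k output coordinates of G_1 equals a single nonzero parity of the seed, so its bias is at most ϵ; the Vazirani XOR lemma then converts the per-parity bound into the 1-norm bound 2^{k/2}·ϵ.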
As mentioned above, Alon, Babai & Itai (1986) construct a generator G_1 with h = k^2·log n, and Naor & Naor (1990) construct a generator G_2 with ℓ = log s = log h + O(log(1/ϵ)). Hence, the concatenation G of G_1 and G_2 has seed length ℓ = O(log k) + log log n + O(log(1/ϵ)). In order for G to yield a δ-almost k-wise independent space, we need to set ϵ = δ·2^{−k/2}, which leads to a seed length of ℓ = log log n + O(k + log(1/δ)) and a sample space of total size 2^ℓ ≤ log n · poly(2^k·(1/δ)).