A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (pigeonholes).
Contents
Understanding the problem
As an example, consider the scenario in which a teacher with a class of 30 students asks for everybody's birthday (for simplicity, ignore leap years), to determine whether any two students have the same birthday (corresponding to a hash collision as described further). Intuitively, this chance may seem small. If the teacher picked a specific day (say September 16), then the chance that at least one student was born on that specific day is
Mathematics
Given a function
We consider the following experiment. From a set of H values we choose n values uniformly at random thereby allowing repetitions. Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as
Let n(p; H) be the smallest number of values we have to choose, such that the probability for finding a collision is at least p. By inverting this expression above, we find the following approximation
and assigning a 0.5 probability of collision we arrive at
Let Q(H) be the expected number of values we have to choose before finding the first collision. This number can be approximated by
As an example, if a 64-bit hash is used, there are approximately 1.8 × 1019 different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (5.38 × 109) to generate a collision using brute force. This value is called birthday bound and for n-bit codes it could be computed as 2n/2. Other examples are as follows:
Table shows number of hashes n(p) needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, 10−18 to 10−15 is the uncorrectable bit error rate of a typical hard disk [1]. In theory, MD5 hashes or UUIDs, being 128 bits, should stay within that range until about 820 billion documents, even if its possible outputs are many more.It is easy to see that if the outputs of the function are distributed unevenly, then a collision could be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks (exploiting uneven key distribution) and allows the vulnerability of popular hashes such as MD and SHA to be estimated (Bellare and Kohno, 2004).
The subexpression log(1/(1-p))
due to loss of significance. When log1p
is available (as it is in C99) for example, the equivalent expression -log1p(-p)
should be used instead. If this is not done, the first column of the above table is computed as zero, and several items in the second column do not have even one correct significant digit.
Source code example
Here is a Python function that can accurately generate most of the above table:
If the code is saved in a file named birthday.py
, it can be run interactively as in the following example:
Simple approximation
A good rule of thumb which can be used for mental calculation is the relation
which can also be written as
or
This works well for probabilities less than or equal to 0.5.
This approximation scheme is especially easy to use for when working with exponents. For instance, suppose you are building 32-bit hashes (
which is close to the correct answer of 93.
Digital signature susceptibility
Digital signatures can be susceptible to a birthday attack. A message
In a similar manner, Mallory also creates a huge number of variations on the fraudulent contract
The probabilities differ slightly from the original birthday problem, as Mallory gains nothing by finding two fair or two fraudulent contracts with the same hash. Mallory's strategy is to generate pairs of one fair and one fraudulent contract. The birthday problem equations apply where
To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as many bits as are needed to prevent an ordinary brute-force attack.
Besides using a larger bit length, the signer (Bob) can protect himself by making some random, inoffensive changes to the document before signing it, and by keeping a copy of the contract he signed in his own possession, so that he can at least demonstrate in court that his signature matches that contract, not just the fraudulent one.
Pollard's rho algorithm for logarithms is an example for an algorithm using a birthday attack for the computation of discrete logarithms.