Collision resistance is a property of cryptographic hash functions: a hash function H is collision resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b such that H(a) = H(b), and a ≠ b.
Contents
Every hash function with more inputs than outputs will necessarily have collisions.Consider a hash function such as SHA-256 that produces 256 bits of output from an arbitrarily large input. Since it must generate one of 2256 outputs for each member of a much larger set of inputs, the pigeonhole principle guarantees that some inputs will hash to the same output. Collision resistance doesn't mean that no collisions exist; simply that they are hard to find.
The "birthday paradox" places an upper bound on collision resistance: if a hash function produces N bits of output, an attacker who computes only 2N/2 (or
Cryptographic hash functions are usually designed to be collision resistant. But many hash functions that were once thought to be collision resistant were later broken. MD5 and SHA-1 in particular both have published techniques more efficient than brute force for finding collisions. However, some hash functions have a proof that finding collisions is at least as difficult as some hard mathematical problem (such as integer factorization or discrete logarithm). Those functions are called provably secure.
Definition
A family of functions {hk : {0, 1}m(k) → {0, 1}l(k)} generated by some algorithm G is a family of collision resistant hash functions, if |m(k)| > |l(k)| for any k, i.e., hk compresses the input string, and every hk can be computed within polynomial time given k, but for any probabilistic polynomial algorithm A, we have
Pr [k ← G(1n), (x1, x2) ← A(k, 1n) s.t. x1 ≠ x2 but hk(x1) = hk(x2)] < negl(n),where negl(·) denotes some negligible function, and n is the security parameter.
Rationale
Collision resistance is desirable for several reasons.