Rahul Sharma (Editor)

Collatz conjecture

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit
Collatz conjecture

The Collatz conjecture is a conjecture in mathematics named after Lothar Collatz. The conjecture is also known as the 3n + 1 conjecture, the Ulam conjecture (after Stanisław Ulam), Kakutani's problem (after Shizuo Kakutani), the Thwaites conjecture (after Sir Bryan Thwaites), Hasse's algorithm (after Helmut Hasse), or the Syracuse problem; the sequence of numbers involved is referred to as the hailstone sequence or hailstone numbers (because the values are usually subject to multiple descents and ascents like hailstones in a cloud), or as wondrous numbers.

Contents

The conjecture can be summarized as follows. Take any positive integer n. If n is even, divide it by 2 to get n / 2. If n is odd, multiply it by 3 and add 1 to obtain 3n + 1. Repeat the process (which has been called "Half Or Triple Plus One", or HOTPO) indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1.

Paul Erdős said about the Collatz conjecture: "Mathematics may not be ready for such problems." He also offered $500 for its solution. Jeffrey Lagarias in 2010 claimed that based only on known information about this problem, "this is an extraordinarily difficult problem, completely out of reach of present day mathematics."

Statement of the problem

Consider the following operation on an arbitrary positive integer:

  • If the number is even, divide it by two.
  • If the number is odd, triple it and add one.
  • In modular arithmetic notation, define the function f as follows:

    f ( n ) = { n / 2 if  n 0 ( mod 2 ) 3 n + 1 if  n 1 ( mod 2 ) .

    Now, form a sequence by performing this operation repeatedly, beginning with any positive integer, and taking the result at each step as the input at the next.

    In notation:

    a i = { n for  i = 0 f ( a i 1 ) for  i > 0

    (that is: a i is the value of f applied to n recursively i times; a i = f i ( n ) ).

    The Collatz conjecture is: This process will eventually reach the number 1, regardless of which positive integer is chosen initially.

    That smallest i such that ai = 1 is called the total stopping time of n. The conjecture asserts that every n has a well-defined total stopping time. If, for some n, such an i doesn't exist, we say that n has infinite total stopping time and the conjecture is false.

    If the conjecture is false, it can only be because there is some starting number which gives rise to a sequence that does not contain 1. Such a sequence might enter a repeating cycle that excludes 1, or increase without bound. No such sequence has been found.

    Examples

    For instance, starting with n = 12, one gets the sequence 12, 6, 3, 10, 5, 16, 8, 4, 2, 1.

    n = 19, for example, takes longer to reach 1: 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.

    The sequence for n = 27, listed and graphed below, takes 111 steps (41 steps through odd numbers), climbing to 9232 before descending to 1.

    27, 82, 41, 124, 62, 31, 94, 47, 142, 71, 214, 107, 322, 161, 484, 242, 121, 364, 182, 91, 274, 137, 412, 206, 103, 310, 155, 466, 233, 700, 350, 175, 526, 263, 790, 395, 1186, 593, 1780, 890, 445, 1336, 668, 334, 167, 502, 251, 754, 377, 1132, 566, 283, 850, 425, 1276, 638, 319, 958, 479, 1438, 719, 2158, 1079, 3238, 1619, 4858, 2429, 7288, 3644, 1822, 911, 2734, 1367, 4102, 2051, 6154, 3077, 9232, 4616, 2308, 1154, 577, 1732, 866, 433, 1300, 650, 325, 976, 488, 244, 122, 61, 184, 92, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1 (sequence A008884 in the OEIS)

    Numbers with a total stopping time longer than any smaller starting value form a sequence beginning with:

    1, 2, 3, 6, 7, 9, 18, 25, 27, 54, 73, 97, 129, 171, 231, 313, 327, 649, 703, 871, 1161, 2223, 2463, 2919, 3711, 6171, … (sequence A006877 in the OEIS).

    The starting values whose maximum trajectory point is greater than that of any smaller starting value are as follows:

    1, 2, 3, 7, 15, 27, 255, 447, 639, 703, 1819, 4255, 4591, 9663, 20895, 26623, 31911, 60975, 77671, 113383, 138367, 159487, 270271, 665215, 704511, ... (sequence A006884 in the OEIS)

    Number of steps for n to reach 1 are

    0, 1, 7, 2, 5, 8, 16, 3, 19, 6, 14, 9, 9, 17, 17, 4, 12, 20, 20, 7, 7, 15, 15, 10, 23, 10, 111, 18, 18, 18, 106, 5, 26, 13, 13, 21, 21, 21, 34, 8, 109, 8, 29, 16, 16, 16, 104, 11, 24, 24, ... (sequence A006577 in the OEIS)

    The longest progression for any initial starting number

    less than 10 is 9, which has 19 steps, less than 100 is 97, which has 118 steps, less than 1,000 is 871, which has 178 steps, less than 10,000 is 6,171, which has 261 steps, less than 100,000 is 77,031, which has 350 steps, less than 1 million is 837,799, which has 524 steps, less than 10 million is 8,400,511, which has 685 steps, less than 100 million is 63,728,127, which has 949 steps, less than 1 billion is 670,617,279, which has 986 steps, less than 10 billion is 9,780,657,631, which has 1132 steps, and less than 100 billion is 75,128,138,247, which has 1228 steps.

    The powers of two converge to one quickly because 2 n is halved n times to reach one, and is never increased, but for Mersenne number Mn, they need to increase n times and usually need more steps to reach 1.

    Cycles

    Any counterexample to the Collatz conjecture would have to consist either of an infinite divergent trajectory or a cycle different from the trivial (4; 2; 1) cycle. Thus, if one could prove that neither of these types of counterexample could exist, then all positive integers would have a trajectory that reaches the trivial cycle. Such a strong result is not known, but certain types of cycles have been ruled out.

    The type of a cycle may be defined with reference to the "shortcut" definition of the Collatz map, f ( n ) = ( 3 n + 1 ) / 2 for odd n and f ( n ) = n / 2 for even n. A cycle is a sequence ( a 0 ; a 1 ; a q ) where f ( a 0 ) = a 1 , f ( a 1 ) = a 2 , and so on, up to f ( a q ) = a 0 in a closed loop. For this shortcut definition, the only known cycle is (1; 2). Although 4 is part of the single known cycle for the original Collatz map, it is not part of the cycle for the shortcut map.

    A k-cycle is a cycle that can be partitioned into 2k contiguous subsequences: k increasing sequences of odd numbers alternating with k decreasing sequences of even numbers. For instance, if the cycle consists of a single increasing sequence of odd numbers followed by a decreasing sequence of even numbers, it is called a 1-cycle.

    Steiner (1977) proved that there is no 1-cycle other than the trivial (1;2). Simons (2004) used Steiner's method to prove that there is no 2-cycle. Simons & de Weger (2003) extended this proof up to 68-cycles: there is no k-cycle up to k = 68. Beyond 68, this method gives upper bounds for the elements in such a cycle: for example, if there is a 75-cycle, then at least one element of the cycle is less than 2385×250. Therefore, as exhaustive computer searches continue, larger cycles may be ruled out. To state the argument more intuitively: we need not look for cycles that have at most 68 trajectories, where each trajectory consists of consecutive ups followed by consecutive downs. See below for an idea of how one might find an upper bound for the elements of a cycle.

    Supporting arguments

    Although the conjecture has not been proven, most mathematicians who have looked into the problem think the conjecture is true because experimental evidence and heuristic arguments support it.

    Experimental evidence

    The conjecture has been checked by computer for all starting values up to 260. All initial values tested so far eventually end in the repeating cycle (4; 2; 1), which has only three terms. From this lower bound on the starting value, a lower bound can also be obtained for the number of terms a repeating cycle other than (4; 2; 1) must have. When this relationship was established in 1981, the formula gave a lower bound of 35,400 terms.

    This computer evidence is not a proof that the conjecture is true. As shown in the cases of the Pólya conjecture, the Mertens conjecture and the Skewes' number, sometimes a conjecture's only counterexamples are found when using very large numbers.

    A probabilistic heuristic

    If one considers only the odd numbers in the sequence generated by the Collatz process, then each odd number is on average 3/4 of the previous one. (More precisely, the geometric mean of the ratios of outcomes is 3/4.) This yields a heuristic argument that every Hailstone sequence should decrease in the long run, although this is not evidence against other cycles, only against divergence. The argument is not a proof because it assumes that Hailstone sequences are assembled from uncorrelated probabilistic events. (It does rigorously establish that the 2-adic extension of the Collatz process has two division steps for every multiplication step for almost all 2-adic starting values.)

    And even if the probabilistic reasoning were rigorous, this would still imply only that the conjecture is almost surely true for any given integer, which does not necessarily imply that it is true for all integers.

    Rigorous bounds

    Although it is not known rigorously whether all positive numbers eventually reach one according to the Collatz iteration, it is known that many numbers do so. In particular, Krasikov and Lagarias showed that the number of integers in the interval [1,x] that eventually reach one is at least proportional to x0.84.

    In reverse

    There is another approach to prove the conjecture, which considers the bottom-up method of growing the so-called Collatz graph. The Collatz graph is a graph defined by the inverse relation

    R ( n ) = { { 2 n } if  n 0 , 1 , 2 , 3 , 5 { 2 n , ( n 1 ) / 3 } if  n 4 ( mod 6 ) .

    So, instead of proving that all positive integers eventually lead to 1, we can try to prove that 1 leads to all positive integers. For any integer n, n ≡ 1 (mod 2) iff 3n + 1 ≡ 4 (mod 6). Equivalently, (n − 1)/3 ≡ 1 (mod 2) iff n ≡ 4 (mod 6). Conjecturally, this inverse relation forms a tree except for the 1–2–4 loop (the inverse of the 1–4–2 loop of the unaltered function f defined in the Statement of the problem section of this article).

    When the relation 3n + 1 of the function f is replaced by the common substitute "shortcut" relation (3n + 1)/2, the Collatz graph is defined by the inverse relation,

    R ( n ) = { { 2 n } if  n 0 , 1 { 2 n , ( 2 n 1 ) / 3 } if  n 2 ( mod 3 ) .

    For any integer n, n ≡ 1 (mod 2) iff (3n + 1)/2 ≡ 2 (mod 3). Equivalently, (2n − 1)/3 ≡ 1 (mod 2) iff n ≡ 2 (mod 3). Conjecturally, this inverse relation forms a tree except for a 1–2 loop (the inverse of the 1–2 loop of the function f(n) revised as indicated above).

    Alternately, replace the 3n + 1 with n' / H(n') where n' = 3n + 1 and H(n') is the highest power of 2 that divides n' (with no remainder). The resulting function f maps from odd numbers to odd numbers. Now suppose that for some odd number n, applying this operation k times yields the number 1 (that is, f k ( n ) = 1 ). Then in binary, the number n can be written as the concatenation of strings wk wk-1 … w1 where each wh is a finite and contiguous extract from the representation of 1 / 3h. The representation of n therefore holds the repetends of 1 / 3h, where each repetend is optionally rotated and then replicated up to a finite number of bits. It is only in binary that this occurs. Conjecturally, every binary string s that ends with a '1' can be reached by a representation of this form (where we may add or delete leading '0's to s).

    As an abstract machine that computes in base two

    Repeated applications of the Collatz function can be represented as an abstract machine that handles strings of bits. The machine will perform the following three steps on any odd number until only one "1" remains:

    1. Append 1 to the (right) end of the number in binary (giving 2n + 1);
    2. Add this to the original number by binary addition (giving 2n + 1 + n = 3n + 1);
    3. Remove all trailing "0"s (i.e. repeatedly divide by two until the result is odd).

    This prescription is plainly equivalent to computing a Hailstone sequence in base two.

    Example

    The starting number 7 is written in base two as 111. The resulting Hailstone sequence is:

    As a parity sequence

    For this section, consider the Collatz function in the slightly modified form

    f ( n ) = { n / 2 if  n 0 ( 3 n + 1 ) / 2 if  n 1. ( mod 2 )

    This can be done because when n is odd, 3n + 1 is always even.

    If P(…) is the parity of a number, that is P(2n) = 0 and P(2n + 1) = 1, then we can define the Hailstone parity sequence (or parity vector) for a number n as pi = P(ai), where a0 = n, and ai+1 = f(ai).

    What operation is performed (3n + 1)/2 or n/2 depends on the parity. The parity sequence is the same as the sequence of operations.

    Using this form for f(n), it can be shown that the parity sequences for two numbers m and n will agree in the first k terms if and only if m and n are equivalent modulo 2k. This implies that every number is uniquely identified by its parity sequence, and moreover that if there are multiple Hailstone cycles, then their corresponding parity cycles must be different.

    Applying the f function k times to the number a·2k + b will give the result a·3c + d, where d is the result of applying the f function k times to b, and c is how many odd numbers were encountered during that sequence.

    As a tag system

    For the Collatz function in the form

    f ( n ) = { n / 2 if  n 0 ( 3 n + 1 ) / 2 if  n 1. ( mod 2 )

    Hailstone sequences can be computed by the extremely simple 2-tag system with production rules abc, ba, caaa. In this system, the positive integer n is represented by a string of n a, and iteration of the tag operation halts on any word of length less than 2. (Adapted from De Mol.)

    The Collatz conjecture equivalently states that this tag system, with an arbitrary finite string of a's as the initial word, eventually halts (see Tag system#Example: Computation of Collatz sequences for a worked example).

    Iterating on all integers

    An extension to the Collatz conjecture is to include all integers, not just positive integers. Leaving aside the cycle 0 → 0 which cannot be entered from outside, there are a total of 4 known cycles, which all nonzero integers seem to eventually fall into under iteration of f. These cycles are listed here, starting with the well-known cycle for positive n:

    Odd values are listed in large bold. Each cycle is listed with its member of least absolute value (which is always odd) first.

    The generalized Collatz conjecture is the assertion that every integer, under iteration by f, eventually falls into one of the four cycles above or the cycle 0 → 0. The 0 → 0 cycle is often regarded as "trivial" by the argument, as it is only included for the sake of completeness.

    Iterating with odd denominators or 2-adic integers

    The standard Collatz map can be extended to (positive or negative) rational numbers which have odd denominators when written in lowest terms. The number is taken to be odd or even according to whether its numerator is odd or even. A closely related fact is that the Collatz map extends to the ring of 2-adic integers, which contains the ring of rationals with odd denominators as a subring.

    The parity sequences as defined above are no longer unique for fractions. However, it can be shown that any possible parity cycle is the parity sequence for exactly one fraction: if a cycle has length n and includes odd numbers exactly m times at indices k0, …, km−1, then the unique fraction which generates that parity cycle is

    For example, the parity cycle (1 0 1 1 0 0 1) has length 7 and has 4 odd numbers at indices 0, 2, 3, and 6. The unique fraction which generates that parity cycle is

    3 3 2 0 + 3 2 2 2 + 3 1 2 3 + 3 0 2 6 2 7 3 4 = 151 47 ,

    the complete cycle being: 151/47 → 250/47 → 125/47 → 211/47 → 340/47 → 170/47 → 85/47 → 151/47

    Although the cyclic permutations of the original parity sequence are unique fractions, the cycle is not unique, each permutation's fraction being the next number in the loop cycle:

    (0 1 1 0 0 1 1) → 3 3 2 1 + 3 2 2 2 + 3 1 2 5 + 3 0 2 6 2 7 3 4 = 250 47 (1 1 0 0 1 1 0) → 3 3 2 0 + 3 2 2 1 + 3 1 2 4 + 3 0 2 5 2 7 3 4 = 125 47 (1 0 0 1 1 0 1) → 3 3 2 0 + 3 2 2 3 + 3 1 2 4 + 3 0 2 6 2 7 3 4 = 211 47 (0 0 1 1 0 1 1) → 3 3 2 2 + 3 2 2 3 + 3 1 2 5 + 3 0 2 6 2 7 3 4 = 340 47 (0 1 1 0 1 1 0) → 3 3 2 1 + 3 2 2 2 + 3 1 2 4 + 3 0 2 5 2 7 3 4 = 170 47 (1 1 0 1 1 0 0) → 3 3 2 0 + 3 2 2 1 + 3 1 2 3 + 3 0 2 4 2 7 3 4 = 85 47

    Also, for uniqueness, the parity sequence should be "prime", i.e., not partitionable into identical sub-sequences. For example, parity sequence (1 1 0 0 1 1 0 0) can be partitioned into two identical sub-sequences (1 1 0 0)(1 1 0 0). Calculating the 8-element sequence fraction gives

    (1 1 0 0 1 1 0 0) → 3 3 2 0 + 3 2 2 1 + 3 1 2 4 + 3 0 2 5 2 8 3 4 = 125 175

    But when reduced to lowest terms {5/7}, it is the same as that of the 4-element sub-sequence

    (1 1 0 0) → 3 1 2 0 + 3 0 2 1 2 4 3 2 = 5 7 .

    And this is because the 8-element parity sequence actually represents two circuits of the loop cycle defined by the 4-element parity sequence.

    In this context, the Collatz conjecture is equivalent to saying that (0 1) is the only cycle which is generated by positive whole numbers (i.e. 1 and 2).

    Cycle bounds

    Eq. (1) also gives a rough idea about how one can prove that cycles of certain lengths do not exist. For a hypothetical cycle of length n, the numerator is bounded above by 3n - 2n (this corresponds to a cycle of all odd numbers). A lower bound for the denominator can be obtained by letting n/m be an optimal rational approximation to log(3)/log(2). Together these give an upper bound for the unique fraction that generates a cycle of length n. If this upper bound is smaller than the largest number for which the conjecture has been verified to hold, then a cycle of length n is impossible.

    Iterating on real or complex numbers

    The Collatz map can be viewed as the restriction to the integers of the smooth real and complex map

    f ( z ) = 1 2 z cos 2 ( π 2 z ) + ( 3 z + 1 ) sin 2 ( π 2 z ) ,

    which simplifies to 1 4 ( 2 + 7 z ( 2 + 5 z ) cos ( π z ) ) .

    If the standard Collatz map defined above is optimized by replacing the relation 3n + 1 with the common substitute "shortcut" relation (3n + 1)/2, it can be viewed as the restriction to the integers of the smooth real and complex map

    f ( z ) = 1 2 z cos 2 ( π 2 z ) + ( 3 z + 1 ) 2 sin 2 ( π 2 z ) ,

    which simplifies to 1 4 ( 1 + 4 z ( 1 + 2 z ) cos ( π z ) ) .

    Collatz fractal

    Iterating the above optimized map in the complex plane produces the Collatz fractal.

    The point of view of iteration on the real line was investigated by Chamberland (1996), and on the complex plane by Letherman, Schleicher, and Wood (1999).

    Time-space tradeoff

    The section As a parity sequence above gives a way to speed up simulation of the sequence. To jump ahead k steps on each iteration (using the f function from that section), break up the current number into two parts, b (the k least significant bits, interpreted as an integer), and a (the rest of the bits as an integer). The result of jumping ahead k steps can be found as:

    f k(a 2k + b) = a 3c(b) + d(b).

    The c (or better 3c) and d arrays are precalculated for all possible k-bit numbers b, where d(b) is the result of applying the f function k times to b, and c(b) is the number of odd numbers encountered on the way. For example, if k=5, you can jump ahead 5 steps on each iteration by separating out the 5 least significant bits of a number and using:

    c(0..31) = {0,3,2,2,2,2,2,4,1,4,1,3,2,2,3,4,1,2,3,3,1,1,3,3,2,3,2,4,3,3,4,5} d(0..31) = {0,2,1,1,2,2,2,20,1,26,1,10,4,4,13,40,2,5,17,17,2,2,20,20,8,22,8,71,26,26,80,242}.

    This requires 2k precomputation and storage to speed up the resulting calculation by a factor of k, a space-time tradeoff.

    Modular restrictions

    For the special purpose of searching for a counterexample to the Collatz conjecture, this precomputation leads to an even more important acceleration, used by Tomás Oliveira e Silva in his computational confirmations of the Collatz conjecture up to large values of n. If, for some given b and k, the inequality

    f k(a 2k + b) = a 3c(b) + d(b) < a 2k + b

    holds for all a, then the first counterexample, if it exists, cannot be b modulo 2k. For instance, the first counterexample must be odd because f(2n) = n, smaller than 2n; and it must be 3 mod 4 because f2(4n + 1) = 3n + 1, smaller than 4n + 1. For each starting value a which is not a counterexample to the Collatz conjecture, there is a k for which such an inequality holds, so checking the Collatz conjecture for one starting value is as good as checking an entire congruence class. As k increases, the search only needs to check those residues b that are not eliminated by lower values of k. Only an exponentially small fraction of the residues survive. For example, the only surviving residues mod 32 are 7, 15, 27, and 31.

    Syracuse function

    If k is an odd integer, then 3 k + 1 is even, so 3 k + 1 = 2 a k with k odd and a 1 . The Syracuse function is the function f from the set I of odd integers into itself, for which f ( k ) = k (sequence A075677 in the OEIS).

    Some properties of the Syracuse function are:

  • k I , f ( 4 k + 1 ) = f ( k ) . (Because 3 ( 4 k + 1 ) + 1 = 12 k + 4 = 4 ( 3 k + 1 )
  • In more generality: For all p 1 and odd h , f p 1 ( 2 p h 1 ) = 2 3 p 1 h 1 . (Here f p 1 is function iteration notation.)
  • For all odd h , f ( 2 h 1 ) 3 h 1 2
  • The Collatz conjecture is equivalent to the statement that, for all k in I , there exists an integer n 1 such that f n ( k ) = 1 .

    Undecidable generalizations

    In 1972, J. H. Conway proved that a natural generalization of the Collatz problem is algorithmically undecidable.

    Specifically, he considered functions of the form

    g ( n ) = a i n + b i , n i ( mod P )

    where a 0 , b 0 , , a P 1 , b P 1 are rational numbers which are so chosen that g ( n ) is always integral.

    The standard Collatz function is given by P = 2 , a 0 = 1 2 , b 0 = 0 , a 1 = 3 , b 1 = 1 . Conway proved that the problem:

    Given g and n, does the sequence of iterates g k ( n ) reach 1?

    is undecidable, by representing the halting problem in this way. Closer to the Collatz problem is the following universally quantified problem:

    Given g does the sequence of iterates g k ( n ) reach 1, for all n > 0 ?

    Modifying the condition in this way can make a problem either harder or easier to solve (intuitively, it is harder to justify a positive answer but might be easier to justify a negative one). Kurtz and Simon proved that the above problem is, in fact, undecidable and even higher in the arithmetical hierarchy, specifically Π 2 0 -complete. This hardness result holds even if one restricts the class of functions g by fixing the modulus P to 6480.

    References

    Collatz conjecture Wikipedia