Remember the Count from Sesame Street? The Count loves nothing more than to count, all day long. Unfortunately for him, he’s limited in how fast he can speak the numbers aloud. Imagine if the Count could program! Why, there’d be no limit to the numbers he could count to! Right?
Nope. Most programming languages don’t count. That’s not to say that they don’t matter, but that they don’t count: 1, 2, 3, 4, and so on.
C makes the Count sad
Let’s count in C!
1 2 3 4 5 6 7 8 9
Look at all those numbers. 1, 2, 100, 1000! 6,827,251! They just keep going up and up and up! 2,000,000,000! 2,147,483,647! -2,147,483,648!
“Wait!” says the Count. “My program was counting up, up, up! Where’d this negative number come from? Adding one to a number should always make it bigger, not smaller.”
The Count has fallen victim to integer overflow. In C, the declaration “int x” causes the compiler to allocate a small chunk of memory for the variable x. Ints are quite small; usually 32 bits (it’s system-dependent; YMMV). That means that an int can only have 232 different values. The possible values are split in half between negative and positive numbers, so there are only 231 possible positive numbers and 231 possible negative numbers that an int can hold.
To see why 2,147,483,647 + 1 = -2,147,483,648, the Count needs to understand how counting in binary works. Let’s look at a few 32-bit binary numbers.
1 = 00000000000000000000000000000001
2 = 00000000000000000000000000000010
3 = 00000000000000000000000000000011
4 = 00000000000000000000000000000100
(skip ahead a bit)
127,553,821 = 00000111100110100101000100011101
Now, C ints are typically represented in two’s complement format. In two’s complement, the most significant (i.e. leftmost) bit of a number is the sign bit. If it’s 0, then the number is positive, and if it’s 1, then the number is negative.
So, what happens when we get to 2,147,483,647 + 1? Well, let’s look at it in binary.
2,147,483,647 = 01111111111111111111111111111111
2,147,483,648 = 10000000000000000000000000000000
But the leftmost bit of that number is 1, making it negative. That’s the kicker: the leftmost bit is special, so as soon as you count high enough to change it, the value that the bits represent becomes negative.
1 2 3 4 5 6 7 8 9
And we get (skipping a few at the start)
Same problem. C and Java aren’t alone here; other languages with this problem include C++, C#, Objective C, PHP, Visual Basic, Delphi, COBOL, and Fortran. 1
1 2 3 4 5
“Looks pretty good,” says the Count. “1, 2, 3, 4, …, 9007199254740991, 9007199254740992, 9007199254740992, 9007199254740992… Oh, no! The numbers have stopped counting up!”
In binary scientific notation, 1100101 would be written as mantissa = 1.100101, e = 110 (decimal 6).
So we’ve got this number 253, represented with mantissa 1.0000…000 and exponent 110101 (decimal 53). The next biggest number we can represent is the one with mantissa 1.0000…001 and the same exponent. That number is 253 + 2, or in decimal, 9007199254740994.
“What happened to 253 + 1?” asked the Count. “Shouldn’t that come before 253 + 2?”
Counting on Clojure
“Is there no language out there that knows how to count?” laments the Count. “Surely one of them must get it right! What about Clojure? Perhaps it has learned from the mistakes of others.”
This starts off as you’d expect. 1, 2, 3…
Then comes 2147483647 (231 – 1) and 2147483648 (232) and 2147483649 (232 + 1); no problems with 32-bit integer overflow here.
A while later, here comes 9007199254740992, and 9007199254740993, and 9007199254740994; no problems with floating-point rounding.
And if this program runs long enough, it’ll count to 9223372036854775808 (263 – 1), 340282366920938463463374607431768211455 (2128 – 1), and on and on as long as the Count’s computer still works.
The way Clojure achieves this is by using a thing called a big (arbitrary-precision) integer 2. It expands to be as big as it needs to be. Think of this: when the Count adds 1 to 9999, he can’t write it using just 4 places, so he makes a fifth number place and writes 10,000. Big integers work the same way; 231 + 1 won’t fit into 32 bits, so more memory gets allocated so that 231 + 1 will fit.
This means Clojure can count until the Count’s computer runs out of memory to allocate, and that’ll take a very, very long time. With an amount of memory that’s infinitesimally small, like 1 KB (modern computers tend to have 2 GB of memory, or about 2.1 million times that much), the Count can count until the heat death of the universe.
“Finally!” says the Count.
Choose your programming language carefully, or you may find that your project doesn’t count.
Of course there’s java.lang.BigInteger; but
x = x.add(1)just isn’t the same. Also, unless your code and every library you interface with accepts, returns, and uses BigIntegers internally, you’re still susceptible to integer overflow.↩
Clojure doesn’t start off with BigIntegers; it waits until the numbers are big enough to warrant their use. That way, operations on small numbers stay fast.↩