Sam Merritt

torgomatic

Programming Languages Don't Count

The Count

Remember the Count from Sesame Street? The Count loves nothing more than to count, all day long. Unfortunately for him, he’s limited in how fast he can speak the numbers aloud. Imagine if the Count could program! Why, there’d be no limit to the numbers he could count to! Right?

Nope. Most programming languages don’t count. That’s not to say that they don’t matter, but that they don’t count: 1, 2, 3, 4, and so on.

C makes the Count sad

Let’s count in C!

Counting in C
#include <stdio.h>

int main(int argc, char* argv[]) {
  int x = 1;
  while (1) {
    printf("%d\n", x);
    x++;
  }
}

Look at all those numbers. 1, 2, 100, 1000! 6,827,251! They just keep going up and up and up! 2,000,000,000! 2,147,483,647! -2,147,483,648!

“Wait!” says the Count. “My program was counting up, up, up! Where’d this negative number come from? Adding one to a number should always make it bigger, not smaller.”

The Count has fallen victim to integer overflow. In C, the declaration “int x” causes the compiler to allocate a small, fixed-size chunk of memory for the variable x. Ints are quite small; usually 32 bits (it’s system-dependent; YMMV). That means that an int can only have 2^32 different values. The possible values are split between negative and non-negative numbers, so an int can hold only 2^31 different negative numbers and 2^31 different non-negative ones (0 through 2^31 – 1).
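To see those limits concretely, here’s a minimal C sketch (my illustration, not part of the original program) that asks the standard <limits.h> header for the size and range of an int on whatever machine runs it:

Int limits in C
#include <stdio.h>
#include <limits.h>

int main(void) {
  /* sizeof(int) is in bytes; CHAR_BIT is bits per byte (almost always 8) */
  printf("int is %zu bits\n", sizeof(int) * CHAR_BIT);
  printf("INT_MAX = %d\n", INT_MAX);  /* 2147483647 with 32-bit ints */
  printf("INT_MIN = %d\n", INT_MIN);  /* -2147483648 with 32-bit ints */
  return 0;
}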

To see why 2,147,483,647 + 1 = -2,147,483,648, the Count needs to understand how counting in binary works. Let’s look at a few 32-bit binary numbers.

1 = 00000000000000000000000000000001

2 = 00000000000000000000000000000010

3 = 00000000000000000000000000000011

4 = 00000000000000000000000000000100

(skip ahead a bit)

127,553,821 = 00000111100110100101000100011101

Now, C ints are typically represented in two’s complement format. In two’s complement, the most significant (i.e. leftmost) bit of a number is the sign bit. If it’s 0, then the number is positive, and if it’s 1, then the number is negative.

So, what happens when we get to 2,147,483,647 + 1? Well, let’s look at it in binary.

2,147,483,647 = 01111111111111111111111111111111

Add 1:

2,147,483,648 = 10000000000000000000000000000000

But the leftmost bit of that number is 1, making it negative. In two’s complement, the pattern 10000000000000000000000000000000 stands for -2^31, which is -2,147,483,648. That’s the kicker: the leftmost bit is special, so as soon as you count high enough to flip it, the value that the bits represent jumps from the largest positive int to the most negative one.
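Here’s a small C sketch (my illustration, assuming 32-bit ints) that prints an int’s raw bits by copying them into an unsigned integer, so the Count can watch the sign bit flip:

Watching the sign bit
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <limits.h>

/* Print the 32 bits of an int, most significant bit first. */
static void print_bits(int n) {
  uint32_t bits;
  memcpy(&bits, &n, sizeof bits);  /* copy the bytes; no value conversion */
  for (int i = 31; i >= 0; i--)
    putchar(((bits >> i) & 1) ? '1' : '0');
  printf(" = %d\n", n);
}

int main(void) {
  print_bits(1);        /* 000...001 */
  print_bits(INT_MAX);  /* 0111...11 */
  print_bits(INT_MIN);  /* 1000...00: the sign bit, and only it, is set */
  return 0;
}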

Java, maybe?

Count.java
class Count {
    public static void main(String[] argv) {
        int x = 1;
        while (true) {
            System.out.println(x);
            x++;
        }
    }
}

And we get (skipping a few at the start)

2,147,483,645,

2,147,483,646,

2,147,483,647…

-2,147,483,648.

Same problem. C and Java aren’t alone here; other languages with this problem include C++, C#, Objective-C, PHP, Visual Basic, Delphi, COBOL, and Fortran.[1]

JavaScript makes the Count sad in an entirely different way

“Well,” says the Count, “Java has many different kinds of numbers. There’s byte, short, int, long, float, and double. That’s six! Six kinds of numbers! Ah, ah, ah. But I digress. What about JavaScript? It only has one kind of number. Surely it can do the job.”

Let’s count in JavaScript!

Counting in JavaScript
var x = 1;
while (true) {
    console.log(x);
    x = x + 1;
}

“Looks pretty good,” says the Count. “1, 2, 3, 4, …, 9007199254740991, 9007199254740992, 9007199254740992, 9007199254740992… Oh, no! The numbers have stopped counting up!”

Sadly, the Count’s dream of counting forever won’t work in JavaScript either. JavaScript numbers are 64-bit IEEE 754 floating-point numbers. Basically, that means that JavaScript numbers are stored in binary scientific notation. That’s where you write numbers as a mantissa m and an exponent e, and then the value is m × 2^e. The mantissa gets 52 stored bits (an implicit leading 1 brings the precision to 53 bits), the exponent gets 11 bits, and the remaining bit is a sign bit. (There’s those pesky fixed sizes again. The Count had a hunch they’d make an appearance.)

In binary scientific notation, the binary number 1100101 (decimal 101) would be written as mantissa = 1.100101, e = 110 (decimal 6).
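To make that layout concrete, here’s a short C sketch (my illustration; the choice of 101.0 just matches the example above, and C doubles are the same 64-bit IEEE 754 format as JavaScript numbers) that pulls a double apart into its sign, exponent, and mantissa fields:

Taking a double apart
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
  double d = 101.0;  /* binary 1100101, i.e. 1.100101 x 2^6 */
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);  /* grab the raw IEEE 754 bits */

  uint64_t sign     = bits >> 63;                 /* 1 bit */
  uint64_t exponent = (bits >> 52) & 0x7FF;       /* 11 bits, stored with a bias of 1023 */
  uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;  /* 52 stored bits */

  printf("sign     = %llu\n", (unsigned long long)sign);
  printf("exponent = %llu (minus bias: %lld)\n",
         (unsigned long long)exponent, (long long)exponent - 1023);
  printf("mantissa = 0x%013llx\n", (unsigned long long)mantissa);
  return 0;
}

For 101.0 it prints a sign of 0, an exponent field of 1029 (1029 − 1023 = 6), and mantissa bits 100101 padded out with zeros.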

Let’s take a look at that number where JavaScript stopped counting. 9007199254740992 is 2^53, which means it’s the smallest number that can’t be written in 53 bits (53 bits get you from 0 up to 2^53 – 1).

So we’ve got this number 2^53, represented with mantissa 1.0000…000 and exponent 110101 (decimal 53). The next biggest number we can represent is the one with mantissa 1.0000…001 and the same exponent. That number is 2^53 + 2, or in decimal, 9007199254740994.
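You don’t have to take my word for it; this tiny C sketch (mine, using the standard nextafter function from <math.h>) asks for the representable double just above 2^53:

The next double after 2^53
#include <stdio.h>
#include <math.h>

int main(void) {
  double x = 9007199254740992.0;  /* 2^53 */
  /* nextafter(x, INFINITY) is the smallest double greater than x.
     Link with -lm on most systems. */
  printf("%.0f\n", nextafter(x, INFINITY));  /* prints 9007199254740994 */
  return 0;
}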

“What happened to 2^53 + 1?” asks the Count. “Shouldn’t that come before 2^53 + 2?”

Well, yes, it should, but it was swallowed up by rounding error. With an exponent of 53, the smallest step the mantissa can take is 2: adding 1 to its last bit adds 2 to the value of the number. Essentially, the exponent got big enough that there’s no more ones place. When we added 2^53 and 1, JavaScript had to round the result to either 2^53 or 2^53 + 2, and since 2^53 + 1 sits exactly halfway between them, the default rounding rule (round half to even) picks 2^53, the one whose mantissa ends in a 0.
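A quick C sketch (again mine, leaning on C and JavaScript sharing the same double format) shows the ones place vanish:

Losing the ones place
#include <stdio.h>

int main(void) {
  double x = 9007199254740992.0;  /* 2^53 */
  printf("x     = %.0f\n", x);        /* 9007199254740992 */
  printf("x + 1 = %.0f\n", x + 1.0);  /* still 9007199254740992: the 1 rounds away */
  printf("x + 2 = %.0f\n", x + 2.0);  /* 9007199254740994: 2 is the new step size */
  printf("x + 1 == x? %s\n", (x + 1.0 == x) ? "yes" : "no");  /* yes */
  return 0;
}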

Counting on Clojure

“Is there no language out there that knows how to count?” laments the Count. “Surely one of them must get it right! What about Clojure? Perhaps it has learned from the mistakes of others.”

Counting in Clojure
(dorun (map println (iterate inc' 1)))

This starts off as you’d expect. 1, 2, 3…

Then comes 2147483647 (2^31 – 1) and 2147483648 (2^31) and 2147483649 (2^31 + 1); no problems with 32-bit integer overflow here.

A while later, here comes 9007199254740992, and 9007199254740993, and 9007199254740994; no problems with floating-point rounding.

And if this program runs long enough, it’ll count to 9223372036854775808 (2^63), 340282366920938463463374607431768211455 (2^128 – 1), and on and on as long as the Count’s computer still works.

The way Clojure achieves this is by using a thing called a big (arbitrary-precision) integer.[2] It expands to be as big as it needs to be. Think of this: when the Count adds 1 to 9999, he can’t write it using just 4 places, so he makes a fifth place and writes 10,000. Big integers work the same way; 2^31 + 1 won’t fit into a 32-bit int, so more memory gets allocated so that 2^31 + 1 will fit.
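To see the digit-growing idea in miniature, here’s a toy C sketch (entirely mine, and nothing like Clojure’s real implementation, which works on binary words rather than decimal digits) that counts with an array of digits and adds a new place whenever the carry runs off the end:

A toy big integer in C
#include <stdio.h>
#include <stdlib.h>

/* A toy big integer: digits[0] is the ones place. */
typedef struct {
  unsigned char *digits;
  size_t len;
} BigInt;

/* Add 1, growing the digit array when the carry overflows every place. */
static void increment(BigInt *n) {
  for (size_t i = 0; i < n->len; i++) {
    if (n->digits[i] < 9) {
      n->digits[i]++;
      return;
    }
    n->digits[i] = 0;  /* 9 becomes 0; carry into the next place */
  }
  /* The carry ran off the end: 9999 -> 10000 needs a fifth place.
     (Error checking omitted for brevity.) */
  n->digits = realloc(n->digits, n->len + 1);
  n->digits[n->len++] = 1;
}

static void print_bigint(const BigInt *n) {
  for (size_t i = n->len; i > 0; i--)  /* most significant digit first */
    putchar('0' + n->digits[i - 1]);
  putchar('\n');
}

int main(void) {
  BigInt n = { calloc(1, 1), 1 };  /* start at 0, one digit wide */
  for (int i = 0; i < 10000; i++)
    increment(&n);
  print_bigint(&n);  /* prints 10000: the fifth place appeared on its own */
  return 0;
}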

This means Clojure can count until the Count’s computer runs out of memory to allocate, and that’ll take a very, very long time. Even with a laughably small amount of memory, like 1 KB (modern computers tend to have 2 GB of memory, about 2.1 million times that much), the Count can count until the heat death of the universe.

“Finally!” says the Count.

Conclusion

Choose your programming language carefully, or you may find that your project doesn’t count.


  [1] Of course there’s java.math.BigInteger; but x = x.add(BigInteger.ONE) just isn’t the same. Also, unless your code and every library you interface with accepts, returns, and uses BigIntegers internally, you’re still susceptible to integer overflow.

  [2] Clojure doesn’t start off with BigIntegers; it waits until the numbers are big enough to warrant their use. That way, operations on small numbers stay fast. (That’s also why the program above uses inc', the auto-promoting increment: in Clojure 1.3 and later, plain inc throws an exception on overflow instead of promoting.)