The Problem with rand(3)

Not all random numbers are created equal

2016

Some random numbers are less random than others, and most are less random then you think. This entry outlines some of the flaws in the standard ‘c’ libraries random number generators.

rand(3), rand48(3), random(3)

The trivial ‘c’ approach to generating a random number is to use a function in the standard library. The rand(3), rand48(3), and random(3) functions, however all suffer from the same basic flaws: the generator must be seeded, and the output is less random and unpredictable than it appears. These issues among others make rand and rand-like functions largely unsuitable for security applications.

Seeding

One of the greatest flaws of rand and rand-like functions is their need to be seeded. Seeding a these generators involves using srand(3), srand48(3), seed48(3), or srandom(3), which appears straightforward until one considers what initial value to seed these functions with. A common approach is to seed with a static value (0, 1, 42, or 0xdeadbeef are popular choices), but this produces the same ‘random’ stream of numbers every time the program is run, making the output completely predictable.

A slightly better approach is to seed with time(NULL) or getpid(2). However, time(NULL) is not only predictable but on the second scale, and getpid only changes once per run depending on how it is run and what operating system it is used with.

Unfortunately, regardless of what seeding method is used, the algorithm used to generate the actual numbers is still inherently predictable.

How Much Random

Another human-factor concern with rand and rand-like functions is how much randomness they actually generate. At worst rand generates a ‘random’ int in the range [0, 0x7fff], while rand48 and random a long in [0, 0x7fffffff]. However, this still poses a problem. One would reasonably expect a function which returns a random int or long to have its returned value fall in the full range of that datatype, but this isn’t the case. This means for unsigned types rand will produce only half the possible values, and for signed types it will only produce positive values.

According to research performed by Michael Ligh and presented at DEFCON 17, the Conficker.B Worm used the rand function to generate target IP addresses. To do this it needed to generate two random numbers in the range [0, 0xffff] but due to the deficiencies of rand instead generated a number in the range [0, 0x7fff]. These two random numbers were then used for the first and last half of a target IP address. This error in expectation meant that no IP address with the first and third octets in [0x80, 0xff] would be generated. This left approximately half of possible IP addresses un-targetable.

Quality

Traditional random-number generators are entirely deterministic, meaning that if the seed is known every number in the output sequence can be predicted. If the generator is seeded with the time an attacker can estimate the moment (within a second of accuracy) in which srand was called and generate the same ‘random’ sequence of numbers. Similarly, most operating systems don’t randomize their pid’s, making this form of seeding entirely predictable.

Even if the pid is randomized, like in OpenBSD, the quality of traditional algorithms leaves much to be desired. Many implementations of rand and all implementations of rand48 use a linear-shift feedback register which is fast and easy, but produces predictable output. In the case of rand it is also known that in many implementations there exists a predictable cyclic pattern to the least significant bits. In my research I discovered a similar pattern regarding the most significant bits, observing little change between generations and the first generated random number of nearby seeds on OS X. For example srand(0x00ff) through srand(0x00f0) generates the following for the first call to rand:

0x3d8c90, 0x3dce37, 0x3e0fde, 0x3e5185, 0x3e932c, 0x3ed4d3, 0x3f167a, 0x3f5821, 0x3f99c8, 0x3fdb6f

The most significant bits remain mostly unchanged, with all variations being incremental and minor. This phenomenon occurs with almost any sequential seed, and is unlikely unique to OS X.

Solution?

The best way to seed a random function is to use /dev/urandom or /dev/random if the urandom device isn’t available, though this method has is own problems. If a program runs out of file descriptors (or, more importantly, is made to run out of file descriptors) or is in a chroot(2) and /dev/random isn’t available, it is impossible to seed or generate cryptographically secure numbers. Worse still, it is possible to accidentally set the seed as -1 or 0x7fffffff, which is non-random, and the desired effect for a knowledgeable attacker.

When using OpenBSD, OS X, or other UNIX’s, arc4random(3) may be available. This function does not suffer from any of the above issues, and is the best solution I have thus far found.

Further Resources

Licence

Any source in this article is released under the ISC License.