A pseudo-random sequence of numbers, as opposed to a truly random one, has some properties that are not random. Randomness can be measured in the following ways:
  • Chi Squared: This is the test most commonly used to check data for randomness. It measures how far a sample deviates from its expected distribution. In a truly random sample, every value should occur about equally often. If you roll a fair six-sided die 60 times, you would expect each number to come up about 10 times. An extremely high or an extremely low chi-squared value means the data is suspect: either skewed toward some values or too evenly spread to be random.

    The data run I did was 204,800 bytes long, so one would expect to see approximately 800 occurrences of each of the 256 byte values. This was not the case. The lower 128 values had a mean count of 999 and the upper 128 a mean count of 600. This is clearly not random.

    Chi square distribution for 204800 samples is 12954.37, and randomly would exceed this value 0.01 percent of the times.

    The low 8 bits of rand() are very non-random, this time because the byte counts are too even to be believable:
    Chi square distribution for 500000 samples is 0.01, and randomly would exceed this value 99.99 percent of the times.

    A truly random source of numbers looks like this (the example is from radioactive decay):
    Chi square distribution for 32768 samples is 237.05, and randomly would exceed this value 75.00 percent of the times.

  • Arithmetic mean: Sum all the bytes in the file and divide by the file length. Each byte has a value between 0 and 255, so the expected mean for random data is 127.5.

    As you can probably guess from the byte counts above, the mean of this data is on the low side:
    Arithmetic mean value of data bytes is 111.5296 (127.5 = random).

  • Monte Carlo value for Pi: This test takes each successive group of six bytes as 24-bit X and Y coordinates inside a square. If the point falls within the circle inscribed in the square, it is counted as a "hit". Because the circle covers pi/4 of the square's area, the fraction of hits multiplied by 4 approximates pi.
    Monte Carlo value for Pi is 3.488002813 (error 11.03 percent).

    For a truly random source generated by radioactive decay, a 32768 byte file gives:
    Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).

  • Serial Correlation Coefficient: This test measures how much each byte depends on the previous byte. For a random sequence, this value will be close to zero. A non-random sequence such as a text file will give a value close to 0.5. Bitmaps will approach 1.

    Serial correlation coefficient is -0.052083 (totally uncorrelated = 0.0).
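The chi-squared and arithmetic mean calculations described above can be sketched in C as follows. This is a minimal illustration and not the code of the test program itself. The function names (count_bytes, chi_squared_bytes, mean_bytes) are made up for this example, and it computes only the chi-squared statistic, not the percentage that the test program reports alongside it.

```c
#include <assert.h>
#include <stddef.h>

/* Tally how often each byte value occurs in buf.
 * The caller is expected to zero counts[] beforehand. */
void count_bytes(const unsigned char *buf, size_t len, unsigned long counts[256])
{
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
}

/* Chi-squared statistic of the byte counts against a uniform expectation:
 * the sum over all 256 values of (observed - expected)^2 / expected,
 * where expected = n / 256 and n is the total number of bytes. */
double chi_squared_bytes(const unsigned long counts[256], unsigned long n)
{
    assert(n > 0);
    double expected = (double)n / 256.0;
    double chi = 0.0;
    for (int v = 0; v < 256; v++) {
        double d = (double)counts[v] - expected;
        chi += d * d / expected;
    }
    return chi;
}

/* Arithmetic mean of the byte values; 127.5 is expected for random data. */
double mean_bytes(const unsigned char *buf, size_t len)
{
    assert(len > 0);
    double sum = 0.0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum / (double)len;
}
```

As a rough check, idealized counts of 1000 for each of the lower 128 values and 600 for each of the upper 128 (rounded from the figures above) give a chi-squared value of 12800, and those same counts imply a mean of 111.5. Both are close to the reported 12954.37 and 111.5296.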
Running these tests for yourself can show that rand() is not a true random number generator. The count of each byte value is especially telling: the numbers look random, but they are not truly random, hence pseudo-random.
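The two order-dependent tests from the list, the Monte Carlo value for Pi and the serial correlation coefficient, can be sketched in C along the following lines. This is an illustration under stated assumptions, not the test program's own code. The function names are hypothetical. The hit test uses a quarter circle anchored at the origin, which covers the same pi/4 fraction of the square as a full inscribed circle. The correlation treats the data as circular, pairing the last byte with the first, which keeps the formula free of square roots.

```c
#include <assert.h>
#include <stddef.h>

/* Monte Carlo estimate of pi: every six bytes form a 24-bit X and a 24-bit Y.
 * A point counts as a hit when x^2 + y^2 <= r^2, with r = 2^24 - 1.
 * Trailing bytes that do not fill a six-byte group are ignored. */
double monte_carlo_pi(const unsigned char *buf, size_t len)
{
    assert(len >= 6);
    const double r = 16777215.0;            /* largest 24-bit coordinate */
    unsigned long tries = 0, hits = 0;
    for (size_t i = 0; i + 6 <= len; i += 6) {
        double x = buf[i]     * 65536.0 + buf[i + 1] * 256.0 + buf[i + 2];
        double y = buf[i + 3] * 65536.0 + buf[i + 4] * 256.0 + buf[i + 5];
        tries++;
        if (x * x + y * y <= r * r)
            hits++;
    }
    return 4.0 * (double)hits / (double)tries;
}

/* Lag-1 serial correlation of the bytes, treating the buffer as circular.
 * Because every byte appears once as "current" and once as "next", both
 * series share the same sum and sum of squares, so the coefficient is
 * (n*Sxy - S^2) / (n*Sxx - S^2). Constant data has no defined correlation;
 * this sketch returns 0.0 in that case. */
double serial_correlation(const unsigned char *buf, size_t len)
{
    assert(len >= 2);
    double n = (double)len;
    double s = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < len; i++) {
        double x = buf[i];
        double y = buf[(i + 1) % len];       /* wrap around at the end */
        s += x;
        sxx += x * x;
        sxy += x * y;
    }
    double den = n * sxx - s * s;
    if (den == 0.0)
        return 0.0;
    return (n * sxy - s * s) / den;
}
```

Strictly alternating bytes such as 0, 255, 0, 255 give a coefficient of -1, while a buffer of identical bytes falls into the undefined case and reports 0.0 here.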
Source code used to generate the data for the above tests follows:
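#include <stdlib.h>
#include <stdio.h>

/* Writes the four bytes of each rand() result to stdout, high byte first. */
int main(void)
{
  int i;
  unsigned int r;

  srand(1);  /* fixed seed, so every run produces the same sequence */
  for (i = 0; i < 512000; i++)
  {
    r = (unsigned int)rand();
    /* rand() never exceeds RAND_MAX; where RAND_MAX is 2^31 - 1, the top
       byte written here stays below 128. */
    putchar((r >> 24) & 0xff);
    putchar((r >> 16) & 0xff);
    putchar((r >> 8) & 0xff);
    putchar(r & 0xff);
  }
  return 0;
}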
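The listing above writes all four bytes of each rand() result. How the separate "low 8 bits" run was produced is not shown, so the following is only one plausible way to do it. The helper names rand_low_byte and write_low_bytes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Keep only the low 8 bits of the next rand() result. */
unsigned char rand_low_byte(void)
{
    return (unsigned char)(rand() & 0xff);
}

/* Write n low bytes to out; returns how many were actually written. */
size_t write_low_bytes(FILE *out, size_t n)
{
    assert(out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (fputc(rand_low_byte(), out) == EOF)
            break;                           /* stop on a write error */
        written++;
    }
    return written;
}
```

Calling srand(1) and then write_low_bytes(stdout, 500000) from main, with the output redirected to a file, would yield 500,000 bytes, matching the sample count quoted for that run. Whether the original data was generated this way is an assumption.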
The program used to run these tests and produce the values above:
http://www.fourmilab.ch/random/