pseudo-random (idea) by m_turner

A pseudo-random sequence of numbers, as opposed to a truly random one, has some properties that are not random. Randomness can be examined in the following ways:

Chi Squared: This is a widely used test for the randomness of data. It measures how far a sample deviates from its expected distribution. In a purely random sample, each value would be expected to occur about equally often. If you roll a fair six-sided die 60 times, you would expect to get about 10 of each number. An extremely high chi-squared value means the counts are skewed somewhere; an extremely low value means they are more even than chance would produce.
The data run I did was 204,800 bytes, so one would expect to see approximately 800 of each byte value (204,800 / 256). This was not the case: the lower 128 values had a mean count of 999 and the upper 128 a mean count of 600. This is clearly not random.
Chi square distribution for 204800 samples is 12954.37, and randomly would exceed this value 0.01 percent of the times.
The low 8 bits of rand() are very non-random, in this case because the counts are far too even:
Chi square distribution for 500000 samples is 0.01, and randomly would exceed this value 99.99 percent of the times.
By comparison, a truly random source of numbers (this example is from radioactive decay) looks like:
Chi square distribution for 32768 samples is 237.05, and randomly would exceed this value 75.00 percent of the times.
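The statistic itself is simple to compute. What follows is a minimal sketch, assuming all 256 byte values are equally likely; the name chi_square_bytes is made up for this example and is not taken from the test program linked at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: chi-square statistic for a byte buffer,
   assuming each of the 256 byte values is equally likely. */
double chi_square_bytes(const unsigned char *buf, size_t n)
{
    size_t count[256] = {0};
    double expected, chi = 0.0;
    size_t i;

    assert(n > 0);

    /* Tally how often each byte value occurs. */
    for (i = 0; i < n; i++)
        count[buf[i]]++;

    /* Expected count per value, e.g. 204800 / 256 = 800. */
    expected = (double)n / 256.0;

    /* Sum of (observed - expected)^2 / expected over all 256 values. */
    for (i = 0; i < 256; i++) {
        double d = (double)count[i] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

Turning the statistic into the "would exceed this value N percent of the times" figure also needs the chi-square distribution with 255 degrees of freedom, which this sketch leaves out.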
Arithmetic mean: Take all the bytes in the file, sum them up, and divide by the file length. Each byte can have a value between 0 and 255, so the expected mean is 127.5.
As you can probably guess from the byte counts above, the mean of this data is on the low side:
Arithmetic mean value of data bytes is 111.5296 (127.5 = random).
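As a sketch, the mean test is only a few lines of C. The function name byte_mean is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of the bytes in a buffer.
   For uniformly distributed bytes the result should be near 127.5. */
double byte_mean(const unsigned char *buf, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum / (double)n;
}
```

A buffer holding each value from 0 to 255 exactly once sums to 32640, and 32640 / 256 gives the expected 127.5.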
Monte Carlo value for Pi: This test takes each successive six bytes as a 24-bit X and a 24-bit Y coordinate inside a square. The distance of that point from the center is then calculated. If it is within the radius of the circle inscribed in the square, it is counted as a "hit". The percentage of hits can then be used to calculate pi, since the circle covers pi/4 of the square.
Monte Carlo value for Pi is 3.488002813 (error 11.03 percent).
For a truly random source generated by radioactive decay, a 32768 byte file approximates:
Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).
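The hit counting can be sketched roughly as below. This assumes each point is six bytes, a 24-bit X followed by a 24-bit Y, most significant byte first; that layout and the name monte_carlo_pi are assumptions of this example, and the real test program may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: estimate pi from a byte buffer. Each group of
   six bytes is read as a 24-bit X followed by a 24-bit Y (an assumed
   layout); leftover bytes at the end are ignored. */
double monte_carlo_pi(const unsigned char *buf, size_t n)
{
    /* The square spans 0 .. 2^24 - 1 on each axis, so the inscribed
       circle has its center and radius at half of that. */
    const double radius = 16777215.0 / 2.0;
    size_t points = n / 6, hits = 0, i;

    assert(points > 0);
    for (i = 0; i < points; i++) {
        const unsigned char *p = buf + 6 * i;
        double x = (double)(((unsigned long)p[0] << 16) |
                            ((unsigned long)p[1] << 8) |
                             (unsigned long)p[2]);
        double y = (double)(((unsigned long)p[3] << 16) |
                            ((unsigned long)p[4] << 8) |
                             (unsigned long)p[5]);
        double dx = x - radius, dy = y - radius;

        /* Compare squared distances to avoid needing sqrt(). */
        if (dx * dx + dy * dy <= radius * radius)
            hits++;
    }
    /* hits / points approximates circle area / square area = pi / 4. */
    return 4.0 * (double)hits / (double)points;
}
```

With good random input the result drifts toward pi as the buffer grows; skewed bytes pull points toward one region of the square and throw the ratio off.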
Serial Correlation Coefficient: This test measures how much each byte depends on the previous byte. For a random sequence, this value will be close to zero. A non-random sequence such as a text file will give a value close to 0.5. Bitmaps will approach 1.
Serial correlation coefficient is -0.052083 (totally uncorrelated = 0.0).
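One way to compute such a coefficient is the ordinary correlation between each byte and its successor. The sketch below pairs the last byte with the first (wrap-around), which is an assumption of this example; the name serial_correlation is also invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lag-1 serial correlation of a byte buffer.
   The last byte is paired with the first (wrap-around), which is an
   assumption of this sketch. Returns 1 and stores the coefficient in
   *out; returns 0 and leaves *out untouched when every byte is the
   same, since the coefficient is then undefined. */
int serial_correlation(const unsigned char *buf, size_t n, double *out)
{
    double sum = 0.0, sum_sq = 0.0, sum_adj = 0.0, num, den;
    size_t i;

    assert(n > 1);
    for (i = 0; i < n; i++) {
        double x = buf[i];
        double next = buf[(i + 1) % n];  /* successor, wrapping at the end */

        sum += x;
        sum_sq += x * x;
        sum_adj += x * next;
    }

    /* Covariance of (x, next) over variance of x, both scaled by n^2. */
    num = (double)n * sum_adj - sum * sum;
    den = (double)n * sum_sq - sum * sum;
    if (den == 0.0)
        return 0;
    *out = num / den;
    return 1;
}
```

Strictly alternating bytes such as 0, 255, 0, 255 give -1, long runs of the same value push the result toward 1, and independent bytes land near 0.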

Running these tests for yourself shows that rand() is not a true random number generator. The count of each byte value in particular reveals that while the numbers look random, they are not perfectly random; hence, pseudo-random.

Source code used to generate the data for the above tests follows:

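#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  long i;
  unsigned long r;

  srand(1);  /* fixed seed, so every run emits the same bytes */
  for(i = 0; i < 512000; i++)
  {
    /* rand() returns 0..RAND_MAX, so the top bit of the high byte
       is never set (and with a 15-bit RAND_MAX the two high bytes
       are always zero). */
    r = (unsigned long)rand();

    /* write the value as four bytes, most significant first */
    putchar((int)((r >> 24) & 0xff));
    putchar((int)((r >> 16) & 0xff));
    putchar((int)((r >> 8) & 0xff));
    putchar((int)(r & 0xff));
  }
  return 0;
}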

The program used to run these tests and generate these values (ent):
http://www.fourmilab.ch/random/
