The ergodicity of a physical or computational process
is a statement
about initial and final states of that process.

What defines a state? It depends on your
particular problem. If we think of a classical gas of atoms, then
the positions and velocities of those atoms
at a given instant in time
would define the state of the gas. The state of a computer's memory could be represented by a sequence of 0's and 1's.

Let's use the letter S(t) to represent the state of a given system at
a given time, labeled by t. And let's say t=0 is the initial time
and t=1 is the final time. Let's also use an arrow to denote the
process under study. So under our process, S(0) → S(1), that is
the process starts from initial state S(0) at t=0 and ends up at
S(1).

Now imagine **any** 2 possible states. Let one of them be
the initial state of the system, and the other the final. A process
is defined to be **ergodic** if there is a nonzero
probability for the process to evolve S(0) to S(1), that is, if
Prob( S(0) → S(1) ) > 0 *no matter which 2 states you picked*. Another way of phrasing the definition is:
a process is ergodic if every point in state space will be visited
at least once when the process is applied for an infinite number of steps.
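
To make the definition concrete, here is a minimal sketch (my own illustration, not from any reference): for a process on a finite state space described by a transition matrix P, the condition Prob( S(0) → S(1) ) > 0 for every pair of states amounts to every state being reachable from every other state after some number of steps.

```python
import numpy as np

def is_ergodic(P, tol=1e-12):
    """Check that every state can reach every other state with nonzero
    probability, i.e. Prob( S(0) -> S(1) ) > 0 for any pair of states,
    possibly after several applications of the process."""
    n = P.shape[0]
    # Accumulate reachability over up to n steps: if state j is ever
    # reachable from state i, the (i, j) entry becomes positive.
    reach = np.eye(n)
    step = np.eye(n)
    for _ in range(n):
        step = step @ P
        reach = reach + step
    return np.all(reach > tol)

# A process that hops between neighbouring states on a ring: ergodic.
P_ring = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]])

# A process that never leaves its starting state: not ergodic.
P_stuck = np.eye(3)

print(is_ergodic(P_ring))   # True
print(is_ergodic(P_stuck))  # False
```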

If you have a nasty multi-dimensional integral, Monte Carlo
methods are a very efficient way to evaluate the integral numerically.
Let's say the integrand depends on a field. Think of the Ising
model: a 2-dimensional lattice of points, where at each point the
field is equal to +1 or -1; quantities like the magnetization and
internal energy are functions of the field configuration, or state,
of the spins. To calculate thermal averages of these quantities,
integrals must be done over all possible spin configurations, weighted
by a Boltzmann factor, exp(-E/T). (E=energy, T=temperature).
Due to the decaying exponential, most field configurations will give
exponentially small contributions to the thermal average, so
it is tremendously efficient to
use an algorithmic updating process to generate just the states which
dominate the integral. In order not to skew the sample, that is, in
order to find the important configurations with the right weight,
the updating process must be able to, in principle, visit every single
state. That is, the updating process must be ergodic in order to
have an unbiased calculation. (The process
must also tend to the canonical distribution after repeated application,
see below.)
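
As a concrete sketch of such an updating process, here is a single-spin-flip Metropolis update for the 2D Ising model described above. The lattice size, the temperature, and the choice of units (coupling J = 1, Boltzmann constant = 1) are assumptions of mine for illustration; the acceptance rule uses the same Boltzmann factor, exp(-ΔE/T).

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(spins, T):
    """One sweep of single-spin-flip Metropolis updates on a 2D Ising
    lattice (periodic boundaries, J = 1, k_B = 1).  Each proposed flip
    is accepted with probability min(1, exp(-dE/T)), so repeated sweeps
    tend toward the Boltzmann distribution exp(-E/T)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # Energy change from flipping spin (i, j): dE = 2 * s * (sum of neighbours)
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
              spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1
    return spins

# Any single spin can be flipped with nonzero probability, so repeated
# sweeps can in principle reach every configuration: the update is ergodic.
spins = rng.choice([-1, 1], size=(16, 16))
for _ in range(200):
    metropolis_sweep(spins, T=2.5)
print("magnetization per site:", spins.mean())
```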

The nontrivial, nonpathological updating processes used in practice are
Markov chains: each new state is generated from the current state alone,
with no memory of earlier states.
If a process is ergodic, **and** if repeated applications
of the process will take any distribution of states toward the
canonical distribution (Boltzmann distribution), then the process
will (eventually) yield a realistic sample of states.
The sample average, which you can calculate using
your generated chain of states, will tend toward the desired
thermal average (more generally, the ensemble average),
the answer you're looking for. Detailed balance is a sufficient,
but not generally necessary, condition for tending to the canonical
distribution.
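
Here is a small numerical illustration of that last point (my own toy example, with made-up energy levels): a Metropolis-style transition matrix that satisfies detailed balance with respect to the Boltzmann weights, applied repeatedly to an arbitrary starting distribution, drives that distribution toward the canonical one.

```python
import numpy as np

# Three made-up energy levels and a temperature (illustrative values only).
E = np.array([0.0, 1.0, 2.0])
T = 1.0
boltzmann = np.exp(-E / T)
boltzmann /= boltzmann.sum()

# Metropolis-style transitions between neighbouring levels: propose a
# neighbour with probability 1/2, accept with min(1, exp(-dE/T)).
# This satisfies detailed balance with respect to the Boltzmann weights.
n = len(E)
P = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            P[i, j] = 0.5 * min(1.0, np.exp(-(E[j] - E[i]) / T))
    P[i, i] = 1.0 - P[i].sum()

# Start far from equilibrium and apply the process repeatedly.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = p @ P
print("after many steps:", p)
print("Boltzmann       :", boltzmann)
```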

For references, see any text on Monte Carlo methods.
The one I have handy right now is __Quantum Fields on a Lattice__
by I. Montvay and G. Münster (Cambridge Univ. Press, 1994).