(probability theory, information theory, based on statistical mechanics):

The entropy of a random variable X is

H(X) = \sup_{\{X_1,\dots,X_n\}} \sum_{i=1}^{n} P(X \in X_i) \log_2 \frac{1}{P(X \in X_i)}

where the supremum is taken over all partitions of the range of X.
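To make the definition concrete, here is a minimal Python sketch that evaluates the inner sum for one fixed partition; the entropy is the supremum of this quantity over all partitions. The example variable and partitions are illustrative choices, not taken from the text.

```python
import math

def partition_entropy(prob_of_block):
    """Evaluate sum_i P(X in X_i) * log2(1 / P(X in X_i)) for one partition.

    `prob_of_block` lists the probabilities P(X in X_i), one per block X_i.
    Blocks with probability 0 contribute nothing (the usual 0 * log(1/0) = 0 convention).
    """
    return sum(p * math.log2(1.0 / p) for p in prob_of_block if p > 0)

# Illustrative example: X uniform on [0, 1).
# Partition the range into two blocks [0, 0.5) and [0.5, 1):
print(partition_entropy([0.5, 0.5]))   # 1.0 bit
# A finer partition into four equal blocks gives a larger value:
print(partition_entropy([0.25] * 4))   # 2.0 bits
# Refining a partition never decreases this sum, as the text notes below.
```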


As partitions become finer, the finite sum above cannot decrease. So when the range of X is finite, say X ∈ {x_1, ..., x_m}, we simply recover the "well-known" formula for entropy

H(X) = \sum_{i=1}^{m} P(X = x_i) \log_2 \frac{1}{P(X = x_i)}.
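A quick numeric check of the finite formula, using made-up example distributions:

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a finite distribution given as {x_i: P(X = x_i)}."""
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

# Illustrative distributions (not from the text):
biased_coin = {"H": 0.9, "T": 0.1}
fair_d8 = {face: 1 / 8 for face in range(1, 9)}

print(entropy(biased_coin))  # approx. 0.469 bits
print(entropy(fair_d8))      # exactly 3.0 bits
```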


We can also go in the other direction and start with the finite version above. The entropy of a variable X with any range is the most you can get by applying the finite formula to every projection of X onto a finite set. If X takes on finitely many values, there is no difference -- the "best" partition turns out to be the finest one, i.e., the one that isolates each value of X in its own block. But the first formula also lets you compute the entropy of other X's, ones that have an infinite range.
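As a sanity check on the claim that the finest partition is best when X takes finitely many values, the sketch below (using an arbitrary example distribution) compares the finest partition against every way of merging two values; each merge can only lower the sum.

```python
import math
from itertools import combinations

def block_sum(block_probs):
    """sum_i p_i * log2(1/p_i) over blocks with probabilities p_i."""
    return sum(p * math.log2(1.0 / p) for p in block_probs if p > 0)

# Hypothetical finite distribution over {a, b, c, d}.
pmf = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

finest = block_sum(pmf.values())  # each value in its own block
print(f"finest partition: {finest:.4f} bits")

# Merge any two values into one block: the sum never goes up.
for x, y in combinations(pmf, 2):
    merged = [pmf[x] + pmf[y]] + [pmf[z] for z in pmf if z not in (x, y)]
    assert block_sum(merged) <= finest + 1e-12
    print(f"merge {x},{y}: {block_sum(merged):.4f} bits")
```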

If X is a continuous random variable with probability density function (PDF) p(x), then you get the expected formula for entropy

H(X) = \int_{-\infty}^{\infty} p(x) \log_2 \frac{1}{p(x)} \, dx
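A rough numeric check of this integral for a standard normal density, assuming a simple Riemann-sum quadrature (the choice of distribution, integration range, and step size are illustrative); the result should be close to the known closed form (1/2) log2(2*pi*e) ≈ 2.047 bits.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """PDF of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

# Crude Riemann-sum approximation of  integral p(x) log2(1/p(x)) dx  over [-10, 10].
dx = 1e-3
xs = [-10 + i * dx for i in range(int(20 / dx))]
h_numeric = sum(gaussian_pdf(x) * math.log2(1.0 / gaussian_pdf(x)) * dx for x in xs)

h_closed_form = 0.5 * math.log2(2 * math.pi * math.e)  # approx. 2.047 bits
print(h_numeric, h_closed_form)
```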