H(X) = −∑_i p(x_i) log p(x_i)
What it is:
This formula calculates the Shannon
entropy, often just called entropy, of a discrete random variable X.
In simple terms, entropy measures the average amount of uncertainty or surprise
associated with the possible outcomes of that random variable.
Breaking down the components:
- H(X): This represents the entropy of the random variable X. It's the value we are calculating.
- X: This is a random variable, which means it can take on different possible values or outcomes. Think of tossing a coin (X can be Heads or Tails) or rolling a die (X can be 1, 2, 3, 4, 5, or 6).
- x_i: This represents a specific possible outcome or value that the random variable X can take. For a coin toss, x_1 could be Heads and x_2 could be Tails.
- p(x_i): This is the probability that the random variable X takes on the specific value x_i. For a fair coin, p(Heads) = 0.5 and p(Tails) = 0.5.
- log p(x_i): This is the logarithm of the probability p(x_i). The base of the logarithm determines the units of entropy:
  - Base 2 (log2): units are bits (most common in information theory).
  - Base e (ln): units are nats.
  - Base 10 (log10): units are hartleys or dits.
  - Since probabilities p(x_i) are between 0 and 1, their logarithms (for bases > 1) will be negative or zero.
- ∑_i: This is the summation symbol. It means we calculate the term p(x_i) log p(x_i) for every possible outcome x_i of the random variable X, and then add all those terms together.
- −: The negative sign at the beginning ensures that the final entropy value H(X) is non-negative (since the log p(x_i) terms are non-positive, and we are summing them).
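To see how these pieces fit together, here is a minimal Python sketch (not part of the original post) that translates the formula term by term; the function name shannon_entropy and the base parameter are just illustrative choices.

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = -sum_i p(x_i) * log(p(x_i)) for a discrete distribution.

    probs: the probabilities p(x_i) of each outcome (should sum to 1).
    base:  2 -> bits, math.e -> nats, 10 -> hartleys.
    """
    h = 0.0
    for p in probs:
        if p > 0:                       # by convention, 0 * log(0) counts as 0
            h -= p * math.log(p, base)  # the leading minus keeps H(X) >= 0
    return h
```

Passing base=math.e or base=10 gives the same uncertainty expressed in nats or hartleys instead of bits.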
Putting it together:
The "surprise" (or information content) of an individual outcome x_i is −log p(x_i): less likely events (small p(x_i)) carry a higher surprise, because −log p(x_i) grows as p(x_i) shrinks. The term p(x_i) × (−log p(x_i)) weights that surprise by how likely the outcome is. The formula sums these weighted surprise values across all possible outcomes to give the average surprise, or uncertainty, of the random variable X.
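As a quick illustration (a sketch of the idea, not something from the original post), the per-outcome surprise −log2 p can be computed directly:

```python
import math

def surprise_bits(p):
    """Information content of an outcome with probability p, in bits."""
    return -math.log2(p)

for p in (0.5, 0.25, 0.01):
    print(f"p = {p:<5} -> surprise = {surprise_bits(p):.2f} bits")
# p = 0.5   -> surprise = 1.00 bits
# p = 0.25  -> surprise = 2.00 bits
# p = 0.01  -> surprise = 6.64 bits
```

Entropy is exactly the probability-weighted average of these surprise values.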
- High Entropy: Means high uncertainty. The outcomes are more evenly spread out in probability (like a fair coin).
- Low Entropy: Means low uncertainty. One or a few outcomes are much more likely than others (like a biased coin that almost always lands heads). The minimum entropy is 0, which occurs when one outcome has a probability of 1 (no uncertainty at all).
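To make the contrast concrete, here is a small check reusing the shannon_entropy() sketch from above (the die probabilities are made up for illustration):

```python
# Evenly spread outcomes -> high entropy; concentrated outcomes -> low entropy.
fair_die   = [1/6] * 6
loaded_die = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]

print(shannon_entropy(fair_die))    # ~2.585 bits, the maximum for 6 outcomes (log2 6)
print(shannon_entropy(loaded_die))  # ~0.701 bits, much less uncertainty
```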
Example: Coin Toss
Let X be the outcome of a coin
toss. Possible outcomes are Heads (H) and Tails (T). We'll use log2 for units
in bits.
Case 1: Fair Coin
- Probabilities: p(H) = 0.5, p(T) = 0.5
- Calculation:
  H(X) = −[p(H) log2 p(H) + p(T) log2 p(T)]
  H(X) = −[0.5 log2(0.5) + 0.5 log2(0.5)]
  H(X) = −[0.5 × (−1) + 0.5 × (−1)]   (since log2(0.5) = log2(2^(−1)) = −1)
  H(X) = −[−0.5 − 0.5] = −[−1] = 1 bit
- Interpretation:
There is 1 bit of uncertainty associated with a fair coin toss. This is
the maximum possible entropy for a variable with two outcomes.
Case 2: Biased Coin (Always Lands Heads)
- Probabilities: p(H) = 1, p(T) = 0
- Calculation:
  We need to use the fact that lim(p→0) p log p = 0.
  H(X) = −[p(H) log2 p(H) + p(T) log2 p(T)]
  H(X) = −[1 × log2(1) + 0 × log2(0)]
  H(X) = −[1 × 0 + 0]   (since log2(1) = 0 and using the limit for the second term)
  H(X) = 0 bits
- Interpretation:
There are 0 bits of uncertainty. We know the outcome before the toss, so
there's no surprise.
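Note that log2(0) itself is undefined (it diverges to −∞), so code has to apply this limit explicitly. The `if p > 0` guard in the shannon_entropy() sketch above does exactly that:

```python
# The p(T) = 0 term is skipped rather than evaluated, which implements
# the convention that 0 * log(0) contributes nothing.
print(shannon_entropy([1.0, 0.0]))  # 0.0 bits: the outcome is certain
```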
Case 3: Slightly Biased Coin
- Probabilities: p(H) = 0.8, p(T) = 0.2
- Calculation:
  H(X) = −[p(H) log2 p(H) + p(T) log2 p(T)]
  H(X) = −[0.8 log2(0.8) + 0.2 log2(0.2)]
  H(X) ≈ −[0.8 × (−0.3219) + 0.2 × (−2.3219)]
  H(X) ≈ −[−0.2575 − 0.4644] = −[−0.7219]
  H(X) ≈ 0.7219 bits
- Interpretation:
There is less uncertainty than the fair coin (1 bit) but more than the
completely predictable coin (0 bits).
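All three coin cases can be checked numerically with the same illustrative helper from earlier:

```python
for name, probs in [("fair", [0.5, 0.5]),
                    ("always heads", [1.0, 0.0]),
                    ("slightly biased", [0.8, 0.2])]:
    print(f"{name:>15}: {shannon_entropy(probs):.4f} bits")
#            fair: 1.0000 bits
#    always heads: 0.0000 bits
# slightly biased: 0.7219 bits
```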
In essence, the formula provides
a way to quantify the average unpredictability of a system or a source of
information based on the probabilities of its different states or symbols.