H(X) = −∑ᵢ p(xᵢ) log p(xᵢ)

What it is:

This formula calculates the Shannon entropy, often just called entropy, of a discrete random variable X. In simple terms, entropy measures the average amount of uncertainty or surprise associated with the possible outcomes of that random variable.

Breaking down the components:

  1. H(X): This represents the entropy of the random variable X. It's the value we are calculating.
  2. X: This is a random variable, which means it can take on different possible values or outcomes. Think of tossing a coin (X can be Heads or Tails) or rolling a die (X can be 1, 2, 3, 4, 5, or 6).
  3. xᵢ: This represents a specific possible outcome or value that the random variable X can take. For a coin toss, x₁ could be Heads and x₂ could be Tails.
  4. p(xᵢ): This is the probability that the random variable X takes on the specific value xᵢ. For a fair coin, p(Heads)=0.5 and p(Tails)=0.5.
  5. log p(xᵢ): This is the logarithm of the probability p(xᵢ).
    • The base of the logarithm determines the units of entropy.
      • Base 2 (log₂): Units are bits (most common in information theory).
      • Base e (ln): Units are nats.
      • Base 10 (log₁₀): Units are hartleys or dits.
    • Since probabilities p(xᵢ) are between 0 and 1, their logarithms (for bases > 1) will be negative or zero.
  6. ∑ᵢ: This is the summation symbol. It means we calculate the term p(xᵢ) log p(xᵢ) for every possible outcome xᵢ of the random variable X, and then add all those terms together.
  7. −: The negative sign at the beginning ensures that the final entropy value H(X) is non-negative: each log p(xᵢ) term is non-positive, so the weighted sum ∑ᵢ p(xᵢ) log p(xᵢ) is non-positive, and the leading minus sign flips it.
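
As a quick numeric check of the point about logarithm bases, here is a minimal Python sketch (standard library only; p = 0.5 is just the fair-coin probability used as an example) that prints the same surprise value −log p in the three unit systems:

```python
import math

p = 0.5  # probability of one outcome of a fair coin toss

# The surprise -log p expressed in the three common units:
print(-math.log2(p))   # 1.0     -> bits     (base 2)
print(-math.log(p))    # ~0.6931 -> nats     (base e)
print(-math.log10(p))  # ~0.3010 -> hartleys (base 10)
```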

Putting it together:

The term −p(xᵢ) log p(xᵢ) quantifies the "surprise", or information content, associated with outcome xᵢ, weighted by how likely that outcome is. Less likely events (small p(xᵢ)) carry more surprise: their information content −log p(xᵢ) is large. The formula sums these weighted surprise values across all possible outcomes to give the average surprise, or uncertainty, of the random variable X.
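
To make the summation concrete, here is a minimal Python sketch of the formula; the function name `shannon_entropy` and the `base` parameter are illustrative choices rather than anything prescribed by the text above:

```python
import math

def shannon_entropy(probs, base=2):
    """Average surprise H(X) = -sum_i p(x_i) * log p(x_i).

    `probs` is a sequence of probabilities that should sum to 1.
    Zero-probability terms are skipped, using the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)
```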

  • High Entropy: Means high uncertainty. The outcomes are more evenly spread out in probability (like a fair coin).
  • Low Entropy: Means low uncertainty. One or a few outcomes are much more likely than others (like a biased coin that almost always lands heads). The minimum entropy is 0, which occurs when one outcome has a probability of 1 (no uncertainty at all).
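
Using the same kind of sketch (the one-line definition is re-stated so the snippet runs on its own, and the example probabilities are chosen arbitrarily), a uniform distribution over four outcomes reaches the maximum of 2 bits, while a heavily skewed one scores far lower:

```python
import math

def shannon_entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits   -- high entropy: all outcomes equally likely
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits -- low entropy: one outcome dominates
```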

Example: Coin Toss

Let X be the outcome of a coin toss. Possible outcomes are Heads (H) and Tails (T). We'll use log₂, so entropy is measured in bits.

Case 1: Fair Coin

  • Probabilities: p(H)=0.5, p(T)=0.5
  • Calculation:
    H(X) = −[p(H) log₂ p(H) + p(T) log₂ p(T)]
         = −[0.5 log₂(0.5) + 0.5 log₂(0.5)]
         = −[0.5 × (−1) + 0.5 × (−1)]    (since log₂(0.5) = log₂(2⁻¹) = −1)
         = −[−0.5 − 0.5]
         = −[−1]
         = 1 bit
  • Interpretation: There is 1 bit of uncertainty associated with a fair coin toss. This is the maximum possible entropy for a variable with two outcomes.

Case 2: Biased Coin (Always Lands Heads)

  • Probabilities: p(H)=1, p(T)=0
  • Calculation: We use the convention that 0 × log 0 = 0, justified by the fact that p log p → 0 as p → 0.
    H(X) = −[p(H) log₂ p(H) + p(T) log₂ p(T)]
         = −[1 × log₂(1) + 0 × log₂(0)]
         = −[1 × 0 + 0]    (since log₂(1) = 0, and the second term is 0 by the limit convention)
         = 0 bits
  • Interpretation: There are 0 bits of uncertainty. We know the outcome before the toss, so there is no surprise.

Case 3: Slightly Biased Coin

  • Probabilities: p(H)=0.8, p(T)=0.2
  • Calculation:
    H(X) = −[p(H) log₂ p(H) + p(T) log₂ p(T)]
         = −[0.8 log₂(0.8) + 0.2 log₂(0.2)]
         ≈ −[0.8 × (−0.3219) + 0.2 × (−2.3219)]
         ≈ −[−0.2575 − 0.4644]
         ≈ −[−0.7219]
         ≈ 0.7219 bits
  • Interpretation: There is less uncertainty than the fair coin (1 bit) but more than the completely predictable coin (0 bits).
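
The three coin cases can be reproduced with the same kind of sketch (the fair coin gives exactly 1 bit, the certain coin 0 bits, and the 0.8/0.2 coin about 0.7219 bits):

```python
import math

def shannon_entropy(probs, base=2):
    # Zero-probability terms are skipped: by convention, 0 * log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0     -- Case 1: fair coin
print(shannon_entropy([1.0, 0.0]))  # 0.0     -- Case 2: always lands heads
print(shannon_entropy([0.8, 0.2]))  # ~0.7219 -- Case 3: slightly biased coin
```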

In essence, the formula provides a way to quantify the average unpredictability of a system or a source of information based on the probabilities of its different states or symbols.
