Expectation
Suppose $X$ is any discrete random variable having possible values $x_1, x_2, \dots$. We would like to define the expectation of $X$ as
$$E[X] = \sum_i x_i P(X = x_i).$$
If the support of $X$ is finite, we are good. In the general discrete case, the definition is valid only if the sum is well defined (basically, the series must converge absolutely, to avoid Riemann rearrangement shenanigans). This leads to the following.
Definition 1.
Let $X$ be a discrete random variable having density $f$. If $\sum_x |x| f(x) < \infty$, we say that $X$ has finite expectation and we define its expectation by
$$E[X] = \sum_x x f(x).$$
On the other hand, if $\sum_x |x| f(x) = \infty$, then we say $X$ does not have finite expectation and $E[X]$ is undefined.
It should be evident that $X$ has finite expectation iff $|X|$ has finite expectation.
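For a density with finite support the sum has finitely many terms, so absolute convergence is automatic and the definition can be applied directly. A minimal sketch in Python (the `expectation` helper and the die example are illustrative, not from the text):

```python
from fractions import Fraction

def expectation(density):
    """E[X] for a discrete density given as {value: probability}.

    With finite support the sum has finitely many terms, so the
    absolute-convergence condition in Definition 1 holds automatically.
    """
    assert abs(sum(density.values()) - 1) < 1e-9
    return sum(x * p for x, p in density.items())

# A fair six-sided die: E[X] = (1 + 2 + ... + 6) / 6 = 7/2.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2
```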
Properties of expectation
Theorem 2 (Property 1 (LOTUS)).
Let $X = (X_1, \dots, X_n)$ be a discrete $n$-dimensional random vector having density $f$ and let $\varphi$ be a real-valued function on $\mathbb{R}^n$. Then, the random variable $Z = \varphi(X_1, \dots, X_n)$ has finite expectation iff
$$\sum_x |\varphi(x)|\,f(x) < \infty.$$
If $Z$ has finite expectation,
$$E[Z] = \sum_x \varphi(x) f(x).$$
Proof
Clearly,
$$\sum_x |\varphi(x)|\,f(x) = \sum_z |z| \sum_{x : \varphi(x) = z} f(x) = \sum_z |z|\,f_Z(z).$$
Thus, $Z$ has finite expectation iff $\sum_x |\varphi(x)| f(x) < \infty$. Similarly, it is clear that
$$\sum_x \varphi(x) f(x) = \sum_z z\,f_Z(z),$$
and the second part follows.
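As a sanity check, here is the one-dimensional case in Python: computing $E[g(X)]$ directly from the density of $X$ (LOTUS) agrees with first building the density of $Z = g(X)$ and then applying Definition 1. The die density and the function $g$ are hypothetical choices:

```python
from collections import defaultdict

die = {x: 1 / 6 for x in range(1, 7)}  # density of X
g = lambda x: (x - 3) ** 2             # an arbitrary real-valued function

# LOTUS: sum g(x) f(x) over the support of X.
lotus = sum(g(x) * p for x, p in die.items())

# The long way: first compute the density of Z = g(X), then E[Z].
fz = defaultdict(float)
for x, p in die.items():
    fz[g(x)] += p
direct = sum(z * p for z, p in fz.items())

print(lotus, direct)  # both 19/6 ≈ 3.1667
```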
Theorem 3 (Property 2).
Let $X$ and $Y$ be two random variables having finite expectation.
- If $c$ is a constant and $P(X = c) = 1$, then $E[X] = c$.
- If $c$ is a constant, then $cX$ has finite expectation and $E[cX] = cE[X]$.
- $X + Y$ has finite expectation and $E[X + Y] = E[X] + E[Y]$.
- Suppose $P(X \ge 0) = 1$. Then $E[X] \ge 0$; moreover, $E[X] = 0$ iff $P(X = 0) = 1$.
- $|E[X]| \le E[|X|]$.
Proof of 3
Let $Z = X + Y$. Let $\varphi$ be defined by $\varphi(x, y) = x + y$. Then, by Property 1, $Z$ has finite expectation if
$$\sum_{x,y} |x + y|\,f(x, y) < \infty,$$
where $f$ is the joint density of $X$ and $Y$.
Now,
$$\sum_{x,y} |x + y|\,f(x, y) \le \sum_{x,y} \big(|x| + |y|\big) f(x, y) = \sum_x |x| f_X(x) + \sum_y |y| f_Y(y) < \infty.$$
Hence, $Z$ has finite expectation. From Property 1 and the definition of expectation, we get
$$E[X + Y] = \sum_{x,y} (x + y) f(x, y) = \sum_x x f_X(x) + \sum_y y f_Y(y) = E[X] + E[Y].$$
Proof of 4
Let $f$ be the density of $X$. Since $P(X \ge 0) = 1$, $x \ge 0$ for all $x$ in the support of $f$. Thus, $E[X] = \sum_x x f(x) \ge 0$. If $E[X] = 0$, all $x$ in the support of $f$ are forced to be $0$, so we get $P(X = 0) = 1$.
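A quick numeric illustration of $E[X + Y] = E[X] + E[Y]$, using a hypothetical joint density; note that no independence is assumed anywhere in Property 2:

```python
# A hypothetical joint density of (X, Y); note X and Y are NOT independent here.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

E_sum = sum((x + y) * p for (x, y), p in joint.items())  # LOTUS with phi(x, y) = x + y
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
print(E_sum, E_X + E_Y)  # both 1.3 -- linearity needs no independence
```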
Theorem 4 (Property 3).
Let $X$ be a random variable such that for some constant $M$, $P(|X| \le M) = 1$. Then $X$ has finite expectation and $|E[X]| \le M$.
Theorem 5 (Property 4).
Let $X$ and $Y$ be two independent random variables having finite expectations. Then $XY$ has finite expectation and $E[XY] = E[X]E[Y]$.
Proof
Since $X$ and $Y$ are independent, the joint density of $X$ and $Y$ is $f(x, y) = f_X(x) f_Y(y)$. Thus
$$\sum_{x,y} |xy|\,f_X(x) f_Y(y) = \left(\sum_x |x| f_X(x)\right)\left(\sum_y |y| f_Y(y)\right) < \infty.$$
So, $XY$ has finite expectation. Using Property 1, we can conclude that
$$E[XY] = \sum_{x,y} xy\,f_X(x) f_Y(y) = \left(\sum_x x f_X(x)\right)\left(\sum_y y f_Y(y)\right) = E[X]E[Y].$$
Theorem 6 (Property 5).
Let $X$ be a nonnegative integer-valued random variable. Then $X$ has finite expectation iff the series $\sum_{n=1}^\infty P(X \ge n)$ converges. If the series does converge, then
$$E[X] = \sum_{n=1}^\infty P(X \ge n).$$
Proof
Follows from the fact that
$$\sum_{n=1}^\infty P(X \ge n) = \sum_{n=1}^\infty \sum_{m=n}^\infty P(X = m) = \sum_{m=1}^\infty \sum_{n=1}^m P(X = m) = \sum_{m=1}^\infty m\,P(X = m) = E[X],$$
where interchanging the order of summation is justified because all terms are nonnegative.
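A numeric sketch of Property 5 for a geometric random variable on $\{1, 2, \dots\}$ (truncating both series, since the tails are geometrically small):

```python
p = 0.3
# Geometric on {1, 2, ...}: P(X = k) = (1 - p)**(k - 1) * p, E[X] = 1/p,
# and P(X >= n) = (1 - p)**(n - 1). Truncate both series at 2000 terms.
direct = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
tail_sum = sum((1 - p) ** (n - 1) for n in range(1, 2000))
print(direct, tail_sum, 1 / p)  # all ≈ 3.3333
```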
Moments
Definition 7.
Let $X$ be a discrete random variable, and let $r$ be a positive integer. We say that $X$ has a moment of order $r$ if $X^r$ has finite expectation. In that case we define the $r$th moment of $X$ as $E[X^r]$. If $X$ has a moment of order $r$, then the $r$th moment of $X - \mu$, where $\mu = E[X]$, is called the $r$th central moment of $X$.
Properties of Moments
Theorem 8 (Property 1).
If $X$ has a moment of order $r$, then $X$ has a moment of order $s$ for all $1 \le s \le r$.
Proof
Let $f$ be the density of $X$ and let $1 \le s \le r$. For any $x$ in the support of $f$, it is always true that $|x|^s \le 1 + |x|^r$. Since $\sum_x (1 + |x|^r) f(x) = 1 + \sum_x |x|^r f(x) < \infty$, it follows from the comparison test that $\sum_x |x|^s f(x)$ converges. Thus, $X^s$ has finite expectation.
Example 9 (A random variable which does not have finite first moment).
Let $f$ be the function defined on the positive integers by
$$f(x) = \frac{6}{\pi^2 x^2}, \quad x = 1, 2, 3, \dots$$
Since $\sum_{x=1}^\infty x^{-2} = \pi^2/6$, $f$ clearly satisfies the properties of a discrete density function. However, a random variable $X$ with density $f$ does not have finite expectation because
$$\sum_x x f(x) = \frac{6}{\pi^2} \sum_{x=1}^\infty \frac{1}{x}$$
diverges.
Example 10 (A random variable which has a finite moment of order $r$ but no higher finite moment).
We know that the series $\sum_{x=1}^\infty x^{-(r+2)}$ converges for any integer $r \ge 1$. Let the series converge to $c$. Let $X$ be a random variable with density
$$f(x) = \frac{1}{c\,x^{r+2}}, \quad x = 1, 2, 3, \dots$$
Then,
$$E[X^r] = \frac{1}{c} \sum_{x=1}^\infty \frac{x^r}{x^{r+2}} = \frac{1}{c} \sum_{x=1}^\infty \frac{1}{x^2} < \infty.$$
But
$$E[X^{r+1}] = \frac{1}{c} \sum_{x=1}^\infty \frac{1}{x} = \infty,$$
so the moment of order $r+1$, and hence every higher moment, is not finite.
We know that the $r$th central moment of $X$ exists given $X$ has a moment of order $r$ because of the following theorem, applied to $X$ and the constant random variable $-E[X]$.
Theorem 11 (Property 2).
If the random variables $X$ and $Y$ have moments of order $r$, then $X + Y$ also has a moment of order $r$.
Variance
Definition 12.
If $X$ is a random variable having finite second moment, then the variance of $X$, denoted by $\operatorname{Var}(X)$, is defined to be the second central moment of $X$:
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big].$$
The nonnegative number $\sqrt{\operatorname{Var}(X)}$ is called the standard deviation of $X$, denoted $\sigma$ or $\sigma_X$.
By expanding the right-hand side, we get
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2.$$
Note that $\operatorname{Var}(X) = 0$ iff $X$ is a constant with probability $1$.
For a nonnegative integer-valued $X$, $E[X]$ and $\operatorname{Var}(X)$ can be computed using the PGF $G_X(t) = E[t^X]$ for $X$:
$$E[X] = G_X'(1), \qquad \operatorname{Var}(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2.$$
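A symbolic check of these PGF formulas, assuming the standard Poisson PGF $G_X(t) = e^{\lambda(t-1)}$, for which the mean and variance should both come out to $\lambda$:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
G = sp.exp(lam * (t - 1))  # PGF of a Poisson(lambda) random variable

mean = sp.diff(G, t).subs(t, 1)                       # E[X] = G'(1)
var = sp.diff(G, t, 2).subs(t, 1) + mean - mean ** 2  # G''(1) + G'(1) - G'(1)^2
print(sp.simplify(mean), sp.simplify(var))            # lambda, lambda
```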
Variance of a sum
Let $X$ and $Y$ be two random variables each having finite second moment. Then $X + Y$ has finite second moment and hence finite variance. Also,
$$\operatorname{Var}(X + Y) = E\big[(X - E[X] + Y - E[Y])^2\big] = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2E\big[(X - E[X])(Y - E[Y])\big].$$
The quantity $E[(X - E[X])(Y - E[Y])]$ is called the covariance of $X$ and $Y$, written $\operatorname{Cov}(X, Y)$. Thus,
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).$$
Further,
$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y].$$
From this, it is clear that $\operatorname{Cov}(X, Y) = 0$ when $X$ and $Y$ are independent (the converse is not true). So, if $X$ and $Y$ are independent, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.
More generally, if $X_1, \dots, X_n$ are random variables each having a finite second moment, then
$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j).$$
In particular, if $X_1, \dots, X_n$ are mutually independent, then
$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i).$$
Consequently, if $X_1, \dots, X_n$ are mutually independent random variables having a common variance $\sigma^2$ (for example, if they have the same density), then
$$\operatorname{Var}(X_1 + \cdots + X_n) = n\sigma^2.$$
Another useful fact is that $\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$ for any constants $a$ and $b$.
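The decomposition $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$ is easy to verify numerically on a small (hypothetical) joint density with dependent coordinates:

```python
# Hypothetical joint density; X and Y are positively correlated.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(phi):
    return sum(phi(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov = E(lambda x, y: (x - mx) * (y - my))
var_sum = E(lambda x, y: (x + y - mx - my) ** 2)
print(var_sum, var_x + var_y + 2 * cov)  # both 0.8
```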
The Schwarz inequality
Theorem 13.
Let $X$ and $Y$ have finite second moments. Then,
$$\big(E[XY]\big)^2 \le E[X^2]\,E[Y^2].$$
Equality holds iff either $P(Y = 0) = 1$ or $P(X = \lambda Y) = 1$ for some constant $\lambda$.
Proof
If $P(X = 0) = 1$ or $P(Y = 0) = 1$, equality holds trivially.
If $P(Y = 0) < 1$, then we claim $E[Y^2] > 0$.
Now, $E[Y^2] = \sum_y y^2 f_Y(y)$, where $y$ ranges over the support of $f_Y$, which is a subset of $\mathbb{R}$, so $E[Y^2] \ge 0$. Now, $y^2 > 0$ when $y \ne 0$, so it follows that $E[Y^2] > 0$ when $P(Y = 0) < 1$. Therefore, for any real $\lambda$, we may consider
$$h(\lambda) = E\big[(X - \lambda Y)^2\big] = E[X^2] - 2\lambda E[XY] + \lambda^2 E[Y^2].$$
Now, $E[(X - \lambda Y)^2] \ge 0$ for every $\lambda$. Thus, taking $\lambda = E[XY]/E[Y^2]$ (the minimizer of $h$),
$$0 \le h\!\left(\frac{E[XY]}{E[Y^2]}\right) = E[X^2] - \frac{\big(E[XY]\big)^2}{E[Y^2]},$$
which rearranges to $(E[XY])^2 \le E[X^2]\,E[Y^2]$.
Also, equality holds iff $E[(X - \lambda Y)^2] = 0$ for this choice of $\lambda$, which happens iff $P(X = \lambda Y) = 1$.
The correlation coefficient
Applying the Schwarz inequality to the random variables $X - E[X]$ and $Y - E[Y]$, we see that
$$\big(\operatorname{Cov}(X, Y)\big)^2 \le \operatorname{Var}(X)\operatorname{Var}(Y),$$
that is,
$$|\operatorname{Cov}(X, Y)| \le \sigma_X \sigma_Y.$$
For two random variables $X$ and $Y$ having finite non-zero variances, the quantity
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
is called the correlation coefficient of $X$ and $Y$. Clearly, $-1 \le \rho(X, Y) \le 1$.
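A simulation sketch: for $Y = X + W$ with $X, W$ independent die rolls, $\operatorname{Cov}(X, Y) = \operatorname{Var}(X)$, so $\rho(X, Y) = \sigma_X/\sigma_Y = 1/\sqrt{2}$. The sample estimate should land close to that and, as the theory guarantees, inside $[-1, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)      # die rolls
y = x + rng.integers(1, 7, size=100_000)  # dependent on x

cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())
print(rho)  # ≈ 1/sqrt(2) ≈ 0.707; always lands in [-1, 1]
```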
Chebyshev’s inequality
Let $X$ be a nonnegative random variable having finite expectation, and let $t$ be a positive real number. Define the random variable $Y$ by setting $Y = t$ if $X \ge t$ and $Y = 0$ if $X < t$. Then,
$$E[Y] = t\,P(X \ge t).$$
Now clearly, $Y \le X$ and hence $E[Y] \le E[X]$. Thus
$$t\,P(X \ge t) \le E[X],$$
or
$$P(X \ge t) \le \frac{E[X]}{t}.$$
If we apply the above inequality to the random variable $(X - \mu)^2$ and the number $t^2$, we get Chebyshev's inequality:
Theorem 14 (Chebyshev's inequality).
Let $X$ be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then for any real number $t > 0$,
$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}.$$
Chebyshev's inequality gives an upper bound in terms of $\sigma^2$ and $t$ for the probability that $X$ deviates from its mean by at least $t$ units.
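A simulation comparing Chebyshev's bound with the actual tail probability (here for a Poisson random variable with $\mu = \sigma^2 = 4$, a hypothetical choice); the bound holds but is typically far from tight:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=200_000)  # mean 4, variance 4
mu, sigma2, t = 4.0, 4.0, 4.0

empirical = np.mean(np.abs(x - mu) >= t)
bound = sigma2 / t ** 2
print(empirical, bound)  # ≈ 0.07 vs. 0.25: the bound holds but is not tight
```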
Laws of large numbers
Weak law of large numbers
Let $X_1, X_2, \dots, X_n$ be independent random variables having the same distribution. These random variables may be thought of as independent measurements of some quantity that is distributed according to their common distribution. Suppose that the common distribution of these random variables has finite mean $\mu$. Then for sufficiently large $n$ we would expect their arithmetic mean
$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
to be close to $\mu$. If the $X_i$ also have finite variance $\sigma^2$, then
$$E[\bar{X}_n] = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n},$$
and thus $\operatorname{Var}(\bar{X}_n) \to 0$ as $n \to \infty$; that is, as $n$ gets large, the distribution of $\bar{X}_n$ becomes more concentrated about its mean $\mu$. More precisely, by applying Chebyshev's inequality to $\bar{X}_n$ we obtain
$$P\big(|\bar{X}_n - \mu| \ge \delta\big) \le \frac{\sigma^2}{n\delta^2}.$$
It follows that for any $\delta > 0$,
$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| \ge \delta\big) = 0.$$
The number $\delta$ can be thought of as the desired accuracy in the approximation of $\mu$ by $\bar{X}_n$. The above equation assures us that no matter how small $\delta$ may be, the probability that $\bar{X}_n$ approximates $\mu$ to within this accuracy converges to $1$ as the number of observations gets large. This is called the Weak Law of Large Numbers.
Theorem 15 (Weak law of large numbers).
Let $X_1, X_2, \dots$ be independent random variables having a common distribution with finite expectation $\mu$, and let $S_n = X_1 + \cdots + X_n$. Then for any $\delta > 0$,
$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| \ge \delta\right) = 0.$$
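A simulation sketch of the weak law for die rolls ($\mu = 3.5$): the fraction of experiments in which $\bar{X}_n$ misses $\mu$ by at least $\delta$ shrinks toward $0$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, delta = 3.5, 0.1  # fair-die mean and desired accuracy

for n in [10, 100, 1_000, 10_000]:
    # 2000 independent experiments, each averaging n die rolls.
    means = rng.integers(1, 7, size=(2000, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - mu) >= delta))
```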
Strong law of large numbers
Definition 16.
We say that a sequence of random variables $Y_1, Y_2, \dots$ converges in probability to a random variable $Y$ if for all $\epsilon > 0$, $\lim_{n \to \infty} P(|Y_n - Y| \ge \epsilon) = 0$.
Denote $S_n/n$ by $\bar{X}_n$. Note that the weak law states that the sequence of random variables $\bar{X}_n$ converges in probability to the constant random variable $\mu$.
Definition 17.
We say that a sequence of random variables $Y_1, Y_2, \dots$ converges almost surely to a random variable $Y$ if $Y_n(\omega) \to Y(\omega)$ for almost every $\omega$; that is, $P\big(\{\omega : \lim_{n \to \infty} Y_n(\omega) = Y(\omega)\}\big) = 1$.
For discrete sample spaces, this reduces to requiring $\lim_{n \to \infty} Y_n(\omega) = Y(\omega)$ for all $\omega$ with $P(\{\omega\}) > 0$.