Expectation
Suppose $X$ is any discrete random variable having possible values $x_1, x_2, \dots$. We would like to define the expectation of $X$ as
$$E[X] = \sum_i x_i P(X = x_i).$$
If the support of $X$ is finite, we are good. In the general discrete case, the definition is valid only if the sum is well defined (basically, the series must converge absolutely, to avoid Riemann rearrangement shenanigans). This leads to the following.
Definition 1.
Let $X$ be a discrete random variable having density $f$. If $\sum_x |x| f(x) < \infty$, we say that $X$ has finite expectation and we define its expectation by
$$E[X] = \sum_x x f(x).$$
On the other hand, if $\sum_x |x| f(x) = \infty$, then we say $X$ does not have finite expectation and $E[X]$ is undefined.
It should be evident that $X$ has finite expectation iff $|X|$ has finite expectation.
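For a density with finite support the sum has finitely many terms, so absolute convergence is automatic and the definition can be applied directly. A minimal sketch in Python (the `expectation` helper and the die example are illustrative, not from the text):

```python
from fractions import Fraction

def expectation(density):
    """E[X] for a discrete density given as {value: probability}.

    With finite support the sum has finitely many terms, so the
    absolute-convergence condition in Definition 1 holds automatically.
    """
    assert abs(sum(density.values()) - 1) < 1e-9
    return sum(x * p for x, p in density.items())

# A fair six-sided die: E[X] = (1 + 2 + ... + 6) / 6 = 7/2.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2
```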
Properties of expectation
Theorem 2 (Property 1 (LOTUS)).
Let $X = (X_1, \dots, X_n)$ be a discrete $n$-dimensional random vector having density $f$ and let $\varphi$ be a real-valued function on $\mathbb{R}^n$. Then, the random variable $Z = \varphi(X_1, \dots, X_n)$ has finite expectation iff
$$\sum_x |\varphi(x)|\,f(x) < \infty.$$
If $Z$ has finite expectation,
$$E[Z] = \sum_x \varphi(x) f(x).$$
Proof
Clearly,
$$\sum_x |\varphi(x)|\,f(x) = \sum_z |z| \sum_{x : \varphi(x) = z} f(x) = \sum_z |z|\,f_Z(z).$$
Thus, $Z$ has finite expectation iff $\sum_x |\varphi(x)| f(x) < \infty$. Similarly, it is clear that
$$\sum_x \varphi(x) f(x) = \sum_z z\,f_Z(z),$$
and the second part follows.
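As a sanity check, here is the one-dimensional case in Python: computing $E[g(X)]$ directly from the density of $X$ (LOTUS) agrees with first building the density of $Z = g(X)$ and then applying Definition 1. The die density and the function $g$ are hypothetical choices:

```python
from collections import defaultdict

die = {x: 1 / 6 for x in range(1, 7)}  # density of X
g = lambda x: (x - 3) ** 2             # an arbitrary real-valued function

# LOTUS: sum g(x) f(x) over the support of X.
lotus = sum(g(x) * p for x, p in die.items())

# The long way: first compute the density of Z = g(X), then E[Z].
fz = defaultdict(float)
for x, p in die.items():
    fz[g(x)] += p
direct = sum(z * p for z, p in fz.items())

print(lotus, direct)  # both 19/6 ≈ 3.1667
```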
Theorem 3 (Property 2).
Let $X$ and $Y$ be two random variables having finite expectation.
- If $c$ is a constant and $P(X = c) = 1$, then $E[X] = c$.
- If $c$ is a constant, then $cX$ has finite expectation and $E[cX] = cE[X]$.
- $X + Y$ has finite expectation and $E[X + Y] = E[X] + E[Y]$.
- Suppose $P(X \ge 0) = 1$. Then $E[X] \ge 0$; moreover, $E[X] = 0$ iff $P(X = 0) = 1$.
- $|E[X]| \le E[|X|]$.
Proof of 3
Let $Z = X + Y$. Let $\varphi$ be defined by $\varphi(x, y) = x + y$. Then, by Property 1, $Z$ has finite expectation if
$$\sum_{x,y} |x + y|\,f(x, y) < \infty,$$
where $f$ is the joint density of $X$ and $Y$.
Now,
$$\sum_{x,y} |x + y|\,f(x, y) \le \sum_{x,y} \big(|x| + |y|\big) f(x, y) = \sum_x |x| f_X(x) + \sum_y |y| f_Y(y) < \infty.$$
Hence, $Z$ has finite expectation. From Property 1 and the definition of expectation, we get
$$E[X + Y] = \sum_{x,y} (x + y) f(x, y) = \sum_x x f_X(x) + \sum_y y f_Y(y) = E[X] + E[Y].$$
Proof of 4
Let $f$ be the density of $X$. Since $P(X \ge 0) = 1$, $x \ge 0$ for all $x$ in the support of $f$. Thus, $E[X] = \sum_x x f(x) \ge 0$. If $E[X] = 0$, all $x$ in the support of $f$ are forced to be $0$, so we get $P(X = 0) = 1$.
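A quick numeric illustration of $E[X + Y] = E[X] + E[Y]$, using a hypothetical joint density; note that no independence is assumed anywhere in Property 2:

```python
# A hypothetical joint density of (X, Y); note X and Y are NOT independent here.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

E_sum = sum((x + y) * p for (x, y), p in joint.items())  # LOTUS with phi(x, y) = x + y
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
print(E_sum, E_X + E_Y)  # both 1.3 -- linearity needs no independence
```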
Theorem 4 (Property 3).
Let $X$ be a random variable such that for some constant $M$, $P(|X| \le M) = 1$. Then $X$ has finite expectation and $|E[X]| \le M$.
Theorem 5 (Property 4).
Let $X$ and $Y$ be two independent random variables having finite expectations. Then $XY$ has finite expectation and $E[XY] = E[X]E[Y]$.
Proof
Since $X$ and $Y$ are independent, the joint density of $X$ and $Y$ is $f(x, y) = f_X(x) f_Y(y)$. Thus
$$\sum_{x,y} |xy|\,f_X(x) f_Y(y) = \left(\sum_x |x| f_X(x)\right)\left(\sum_y |y| f_Y(y)\right) < \infty.$$
So, $XY$ has finite expectation. Using Property 1, we can conclude that
$$E[XY] = \sum_{x,y} xy\,f_X(x) f_Y(y) = \left(\sum_x x f_X(x)\right)\left(\sum_y y f_Y(y)\right) = E[X]E[Y].$$
Theorem 6 (Property 5).
Let $X$ be a nonnegative integer-valued random variable. Then $X$ has finite expectation iff the series $\sum_{n=1}^\infty P(X \ge n)$ converges. If the series does converge, then
$$E[X] = \sum_{n=1}^\infty P(X \ge n).$$
Proof
Follows from the fact that
$$\sum_{n=1}^\infty P(X \ge n) = \sum_{n=1}^\infty \sum_{m=n}^\infty P(X = m) = \sum_{m=1}^\infty \sum_{n=1}^m P(X = m) = \sum_{m=1}^\infty m\,P(X = m) = E[X],$$
where interchanging the order of summation is justified because all terms are nonnegative.
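A numeric sketch of Property 5 for a geometric random variable on $\{1, 2, \dots\}$ (truncating both series, since the tails are geometrically small):

```python
p = 0.3
# Geometric on {1, 2, ...}: P(X = k) = (1 - p)**(k - 1) * p, E[X] = 1/p,
# and P(X >= n) = (1 - p)**(n - 1). Truncate both series at 2000 terms.
direct = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
tail_sum = sum((1 - p) ** (n - 1) for n in range(1, 2000))
print(direct, tail_sum, 1 / p)  # all ≈ 3.3333
```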
Moments
Definition 7.
Let $X$ be a discrete random variable, and let $r$ be a positive integer. We say that $X$ has a moment of order $r$ if $X^r$ has finite expectation. In that case we define the $r$th moment of $X$ as $E[X^r]$. If $X$ has a moment of order $r$, then the $r$th moment of $X - \mu$, where $\mu = E[X]$, is called the $r$th central moment of $X$.
Properties of Moments
Theorem 8 (Property 1).
If $X$ has a moment of order $r$, then $X$ has a moment of order $s$ for all $1 \le s \le r$.
Proof
Let $f$ be the density of $X$ and let $1 \le s \le r$. For any $x$ in the support of $f$, it is always true that $|x|^s \le 1 + |x|^r$. Since $\sum_x (1 + |x|^r) f(x) = 1 + \sum_x |x|^r f(x) < \infty$, it follows from the comparison test that $\sum_x |x|^s f(x)$ converges. Thus, $X^s$ has finite expectation.
Example 9 (A random variable which does not have finite first moment).
Let $f$ be the function defined on the positive integers by
$$f(x) = \frac{6}{\pi^2 x^2}, \quad x = 1, 2, 3, \dots$$
Since $\sum_{x=1}^\infty x^{-2} = \pi^2/6$, $f$ clearly satisfies the properties of a discrete density function. However, a random variable $X$ with density $f$ does not have finite expectation because
$$\sum_x x f(x) = \frac{6}{\pi^2} \sum_{x=1}^\infty \frac{1}{x}$$
diverges.
Example 10 (A random variable which has a finite moment of order $r$ but no higher finite moment).
We know that the series $\sum_{x=1}^\infty x^{-(r+2)}$ converges for any integer $r \ge 1$. Let the series converge to $c$. Let $X$ be a random variable with density
$$f(x) = \frac{1}{c\,x^{r+2}}, \quad x = 1, 2, 3, \dots$$
Then,
$$E[X^r] = \frac{1}{c} \sum_{x=1}^\infty \frac{x^r}{x^{r+2}} = \frac{1}{c} \sum_{x=1}^\infty \frac{1}{x^2} < \infty.$$
But
$$E[X^{r+1}] = \frac{1}{c} \sum_{x=1}^\infty \frac{1}{x} = \infty,$$
so the moment of order $r+1$, and hence every higher moment, is not finite.
We know that the $r$th central moment of $X$ exists given $X$ has a moment of order $r$ because of the following theorem, applied to $X$ and the constant random variable $-E[X]$.
Theorem 11 (Property 2).
If the random variables $X$ and $Y$ have moments of order $r$, then $X + Y$ also has a moment of order $r$.
Variance
Definition 12.
If $X$ is a random variable having finite second moment, then the variance of $X$, denoted by $\operatorname{Var}(X)$, is defined to be the second central moment of $X$:
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big].$$
The nonnegative number $\sqrt{\operatorname{Var}(X)}$ is called the standard deviation of $X$, denoted $\sigma$ or $\sigma_X$.
By expanding the right-hand side, we get
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2.$$
Note that $\operatorname{Var}(X) = 0$ iff $X$ is a constant with probability $1$.
For a nonnegative integer-valued $X$, $E[X]$ and $\operatorname{Var}(X)$ can be computed using the PGF $G_X(t) = E[t^X]$ for $X$:
$$E[X] = G_X'(1), \qquad \operatorname{Var}(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2.$$
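A symbolic check of these PGF formulas, assuming the standard Poisson PGF $G_X(t) = e^{\lambda(t-1)}$, for which the mean and variance should both come out to $\lambda$:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
G = sp.exp(lam * (t - 1))  # PGF of a Poisson(lambda) random variable

mean = sp.diff(G, t).subs(t, 1)                       # E[X] = G'(1)
var = sp.diff(G, t, 2).subs(t, 1) + mean - mean ** 2  # G''(1) + G'(1) - G'(1)^2
print(sp.simplify(mean), sp.simplify(var))            # lambda, lambda
```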
Variance of a sum
Let $X$ and $Y$ be two random variables each having finite second moment. Then $X + Y$ has finite second moment and hence finite variance. Also,
$$\operatorname{Var}(X + Y) = E\big[(X - E[X] + Y - E[Y])^2\big] = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2E\big[(X - E[X])(Y - E[Y])\big].$$
The quantity $E[(X - E[X])(Y - E[Y])]$ is called the covariance of $X$ and $Y$, written $\operatorname{Cov}(X, Y)$. Thus,
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).$$
Further,
$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y].$$
From this, it is clear that $\operatorname{Cov}(X, Y) = 0$ when $X$ and $Y$ are independent (the converse is not true). So, if $X$ and $Y$ are independent, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.
More generally, if $X_1, \dots, X_n$ are random variables each having a finite second moment, then
$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j).$$
In particular, if $X_1, \dots, X_n$ are mutually independent, then
$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i).$$
Consequently, if $X_1, \dots, X_n$ are mutually independent random variables having a common variance $\sigma^2$ (for example, if they have the same density), then
$$\operatorname{Var}(X_1 + \cdots + X_n) = n\sigma^2.$$
Another useful fact is that $\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$ for any constants $a$ and $b$.
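The decomposition $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$ is easy to verify numerically on a small (hypothetical) joint density with dependent coordinates:

```python
# Hypothetical joint density; X and Y are positively correlated.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(phi):
    return sum(phi(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov = E(lambda x, y: (x - mx) * (y - my))
var_sum = E(lambda x, y: (x + y - mx - my) ** 2)
print(var_sum, var_x + var_y + 2 * cov)  # both 0.8
```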
The Schwarz inequality
Theorem 13.
Let $X$ and $Y$ have finite second moments. Then,
$$\big(E[XY]\big)^2 \le E[X^2]\,E[Y^2].$$
Equality holds iff either $P(Y = 0) = 1$ or $P(X = \lambda Y) = 1$ for some constant $\lambda$.
Proof
If $P(X = 0) = 1$ or $P(Y = 0) = 1$, equality holds trivially.
If $P(Y = 0) < 1$, then we claim $E[Y^2] > 0$.
Now, $E[Y^2] = \sum_y y^2 f_Y(y)$, where $y$ ranges over the support of $f_Y$, which is a subset of $\mathbb{R}$, so $E[Y^2] \ge 0$. Now, $y^2 > 0$ when $y \ne 0$, so it follows that $E[Y^2] > 0$ when $P(Y = 0) < 1$. Therefore, for any real $\lambda$, we may consider
$$h(\lambda) = E\big[(X - \lambda Y)^2\big] = E[X^2] - 2\lambda E[XY] + \lambda^2 E[Y^2].$$
Now, $E[(X - \lambda Y)^2] \ge 0$ for every $\lambda$. Thus, taking $\lambda = E[XY]/E[Y^2]$ (the minimizer of $h$),
$$0 \le h\!\left(\frac{E[XY]}{E[Y^2]}\right) = E[X^2] - \frac{\big(E[XY]\big)^2}{E[Y^2]},$$
which rearranges to $(E[XY])^2 \le E[X^2]\,E[Y^2]$.
Also, equality holds iff $E[(X - \lambda Y)^2] = 0$ for this choice of $\lambda$, which happens iff $P(X = \lambda Y) = 1$.
The correlation coefficient
Applying the Schwarz inequality to the random variables $X - E[X]$ and $Y - E[Y]$, we see that
$$\big(\operatorname{Cov}(X, Y)\big)^2 \le \operatorname{Var}(X)\operatorname{Var}(Y),$$
that is,
$$|\operatorname{Cov}(X, Y)| \le \sigma_X \sigma_Y.$$
For two random variables $X$ and $Y$ having finite non-zero variances, the quantity
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
is called the correlation coefficient of $X$ and $Y$. Clearly, $-1 \le \rho(X, Y) \le 1$.
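A simulation sketch: for $Y = X + W$ with $X, W$ independent die rolls, $\operatorname{Cov}(X, Y) = \operatorname{Var}(X)$, so $\rho(X, Y) = \sigma_X/\sigma_Y = 1/\sqrt{2}$. The sample estimate should land close to that and, as the theory guarantees, inside $[-1, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)      # die rolls
y = x + rng.integers(1, 7, size=100_000)  # dependent on x

cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())
print(rho)  # ≈ 1/sqrt(2) ≈ 0.707; always lands in [-1, 1]
```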
Chebyshev’s inequality
Let $X$ be a nonnegative random variable having finite expectation, and let $t$ be a positive real number. Define the random variable $Y$ by setting $Y = t$ if $X \ge t$ and $Y = 0$ if $X < t$. Then,
$$E[Y] = t\,P(X \ge t).$$
Now clearly, $Y \le X$ and hence $E[Y] \le E[X]$. Thus
$$t\,P(X \ge t) \le E[X],$$
or
$$P(X \ge t) \le \frac{E[X]}{t}.$$
If we apply the above inequality to the random variable $(X - \mu)^2$ and the number $t^2$, we get Chebyshev's inequality:
Theorem 14 (Chebyshev's inequality).
Let $X$ be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then for any real number $t > 0$,
$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}.$$
Chebyshev's inequality gives an upper bound in terms of $\sigma^2$ and $t$ for the probability that $X$ deviates from its mean by at least $t$ units.
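A simulation comparing Chebyshev's bound with the actual tail probability (here for a Poisson random variable with $\mu = \sigma^2 = 4$, a hypothetical choice); the bound holds but is typically far from tight:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=200_000)  # mean 4, variance 4
mu, sigma2, t = 4.0, 4.0, 4.0

empirical = np.mean(np.abs(x - mu) >= t)
bound = sigma2 / t ** 2
print(empirical, bound)  # ≈ 0.07 vs. 0.25: the bound holds but is not tight
```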
Laws of large numbers
Weak law of large numbers
Let $X_1, X_2, \dots, X_n$ be independent random variables having the same distribution. These random variables may be thought of as independent measurements of some quantity that is distributed according to their common distribution. Suppose that the common distribution of these random variables has finite mean $\mu$. Then for sufficiently large $n$ we would expect their arithmetic mean
$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
to be close to $\mu$. If the $X_i$ also have finite variance $\sigma^2$, then
$$E[\bar{X}_n] = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n},$$
and thus $\operatorname{Var}(\bar{X}_n) \to 0$ as $n \to \infty$; that is, as $n$ gets large, the distribution of $\bar{X}_n$ becomes more concentrated about its mean $\mu$. More precisely, by applying Chebyshev's inequality to $\bar{X}_n$ we obtain
$$P\big(|\bar{X}_n - \mu| \ge \delta\big) \le \frac{\sigma^2}{n\delta^2}.$$
It follows that for any $\delta > 0$,
$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| \ge \delta\big) = 0.$$
The number $\delta$ can be thought of as the desired accuracy in the approximation of $\mu$ by $\bar{X}_n$. The above equation assures us that no matter how small $\delta$ may be, the probability that $\bar{X}_n$ approximates $\mu$ to within this accuracy converges to $1$ as the number of observations gets large. This is called the Weak Law of Large Numbers.
Theorem 15 (Weak law of large numbers).
Let $X_1, X_2, \dots$ be independent random variables having a common distribution with finite expectation $\mu$, and let $S_n = X_1 + \cdots + X_n$. Then for any $\delta > 0$,
$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| \ge \delta\right) = 0.$$
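A simulation sketch of the weak law for die rolls ($\mu = 3.5$): the fraction of experiments in which $\bar{X}_n$ misses $\mu$ by at least $\delta$ shrinks toward $0$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, delta = 3.5, 0.1  # fair-die mean and desired accuracy

for n in [10, 100, 1_000, 10_000]:
    # 2000 independent experiments, each averaging n die rolls.
    means = rng.integers(1, 7, size=(2000, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - mu) >= delta))
```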
Strong law of large numbers
Definition 16.
We say that a sequence of random variables $Y_1, Y_2, \dots$ converges in probability to a random variable $Y$ if for all $\epsilon > 0$, $\lim_{n \to \infty} P(|Y_n - Y| \ge \epsilon) = 0$.
Denote $S_n/n$ by $\bar{X}_n$. Note that the weak law states that the sequence of random variables $\bar{X}_n$ converges in probability to the constant random variable $\mu$.
Definition 17.
We say that a sequence of random variables $Y_1, Y_2, \dots$ converges almost surely to a random variable $Y$ if $Y_n(\omega) \to Y(\omega)$ for almost every $\omega$; that is, $P\big(\{\omega : \lim_{n \to \infty} Y_n(\omega) = Y(\omega)\}\big) = 1$.
For discrete sample spaces, this reduces to requiring $\lim_{n \to \infty} Y_n(\omega) = Y(\omega)$ for all $\omega$ with $P(\{\omega\}) > 0$.