Limit Theorems
Today we will discuss limit theorems that describe the limiting behavior of the sum of independent random variables as the number of summands becomes large. Limit theorems are among the most important theoretical results in probability theory. They are extremely useful because many commonly computed statistical quantities, such as averages, can be represented as sums.
Markov’s and Chebyshev’s Inequalities#
We mentioned earlier that the variance or standard deviation of a random variable gives an indication as to how spread out its possible values are. Chebyshev’s inequality lends a quantitative aspect to this indication.
To prove this inequality, we first state another important result known as Markov’s inequality.
Markov’s Inequality#
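Theorem. Let $X$ be a random variable that takes only nonnegative values. Then, for any $a > 0$,
$$P(X \geq a) \leq \frac{E[X]}{a}.$$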
Proof. The proof is outside the scope of this class. An interested reader can consult, for instance, A First Course in Probability by Sheldon Ross.
The importance of Markov’s inequality is that it enables us to derive bounds on probabilities when only the mean of the probability distribution is known. Of course, if the actual distribution were known, the desired probabilities could be computed exactly and we would not need the inequality.
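To see how loose or tight such a bound can be, the following minimal sketch compares the Markov bound $E[X]/a$ on $P(X \geq a)$ with the exact tail probability of an exponential random variable. The exponential distribution, the mean of 1, and the function names are assumptions chosen only for illustration; they do not come from the lecture.

```python
import math


def markov_bound(mean, a):
    """Markov upper bound on P(X >= a) for a nonnegative X with the given mean."""
    if a <= 0:
        raise ValueError("a must be positive")
    # A probability can never exceed 1, so cap the bound there.
    return min(1.0, mean / a)


def exponential_tail(mean, a):
    """Exact P(X >= a) when X is exponential with the given mean (illustrative choice)."""
    return math.exp(-a / mean)


mean = 1.0  # assumed mean, used only for this illustration
for a in (1, 2, 5, 10):
    bound = markov_bound(mean, a)
    exact = exponential_tail(mean, a)
    print(f"a = {a:>2}: Markov bound = {bound:.4f}, exact tail = {exact:.6f}")
```

The bound uses only the mean, so it holds for every nonnegative distribution with that mean; the price is that it can be far above the true probability for any particular one.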
Chebyshev’s Inequality#
Similarly to Markov’s inequality, Chebyshev’s inequality allows us to derive bounds on probabilities. However, it requires both the mean and the variance of the probability distribution to be known.
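Theorem. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. Then, for any $t > 0$,
$$P(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2}.$$
Proof. Let $Y = (X - \mu)^2$. Then $Y$ is a nonnegative random variable with $E[Y] = \sigma^2$, so Markov’s inequality with $a = t^2$ gives
$$P\big((X - \mu)^2 \geq t^2\big) \leq \frac{E[(X - \mu)^2]}{t^2} = \frac{\sigma^2}{t^2}.$$
Since $(X - \mu)^2 \geq t^2$ if and only if $|X - \mu| \geq t$, the result follows.
The theorem says that if the variance is very small, there is a high probability that $X$ will not deviate much from $\mu$.
For another interpretation, we can set $t = k\sigma$, so that the inequality becomes
$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$
For example, the probability that $X$ is at least $4\sigma$ away from $\mu$ is at most $1/16$.
As Chebyshev’s inequality is valid for all distributions of the random variable $X$, we cannot expect the bound to be very close to the actual probability in most cases.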
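As a rough illustration of how conservative the bound can be, the sketch below compares the Chebyshev bound $1/k^2$ on $P(|X - \mu| \geq k\sigma)$ with the exact two-sided tail of a normal distribution. The choice of the normal distribution and the function names are assumptions made for this example only.

```python
import math


def chebyshev_bound(k):
    """Chebyshev upper bound on P(|X - mu| >= k * sigma), valid for any distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)


def normal_two_sided_tail(k):
    """Exact P(|X - mu| >= k * sigma) when X is normally distributed."""
    # For a standard normal Z, P(|Z| >= k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))


for k in (1, 2, 3, 4):
    print(f"k = {k}: Chebyshev bound = {chebyshev_bound(k):.4f}, "
          f"normal tail = {normal_two_sided_tail(k):.6f}")
```

For a normal random variable the exact tail is much smaller than $1/k^2$ once $k$ exceeds 1, which is consistent with the remark that a bound valid for all distributions cannot be tight for most of them.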
The Weak Law of Large Numbers#
It is commonly believed that if a fair coin is tossed many times and the proportion of heads is calculated, that proportion will be close to $1/2$.
Mathematician John Kerrich tested this belief empirically while detained as a prisoner during World War II. He tossed a coin 10,000 times and observed heads 5067 times.
The law of large numbers is a mathematical formulation of this belief.
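The tosses of the coin are modeled as independent random trials. The random variable $X_i$ takes the value 1 if the $i$-th toss results in heads and 0 if it results in tails. The proportion of heads in $n$ tosses is
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
The law of large numbers states that $\bar{X}_n$ approaches $1/2$ in a sense made precise by the following theorem.
Theorem. Let $X_1, X_2, \ldots$ be a sequence of independent random variables with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, and let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then, for any $\varepsilon > 0$,
$$P(|\bar{X}_n - \mu| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$
Proof. We first find $E[\bar{X}_n]$ and $\mathrm{Var}(\bar{X}_n)$:
$$E[\bar{X}_n] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu.$$
Since the $X_i$ are independent,
$$\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}.$$
From Chebyshev’s inequality it follows that
$$P(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$
The above theorem is known as the weak law of large numbers, because it asserts only that $\bar{X}_n$ converges to $\mu$ in probability, that is, $P(|\bar{X}_n - \mu| > \varepsilon) \to 0$ for every $\varepsilon > 0$.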
There is another mode of convergence, called strong convergence, which asserts more than convergence in probability.
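Strong convergence implies that, with probability 1, beyond some point in the sequence the difference between the sample mean $\bar{X}_n$ and the mean $\mu$ is always less than any fixed $\varepsilon > 0$, although the point at which this happens is random.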
The above result can also be proven and is known as the strong law of large numbers. This version of the law of large numbers is outside the scope of this course.
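The coin-tossing picture behind the law of large numbers can be sketched with a small simulation. The code below tosses a simulated fair coin and reports the proportion of heads alongside the Chebyshev bound $\sigma^2 / (n \varepsilon^2)$ on $P(|\bar{X}_n - 1/2| > \varepsilon)$, using $\sigma^2 = 1/4$ for a fair coin. The seed, the tolerance $\varepsilon = 0.05$, and the function names are assumptions for illustration; this is not a reconstruction of Kerrich's experiment.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Simulate n_tosses fair coin tosses and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


def chebyshev_wlln_bound(n, eps, variance=0.25):
    """Chebyshev bound on P(|sample mean - mu| > eps) for n independent trials."""
    return min(1.0, variance / (n * eps**2))


rng = random.Random(0)  # fixed seed so that the run is reproducible
eps = 0.05              # assumed tolerance, chosen for illustration
for n in (10, 100, 1_000, 10_000, 100_000):
    p_hat = proportion_of_heads(n, rng)
    print(f"n = {n:>6}: proportion of heads = {p_hat:.4f}, "
          f"Chebyshev bound = {chebyshev_wlln_bound(n, eps):.4f}")
```

The bound shrinks like $1/n$, which is exactly the mechanism used in the proof of the weak law.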
The Central Limit Theorem#
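In applications, we often want to find $P(a < X < b)$ when we do not know the distribution of $X$ precisely. It is sometimes possible to do this by approximating the distribution of $X$.
The approximation is often arrived at via the central limit theorem (CLT). The CLT is the most famous limit theorem in probability theory and is the main topic of today’s lecture.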
The CLT is concerned with a limiting property of sums of random variables.
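If $X_1, X_2, \ldots$ is a sequence of independent random variables with mean $\mu$ and variance $\sigma^2$, and if
$$S_n = \sum_{i=1}^{n} X_i,$$
we already know from the weak law of large numbers that $S_n / n$ converges to $\mu$ in probability.
The CLT is concerned not with the fact that the ratio $S_n / n$ converges to $\mu$, but with how it fluctuates around $\mu$.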
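Theorem. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. Then the distribution of
$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}}$$
tends to the standard normal distribution as $n \to \infty$.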
Proof. The proof is outside the scope of this class. An interested reader can consult, for instance, A First Course in Probability by Sheldon Ross.
There are many central limit theorems of various degrees of abstraction and generality. The above version is one of the simplest versions. Loosely put, the above version tells us the following:
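If $X_1, X_2, \ldots, X_n$ are independent, identically distributed random variables with mean $\mu$ and standard deviation $\sigma$, and if $n$ is large, then $X_1 + X_2 + \cdots + X_n$ is approximately normally distributed with mean $n\mu$ and standard deviation $\sigma \sqrt{n}$.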
Another relatively simple version of the CLT concerns averages of random variables. This version states that
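If $X_1, X_2, \ldots, X_n$ are independent, identically distributed random variables with mean $\mu$ and standard deviation $\sigma$, and if $n$ is large, then $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.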
For practical purposes, the limiting results in themselves are not of primary interest to us. We are more interested in their use as approximations for finite values of $n$.
It is impossible to give a concise and definitive statement of how good the approximation is for what value of $n$.
How fast the approximation becomes good depends on the distribution of the summands, the $X_i$.
If the distribution is fairly symmetric and has tails that die off rapidly, the approximation becomes good for relatively small values of $n$.
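The dependence on the shape of the summands can be explored with a small simulation. The sketch below estimates the probability that a sum of $n = 5$ summands falls at or below its mean, which the normal approximation puts at 0.5. It does so for a symmetric distribution (uniform on $[0, 1]$) and for a skewed one (exponential with mean 1). Both distributions, the value of $n$, the seed, and the function name are assumptions chosen for illustration.

```python
import random


def fraction_below_mean(draw, mean, n, reps, rng):
    """Estimate P(X_1 + ... + X_n <= n * mean) by simulating reps sums."""
    hits = 0
    for _ in range(reps):
        total = sum(draw(rng) for _ in range(n))
        if total <= n * mean:
            hits += 1
    return hits / reps


rng = random.Random(0)  # fixed seed so that the run is reproducible
n, reps = 5, 20_000

# Symmetric summands: uniform on [0, 1], mean 0.5.
uniform_frac = fraction_below_mean(lambda r: r.random(), 0.5, n, reps, rng)
# Skewed summands: exponential with rate 1, mean 1.
expo_frac = fraction_below_mean(lambda r: r.expovariate(1.0), 1.0, n, reps, rng)

print(f"uniform summands:     {uniform_frac:.3f} (normal approximation: 0.5)")
print(f"exponential summands: {expo_frac:.3f} (normal approximation: 0.5)")
```

For the symmetric summands the estimate should sit close to 0.5 even at $n = 5$, while for the skewed summands it is noticeably larger, so a larger $n$ is needed before the normal approximation is comparably accurate.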
Example. The CLT tells us that a random variable defined as the sum of a large number of Bernoulli random variables should be approximately normally distributed.
Let us look at the distributions obtained for $n = 10$, $n = 40$, and $n = 100$.

We see that for all three values of $n$, the distribution of the sum is approximately bell-shaped.
When only 10 random variables are used to compute the sum, the distribution is centered around 5. When the sum of 40 random variables is computed, the distribution of the sum is centered around 20. With 100 random variables, it is centered around 50.
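These results are in agreement with the CLT that predicts the normal distribution of the sum with mean $n\mu$ and standard deviation $\sigma \sqrt{n}$, where $\mu$ and $\sigma$ are the mean and standard deviation of a single summand.
Furthermore, the distribution with $n = 100$ is the most spread out, as expected, since the standard deviation $\sigma \sqrt{n}$ grows with $n$.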
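The comparison in this example can also be made numerically. A sum of $n$ independent Bernoulli($p$) variables has a binomial distribution, so the sketch below computes the exact binomial probabilities and compares them with the normal density with mean $np$ and standard deviation $\sqrt{np(1-p)}$. The value $p = 0.5$ is an assumption inferred from the centers 5, 20, and 50 mentioned above, and the function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability that a sum of n Bernoulli(p) variables equals k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def max_gap(n, p):
    """Largest absolute difference between the binomial pmf and the normal density."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, sd))
               for k in range(n + 1))


p = 0.5  # assumed success probability, inferred from the centers 5, 20, 50
for n in (10, 40, 100):
    sd = math.sqrt(n * p * (1 - p))
    print(f"n = {n:>3}: mean = {n * p:.0f}, sd = {sd:.2f}, "
          f"max gap = {max_gap(n, p):.5f}")
```

The largest gap between the exact probabilities and the normal density shrinks as $n$ grows, while the standard deviation of the sum increases.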
Example. Let
a. Use the CLT to approximate the distribution of
b. Calculate an approximation to
Solution.
a. To apply the CLT, we need to find
For a continuous uniform distribution on $[a, b]$, the mean is $\frac{a + b}{2}$ and the variance is $\frac{(b - a)^2}{12}$.
Substituting
The CLT tells us that
b. To find
Denoting the standard normal distribution by
Example. An instructor has 50 exams that will be graded in sequence. The times required to grade the 50 exams are independent, with a common distribution that has mean 20 minutes and standard deviation 4 minutes. Approximate the probability that the instructor will grade at least 25 of the exams in the first 450 minutes of work.
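Solution. If we let $X_i$ be the time it takes to grade exam $i$, then
$$X = \sum_{i=1}^{25} X_i$$
is the time it takes to grade the first 25 exams.
Because the instructor will grade at least 25 exams in the first 450 minutes of work if the time it takes to grade the first 25 exams is less than or equal to 450, we see that the desired probability is $P(X \leq 450)$.
To approximate this probability, we use the central limit theorem. Now,
$$E[X] = \sum_{i=1}^{25} E[X_i] = 25 \cdot 20 = 500$$
and
$$\mathrm{Var}(X) = \sum_{i=1}^{25} \mathrm{Var}(X_i) = 25 \cdot 16 = 400.$$
Consequently, with $Z$ denoting a standard normal random variable,
$$P(X \leq 450) = P\left(\frac{X - 500}{\sqrt{400}} \leq \frac{450 - 500}{\sqrt{400}}\right) \approx P(Z \leq -2.5) = 1 - \Phi(2.5) \approx 0.006.$$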
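The arithmetic in this example can be checked with a few lines of code. The sketch below approximates $P(X \leq 450)$ for the total time $X$ to grade 25 exams, each with mean 20 minutes and standard deviation 4 minutes, by standardizing and evaluating the standard normal distribution function through `math.erf`. The function names are hypothetical helpers written for this illustration.

```python
import math


def standard_normal_cdf(z):
    """Standard normal distribution function, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_sum_cdf(x, n, mean, sd):
    """CLT approximation to P(X_1 + ... + X_n <= x) for i.i.d. summands."""
    z = (x - n * mean) / (sd * math.sqrt(n))
    return standard_normal_cdf(z)


# 25 exams, each with mean 20 minutes and standard deviation 4 minutes.
prob = clt_sum_cdf(450, n=25, mean=20, sd=4)
print(f"P(X <= 450) is approximately {prob:.4f}")  # about 0.0062
```

Here the standardized value is $(450 - 500)/20 = -2.5$, so the instructor is very unlikely to finish 25 exams within 450 minutes.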