Functions of Random Variables#
Today we will talk about the properties of functions of random variables. The lecture will focus on discrete random variables. However, all the results remain valid for continuous random variables.
LOTUS#
Imagine we want to study some function \(g(\cdot)\) of a discrete random variable \(X\).
\(g(X)\) could be any expression, e.g., \(X^2\) or \(e^X\) or \(\log X\) or whatever.
What is the expectation of \(g(X)\), i.e., \(E[g(X)]\)?
If we think of \(g(X)\) as a new random variable, then it has expectation:
\[E[g(X)] = \sum_x g(x)\, p_X(x).\]
So, for example, we can say that
\[E[X^2] = \sum_x x^2\, p_X(x)\]
or
\[E[e^X] = \sum_x e^x\, p_X(x)\]
and so forth.
So what does LOTUS even stand for?
It stands for the “Law of the Unconscious Statistician,” so named because the rule is so often used without even thinking about it.
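To see LOTUS in action, here is a minimal Python sketch (the pmf below is made up for illustration) that computes \(E[X^2]\) by summing \(g(x)\,p_X(x)\) and compares it to a simulated average:

```python
import numpy as np

# Made-up pmf for a small discrete random variable X (for illustration only)
x_values = np.array([0, 1, 2, 3])
pmf      = np.array([0.1, 0.4, 0.3, 0.2])   # probabilities sum to 1

# LOTUS: E[g(X)] = sum over x of g(x) * p_X(x), here with g(x) = x^2
lotus_value = np.sum(x_values**2 * pmf)

# Compare to a Monte Carlo estimate: sample X, apply g, then average
rng = np.random.default_rng(0)
samples = rng.choice(x_values, size=200_000, p=pmf)
mc_value = np.mean(samples**2)

print(f"LOTUS:       {lotus_value:.3f}")   # exactly 3.4
print(f"Monte Carlo: {mc_value:.3f}")      # close to 3.4
```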
Now let’s use LOTUS to build up some more facts.
What is \(E[aX]\) where \(a\) is a number?
Since \(aX\) is a function of \(X\), we use LOTUS:
\[E[aX] = \sum_x a x\, p_X(x) = a \sum_x x\, p_X(x) = a\,E[X].\]
And what is \(E[X+b]\) where \(b\) is a number? Again by LOTUS:
\[E[X+b] = \sum_x (x+b)\, p_X(x) = \sum_x x\, p_X(x) + b \sum_x p_X(x) = E[X] + b.\]
By combining the above observations we can conclude that the expected value of \(aX+b\), \(E[aX+b]\), is equal to \(aE[X] + b.\)
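As a quick sanity check, the following sketch (constants chosen arbitrarily) verifies \(E[aX+b] = aE[X] + b\) for the same kind of made-up pmf:

```python
import numpy as np

# Made-up pmf, as in the earlier sketch
x_values = np.array([0, 1, 2, 3])
pmf      = np.array([0.1, 0.4, 0.3, 0.2])
a, b = 2.0, 5.0                                  # arbitrary constants

mean_x = np.sum(x_values * pmf)                  # E[X] = 1.6

lhs = np.sum((a * x_values + b) * pmf)           # LOTUS applied to g(x) = a*x + b
rhs = a * mean_x + b                             # the rule E[aX + b] = a*E[X] + b

print(lhs, rhs)                                  # both equal 8.2
```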
Sums of Random Variables#
With LOTUS, we can now think about sums of random variables.
Linearity of Expectation#
Let \(X\) and \(Y\) be discrete random variables. Then, no matter what their joint distribution is,
\[E[X + Y] = E[X] + E[Y].\]
Proof.
Since \(E[X + Y]\) involves two random variables, we evaluate the expectation using LOTUS, with \(g(x, y) = x + y\). Suppose that the joint distribution of \(X\) and \(Y\) is \(p(x, y)\). Then:
\[
E[X+Y] = \sum_x \sum_y (x+y)\, p(x,y)
= \sum_x \sum_y x\, p(x,y) + \sum_x \sum_y y\, p(x,y)
= \sum_x x\, p_X(x) + \sum_y y\, p_Y(y)
= E[X] + E[Y],
\]
where \(p_X\) and \(p_Y\) are the marginal distributions of \(X\) and \(Y\).
In other words, linearity of expectation says that you only need to know the marginal distributions of \(X\) and \(Y\) to calculate \(E[X + Y]\).
In particular, it does not matter whether \(X\) and \(Y\) are independent.
Even if \(X\) and \(Y\) are correlated, their expectations still add.
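A small simulation (the dependence below is contrived for illustration) makes this point: even when \(X\) and \(Y\) are strongly correlated, \(E[X+Y]\) still equals \(E[X] + E[Y]\):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500_000

# Deliberately dependent pair: Y is built directly from X
x = rng.integers(0, 10, size=N)                 # X uniform on {0, ..., 9}
y = x + rng.integers(0, 3, size=N)              # Y reuses X, so they are correlated

print(np.corrcoef(x, y)[0, 1])                  # strong positive correlation
print(np.mean(x + y))                           # E[X + Y]
print(np.mean(x) + np.mean(y))                  # E[X] + E[Y] -- the same value
```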
An important corollary is this: suppose we have \(n\) random variables \(X_1, X_2, \dots, X_n\), all with the same distribution.
Then
\[E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = n\,E[X_1].\]
That is, if you have \(n\) random variables, and each has mean \(\mu\), then the mean of the sum is \(n\mu\).
Example 1#
Use the linearity of expectation to calculate the expected value of the Binomial distribution.
Steps to Solution
Note that a Binomial is the sum of Bernoulli trials.
Determine the expected value of a Bernoulli trial.
Find expected value of Binomial by linearity of expectation of sum of Bernoulli trials.
Solution
The expected value of a Bernoulli trial is \(E[X_i] = 1\cdot p + 0\cdot(1-p) = p\). A Binomial random variable is the sum of \(n\) such trials, so by linearity of expectation its expected value is \(np\). This agrees with the result of directly calculating the expected value of the Binomial distribution.
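Here is a quick Python check of the sum-of-Bernoullis view, with arbitrary \(n\) and \(p\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 0.3                                   # arbitrary Binomial parameters

# Simulate many Binomial draws as row sums of n Bernoulli(p) trials
bernoulli_trials = rng.random((200_000, n)) < p
binomial_draws = bernoulli_trials.sum(axis=1)

print(binomial_draws.mean())                     # close to n * p
print(n * p)                                     # 3.0
```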
Example 2#
Suppose two people are playing Roulette. Both players bet on red for the same first three spins. (Note that in Roulette, 18 of the 38 numbers are red.) The first player then leaves, but the second player bets on red for two more spins. How many more times is player 2 expected to win than player 1?
Crucially, note that the number of times player 2 wins is not independent from the number of times player 1 wins, because every time player 1 wins, player 2 also wins.
Solution
Let \(X\) be the number of times player 1 wins and \(Y\) be the number of times player 2 wins. We want to calculate \(E[Y-X]\).
What is \(X\)?
\(X\) is Binomial with \(n=3\) and \(p=18/38\)
Similarly, \(Y\) is Binomial with \(n=5\) and \(p=18/38\)
How do we calculate \(E[Y-X]\) ?
\(E[Y-X] = E[Y] + E[(-1)\cdot X] = E[Y] + (-1)E[X] = E[Y] - E[X]\)
We just showed that the expected value of a Binomial is \(np\), so putting it all together:
\[E[Y-X] = E[Y] - E[X] = 5\cdot\tfrac{18}{38} - 3\cdot\tfrac{18}{38} = 2\cdot\tfrac{18}{38} = \tfrac{18}{19} \approx 0.95.\]
So player 2 is expected to win about 0.95 more times than player 1.
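A minimal simulation of the set-up (the first three spins are shared by both players, matching the dependence noted above) gives the same answer:

```python
import numpy as np

rng = np.random.default_rng(3)
p_red = 18 / 38
n_sims = 500_000

# Five spins per simulated game; player 1 bets on the first 3, player 2 on all 5
spins = rng.random((n_sims, 5)) < p_red
player1_wins = spins[:, :3].sum(axis=1)
player2_wins = spins.sum(axis=1)

print(np.mean(player2_wins - player1_wins))      # close to 2 * 18/38 ≈ 0.947
```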
Variance and Covariance#
Using linearity of expectation we can prove some useful equations for calculating variance and covariance.
Consider \(\operatorname{Cov}(X, Y)\) where \(X\) and \(Y\) may have any joint distribution. Recall that:
\[\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].\]
Expanding the product and applying linearity of expectation gives
\[\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] = E[XY] - E[X]E[Y],\]
which is a valuable simplification when computing covariance.
It only requires computing the means of \(X\) and \(Y\), and \(E[XY]\).
From this fact we can also conclude that, since \(\operatorname{Var}(X) = \operatorname{Cov}(X, X)\),
\[\operatorname{Var}(X) = E[X^2] - \big(E[X]\big)^2.\]
This is one of the most useful results of the linearity of expectation.
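Both shortcuts are easy to check numerically; here is a short Python sketch with made-up dependent samples:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500_000
x = rng.integers(0, 6, size=N).astype(float)     # made-up discrete X
y = x + rng.integers(0, 4, size=N)               # Y depends on X

# Covariance shortcut: Cov(X, Y) = E[XY] - E[X]E[Y]
print(np.mean(x * y) - np.mean(x) * np.mean(y))
print(np.cov(x, y, bias=True)[0, 1])             # population covariance -- matches

# Variance shortcut: Var(X) = E[X^2] - (E[X])^2
print(np.mean(x**2) - np.mean(x)**2)
print(np.var(x))                                 # matches
```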
Variance of a Sum#
Let’s keep using these facts to explore how the variance of a sum works.
Consider \(X\) and \(Y\) which may have any joint distribution.
What is \(\operatorname{Var}(X + Y)\)?
\[
\operatorname{Var}(X+Y) = E\big[(X+Y)^2\big] - \big(E[X+Y]\big)^2
= E[X^2] + 2E[XY] + E[Y^2] - \big(E[X]\big)^2 - 2E[X]E[Y] - \big(E[Y]\big)^2
= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).
\]
So we see that when adding random variables, there is a correction to the variance: if the variables are positively correlated, then the variance of their sum is greater than the sum of their variances.
The amount of this correction is twice the covariance.
There’s another important way to look at this result:
When adding independent random variables, variances sum, because independence implies \(\operatorname{Cov}(X, Y) = 0\).
So, consider the case where we are summing \(n\) independent random variables, each with mean \(\mu\) and variance \(\sigma^2\).
Then the sum has mean \(n\mu\) and variance \(n\sigma^2\).
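The following sketch (variables made up for illustration) checks the covariance correction, and shows that it disappears for independent variables:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500_000

# Positively correlated pair: Var(X + Y) exceeds Var(X) + Var(Y)
x = rng.integers(0, 10, size=N).astype(float)
y = x + rng.integers(0, 5, size=N)
cov_xy = np.cov(x, y, bias=True)[0, 1]
print(np.var(x + y))
print(np.var(x) + np.var(y) + 2 * cov_xy)        # matches

# Independent pair: variances simply add (no correction term)
u = rng.integers(0, 10, size=N)
v = rng.integers(0, 10, size=N)
print(np.var(u + v))
print(np.var(u) + np.var(v))                     # approximately matches
```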
Variance of \(aX+b\)#
Finally, if \(\operatorname{Var}(X)\) exists, then \(\operatorname{Var}(aX+b) = a^2\operatorname{Var}(X)\) for constants \(a\) and \(b\). The proof of this property is outside the scope of this course.
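Even without a proof, the property is easy to check numerically; here is a minimal sketch with arbitrary constants \(a\) and \(b\):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.integers(0, 10, size=500_000).astype(float)   # arbitrary discrete X
a, b = 3.0, 7.0                                        # arbitrary constants

print(np.var(a * x + b))        # Var(aX + b)
print(a**2 * np.var(x))         # a^2 * Var(X) -- matches; b has no effect
```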