Parameter Estimation: Estimators#
Some Key Concepts of Sampling#
A population is the universe of possible data for a specified object. It can be people, places, objects, and many other things. It is not observable.
Example. People (or IP addresses) who have visited or will visit a website.
A sample is a subset of the population. It is observable.
Example. People who visited the website on a specific day.

What is a statistic?#
A statistic is anything (i.e., any function) that can be computed from the collected data sample. A statistic must be observable.
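For instance, any function of a collected sample is a statistic. A minimal sketch in Python (the sample values here are made up for illustration):

```python
import numpy as np

# A small, observable data sample (hypothetical values)
sample = np.array([68.1, 71.3, 69.5, 72.8, 70.2])

# Each of these is a statistic: a function computed from the sample
print(np.mean(sample))    # sample mean
print(np.median(sample))  # sample median
print(np.max(sample))     # sample maximum
```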
Parameter Estimation#
Example. We are studying the variance of height among male students at BU. Our sample of size 30 is shown below. We want to fit a normal distribution $N(\mu, \sigma^2)$ to this data.
What values should we choose for $\mu$ and $\sigma^2$?

Here, $\mu$ and $\sigma^2$ are the unknown population parameters that we need to estimate from the sample.
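As a sketch of what such a fit looks like in code (the heights below are simulated stand-ins, not the actual sample of 30):

```python
import numpy as np

# Simulated stand-ins for the actual sample of 30 heights (in inches)
rng = np.random.default_rng(0)
heights = rng.normal(loc=70, scale=3, size=30)

# Fit N(mu, sigma^2) by estimating mu and sigma^2 from the sample
mu_hat = np.mean(heights)
sig2_hat = np.var(heights)  # we revisit the bias of this estimator later
print(f'mu_hat = {mu_hat:.2f}, sig2_hat = {sig2_hat:.2f}')
```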
Parameter estimation is inference about an unknown population parameter (or set of population parameters) based on a sample statistic.
Parameter estimation is a commonly used statistical technique. For instance, traffic engineers estimate the rate $\lambda$ at which vehicles arrive at an intersection based on observed traffic counts.
In other words, we need parameter estimation when we are given some data and we want to treat it as an independent and identically distributed (or i.i.d.) sample from some population. The population has certain parameters, such as its mean $\mu$ or variance $\sigma^2$, and we use the sample to estimate them.
We will use $\theta$ to denote a generic unknown population parameter.
Point Estimation#
Example. The sample average of 71.3 inches is a point estimate of the average height of male students at BU.

A point estimator is a statistic that is used to estimate the unknown population parameter and whose realization is a single point.
Let’s say our data is a sample of $n$ observations, $x_1, x_2, \ldots, x_n$. A point estimator is then a function of the data: $\hat{\theta} = g(x_1, x_2, \ldots, x_n)$.
We’ll use the hat notation ($\hat{\theta}$) to distinguish an estimator from the true parameter $\theta$.
Note: The above definition does not require that $g$ return a value close to the true $\theta$; almost any function of the data qualifies as an estimator.
In the frequentist perspective on statistics, the true parameter value $\theta$ is fixed but unknown, while the point estimate $\hat{\theta}$ is a function of the data; since the data are drawn from a random process, $\hat{\theta}$ is itself a random variable.
This is a very important perspective to keep in mind; it is really the defining feature of the frequentist approach to statistics!
The sampling distribution of a statistic is the probability distribution of that statistic when we draw many samples.

The figure shows a population distribution (left) and the corresponding sampling distribution (right) of a statistic, which in this case is the mean. The sampling distribution is obtained by drawing samples of size $n$ from the population.
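We can approximate a sampling distribution by simulation: draw many samples from a known population and compute the statistic on each. A sketch, assuming a standard normal population and samples of size $n = 30$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, num_samples = 30, 10_000

# Draw num_samples samples of size n and record the mean of each
sample_means = rng.normal(loc=0, scale=1, size=(num_samples, n)).mean(axis=1)

# The sample means cluster much more tightly than the population itself
print('std of population:   1.0')
print(f'std of sample means: {sample_means.std():.3f}')  # about 1/sqrt(30) = 0.183
```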
Recall that a Bernoulli random variable has:

- only two outcomes: 0 and 1,
- one parameter $p$, which is the probability that the random variable is equal to 1,
- mean equal to $p$,
- variance equal to $p(1-p)$,
- PMF: $P(X = x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$.

Example. Consider a data sample $x_1, x_2, \ldots, x_n$ drawn i.i.d. from a Bernoulli distribution with unknown parameter $p$.
A common estimator for $p$ is the sample mean:

$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
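A quick sketch with a simulated Bernoulli sample (the true $p$ is known here only because we chose it):

```python
import numpy as np

rng = np.random.default_rng(2)
p_true, n = 0.3, 100

# n i.i.d. Bernoulli(p) observations: each is 0 or 1
x = rng.binomial(n=1, p=p_true, size=n)

# The sample mean is just the fraction of 1s in the sample
p_hat = x.mean()
print(f'p_hat = {p_hat:.3f} (true p = {p_true})')
```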
Bias and Variance#
We use two criteria to describe an estimator: bias and variance.
Bias measures the difference between the expected value of the estimator and the true value of the parameter, while variance measures how much the estimator can vary as a function of the data sample.

Bias. The bias of an estimator is defined as:

$$\mathrm{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,$$

where the expectation is over the data (seen as realizations of a random variable) and $\theta$ is the true underlying value of the parameter.
An estimator is said to be unbiased if $\mathrm{bias}(\hat{\theta}) = 0$, that is, if $\mathbb{E}[\hat{\theta}] = \theta$.
Continuing our example for the mean of the Bernoulli:

$$\mathbb{E}[\hat{p}] = \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} x_i\right] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[x_i] = \frac{1}{n} \cdot np = p.$$

So we have proven that this estimator of $p$ is unbiased: $\mathrm{bias}(\hat{p}) = \mathbb{E}[\hat{p}] - p = 0$.
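We can also check unbiasedness empirically: averaging $\hat{p}$ over many independent simulated samples approximates $\mathbb{E}[\hat{p}]$. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
p_true, n, num_trials = 0.3, 50, 100_000

# Compute p_hat on each of num_trials independent samples of size n
p_hats = rng.binomial(n=1, p=p_true, size=(num_trials, n)).mean(axis=1)

# The average of p_hat across trials approximates E[p_hat]
print(f'average p_hat: {p_hats.mean():.4f} (true p = {p_true})')
```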
Variance. The variance of an estimator is simply the variance $\mathrm{Var}(\hat{\theta})$ of the estimator viewed as a random variable:

$$\mathrm{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big],$$

where the randomness comes from the data sample.
Remember that the data set is random; it is assumed to be an i.i.d. sample of some distribution.
Recall:

I. If $X$ and $Y$ are independent random variables, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

II. If $a$ is a constant, then $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.
Let us return again to our example where the sample is drawn from the Bernoulli distribution and the estimator is the sample mean $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
So we see that the variance of the mean is:

$$\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(x_i) = \frac{1}{n^2} \cdot n\,p(1-p) = \frac{p(1-p)}{n}.$$

This has a desirable property: as the number of data points $n$ grows, the variance of the estimator shrinks in proportion to $1/n$.
Note: It can be shown that for a sample drawn i.i.d. from any population with finite variance $\sigma^2$, the sample mean has variance $\sigma^2 / n$.
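A sketch verifying the $1/n$ scaling by simulation:

```python
import numpy as np

rng = np.random.default_rng(4)
p_true, num_trials = 0.3, 10_000

for n in [10, 100, 1000]:
    # Empirical variance of p_hat across many samples of size n
    p_hats = rng.binomial(n=1, p=p_true, size=(num_trials, n)).mean(axis=1)
    print(f'n = {n:4d}: Var(p_hat) = {p_hats.var():.6f}, '
          f'theory p(1-p)/n = {p_true * (1 - p_true) / n:.6f}')
```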
What if we use a constant estimator, say $\hat{p} = c$ for some fixed value $c$, ignoring the data entirely?
The variance of this estimator is equal to $0$, since it does not depend on the data at all.
However, the bias in this case is equal to $\mathbb{E}[\hat{p}] - p = c - p$, which is nonzero unless $c$ happens to equal $p$.
The estimator $\hat{p} = c$ shows that low variance by itself does not make a good estimator; we need to weigh bias and variance together.
Mean Squared Error#
Consider two estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$, of the same parameter $\theta$.
Let us say that these estimators show the following distributions:

The figure shows that one estimator has low bias but high variance, while the other has low variance but is biased.
Which is better?
The answer of course depends, but there is a single criterion we can use to try to balance these two kinds of errors: the Mean Squared Error (MSE),

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big].$$

This measures the “average squared distance” between the estimator and the true value.
It is a good single number for evaluating an estimator, because it turns out that:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta}).$$
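The decomposition follows by adding and subtracting $\mathbb{E}[\hat{\theta}]$ inside the square:

$$
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
&= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\big] \\
&= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]
 + 2\,\big(\mathbb{E}[\hat{\theta}] - \theta\big)\,\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big]
 + \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2 \\
&= \mathrm{Var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^2,
\end{aligned}
$$

where the cross term vanishes because $\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = 0$.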
For example, the two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ above can be compared directly by their MSE, which accounts for both kinds of error at once.
We have already shown that for a sample from the Bernoulli distribution the sample mean $\hat{p}$ is unbiased with variance $p(1-p)/n$, so $\mathrm{MSE}(\hat{p}) = p(1-p)/n$.
Python example#
Consider the following i.i.d. data sample of size 35.
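A hypothetical stand-in for the sample, so that the cells below are runnable (a sketch, assuming 35 rolls of a fair six-sided die; the actual sample, and hence the exact numbers printed below, came from the original data):

```python
import numpy as np

# Hypothetical stand-in sample: 35 i.i.d. rolls of a fair six-sided die
rng = np.random.default_rng(6)
s = rng.integers(low=1, high=7, size=35)
print(s)
```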

Is the sample from a discrete uniform distribution? Do the values range from 1 to 6?
Discrete uniform distribution on $\{a, a+1, \ldots, b\}$:

- parameters: $a$ and $b$, the smallest and largest possible values,
- mean: $\frac{a + b}{2}$,
- variance: $\frac{(b - a + 1)^2 - 1}{12}$.

If the population distribution is discrete uniform on $\{1, \ldots, 6\}$, then its mean is $\frac{1 + 6}{2} = 3.5$ and its variance is $\frac{6^2 - 1}{12} = \frac{35}{12} \approx 2.917$.
We can use the provided sample to estimate the population mean and variance.
Note: We are not fitting the discrete uniform distribution itself here; we are estimating the population mean and variance, which are defined for any population distribution.
We discussed earlier that regardless of the population distribution, the sample mean is an unbiased estimator of the population mean and its variance is inversely proportional to the size of the sample.
Thus we will use the sample average to estimate the population mean:

$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
# compute the sample mean
mu_hat = np.mean(s)
print(f'The estimate of the population mean is equal to {mu_hat:.4}.')
The estimate of the population mean is equal to 3.343.
What about the population variance? Can it be estimated by the uncorrected variance of the sample, $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$?
# compute the variance of the sample (without Bessel's correction)
sig2_hat_1 = np.var(s, axis = 0, ddof = 0)
print(f'The biased sample variance is equal to {sig2_hat_1:0.4}.')
The biased sample variance is equal to 2.797.
We will see later in this lecture that this estimator systematically underestimates the population variance. Bessel’s correction, dividing by $n - 1$ instead of $n$, is required to obtain an unbiased estimator:

$$\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

The above expression is known as the sample variance.
# compute the sample variance (with Bessel's correction)
sig2_hat_2 = np.var(s, axis = 0, ddof = 1)
print(f'The unbiased sample variance is equal to {sig2_hat_2:0.4}.')
The unbiased sample variance is equal to 2.879.
Conclusion. Based on the provided sample, the population mean can be estimated by 3.343 and the population variance can be estimated by 2.879. These values are close to 3.5 and 2.917, respectively. Therefore, the sample is consistent with a discrete uniform distribution on $\{1, \ldots, 6\}$.
Effect of Bessel’s correction#

Note: The difference between the biased and unbiased estimators shrinks as the sample size grows, since the correction factor $\frac{n}{n-1}$ approaches 1 as $n \to \infty$.
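A simulation sketch of this effect, assuming a fair-die population with true variance $35/12 \approx 2.917$: averaging each estimator over many small samples approximates its expectation and reveals the systematic underestimate without Bessel’s correction.

```python
import numpy as np

rng = np.random.default_rng(5)
n, num_trials = 10, 100_000
true_var = 35 / 12  # variance of the discrete uniform on {1, ..., 6}

# num_trials independent samples of n die rolls each
samples = rng.integers(low=1, high=7, size=(num_trials, n))

# Average each variance estimator across trials to approximate its expectation
biased_avg = np.var(samples, axis=1, ddof=0).mean()
unbiased_avg = np.var(samples, axis=1, ddof=1).mean()

print(f'true variance:           {true_var:.4f}')
print(f'biased estimator mean:   {biased_avg:.4f}')    # about (n-1)/n * true_var
print(f'unbiased estimator mean: {unbiased_avg:.4f}')  # about true_var
```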