Model Fitting#
Today we will talk about model fitting, likelihood, and log-likelihood functions. This lecture provides a foundation for our next lecture, where we will discuss maximum likelihood estimation.
Model Fitting#
The notion of parameter estimation is closely related to the concept of model fitting. We have actually been doing this quite a bit already, but now we want to treat the notion more directly.
Imagine that you know that data is drawn from a particular kind of distribution, but you don’t know the value(s) of the distribution’s parameter(s).
The following table shows a number of common distributions and their corresponding parameters:
| Distribution | Parameters |
|---|---|
| Bernoulli | $p$ |
| Binomial | $n$, $p$ |
| Poisson | $\lambda$ |
| Geometric | $p$ |
| Exponential | $\lambda$ |
| Uniform | $a$, $b$ |
| Normal | $\mu$, $\sigma$ |
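As a side note (not part of the original table), these same parameterizations can be seen in code. The sketch below assumes the scipy library is available and uses its `scipy.stats` conventions:

```python
# A sketch of how these parameterizations look in scipy.stats
# (assuming scipy is available; parameter names follow scipy's conventions).
from scipy import stats

dists = {
    "Bernoulli":   stats.bernoulli(p=0.3),
    "Binomial":    stats.binom(n=10, p=0.3),
    "Poisson":     stats.poisson(mu=2.0),               # mu is the rate parameter (lambda)
    "Geometric":   stats.geom(p=0.3),
    "Exponential": stats.expon(scale=1/2.0),            # scale = 1 / lambda
    "Uniform":     stats.uniform(loc=0.0, scale=5.0),   # uniform on [loc, loc + scale]
    "Normal":      stats.norm(loc=0.0, scale=1.0),      # loc = mu, scale = sigma
}

# once its parameters are fixed, each frozen distribution can generate samples
for name, dist in dists.items():
    print(name, dist.rvs(size=3, random_state=0))
```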
We formalize the problem as follows. We say that data is drawn from a distribution

$$p(x \mid \theta).$$

The way to read this is: the probability of observing the value $x$, given that the distribution has parameter(s) $\theta$.

We call $\theta$ the parameter(s) of the distribution; $\theta$ may be a single number or a vector (for the normal distribution, for example, $\theta = (\mu, \sigma)$).
The graph below illustrates the probability density functions of several normal distributions (from the same parametric family).
Model fitting is finding the parameters $\theta$ under which the observed data is most probable.

Notice that in this context, it is the parameters $\theta$ that vary, while the data $x$ is held fixed.

When we think of $p(x \mid \theta)$ as a function of $\theta$ with $x$ fixed, we call it the likelihood function and write it as $L(\theta \mid x)$.

This change in terminology is just to emphasize that we are thinking about varying $\theta$, not $x$.
For example, consider the dataset below:

Can you imagine that this dataset might be drawn from a normal distribution?
In that case, the parameters are $\theta = (\mu, \sigma)$.

Then model fitting would consist of finding the values of $\mu$ and $\sigma$ under which the observed data is most probable.
In other words, the likelihood function gives the probability of observing the given data as a function of the parameter(s). Therefore, the parameters can be estimated by maximizing the likelihood function. We will do that in the next lecture. In this lecture we focus on computing the likelihood function.
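To make this concrete, here is a minimal sketch that evaluates the likelihood of a small Bernoulli sample at several values of the parameter $p$. The coin-flip data below is made up for illustration and is not from the lecture:

```python
import numpy as np

# hypothetical coin-flip data (1 = heads, 0 = tails); made up for illustration
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# Bernoulli likelihood: L(p) = prod_i p^{x_i} * (1 - p)^{1 - x_i}
def bernoulli_likelihood(p, x):
    return np.prod(p ** x * (1 - p) ** (1 - x))

for p in [0.3, 0.5, 0.625, 0.8]:
    print(f"p = {p:5.3f}   L(p) = {bernoulli_likelihood(p, x):.6f}")
# the data is most probable near p = 5/8, the observed fraction of heads
```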
Calculating Likelihood#
Let's think about how to calculate likelihood. Consider a set of $n$ data points $x_1, x_2, \dots, x_n$

drawn independently from the true but unknown data-generating distribution.

Let $p(x \mid \theta)$ denote the parametric distribution that we are fitting to the data.

Then, for any value of $\theta$, the probability of observing the single data point $x_i$ is $p(x_i \mid \theta)$.

What is the probability of the entire dataset $x_1, x_2, \dots, x_n$?

We assume that the $x_i$ were drawn independently.

Therefore, the joint probability is the product of the individual probabilities:

$$p(x_1 \mid \theta) \, p(x_2 \mid \theta) \cdots p(x_n \mid \theta).$$
We can use a special shorthand notation to represent products. Just like $\sum$ denotes a sum of many terms, $\prod$ denotes a product of many factors.

For example, the product of two numbers $a_1$ and $a_2$ can be written as $\prod_{i=1}^{2} a_i = a_1 a_2$.

So the joint probability can be written as:

$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta).$$
Now, each individual $p(x_i \mid \theta)$ is a probability, so it is a number less than 1, and typically much smaller.

And there are $n$ of these factors multiplied together, one per data point.

For example, if a typical probability is around 0.01 and we have 1000 data points, the product is on the order of $10^{-2000}$.
So the probability of a given dataset as a number will usually be too small to even represent in a computer using standard floating point!
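A quick numerical check (a sketch with made-up per-observation probabilities) shows the product underflowing in double precision:

```python
import numpy as np

# 1000 hypothetical per-observation probabilities, each around 0.01 (made up for illustration)
probs = np.full(1000, 0.01)

# the product is about 1e-2000, far below the smallest positive float64
print(np.prod(probs))               # 0.0 -- underflows to zero
print(np.finfo(np.float64).tiny)    # ~2.2e-308, smallest normal positive double
```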
Log-Likelihood#
Luckily, there is an excellent way to handle this problem. Instead of working with the likelihood directly, we will work with the log of the likelihood.
The table below shows some of the properties of the natural logarithm.
| Rule | Property |
|---|---|
| Product rule | $\ln(xy) = \ln x + \ln y$ |
| Quotient rule | $\ln(x/y) = \ln x - \ln y$ |
| Power rule | $\ln(x^a) = a \ln x$ |
| Exponential/logarithmic | $e^{\ln x} = x$ and $\ln e^x = x$ |
In the next lecture we will see that we are only interested in the maxima of the likelihood function. Since the log is a monotonically increasing function, it does not change the locations of those maxima, so working with the log of the likelihood serves our purposes.

So we will work with the log-likelihood:

$$\log L(\theta) = \log \prod_{i=1}^{n} p(x_i \mid \theta)$$

Which becomes:

$$\log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
This way we are no longer multiplying many small numbers, and we work with values that are easy to represent.
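Continuing the small sketch from above (same made-up probabilities), the sum of logs is easily representable even though the raw product underflows:

```python
import numpy as np

# the same 1000 hypothetical per-observation probabilities as above
probs = np.full(1000, 0.01)

print(np.prod(probs))            # 0.0 -- the raw likelihood underflows
print(np.sum(np.log(probs)))     # about -4605.17 -- the log-likelihood is easy to represent
```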
Note: probabilities are at most 1, and the log of a number less than one is negative, so log-likelihoods are negative values.
Example.
Suppose that
Here
We want to find the corresponding log-likelihood function.
Based on the observed data sample, the (joint) likelihood function is equal to
Since the likelihood function is given by
the log-likelihood function can be written as
We can visualize the log-likelihood function by varying

Example.
As an example, let's return to Bortkiewicz's horse-kick data, and see how the log-likelihood changes for different values of the Poisson parameter $\lambda$.

Recall that Bortkiewicz had collected counts of deaths by horse kick in the Prussian army, 200 yearly observations in all, and was curious whether they occurred at a constant, fixed rate.

To do this he fitted the data to a Poisson distribution. To do that, he needed to estimate the parameter $\lambda$, the expected number of deaths per year.

Let's see how the log-likelihood of the data varies as a function of $\lambda$.
| Deaths Per Year | Observed Instances |
|---|---|
| 0 | 109 |
| 1 | 65 |
| 2 | 22 |
| 3 | 3 |
| 4 | 1 |
| 5 | 0 |
| 6 | 0 |
As a reminder, the Poisson distribution gives the probability of seeing $k$ events in a fixed interval when events occur independently at a constant average rate $\lambda$.

The Poisson distribution predicts:

$$P(k \text{ deaths}) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Since the 200 yearly observations are assumed independent, the probability of the whole dataset is the product of the per-observation probabilities.

The likelihood function then becomes

$$L(\lambda) = \prod_{i=1}^{200} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}$$

where $x_i$ is the number of deaths in observation $i$.

The corresponding log-likelihood function can be written as

$$\log L(\lambda) = \sum_{i=1}^{200} \left( x_i \log \lambda - \lambda - \log x_i! \right)$$

Note that the log-likelihood of a particular number of deaths $k$ is $k \log \lambda - \lambda - \log k!$, and each value of $k$ occurs in the dataset the number of times shown in the table (e.g., $k = 1$ occurs 65 times).

Using this we can express the log-likelihood of the data as

$$\log L(\lambda) = \sum_{k=0}^{6} n_k \left( k \log \lambda - \lambda - \log k! \right)$$

where $n_k$ is the observed number of instances with $k$ deaths.

Which looks like this as we vary $\lambda$:

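Here is a sketch of how this log-likelihood can be computed directly from the table above. The formula is the one just derived; the particular $\lambda$ values evaluated below are chosen only for illustration:

```python
import numpy as np

# observed horse-kick data from the table: counts[k] observations with k deaths
deaths = np.arange(7)                                 # k = 0, 1, ..., 6
counts = np.array([109, 65, 22, 3, 1, 0, 0])
log_factorials = np.log([1, 1, 2, 6, 24, 120, 720])   # log(k!) for k = 0..6

# log-likelihood of the whole dataset as a function of lambda:
# ell(lambda) = sum_k counts[k] * (k*log(lambda) - lambda - log(k!))
def poisson_loglik(lam):
    return np.sum(counts * (deaths * np.log(lam) - lam - log_factorials))

for lam in [0.3, 0.5, 0.61, 0.7, 1.0]:
    print(f"lambda = {lam:4.2f}   log-likelihood = {poisson_loglik(lam):8.2f}")
# the curve peaks near lambda = 0.61, the sample mean (122 total deaths / 200 observations)
```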
Recall that for an arbitrary Normal distribution with mean $\mu$ and standard deviation $\sigma$, the probability density function is

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

Example.
Suppose $x_1, x_2, \dots, x_m$ is a sample drawn independently from a normal distribution with unknown mean $\mu$ and standard deviation $\sigma$.

The likelihood function is the product of the marginal densities:

$$L(\mu, \sigma) = \prod_{i=1}^{m} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$$

This can be rewritten as

$$L(\mu, \sigma) = (2\pi)^{-m/2} \, \sigma^{-m} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{m} (x_i - \mu)^2 \right)$$

Taking the log of this expression, the log-likelihood becomes

$$\log L(\mu, \sigma) = -\frac{m}{2} \log(2\pi) - m \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (x_i - \mu)^2$$
Visualizing Log-Likelihood#
We consider a sample of size 100 from a normal distribution. Our goal is to visualize the corresponding log-likelihood function. To do so, first we create the log-likelihood function.
```python
import numpy as np

# log-likelihood function for a normal distribution
# inputs: sample X, and values (or grids of values) for the parameters mu and sigma
def normloglik(X, muV, sigmaV):
    m = len(X)
    return (-m/2 * np.log(2 * np.pi) - m * np.log(sigmaV)
            - (np.sum(np.square(X)) - 2 * muV * np.sum(X) + m * muV**2) / (2 * sigmaV**2))
```
Note: The log-likelihood function of the normal distribution can also be written as

$$\log L(\mu, \sigma) = -\frac{m}{2} \log(2\pi) - m \log \sigma - \frac{\sum_{i=1}^{m} x_i^2 - 2\mu \sum_{i=1}^{m} x_i + m\mu^2}{2\sigma^2}$$

which is the form used in the code above.
If neither of the parameters is known, we can use a surface plot to visualize the log-likelihood as a function of both parameters.
Alternatively, if one of the parameters is known, we visualize the log-likelihood as a function of the unknown parameter alone.

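Below is a sketch of how such plots could be produced with matplotlib, using `normloglik` from above. The synthetic sample and its "true" parameters are made up for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic sample of size 100; the true mu = 2 and sigma = 1.5 are made up for this sketch
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=100)

# surface plot of the log-likelihood over a grid of (mu, sigma) values
muV, sigmaV = np.meshgrid(np.linspace(0, 4, 100), np.linspace(0.5, 3, 100))
Z = normloglik(X, muV, sigmaV)

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_surface(muV, sigmaV, Z, cmap='viridis')
ax1.set_xlabel(r'$\mu$'); ax1.set_ylabel(r'$\sigma$'); ax1.set_zlabel('log-likelihood')

# if sigma is known (here taken to be 1.5), plot the log-likelihood as a function of mu alone
ax2 = fig.add_subplot(1, 2, 2)
mus = np.linspace(0, 4, 200)
ax2.plot(mus, normloglik(X, mus, 1.5))
ax2.set_xlabel(r'$\mu$'); ax2.set_ylabel('log-likelihood')
plt.show()
```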