The Characteristic Equation#

Today we deepen our study of linear dynamical systems,

systems that evolve according to the equation:

\[\mathbf{x}_{k+1} = A\mathbf{x}_k.\]

Let’s look at some examples of how such dynamical systems can evolve in \(\mathbb{R}^2\).

First we’ll look at the system corresponding to:

\[\begin{split} A = \begin{bmatrix}\cos 0.1&-\sin 0.1\\\sin 0.1&\cos 0.1\end{bmatrix} \end{split}\]

Next let’s look at the system corresponding to:

\[\begin{split} A = \begin{bmatrix}1.1&0\\0&0.9\end{bmatrix} \end{split}\]

Next let’s look at the system corresponding to:

\[\begin{split} A = \begin{bmatrix}0.8&0.5\\-0.1&1.0\end{bmatrix} \end{split}\]

There are very different things happening in these three cases!

Can we find a general method for understanding what is going on in each case?

The study of eigenvalues and eigenvectors is the key to acquiring that understanding.

We will see that to understand each case, we need to learn how to extract the eigenvalues and eigenvectors of \(A\).

Finding \(\lambda\)#

In the last lecture we saw that, if we know an eigenvalue \(\lambda\) of a matrix \(A,\) then computing the corresponding eigenspace can be done by constructing a basis for

\[\operatorname{Nul}\, (A-\lambda I).\]

Today we’ll discuss how to determine the eigenvalues of a matrix \(A\).

The theory will make use of the determinant of a matrix.

Let’s recall that the determinant of a \(2\times 2\) matrix \(A = \begin{bmatrix}a&b\\c&d\end{bmatrix}\) is \(ad-bc.\)

We also have learned that \(A\) is invertible if and only if its determinant is not zero.

(Recall that the inverse of of \(A\) is \(\frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}).\)

Let’s use these facts to help us find the eigenvalues of a \(2\times 2\) matrix.

Example.

Let’s find the eigenvalues of \(A = \begin{bmatrix}2&3\\3&-6\end{bmatrix}.\)

Solution. We must find all scalars \(\lambda\) such that the matrix equation

\[(A-\lambda I)\mathbf{x} = {\bf 0}\]

has a nontrivial solution.

By the Invertible Matrix Theorem, this problem is equivalent to finding all \(\lambda\) such that the matrix \(A-\lambda I\) is not invertible.

Now,

\[\begin{split} A - \lambda I = \begin{bmatrix}2&3\\3&-6\end{bmatrix} - \begin{bmatrix}\lambda&0\\0&\lambda\end{bmatrix} = \begin{bmatrix}2-\lambda&3\\3&-6-\lambda\end{bmatrix}.\end{split}\]

We know that \(A\) is not invertible exactly when its determinant is zero.

So the eigenvalues of \(A\) are the solutions of the equation

\[\begin{split}\det(A-\lambda I) = \det\begin{bmatrix}2-\lambda&3\\3&-6-\lambda\end{bmatrix} = 0.\end{split}\]

Since \(\det\begin{bmatrix}a&b\\c&d\end{bmatrix} = ad-bc,\) then

\[\det(A-\lambda I) = (2-\lambda)(-6-\lambda)-(3)(3)\]
\[ = -12 + 6\lambda -2\lambda + \lambda^2 - 9\]
\[= \lambda^2+4\lambda-21\]
\[=(\lambda-3)(\lambda + 7)\]

If \(\det(A-\lambda I) = 0,\) then \(\lambda = 3\) or \(\lambda = -7.\) So the eigenvalues of \(A\) are \(3\) and \(-7\).

The same idea works for \(n\times n\) matrices – but, for that, we need to define a determinant for larger matrices.

Determinants of Larger Matrices#

Previously, we’ve defined a determinant for a \(2\times 2\) matrix.

To find eigenvalues for larger matrices, we need to define the determinant for any sized (ie, \(n\times n\)) matrix.

Definition. Let \(A\) be an \(n\times n\) matrix, and let \(U\) be any echelon form obtained from \(A\) by row replacements and row interchanges (no row scalings), and let \(r\) be the number of such row interchanges.

Then the determinant of \(A\), written as \(\det A\), is \((-1)^r\) times the product of the diagonal entries \(u_{11},\dots,u_{nn}\) in \(U\).

If \(A\) is invertible, then \(u_{11},\dots,u_{nn}\) are all pivots.

If \(A\) is not invertible, then at least one diagonal entry is zero, and so the product \(u_{11} \dots u_{nn}\) is zero.

In other words:

\[\begin{split}\det\ A = \left\{\begin{array}{ll}(-1)^r\cdot\left(\mbox{product of pivots in $U$}\right),&\mbox{when $A$ is invertible}\\ 0,&\mbox{when $A$ is not invertible}\end{array}\right.\end{split}\]

Example. Compute \(\det A\) for \(A = \begin{bmatrix}1&5&0\\2&4&-1\\0&-2&0\end{bmatrix}.\)

Solution. The following row reduction uses one row interchange:

\[\begin{split}A \sim \begin{bmatrix}1&5&0\\0&-6&-1\\0&-2&0\end{bmatrix} \sim \begin{bmatrix}1&5&0\\0&-2&0\\0&-6&-1\end{bmatrix} \sim \begin{bmatrix}1&5&0\\0&-2&0\\0&0&-1\end{bmatrix}.\end{split}\]

So \(\det A\) equals \((-1)^1(1)(-2)(-1) = (-2).\)

The remarkable thing is that any other way of computing the echelon form gives the same determinant. For example, this row reduction does not use a row interchange:

\[\begin{split}A \sim \begin{bmatrix}1&5&0\\0&-6&-1\\0&-2&0\end{bmatrix} \sim \begin{bmatrix}1&5&0\\0&-6&-1\\0&0&1/3\end{bmatrix}.\end{split}\]

Using this echelon form to compute the determinant yields \((-1)^0(1)(-6)(1/3) = -2,\) the same as before.

Invertibility#

The formula for the determinant shows that \(A\) is invertible if and only if \(\det A\) is nonzero.

We have yet another part to add to the Invertible Matrix Theorem:

Let \(A\) be an \(n\times n\) matrix. Then \(A\) is invertible if and only if:

  1. The determinant of \(A\) is not zero.

  2. The number 0 is not an eigenvalue of \(A\).

The second fact is true because if zero is an eigenvalue of \(A\), then \(\det A = \det(A - \lambda I) = 0. \)

Some facts about determinants:

  1. \(\det AB = (\det A) (\det B).\)

  2. \(\det A^T = \det A.\)

  3. If \(A\) is triangular, then \(\det A\) is the product of the entries on the main diagonal of \(A\).

The Characteristic Equation#

So, \(A\) is invertible if and only if \(\det A\) is not zero.

To return to the question of how to compute eigenvalues of \(A,\) recall that \(\lambda\) is an eigenvalue if and only if \((A-\lambda I)\) is not invertible.

We capture this fact using the characteristic equation:

\[\det(A-\lambda I) = 0.\]

We can conclude that \(\lambda\) is an eigenvalue of an \(n\times n\) matrix \(A\) if and only if \(\lambda\) satisfies the characteristic equation \(\det(A-\lambda I) = 0.\)

Example. Find the characteristic equation of

\[\begin{split}A = \begin{bmatrix}5&-2&6&-1\\0&3&-8&0\\0&0&5&4\\0&0&0&1\end{bmatrix}\end{split}\]

Solution. Form \(A - \lambda I,\) and note that \(\det A\) is the product of the entries on the diagonal of \(A,\) if \(A\) is triangular.

\[\begin{split}\det(A-\lambda I) = \det\begin{bmatrix}5-\lambda&-2&6&-1\\0&3-\lambda&-8&0\\0&0&5-\lambda&4\\0&0&0&1-\lambda\end{bmatrix}\end{split}\]
\[=(5-\lambda)(3-\lambda)(5-\lambda)(1-\lambda).\]

So the characteristic equation is:

\[(\lambda-5)^2(\lambda-3)(\lambda-1) = 0.\]

Expanding this out we get:

\[\lambda^4 - 14\lambda^3 + 68 \lambda^2 - 130\lambda + 75 = 0.\]

Notice that, once again, \(\det(A-\lambda I)\) is a polynomial in \(\lambda\).

In fact, for any \(n\times n\) matrix, \(\det(A-\lambda I)\) is a polynomial of degree \(n\), called the characteristic polynomial of \(A\).

We say that the eigenvalue 5 in this example has multiplicity 2, because \((\lambda -5)\) occures two times as a factor of the characteristic polynomial. In general, the mutiplicity of an eigenvalue \(\lambda\) is its multiplicity as a root of the characteristic equation.

Example. The characteristic polynomial of a \(6\times 6\) matrix is \(\lambda^6 - 4\lambda^5 - 12\lambda^4.\) Find the eigenvalues and their multiplicity.

Solution Factor the polynomial

\[\lambda^6 - 4\lambda^5 - 12\lambda^4 = \lambda^4(\lambda^2-4\lambda-12) = \lambda^4(\lambda-6)(\lambda+2)\]

So the eigenvalues are 0 (with multiplicity 4), 6, and -2.

Since the characteristic polynomial for an \(n\times n\) matrix has degree \(n,\) the equation has \(n\) roots, counting multiplicities – provided complex numbers are allowed.

Note that even for a real matrix, eigenvalues may sometimes be complex.

Practical Issues.

These facts show that there is, in principle, a way to find eigenvalues of any matrix. However, you need not compute eigenvalues for matrices larger than \(2\times 2\) by hand. For any matrix \(3\times 3\) or larger, you should use a computer.

Similarity#

An important concept for things that come later is the notion of similar matrices.

Definition. If \(A\) and \(B\) are \(n\times n\) matrices, then \(A\) is similar to \(B\) if there is an invertible matrix \(P\) such that \(P^{-1}AP = B,\) or, equivalently, \(A = PBP^{-1}.\)

Similarity is symmetric, so if \(A\) is similar to \(B\), then \(B\) is similar to \(A\). Hence we just say that \(A\) and \(B\) are similar.

Changing \(A\) into \(B\) is called a similarity transformation.

An important way to think of similarity between \(A\) and \(B\) is that they have the same eigenvalues.

Theorem. If \(n\times n\) matrices \(A\) and \(B\) are similar, then they have the same characteristic polynomial, and hence the same eigenvalues (with the same multiplicities.)

Proof. If \(B = P^{-1}AP,\) then

\[B - \lambda I = P^{-1}AP - \lambda P^{-1}P\]
\[ = P^{-1}(AP-\lambda P)\]
\[ = P^{-1}(A-\lambda I)P\]

Now let’s construct the characteristic polynomial by taking the determinant:

\[\det(B-\lambda I) = \det[P^{-1}(A-\lambda I)P]\]

Using the properties of determinants we discussed earlier, we compute:

\[ = \det(P^{-1})\cdot\det(A-\lambda I)\cdot\det(P).\]

Since \(\det(P^{-1})\cdot\det(P) = \det(P^{-1}P) = \det I = 1,\) we can see that

\[\det(B-\lambda I) = \det(A - \lambda I).\]

Markov Chains#

Let’s return to the problem of solving a Markov Chain.

At this point, we can place the theory of Markov Chains into the broader context of eigenvalues and eigenvectors.

Theorem. The largest eigenvalue of a Markov Chain is 1.

Proof. First of all, it is obvious that 1 is an eigenvalue of a Markov chain since we know that every Markov Chain \(A\) has a steady-state vector \(\mathbf{v}\) such that \(A\mathbf{v} = \mathbf{v}.\)

To prove that 1 is the largest eigenvalue, recall that each column of a Markov Chain sums to 1.

So, every row of \(A^T\) sums to 1.

Hence, for any vector \(\mathbf{x}\), no element of \(A^T\mathbf{x}\) can be greater than the largest element of \(\mathbf{x}\). This is because each element of \(A^T\mathbf{x}\) is the weighted sum of \(\mathbf{x}\) with weights adding up to 1.

So \(A^T\) cannot have an eigenvalue greater than 1.

Now, any matrix \(A\) and \(A^T\) have the same eigenvalues.

To see this, suppose \(\lambda\) is an eigenvalue of \(A\). Then det(\(A-\lambda I\)) = 0.

But det(\(A-\lambda I\)) = det\(((A-\lambda I)^T)\) = det(\(A^T-\lambda I\)).

So then \(\lambda\) is an eigenvalue of \(A^T\).

So there can be no \(\lambda > 1\) such that \(A\mathbf{x} = \lambda \mathbf{x}.\)

The Complete Solution for a Markov Chain#

Previously, we were only able to ask about the “eventual” steady state of a Markov Chain.

But another crucial question is: how quickly does a particular Markov Chain reach steady state from some initial starting condition?

By completely solving a Markov Chain, we will determine both its steady state and its rate of convergence.

Let’s use an example: we previously studied the Markov Chain defined by \(A = \begin{bmatrix}0.95&0.03\\0.05&0.97\end{bmatrix}.\)

Let’s ask how quickly it reaches steady state, from the starting point defined as \(\mathbf{x}_0 = \begin{bmatrix}0.6\\0.4\end{bmatrix}.\)

Remember that \(x_0\) is probability vector – its entries are nonnegative and sum to 1.

_images/8bfd18d39ca6789cb22183babe294ea534068cc64168771b19f9bcc4330c98dc.png

Using the methods we studied today, we can find the characteristic equation:

\[\lambda^2 -1.92\lambda +0.92 \]

Using the quadratic formula, we find the roots of this equation to be 1 and 0.92.

(Note that, as expected, the largest eigenvalue of the chain is 1.)

Next, using the methods in the previous lecture, we find a basis for each eigenspace of \(A\) (each nullspace of \(A-\lambda I\)).

For \(\lambda = 1\), a corresponding eigenvector is \(\mathbf{v}_1 = \begin{bmatrix}3\\5\end{bmatrix}.\)

For \(\lambda = 0.92\), a corresponding eigenvector is \(\mathbf{v}_2 = \begin{bmatrix}1\\-1\end{bmatrix}.\)

Next, we write \(\mathbf{x}_0\) as a linear combination of \(\mathbf{v}_1\) and \(\mathbf{v}_2.\) This can be done because \(\{\mathbf{v}_1,\mathbf{v}_2\}\) is obviously a basis for \(\mathbb{R}^2.\)

To write \(\mathbf{x}_0\) this way, we want to solve the vector equation

\[c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{x}_0\]

In other words:

\[\begin{split}[\mathbf{v}_1\;\mathbf{v}_2]\begin{bmatrix}c_1\\c_2\end{bmatrix} = \mathbf{x}_0.\end{split}\]

The matrix \([\mathbf{v}_1\;\mathbf{v}_2]\) is invertible, so,

\[\begin{split}\begin{bmatrix}c_1\\c_2\end{bmatrix} = [\mathbf{v}_1\;\mathbf{v}_2]^{-1} \mathbf{x}_0 = \begin{bmatrix}3&1\\5&-1\end{bmatrix}^{-1}\begin{bmatrix}0.6\\0.4\end{bmatrix}.\end{split}\]
\[\begin{split} = \frac{1}{-8}\begin{bmatrix}-1&-1\\-5&3\end{bmatrix}\begin{bmatrix}0.6\\0.4\end{bmatrix} = \begin{bmatrix}0.125\\0.225\end{bmatrix}.\end{split}\]

So, now we can put it all together.

We know that

\[\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2\]

and we know the values of \(c_1, c_2\) that make this true.

So let’s compute each \(\mathbf{x}_k\):

\[\mathbf{x}_1 = A\mathbf{x}_0 = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2\]
\[ = c_1\mathbf{v}_1 + c_2(0.92)\mathbf{v}_2.\]

Now note the power of the eigenvalue approach:

\[\mathbf{x}_2 = A\mathbf{x}_1 = c_1A\mathbf{v}_1 + c_2(0.92)A\mathbf{v}_2\]
\[=c_1\mathbf{v}_1 + c_2(0.92)^2\mathbf{v}_2.\]

And so in general:

\[\mathbf{x}_k = c_1\mathbf{v}_1 + c_2(0.92)^k\mathbf{v}_2\;\;\;(k = 0, 1, 2, \dots)\]

And using the \(c_1\) and \(c_2\) and \(\mathbf{v}_1,\) \(\mathbf{v}_2\) we computed above:

\[\begin{split}\mathbf{x}_k = 0.125\begin{bmatrix}3\\5\end{bmatrix} + 0.225(0.92)^k\begin{bmatrix}1\\-1\end{bmatrix}\;\;\;(k = 0, 1, 2, \dots)\end{split}\]

This explicit formula for \(\mathbf{x}_k\) gives the solution of the Markov Chain \(\mathbf{x}_{k+1} = A\mathbf{x}_k\) starting from the initial state \(\mathbf{x}_0.\)

In other words:

\[\begin{split}\mathbf{x}_0 = 0.125\begin{bmatrix}3\\5\end{bmatrix} + 0.225\begin{bmatrix}1\\-1\end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{x}_1 = 0.125\begin{bmatrix}3\\5\end{bmatrix} + 0.207\begin{bmatrix}1\\-1\end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{x}_2 = 0.125\begin{bmatrix}3\\5\end{bmatrix} + 0.190\begin{bmatrix}1\\-1\end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{x}_3 = 0.125\begin{bmatrix}3\\5\end{bmatrix} + 0.175\begin{bmatrix}1\\-1\end{bmatrix}\end{split}\]
\[ ... \]
\[\begin{split}\mathbf{x}_\infty = 0.125\begin{bmatrix}3\\5\end{bmatrix}\end{split}\]

The equation tells us how quickly the chain converges: like \((0.92)^k\) – which of course goes to zero as \(k \rightarrow \infty\).

Thus \(\mathbf{x}_k \rightarrow 0.125\mathbf{v}_1 = \begin{bmatrix}0.375\\0.625\end{bmatrix}.\)


_images/22b6f928604a9404c55c1380e5343306aeba9750ade86f88b1b7122cc1165f90.png