The Matrix of a Linear Transformation#

In the last lecture we introduced the idea of a linear transformation:

_images/6cc84d7076ff2ad84485b62de3da2a4016f1743faf03d7219e5a91c109682608.jpg

We have seen that every matrix multiplication is a linear transformation from vectors to vectors.

But, are there any other possible linear transformations from vectors to vectors?

No.

In other words, the reverse statement is also true:

every linear transformation from vectors to vectors is a matrix multiplication.

We’ll now prove this fact.

We’ll do it constructively, meaning we’ll actually show how to find the matrix corresponding to any given linear transformation \(T\).

Theorem. Let \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m\) be a linear transformation. Then there is (always) a unique matrix \(A\) such that:

\[ T({\bf x}) = A{\bf x} \;\;\; \mbox{for all}\; {\bf x} \in \mathbb{R}^n.\]

In fact, \(A\) is the \(m \times n\) matrix whose \(j\)th column is the vector \(T({\bf e_j})\), where \({\bf e_j}\) is the \(j\)th column of the identity matrix in \(\mathbb{R}^n\):

\[A = \left[T({\bf e_1}) \dots T({\bf e_n})\right].\]

\(A\) is called the standard matrix of \(T\).

Proof. Write

\[{\bf x} = I{\bf x} = \left[{\bf e_1} \dots {\bf e_n}\right]\bf x\]
\[ = x_1{\bf e_1} + \dots + x_n{\bf e_n}.\]

In other words, for any \(\mathbf{x}\), we can always expand it as:

\[\begin{split} \mathbf{x} \;= \;\;\;\;\; \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & 1 \end{bmatrix} \; \begin{bmatrix}x_1\\x_2\\ \vdots \\ x_n\end{bmatrix} \;\;\;\;\;= \;\;\;\;\; \begin{bmatrix} x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ \vdots \\ 0 \end{bmatrix} + \dots + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ x_n \end{bmatrix} \end{split}\]

Because \(T\) is linear, we have:

\[ T({\bf x}) = T(x_1{\bf e_1} + \dots + x_n{\bf e_n})\]
\[ = x_1T({\bf e_1}) + \dots + x_nT({\bf e_n})\]
\[\begin{split} = \left[T({\bf e_1}) \dots T({\bf e_n})\right] \, \left[\begin{array}{r}x_1\\\vdots\\x_n\end{array}\right] = A{\bf x}.\end{split}\]

So … we see that the ideas of matrix multiplication and linear transformation are essentially equivalent when applied to vectors.

Every matrix multiplication is a linear transformation, and every linear transformation from vectors to vectors is a matrix multiplication.

However, the term linear transformation focuses on a property of the mapping, while the term matrix multiplication focuses on how such a mapping is implemented.

This proof shows us an important idea:

To find the standard matrix of a linear transformation, ask what the transformation does to the columns of \(I\).

In other words, if \( T(\mathbf{x}) = A\mathbf{x} \), then:

\[A = \left[T({\bf e_1}) \dots T({\bf e_n})\right].\]

This gives us a way to compute the standard matrix of a transformation.
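For instance, here is a small numerical check of this recipe (a sketch using numpy; the transformation \(T\) below is just a made-up linear map chosen for illustration):

import numpy as np

# a made-up linear transformation from R^2 to R^2, for illustration only:
# it scales the first coordinate by 3 and adds the first coordinate to the second
def T(x):
    return np.array([3 * x[0], x[0] + x[1]])

# build the standard matrix column by column: column j is T(e_j)
e1 = np.array([1, 0])
e2 = np.array([0, 1])
A = np.column_stack([T(e1), T(e2)])
print(A)

# check that A x = T(x) for an arbitrary x
x = np.array([2, 5])
print(A @ x, T(x))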

Now, in \(\mathbb{R}^2\), \(I = \left[\begin{array}{cc}1&0\\0&1\end{array}\right]\). So:

\[\begin{split}\mathbf{e_1} = \left[\begin{array}{c}1\\0\end{array}\right]\;\;\mbox{and}\;\;\mathbf{e_2} = \left[\begin{array}{c}0\\1\end{array}\right].\end{split}\]

So to find the matrix of any given linear transformation of vectors in \(\mathbb{R}^2\), we only have to know what that transformation does to these two points:

_images/3fd49a4c73bda8df6ccde3282e58e6914f700b9aca1d8b4bd43ba29adfeff9f3.png

This is a hugely powerful tool.

Let’s say we start from some given linear transformation; we can use this idea to find the matrix that implements that linear transformation.

For example, let’s consider rotation about the origin as a kind of transformation.

_images/75a0ee7be2f0ccdbe19779ac6f2a1dc1ee784d9c9ecf229a9f89e8cb31d4aed7.png

First things first: Is rotation a linear transformation?

Recall that for a transformation to be linear, it must be true that \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\) and that \(T(c\mathbf{u}) = cT(\mathbf{u}).\)

I’m going to show you a “geometric proof.”

This figure shows that “the rotation of \(\mathbf{u+v}\) is the sum of the rotation of \(\mathbf{u}\) and the rotation of \(\mathbf{v}\)”.

_images/dcbec68944ae8e3e0fbcdea38be342a480fe47b8f801a28cd7e392035df4284b.png

A similar picture shows that rotating \(c\mathbf{u}\) gives the same result as scaling the rotation of \(\mathbf{u}\) by \(c\), so \(T(c\mathbf{u}) = cT(\mathbf{u})\) holds as well.

OK, so rotation is a linear transformation.

Let’s see how to find the matrix that implements a rotation.

Specifically: Let \(T: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) be the transformation that rotates each point in \(\mathbb{R}^2\) about the origin through an angle \(\theta\), with counterclockwise rotation for a positive angle.

Let’s find the standard matrix \(A\) of this transformation.

Solution. The columns of \(I\) are \({\bf e_1} = \left[\begin{array}{r}1\\0\end{array}\right]\) and \({\bf e_2} = \left[\begin{array}{r}0\\1\end{array}\right].\)

Referring to the diagram below, we can see that \(\left[\begin{array}{r}1\\0\end{array}\right]\) rotates into \(\left[\begin{array}{r}\cos\theta\\\sin\theta\end{array}\right],\) and \(\left[\begin{array}{r}0\\1\end{array}\right]\) rotates into \(\left[\begin{array}{r}-\sin\theta\\\cos\theta\end{array}\right].\)

_images/2050fd20a2388de3d088ae1bacb981c8503a13436d62bd06901bd2948e28974f.png

So by the Theorem above,

\[\begin{split} A = \left[\begin{array}{rr}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{array}\right].\end{split}\]

To demonstrate the use of a rotation matrix, let’s rotate the following shape:

dm.plotSetup()
note = dm.mnote()
dm.plotShape(note)
_images/b271037992491a7553bff2326ddb7375f46cdc3d5dc3fa869ea3f7b23e618c3a.png

The variable note is an array of 26 vectors in \(\mathbb{R}^2\) that define the shape of the note.

In other words, it is a 2 \(\times\) 26 matrix.

To rotate note we need to multiply each column of note by the rotation matrix \(A\).

In Python this can be performed using the @ operator.

That is, if A and B are matrices,

A @ B

will multiply A by every column of B, and the resulting vectors will be formed into a matrix.
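For example (a small sketch, assuming numpy has been imported as np):

A = np.array([[0, -1],
              [1,  0]])        # rotation by 90 degrees counterclockwise
B = np.array([[1, 0, 2],
              [0, 1, 2]])      # three column vectors in R^2
print(A @ B)
# each column of the result is A times the matching column of B:
# [[ 0 -1 -2]
#  [ 1  0  2]]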

dm.plotSetup()
angle = 90                        # rotation angle in degrees
theta = (angle/180) * np.pi       # convert degrees to radians
A = np.array(
    [[np.cos(theta), -np.sin(theta)],
     [np.sin(theta), np.cos(theta)]])
rnote = A @ note                  # rotate every column of note
dm.plotShape(rnote)
_images/1d30853b5a998d89539344ae8475c0ba5b099336f0f3dd2f7ffeeb746fa6849f.png

Geometric Linear Transformations of \(\mathbb{R}^2\)#

Let’s use our understanding of how to construct linear transformations to look at some specific linear transformations of \(\mathbb{R}^2\) to \(\mathbb{R}^2\).

First, let’s recall the linear transformation

\[T(\mathbf{x}) = r\mathbf{x}.\]

With \(r > 1\), this is a dilation. It moves every vector further from the origin.

Let’s say the dilation is by a factor of 2.5.

To construct the matrix \(A\) that implements this transformation, we ask: where do \({\bf e_1}\) and \({\bf e_2}\) go?

_images/3fd49a4c73bda8df6ccde3282e58e6914f700b9aca1d8b4bd43ba29adfeff9f3.png

Under the action of \(A\), \(\mathbf{e_1}\) goes to \(\left[\begin{array}{c}2.5\\0\end{array}\right]\) and \(\mathbf{e_2}\) goes to \(\left[\begin{array}{c}0\\2.5\end{array}\right]\).

So the matrix \(A\) must be \(\left[\begin{array}{cc}2.5&0\\0&2.5\end{array}\right]\).

Let’s test this out:

square = np.array(
    [[0,1,1,0],
     [1,1,0,0]])    # the unit square; each column is a corner point
A = np.array(
    [[2.5, 0],
     [0, 2.5]])     # dilation by a factor of 2.5
display(Latex(rf"$A = {ltx_array_fmt(A, '{:1.1f}')}$"))
dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square,'r')
\[\begin{split}A = \begin{bmatrix} 2.5 & 0.0\\ 0.0 & 2.5 \end{bmatrix}\end{split}\]
_images/424c86d187e8d41231ebf823eb485ccbcf9aa93e11aa72d6e8987e380bb0bb38.png
dm.plotSetup(-7,7,-7, 7)
dm.plotShape(note)
dm.plotShape(A @ note,'r')
_images/25e01f127e433fe6b9d4f903953457a56a4ea5eebb5046c57d46aec29c0cd1ec.png

OK, now let’s reflect through the \(x_1\) axis. Where do \({\bf e_1}\) and \({\bf e_2}\) go?

_images/3fd49a4c73bda8df6ccde3282e58e6914f700b9aca1d8b4bd43ba29adfeff9f3.png
A = np.array(
    [[1,  0],
     [0, -1]])
display(Latex(rf"$A = {ltx_array_fmt(A, '{:d}')}$"))
dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square,'r')
plt.title(r'Reflection through the $x_1$ axis', size = 20);
\[\begin{split}A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\end{split}\]
_images/a16495e2b943b2cf22d5ee385fa85ad236436a4ce483a9f737a77ee5126d2147.png
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(A @ note,'r')
_images/0ec6657d266118a881cb50025d546b815b892c5d4dd413730200522683251eac.png

What about reflection through the \(x_2\) axis?

A = np.array(
    [[-1,0],
     [0, 1]])
display(Latex(rf"$A = {ltx_array_fmt(A, '{:2d}')}$"))
dm.plotSetup()
dm.plotSquare(square)
dm.plotSquare(A @ square,'r')
plt.title(r'Reflection through the $x_2$ axis', size = 20);
\[\begin{split}A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\end{split}\]
_images/b7cb88d3ffd620a35dca5f7e87d3c3b56064c0b1e289729a20368752a86d647f.png
dm.plotSetup()
dm.plotShape(note)
dm.plotShape(A @ note,'r')
_images/ea3083e8a9b5ec05b2d8eeff534c651826bf73aa0e488d61741d3fb0951bf9fd.png

What about reflection through the line \(x_1 = x_2\)?

\[\begin{split}A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\end{split}\]
_images/529f7f28931bfa043a21c7871bb1d4e1ba3e3dbbb299f175d4a0b25596ef6583.png
_images/db981e21289a28ddd6ca00a0033342e59730d28804580c1431d571bfaed7708c.png
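As a quick numerical check (assuming numpy is imported as np, as in the cells above), this matrix simply swaps the two coordinates of every vector:

A = np.array([[0, 1],
              [1, 0]])
print(A @ np.array([2, 5]))    # [5 2]: the coordinates are swapped, i.e. reflection across x1 = x2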

What about reflection through the line \(x_1 = -x_2\)?

\[\begin{split}A = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}\end{split}\]
_images/07b08ed0ecc3af8b530ef383bd96bd151bfe283df0bb2dbe94f480ef7c5d2e30.png
_images/011013b27f197e33f726d222fca48672e3a089de8bc715ba2f3e70eefc004731.png

What about reflection through the origin?

\[\begin{split}A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}\end{split}\]
_images/87d9ce11b89482fbd6fbbec3145cc6e4d4ea62a29e6c4a5cdf1607300dd37e22.png
_images/590ea8aa02006158bddfc0369426348d84b8b8dfcbd2c1c608ee39e37579930f.png
What about a contraction along the \(x_1\)-axis, say by a factor of 0.45?

\[\begin{split}A = \begin{bmatrix} 0.45 & 0.00\\ 0.00 & 1.00 \end{bmatrix}\end{split}\]
_images/1e843e1262ae7f59960f257cbb4ca4988791d2044d959b07edd7841a07eacc7d.png
_images/d037a6a5c6668cc85a6536756e234fbf520fc3c722d4d25515ad43a427a24e2f.png
What about an expansion along the \(x_1\)-axis, say by a factor of 2.5?

\[\begin{split}A = \begin{bmatrix} 2.5 & 0.0\\ 0.0 & 1.0 \end{bmatrix}\end{split}\]
_images/26433b1ee0b58b772386457e13cf6d072c4026e6de29ab76cd3b19637df00fa2.png
What about a shear transformation?

\[\begin{split}A = \begin{bmatrix} 1.0 & 0.0\\ -1.5 & 1.0 \end{bmatrix}\end{split}\]
_images/ac531ab366540b8690f9d499c612f0ddbc962ae7121140d3d2befc39d8899f5d.png
_images/8a01f6e7240757bfab25b9ca4f0eee763fe3008f17f596b9f5da04d199e4f79e.png
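Again as a quick check (a sketch reusing np), we can see where this shear sends \(\mathbf{e_1}\) and \(\mathbf{e_2}\):

A = np.array(
    [[1.0,  0.0],
     [-1.5, 1.0]])
print(A @ np.array([1, 0]))    # e1 goes to [1, -1.5]: it is pushed downward
print(A @ np.array([0, 1]))    # e2 is unchanged: [0, 1]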

Now let’s look at a particular kind of transformation called a projection.

Imagine we took any given point and ‘dropped’ it onto the \(x_1\)-axis.

\[\begin{split}A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\end{split}\]
_images/4ba9d4fc56a086125631446d7920f61efea13288c514ad88a49cef89d75f3f61.png

What happens to the shape of the point set?

_images/34f2bbf111de20aa749d02e3fb64a5e9d61592ae4631ead817a4d4f7d74da80c.png
Similarly, we can project onto the \(x_2\)-axis:

\[\begin{split}A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\end{split}\]
_images/f1eabbe8b54a65981a170bcfee60a8e88c6182a8cfaf4fefa3dfe62ff98037c7.png
_images/aeb13fbea7e209822c2e7c2016e0f5c4119538f367d664d2475b38e73455612c.png

Area is Scaled by the Determinant#

Notice that in some of the transformations above, the “size” of a shape grows or shrinks.

Let’s look at how area (or volume) of a shape is affected by a linear transformation.

\[\begin{split}A = \begin{bmatrix} 0.45 & 0.00\\ 0.00 & 1.00 \end{bmatrix}\end{split}\]
_images/8f7cd1ba90961d6ec300516c1d8cd93fd34ef28437ca1afc81f231dda6bd597d.png

In this transformation, each unit of area in the blue shape is transformed to a smaller region in the red shape.

So to understand how area changes, it suffices to ask what happens to the unit square (or hypercube):

_images/105844a7600bb77c27470f8135be5a3c74aaa9117a735ec2ae9f22ebee10198f.png

Let’s denote the matrix of our linear transformation as:

\[\begin{split} A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \end{split}\]

Then, here is what happens to the unit square:

_images/279462702ffc0ceb15a9016a59484634a32a0d2bcfd02d18a0eed0f4d39a8566.png

Now, let’s determine the area of the blue diamond in terms of \(a, b, c\), and \(d\).

To do that, we’ll use this diagram:

_images/f92b43da0587a09f416dbae14d14e2dfebccacebc1cd91dd4ac4525a2a971fa5.png

Each of the triangles and rectangles has an area we can determine in terms of \(a, b, c\) and \(d\).

The large rectangle has sides \((a+b)\) and \((c+d)\), so its area is:

\[ (a+b)(c+d) = ac + ad + bc + bd. \]

From this large rectangle we need to subtract:

  • \(bd\) (red triangles),

  • \(ac\) (gray triangles), and

  • \(2bc\) (green rectangles).

So the area of the blue diamond is:

\[ (ac + ad + bc + bd) - (bd + ac + 2bc) \]
\[ = ad - bc \]

So we conclude that when we use a linear transformation

\[\begin{split} A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \end{split}\]

the area of a unit square (or any shape) is scaled by a factor of \(ad - bc\) (strictly speaking, by its absolute value \(|ad - bc|\), since an area cannot be negative).

This quantity is a fundamental property of the matrix \(A\).

So, we give it a name: it is the determinant of \(A\).

We denote it as

\[\det(A)\]

So, for a \(2\times 2\) matrix \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \),

\[\det(A) = ad-bc.\]

However, the determinant can be defined for any \(n\times n\) (square) matrix.

For a square matrix \(A\) larger than \(2\times 2\), the determinant tells us how the volume of a unit (hyper)cube is scaled when it is linearly transformed by \(A\).

We will learn how to compute determinants for larger matrices in a later lecture.
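In the meantime, as a sanity check (a sketch using numpy's built-in determinant routine), the \(2\times 2\) formula agrees with np.linalg.det for one of the matrices used above:

A = np.array([[2.5, 0.0],
              [0.0, 1.0]])                       # the expansion along the x1-axis used above
print(A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0])     # ad - bc = 2.5
print(np.linalg.det(A))                          # 2.5 (up to floating-point rounding)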

There are important cases in which the determinant of a matrix is zero.

When does it happen that \(\det(A) = 0\)?

Consider when \(A\) is the matrix of a projection:

\[\begin{split}A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\end{split}\]
_images/4ba9d4fc56a086125631446d7920f61efea13288c514ad88a49cef89d75f3f61.png

The unit square has been collapsed onto the \(x_1\)-axis, resulting in a shape with zero area.

This is confirmed by the determinant, which is

\[\begin{split} \det\left(\begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}\right) = (1 \cdot 0) - (0 \cdot 0) = 0.\end{split}\]

Existence and Uniqueness#

Notice that some of these transformations map multiple inputs to the same output, and some are incapable of generating certain outputs.

For example, the projections above can send multiple different points to the same point.
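For instance (a quick sketch reusing np with the projection matrix from above):

A = np.array([[1, 0],
              [0, 0]])              # projection onto the x1-axis
print(A @ np.array([3, 1]))         # [3 0]
print(A @ np.array([3, -4]))        # also [3 0]: two different inputs, one output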

We need some terminology to understand these properties of linear transformations.

Definition. A mapping \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m\) is said to be onto \(\mathbb{R}^m\) if each \(\mathbf{b}\) in \(\mathbb{R}^m\) is the image of at least one \(\mathbf{x}\) in \(\mathbb{R}^n\).

Informally, \(T\) is onto if every element of its codomain is in its range.

Another (important) way of thinking about this is that \(T\) is onto if there is a solution \(\mathbf{x}\) of

\[T(\mathbf{x}) = \mathbf{b}\]

for all possible \(\mathbf{b}.\)

This is asking an existence question about a solution of the equation \(T(\mathbf{x}) = \mathbf{b}\) for all \(\mathbf{b}.\)

_images/6cc84d7076ff2ad84485b62de3da2a4016f1743faf03d7219e5a91c109682608.jpg

Here, we see that \(T\) maps points in \(\mathbb{R}^2\) to a plane lying within \(\mathbb{R}^3\).

That is, the range of \(T\) is a strict subset of the codomain of \(T\).

So \(T\) is not onto \(\mathbb{R}^3\).

_images/d9503ef508361d8a203f9d0f5b3cbad26bb9c8ed5c64cabf52a89e9b33b60705.png

In this case, for every point in \(\mathbb{R}^2\), there is an \(\mathbf{x}\) that maps to that point.

So, the range of \(T\) is equal to the codomain of \(T\).

So \(T\) is onto \(\mathbb{R}^2\).

Here is an example of the reflection transformation. The red points are the images of the blue points.

What about this transformation? Is it onto \(\mathbb{R}^2\)?

_images/011013b27f197e33f726d222fca48672e3a089de8bc715ba2f3e70eefc004731.png

Here is an example of the projection transformation. The red points (which all lie on the \(x\)-axis) are the images of the blue points.

What about this transformation? Is it onto \(\mathbb{R}^2\)?

_images/34f2bbf111de20aa749d02e3fb64a5e9d61592ae4631ead817a4d4f7d74da80c.png

Definition. A mapping \(T: \mathbb{R}^n \rightarrow \mathbb{R}^m\) is said to be one-to-one if each \(\mathbf{b}\) in \(\mathbb{R}^m\) is the image of at most one \(\mathbf{x}\) in \(\mathbb{R}^n\).

If \(T\) is one-to-one, then for each \(\mathbf{b},\) the equation \(T(\mathbf{x}) = \mathbf{b}\) has either a unique solution, or none at all.

This is asking a uniqueness question about a solution of the equation \(T(\mathbf{x}) = \mathbf{b}\) for all \(\mathbf{b}\).

_images/6ed345688ae0abc244e0d2ad69e23fe34311c05d6d3cd56cd8a8d3e5c2e1cbbe.jpg

Let’s examine the relationship between these ideas and some previous definitions.

If for all \(\mathbf{b}\), \(A\mathbf{x} = \mathbf{b}\) is consistent, is \(T(\mathbf{x}) = A\mathbf{x}\) onto? one-to-one?

  • \(T(\mathbf{x})\) is onto. \(T(\mathbf{x})\) may or may not be one-to-one. If the system has multiple solutions for some \(\mathbf{b}\), \(T(\mathbf{x})\) is not one-to-one.

If for all \(\mathbf{b}\), \(A\mathbf{x} = \mathbf{b}\) is consistent and has a unique solution, is \(T(\mathbf{x}) = A\mathbf{x}\) onto? one-to-one?

  • Yes to both.

If it is not the case that for all \(\mathbf{b}\), \(A\mathbf{x} = \mathbf{b}\) is consistent, is \(T(\mathbf{x}) = A\mathbf{x}\) onto? one-to-one?

  • \(T(\mathbf{x})\) is not onto. \(T(\mathbf{x})\) may or may not be one-to-one.

If \(T(\mathbf{x}) = A\mathbf{x}\) is onto, is \(A\mathbf{x} = \mathbf{b}\) consistent for all \(\mathbf{b}\)? is the solution unique for all \(\mathbf{b}\)?

  • \(A\mathbf{x} = \mathbf{b}\) is consistent for all \(\mathbf{b}\). The solution is not necessarily unique: for some \(\mathbf{b}\) there may be more than one solution.

If \(T(\mathbf{x}) = A\mathbf{x}\) is one-to-one, is \(A\mathbf{x} = \mathbf{b}\) consistent for all \(\mathbf{b}\)? is the solution unique for all \(\mathbf{b}\)?

  • \(A\mathbf{x} = \mathbf{b}\) may or may not be consistent for all \(\mathbf{b}\). For any \(\mathbf{b}\), if there is a solution, it is unique.
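To make the projection example above concrete (a sketch reusing np): for the projection matrix \(A = \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}\), the equation \(A\mathbf{x} = \mathbf{b}\) has no solution when \(\mathbf{b}\) is off the \(x_1\)-axis (so \(T\) is not onto), and has many solutions when \(\mathbf{b}\) is on the \(x_1\)-axis (so \(T\) is not one-to-one).

A = np.array([[1, 0],
              [0, 0]])
# existence fails: the second entry of A x is always 0, so no x reaches b = [2, 3]
print(A @ np.array([2, 3]), A @ np.array([2, -5]))    # both give [2 0], never [2 3]
# uniqueness fails: many different x reach b = [2, 0]
print(A @ np.array([2, 1]), A @ np.array([2, 99]))    # both give [2 0]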