Analytic Geometry in \(\mathbb{R}^n\)#


Transcript of oral arguments before the US Supreme Court, Briscoe v. Virginia, January 11, 2010:

MR. FRIEDMAN: I think that issue is entirely orthogonal to the issue here because the Commonwealth is acknowledging –

CHIEF JUSTICE ROBERTS: I’m sorry. Entirely what?

MR. FRIEDMAN: Orthogonal. Right angle. Unrelated. Irrelevant.

CHIEF JUSTICE ROBERTS: Oh.

JUSTICE SCALIA: What was that adjective? I liked that.

MR. FRIEDMAN: Orthogonal.

CHIEF JUSTICE ROBERTS: Orthogonal.

MR. FRIEDMAN: Right, right.

JUSTICE SCALIA: Orthogonal, ooh.

(Laughter.)

JUSTICE KENNEDY: I knew this case presented us a problem.

(Laughter.)

The Challenge: Extending Geometric Intuition to High Dimension#

Our challenge today is to begin to lay down the elements of analytic geometry.

The concepts we deal with today will be familiar to you.

Our goal is not to introduce new ideas, but to cast old ideas into greater generality.

In particular, we will take familiar notions and reformulate them in terms of vectors …

… vectors in arbitrary dimension.

Let’s start with a simple example in \(\mathbb{R}^2\).

How would you determine the angle \(\theta\) below?

_images/afcb34e06dd572e09700aec27881e0a6f85dc64e0fe6c8b6d14485b48e699f34.png

This is fairly easy. Probably we’d take out a protractor and use it to measure \(\theta\).

OK, let’s go up one dimension, to \(\mathbb{R}^3\).

_images/d9af8808bc46e040880dc735ff53478e105dd82d53ca0e36bb98455fc2a1f31d.png

This seems a little more challenging, but we could probably do it.

Let’s go up one dimension, to \(\mathbb{R}^4\):

This seems hard!

What we need are ways to capture simple notions:

  • length,

  • distance,

  • orthogonality (perpendicularity), and

  • angle.

However, we need to take these notions, which are familiar from our 3D world, and see how to define them for spaces of arbitrary dimension, i.e., \(\mathbb{R}^n\).

Interestingly, it turns out that these notions (length, distance, perpendicularity, angle) all depend on one key notion: the inner product.

In fact, the notion is so important that we refer to a vector space for which there is an inner product as an inner product space.

So let’s briefly return to and review the inner product.

Inner Product (Review)#

Recall that we consider vectors such as \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^n\) to be \(n\times1\) matrices.

Then \(\mathbf{u}^T\mathbf{v}\) is a scalar, called the inner product of \(\mathbf{u}\) and \(\mathbf{v}.\)

You will also see this called the dot product.

It is sometimes written as \(\mathbf{u} \cdot \mathbf{v}\) or \(\langle\mathbf{u}, \mathbf{v}\rangle\),

but we will always write \(\mathbf{u}^T\mathbf{v}.\)

The inner product is the sum of the componentwise product of \(\mathbf{u}\) and \(\mathbf{v}\).

If \(\mathbf{u} = \begin{bmatrix}u_1\\u_2\\\vdots\\u_n\end{bmatrix}\) and \(\mathbf{v} = \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix},\) then the inner product of \(\mathbf{u}\) and \(\mathbf{v}\) is:

\[\begin{split}\mathbf{u}^T\mathbf{v} = \begin{bmatrix}u_1&u_2&\dots&u_n\end{bmatrix}\begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix} = u_1v_1 + u_2v_2 + \dots + u_nv_n = \sum_{i=1}^n u_iv_i.\end{split}\]
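As a concrete illustration, here is a minimal NumPy sketch of this computation (the vectors are arbitrary examples, not from the notes):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# inner product: 1*4 + 2*5 + 3*6 = 32
print(u @ v)          # 32
print(np.sum(u * v))  # 32, the same sum of componentwise products
```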

Let’s remind ourselves of the properties of the inner product:

Theorem. Let \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) be vectors in \(\mathbb{R}^n\), and let \(c\) be a scalar. Then:

  1. \(\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}\)

Inner product is symmetric. Note that these two expressions are the transpose of each other – but of course the transpose of a scalar is itself!

  2. \((\mathbf{u}+\mathbf{v})^T\mathbf{w} = \mathbf{u}^T\mathbf{w} + \mathbf{v}^T\mathbf{w}\)

  3. \((c\mathbf{u})^T\mathbf{v} = c(\mathbf{u}^T\mathbf{v}) = \mathbf{u}^T(c\mathbf{v})\)

Inner product is linear in each term.

  4. \(\mathbf{u}^T\mathbf{u} \geq 0,\;\;\;\mbox{and}\;\mathbf{u}^T\mathbf{u} = 0\;\mbox{if and only if}\;\mathbf{u} = \mathbf{0}\)

Inner product of a vector with itself is never negative, and it is zero only for the zero vector.

The first three are restatements of facts about matrix-vector products.

The last one is straightforward, but important.
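If you want to see these properties in action, here is a small NumPy sketch (illustrative only) that checks each one on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 5))  # three random vectors in R^5
c = 2.5

# 1. symmetry
assert np.isclose(u @ v, v @ u)
# 2. additivity in the first argument
assert np.isclose((u + v) @ w, u @ w + v @ w)
# 3. homogeneity in either argument
assert np.isclose((c * u) @ v, c * (u @ v))
assert np.isclose(u @ (c * v), c * (u @ v))
# 4. nonnegativity: u^T u >= 0
assert u @ u >= 0
print("all properties verified")
```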

Now, given that review, let’s start talking about geometry.

Vector Norm#

OK, armed with the inner product, let’s get started.

Our first question will be: How do we measure the length of a vector?

Let’s say we are in \(\mathbb{R}^2\). Then the length follows directly from the Pythagorean theorem:

_images/7dc3b43203e563b32023fd1e775f2255b89e9b2ad495f21f89b6489a35422047.png

What happens when we move to \(\mathbb{R}^3\)?

It turns out we need to apply the Pythagorean Theorem an additional time:

_images/69241dd70b063faec3fb5d1ecedeaead08fabd80cb0edb08c398a4b8eda41868.png

So the length of a vector in \(\mathbb{R}^n\) is

\[ \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} = \sqrt{\sum_{i=1}^n{v_i}^2}\]

Now let’s express this in a way that does not require writing the individual components of \(\mathbf{v}\).

We notice that the above expression for the length of \(\mathbf{v}\) is the same as:

\[\sqrt{\mathbf{v}^T\mathbf{v}}.\]

Notice that this is always defined because \(\mathbf{v}^T\mathbf{v}\) is nonnegative.

Length is such a fundamental concept that we introduce a special notation and name for it.

Definition. The norm of \(\mathbf{v}\) is the nonnegative scalar \(\Vert\mathbf{v}\Vert\) defined by

\[\Vert\mathbf{v}\Vert = \sqrt{\mathbf{v}^T\mathbf{v}} = \sqrt{\sum_{i=1}^n{v_i}^2}.\]
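In NumPy, the norm can be computed directly from this definition, or with the built-in np.linalg.norm (a sketch, using an arbitrary example vector):

```python
import numpy as np

v = np.array([3.0, 4.0])

print(np.sqrt(v @ v))     # 5.0, via the inner product
print(np.linalg.norm(v))  # 5.0, the built-in equivalent
```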

Normalization to Unit Length#

For any scalar \(c\), the length of \(c\mathbf{v}\) is \(|c|\) times the length of \(\mathbf{v}\). That is,

\[\Vert c\mathbf{v}\Vert = \vert c\vert\Vert\mathbf{v}\Vert.\]

So, for example, \(\Vert(-2)\mathbf{v}\Vert = 2\Vert \mathbf{v}\Vert\).

A vector of length 1 is called a unit vector.

If we divide a nonzero vector \(\mathbf{v}\) by its length – that is, multiply by \(1/\Vert\mathbf{v}\Vert\) – we obtain a unit vector \(\mathbf{u}\).

We say that we have normalized \(\mathbf{v}\), and that \(\mathbf{u}\) is in the same direction as \(\mathbf{v}.\)

Example. Let \(\mathbf{v} = \begin{bmatrix}1\\-2\\2\\0\end{bmatrix}.\) Find the unit vector \(\mathbf{u}\) in the same direction as \(\mathbf{v}.\)

Solution.

First, compute the length of \(\mathbf{v}\):

\[\Vert\mathbf{v}\Vert^2 = \mathbf{v}^T\mathbf{v} = (1)^2 + (-2)^2 + (2)^2 + (0)^2 = 9\]
\[\Vert\mathbf{v}\Vert = \sqrt{9} = 3\]

Then multiply \(\mathbf{v}\) by \(1/\Vert\mathbf{v}\Vert\) to obtain

\[\begin{split}\mathbf{u} = \frac{1}{\Vert\mathbf{v}\Vert}\mathbf{v} = \frac{1}{3}\mathbf{v} = \frac{1}{3}\begin{bmatrix}1\\-2\\2\\0\end{bmatrix} = \begin{bmatrix}1/3\\-2/3\\2/3\\0\end{bmatrix}\end{split}\]
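The same computation as a NumPy sketch:

```python
import numpy as np

v = np.array([1.0, -2.0, 2.0, 0.0])
u = v / np.linalg.norm(v)  # divide v by its length

print(u)                  # [ 0.333... -0.667...  0.667...  0. ]
print(np.linalg.norm(u))  # 1.0, confirming u is a unit vector
```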

It’s important to note that we can’t actually visualize \(\mathbf{u}\), but we can still reason geometrically about it as a unit vector.

For example, we can talk about (2D) circles, (3D) spheres, four-dimensional spheres, five-dimensional spheres, etc.

Distance#

It’s very useful to be able to talk about the distance between two points (or vectors) in \(\mathbb{R}^n\).

We can start from basics:

_images/952f81d8a91359d35745f3697e55c37cc7e834c9eccc7439f2bbf266fee29fa1.png

On the number line (i.e., \(\mathbb{R}^1\)), the distance between two points \(a\) and \(b\) is \(\vert a-b\vert\).

The same is true in \(\mathbb{R}^n\).

Definition. For \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^n,\) the distance between \(\mathbf{u}\) and \(\mathbf{v}\), written as \(\mbox{dist}(\mathbf{u},\mathbf{v}),\) is the length of the vector \(\mathbf{u}-\mathbf{v}\). That is,

\[\mbox{dist}(\mathbf{u},\mathbf{v}) = \Vert\mathbf{u}-\mathbf{v}\Vert.\]

This definition agrees with the usual formula for the Euclidean distance between two points, which is

\[\mbox{dist}(\mathbf{u},\mathbf{v}) = \sqrt{(u_1-v_1)^2 + (u_2-v_2)^2 + \dots + (u_n-v_n)^2}.\]

You can see that this is equal to

\[\begin{split}\Vert\mathbf{u}-\mathbf{v}\Vert = \sqrt{(\mathbf{u}-\mathbf{v})^T(\mathbf{u}-\mathbf{v})} = \sqrt{\begin{bmatrix}u_1-v_1&u_2-v_2&\dots&u_n-v_n\end{bmatrix}\begin{bmatrix}u_1-v_1\\u_2-v_2\\\vdots\\u_n-v_n\end{bmatrix}}\end{split}\]

There is a geometric view as well.

For example, consider the vectors \(\mathbf{u} = \begin{bmatrix}7\\1\end{bmatrix}\) and \(\mathbf{v} = \begin{bmatrix}3\\2\end{bmatrix}\) in \(\mathbb{R}^2\).

Then one can see that the distance from \(\mathbf{u}\) to \(\mathbf{v}\) is the same as the length of the vector \(\mathbf{u}-\mathbf{v}\).

_images/db1b6afce8ad67496b7908328aa4a3184946a918419a2ae8fe61a78a9ab1b9c6.png

This shows that the distance between two vectors is the length of their difference.
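In code, using the vectors above (a sketch):

```python
import numpy as np

u = np.array([7.0, 1.0])
v = np.array([3.0, 2.0])

# dist(u, v) = ||u - v|| = ||(4, -1)|| = sqrt(17)
print(np.linalg.norm(u - v))  # 4.123...
```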

Question Time! Q20.1

Orthogonality#

Now we turn to another familiar notion from 2D geometry, which we’ll generalize to \(\mathbb{R}^n\): the notion of being perpendicular.

Once again, we seek a way to express perpendicularity of two vectors, regardless of the dimension they live in.

We will say that two vectors are perpendicular if they form a right angle at the origin.

_images/d9f78400e349da05fd8b23a9ba74bb5c4acd3cc9dc26ff6079c71f90e7f9e4fa.png

Draw the line segment connecting the endpoints of \(\mathbf{u}\) and \(\mathbf{v}\).

Then \(\mathbf{u}\) and \(\mathbf{v}\) are perpendicular if and only if they make a right triangle with the origin.

So \(\mathbf{u}\) and \(\mathbf{v}\) are perpendicular if and only if the Pythagorean Theorem is satisfied for this triangle.

_images/95fcb0608059d206797d536e4f7ca1b9c68040a77a11dd663b49154caf08d7c1.png

What is the length of the red side of the triangle?

According to the definitions we’ve developed today, it is \(\Vert \mathbf{u} - \mathbf{v} \Vert\).

_images/8f2ad2198aff4838243f5bc31dc5caa805395f4f8d7805db23261f7816baf244.png

So the blue and green lines are perpendicular if and only if:

\[ \Vert \mathbf{u} - \mathbf{v} \Vert^2 = \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2\]

Let’s see what this implies from an algebraic standpoint.

First let’s simplify the expression for squared distance from \(\mathbf{u}\) to \(\mathbf{v}\):

\[[\mbox{dist}(\mathbf{u},\mathbf{v})]^2 = \Vert\mathbf{u}-\mathbf{v}\Vert^2\]
\[ = (\mathbf{u}-\mathbf{v})^T(\mathbf{u}-\mathbf{v})\]
\[ = (\mathbf{u}^T-\mathbf{v}^T)(\mathbf{u}-\mathbf{v})\]
\[ = \mathbf{u}^T(\mathbf{u}-\mathbf{v}) - \mathbf{v}^T(\mathbf{u}-\mathbf{v})\]
\[ = \mathbf{u}^T\mathbf{u} - \mathbf{u}^T\mathbf{v} - \mathbf{v}^T\mathbf{u} + \mathbf{v}^T\mathbf{v}\]

Now, remember that the inner product is symmetric, i.e., \(\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}\), so

\[ = \mathbf{u}^T\mathbf{u} + \mathbf{v}^T\mathbf{v} - 2\mathbf{u}^T\mathbf{v}\]
\[ = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}\]
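As a quick numeric sanity check of this identity (a sketch on random vectors, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.standard_normal((2, 4))  # two random vectors in R^4

lhs = np.linalg.norm(u - v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2 - 2 * (u @ v)
print(np.isclose(lhs, rhs))  # True
```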

Now, let’s go back to the Pythagorean Theorem.

\( \mathbf{u}\) and \( \mathbf{v} \) are perpendicular if and only if:

\[ \Vert \mathbf{u} - \mathbf{v} \Vert^2 = \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2\]

But we’ve seen that this means:

\[ \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}= \Vert \mathbf{u} \Vert^2 + \Vert \mathbf{v} \Vert^2\]

Canceling \(\Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2\) from both sides, this holds if and only if \(\mathbf{u}^T\mathbf{v} = 0.\)

So now we can define perpendicularity in \(\mathbb{R}^n\):

Definition. Two vectors \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^n\) are orthogonal to each other if \(\mathbf{u}^T\mathbf{v} = 0.\)

As you can see, we have introduced a new term for this notion: orthogonal.

So when we are referring to vectors, orthogonal means the same thing as perpendicular.
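In code, testing orthogonality is just testing whether an inner product is (numerically) zero. A sketch:

```python
import numpy as np

u = np.array([2.0, 1.0, -1.0])
v = np.array([1.0, 0.0, 2.0])

# u^T v = 2*1 + 1*0 + (-1)*2 = 0
print(np.isclose(u @ v, 0.0))  # True: u and v are orthogonal
```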

Question Time! Q20.2

The Angle Between Two Vectors#

There is an important connection between the inner product of two vectors and the angle between them.

This connection is very useful (e.g., in thinking about data mining operations).

We start from the law of cosines:

\[ c^2 = a^2 + b^2 - 2ab\cos\theta\]
_images/705b22831f24f444071064c48f9123fc7bb64675caad8a4c3c44623d9b13e3b2.png

Now let’s interpret this law in terms of vectors \(\mathbf{u}\) and \(\mathbf{v}\).

Once again, it is the angle that these vectors make at the origin that we are concerned with:

_images/dd33b285b1a14c43ed92c4b78abe6e996cbb6eb7569aab04f94081802e51fc42.png

Applying the law of cosines we get:

\[\Vert\mathbf{u}-\mathbf{v}\Vert^2 = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta\]

Now, previously we calculated that:

\[ \Vert\mathbf{u}-\mathbf{v}\Vert^2 = (\mathbf{u}-\mathbf{v})^T(\mathbf{u}-\mathbf{v})\]
\[ = \Vert\mathbf{u}\Vert^2 + \Vert\mathbf{v}\Vert^2 - 2\mathbf{u}^T\mathbf{v}\]

Which means that

\[ 2\mathbf{u}^T\mathbf{v} = 2\Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta\]

So

\[ \mathbf{u}^T\mathbf{v} = \Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta\]

This is a very important connection between the notion of inner product and trigonometry.

As a quick check, note that if \(\mathbf{u}\) and \(\mathbf{v}\) are nonzero, and \(\mathbf{u}^T\mathbf{v} = 0\), then \(\cos\theta = 0.\)

In other words, the angle between \(\mathbf{u}\) and \(\mathbf{v}\) is 90 degrees (the angle between two vectors is always taken to be between 0 and 180 degrees). So this agrees with our definition of orthogonality.

One implication in particular concerns unit vectors.

\[ \mathbf{u}^T\mathbf{v} = \Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert\cos\theta\]

So

\[ \frac{\mathbf{u}^T\mathbf{v}}{\Vert\mathbf{u}\Vert\Vert\mathbf{v}\Vert} = \cos\theta\]
\[ \frac{\mathbf{u}^T}{\Vert\mathbf{u}\Vert}\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert} = \cos\theta\]

Note that \(\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert}\) and \(\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert}\) are unit vectors.

So we have a very simple rule: for two unit vectors, the inner product is the cosine of the angle between them!

_images/514ad5e82bc13755c80fe3b4e7fcc116e2613f4e0993b694aec61347a9ee96d8.png

Here \(\mathbf{u} = \begin{bmatrix}0.95\\0.31\end{bmatrix},\) and \(\mathbf{v} = \begin{bmatrix}0.20\\0.98\end{bmatrix}.\)

So \(\mathbf{u}^T\mathbf{v} = (0.95\cdot 0.20) + (0.31 \cdot 0.98) \approx 0.5\)

So \(\cos\theta \approx 0.5.\)

So \(\theta \approx 60\) degrees.

Example. Find the angle formed by the vectors:

\[\begin{split}\mathbf{u} = \begin{bmatrix}1\\3\\-7\\-2\end{bmatrix} \;\;\mbox{and}\;\; \mathbf{v} = \begin{bmatrix}8\\-2\\4\\6\end{bmatrix}\end{split}\]

Solution.

First, compute the lengths of the vectors:

\[\Vert\mathbf{u}\Vert = \sqrt{1^2 + 3^2 + (-7)^2 + (-2)^2} = \sqrt{63} \approx 7.94 \]
\[\Vert\mathbf{v}\Vert = \sqrt{8^2 + (-2)^2 + 4^2 + 6^2} = \sqrt{120} \approx 10.95 \]

So

\[\begin{split}\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert} \approx \begin{bmatrix}0.13\\0.38\\-0.88\\-0.25\end{bmatrix} \;\;\mbox{and}\;\;\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert} \approx \begin{bmatrix}0.73\\-0.18\\0.36\\0.54\end{bmatrix}\end{split}\]

Then calculate the cosine of the angle between them:

\[\cos\theta = \frac{\mathbf{u}^T}{\Vert\mathbf{u}\Vert}\frac{\mathbf{v}}{\Vert\mathbf{v}\Vert}\]
\[ \approx (0.13\cdot0.73)+(0.38\cdot (-0.18))+(-0.88\cdot 0.36)+(-0.25\cdot0.54)\]
\[\approx -0.44\]

Then:

\[\theta = \cos^{-1}(-0.44)\]
\[\approx 116\;\mbox{degrees.}\]
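We can reproduce this with a short NumPy sketch (note that np.arccos returns radians; this computation carries full precision rather than the rounded values above):

```python
import numpy as np

u = np.array([1.0, 3.0, -7.0, -2.0])
v = np.array([8.0, -2.0, 4.0, 6.0])

cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.degrees(np.arccos(cos_theta))

print(cos_theta)  # about -0.437
print(theta)      # about 115.9 degrees
```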

Example Application: Cosine Similarity#

A typical example where these techniques are valuable arises in data science.

Let’s say we are given two documents \(d_1\) and \(d_2\). Each is a collection of “words.”

For a goal such as information retrieval, we’d like to know whether the two documents are “similar”.

One common way to formalize similarity is to say that documents are similar if the sets of words they contain are similar.

We can measure this as follows: Construct a vector \(\mathbf{x}_1\) that counts the frequency of occurrence of certain words in \(d_1\):

\[\begin{split} \begin{bmatrix} \mbox{dog}\\ \mbox{car} \\ \mbox{house} \\ \vdots \\ \mbox{door} \end{bmatrix} \;\; \rightarrow \;\; \begin{bmatrix} 3\\ 7 \\ 4 \\ \vdots \\ 20 \end{bmatrix} \end{split}\]

Do the same thing for \(d_2\), yielding \(\mathbf{x}_2\).

What we have done is to take individual documents and represent them as vectors \(\mathbf{x}_1\) and \(\mathbf{x}_2\) in a very high dimensional space, \(\mathbb{R}^n\).

Here \(n\) could be 10,000 or more.

Now, to ask whether the two documents are similar, we simply ask “do their vectors point in the same general direction?”

And, despite the fact that we are in a very high dimensional space which we cannot visualize, we can nonetheless answer the question easily.

The angle between the two document-vectors is simply:

\[\theta_{1,2} = \cos^{-1}\left(\frac{\mathbf{x}_1^T}{\Vert\mathbf{x}_1\Vert}\frac{\mathbf{x}_2}{\Vert\mathbf{x}_2\Vert}\right)\]
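In code, this is usually packaged as a cosine similarity function. Here is a minimal sketch; the helper and the word-count vectors are made up purely for illustration:

```python
import numpy as np

def cosine_similarity(x1, x2):
    """Cosine of the angle between two nonzero vectors."""
    return (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))

# hypothetical word-count vectors for two documents
x1 = np.array([3.0, 7.0, 4.0, 20.0])
x2 = np.array([2.0, 9.0, 0.0, 18.0])

sim = cosine_similarity(x1, x2)
theta = np.degrees(np.arccos(sim))
print(sim)    # about 0.97: the vectors point in nearly the same direction
print(theta)  # about 13 degrees
```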