Linear Algebra – Orthogonality and least squares – Orthogonal projections
Theorem: If \(W\) is a subspace of \(\mathbb{R}^n\), then each vector \(\mathbf{y}\in\mathbb{R}^n\) can be written uniquely in the form \(\mathbf{y}=\hat{\mathbf{y}}+\mathbf{z}\), where \(\hat{\mathbf{y}}\in W\) and \(\mathbf{z}\in W^{\perp}\). If \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is an orthogonal basis of \(W\), then we have:
\[\hat{\mathbf{y}}=\text{proj}_{W}\mathbf{y}=\left(\frac{\mathbf{y}\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1+\cdots +\left(\frac{\mathbf{y}\cdot\mathbf{v}_p}{\mathbf{v}_p\cdot\mathbf{v}_p}\right)\mathbf{v}_p.\]This is the (orthogonal) projection of \(\mathbf{y}\) onto \(W\).
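As a minimal numerical sketch of this formula (the vectors \(\mathbf{v}_1\), \(\mathbf{v}_2\), and \(\mathbf{y}\) below are chosen purely for illustration), the projection can be computed term by term with NumPy:

```python
import numpy as np

def proj_onto(y, basis):
    """Orthogonal projection of y onto span(basis); the basis vectors
    are assumed to be mutually orthogonal and nonzero."""
    return sum((y @ v) / (v @ v) * v for v in basis)

# Illustrative subspace W = span{v1, v2} in R^3 with v1 ⊥ v2
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
y = np.array([2.0, 3.0, 5.0])

y_hat = proj_onto(y, [v1, v2])
z = y - y_hat
# y_hat is [2. 3. 0.]; z = y - y_hat is orthogonal to both basis vectors
print(y_hat, z @ v1, z @ v2)
```

Here \(\mathbf{z}=\mathbf{y}-\hat{\mathbf{y}}\) lies in \(W^{\perp}\), exactly as the theorem asserts.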
Proof: Suppose that \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is any orthogonal basis of \(W\) and that \(\hat{\mathbf{y}}\) is defined as in the theorem, then we have: \(\hat{\mathbf{y}}\in W\). Now let \(\mathbf{z}=\mathbf{y}-\hat{\mathbf{y}}\), then we have:
\[\mathbf{z}\cdot\mathbf{v}_i=\left(\mathbf{y}-\hat{\mathbf{y}}\right)\cdot\mathbf{v}_i=\mathbf{y}\cdot\mathbf{v}_i -\left(\frac{\mathbf{y}\cdot\mathbf{v}_i}{\mathbf{v}_i\cdot\mathbf{v}_i}\right)\mathbf{v}_i\cdot\mathbf{v}_i =\mathbf{y}\cdot\mathbf{v}_i-\mathbf{y}\cdot\mathbf{v}_i=0,\quad i=1,2,\ldots,p.\]This implies that \(\mathbf{z}\perp\mathbf{v}_i\) for all \(i=1,2,\ldots,p\) and therefore: \(\mathbf{z}\in W^{\perp}\).
In order to show that the decomposition is unique, suppose that also \(\mathbf{y}=\tilde{\mathbf{y}}+\mathbf{w}\) with \(\tilde{\mathbf{y}}\in W\) and \(\mathbf{w}\in W^{\perp}\). Then we have: \(\hat{\mathbf{y}}+\mathbf{z}=\tilde{\mathbf{y}}+\mathbf{w}\) and therefore \(\hat{\mathbf{y}}-\tilde{\mathbf{y}}=\mathbf{w}-\mathbf{z}\). The left-hand side is clearly a vector in \(W\), while the right-hand side is a vector in \(W^{\perp}\). Since the zero vector \(\mathbf{0}\) is the only vector that is both in \(W\) and in \(W^{\perp}\), this implies that \(\hat{\mathbf{y}}-\tilde{\mathbf{y}}=\mathbf{0}\) and \(\mathbf{w}-\mathbf{z}=\mathbf{0}\). Hence: \(\hat{\mathbf{y}}=\tilde{\mathbf{y}}\) and \(\mathbf{w}=\mathbf{z}\).
Theorem: Let \(W\) be a subspace of \(\mathbb{R}^n\) and \(\mathbf{y}\) a vector in \(\mathbb{R}^n\). Suppose that \(\hat{\mathbf{y}}\) is the orthogonal projection of \(\mathbf{y}\) onto \(W\). Then we have: \(\hat{\mathbf{y}}\) is the vector in \(W\) that is closest to \(\mathbf{y}\), which means that \(||\mathbf{y}-\hat{\mathbf{y}}|| < ||\mathbf{y}-\mathbf{w}||\) for all \(\mathbf{w}\in W\) unequal to \(\hat{\mathbf{y}}\).
The vector \(\hat{\mathbf{y}}\) is called the best approximation of \(\mathbf{y}\) by elements of \(W\).
Proof: Let \(\mathbf{w}\) be a vector in \(W\) unequal to \(\hat{\mathbf{y}}\). Then we have: \(\hat{\mathbf{y}}-\mathbf{w}\in W\). Note that \(\mathbf{y}-\hat{\mathbf{y}}\in W^{\perp}\) and so also: \(\mathbf{y}-\hat{\mathbf{y}}\perp\hat{\mathbf{y}}-\mathbf{w}\). Now we have: \(\mathbf{y}-\mathbf{w}=(\mathbf{y}-\hat{\mathbf{y}})+(\hat{\mathbf{y}}-\mathbf{w})\). However then Pythagoras' theorem implies that \(||\mathbf{y}-\mathbf{w}||^2=||\mathbf{y}-\hat{\mathbf{y}}||^2+||\hat{\mathbf{y}}-\mathbf{w}||^2\). Since \(\mathbf{w}\) is unequal to \(\hat{\mathbf{y}}\), this implies that \(||\mathbf{y}-\hat{\mathbf{y}}|| < ||\mathbf{y}-\mathbf{w}||\).
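The best-approximation property can be checked numerically; in this sketch (subspace, basis, and test point chosen for illustration) the projection is compared against many random vectors in \(W\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative subspace W = span{v1, v2} with an orthogonal basis
v1 = np.array([1.0, 0.0, -1.0])
v2 = np.array([1.0, -2.0, 1.0])
y = np.array([1.0, 2.0, 3.0])

# Orthogonal projection of y onto W
y_hat = (y @ v1) / (v1 @ v1) * v1 + (y @ v2) / (v2 @ v2) * v2

# No other vector w in W comes closer to y than y_hat does
for _ in range(1000):
    c1, c2 = rng.normal(size=2)
    w = c1 * v1 + c2 * v2
    assert np.linalg.norm(y - y_hat) <= np.linalg.norm(y - w) + 1e-12
```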
This can be used to determine the distance of a point to a line or a plane in \(\mathbb{R}^3\) by finding the (orthogonal) projection first and then the distance to that (orthogonal) projection.
Examples:
1) Consider the point \(P=(1,-2,1)\) and the plane \(V\) given by the equation \(x_1+3x_2-2x_3=0\) in \(\mathbb{R}^3\). Note that \(V^{\perp}\) is the line spanned by the vector \(\mathbf{v}=\begin{pmatrix}1\\3\\-2\end{pmatrix}\). The (orthogonal) projection of \(\mathbf{y}=\begin{pmatrix}1\\-2\\1\end{pmatrix}\) onto \(V^{\perp}\) is:
\[\text{proj}_{V^{\perp}}\mathbf{y}=\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v} =\frac{1-6-2}{1+9+4}\mathbf{v}=-\frac{7}{14}\begin{pmatrix}1\\3\\-2\end{pmatrix}=-\frac{1}{2}\begin{pmatrix}1\\3\\-2\end{pmatrix}.\]Then the distance of \(P\) to the plane \(V\) is \(\frac{1}{2}||\begin{pmatrix}1\\3\\-2\end{pmatrix}||=\frac{1}{2}\sqrt{1+9+4}=\frac{1}{2}\sqrt{14}\).
2) Consider the point \(P=(1,-2,1)\) and the line \(\ell\) spanned by the vector \(\mathbf{v}=\begin{pmatrix}1\\3\\-2\end{pmatrix}\). The (orthogonal) projection of \(\mathbf{y}=\begin{pmatrix}1\\-2\\1\end{pmatrix}\) onto the line \(\ell\) is then \(\hat{\mathbf{y}}=-\frac{1}{2}\begin{pmatrix}1\\3\\-2\end{pmatrix}\). This implies: \(\mathbf{y}-\hat{\mathbf{y}}=\frac{1}{2}\begin{pmatrix}3\\-1\\0\end{pmatrix}\). Then the distance of \(P\) to the line \(\ell\) is \(||\mathbf{y}-\hat{\mathbf{y}}||=\frac{1}{2}||\begin{pmatrix}3\\-1\\0\end{pmatrix}||=\frac{1}{2}\sqrt{9+1+0}=\frac{1}{2}\sqrt{10}\).
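Both distance computations above can be reproduced in a few lines of NumPy, as a sanity check:

```python
import numpy as np

y = np.array([1.0, -2.0, 1.0])   # the point P
v = np.array([1.0, 3.0, -2.0])   # normal of the plane V, direction of the line ℓ

# Distance from P to the plane V: length of the projection of y onto v
proj_on_v = (y @ v) / (v @ v) * v
print(np.linalg.norm(proj_on_v))       # sqrt(14)/2 ≈ 1.8708

# Distance from P to the line ℓ: length of y minus its projection onto v
print(np.linalg.norm(y - proj_on_v))   # sqrt(10)/2 ≈ 1.5811
```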
Theorem: If \(\{\mathbf{u}_1,\ldots,\mathbf{u}_p\}\) is an orthonormal basis of a subspace \(W\) of \(\mathbb{R}^n\) and \(\mathbf{y}\) is a vector in \(\mathbb{R}^n\), then we have:
\[\text{proj}_W\mathbf{y}=(\mathbf{y}\cdot\mathbf{u}_1)\mathbf{u}_1+\cdots+(\mathbf{y}\cdot\mathbf{u}_p)\mathbf{u}_p.\]If \(U=\Bigg(\mathbf{u}_1\;\ldots\;\mathbf{u}_p\Bigg)\), then we have:
\[\text{proj}_W\mathbf{y}=UU^T\mathbf{y}\quad\text{for all}\quad\mathbf{y}\in\mathbb{R}^n.\]Proof: The first part immediately follows from the first theorem since \(\mathbf{u}_i\cdot\mathbf{u}_i=1\) for \(i=1,2,\ldots,p\). For the second part we have:
\[\text{proj}_W\mathbf{y}=(\mathbf{y}\cdot\mathbf{u}_1)\mathbf{u}_1+\cdots+(\mathbf{y}\cdot\mathbf{u}_p)\mathbf{u}_p =(\mathbf{u}_1\cdot\mathbf{y})\mathbf{u}_1+\cdots+(\mathbf{u}_p\cdot\mathbf{y})\mathbf{u}_p =(\mathbf{u}_1^T\mathbf{y})\mathbf{u}_1+\cdots+(\mathbf{u}_p^T\mathbf{y})\mathbf{u}_p.\]This is a linear combination of the columns of \(U\) and the weights are the elements of the vector \(U^T\mathbf{y}\).
The matrix \(UU^T\) is called a projection matrix. This is symmetric, since: \((UU^T)^T=(U^T)^TU^T=UU^T\). Moreover we have that \(U^TU=I\).
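These properties of the projection matrix can be illustrated numerically. In the sketch below, an orthonormal basis of a subspace of \(\mathbb{R}^4\) is obtained from the QR factorization of two arbitrarily chosen spanning vectors (the matrix \(A\) is illustrative):

```python
import numpy as np

# Two independent spanning vectors of a 2-dimensional subspace W of R^4
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])
U, _ = np.linalg.qr(A)      # columns of U form an orthonormal basis of W

P = U @ U.T                 # projection matrix onto W

print(np.allclose(U.T @ U, np.eye(2)))   # U^T U = I
print(np.allclose(P, P.T))               # P is symmetric
print(np.allclose(P @ P, P))             # projecting twice changes nothing
```

All three checks print `True`; the last one reflects the fact that \(\hat{\mathbf{y}}\) already lies in \(W\), so projecting it again leaves it unchanged.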
Example: Consider the projection onto the plane \(V\) given by the equation \(x_1+x_2+x_3=0\). The orthogonal complement of \(V\) is the line spanned by the vector \(\mathbf{v}=\begin{pmatrix}1\\1\\1\end{pmatrix}\). The (orthogonal) projection of a vector \(\mathbf{y}=\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}\) onto that line is then
\[\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v}=\frac{y_1+y_2+y_3}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}.\]This implies that the (orthogonal) projection of \(\mathbf{y}\) onto the plane \(V\) is equal to
\[\text{proj}_V\mathbf{y}=\mathbf{y}-\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v} =\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}-\frac{y_1+y_2+y_3}{3}\begin{pmatrix}1\\1\\1\end{pmatrix} =\frac{1}{3}\begin{pmatrix}2y_1-y_2-y_3\\-y_1+2y_2-y_3\\-y_1-y_2+2y_3\end{pmatrix} =\frac{1}{3}\begin{pmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{pmatrix}\mathbf{y}.\]Note that \(\{\mathbf{u}_1,\mathbf{u}_2\}\) with \(\mathbf{u}_1=\dfrac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\-1\end{pmatrix}\) and \(\mathbf{u}_2=\dfrac{1}{\sqrt{6}}\begin{pmatrix}1\\-2\\1\end{pmatrix}\) is an orthonormal basis of \(V\). Now suppose that \(U=\Bigg(\mathbf{u}_1\;\mathbf{u}_2\Bigg)\), then we have:
\[U=\frac{1}{\sqrt{6}}\begin{pmatrix}\sqrt{3}&1\\0&-2\\-\sqrt{3}&1\end{pmatrix}\quad\Longrightarrow\quad UU^T=\frac{1}{6}\begin{pmatrix}\sqrt{3}&1\\0&-2\\-\sqrt{3}&1\end{pmatrix}\begin{pmatrix}\sqrt{3}&0&-\sqrt{3}\\1&-2&1\end{pmatrix} =\frac{1}{6}\begin{pmatrix}4&-2&-2\\-2&4&-2\\-2&-2&4\end{pmatrix}=\frac{1}{3}\begin{pmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{pmatrix}.\]However \(\{\mathbf{u}_1,\mathbf{u}_2\}\) with \(\mathbf{u}_1=\dfrac{1}{\sqrt{2}}\begin{pmatrix}0\\-1\\1\end{pmatrix}\) and \(\mathbf{u}_2=\dfrac{1}{\sqrt{6}}\begin{pmatrix}-2\\1\\1\end{pmatrix}\) is also an orthonormal basis of \(V\). Now suppose that \(U=\Bigg(\mathbf{u}_1\;\mathbf{u}_2\Bigg)\), then we have:
\[U=\frac{1}{\sqrt{6}}\begin{pmatrix}0&-2\\-\sqrt{3}&1\\\sqrt{3}&1\end{pmatrix}\quad\Longrightarrow\quad UU^T=\frac{1}{6}\begin{pmatrix}0&-2\\-\sqrt{3}&1\\\sqrt{3}&1\end{pmatrix}\begin{pmatrix}0&-\sqrt{3}&\sqrt{3}\\-2&1&1\end{pmatrix} =\frac{1}{6}\begin{pmatrix}4&-2&-2\\-2&4&-2\\-2&-2&4\end{pmatrix}=\frac{1}{3}\begin{pmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{pmatrix}.\]So both orthonormal bases yield the same projection matrix \(UU^T\).
Last modified on May 1, 2021