Linear Algebra – Orthogonality and least squares – The Gram-Schmidt process
For the computation of the orthogonal projection of a vector \(\mathbf{y}\) onto a subspace \(W\) of \(\mathbb{R}^n\), an orthogonal basis \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) of \(W\) is highly desirable. The orthogonal projection is then equal to the sum of the orthogonal projections onto the (orthogonal) basis vectors:
\[\text{proj}_W\mathbf{y}=\left(\frac{\mathbf{y}\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1+\cdots+ \left(\frac{\mathbf{y}\cdot\mathbf{v}_p}{\mathbf{v}_p\cdot\mathbf{v}_p}\right)\mathbf{v}_p.\]A natural question is now: how can we construct an orthogonal basis from an arbitrary (non-orthogonal) basis of \(W\)? This can be done using the Gram-Schmidt (orthogonalization) process, given in the following theorem.
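As a side note, this projection formula translates directly into code. The following is a minimal Python sketch (not part of the original notes; the function names are illustrative), using exact rational arithmetic so the fractions come out exactly as in the formula:

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def proj(y, orthogonal_basis):
    """Orthogonal projection of y onto Span(orthogonal_basis).

    The basis vectors are assumed to be mutually orthogonal."""
    p = [Fraction(0)] * len(y)
    for v in orthogonal_basis:
        c = Fraction(dot(y, v), dot(v, v))   # coefficient (y.v)/(v.v)
        p = [pi + c * vi for pi, vi in zip(p, v)]
    return p
```

For example, projecting \((0,1,2,-1)\) onto the span of \((1,-1,0,1)\) gives \(-\tfrac{2}{3}(1,-1,0,1)\).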
Theorem: Suppose that \(\{\mathbf{x}_1,\ldots,\mathbf{x}_p\}\) is any basis of a subspace \(W\) of \(\mathbb{R}^n\). Then define:
\begin{align*} \mathbf{v}_1&=\mathbf{x}_1\\[2.5mm] \mathbf{v}_2&=\mathbf{x}_2-\left(\frac{\mathbf{x}_2\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1\\[2.5mm] \mathbf{v}_3&=\mathbf{x}_3-\left(\frac{\mathbf{x}_3\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1 -\left(\frac{\mathbf{x}_3\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\right)\mathbf{v}_2\\[2.5mm] &\;\;\vdots\\[2.5mm] \mathbf{v}_p&=\mathbf{x}_p-\left(\frac{\mathbf{x}_p\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1 -\left(\frac{\mathbf{x}_p\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\right)\mathbf{v}_2 -\cdots-\left(\frac{\mathbf{x}_p\cdot\mathbf{v}_{p-1}}{\mathbf{v}_{p-1}\cdot\mathbf{v}_{p-1}}\right)\mathbf{v}_{p-1}. \end{align*}Then we have: \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is an orthogonal basis of \(W\).
Remark: In each step the orthogonal projections of the basis vector \(\mathbf{x}_k\) onto the new (orthogonal) basis vectors \(\mathbf{v}_1,\ldots,\mathbf{v}_{k-1}\) are subtracted from \(\mathbf{x}_k\). The resulting vector \(\mathbf{v}_k\) is then perpendicular to all vectors \(\mathbf{v}_1,\ldots,\mathbf{v}_{k-1}\), which are already (mutually) orthogonal. Moreover we have that \(\text{Span}\{\mathbf{v}_1,\ldots,\mathbf{v}_k\}=\text{Span}\{\mathbf{x}_1,\ldots,\mathbf{x}_k\}\) for every \(k\).
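The procedure in the theorem is straightforward to implement. Here is a minimal Python sketch (an illustration, not part of the original notes), again with exact rational arithmetic:

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(xs):
    """Turn a basis xs into an orthogonal basis of the same subspace."""
    vs = []
    for x in xs:
        v = [Fraction(c) for c in x]
        for w in vs:
            # subtract the orthogonal projection of x onto w
            c = dot(x, w) / dot(w, w)
            v = [vi - c * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs
```

Applied to the example below, `gram_schmidt` reproduces \(\mathbf{v}_2=\frac{1}{3}(2,1,6,-1)\) and \(\mathbf{v}_3=\frac{1}{7}(2,8,-1,6)\) exactly.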
An orthogonal basis can easily be converted to an orthonormal basis by normalizing (or: scaling) every basis vector: if \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is an orthogonal basis of \(W\), then \(\{\mathbf{u}_1,\ldots,\mathbf{u}_p\}\) with \(\mathbf{u}_i=\displaystyle\frac{1}{||\mathbf{v}_i||}\mathbf{v}_i\) for \(i=1,2,\ldots,p\) is an orthonormal basis of \(W\).
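In code, the normalization step is a short sketch of its own (names illustrative):

```python
import math

def normalize(v):
    """Scale a nonzero vector v to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]
```

For instance, `normalize([1, -1, 0, 1])` returns the vector \(\frac{1}{\sqrt{3}}(1,-1,0,1)\) in floating point.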
Example: Suppose that \(W=\text{Span}\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4\}\) with \(\mathbf{x}_1=\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\), \(\mathbf{x}_2=\begin{pmatrix}0\\1\\2\\-1\end{pmatrix}\), \(\mathbf{x}_3=\begin{pmatrix}1\\1\\1\\1\end{pmatrix}\) and \(\mathbf{x}_4=\begin{pmatrix}2\\1\\3\\1\end{pmatrix}\). So \(W\) is a subspace of \(\mathbb{R}^4\). We first determine a basis of \(W\):
\[\begin{pmatrix}1&0&1&2\\-1&1&1&1\\0&2&1&3\\1&-1&1&1\end{pmatrix}\sim\begin{pmatrix}1&0&1&2\\0&1&2&3\\0&2&1&3\\0&-1&0&-1\end{pmatrix} \sim\begin{pmatrix}1&0&1&2\\0&1&2&3\\0&0&-3&-3\\0&0&2&2\end{pmatrix}\sim\begin{pmatrix}1&0&1&2\\0&1&2&3\\0&0&1&1\\0&0&0&0\end{pmatrix}.\]This implies that \(\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\}\) is a basis of \(W\). Now we apply the Gram-Schmidt process:
\begin{align*} \mathbf{v}_1&=\mathbf{x}_1=\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\\[2.5mm] \mathbf{v}_2&=\mathbf{x}_2-\left(\frac{\mathbf{x}_2\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1 =\begin{pmatrix}0\\1\\2\\-1\end{pmatrix}-\frac{\begin{pmatrix}0&1&2&-1\end{pmatrix}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}}{\begin{pmatrix}1&-1&0&1\end{pmatrix}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\\[2.5mm] &=\begin{pmatrix}0\\1\\2\\-1\end{pmatrix}-\frac{0-1+0-1}{1+1+0+1}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix} =\begin{pmatrix}0\\1\\2\\-1\end{pmatrix}+\frac{2}{3}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix} =\frac{1}{3}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}\\[2.5mm] \mathbf{v}_3&=\mathbf{x}_3-\left(\frac{\mathbf{x}_3\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1 -\left(\frac{\mathbf{x}_3\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\right)\mathbf{v}_2=\begin{pmatrix}1\\1\\1\\1\end{pmatrix} -\frac{\begin{pmatrix}1&1&1&1\end{pmatrix}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}}{\begin{pmatrix}1&-1&0&1\end{pmatrix}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix} -\frac{\begin{pmatrix}1&1&1&1\end{pmatrix}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}}{\begin{pmatrix}2&1&6&-1\end{pmatrix}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}\\[2.5mm] &=\begin{pmatrix}1\\1\\1\\1\end{pmatrix}-\frac{1-1+0+1}{1+1+0+1}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}-\frac{2+1+6-1}{4+1+36+1}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix} =\begin{pmatrix}1\\1\\1\\1\end{pmatrix}-\frac{1}{3}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}-\frac{8}{42}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix} =\frac{1}{21}\begin{pmatrix}6\\24\\-3\\18\end{pmatrix}=\frac{1}{7}\begin{pmatrix}2\\8\\-1\\6\end{pmatrix}. 
\end{align*}Note that \(\displaystyle\left(\frac{\mathbf{x}_3\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\right)\mathbf{v}_2 =\left(\frac{\mathbf{x}_3\cdot3\mathbf{v}_2}{3\mathbf{v}_2\cdot3\mathbf{v}_2}\right)3\mathbf{v}_2\). Since the length or norm of the vectors is not important for the orthogonality, we conclude that
\[\{\mathbf{v}_1,3\mathbf{v}_2,7\mathbf{v}_3\}=\left\{\begin{pmatrix}1\\-1\\0\\1\end{pmatrix},\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}, \begin{pmatrix}2\\8\\-1\\6\end{pmatrix}\right\}\]is an orthogonal basis of \(W\).
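This conclusion is easy to check numerically; the following sketch verifies that all pairwise dot products of the scaled basis vectors vanish:

```python
def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

basis = [[1, -1, 0, 1], [2, 1, 6, -1], [2, 8, -1, 6]]
# every pair of distinct basis vectors should be orthogonal
pairwise = [dot(basis[i], basis[j])
            for i in range(len(basis))
            for j in range(i + 1, len(basis))]
```

All three entries of `pairwise` are zero, confirming orthogonality.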
Remark: We first determined that \(\mathbf{x}_4\in\text{Span}\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\}\), which implies that we did not need to use \(\mathbf{x}_4\) in the orthogonalization process. If we had done so, we would have obtained
\[\mathbf{v}_4=\mathbf{x}_4-\left(\frac{\mathbf{x}_4\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\right)\mathbf{v}_1 -\left(\frac{\mathbf{x}_4\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\right)\mathbf{v}_2 -\left(\frac{\mathbf{x}_4\cdot\mathbf{v}_3}{\mathbf{v}_3\cdot\mathbf{v}_3}\right)\mathbf{v}_3 =\begin{pmatrix}2\\1\\3\\1\end{pmatrix}-\frac{2}{3}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix} -\frac{11}{21}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}-\frac{1}{7}\begin{pmatrix}2\\8\\-1\\6\end{pmatrix}=\mathbf{0}.\]This confirms that \(\mathbf{x}_4\) is a linear combination of \(\mathbf{v}_1\), \(\mathbf{v}_2\) and \(\mathbf{v}_3\).
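This calculation can also be reproduced exactly in code (a Python sketch with illustrative names, using the orthogonal vectors \(\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\) found above):

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

x4 = [2, 1, 3, 1]
vs = [[1, -1, 0, 1],
      [Fraction(2, 3), Fraction(1, 3), 2, Fraction(-1, 3)],
      [Fraction(2, 7), Fraction(8, 7), Fraction(-1, 7), Fraction(6, 7)]]

# one more Gram-Schmidt step: subtract all projections from x4
v4 = list(map(Fraction, x4))
for w in vs:
    c = Fraction(dot(x4, w)) / dot(w, w)
    v4 = [vi - c * wi for vi, wi in zip(v4, w)]
```

The result `v4` is exactly the zero vector.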
By normalizing or scaling the vectors we can also determine an orthonormal basis of \(W\):
\[\{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3\}=\left\{\frac{1}{||\mathbf{v}_1||}\mathbf{v}_1,\frac{1}{||\mathbf{v}_2||}\mathbf{v}_2,\frac{1}{||\mathbf{v}_3||}\mathbf{v}_3\right\} =\left\{\frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix},\frac{1}{\sqrt{42}}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}, \frac{1}{\sqrt{105}}\begin{pmatrix}2\\8\\-1\\6\end{pmatrix}\right\}.\]The Gram-Schmidt process can be used to find a \(QR\) factorization of a matrix:
Theorem: If \(A\) is an \(m\times n\) matrix with linearly independent columns, then \(A\) can be written as \(A=QR\), where \(Q\) is an \(m\times n\) matrix whose columns form an orthonormal basis of \(\text{Col}(A)\) and \(R\) is an invertible \(n\times n\) upper triangular matrix.
Proof: Since the columns of \(A=\Bigg(\mathbf{a}_1\;\ldots\;\mathbf{a}_n\Bigg)\) are linearly independent, \(\{\mathbf{a}_1,\ldots,\mathbf{a}_n\}\) is a basis of \(\text{Col}(A)\). Using the Gram-Schmidt process we can construct from this an orthonormal basis \(\{\mathbf{u}_1,\ldots,\mathbf{u}_n\}\) of \(\text{Col}(A)\). Then we have for every \(k=1,2,\ldots,n\) that \(\text{Span}\{\mathbf{a}_1,\ldots,\mathbf{a}_k\}=\text{Span}\{\mathbf{u}_1,\ldots,\mathbf{u}_k\}\). So there exist scalars \(r_{1k},\ldots,r_{kk}\) such that
\[\mathbf{a}_k=r_{1k}\mathbf{u}_1+\cdots+r_{kk}\mathbf{u}_k+0\mathbf{u}_{k+1}+\cdots+0\mathbf{u}_n,\quad k=1,2,\ldots,n.\]Let \(Q=\Bigg(\mathbf{u}_1\;\ldots\;\mathbf{u}_n\Bigg)\), then we have that \(\mathbf{a}_k=Q\mathbf{r}_k\) with \(\mathbf{r}_k=\begin{pmatrix}r_{1k}\\\vdots\\r_{kk}\\0\\\vdots\\0\end{pmatrix}\) for \(k=1,2,\ldots,n\). Now suppose that \(R=\Bigg(\mathbf{r}_1\;\ldots\;\mathbf{r}_n\Bigg)\), then we have:
\[A=\Bigg(\mathbf{a}_1\;\ldots\;\mathbf{a}_n\Bigg)=\Bigg(Q\mathbf{r}_1\;\ldots\;Q\mathbf{r}_n\Bigg)=QR.\]Suppose that \(R\mathbf{x}=\mathbf{0}\), then we have that \(A\mathbf{x}=QR\mathbf{x}=Q\mathbf{0}=\mathbf{0}\). Since the columns of \(A\) are linearly independent, this implies that \(\mathbf{x}=\mathbf{0}\) and this proves that \(R\) is invertible.
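The constructive proof above can be followed step by step in code. The sketch below (an illustration under the stated assumptions, in floating-point arithmetic) builds the columns of \(Q\) with Gram-Schmidt plus normalization and fills \(R\) with the coefficients \(r_{ik}=\mathbf{u}_i\cdot\mathbf{a}_k\):

```python
import math

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def qr_gram_schmidt(cols):
    """QR factorization of the matrix whose columns are `cols`
    (assumed linearly independent). Returns (Q columns, R)."""
    us = []
    for a in cols:
        v = list(map(float, a))
        for u in us:
            # subtract the projection of a onto the unit vector u
            v = [vi - dot(a, u) * ui for vi, ui in zip(v, u)]
        n = math.sqrt(dot(v, v))
        us.append([vi / n for vi in v])
    # r_ik = u_i . a_k for i <= k; entries below the diagonal are zero
    R = [[dot(u, a) if i <= k else 0.0 for k, a in enumerate(cols)]
         for i, u in enumerate(us)]
    return us, R
```

On the columns of the example matrix below, this reproduces the diagonal entries \(\sqrt{3}\), \(14/\sqrt{42}\) and \(15/\sqrt{105}\) of \(R\), and the columns of \(A\) are recovered from \(QR\).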
Example: Consider \(A=\begin{pmatrix}1&0&1\\-1&1&1\\0&2&1\\1&-1&1\end{pmatrix}\). Above we have seen that the columns of \(A\) are linearly independent and that
\[\left\{\frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix},\frac{1}{\sqrt{42}}\begin{pmatrix}2\\1\\6\\-1\end{pmatrix}, \frac{1}{\sqrt{105}}\begin{pmatrix}2\\8\\-1\\6\end{pmatrix}\right\}\]is an orthonormal basis of \(\text{Col}(A)\). Hence:
\[Q=\begin{pmatrix}\frac{1}{\sqrt{3}}&\frac{2}{\sqrt{42}}&\frac{2}{\sqrt{105}}\\-\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{42}}&\frac{8}{\sqrt{105}}\\ 0&\frac{6}{\sqrt{42}}&-\frac{1}{\sqrt{105}}\\\frac{1}{\sqrt{3}}&-\frac{1}{\sqrt{42}}&\frac{6}{\sqrt{105}}\end{pmatrix} =\frac{1}{\sqrt{210}}\begin{pmatrix}\sqrt{70}&2\sqrt{5}&2\sqrt{2}\\-\sqrt{70}&\sqrt{5}&8\sqrt{2}\\ 0&6\sqrt{5}&-\sqrt{2}\\\sqrt{70}&-\sqrt{5}&6\sqrt{2}\end{pmatrix}.\]Since \(Q^TQ=I\) it follows that
\[R=Q^TA=\frac{1}{\sqrt{210}}\begin{pmatrix}\sqrt{70}&-\sqrt{70}&0&\sqrt{70}\\2\sqrt{5}&\sqrt{5}&6\sqrt{5}&-\sqrt{5}\\ 2\sqrt{2}&8\sqrt{2}&-\sqrt{2}&6\sqrt{2}\end{pmatrix}\begin{pmatrix}1&0&1\\-1&1&1\\0&2&1\\1&-1&1\end{pmatrix}=\frac{1}{\sqrt{210}} \begin{pmatrix}3\sqrt{70}&-2\sqrt{70}&\sqrt{70}\\0&14\sqrt{5}&8\sqrt{5}\\0&0&15\sqrt{2}\end{pmatrix}.\]Last modified on May 2, 2021