Linear Algebra – Orthogonality and least squares – Orthogonal sets

Definition: If \(W\) is a subspace of \(\mathbb{R}^n\), then we have: \(\mathbf{v}\perp W\;\;\Longleftrightarrow\;\;\mathbf{v}\perp\mathbf{w}\) for each \(\mathbf{w}\in W\) and \(W^{\perp}=\{\mathbf{v}\in\mathbb{R}^n\,|\,\mathbf{v}\perp W\}\) is called the orthogonal complement of the subspace \(W\).

It is easy to see that \(W^{\perp}\) is a subspace of \(\mathbb{R}^n\) as well: \(\mathbf{0}\in W^{\perp}\), and if \(\mathbf{v}_1,\mathbf{v}_2\in W^{\perp}\) and \(c\in\mathbb{R}\), then \((\mathbf{v}_1+\mathbf{v}_2)\cdot\mathbf{w}=\mathbf{v}_1\cdot\mathbf{w}+\mathbf{v}_2\cdot\mathbf{w}=0\) and \((c\mathbf{v}_1)\cdot\mathbf{w}=c(\mathbf{v}_1\cdot\mathbf{w})=0\) for each \(\mathbf{w}\in W\).

Theorem: If \(A\) is an \(m\times n\) matrix, then we have: \((\text{Row}(A))^{\perp}=\text{Nul}(A)\) and \((\text{Col}(A))^{\perp}=\text{Nul}(A^T)\).

Here \(\text{Row}(A)\) is the row space of the matrix \(A\). This is the span of the rows of \(A\), where these rows are considered as vectors in \(\mathbb{R}^n\) (because the matrix \(A\) has \(n\) columns). Then the proof of the theorem is easy:

Proof: \(\text{Nul}(A)\) is the set of all vectors \(\mathbf{x}\in\mathbb{R}^n\) such that \(A\mathbf{x}=\mathbf{0}\), which means that the inner product of \(\mathbf{x}\) with every row of \(A\) is equal to zero, and hence with every linear combination of the rows as well. So \(\mathbf{x}\in\text{Nul}(A)\) if and only if \(\mathbf{x}\perp\text{Row}(A)\). Hence: \((\text{Row}(A))^{\perp}=\text{Nul}(A)\).

The other statement then follows by replacing the matrix \(A\) by \(A^T\) and noting that \(\text{Row}(A^T)=\text{Col}(A)\).
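A quick numerical illustration of the theorem (a minimal sketch using NumPy; the matrix \(A\) below is an arbitrary example, not taken from the text): a basis of \(\text{Nul}(A)\) can be read off from the singular value decomposition, and every such basis vector is orthogonal to every row of \(A\).

```python
import numpy as np

# An arbitrary example matrix (not from the text), chosen to have a nontrivial null space.
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [1.0, 0.0, 1.0, 0.0]])

# The rows of Vt belonging to the (numerically) zero singular values span Nul(A).
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
null_basis = Vt[rank:]            # basis of Nul(A), one vector per row

# Each basis vector x of Nul(A) satisfies A @ x = 0, i.e. x is orthogonal to every row of A.
print(np.allclose(A @ null_basis.T, 0))   # True
```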

Definition: A set of vectors \(\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p\}\) in \(\mathbb{R}^n\) is called an orthogonal set if \(\mathbf{v}_i\perp\mathbf{v}_j\) for all \(i\neq j\) or equivalently \(\mathbf{v}_i\cdot\mathbf{v}_j=0\) for all \(i\neq j\).

Example: The set \(\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}\) in \(\mathbb{R}^4\) with \(\mathbf{v}_1=\begin{pmatrix}1\\0\\-1\\2\end{pmatrix}\), \(\mathbf{v}_2=\begin{pmatrix}0\\1\\2\\1\end{pmatrix}\) and \(\mathbf{v}_3=\begin{pmatrix}2\\1\\0\\-1\end{pmatrix}\) is an orthogonal set, since:

\[\mathbf{v}_1\cdot\mathbf{v}_2=0+0-2+2=0,\quad\mathbf{v}_1\cdot\mathbf{v}_3=2+0-0-2=0\quad\text{and}\quad\mathbf{v}_2\cdot\mathbf{v}_3=0+1+0-1=0.\]
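These three dot products can also be checked mechanically; a minimal NumPy sketch:

```python
import numpy as np

v1 = np.array([1, 0, -1, 2])
v2 = np.array([0, 1, 2, 1])
v3 = np.array([2, 1, 0, -1])

# All pairwise dot products vanish, so {v1, v2, v3} is an orthogonal set.
print(v1 @ v2, v1 @ v3, v2 @ v3)   # 0 0 0
```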

Theorem: An orthogonal set \(\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p\}\) in \(\mathbb{R}^n\) that does not contain the zero vector is linearly independent.

Proof: Suppose that \(c_1\mathbf{v}_1+c_2\mathbf{v}_2+\cdots+c_p\mathbf{v}_p=\mathbf{0}\), then we have:

\[0=\mathbf{0}\cdot\mathbf{v}_i=\left(c_1\mathbf{v}_1+c_2\mathbf{v}_2+\cdots+c_p\mathbf{v}_p\right)\cdot\mathbf{v}_i =c_1\left(\mathbf{v}_1\cdot\mathbf{v}_i\right)+c_2\left(\mathbf{v}_2\cdot\mathbf{v}_i\right)+\cdots+c_p\left(\mathbf{v}_p\cdot\mathbf{v}_i\right) =c_i(\mathbf{v}_i\cdot\mathbf{v}_i),\quad i=1,2,\ldots,p,\]

since \(\mathbf{v}_i\cdot\mathbf{v}_j=0\) for all \(i\neq j\). Since the set does not contain the zero vector, we have that \(\mathbf{v}_i\cdot\mathbf{v}_i\neq 0\) for all \(i=1,2,\ldots,p\). Hence: \(c_i=0\) for all \(i=1,2,\ldots,p\). This implies that the set \(\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p\}\) is linearly independent.
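For the orthogonal set from the example above this can also be confirmed numerically: the matrix with these vectors as columns has full column rank (a minimal sketch, reusing the vectors from the previous snippet).

```python
import numpy as np

v1 = np.array([1, 0, -1, 2])
v2 = np.array([0, 1, 2, 1])
v3 = np.array([2, 1, 0, -1])

# Stack the vectors as columns; linear independence means the rank equals
# the number of vectors.
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V) == 3)   # True
```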

Definition: A basis of a subspace \(W\) of \(\mathbb{R}^n\) that is also an orthogonal set is called an orthogonal basis of the subspace \(W\).

Theorem: If \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is an orthogonal basis of a subspace \(W\) of \(\mathbb{R}^n\), then we have for each vector \(\mathbf{y}\in W\):

\[\mathbf{y}=c_1\mathbf{v}_1+\cdots+c_p\mathbf{v}_p\quad\text{with}\quad c_i=\frac{\mathbf{y}\cdot\mathbf{v}_i}{\mathbf{v}_i\cdot\mathbf{v}_i},\quad i=1,2,\ldots,p.\]

Proof: We have that \(\mathbf{y}\in W=\text{Span}\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\), so: \(\mathbf{y}=c_1\mathbf{v}_1+\cdots+c_p\mathbf{v}_p\). Since \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) is an orthogonal set, we now have:

\[\mathbf{y}\cdot\mathbf{v}_i=\left(c_1\mathbf{v}_1+\cdots+c_p\mathbf{v}_p\right)\cdot\mathbf{v}_i =c_1(\mathbf{v}_1\cdot\mathbf{v}_i)+\cdots+c_p(\mathbf{v}_p\cdot\mathbf{v}_i)=c_i(\mathbf{v}_i\cdot\mathbf{v}_i),\quad i=1,2,\ldots,p.\]

Since \(\{\mathbf{v}_1,\ldots,\mathbf{v}_p\}\) does not contain the zero vector (because it is a basis and therefore linearly independent), this implies that \(c_i=\displaystyle\frac{\mathbf{y}\cdot\mathbf{v}_i}{\mathbf{v}_i\cdot\mathbf{v}_i}\) for \(i=1,2,\ldots,p\).

Example: If \(\mathbf{y}=\begin{pmatrix}1\\2\\3\end{pmatrix}\), \(\mathbf{v}_1=\begin{pmatrix}2\\-1\\2\end{pmatrix}\), \(\mathbf{v}_2=\begin{pmatrix}2\\2\\-1\end{pmatrix}\) and \(\mathbf{v}_3=\begin{pmatrix}-1\\2\\2\end{pmatrix}\), then \(\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}\) is an orthogonal basis of \(\mathbb{R}^3\), because:

\[\mathbf{v}_1\cdot\mathbf{v}_2=4-2-2=0,\quad\mathbf{v}_1\cdot\mathbf{v}_3=-2-2+4=0\quad\text{and}\quad \mathbf{v}_2\cdot\mathbf{v}_3=-2+4-2=0.\]

Further we have: \(\mathbf{v}_1\cdot\mathbf{v}_1=\mathbf{v}_2\cdot\mathbf{v}_2=\mathbf{v}_3\cdot\mathbf{v}_3=1+4+4=9\) and

\[\mathbf{y}\cdot\mathbf{v}_1=2-2+6=6,\quad\mathbf{y}\cdot\mathbf{v}_2=2+4-3=3\quad\text{and}\quad\mathbf{y}\cdot\mathbf{v}_3=-1+4+6=9.\]

Hence: \(\mathbf{y}=\displaystyle\frac{\mathbf{y}\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\mathbf{v}_1 +\frac{\mathbf{y}\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\mathbf{v}_2 +\frac{\mathbf{y}\cdot\mathbf{v}_3}{\mathbf{v}_3\cdot\mathbf{v}_3}\mathbf{v}_3=\frac{6}{9}\mathbf{v}_1+\frac{3}{9}\mathbf{v}_2+\frac{9}{9}\mathbf{v}_3 =\frac{2}{3}\mathbf{v}_1+\frac{1}{3}\mathbf{v}_2+\mathbf{v}_3\).
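The same computation in a short NumPy sketch: the weights \(c_i=\frac{\mathbf{y}\cdot\mathbf{v}_i}{\mathbf{v}_i\cdot\mathbf{v}_i}\) reproduce \(\mathbf{y}\) exactly.

```python
import numpy as np

y  = np.array([1, 2, 3])
v1 = np.array([2, -1, 2])
v2 = np.array([2, 2, -1])
v3 = np.array([-1, 2, 2])

# Weights with respect to the orthogonal basis: c_i = (y . v_i) / (v_i . v_i).
c = [(y @ v) / (v @ v) for v in (v1, v2, v3)]
print(c)                                            # [0.666..., 0.333..., 1.0]
print(np.allclose(c[0]*v1 + c[1]*v2 + c[2]*v3, y))  # True
```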

Definition: If \(\mathbf{y}\) and \(\mathbf{v}\) are two vectors in \(\mathbb{R}^n\) such that \(\{\mathbf{y},\mathbf{v}\}\) is linearly independent, then the vector \(\displaystyle\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v}\) is called the (orthogonal) projection of the vector \(\mathbf{y}\) onto or along the vector \(\mathbf{v}\).

If \(\alpha\mathbf{v}\) is the (orthogonal) projection of \(\mathbf{y}\) onto \(\mathbf{v}\), then we have that \(\mathbf{y}-\alpha\mathbf{v}\perp\mathbf{v}\) or equivalently \((\mathbf{y}-\alpha\mathbf{v})\cdot\mathbf{v}=0\). This implies that \(\mathbf{y}\cdot\mathbf{v}-\alpha(\mathbf{v}\cdot\mathbf{v})=0\). Since \(\mathbf{v}\neq\mathbf{0}\), this implies that \(\alpha=\displaystyle\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\).

The vector \(\mathbf{y}-\displaystyle\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v}\) is called the component of \(\mathbf{y}\) orthogonal to \(\mathbf{v}\).

Example: The (orthogonal) projection of \(\mathbf{y}=\begin{pmatrix}5\\-4\\7\end{pmatrix}\) onto \(\mathbf{v}=\begin{pmatrix}1\\3\\2\end{pmatrix}\) is: \(\displaystyle\left(\frac{\mathbf{y}\cdot\mathbf{v}}{\mathbf{v}\cdot\mathbf{v}}\right)\mathbf{v}=\frac{5-12+14}{1+9+4}\mathbf{v} =\frac{7}{14}\begin{pmatrix}1\\3\\2\end{pmatrix}=\frac{1}{2}\begin{pmatrix}1\\3\\2\end{pmatrix}\).
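A minimal sketch of the same projection, also checking that the component of \(\mathbf{y}\) orthogonal to \(\mathbf{v}\) is indeed perpendicular to \(\mathbf{v}\):

```python
import numpy as np

y = np.array([5, -4, 7])
v = np.array([1, 3, 2])

# Orthogonal projection of y onto v, and the component of y orthogonal to v.
proj = ((y @ v) / (v @ v)) * v
comp = y - proj
print(proj)                     # [0.5 1.5 1. ]
print(np.isclose(comp @ v, 0))  # True
```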

Definition: An orthogonal set \(\{\mathbf{u}_1,\ldots,\mathbf{u}_p\}\) in \(\mathbb{R}^n\) consisting of unit vectors (so: \(||\mathbf{u}_i||=1\) for all \(i=1,2,\ldots,p\)) is called an orthonormal set.

Remark: In an orthonormal set all vectors are mutually perpendicular (orthogonal) and have length \(1\) (normalized).
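Dividing each vector of an orthogonal set (without the zero vector) by its length produces an orthonormal set; a minimal sketch using the orthogonal basis of \(\mathbb{R}^3\) from the example above:

```python
import numpy as np

v1 = np.array([2, -1, 2])
v2 = np.array([2, 2, -1])
v3 = np.array([-1, 2, 2])

# Normalize: each u_i = v_i / ||v_i|| has length 1, and orthogonality is preserved.
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
print(np.linalg.norm(u1), np.linalg.norm(u2), np.linalg.norm(u3))              # 1.0 1.0 1.0
print(np.isclose(u1 @ u2, 0), np.isclose(u1 @ u3, 0), np.isclose(u2 @ u3, 0))  # True True True
```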

Theorem: An \(m\times n\) matrix \(U\) has orthonormal columns if and only if \(U^TU=I\).

Proof: Write \(U=\begin{pmatrix}\mathbf{u}_1&\ldots&\mathbf{u}_n\end{pmatrix}\) with columns \(\mathbf{u}_1,\ldots,\mathbf{u}_n\). Then the \((i,j)\) entry of \(U^TU\) equals \(\mathbf{u}_i^T\mathbf{u}_j=\mathbf{u}_i\cdot\mathbf{u}_j\), so

\[U^TU=\begin{pmatrix}\mathbf{u}_1^T\\\vdots\\\mathbf{u}_n^T\end{pmatrix}\begin{pmatrix}\mathbf{u}_1&\ldots&\mathbf{u}_n\end{pmatrix} =\begin{pmatrix}\mathbf{u}_1\cdot\mathbf{u}_1&\ldots&\mathbf{u}_1\cdot\mathbf{u}_n\\\vdots&\ddots&\vdots\\\mathbf{u}_n\cdot\mathbf{u}_1&\ldots&\mathbf{u}_n\cdot\mathbf{u}_n\end{pmatrix}.\]

This matrix is equal to the identity matrix \(I\) if and only if \(\mathbf{u}_i\cdot\mathbf{u}_j=0\) for all \(i\neq j\) and \(\mathbf{u}_i\cdot\mathbf{u}_i=1\) for all \(i=1,2,\ldots,n\), that is, if and only if the columns of \(U\) are orthonormal.

Theorem: If \(U\) is an \(m\times n\) matrix with orthonormal columns (so: \(U^TU=I\)), then we have:

  1. \(||U\mathbf{x}||=||\mathbf{x}||\) for all \(\mathbf{x}\in\mathbb{R}^n\),

  2. \((U\mathbf{x})\cdot(U\mathbf{y})=\mathbf{x}\cdot\mathbf{y}\) for all \(\mathbf{x},\mathbf{y}\in\mathbb{R}^n\),

  3. \(U\mathbf{x}\perp U\mathbf{y}\;\;\Longleftrightarrow\;\;\mathbf{x}\perp\mathbf{y}\) for all \(\mathbf{x},\mathbf{y}\in\mathbb{R}^n\).

Proof: We have: \((U\mathbf{x})\cdot(U\mathbf{y})=(U\mathbf{x})^T(U\mathbf{y})=\mathbf{x}^TU^TU\mathbf{y} =\mathbf{x}^TI\mathbf{y}=\mathbf{x}^T\mathbf{y}=\mathbf{x}\cdot\mathbf{y}\), which proves property 2. Taking \(\mathbf{y}=\mathbf{x}\) gives \(||U\mathbf{x}||^2=||\mathbf{x}||^2\), which proves property 1, and property 3 follows since \((U\mathbf{x})\cdot(U\mathbf{y})=0\) if and only if \(\mathbf{x}\cdot\mathbf{y}=0\).
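A numerical illustration of these three properties (a minimal sketch; the matrix with orthonormal columns is obtained here via a QR factorization of an arbitrary random matrix, which is not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 5x3 matrix U with orthonormal columns via a (reduced) QR factorization.
A = rng.standard_normal((5, 3))
U, _ = np.linalg.qr(A)               # U is 5x3 with U^T U = I

x = rng.standard_normal(3)
y = rng.standard_normal(3)

print(np.allclose(U.T @ U, np.eye(3)))                       # U^T U = I
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # ||Ux|| = ||x||
print(np.isclose((U @ x) @ (U @ y), x @ y))                  # (Ux).(Uy) = x.y
```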

Definition: An orthogonal matrix is a square matrix with orthonormal columns.

Corollary: For an orthogonal matrix \(U\) we have \(U^{-1}=U^T\), since \(U\) is square and \(U^TU=I\). Hence, the inverse is equal to the transpose of the matrix. Further we have:

\[1=\det(I)=\det(U^TU)=\det(U^T)\det(U)=\left(\det(U)\right)^2\quad\Longrightarrow\quad\det(U)=\pm1.\]
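For a concrete orthogonal matrix, e.g. a \(2\times 2\) rotation matrix (a minimal sketch, not from the text), one can check \(U^{-1}=U^T\) and \(\det(U)=\pm1\) directly:

```python
import numpy as np

# A rotation matrix is orthogonal: its columns are orthonormal.
t = 0.3
U = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

print(np.allclose(np.linalg.inv(U), U.T))  # U^{-1} = U^T
print(np.isclose(np.linalg.det(U), 1.0))   # det(U) = +1 here; in general det(U) = +1 or -1
```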