Linear Algebra – Symmetric matrices and quadratic forms – The singular value decomposition

Not every (square) matrix is diagonalizable. So not every matrix \(A\) can be written as \(A=PDP^{-1}\) with \(P\) an invertible matrix and \(D\) a diagonal matrix. Symmetric matrices, however, are even orthogonally diagonalizable. This means that every symmetric matrix \(A\) can be written as \(A=PDP^T\) for a certain orthogonal matrix \(P\) and diagonal matrix \(D\). This (orthogonal) diagonalizability has many advantages and applications.

It is possible to write every (not necessarily square) matrix \(A\) as \(A=U\Sigma V^T\) for some orthogonal matrices \(U\) and \(V\) and a matrix \(\Sigma\) of the same size as \(A\) that looks like a diagonal matrix. This is called a singular value decomposition of the matrix \(A\).

If \(A\) is a symmetric matrix, then we have for each \(\mathbf{x}\) with \(A\mathbf{x}=\lambda\mathbf{x}\) and \(||\mathbf{x}||=1\) that

\[||A\mathbf{x}||=||\lambda\mathbf{x}||=|\lambda|\,||\mathbf{x}||=|\lambda|.\]

Such a vector \(\mathbf{x}\) is mapped by the linear transformation \(\mathbf{x}\mapsto A\mathbf{x}\) onto a vector of length \(|\lambda|\). So the absolute value of the eigenvalue \(\lambda\) measures the amount by which the linear transformation \(\mathbf{x}\mapsto A\mathbf{x}\) stretches or shrinks certain vectors (the eigenvectors).

This principle can also be studied for nonsquare matrices. If \(A\) is any \(m\times n\) matrix, then \(\mathbf{x}\mapsto A\mathbf{x}\) is a linear transformation from \(\mathbb{R}^n\) to \(\mathbb{R}^m\). Then we consider the set \(\{\mathbf{x}\in\mathbb{R}^n\,:\,||\mathbf{x}||=1\}\) in \(\mathbb{R}^n\) and study the possible values of \(||A\mathbf{x}||\). Now we have:

\[||A\mathbf{x}||^2=(A\mathbf{x})\cdot(A\mathbf{x})=(A\mathbf{x})^T(A\mathbf{x})=\mathbf{x}^TA^TA\mathbf{x}.\]

This is a quadratic form whose matrix is equal to \(A^TA\). This matrix is symmetric, since \((A^TA)^T=A^T(A^T)^T=A^TA\). Now we look for the maximum and the minimum value of this quadratic form subject to the constraint that \(||\mathbf{x}||=1\). These are respectively the largest and the smallest eigenvalue of the symmetric matrix \(A^TA\).
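
The following minimal NumPy sketch (assuming NumPy is available; the matrix used is the one from the example below) illustrates both the identity \(||A\mathbf{x}||^2=\mathbf{x}^TA^TA\mathbf{x}\) and the fact that the extreme values on the unit sphere are eigenvalues of \(A^TA\):

```python
import numpy as np

# The 2 x 3 matrix used in the example below.
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# For any unit vector x, ||Ax||^2 equals the quadratic form x^T (A^T A) x.
x = np.array([1.0, -2.0, 3.0])
x = x / np.linalg.norm(x)                        # restrict to the unit sphere ||x|| = 1
print(np.isclose(np.linalg.norm(A @ x) ** 2, x @ (A.T @ A) @ x))   # True

# The extreme values of this quadratic form on the unit sphere are the
# smallest and largest eigenvalues of the symmetric matrix A^T A.
lam = np.linalg.eigvalsh(A.T @ A)                # eigenvalues in ascending order
print(lam.min(), lam.max())                      # approximately 0 and 360
```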

Example: See: Lay, §7.4. The matrix \(A=\begin{pmatrix}4&11&14\\8&7&-2\end{pmatrix}\) defines a linear transformation \(\mathbf{x}\mapsto A\mathbf{x}\) from \(\mathbb{R}^3\) to \(\mathbb{R}^2\). In \(\mathbb{R}^3\) we only consider the unit sphere \(\{\mathbf{x}\in\mathbb{R}^3\,:\,||\mathbf{x}||=1\}\). The matrix \(A^TA\) has the eigenvalues \(\lambda_1=360\), \(\lambda_2=90\) and \(\lambda_3=0\). The corresponding orthonormal eigenvectors are:

\[\mathbf{v}_1=\pm\frac{1}{3}\begin{pmatrix}1\\2\\2\end{pmatrix},\quad\mathbf{v}_2=\pm\frac{1}{3}\begin{pmatrix}2\\1\\-2\end{pmatrix} \quad\text{and}\quad\mathbf{v}_3=\pm\frac{1}{3}\begin{pmatrix}2\\-2\\1\end{pmatrix}.\]

So the maximum of \(||A\mathbf{x}||\) subject to the constraint that \(||\mathbf{x}||=1\) is \(\sqrt{360}=6\sqrt{10}\). This maximum is attained if \(\mathbf{x}=\mathbf{v}_1\):

\[A\mathbf{v}_1=\pm\frac{1}{3}\begin{pmatrix}4&11&14\\8&7&-2\end{pmatrix}\begin{pmatrix}1\\2\\2\end{pmatrix} =\pm\frac{1}{3}\begin{pmatrix}4+22+28\\8+14-4\end{pmatrix}=\pm\frac{1}{3}\begin{pmatrix}54\\18\end{pmatrix} =\pm\begin{pmatrix}18\\6\end{pmatrix}.\]

The maximum of \(||A\mathbf{x}||\) subject to the constraints that \(||\mathbf{x}||=1\) and \(\mathbf{x}^T\mathbf{v}_1=0\) is \(\sqrt{90}=3\sqrt{10}\). This maximum is attained if \(\mathbf{x}=\mathbf{v}_2\):

\[A\mathbf{v}_2=\pm\frac{1}{3}\begin{pmatrix}4&11&14\\8&7&-2\end{pmatrix}\begin{pmatrix}2\\1\\-2\end{pmatrix} =\pm\frac{1}{3}\begin{pmatrix}8+11-28\\16+7+4\end{pmatrix}=\pm\frac{1}{3}\begin{pmatrix}-9\\27\end{pmatrix} =\pm\begin{pmatrix}-3\\9\end{pmatrix}.\]

The set \(\{A\mathbf{x}\,:\,||\mathbf{x}||=1\}\) turns out to be an ellipse (the inside and the boundary) with semi-axes \(\pm\begin{pmatrix}18\\6\end{pmatrix}\) and \(\pm\begin{pmatrix}-3\\9\end{pmatrix}\). The lengths of these semi-axes are \(6\sqrt{10}\) and \(3\sqrt{10}\) respectively:

\[||\pm\begin{pmatrix}18\\6\end{pmatrix}||=6||\begin{pmatrix}3\\1\end{pmatrix}||=6\sqrt{10}\quad\text{and}\quad ||\pm\begin{pmatrix}-3\\9\end{pmatrix}||=3||\begin{pmatrix}-1\\3\end{pmatrix}||=3\sqrt{10}.\]
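
The numbers in this example can be checked numerically; the sketch below (again a NumPy illustration) recomputes the eigenvalues of \(A^TA\) and the vector \(A\mathbf{v}_1\):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Orthonormal eigenvectors and eigenvalues of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)                 # ascending order: approx 0, 90, 360
print(np.round(lam, 6))

v1 = V[:, -1]                                    # eigenvector for lambda_1 = 360 (up to sign)
print(A @ v1)                                    # approximately +/- (18, 6)
print(np.linalg.norm(A @ v1), 6 * np.sqrt(10))   # both approximately 18.97
```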

If \(A\) is any \(m\times n\) matrix, then \(A^TA\) is a symmetric \(n\times n\) matrix. So this matrix is orthogonally diagonalizable, that is, there exists an orthonormal basis \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) of \(\mathbb{R}^n\) consisting of eigenvectors of \(A^TA\). Suppose that \(A^TA\mathbf{v}_k=\lambda_k\mathbf{v}_k\) for \(k=1,2,\ldots,n\), then we have:

\[||A\mathbf{v}_k||^2=(A\mathbf{v}_k)^T(A\mathbf{v}_k)=\mathbf{v}_k^TA^TA\mathbf{v}_k=\mathbf{v}_k^T(\lambda_k\mathbf{v}_k) =\lambda_k(\mathbf{v}_k^T\mathbf{v}_k)=\lambda_k,\quad k=1,2,\ldots,n.\]

This implies that all eigenvalues of \(A^TA\) are nonnegative. So these eigenvalues can be arranged as follows: \(\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n\geq0\).

The singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\). So these are the lengths of the vectors \(A\mathbf{v}_k\) with \(k=1,2,\ldots,n\):

\[\sigma_k:=\sqrt{\lambda_k}=||A\mathbf{v}_k||,\quad k=1,2,\ldots,n.\]
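
A small sketch of this definition (using NumPy): the singular values obtained as square roots of the eigenvalues of \(A^TA\) agree with the values returned by the library routine np.linalg.svd, which lists only \(\min(m,n)\) of them:

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Singular values: square roots of the eigenvalues of A^T A, in decreasing order.
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigma = np.sqrt(np.clip(lam, 0.0, None))         # clip guards against tiny negative round-off
print(sigma)                                     # approx [6*sqrt(10), 3*sqrt(10), 0]

# np.linalg.svd returns the singular values directly, but only min(m, n) of them.
print(np.linalg.svd(A, compute_uv=False))        # approx [6*sqrt(10), 3*sqrt(10)]
```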

Theorem: Suppose that \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) is an orthonormal basis of \(\mathbb{R}^n\), consisting of eigenvectors of \(A^TA\) corresponding to the eigenvalues \(\lambda_1,\ldots,\lambda_n\) such that \(\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n\geq0\). If \(A\) has \(r\) positive singular values, then \(\{A\mathbf{v}_1,\ldots,A\mathbf{v}_r\}\) is an orthogonal basis of \(\text{Col}(A)\) and \(\text{rank}(A)=r\).

Proof: Since \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) is an orthogonal set, we have that

\[(A\mathbf{v}_i)\cdot(A\mathbf{v}_j)=(A\mathbf{v}_i)^T(A\mathbf{v}_j)=\mathbf{v}_i^TA^TA\mathbf{v}_j =\mathbf{v}_i^T(\lambda_j\mathbf{v}_j)=\lambda_j(\mathbf{v}_i\cdot\mathbf{v}_j)=0,\quad i\neq j.\]

Hence: \(\{A\mathbf{v}_1,\ldots,A\mathbf{v}_n\}\) is an orthogonal set too.

The singular values of \(A\) are the lengths of the vectors \(A\mathbf{v}_1,\ldots,A\mathbf{v}_n\). Since \(A\) has \(r\) positive singular values, and therefore \(n-r\) singular values equal to zero, we have that \(A\mathbf{v}_k\neq\mathbf{0}\) for \(k=1,2,\ldots,r\) and \(A\mathbf{v}_k=\mathbf{0}\) for \(k=r+1,r+2,\ldots,n\). Hence: \(\{A\mathbf{v}_1,\ldots,A\mathbf{v}_r\}\) is a linearly independent set of vectors in \(\text{Col}(A)\). Further, every vector \(\mathbf{y}\in\text{Col}(A)\) can be written as \(\mathbf{y}=A\mathbf{x}\) for some \(\mathbf{x}\in\mathbb{R}^n\). Since \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) is a basis of \(\mathbb{R}^n\), we can write \(\mathbf{x}=c_1\mathbf{v}_1+\cdots+c_n\mathbf{v}_n\) for some \(c_1,\ldots,c_n\in\mathbb{R}\). Hence:

\[\mathbf{y}=A\mathbf{x}=A(c_1\mathbf{v}_1+\cdots+c_n\mathbf{v}_n)=c_1A\mathbf{v}_1+\cdots+c_rA\mathbf{v}_r.\]

Hence: \(\text{Col}(A)=\text{Span}\{A\mathbf{v}_1,\ldots,A\mathbf{v}_r\}\) and \(\{A\mathbf{v}_1,\ldots,A\mathbf{v}_r\}\) is an orthogonal basis of \(\text{Col}(A)\). So: \(\text{rank}(A)=\dim(\text{Col}(A))=r\).
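
This theorem can also be illustrated numerically; in the NumPy sketch below (an illustration of the statement, not a proof) the number of positive singular values equals the rank of \(A\), and the vectors \(A\mathbf{v}_1,\ldots,A\mathbf{v}_r\) are mutually orthogonal:

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Eigenvalues/eigenvectors of A^T A, sorted so that lambda_1 >= ... >= lambda_n >= 0.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Number of positive singular values (= positive eigenvalues, up to round-off).
r = int(np.sum(lam > 1e-10 * lam[0]))
print(r, np.linalg.matrix_rank(A))               # both equal 2

# The images A v_1, ..., A v_r form an orthogonal set: their Gram matrix is diagonal,
# with the squared singular values sigma_k^2 = lambda_k on the diagonal.
AV = A @ V[:, :r]
print(np.round(AV.T @ AV, 6))                    # approximately diag(360, 90)
```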

Theorem: Suppose that \(A\) is an \(m\times n\) matrix with \(\text{rank}(A)=r\). Then there exist an orthogonal \(m\times m\) matrix \(U\), an orthogonal \(n\times n\) matrix \(V\) and an \(m\times n\) matrix \(\Sigma\) whose first \(r\) diagonal entries are equal to the singular values \(\sigma_1,\ldots,\sigma_r\) of \(A\) with \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0\) and all other entries equal to zero, such that \(A=U\Sigma V^T\).

Every factorization of the form \(A=U\Sigma V^T\) with \(U\) and \(V\) orthogonal matrices and \(\Sigma\) as above is called a singular value decomposition of the matrix \(A\). Such a factorization is not unique. The proof is constructive:

Proof: Suppose that \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) is an orthonormal basis of \(\mathbb{R}^n\), consisting of eigenvectors of the matrix \(A^TA\) corresponding to the eigenvalues \(\lambda_1,\ldots,\lambda_n\) respectively, with \(\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_n\geq0\). Since \(\text{rank}(A)=r\) we have that \(\{A\mathbf{v}_1,\ldots,A\mathbf{v}_r\}\) is an orthogonal basis of \(\text{Col}(A)\). The lengths of the vectors \(A\mathbf{v}_1,\ldots,A\mathbf{v}_r\) are the first \(r\) (positive) singular values of \(A\), hence:

\[A\mathbf{v}_k=\sigma_k\mathbf{u}_k,\quad k=1,2,\ldots,r,\]

where \(\{\mathbf{u}_1,\ldots,\mathbf{u}_r\}\) is an orthonormal set in \(\mathbb{R}^m\). If necessary this set can be extended to an orthonormal basis \(\{\mathbf{u}_1,\ldots,\mathbf{u}_m\}\) of \(\mathbb{R}^m\) by adding appropriate unit vectors \(\mathbf{u}_{r+1},\ldots,\mathbf{u}_m\). Now define the matrices

\[U=\Bigg(\mathbf{u}_1\;\ldots\;\mathbf{u}_m\Bigg)\quad\text{and}\quad V=\Bigg(\mathbf{v}_1\;\ldots\;\mathbf{v}_n\Bigg).\]

Then \(U\) is an orthogonal \(m\times m\) matrix and \(V\) is an orthogonal \(n\times n\) matrix. Further we have:

\[AV=A\Bigg(\mathbf{v}_1\;\ldots\;\mathbf{v}_n\Bigg)=\Bigg(A\mathbf{v}_1\;\ldots\;A\mathbf{v}_r\;\mathbf{0}\;\ldots\;\mathbf{0}\Bigg)\]

and

\[U\Sigma=\Bigg(\mathbf{u}_1\;\ldots\;\mathbf{u}_m\Bigg)\begin{pmatrix}\sigma_1&&&0&\ldots&0\\&\ddots&&\vdots&&\vdots\\&&\sigma_r&0&\ldots&0\\0&\ldots&0&0&\ldots&0\\\vdots&&\vdots&\vdots&\ddots&\vdots\\0&\ldots&0&0&\ldots&0\end{pmatrix} =\Bigg(\sigma_1\mathbf{u}_1\;\ldots\;\sigma_r\mathbf{u}_r\;\mathbf{0}\;\ldots\;\mathbf{0}\Bigg).\]

This implies that \(AV=U\Sigma\), and since \(V\) is an orthogonal matrix and therefore \(V^{-1}=V^T\), we have: \(A=U\Sigma V^{-1}=U\Sigma V^T\).
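
The constructive proof translates directly into an algorithm. The sketch below is a minimal NumPy implementation of exactly these steps (the function name svd_from_eigen is only for illustration, and no claim of numerical optimality is made): \(V\) is built from the eigenvectors of \(A^TA\), then \(\mathbf{u}_k=A\mathbf{v}_k/\sigma_k\) for \(k=1,\ldots,r\), and finally \(U\) is completed to an orthogonal matrix.

```python
import numpy as np

def svd_from_eigen(A, tol=1e-10):
    """Construct U, S, V with A = U @ S @ V.T, following the constructive proof:
    the eigenvectors of A^T A give V, the normalized images A v_k give the first r
    columns of U, and U is then completed to an orthonormal basis of R^m."""
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)             # symmetric eigendecomposition of A^T A
    order = np.argsort(lam)[::-1]                # eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))     # sigma_1 >= ... >= sigma_n >= 0
    r = int(np.sum(lam > tol * max(lam[0], 1.0)))  # number of positive singular values

    U = np.zeros((m, m))
    U[:, :r] = (A @ V[:, :r]) / sigma[:r]        # u_k = A v_k / sigma_k,  k = 1, ..., r
    if r < m:
        # Extend {u_1, ..., u_r} to an orthonormal basis of R^m: the last m - r
        # columns of a full QR factorization span the orthogonal complement.
        Q, _ = np.linalg.qr(U[:, :r], mode='complete')
        U[:, r:] = Q[:, r:]

    S = np.zeros((m, n))
    S[:r, :r] = np.diag(sigma[:r])
    return U, S, V

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
for M in (A, A.T):                               # the 2 x 3 example matrix and its transpose
    U, S, V = svd_from_eigen(M)
    print(np.allclose(M, U @ S @ V.T),           # True: M = U S V^T
          np.allclose(U.T @ U, np.eye(U.shape[0])),
          np.allclose(V.T @ V, np.eye(V.shape[0])))
```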

Example: For the matrix \(A=\begin{pmatrix}4&11&14\\8&7&-2\end{pmatrix}\) from the first example we choose for instance:

\[\mathbf{v}_1=\frac{1}{3}\begin{pmatrix}1\\2\\2\end{pmatrix},\quad\mathbf{v}_2=\frac{1}{3}\begin{pmatrix}2\\1\\-2\end{pmatrix} \quad\text{and}\quad\mathbf{v}_3=\frac{1}{3}\begin{pmatrix}2\\-2\\1\end{pmatrix}.\]

Then we have:

\[A\mathbf{v}_1=\begin{pmatrix}18\\6\end{pmatrix}=6\sqrt{10}\cdot\frac{1}{\sqrt{10}}\begin{pmatrix}3\\1\end{pmatrix} =\sigma_1\mathbf{u}_1\quad\Longrightarrow\quad\mathbf{u}_1=\frac{1}{\sqrt{10}}\begin{pmatrix}3\\1\end{pmatrix}\]

and

\[A\mathbf{v}_2=\begin{pmatrix}-3\\9\end{pmatrix}=3\sqrt{10}\cdot\frac{1}{\sqrt{10}}\begin{pmatrix}-1\\3\end{pmatrix} =\sigma_2\mathbf{u}_2\quad\Longrightarrow\quad\mathbf{u}_2=\frac{1}{\sqrt{10}}\begin{pmatrix}-1\\3\end{pmatrix}.\]

Then we have that \(A=U\Sigma V^T\) with

\[U=\frac{1}{\sqrt{10}}\begin{pmatrix}3&-1\\1&3\end{pmatrix},\quad\Sigma=\begin{pmatrix}6\sqrt{10}&0&0\\0&3\sqrt{10}&0\end{pmatrix} \quad\text{and}\quad V=\frac{1}{3}\begin{pmatrix}1&2&2\\2&1&-2\\2&-2&1\end{pmatrix}.\]

Note that \(\{\mathbf{u}_1,\mathbf{u}_2\}\) is already a basis of \(\mathbb{R}^2\). So there is no need to extend it.
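
A quick numerical check of this decomposition (a NumPy sketch) confirms that \(A=U\Sigma V^T\) and that \(U\) and \(V\) are orthogonal:

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
U = np.array([[3.0, -1.0],
              [1.0,  3.0]]) / np.sqrt(10)
S = np.array([[6 * np.sqrt(10), 0.0, 0.0],
              [0.0, 3 * np.sqrt(10), 0.0]])
V = np.array([[1.0,  2.0,  2.0],
              [2.0,  1.0, -2.0],
              [2.0, -2.0,  1.0]]) / 3.0

print(np.allclose(A, U @ S @ V.T))                                       # True
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(V.T @ V, np.eye(3)))  # True True
```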

If \(A=U\Sigma V^T\) is a singular value decomposition of \(A\), then \(A^T=(U\Sigma V^T)^T=V\Sigma^T U^T\) is a singular value decomposition of \(A^T\). In the example above \(A^TA\) is a \(3\times3\) matrix, while \(AA^T\) is a \(2\times2\) matrix. So it is somewhat easier to determine the singular values from the eigenvalues of \(AA^T\). Then there are only two of them: \(\sigma_1=6\sqrt{10}\) and \(\sigma_2=3\sqrt{10}\). The roles of the matrices \(U\) and \(V\) are then interchanged, and only two columns are obtained for the orthogonal \(3\times3\) matrix. The third column must then be constructed. We will also see this in the next example.
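
The remark that the smaller matrix \(AA^T\) suffices can be illustrated as follows (a NumPy sketch): the nonzero eigenvalues of \(AA^T\) and \(A^TA\) coincide, so the positive singular values can be read off from either matrix:

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# The nonzero eigenvalues of the 2 x 2 matrix A A^T coincide with those of the
# 3 x 3 matrix A^T A, so the positive singular values can be read off from A A^T.
print(np.linalg.eigvalsh(A @ A.T))               # approximately [ 90, 360]
print(np.linalg.eigvalsh(A.T @ A))               # approximately [  0,  90, 360]
print(np.sqrt(np.linalg.eigvalsh(A @ A.T)))      # approximately [3*sqrt(10), 6*sqrt(10)]
```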

Example: If \(A=\begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}\), then we have \(A^TA=\begin{pmatrix}2&1\\1&2\end{pmatrix}\) with eigenvalues \(\lambda_1=3\) and \(\lambda_2=1\). Hence: \(\sigma_1=\sqrt{3}\) and \(\sigma_2=1\). Further we easily obtain that \(\mathbf{v}_1=\dfrac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}\) and \(\mathbf{v}_2=\dfrac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}\) are two orthonormal eigenvectors of \(A^TA\) corresponding to the eigenvalues \(\lambda_1=3\) and \(\lambda_2=1\) respectively. Next we obtain:

\[A\mathbf{v}_1=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} =\frac{1}{\sqrt{2}}\begin{pmatrix}2\\1\\1\end{pmatrix}=\sqrt{3}\cdot\frac{1}{\sqrt{6}}\begin{pmatrix}2\\1\\1\end{pmatrix} =\sigma_1\mathbf{u}_1\quad\Longrightarrow\quad\mathbf{u}_1=\frac{1}{\sqrt{6}}\begin{pmatrix}2\\1\\1\end{pmatrix}\]

and

\[A\mathbf{v}_2=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} =\frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\end{pmatrix}=1\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\end{pmatrix} =\sigma_2\mathbf{u}_2\quad\Longrightarrow\quad\mathbf{u}_2=\frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\end{pmatrix}.\]

Now we have that \(A=U\Sigma V^T\) with

\[U=\begin{pmatrix}2/\sqrt{6}&0&.\;\\1/\sqrt{6}&1/\sqrt{2}&.\;\\1/\sqrt{6}&-1/\sqrt{2}&.\;\end{pmatrix},\quad\Sigma =\begin{pmatrix}\sqrt{3}&0\\0&1\\0&0\end{pmatrix}\quad\text{and}\quad V=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}.\]

Now the third column of \(U\) is in fact arbitrary; however, \(U\) should be an orthogonal matrix. So we have to choose the third column such that its length is \(1\) and it is orthogonal to the first two columns. Suppose that this third column is equal to \(\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\), then we should have that \(2x_1+x_2+x_3=0\) and \(x_2-x_3=0\). This implies that \(-x_1=x_2=x_3\). Since its length should be \(1\), we choose for instance \(\dfrac{1}{\sqrt{3}}\begin{pmatrix}-1\\1\\1\end{pmatrix}\). For the matrix \(U\) we then have:

\[U=\begin{pmatrix}2/\sqrt{6}&0&-1/\sqrt{3}\\1/\sqrt{6}&1/\sqrt{2}&1/\sqrt{3}\\1/\sqrt{6}&-1/\sqrt{2}&1/\sqrt{3}\end{pmatrix} =\frac{1}{\sqrt{6}}\begin{pmatrix}2&0&-\sqrt{2}\\1&\sqrt{3}&\sqrt{2}\\1&-\sqrt{3}&\sqrt{2}\end{pmatrix}.\]
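
Also for this example a short numerical check (a NumPy sketch) confirms that the matrices found indeed satisfy \(A=U\Sigma V^T\) with \(U\) orthogonal:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U = np.array([[2.0,           0.0, -np.sqrt(2.0)],
              [1.0,  np.sqrt(3.0),  np.sqrt(2.0)],
              [1.0, -np.sqrt(3.0),  np.sqrt(2.0)]]) / np.sqrt(6.0)
S = np.array([[np.sqrt(3.0), 0.0],
              [0.0,          1.0],
              [0.0,          0.0]])
V = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

print(np.allclose(A, U @ S @ V.T))               # True
print(np.allclose(U.T @ U, np.eye(3)))           # True: U is orthogonal
```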
