Linear Algebra – Eigenvalues and eigenvectors – Iterative estimates for eigenvalues

We study two iterative methods for estimating the eigenvalues of a matrix: the power method and the inverse power method. The second method builds on the first.

The power method can be used to find a strictly dominant eigenvalue \(\lambda_1\) of an \(n\times n\) matrix \(A\). An eigenvalue \(\lambda_1\) is called a (strictly) dominant eigenvalue if it is (strictly) larger in absolute value than all other eigenvalues. Let us assume that \(A\) is diagonalizable, which implies that there exists a basis \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\) corresponding to the eigenvalues \(\lambda_1,\ldots,\lambda_n\) respectively, where

\[|\lambda_1| > |\lambda_2|\geq|\lambda_3|\geq\cdots\geq|\lambda_n|.\]

Hence, \(\lambda_1\) is a strictly dominant eigenvalue of \(A\). Since \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) is a basis of \(\mathbb{R}^n\), each vector \(\mathbf{x}\in\mathbb{R}^n\) can be written as a linear combination of these eigenvectors: \(\mathbf{x}=c_1\mathbf{v}_1+\cdots+c_n\mathbf{v}_n\). Since \(A\mathbf{v}_i=\lambda_i\mathbf{v}_i\) for \(i=1,2,\ldots,n\), this implies that

\[A^k\mathbf{x}=c_1\lambda_1^k\mathbf{v}_1+c_2\lambda_2^k\mathbf{v}_2+\cdots+c_n\lambda_n^k\mathbf{v}_n,\quad k=0,1,2,\ldots.\]

Now division by \(\lambda_1^k\) leads to

\[\frac{1}{\lambda_1^k}A^k\mathbf{x}=c_1\mathbf{v}_1+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{v}_2+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{v}_n, \quad k=0,1,2,\ldots.\]

Since \(\left|\lambda_i/\lambda_1\right| < 1\) for \(i=2,3,\ldots,n\), we have \(\displaystyle\frac{1}{\lambda_1^k}A^k\mathbf{x}\to c_1\mathbf{v}_1\) as \(k\to\infty\). If \(c_1\neq0\), this limit is an eigenvector of \(A\) corresponding to \(\lambda_1\).
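For instance (a small numerical illustration using NumPy, with the \(2\times2\) matrix from the example below, whose dominant eigenvalue is \(9\)), one can observe this convergence directly:

```python
import numpy as np

# Illustrative 2x2 matrix (used in the example below); its dominant eigenvalue is 9.
A = np.array([[1.0, 4.0],
              [-4.0, 11.0]])
x = np.array([1.0, 0.0])    # an arbitrary starting vector
lam1 = 9.0                  # the known dominant eigenvalue, for illustration only

for k in range(1, 21):
    x = A @ x                       # x now equals A^k times the starting vector
    if k % 5 == 0:
        print(k, x / lam1**k)       # approaches c1 * v1, a multiple of (1, 2)
```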

This observation leads to the following algorithm (details will be skipped); a Python sketch of the method follows the list:

  1. Select an initial vector \(\mathbf{x}_0\) whose largest entry in absolute value is \(1\).

  2. For \(k=0,1,2,\ldots\)
    1. Compute \(A\mathbf{x}_k\),

    2. Let \(\mu_k\) be an entry in \(A\mathbf{x}_k\) whose absolute value is the largest,

    3. Compute \(\mathbf{x}_{k+1}=\displaystyle\frac{1}{\mu_k}A\mathbf{x}_k\).

  3. For almost all choices of \(\mathbf{x}_0\), the sequence \(\{\mu_k\}\) approaches the dominant eigenvalue and the sequence \(\{\mathbf{x}_k\}\) approaches a corresponding eigenvector.
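In Python (using NumPy), this could be implemented, for instance, as follows; the function name `power_method` and its parameters are merely illustrative choices, not part of any library:

```python
import numpy as np

def power_method(A, x0, num_iters=50):
    """Power method with max-entry scaling, following the steps above.

    Returns the last scaling factor mu (estimate of the dominant eigenvalue)
    and the last iterate x (estimate of a corresponding eigenvector).
    """
    x = np.asarray(x0, dtype=float)
    mu = None
    for _ in range(num_iters):
        y = A @ x                         # step 1: compute A x_k
        mu = y[np.argmax(np.abs(y))]      # step 2: entry of largest absolute value
        x = y / mu                        # step 3: x_{k+1} = (1/mu) A x_k
    return mu, x
```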

Example: The matrix \(A=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\) has the eigenvalues \(\lambda_1=9\) and \(\lambda_2=3\) with eigenspaces \(\text{E}_9=\text{Span}\left\{\begin{pmatrix}1\\2\end{pmatrix}\right\}\) and \(\text{E}_3=\text{Span}\left\{\begin{pmatrix}2\\1\end{pmatrix}\right\}\).

Let \(\mathbf{x}_0=\begin{pmatrix}1\\0\end{pmatrix}\); then the algorithm leads to

\[A\mathbf{x}_0=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}=\begin{pmatrix}1\\-4\end{pmatrix} \quad\Longrightarrow\quad\mu_0=-4\quad\text{and}\quad\mathbf{x}_1=\begin{pmatrix}-0.25\\1\end{pmatrix},\]
\[A\mathbf{x}_1=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}-0.25\\1\end{pmatrix}=\begin{pmatrix}3.75\\12\end{pmatrix} \quad\Longrightarrow\quad\mu_1=12\quad\text{and}\quad\mathbf{x}_2=\begin{pmatrix}0.3125\\1\end{pmatrix},\]
\[A\mathbf{x}_2=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0.3125\\1\end{pmatrix}=\begin{pmatrix}4.3125\\9.75\end{pmatrix} \quad\Longrightarrow\quad\mu_2=9.75\quad\text{and}\quad\mathbf{x}_3\approx\begin{pmatrix}0.44231\\1\end{pmatrix},\]
\[A\mathbf{x}_3=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0.44231\\1\end{pmatrix}\approx\begin{pmatrix}4.44231\\9.231\end{pmatrix} \quad\Longrightarrow\quad\mu_3\approx9.231\quad\text{and}\quad\mathbf{x}_4\approx\begin{pmatrix}0.48125\\1\end{pmatrix}.\]

Although the convergence is rather slow, we have: \(\lim\limits_{k\to\infty}\mu_k=9\) and \(\lim\limits_{k\to\infty}\mathbf{x}_k=\begin{pmatrix}0.5\\1\end{pmatrix}\).
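Using the `power_method` sketch above (our own helper function, not a library routine), these iterates can be reproduced roughly as follows:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [-4.0, 11.0]])

mu, x = power_method(A, x0=[1.0, 0.0], num_iters=25)
print(mu)   # close to 9
print(x)    # close to (0.5, 1)
```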

The convergence is somewhat faster if the starting vector is chosen closer to the direction of this eigenvector. For instance, if \(\mathbf{x}_0=\begin{pmatrix}0\\1\end{pmatrix}\), then we have:

\[A\mathbf{x}_0=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}=\begin{pmatrix}4\\11\end{pmatrix} \quad\Longrightarrow\quad\mu_0=11\quad\text{and}\quad\mathbf{x}_1\approx\begin{pmatrix}0.364\\1\end{pmatrix},\]
\[A\mathbf{x}_1=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0.364\\1\end{pmatrix}\approx\begin{pmatrix}4.364\\9.55\end{pmatrix} \quad\Longrightarrow\quad\mu_1\approx9.55\quad\text{and}\quad\mathbf{x}_2\approx\begin{pmatrix}0.457\\1\end{pmatrix},\]
\[A\mathbf{x}_2=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0.457\\1\end{pmatrix}\approx\begin{pmatrix}4.457\\9.171\end{pmatrix} \quad\Longrightarrow\quad\mu_2\approx9.171\quad\text{and}\quad\mathbf{x}_3\approx\begin{pmatrix}0.486\\1\end{pmatrix},\]
\[A\mathbf{x}_3=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\begin{pmatrix}0.486\\1\end{pmatrix}\approx\begin{pmatrix}4.486\\9.056\end{pmatrix} \quad\Longrightarrow\quad\mu_3\approx9.056\quad\text{and}\quad\mathbf{x}_4\approx\begin{pmatrix}0.4954\\1\end{pmatrix}.\]

By applying the power method to a suitably chosen matrix, we can estimate other eigenvalues of \(A\) as well. Suppose that \(\alpha\in\mathbb{R}\) is a reasonable estimate of one of the eigenvalues \(\lambda_1,\ldots,\lambda_n\) of \(A\), but not an eigenvalue itself, so that \(A-\alpha I\) is invertible. Now let \(B=(A-\alpha I)^{-1}\); then the eigenvalues of \(B\) are:

\[\frac{1}{\lambda_1-\alpha},\;\frac{1}{\lambda_2-\alpha},\;\ldots,\;\frac{1}{\lambda_n-\alpha}.\]
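Indeed, if \(A\mathbf{v}=\lambda\mathbf{v}\) with \(\mathbf{v}\neq\mathbf{0}\), then

\[(A-\alpha I)\mathbf{v}=(\lambda-\alpha)\mathbf{v} \quad\Longrightarrow\quad (A-\alpha I)^{-1}\mathbf{v}=\frac{1}{\lambda-\alpha}\,\mathbf{v},\]

so every eigenvector of \(A\) is also an eigenvector of \(B\), with the shifted and inverted eigenvalue.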

If \(\alpha\) is chosen well enough, then there is exactly one eigenvalue \(\lambda_p\) for which the denominator \(\lambda_p-\alpha\) is smallest in absolute value. In that case \(\displaystyle\frac{1}{\lambda_p-\alpha}\) is a strictly dominant eigenvalue of \(B\). If we apply the power method to \(B\), we obtain an estimate of this strictly dominant eigenvalue \(\displaystyle\frac{1}{\lambda_p-\alpha}\). Since the value of \(\alpha\) is known, this leads to an estimate of the eigenvalue \(\lambda_p\) of \(A\). This is called the inverse power method.

There is, however, a practical issue when we have to compute \(B\mathbf{x}_k=(A-\alpha I)^{-1}\mathbf{x}_k\), because this seems to require the inverse of the matrix \(A-\alpha I\). Instead of computing \(\mathbf{y}_k=(A-\alpha I)^{-1}\mathbf{x}_k\) explicitly, we solve the linear system \((A-\alpha I)\mathbf{y}_k=\mathbf{x}_k\) (for instance using the \(LU\) decomposition, which has to be computed only once and can be reused in every iteration).
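For example (a minimal sketch, assuming NumPy and SciPy are available), the factorization is computed once and then reused for each right-hand side:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1.0, 4.0],
              [-4.0, 11.0]])
alpha = 2.0
x = np.array([1.0, 0.0])

# Factor A - alpha*I once; the same factorization serves every iteration.
lu, piv = lu_factor(A - alpha * np.eye(2))

y = lu_solve((lu, piv), x)   # solves (A - alpha*I) y = x without forming the inverse
print(y)                     # approximately (1.286, 0.571)
```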

This observation leads to the following algorithm (details will be skipped); a Python sketch of the method follows the list:

  1. Select an initial estimate \(\alpha\) sufficiently close to an eigenvalue \(\lambda\) of \(A\).

  2. Select an initial vector \(\mathbf{x}_0\) whose largest entry in absolute value is \(1\).

  3. For \(k=0,1,2,\ldots\)
    1. Solve \((A-\alpha I)\mathbf{y}_k=\mathbf{x}_k\),

    2. Let \(\mu_k\) be an entry in \(\mathbf{y}_k\) whose absolute value is the largest,

    3. Compute \(\nu_k=\alpha+\displaystyle\frac{1}{\mu_k}\),

    4. Compute \(\mathbf{x}_{k+1}=\displaystyle\frac{1}{\mu_k}\mathbf{y}_k\).

  4. For almost all choices of \(\mathbf{x}_0\), the sequence \(\{\nu_k\}\) approaches the eigenvalue \(\lambda\) of \(A\) and the sequence \(\{\mathbf{x}_k\}\) approaches a corresponding eigenvector.
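A minimal Python sketch of the inverse power method (again purely illustrative; the function name `inverse_power_method` and its parameters are our own choices, and the \(LU\) factorization is reused across iterations as discussed above):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power_method(A, alpha, x0, num_iters=50):
    """Inverse power method with max-entry scaling, following the steps above.

    Returns the last eigenvalue estimate nu and the last iterate x
    (estimate of an eigenvector of A for the eigenvalue closest to alpha).
    """
    n = A.shape[0]
    lu, piv = lu_factor(A - alpha * np.eye(n))   # factor A - alpha*I only once
    x = np.asarray(x0, dtype=float)
    nu = alpha
    for _ in range(num_iters):
        y = lu_solve((lu, piv), x)        # step 1: solve (A - alpha*I) y_k = x_k
        mu = y[np.argmax(np.abs(y))]      # step 2: entry of largest absolute value
        nu = alpha + 1.0 / mu             # step 3: eigenvalue estimate nu_k
        x = y / mu                        # step 4: x_{k+1} = (1/mu) y_k
    return nu, x
```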

Example: Let's choose \(\alpha=2\) as a rough estimate of the eigenvalue \(\lambda_2=3\) of the matrix \(A=\begin{pmatrix}1&4\\-4&11\end{pmatrix}\). Then we have \(A-2I=\begin{pmatrix}-1&4\\-4&9\end{pmatrix}\).

Let \(\mathbf{x}_0=\begin{pmatrix}1\\0\end{pmatrix}\); then the algorithm leads to

\[(A-2I)\mathbf{y}_0=\mathbf{x}_0:\;\;\left(\left.\begin{matrix}-1&4\\-4&9\end{matrix}\,\right|\,\begin{matrix}1\\0\end{matrix}\right) \;\;\Longrightarrow\;\;\mathbf{y}_0\approx\begin{pmatrix}1.286\\0.571\end{pmatrix}\;\;\Longrightarrow\;\; \mu_0\approx1.286,\;\;\nu_0=2+\frac{1}{\mu_0}\approx2.78\;\;\text{and}\;\;\mathbf{x}_1\approx\begin{pmatrix}1\\0.44\end{pmatrix},\]
\[(A-2I)\mathbf{y}_1=\mathbf{x}_1:\;\;\left(\left.\begin{matrix}-1&4\\-4&9\end{matrix}\,\right|\,\begin{matrix}1\\0.44\end{matrix}\right) \;\;\Longrightarrow\;\;\mathbf{y}_1\approx\begin{pmatrix}1.033\\0.508\end{pmatrix}\;\;\Longrightarrow\;\; \mu_1\approx1.033,\;\;\nu_1=2+\frac{1}{\mu_1}\approx2.969\;\;\text{and}\;\;\mathbf{x}_2\approx\begin{pmatrix}1\\0.492\end{pmatrix},\]
\[(A-2I)\mathbf{y}_2=\mathbf{x}_2:\;\;\left(\left.\begin{matrix}-1&4\\-4&9\end{matrix}\,\right|\,\begin{matrix}1\\0.492\end{matrix}\right) \;\;\Longrightarrow\;\;\mathbf{y}_2\approx\begin{pmatrix}1.004\\0.501\end{pmatrix}\;\;\Longrightarrow\;\; \mu_2\approx1.004,\;\;\nu_2=2+\frac{1}{\mu_2}\approx2.996\;\;\text{and}\;\;\mathbf{x}_3\approx\begin{pmatrix}1\\0.499\end{pmatrix}.\]

We have: \(\lim\limits_{k\to\infty}\nu_k=3\) and \(\lim\limits_{k\to\infty}\mathbf{x}_k=\begin{pmatrix}1\\0.5\end{pmatrix}\).
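Using the `inverse_power_method` sketch above (our own helper function), this example can be reproduced roughly as follows:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [-4.0, 11.0]])

nu, x = inverse_power_method(A, alpha=2.0, x0=[1.0, 0.0], num_iters=20)
print(nu)   # close to 3
print(x)    # close to (1, 0.5)
```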


Last modified on April 5, 2021
© Roelof Koekoek
