FUNCTIONS OF MATRICES

By Herman Schoenfeld

Copyright (c) 2006

In this article we shall derive an explicit formula for the evaluation

of arbitrary functions of a square matrix. The formula for a matrix

function is given in terms of the equivalent function defined for a

scalar parameter.

The mechanism which allows us to achieve this is the Maclaurin series

expansion of scalar functions, having form

[1] f(x) = SUM(n=0..INF) k_n x^n

given a set of constants k_n.

By simple substitution of x with some arbitrary square MxM matrix A, we

can define the matrix-continuation of [1] as

[2] f(A) = SUM(n=0..INF) k_n A^n

This substitution is perfectly valid since the natural-numbered powers

of square matrices always exist. By convention we define

[3] A^0 = I
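As a concrete illustration, the exponential function has Maclaurin coefficients k_n = 1/n!, so [2] gives exp(A) as a power series in A. A minimal sketch in Python (numpy is assumed; the name exp_series is illustrative, not standard):

```python
import numpy as np
from math import factorial

def exp_series(A, N=20):
    """Truncate the matrix Maclaurin series [2] for f = exp after N terms,
    using the convention A^0 = I from [3]."""
    M = A.shape[0]
    result = np.zeros((M, M))
    An = np.eye(M)                   # A^0 = I
    for n in range(N):
        result += An / factorial(n)  # k_n = 1/n! for the exponential
        An = An @ A                  # advance to the next power of A
    return result
```

For the nilpotent matrix [[0, 1], [0, 0]] the series terminates after two terms, since A^2 = 0, giving exactly [[1, 1], [0, 1]].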

CONVERGENCE OF MATRIX SUMS

A series of matrices (of common order)

[4] B = SUM(n) A_n

converges to B so long as each element of the sum converges. That is,

[5] B_ij = SUM(n) (A_n)_ij

Now, consider the Maclaurin series expansion of a scalar function as

seen in [1] and its matrix continuation as seen in [2]. If [1]

converges for all |x| < N, then [2] converges for all square matrices A

whose eigenvalues all satisfy |e| < N.

We can see that this is true if we understand that a function of a

matrix can be given in terms of the same function of its eigenvalues.

If the scalar function diverges at any eigenvalue, then the series

diverges for the matrix containing that eigenvalue.
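A familiar special case makes this concrete: the geometric series 1/(1-x) = SUM x^n converges for |x| < 1 (so N = 1), and its matrix continuation converges to (I - A)^(-1) whenever every eigenvalue of A lies inside the unit disk. A numeric check (numpy assumed; the matrix is an arbitrary example):

```python
import numpy as np

# Triangular test matrix: eigenvalues 0.5 and 0.2, both with |e| < 1,
# so SUM A^n should converge to the closed form (I - A)^(-1).
A = np.array([[0.5, 0.3],
              [0.0, 0.2]])

partial = np.zeros((2, 2))
term = np.eye(2)                 # A^0 = I
for _ in range(200):             # 200 terms is ample at this spectral radius
    partial += term
    term = term @ A
# partial is now an approximation of (I - A)^(-1)
```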

FORMULA FOR FUNCTIONS OF A MATRIX

If we attempt to derive a formula for a matrix function using the

Maclaurin matrix expansion as seen in [2], we will get results which

are, in most cases, impractical to compute and of little value

otherwise.

Rather, we are able to take a different approach, one which makes use

of the scalar version of the matrix function.

We do this by using an important result deriving from the

Cayley-Hamilton theorem which tells us that a function of an MxM matrix

expands as a matrix polynomial of degree at most (M-1)

[6] f(A) = SUM(n=0..M-1) k_n A^n

If we solve for the constants k_n in terms of the scalar function f(x),

we can get a meaningful result for the matrix function f(A) by simply

evaluating the sum in [6].

Luckily, we can solve for these constants by making use of a related

Cayley-Hamilton result. It turns out that the scalar version of [6] is

true for the same constants k_n so long as the parameter to the

function is an eigenvalue e of A. In other words,

[7] f(e) = SUM(n=0..M-1) k_n e^n

Equation [7] is useful to us because it allows us to solve the

constants k_n in terms of the scalar function f.

Now, if A is an MxM matrix, it follows from the characteristic

polynomial of A that A has M eigenvalues (not necessarily all

distinct). This means that we have M solutions for equation [7], giving

us a system of simultaneous linear equations

[8] [ f(e_1) ]   [ e_1^0 ... e_1^(M-1) ]   [ k_0     ]
    [  ...   ] = [  ...  ...    ...    ] * [   ...   ]
    [ f(e_M) ]   [ e_M^0 ... e_M^(M-1) ]   [ k_(M-1) ]

Equation [8] only determines the constants if each equation is actually

distinct. Since some matrices have repeated eigenvalues (i.e.

multiplicity > 1), [8] is not, in general, a set of M distinct linear

equations.

Fortunately, we can restore [8] to a distinct set of linear equations

by replacing each duplicate equation with a derivative of the original.

More specifically, if the subset of eigenvalues C = (e_i, e_(i+1), ...)

are all equal, then for all 0 < j < |C|, replace the (i+j)'th linear

equation with the j'th derivative (with respect to e) of the i'th

linear equation.
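As a small numeric sketch of this replacement rule (numpy assumed), take M = 2 and a matrix with one eigenvalue of multiplicity 2. The second row of [8] is replaced by the derivative equation f'(e) = k_1:

```python
import numpy as np

# A = [[2, 1], [0, 2]] has eigenvalue e = 2 with multiplicity 2, so the
# duplicate equation in [8] is replaced by its derivative:
#   f(e)  = k_0 + k_1 e
#   f'(e) = k_1
A = np.array([[2., 1.],
              [0., 2.]])
e = 2.0
f, df = np.exp, np.exp          # f = exp, and f' = exp as well

k1 = df(e)
k0 = f(e) - k1 * e
fA = k0 * np.eye(2) + k1 * A    # f(A) = k_0 I + k_1 A, per [6]
# Known closed form for this Jordan block: exp(A) = e^2 * [[1, 1], [0, 1]]
```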

For example, suppose that for some parameter matrix all M eigenvalues

equal e_1. In that case, equation [8] becomes

[9] [ f(e_1)       ]   [ e_1^0  e_1^1 ... e_1^(M-1)       ]   [ k_0     ]
    [ f'(e_1)      ]   [ 0      1     ... (M-1) e_1^(M-2) ]   [ k_1     ]
    [ ...          ] = [ ...    ...   ...  ...            ] * [ ...     ]
    [ f^(M-1)(e_1) ]   [ 0      0     ... (M-1)!          ]   [ k_(M-1) ]

In some cases [8] will have only one eigenvalue with multiplicity

greater than 1; in other cases several eigenvalues may repeat. The

objective is to construct a set of

unique linear equations from the eigenvalues.

Let the square matrix containing the eigenvalues, reduced if necessary

to this unique linear equation form, be denoted B and the corresponding

column vector containing the scalar function of the eigenvalues be

denoted F. We will denote the column vector of the constants k_n by C.

We restate the system of linear equations as

[10] F = B C

Now, by use of Cramer's rule we solve for the constants k_n with

[11] k_n = det(B_n) / det(B)

where B_n is the matrix formed by replacing the n'th column of B with

F.

Having solved for constants k_n in terms of f(e), we proceed to solve

for matrix function by substitution of [11] into [6], giving us

[12] f(A) = SUM(n=0..M-1) A^n det(B_n)/det(B)
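The whole procedure can be sketched as follows (a sketch, not a robust implementation: numpy is assumed, a standard linear solve stands in for Cramer's rule [11], and only the distinct-eigenvalue case is handled; repeated eigenvalues would need the derivative rows described earlier):

```python
import numpy as np

def matrix_function(A, f):
    """Evaluate f(A) via [12]: solve the system [8] for the polynomial
    coefficients k_n, then form SUM(n=0..M-1) k_n A^n.
    Assumes the eigenvalues of A are distinct."""
    M = A.shape[0]
    e = np.linalg.eigvals(A)
    # B[i, n] = e_i^n -- the coefficient matrix of equation [8]
    B = np.vander(e, M, increasing=True)
    F = f(e)                       # f applied to each eigenvalue
    k = np.linalg.solve(B, F)      # equivalent to Cramer's rule [11]
    # f(A) = SUM(n=0..M-1) k_n A^n, with A^0 = I by convention [3]
    result = np.zeros((M, M), dtype=complex)
    An = np.eye(M)
    for n in range(M):
        result += k[n] * An
        An = An @ A
    return result
```

For example, A = [[1, 1], [0, 2]] has distinct eigenvalues 1 and 2, and matrix_function(A, np.exp) reproduces the known closed form [[e, e^2 - e], [0, e^2]].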

ALTERNATIVE FORMULA

There is a theorem which tells us that every square matrix A is similar to

a matrix J in Jordan-Canonical form. Thus, if we restate A as

[13] A = M J M^(-1)

where M is a modal matrix for A and J is in Jordan Canonical Form, then

we can use another theorem which tells us that a function of a matrix

simplifies to

[14] f(A) = M f(J) M^(-1)

If J is in diagonal form, then

[15] f(J) = [ f(e_1)  0       ...  0      ]
            [ 0       f(e_2)  ...  0      ]
            [ ...     ...     ...  ...    ]
            [ 0       0       ...  f(e_M) ]
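For a diagonalizable A, equations [14] and [15] translate almost directly into code (numpy assumed; np.linalg.eig returns the eigenvalues and a modal matrix whose columns are eigenvectors):

```python
import numpy as np

def matrix_function_eig(A, f):
    """Evaluate f(A) = M f(J) M^(-1) per [14], assuming A is
    diagonalizable so that J is the diagonal matrix of eigenvalues."""
    e, Mmod = np.linalg.eig(A)      # J = diag(e), Mmod = modal matrix
    return Mmod @ np.diag(f(e)) @ np.linalg.inv(Mmod)
```

This agrees with formula [12] on diagonalizable inputs, but note the closing caveat below about the cost of the decomposition.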

Otherwise, if J is a single Jordan block with eigenvalue e_1, then

[16] f(J) = [ f(e_1)/0!  f^1(e_1)/1!  ...  f^(M-1)(e_1)/(M-1)! ]
            [ 0          f(e_1)/0!    ...  f^(M-2)(e_1)/(M-2)! ]
            [ ...        ...          ...  ...                 ]
            [ 0          0            ...  f(e_1)/0!           ]

Although this alternative approach appears simpler, it is generally

undesirable, as the decomposition [13] is computationally expensive.

Equation [12] is thus offered as a general formula for the evaluation

of an arbitrary function of an MxM matrix.