Programs for Programmers

Nested Factorization

by John Appleyard B.A., Ph.D., Polyhedron Software Ltd, May 2004

 

In this note, I will describe the “Nested Factorization” algorithm, which may be used to approximate the solution of large finite difference matrix equations. Nested Factorization is generally used to precondition the conjugate gradient or a similar algorithm. It was first described by Appleyard, Cheshire & Pollard(1), and a number of variants were explored in a later paper(2). The method works well with very stiff, nearly singular equations which defeat other iterative methods, and is similar to the method implemented in Schlumberger’s widely used Eclipse oil reservoir simulator(3).

Matrix Structure

 

For an nx by ny by nz grid, the banded finite difference matrix A is

 

A = d + u + l + v + m + w + n

where

d is the diagonal,

u and l are the bands immediately above and below the diagonal, which connect adjacent cells within a line

v and m are the bands nx elements away from the diagonal, which connect adjacent lines within a plane

w and n are the bands nx*ny elements away from the diagonal, which connect adjacent planes

[Figure: Banded Matrix for a 4 x 3 x 2 Finite Difference Grid]

 

This matrix can be regarded as recursively block tridiagonal. At the outermost level, each block element represents a plane, or the coefficients connecting adjacent planes. Within each plane, the blocks represent lines, and the coefficients connecting adjacent lines. At the innermost level, we have individual grid cells and the connections within lines.
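To make this structure concrete, here is a minimal sketch (an illustration written for this note, not part of any published code) which prints the position, within the matrix ordering, of one cell and its six neighbours for the 4 x 3 x 2 example. It assumes the natural ordering in which the x index varies fastest, then y, then z; the neighbour offsets are then exactly the band positions listed above: ±1 for u and l, ±nx for v and m, and ±nx*ny for w and n.

  program band_offsets
    implicit none
    integer, parameter :: nx = 4, ny = 3, nz = 2, ncells = nx*ny*nz
    integer :: i, j, k, idx

    ! Pick a cell (1-based indices) and find its position in the natural ordering,
    ! in which the x index varies fastest, then y, then z.
    i = 2;  j = 2;  k = 1
    idx = i + nx*(j-1) + nx*ny*(k-1)

    print *, 'cell index                :', idx, ' of', ncells
    print *, 'u/l neighbours (+/- 1)    :', idx+1, idx-1
    print *, 'v/m neighbours (+/- nx)   :', idx+nx, idx-nx
    print *, 'w/n neighbours (+/- nx*ny):', idx+nx*ny, idx-nx*ny
    ! A neighbour index outside 1..ncells lies beyond the grid boundary,
    ! and the corresponding band entry is simply absent from that row of A.
  end program band_offsets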

The Nested Factorization Preconditioner

 

Nested Factorization exploits this recursive structure by defining the preconditioner, B, recursively:

 

B = (P + n)·(I + P⁻¹·w)      (B is block tridiagonal - block elements are planes)

where I is the unit matrix

and P = (T + m)·(I + T⁻¹·v)      (P is block tridiagonal - block elements are lines)

and T = (γ + l)·(I + γ⁻¹·u)      (T is tridiagonal)

and γ = d - l·γ⁻¹·u - colsum(ε)      (γ is diagonal)

and ε = m·T⁻¹·v + n·P⁻¹·w      (the uncorrected error matrix)

 

colsum(ε) is the diagonal matrix formed by summing the elements of ε in columns. After expansion, we obtain

 

B = A + ε - colsum(ε)
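The expansion is a short calculation using only the definitions above. Multiplying out each factorization in turn gives

T = (γ + l)·(I + γ⁻¹·u) = γ + u + l + l·γ⁻¹·u

P = (T + m)·(I + T⁻¹·v) = T + v + m + m·T⁻¹·v

B = (P + n)·(I + P⁻¹·w) = P + w + n + n·P⁻¹·w

so that B = γ + l·γ⁻¹·u + u + l + v + m + w + n + ε. The definition of γ gives γ + l·γ⁻¹·u = d - colsum(ε), and the stated result follows.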

 

In summary, B is a partial L.U factorization, but unlike most other similar methods, the factors L and U are block (rather than strictly) lower and upper triangular. For our example 4 x 3 x 2 finite difference matrix, (P + n) and (I + P⁻¹·w) are 2 x 2 block matrices

 

   (P + n)  =  |  Pa       |          (I + P⁻¹·w)  =  |  I    Pa⁻¹·w  |
               |  n    Pb  |                          |          I    |

where Pa and Pb are themselves matrices which connect cells within the two planes, and n, w are diagonal matrices containing elements from A. To compute B⁻¹·x, where x is a vector, we first sweep forward through the planes to obtain (P + n)⁻¹·x, and then back to apply (I + P⁻¹·w)⁻¹. The problem is thus reduced to solving P⁻¹·x for each plane, and from 3 dimensions to 2.
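The sketch below shows the shape of these two sweeps. It is purely illustrative: the solution is held as an array with one column per plane, nband and wband hold the n and w coefficients for the rows of each plane, and apply_Pinv (which in the real algorithm is itself the nested line-by-line solve described below) is replaced by a simple diagonal placeholder so that the program runs. The names, sizes and placeholder are assumptions made for this note, not part of any published implementation.

  program plane_sweep_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nx = 4, ny = 3, nz = 2, ncell = nx*ny   ! cells per plane
    real(dp) :: x(ncell,nz), y(ncell,nz), z(ncell,nz)
    real(dp) :: nband(ncell,nz), wband(ncell,nz), pdiag(ncell,nz)
    integer  :: k

    call random_number(x)                        ! right hand side, one column per plane
    call random_number(nband)
    call random_number(wband)
    pdiag = 4.0_dp                               ! placeholder "P" for each plane (diagonal only)
    nband(:,1)  = 0.0_dp                         ! the first plane has no plane below it
    wband(:,nz) = 0.0_dp                         ! the last plane has no plane above it

    ! Forward sweep: y = (P + n)^-1 . x, plane by plane.
    y(:,1) = apply_Pinv(1, x(:,1))
    do k = 2, nz
       y(:,k) = apply_Pinv(k, x(:,k) - nband(:,k)*y(:,k-1))
    end do

    ! Backward sweep: z = (I + P^-1.w)^-1 . y, plane by plane.
    z(:,nz) = y(:,nz)
    do k = nz-1, 1, -1
       z(:,k) = y(:,k) - apply_Pinv(k, wband(:,k)*z(:,k+1))
    end do

    print *, 'approximate solution B^-1.x computed with the placeholder P'
    print *, z

  contains

    function apply_Pinv(plane, r) result(s)
      ! Stand-in for the within-plane solve P^-1.r.  In Nested Factorization this
      ! is itself a line-by-line sweep of exactly the same shape as the loops above;
      ! here a diagonal solve is used so that the sketch is self-contained and runs.
      integer,  intent(in) :: plane
      real(dp), intent(in) :: r(ncell)
      real(dp)             :: s(ncell)
      s = r / pdiag(:,plane)
    end function apply_Pinv

  end program plane_sweep_sketch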

 

For a suitable choice of P (P = A_nw - n·P⁻¹·w, where A_nw denotes A with the n and w bands removed), this factorization of A is exact. However, an exact factorization is not practical for large matrices, and instead, we approximate each block element of P in the same way – with an incomplete block L.U factorization.

 

Pa = (T + m)·(I + T⁻¹·v)

 

For our example matrix, (T + m) and (I + T⁻¹·v) are 3 x 3 block matrices

 

   (T + m)  =  |  Ta            |          (I + T⁻¹·v)  =  |  I    Ta⁻¹·v           |
               |  m    Tb       |                          |        I      Tb⁻¹·v   |
               |       m    Tc  |                          |                 I      |

where Ta, Tb and Tc are themselves 4 x 4 matrices which connect cells within a line, and m, v are diagonal matrices containing elements from A. The evaluation of Pa⁻¹·x proceeds in the same way as that of B⁻¹·x, and the problem is further reduced from 2 dimensions to 1. The solution of the one-dimensional problems, of the form Ta⁻¹·x (where Ta is tridiagonal), is then a simple matter which terminates the recursion.
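At this innermost level, everything can be written out directly. The short program below is an illustration written for this note (with made-up data and names chosen for the sketch): it builds one line of a symmetric tridiagonal matrix from bands d, u and l, computes the diagonal γ using the recurrence given in the next section (in one dimension the colsum terms vanish), and then applies T⁻¹ by a forward sweep with (γ + l) followed by a backward sweep with (I + γ⁻¹·u). In one dimension nothing is neglected, so the factorization is exact and the residual vanishes to rounding error.

  program line_solve
    implicit none
    integer, parameter :: dp = kind(1.0d0), n = 6
    real(dp) :: d(n), u(n), l(n), gam(n), b(n), x(n), y(n), r(n)
    integer  :: i

    ! Made-up data for one line of n cells: u(i) couples cell i to cell i+1,
    ! l(i) couples cell i to cell i-1, and l mirrors u so that T is symmetric.
    call random_number(u)
    u(n) = 0.0_dp                                  ! no neighbour beyond the last cell
    l(1) = 0.0_dp
    l(2:n) = u(1:n-1)
    d = 4.0_dp                                     ! keeps this demo diagonally dominant
    call random_number(b)                          ! right hand side

    ! The diagonal gam (gamma in the text); in 1-D there are no colsum corrections.
    gam(1) = d(1)
    do i = 2, n
       gam(i) = d(i) - l(i)*u(i-1)/gam(i-1)
    end do

    ! Forward sweep:  solve (gamma + l).y = b
    y(1) = b(1)/gam(1)
    do i = 2, n
       y(i) = (b(i) - l(i)*y(i-1))/gam(i)
    end do

    ! Backward sweep: solve (I + gamma^-1.u).x = y
    x(n) = y(n)
    do i = n-1, 1, -1
       x(i) = y(i) - u(i)*x(i+1)/gam(i)
    end do

    ! Residual of the original tridiagonal system T.x = b.
    do i = 1, n
       r(i) = b(i) - d(i)*x(i)
       if (i > 1) r(i) = r(i) - l(i)*x(i-1)
       if (i < n) r(i) = r(i) - u(i)*x(i+1)
    end do
    print *, 'max residual:', maxval(abs(r))       ! zero to rounding error
  end program line_solve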

 

In fact, the Nested Factorization preconditioner can be shown to be equivalent to a strictly triangular factorization, but with L and U factors that include a good subset of the fill-in bands produced by a Cholesky factorization.

The Diagonal Matrix, γ

 

With one exception, the entire calculation is performed using only the original matrix coefficients. Nested Factorization requires storage for only one computed band – the diagonal matrix γ (or, more usually, γ⁻¹):

 

γ = d - l·γ⁻¹·u - colsum(m·T⁻¹·v) - colsum(n·P⁻¹·w)

 

The elements of γ are calculated in a single sweep through the grid. Each element depends on the previous element within the same line (through l·γ⁻¹·u), and on elements in the previous line (through colsum(m·T⁻¹·v)) and the previous plane (through colsum(n·P⁻¹·w)).
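The colsum terms are the only part of this sweep that needs any extra machinery. For one line, colsum(m·T⁻¹·v) can be formed with a single additional tridiagonal solve: the j-th column of m·T⁻¹·v sums to v(j) times the j-th component of the solution of Tᵀ·s = m (and for the symmetric matrices used in the tests below, Tᵀ = T). The sketch that follows is an illustration written for this note; the data are random, the names are invented, and the brute-force check is included only to confirm the identity.

  program line_colsum
    implicit none
    integer, parameter :: dp = kind(1.0d0), n = 6
    real(dp) :: d(n), u(n), l(n), gam(n), mvec(n), vvec(n)
    real(dp) :: s(n), colsum(n), check(n), e(n), z(n)
    integer  :: i, j

    ! One line of a symmetric test matrix: T = tridiag(l, d, u) with l mirroring u.
    call random_number(u);  u(n) = 0.0_dp
    l(1) = 0.0_dp;  l(2:n) = u(1:n-1)
    d = 4.0_dp
    call random_number(mvec)                 ! band m: coupling to the neighbouring line
    call random_number(vvec)                 ! band v: coupling back from that line

    ! gamma for this line (1-D recurrence, as in the previous example).
    gam(1) = d(1)
    do i = 2, n
       gam(i) = d(i) - l(i)*u(i-1)/gam(i-1)
    end do

    ! colsum(m.T^-1.v): column j of m.T^-1.v sums to vvec(j)*s(j), where T^T.s = mvec.
    ! T is symmetric here, so one ordinary tridiagonal solve suffices.
    call tri_solve(mvec, s)
    colsum = vvec * s

    ! Brute-force check: build each column of T^-1 explicitly and sum it.
    do j = 1, n
       e = 0.0_dp;  e(j) = 1.0_dp
       call tri_solve(e, z)                  ! z = column j of T^-1
       check(j) = vvec(j) * sum(mvec*z)
    end do
    print *, 'max difference:', maxval(abs(colsum - check))   ! zero to rounding error

  contains

    subroutine tri_solve(b, x)
      ! Solve T.x = b with T = (gamma + l).(I + gamma^-1.u), as in the text.
      real(dp), intent(in)  :: b(n)
      real(dp), intent(out) :: x(n)
      real(dp) :: y(n)
      integer  :: i
      y(1) = b(1)/gam(1)
      do i = 2, n
         y(i) = (b(i) - l(i)*y(i-1))/gam(i)
      end do
      x(n) = y(n)
      do i = n-1, 1, -1
         x(i) = y(i) - u(i)*x(i+1)/gam(i)
      end do
    end subroutine tri_solve

  end program line_colsum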

Why Colsum?

 

The error matrix (B-A)

 

B - A = m·T⁻¹·v - colsum(m·T⁻¹·v) + n·P⁻¹·w - colsum(n·P⁻¹·w)

 

is block diagonal, and has the property that columns of elements sum to zero. This is physically meaningful in many cases - for example it ensures zero material balance error for every plane in many CFD problems. With other iterative methods, it is often this type of error which is most persistent, and its immediate elimination helps to accelerate convergence.

 

The zero colsum property also provides a very useful check on the correctness of the implementation. If we approximate the solution of A·x = b by solving B·x = b, then the residual is (B - A)·x. If colsum(B - A) is zero, then the sum of the components of the residual is zero for any x. If this property is not observed, we can deduce that the implementation is incorrect.

 

Finally, there is a mathematical reason for preferring colsum to the more usual rowsum constraint in fluid flow problems. It can be shown that in such problems, only the colsum constraint leads to a preconditioning matrix, B, that is always non-singular.

 


[Figure: Shape of the Error Matrix, (B - A)]

 

Numerical Tests

 

A Fortran 90 program was developed which generates test matrices and solves them using a variety of iterative methods. It may be downloaded from http://www.polyhedron.com/nf.zip. The test matrices are symmetric and positive definite. The band elements are generated by a random number generator, with different maxima for the u, v and w bands; the l, m and n bands are defined by symmetry. The diagonal is set to

 

d = -colsum(u + l + v + m + w + n) - 1/s

 

The final term represents a diagonal matrix which ensures that the matrix is diagonally dominant. The constant s is identified as the stiffness, and in general, the equations become more difficult as s is increased. The right hand sides are generated using a random number generator, with a maximum value of 1 for each element.
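As a concrete illustration of this recipe, the sketch below builds such a matrix for a small grid, using a dense array for clarity. It is not an extract from the downloadable program: the grid size, the dense storage and the variable names are all choices made for this note, and the band maxima shown are simply the Problem 1 settings from the table below.

  program make_test_matrix
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nx = 4, ny = 3, nz = 2, n = nx*ny*nz
    real(dp), parameter :: umax = 100.0_dp, vmax = 1.0_dp, wmax = 1.0_dp   ! Problem 1 maxima
    real(dp), parameter :: s = 10.0_dp                                     ! stiffness
    real(dp) :: a(n,n), rnd
    integer  :: i, j, k, row

    a = 0.0_dp
    ! Random u, v and w bands, mirrored to l, m and n by symmetry.
    do k = 1, nz
      do j = 1, ny
        do i = 1, nx
          row = i + nx*(j-1) + nx*ny*(k-1)
          if (i < nx) then
            call random_number(rnd);  a(row,row+1) = umax*rnd;  a(row+1,row) = a(row,row+1)
          end if
          if (j < ny) then
            call random_number(rnd);  a(row,row+nx) = vmax*rnd;  a(row+nx,row) = a(row,row+nx)
          end if
          if (k < nz) then
            call random_number(rnd);  a(row,row+nx*ny) = wmax*rnd;  a(row+nx*ny,row) = a(row,row+nx*ny)
          end if
        end do
      end do
    end do

    ! Diagonal:  d = -colsum(u + l + v + m + w + n) - 1/s.
    ! At this point a(j,j) is still zero, so sum(a(:,j)) is the off-diagonal column sum.
    do j = 1, n
      a(j,j) = -sum(a(:,j)) - 1.0_dp/s
    end do

    ! The 1/s term makes |d| exceed the off-diagonal column sum in every column,
    ! so the matrix is diagonally dominant; the margin shrinks as s grows.
    print *, 'column 1 of the test matrix:'
    print *, a(:,1)
  end program make_test_matrix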

 

The test program reads values for the grid dimensions nx, ny and nz, the maximum sizes for elements in u, v and w, and the stiffness, s, together with the convergence criteria, maximum iteration count and the solution method. In these tests, we used nx = 97, ny = 105 and nz = 99 (representing a total of about 10⁶ variables). Three different problems were examined, each with 4 different values of s, as shown in the following table. Four different solution methods were tested: NFCG (as described in this note); ICCG0, a simple incomplete Cholesky preconditioned conjugate gradient method; ICCGC, a variant of ICCG0 adapted to give an error matrix with a zero colsum; and SIP3D, a classic iterative scheme devised by Stone(4).

 

 

             umax   vmax   wmax   s                Methods
Problem 1     100      1      1   1,10,100,1000    NFCG, ICCG0, ICCGC, SIP3D
                1    100      1                    NFCG
                1      1    100                    NFCG
Problem 2     100    100      1   1,10,100,1000    NFCG, ICCG0, ICCGC, SIP3D
              100      1    100                    NFCG
                1    100    100                    NFCG
Problem 3     100    100    100   1,10,100,1000    NFCG, ICCG0, ICCGC, SIP3D

 

Results and Discussion

The graphs show the average number of NFCG iterations required to reduce the rms residual by a factor of 10, plotted as a function of the stiffness, s, for different solution methods. For ICCG0, ICCGC and SIP3D, the iteration counts are “NFCG equivalent” iterations (i.e. somewhat below the actual value).

 

The results show that NFCG works best when the u and l bands are numerically largest, followed by v, m and, finally, w, n. This makes sense intuitively, as the evaluation of B⁻¹·x involves solving each P⁻¹·x twice and each T⁻¹·x four times. Thus u and l are used twice as often as v and m, and four times as often as w and n. In many problem domains, this directionality is convenient; for example, in geological models, the finite difference grid blocks are generally much bigger horizontally than vertically. As a result, the vertical transmissibility between blocks is much greater than the horizontal transmissibility, and the matrix coefficients for the vertical connections are largest.

 

For large values of s, the directional bias of the method is strong enough to justify re-ordering the equations so that u and l are the numerically largest bands. However, Nested Factorization remains highly competitive with other methods when there is no obvious directionality, and even when the wrong direction dominates.

 

It is also apparent that, regardless of directionality, NFCG is faster than the other methods tested, and that the difference is most marked when s is large. Application of the colsum constraint dramatically improves the performance of ICCG, but it remains a factor of 2 or more slower than NFCG.

 

The ICCG and NFCG methods tested here require storage for just one computed diagonal band matrix. I have not, in this study, examined the performance of higher-order versions of ICCG and NFCG, though some information may be found in reference 2. The higher-order variants improve the accuracy of the approximate solution by adding additional computed bands to the triangular L and U factors. Such methods do yield a gradual improvement in convergence rates, but this must be set against higher storage requirements, a longer setup phase, and slower iterations. It is also more difficult to apply the colsum constraint to the more complicated factorizations. In most situations, the basic NFCG method is hard to beat.

 

References

 

1)  Appleyard J.R., Cheshire I.M., and Pollard R.K. (1981): “Special techniques for fully-implicit simulators.” In Proc. European Symposium on Enhanced Oil Recovery, Bournemouth, UK, pages 395-408.

2)  Appleyard J.R. and Cheshire I.M.: “Nested Factorization.” In Seventh SPE Symposium on Reservoir Simulation, pages 315-324, 1983. Paper number 12264.

3)  Crumpton P.A., Fjerstad P.A., and Berge J.: “Parallel Computing - Using ECLIPSE Parallel.” White Paper (2003), www.sis.slb.com/media/about/whitepaper_parallelcomputing.pdf

4)  Stone H.L.: “Iterative Solution of Implicit Approximations of Multidimensional Partial Differential Equations.” SIAM J. Numer. Anal. (Sept 1968) 5, 530-558.

 

[Graphs: NFCG iterations vs. stiffness, s, for the different solution methods]