by John Appleyard B.A., Ph.D., Polyhedron Software Ltd, May 2004
In this note, I will describe the “Nested Factorization” algorithm, which may be used to approximate the solution of large finite difference matrix equations. Nested Factorization is generally used to precondition the conjugate gradient or a similar algorithm. It was first described by Appleyard, Cheshire & Pollard^{(1)}, and a number of variants were explored in a later paper^{(2)}. The method works well with very stiff nearly singular equations which defeat other iterative methods, and is similar to that implemented in Schlumberger’s widely used Eclipse oil reservoir simulator^{(3)}.
For an n_{x} by n_{y} by n_{z} grid, the banded finite difference matrix A is
A = d + u + l + v + m + w + n
where
d is the diagonal,
u and l are the bands immediately above and below the diagonal, which connect adjacent cells within a line,
v and m are the bands n_{x} elements away from the diagonal, which connect adjacent lines within a plane,
w and n are the bands n_{x}*n_{y} elements away from the diagonal, which connect adjacent planes.
This matrix can be regarded as recursively block tridiagonal. At the outermost level each block element represents a plane, or the coefficients connecting adjacent planes. Within each plane, the blocks represent lines, and coefficients connecting adjacent lines. At the innermost level, we have individual grid blocks and the connections within lines.
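The recursive block structure can be seen directly by assembling a small dense example. The sketch below (illustrative only; band values are arbitrary random numbers, and only the sparsity pattern matters) builds the 7-band matrix for a small grid and checks that the plane-coupling blocks are diagonal matrices:

```python
import numpy as np

# Assemble a dense 7-band finite difference matrix for a small nx x ny x nz
# grid and confirm the recursive block structure described in the text.
nx, ny, nz = 4, 3, 2
n = nx * ny * nz
rng = np.random.default_rng(0)

A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 10.0 + rng.random()           # d: diagonal
for i in range(n - 1):
    if (i + 1) % nx != 0:                   # u, l: skip line boundaries
        A[i, i + 1] = A[i + 1, i] = -rng.random()
for i in range(n - nx):
    if i % (nx * ny) < nx * (ny - 1):       # v, m: skip plane boundaries
        A[i, i + nx] = A[i + nx, i] = -rng.random()
for i in range(n - nx * ny):                # w, n: plane-to-plane connections
    A[i, i + nx * ny] = A[i + nx * ny, i] = -rng.random()

# At the outermost level A is block tridiagonal in (nx*ny)-sized plane blocks,
# and the off-diagonal plane blocks (the w and n bands) are diagonal matrices.
npl = nx * ny
W = A[0:npl, npl:2 * npl]                   # block coupling plane 0 to plane 1
assert np.count_nonzero(W - np.diag(np.diag(W))) == 0
```

The same observation holds one level down: within each plane block, the line-coupling sub-blocks (the v and m bands) are diagonal.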
Nested factorization exploits this recursive structure by defining the preconditioner, B, recursively:
B = (P+n).(I+P^{-1}.w)    (B is block tridiagonal – block elements are planes)
where I is the unit matrix
and P = (T+m).(I+T^{-1}.v)    (P is block tridiagonal – block elements are lines)
and T = (γ+l).(I+γ^{-1}.u)    (T is tridiagonal)
and γ = d - l.γ^{-1}.u - colsum(ε)    (γ is diagonal)
and ε = m.T^{-1}.v + n.P^{-1}.w    (the uncorrected error matrix)
colsum(ε) is the diagonal matrix formed by summing the elements of ε in columns. After expansion, we obtain
B = A + ε - colsum(ε)
In summary, B is a partial L.U factorization, but unlike most other similar methods, the factors L and U are block (rather than strictly) lower and upper triangular. For our example 4 x 3 x 2 finite difference matrix, (P+n) and (I+P^{-1}.w) are 2 x 2 block matrices
(P+n) = | P_{a}        |      (I+P^{-1}.w) = | I   P_{a}^{-1}.w |
        | n      P_{b} |                     |     I            |

where P_{a} and P_{b} are themselves matrices which connect cells within the 2 planes, and n, w are diagonal matrices containing elements from A. To compute B^{-1}.x, where x is a vector, we first sweep forward through the planes to obtain (P+n)^{-1}.x, and then back to apply (I+P^{-1}.w)^{-1}. The problem is thus reduced to solving P^{-1}.x for each plane, and from 3 dimensions to 2.
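The two-plane sweep can be sketched as follows. This is a minimal illustration, not the paper's code: the plane blocks are dense stand-ins, np.linalg.solve stands in for the recursive P^{-1} evaluation, and all names (Pa, Pb, nb, wb) are invented for the example.

```python
import numpy as np

# Apply B^{-1}.x for a two-plane grid, where B = (P+n).(I+P^{-1}.w).
rng = np.random.default_rng(1)
npl = 12                                    # cells per plane (e.g. 4 x 3)
Pa = np.eye(npl) * 5 + rng.random((npl, npl))   # stand-in plane blocks
Pb = np.eye(npl) * 5 + rng.random((npl, npl))
nb = np.diag(rng.random(npl))               # lower plane-coupling band n
wb = np.diag(rng.random(npl))               # upper plane-coupling band w
x = rng.random(2 * npl)

# Forward sweep: y = (P+n)^{-1}.x, plane by plane.
ya = np.linalg.solve(Pa, x[:npl])
yb = np.linalg.solve(Pb, x[npl:] - nb @ ya)

# Back sweep: z = (I+P^{-1}.w)^{-1}.y.
zb = yb
za = ya - np.linalg.solve(Pa, wb @ zb)
z = np.concatenate([za, zb])

# Check against the assembled factors B = L.U = (P+n).(I+P^{-1}.w).
L = np.block([[Pa, np.zeros((npl, npl))], [nb, Pb]])
U = np.block([[np.eye(npl), np.linalg.solve(Pa, wb)],
              [np.zeros((npl, npl)), np.eye(npl)]])
assert np.allclose(L @ U @ z, x)
```

Note that each application of B^{-1} solves with P_{a} twice but P_{b} only once; this asymmetry generalises to the operation counts discussed later.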
For a suitable choice of P (P = A - n - w - n.P^{-1}.w), this factorization of A is exact. However an exact factorization is not practical for large matrices, and instead, we approximate each block element of P in the same way – with an incomplete block L.U factorization.
P_{a} = (T+m).(I+T^{-1}.v)
For our example matrix, (T+m) and (I+T^{-1}.v) are 3 x 3 block matrices
        | T_{a}             |                  | I   T_{a}^{-1}.v                |
(T+m) = | m     T_{b}       |   (I+T^{-1}.v) = |     I            T_{b}^{-1}.v   |
        |       m     T_{c} |                  |                  I              |
where T_{a}, T_{b} and T_{c} are themselves 4 x 4 matrices which connect cells within a line, and m, v are diagonal matrices containing elements from A. The evaluation of P_{a}^{-1}.x proceeds in the same way as that of B^{-1}.x, and the problem is further reduced from 2 dimensions to 1. The solution of the one dimensional problems, of the form T_{a}^{-1}.x (where T_{a} is tridiagonal), is then a simple matter which terminates the recursion.
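The innermost solve is just a forward and a back substitution along one line, using the factored form T = (γ+l).(I+γ^{-1}.u). A minimal sketch (variable names gam, lo, up are illustrative; the check against a dense solve uses the fact that the factored T has diagonal γ_i + l_{i-1}.u_{i-1}/γ_{i-1}):

```python
import numpy as np

# Solve T.z = x with T = (γ + l).(I + γ^{-1}.u), Thomas-style.
def tri_solve(gam, lo, up, x):
    nn = len(x)
    y = np.empty(nn)
    y[0] = x[0] / gam[0]
    for i in range(1, nn):                  # forward: (γ + l).y = x
        y[i] = (x[i] - lo[i - 1] * y[i - 1]) / gam[i]
    z = np.empty(nn)
    z[-1] = y[-1]
    for i in range(nn - 2, -1, -1):         # back: (I + γ^{-1}.u).z = y
        z[i] = y[i] - up[i] * z[i + 1] / gam[i]
    return z

# Verify against a dense solve of the tridiagonal T this factorization gives.
rng = np.random.default_rng(2)
nn = 6
gam = 4.0 + rng.random(nn)
lo, up = -rng.random(nn - 1), -rng.random(nn - 1)
T = np.diag(np.concatenate([[gam[0]], gam[1:] + lo * up / gam[:-1]]))
T += np.diag(lo, -1) + np.diag(up, 1)
x = rng.random(nn)
assert np.allclose(T @ tri_solve(gam, lo, up, x), x)
```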
In fact, the Nested Factorization preconditioner can be shown to be equivalent to a strictly triangular factorization, but with L and U factors that include a good subset of the fill-in bands produced by a Cholesky factorization.
With one exception, the entire calculation is performed using only the original matrix coefficients. Nested Factorization requires storage for only one computed band – the diagonal matrix γ (or, more usually, γ^{-1}):
γ = d - l.γ^{-1}.u - colsum(m.T^{-1}.v) - colsum(n.P^{-1}.w)
The elements of γ are calculated in a single sweep through the grid. Each element depends on the previous element within the same line (through l.γ^{-1}.u), and on elements in the previous line (through colsum(m.T^{-1}.v)) and the previous plane (through colsum(n.P^{-1}.w)).
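The γ sweep can be sketched for a single plane (the 2-D case, with bands d, u, l, v and m only). This is an illustration, not the paper's band-storage code: dense numpy arrays stand in for bands, symmetric coefficients (l = u, m = v) are assumed, and all names are invented. The colsum correction uses colsum(m.T^{-1}.v)_k = v_k.(T^{-T}.m)_k, which needs one transposed line solve per line. The final assertion verifies the zero-colsum property of the resulting error matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny = 5, 4
u = -rng.random((ny, nx - 1))      # within-line bands (l = u by symmetry)
v = -rng.random((ny - 1, nx))      # line-to-line bands (m = v by symmetry)
d = 5.0 + rng.random((ny, nx))     # diagonal, kept dominant for the demo

gam = np.zeros((ny, nx))
corr = np.zeros(nx)                # colsum(m.T^{-1}.v) from the previous line
Ts = []
for j in range(ny):
    for i in range(nx):
        g = d[j, i] - corr[i]
        if i > 0:
            g -= u[j, i - 1] ** 2 / gam[j, i - 1]   # the l.γ^{-1}.u term
        gam[j, i] = g
    # dense T_j = (γ + l).(I + γ^{-1}.u) for this line
    Tj = np.diag(gam[j])
    Tj[1:, 1:] += np.diag(u[j] ** 2 / gam[j, :-1])
    Tj += np.diag(u[j], -1) + np.diag(u[j], 1)
    Ts.append(Tj)
    if j < ny - 1:
        corr = v[j] * np.linalg.solve(Tj.T, v[j])   # colsum(m.T_j^{-1}.v)

# Assemble A and P = (T+m).(I+T^{-1}.v) = T + m + v + m.T^{-1}.v densely.
N = nx * ny
A2 = np.zeros((N, N))
P = np.zeros((N, N))
for j in range(ny):
    s = slice(j * nx, (j + 1) * nx)
    A2[s, s] = np.diag(d[j]) + np.diag(u[j], -1) + np.diag(u[j], 1)
    P[s, s] = Ts[j]
    if j > 0:
        sp = slice((j - 1) * nx, j * nx)
        A2[s, sp] = A2[sp, s] = np.diag(v[j - 1])
        P[s, sp] = P[sp, s] = np.diag(v[j - 1])
        P[s, s] += np.diag(v[j - 1]) @ np.linalg.solve(Ts[j - 1], np.diag(v[j - 1]))

# The error P - A is block diagonal with zero column sums.
assert np.allclose((P - A2).sum(axis=0), 0.0)
```

The full 3-D sweep adds the plane-level correction colsum(n.P^{-1}.w) in exactly the same way, one recursion level up.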
The error matrix (B-A)
B-A = m.T^{-1}.v - colsum(m.T^{-1}.v) + n.P^{-1}.w - colsum(n.P^{-1}.w)
is block diagonal, and has the property that columns of elements sum to zero. This is physically meaningful in many cases – for example it ensures zero material balance error for every plane in many CFD problems. With other iterative methods, it is often this type of error which is most persistent, and its immediate elimination helps to accelerate convergence.
The zero colsum property also provides a very useful check on the correctness of the implementation. If we approximate the solution of A.x = b by solving B.x = b, then the residual is (B-A).x. If colsum(B-A) is zero, then the sum of the components of the residual is zero for any x. If this property is not observed, we can deduce that the implementation is incorrect.
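The check itself is a one-liner. In this sketch A and B are small arbitrary stand-in matrices (not a real NF factorization); B is built with colsum(B-A) = 0 by construction, so the residual components must sum to zero for any right hand side:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = np.eye(n) * 10 + rng.random((n, n))     # diagonally dominant stand-in
E = 0.1 * rng.random((n, n))                # an arbitrary "error" matrix
B = A + E - np.diag(E.sum(axis=0))          # force colsum(B - A) = 0
b = rng.random(n)
x = np.linalg.solve(B, b)                   # one preconditioner application
r = b - A @ x                               # residual; equals (B - A).x
assert abs(r.sum()) < 1e-9                  # components sum to zero
```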
Finally, there is a mathematical reason for preferring colsum to the more usual rowsum constraint in fluid flow problems. It can be shown that in such problems, only the colsum constraint leads to a preconditioning matrix, B, that is always nonsingular.
[Figure: Shape of the Error Matrix (B-A)]
A Fortran 90 program was developed which generates test matrices and solves them using a variety of iterative methods. It may be downloaded from http://www.polyhedron.com/nf.zip. The test matrices are symmetric and positive definite. The band elements are generated by a random number generator, with different maxima for the u, v and w bands. The l, m and n bands are defined by symmetry. The diagonal is set to
d = -colsum(u + l + v + m + w + n) - 1/s
The final term represents a diagonal matrix which ensures that the matrix is diagonally dominant. The constant s is identified as the stiffness, and in general, the equations become more difficult as s is increased. The right hand sides are generated using a random number generator, with a maximum value of 1 for each element.
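A dense sketch of this generator (for a small grid; the downloadable program uses band storage, and its exact sign conventions may differ – here the bands are positive and the diagonal negative, so it is -A that is positive definite). Names and maxima are illustrative:

```python
import numpy as np

# Generate a symmetric test matrix with random u, v, w bands (l, m, n by
# symmetry) and diagonal d = -colsum(u+l+v+m+w+n) - 1/s, as described above.
def make_test_matrix(nx, ny, nz, umax, vmax, wmax, s, seed=0):
    rng = np.random.default_rng(seed)
    n = nx * ny * nz
    A = np.zeros((n, n))
    for i in range(n - 1):                  # u and, by symmetry, l
        if (i + 1) % nx != 0:
            A[i, i + 1] = A[i + 1, i] = umax * rng.random()
    for i in range(n - nx):                 # v and m
        if i % (nx * ny) < nx * (ny - 1):
            A[i, i + nx] = A[i + nx, i] = vmax * rng.random()
    for i in range(n - nx * ny):            # w and n
        A[i, i + nx * ny] = A[i + nx * ny, i] = wmax * rng.random()
    np.fill_diagonal(A, -A.sum(axis=0) - 1.0 / s)
    return A

A = make_test_matrix(4, 3, 2, umax=100.0, vmax=1.0, wmax=1.0, s=10.0)
assert np.allclose(A, A.T)                  # symmetric
colsums = np.abs(A - np.diag(np.diag(A))).sum(axis=0)
assert np.all(np.abs(np.diag(A)) > colsums)  # diagonally dominant by 1/s
```

As s grows, the 1/s margin of diagonal dominance shrinks and the matrix approaches singularity, which is why larger s gives stiffer problems.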
The test program reads values for the grid dimensions n_{x}, n_{y} and n_{z}, the maximum sizes for elements in u, v and w, and the stiffness, s, together with the convergence criteria, maximum iteration count and the solution method. In these tests, we used n_{x}=97, n_{y}=105 and n_{z}=99 (representing a total of about 10^{6} variables). Three different problems were examined, each with 4 different values of s, as shown in the following table. Four different solution methods were tested: NFCG (as described in this note), ICCG0 – a simple incomplete Cholesky preconditioned conjugate gradient method, ICCGC – a variant of ICCG0 adapted to give an error matrix with a zero colsum, and SIP3D – a classic iterative scheme devised by Stone^{(4)}.

            u_{max}  v_{max}  w_{max}  s              Methods
Problem 1   100      1        1        1,10,100,1000  NFCG, ICCG0, ICCGC, SIP3D
            1        100      1                       NFCG
            1        1        100                     NFCG
Problem 2   100      100      1        1,10,100,1000  NFCG, ICCG0, ICCGC, SIP3D
            100      1        100                     NFCG
            1        100      100                     NFCG
Problem 3   100      100      100      1,10,100,1000  NFCG, ICCG0, ICCGC, SIP3D
The graphs show the average number of NFCG iterations required to reduce the rms residual by a factor of 10, plotted as a function of the stiffness, s, for different solution methods. For ICCG0, ICCGC and SIP3D, the iteration counts are “NFCG equivalent” iterations (i.e. somewhat below the actual value).
The results show that NFCG works best when the u and l bands are numerically largest, followed by v, m and, finally, w, n. This makes sense intuitively, as the evaluation of B^{-1}.x involves solving each P^{-1}.x twice and each T^{-1}.x four times. Thus u and l are used twice as often as v and m, and 4 times as often as w and n. In many problem domains, this directionality is convenient; for example in geological models, the finite difference grid blocks are generally much bigger horizontally than vertically. As a result, the vertical transmissibility between blocks is much greater than the horizontal transmissibility, and the matrix coefficients for the vertical connections are largest.
For large values of s, the directionality of the method is quite sufficient to justify reordering the equations so that u and l are the largest bands. However, nested factorization remains highly competitive with other methods when there is no obvious directionality, and even when the wrong direction dominates.
It is also apparent that, regardless of directionality, NFCG is faster than the other methods tested, and that the difference is most marked when s is large. Application of the colsum constraint dramatically improves the performance of ICCG, but it remains a factor of 2 or more slower than NFCG.
The ICCG and NFCG methods tested here require storage for just one computed diagonal band matrix. I have not, in this study, examined the performance of higher order versions of ICCG and NFCG, though some information may be found in reference 2. The higher order variants improve the accuracy of the approximate solution by adding additional computed bands to the triangular L and U factors. Such methods do yield a gradual improvement in convergence rates, but this must be set against higher storage requirements, a longer setup phase, and slower iterations. It is also more difficult to apply the colsum constraint to the more complicated factorizations. In most situations, the basic NFCG method is hard to beat.
References
1) Appleyard, J.R., Cheshire, I.M., and Pollard, R.K.: "Special Techniques for Fully-Implicit Simulators", in Proc. European Symposium on Enhanced Oil Recovery, Bournemouth, UK (1981), pages 395-408.
2) Appleyard, J.R. and Cheshire, I.M.: "Nested Factorization", in Seventh SPE Symposium on Reservoir Simulation (1983), pages 315-324, SPE paper 12264.
3) Crumpton, P.A., Fjerstad, P.A., and Berge, J.: "Parallel Computing – Using ECLIPSE Parallel", White Paper (2003), www.sis.slb.com/media/about/whitepaper_parallelcomputing.pdf
4) Stone, H.L.: "Iterative Solution of Implicit Approximations of Multidimensional Partial Differential Equations", SIAM J. Numer. Anal. (Sept 1968) 5, 530-558.