Table of contents
1 Introduction
2 Preliminaries
2.1 Notation
2.1.1 Communication
2.2 Graphs and partitioning techniques
2.2.1 Graphs
2.2.2 Nested dissection
2.2.3 K-way graph partitioning
2.3 Orthonormalization
2.3.1 Classical Gram-Schmidt (CGS)
2.3.2 Modified Gram-Schmidt (MGS)
2.3.3 Tall and skinny QR (TSQR)
2.4 The A-orthonormalization
2.4.1 Modified Gram-Schmidt A-orthonormalization
2.4.2 Classical Gram-Schmidt A-orthonormalization
2.4.3 Cholesky QR A-orthonormalization
2.5 Matrix powers kernel
2.6 Test matrices
3 Krylov Subspace Methods
3.1 Classical Krylov subspace methods
3.1.1 The Krylov subspaces
3.1.2 The Krylov subspace methods
3.1.3 Krylov projection methods
3.1.4 Conjugate gradient
3.1.5 Generalized minimal residual (GMRES) method
3.2 Parallelizable variants of the Krylov subspace methods
3.2.1 Block Krylov methods
3.2.2 The s-step Krylov methods
3.2.3 Communication avoiding methods
3.2.4 Other CG methods
3.3 Preconditioners
3.3.1 Incomplete LU preconditioner
3.3.2 Block Jacobi preconditioner
3.3.3 Restricted additive Schwarz preconditioner
4 Enlarged Krylov Subspace (EKS) Methods
4.1 The enlarged Krylov subspace
4.1.1 Krylov projection methods
4.1.2 The minimization property
4.1.3 Convergence analysis
4.2 Multiple search directions with orthogonalization conjugate gradient (MSDO-CG) method
4.2.1 The residual r_k
4.2.2 The domain search direction P_k
4.2.3 Finding the expressions of α_{k+1} and β_{k+1}
4.2.4 The MSDO-CG algorithm
4.3 Long recurrence enlarged conjugate gradient (LRE-CG) method
4.3.1 The LRE-CG algorithm
4.4 Convergence results
4.5 Parallel model and expected performance
4.5.1 MSDO-CG
4.5.2 LRE-CG
4.6 Preconditioned enlarged Krylov subspace methods
4.6.1 Convergence
4.7 Summary
5 Communication Avoiding Incomplete LU(0) Preconditioner
5.1 ILU matrix powers kernel
5.1.1 The partitioning problem
5.1.2 ILU preconditioned matrix powers kernel
5.2 Alternating min-max layers (AMML(s)) reordering for ILU(0) matrix powers kernel
5.2.1 Nested dissection + AMML(s) reordering of the matrix A
5.2.2 K-way + AMML(s) reordering of the matrix A
5.2.3 Complexity of AMML(s) reordering
5.3 CA-ILU0 preconditioner
5.4 Expected numerical efficiency and performance of CA-ILU0 preconditioner
5.4.1 Convergence
5.4.2 Avoided communication versus memory requirements and redundant flops of the ILU(0) matrix powers kernel
5.4.3 Comparison between CA-ILU0 preconditioner and block Jacobi preconditioner
5.5 Summary
6 Conclusion and Future work
Appendix A ILU(0) preconditioned GMRES convergence for different reorderings


