Modern Processors
    Stored-program computer architecture
    General-purpose cache-based microprocessor architecture
    Memory hierarchies
    Multicore processors
    Multithreaded processors
    Vector processors
Basic Optimization Techniques for Serial Code
    Scalar profiling
    Common sense optimizations
    Simple measures, large impact
    The role of compilers
    C++ optimizations
Data Access Optimization
    Balance analysis and lightspeed estimates
    Storage order
    Case study: The Jacobi algorithm
    Case study: Dense matrix transpose
    Algorithm classification and access optimizations
    Case study: Sparse matrix-vector multiply
Parallel Computers
    Taxonomy of parallel computing paradigms
    Shared-memory computers
    Distributed-memory computers
    Hierarchical (hybrid) systems
    Networks
Basics of Parallelization
    Why parallelize?
    Parallelism
    Parallel scalability
Shared-Memory Parallel Programming with OpenMP
    Short introduction to OpenMP
    Case study: OpenMP-parallel Jacobi algorithm
    Advanced OpenMP: Wavefront parallelization
Efficient OpenMP Programming
    Profiling OpenMP programs
    Performance pitfalls
    Case study: Parallel sparse matrix-vector multiply
Locality Optimizations on ccNUMA Architectures
    Locality of access on ccNUMA
    Case study: ccNUMA optimization of sparse MVM
    Placement pitfalls
    ccNUMA issues with C++
Distributed-Memory Parallel Programming with MPI
    Message passing
    A short introduction to MPI
    Example: MPI parallelization of a Jacobi solver
Efficient MPI Programming
    MPI performance tools
    Communication parameters
    Synchronization, serialization, contention
    Reducing communication overhead
    Understanding intranode point-to-point communication
Hybrid Parallelization with MPI and OpenMP
    Basic MPI/OpenMP programming models
    MPI taxonomy of thread interoperability
    Hybrid decomposition and mapping
    Potential benefits and drawbacks of hybrid programming
Appendix A: Topology and Affinity in Multicore Environments
Appendix B: Solutions to the Problems
Bibliography
Index
Introduction to High Performance Computing for Scientists and Engineers