Modern X86 SIMD Programming - Outline
D. Kusswurm - F:\ModX86SIMD\Outline\ModernX86SIMD_Outline (v1).docx

Introduction
The Introduction presents an overview of the book and includes concise descriptions of each chapter. It also summarizes the hardware and software tools required to use the book's source code.
  Overview
  Target Audience
  Chapter Descriptions
  Source Code
  Additional Resources

Chapter 1 - SIMD Fundamentals
Chapter 1 discusses SIMD fundamentals including data types, basic arithmetic, and common data manipulation operations. An understanding of this material is necessary to follow the book's subsequent chapters.
  What is SIMD?
    Simple C++ example (Ch01_01)
  Brief History of x86 SIMD Instruction Set Extensions
    MMX
    SSE - SSE4.2
    AVX, AVX2, and AVX-512
  SIMD Data Types
    Fundamental types
      128b, 256b, 512b
    Integer types
      Packed i8, i16, i32, i64 (signed and unsigned)
    Floating-point types
      Packed f16/b16, f32 and f64
    Little-endian storage
  SIMD Arithmetic
    Integer
      Addition and subtraction
        Wraparound vs. saturated
      Multiplication
      Bitwise logical
    Floating-point
      Addition, subtraction, multiplication, division, sqrt
      Horizontal addition and subtraction
      Fused multiply-accumulate (FMA)
  SIMD Operations
    Integer
      Min & max
      Compares
      Shuffles, permutations, and blends
      Size promotions and reductions
    Floating-point
      Min & max
      Compares
      Shuffles, permutations, and blends
      Size promotions and reductions
    Masked moves
    Conditional execution and merging (AVX-512)
  SIMD Programming Overview
    C++ compiler options
    C++ SIMD intrinsic functions
    Assembly language functions
    Testing for AVX, AVX2, and AVX-512

Chapter 2 - AVX C++ Programming - Part 1
Chapter 2 teaches AVX integer arithmetic and other operations using C++ intrinsic functions. It also discusses how to code a few simple image processing algorithms using C++ intrinsic functions and AVX instructions.
  Basic Integer Arithmetic
    Addition (Ch02_01)
    Subtraction (Ch02_02)
    Multiplication (Ch02_03)
  Common Integer Operations
    Bitwise logical operations (Ch02_04)
    Arithmetic and logical shifts (Ch02_05)
  Image Processing Algorithms
    Pixel minimum and maximum (Ch02_06)
    Pixel mean (Ch02_07)

Chapter 3 - AVX C++ Programming - Part 2
Chapter 3 is similar to the previous chapter but emphasizes floating-point instead of integer values. This chapter also explains how to employ C++ intrinsic functions to perform SIMD arithmetic operations using floating-point arrays and matrices.
  Basic Floating-Point Arithmetic
    Addition, subtraction, etc. (Ch03_01)
    Compares (Ch03_02)
    Conversions (Ch03_03)
  Floating-Point Arrays
    Array mean and standard deviation (Ch03_04, Ch03_05)
    Array square roots and compares (Ch03_06, Ch03_07)
  Floating-Point Matrices
    Matrix column means (Ch03_08, Ch03_09)

Chapter 4 - AVX2 C++ Programming - Part 1
Chapter 4 describes AVX2 integer programming using C++ intrinsic functions. This chapter also highlights the coding of more sophisticated image processing functions using the AVX2 instruction set.
  Basic Integer Arithmetic
    Addition and subtraction (Ch04_01)
    Pack and unpack operations (Ch04_02)
    Size promotions (Ch04_03)
  Image Processing Algorithms
    Pixel clipping (Ch04_04)
    RGB to grayscale (Ch04_05)
    Thresholding (Ch04_06)
    Pixel conversions (Ch04_07)

Chapter 5 - AVX2 C++ Programming - Part 2
Chapter 5 explains how to accelerate the performance of commonly used floating-point algorithms using C++ intrinsic functions and the AVX2 instruction set. The source code examples in this chapter also demonstrate the use of FMA (fused multiply-add) arithmetic.
  Floating-Point Arrays
    Least squares with FMA (Ch05_01)
  Floating-Point Matrices
    Matrix multiplication (Ch05_02, Ch05_03)
    Matrix (4x4) multiplication (Ch05_04, Ch05_05)
    Matrix (4x4) vector multiplication (Ch05_06)
    Matrix inversion (Ch05_07, Ch05_08)

Chapter 6 - AVX2 C++ Programming - Part 3
Chapter 6 is a continuation of the previous chapter. It focuses on more advanced algorithms and SIMD programming techniques.
  Signal Processing
    Brief overview of convolution arithmetic
  1D Convolutions
    Variable and fixed width kernels (Ch06_01, Ch06_02)
  2D Convolutions
    Non-separable kernel (Ch06_03)
    Separable kernel (Ch06_04)

Chapter 7 - AVX-512 C++ Programming - Part 1
Chapter 7 explains AVX-512 integer arithmetic and other operations using C++ intrinsic functions. It also discusses how to code a few basic image processing algorithms using the AVX-512 instruction set.
  Integer Arithmetic
    Addition and subtraction (Ch07_01)
    Masked arithmetic (Ch07_02)
  Image Processing
    RGB to grayscale (Ch07_03)
    Image thresholding (Ch07_04)
    Image statistics (Ch07_05)

Chapter 8 - AVX-512 C++ Programming - Part 2
Chapter 8 describes how to code common and advanced floating-point algorithms using C++ intrinsic functions and the AVX-512 instruction set.
  Floating-Point Arithmetic
    Addition, subtraction, etc. (Ch08_01)
    Masked operations (Ch08_02)
  Floating-Point Arrays
    Array mean and standard deviation (Ch08_03)
  Floating-Point Matrices
    Covariance matrix (Ch08_04)
    Matrix multiplication (Ch08_05, Ch08_06)
    Matrix (4x4) vector multiplication (Ch08_07)
  Signal Processing
    1D convolution using variable and fixed width kernels (Ch08_08)
    2D convolutions using separable kernel (Ch08_09)

Chapter 9 - Supplemental C++ SIMD Programming
Chapter 9 examines supplemental x86 SIMD programming topics including instruction set detection, how to use SIMD math library functions, and SIMD operations using text strings.
  Instruction set detection (Ch09_01)
  SIMD Math Library Functions
    Rectangular to polar coordinate conversions (Ch09_02)
    Body surface area calculations (Ch09_03)
  SIMD String Operations
    String length (Ch09_04)

Chapter 10 - X86 Processor Architecture
Chapter 10 explains x86 processor architecture including data types, register sets, memory addressing modes, and condition codes. Knowledge of this material is necessary to understand the subsequent x86 assembly language programming chapters.
  Data types
    Fundamental data types
    Numerical data types
    SIMD data types
    Strings
  Internal architecture
    General-purpose registers
    RFLAGS register
    MXCSR register
    Scalar FP and SIMD registers
  Memory addressing
  Condition codes

Chapter 11 - Core Assembly Language Programming - Part 1
Chapter 11 teaches fundamental x86-64 assembly language programming and basic instruction use. An understanding of this material is required to follow the source code examples in subsequent chapters.
  Integer Arithmetic
    Addition and subtraction (Ch11_01)
    Multiplication (Ch11_02)
    Division (Ch11_03)
    Mixed integer types and stack arguments (Ch11_04)
  Integer Operations
    Memory addressing modes (Ch11_05)
    Simple for-loops (Ch11_06)
    Compares (Ch11_07)
  Text Strings
    String instructions (Ch11_08)

Chapter 12 - Core Assembly Language Programming - Part 2
Chapter 12 is a continuation of the previous chapter. Topics discussed include scalar floating-point arithmetic, floating-point arrays, and function calling conventions.
  Scalar Floating-Point Arithmetic.