Welcome
Syllabus
Lectures
1) First Class: Reproducibility and Git
2) The Linux Filesystem and commands
3) Community Projects
4) Introduction to Computer Architectures
5) Introduction to Vectorization
6) Measuring Performance
7) CPU Optimization: Matrix-Matrix Multiply
8) Blocked Matrix-Matrix Multiplication
9) Packing for cache
10) Introduction to Multithreading
11) Introduction to Parallel Scaling
12) Introduction to OpenMP
13) More on OpenMP and OpenMP Tasks
14) Case Study 1: libCEED
15) Case Study 2: ClimaCore
16) HW2 solution
17) Parallel reductions and scans
18) Introduction to MPI
19) Introduction to Batch Jobs
20) Parallel Linear Algebra
21) Introduction to MPI.jl
22) Blocking and non-blocking communication
23) Collective communication
24) HW3 solution
25) Coprocessor architectures
26) GPUs and CUDA
27) Practical CUDA
28) Intro to GPU programming in Julia
29) Memory management with CUDA.jl
30) Parallel reductions with CUDA.jl
Index