CIS 5650 - Course Syllabus and Schedule
Schedule
Course Topics
- Introduction to CUDA
  - CUDA Terminology - Kernels, Threads, Blocks
  - Memory Management
  - Basic Matrix Multiplication using Parallel Programming
  - Built-in Variables and Functions
  - Thread Scheduling
  - CUDA Memory Model
  - Thread Synchronization
  - Matrix Multiplication Revisited
 
- GPU Architecture Overview
  - Trends in CPU and GPU Performance
  - CPU Architecture Overview
  - CPU Parallelism and Scheduling
  - History of GPUs
  - GPU Architecture Evolution
 
- Parallel Algorithms
  - Reduction
  - Scan (Naive and Work-Efficient)
  - Stream Compaction
  - Summed Area Tables
  - Radix Sort
 
- CUDA Performance
  - Parallel Reduction Revisited
  - Warp Partitioning
  - Memory Coalescing
  - Bank Conflicts
  - Dynamic Partitioning of SM Resources
  - Data Pre-fetching
  - Instruction Mix
  - Loop Unrolling
  - Thread Granularity
 
- CUDA Atomics
  - Atomic Functions
  - Atomic Add, Subtract, Exchange, CAS
 
- Graphics Pipeline
  - Introduction to Graphics Pipeline (Forward Rendering)
  - Mapping Graphics Pipeline to Hardware
 
- Introduction to WebGPU
- Multi-pass Forward Rendering
- Deferred Rendering
- Tile-Based Deferred Shading
- Forward+ Rendering
- Clustered Shading
- WebGL Demos
 
- Advanced Topics in CUDA
  - CUDA Unified Memory
  - Faster Memory Transfers
  - Zero Copy Memory
  - CUDA Streams
  - CUDA Streaming Compute-Copy Overlap
  - Warp Functions
  - Shuffle Operations
  - Warp Reduce using Shuffle
 
- Introduction to Vulkan
- Machine Learning & CUDA
Labs
- Debugging Lab
- Performance Lab
  - Launching Nsight
  - Running Performance Analysis
  - Understanding Metrics
  - NVIDIA Visual Profiler
  - Matrix Transpose Optimization
  - Reduction Optimization
 
