CS184/284A: Lecture Slides

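An interesting fact about GPUs is that, per streaming multiprocessor (SM), the register file often has more storage than the on-chip shared memory (which is SRAM, not DRAM). On many NVIDIA architectures, for example, each SM has a 256 KB register file, which is larger than the shared memory available to it. Registers are private to each thread, but threads in the same "warp" (a group of 32 threads that execute together) can read each other's registers directly using shuffle instructions, which helps with things such as reductions. Keeping data in registers and shared memory lets threads move it around very quickly without going out to off-chip DRAM, and that memory traffic is often the bottleneck in high-performance computing over large datasets.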
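To make the reduction point concrete, here is a minimal sketch in plain Python that simulates, on the CPU, the tree-style sum that a warp can perform by passing register values between threads. It is only an illustration of the pattern: the helper names `shfl_down` and `warp_reduce_sum` are made up for this example, and `shfl_down` is an assumed stand-in for what a shuffle-down instruction (such as CUDA's `__shfl_down_sync`) does in hardware.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def shfl_down(regs, delta):
    """Simulate a shuffle-down: lane i reads the register of lane i + delta.

    Lanes whose source lane would fall outside the warp keep their own value.
    (Hypothetical helper; a real GPU does this exchange in hardware.)
    """
    n = len(regs)
    return [regs[i + delta] if i + delta < n else regs[i] for i in range(n)]


def warp_reduce_sum(values):
    """Sum one warp's worth of per-thread values in log2(WARP_SIZE) steps."""
    assert len(values) == WARP_SIZE
    regs = list(values)  # one private "register" per lane
    offset = WARP_SIZE // 2
    while offset > 0:
        # Every lane adds the value held by the lane `offset` positions above it.
        neighbor = shfl_down(regs, offset)
        regs = [a + b for a, b in zip(regs, neighbor)]
        offset //= 2
    return regs[0]  # lane 0 ends up holding the full sum


if __name__ == "__main__":
    print(warp_reduce_sum(range(WARP_SIZE)))
```

The sum finishes in five exchange steps (offsets 16, 8, 4, 2, 1) instead of 31 sequential additions, and in the real GPU version none of those steps needs to touch shared memory or DRAM.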