CS184/284A: Lecture Slides

This paper (http://web.cs.ucla.edu/~pouchet/doc/cc-article.11.pdf) discusses data layout transformations in the context of SIMD architectures. I found it very interesting, primarily because it goes deeper into how the pipeline is sped up by performing vector rather than scalar arithmetic. It also discusses how various reordering optimizations reduce the latency and wait time between one block and the next.
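
To make the idea of a layout transformation concrete, here is a small sketch of my own. It is not taken from the paper. It shows one common transformation, converting an interleaved "array of structures" into a "structure of arrays", so that the values a vector instruction would load together sit next to each other. The function names, the sample data, and the 4-lane width are all assumptions for illustration, and plain Python lists only stand in for real SIMD registers.

```python
# Hypothetical illustration: all names and the 4-lane width are assumptions.
LANES = 4  # assumed SIMD width, e.g. four 32-bit floats in a 128-bit register


def aos_to_soa(interleaved, fields):
    """Turn [x0, y0, z0, x1, y1, z1, ...] into [[x0, x1, ...], [y0, ...], [z0, ...]]."""
    if len(interleaved) % fields != 0:
        raise ValueError("length must be a multiple of the field count")
    # Field f lives at indices f, f + fields, f + 2*fields, ... in the AoS layout.
    return [interleaved[f::fields] for f in range(fields)]


def vector_add(a, b, lanes=LANES):
    """Add two equal-length lists one chunk of `lanes` elements at a time."""
    out = []
    for i in range(0, len(a), lanes):
        # Each slice stands in for one contiguous vector load.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return out


# Four points stored interleaved as x, y, z (array of structures).
points = [0.0, 10.0, 100.0,
          1.0, 11.0, 101.0,
          2.0, 12.0, 102.0,
          3.0, 13.0, 103.0]

xs, ys, zs = aos_to_soa(points, fields=3)
sums = vector_add(xs, ys)  # x + y for every point, one chunk at a time
```

In the interleaved layout, the four x values are three elements apart, so a single contiguous load cannot fetch them. After the transformation they are adjacent, which is the property vector hardware generally needs to avoid extra shuffle or gather work.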