Lecture 24: High Performance Image Processing & Halide (11)
Caozongkai
I think this is quite similar to what you see in a CPU chip layout: the majority of the chip's area is used for caches and buses, while the ALUs take up only a very small fraction of it.
JayShenoy
This slide is saying that because communication dominates computation, taking the time to optimize code for locality and parallelism can greatly improve program runtime.
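To make the locality point concrete, here is a minimal sketch of the classic two-stage 3x3 box blur (horizontal pass, then vertical pass). It assumes a grayscale image stored as a list of lists and skips border pixels; the function names are hypothetical. It is pure Python, so it only illustrates the structure, not real speed. The unfused version materializes the whole intermediate image before the second stage reads it. The sliding-window version computes intermediate rows only as they are needed and keeps just three of them live. This is similar in spirit to the sliding-window schedules Halide can express.

```python
def blur_x_row(img, y):
    # Horizontal 3-tap average for row y (interior columns only).
    row = img[y]
    return [(row[x - 1] + row[x] + row[x + 1]) / 3 for x in range(1, len(row) - 1)]


def blur_unfused(img):
    # Stage 1 produces an image-sized intermediate before stage 2 starts:
    # easy to parallelize, but every intermediate value travels to memory and back.
    h = len(img)
    bx = [blur_x_row(img, y) for y in range(h)]
    out = []
    for y in range(1, h - 1):
        out.append([(bx[y - 1][x] + bx[y][x] + bx[y + 1][x]) / 3
                    for x in range(len(bx[0]))])
    return out


def blur_sliding_window(img):
    # Stage 1 is computed on demand, so only 3 intermediate rows are live
    # at a time (small enough to stay in cache) instead of a full image.
    h = len(img)
    window = [blur_x_row(img, 0), blur_x_row(img, 1)]
    out = []
    for y in range(1, h - 1):
        window.append(blur_x_row(img, y + 1))  # compute the one new row we need
        a, b, c = window
        out.append([(a[x] + b[x] + c[x]) / 3 for x in range(len(b))])
        window.pop(0)  # drop the row that no later output depends on
    return out
```

Both functions do exactly the same arithmetic and return identical results. The only difference is how much intermediate data must be stored, and therefore how much communication with memory the first stage causes.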