Lecture 24: High Performance Image Processing & Halide (11)
Caozongkai

I think this is quite similar to what you see when you look at a CPU chip layout. The majority of the chip's area is actually used for caches and buses. The ALUs take up only a really small amount of area.

JayShenoy

This slide is saying that because the cost of communication (moving data) dominates the cost of computation, taking the time to optimize code for locality and parallelism can greatly reduce program runtime.
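
To make the locality point concrete, here is a rough sketch, assuming a 3x3 box blur as the workload (my own choice of example, and the function names `blur_two_pass` and `blur_fused` are made up for illustration). Both versions do the same arithmetic; the second only keeps three rows of the intermediate result alive at a time instead of the whole intermediate image. In pure Python this will not show the cache speedup itself; it only shows the restructuring that a locality-aware schedule performs.

```python
def blur_row(row):
    """Horizontal 3-tap box blur of one row (output is 2 pixels narrower)."""
    return [(row[x] + row[x + 1] + row[x + 2]) / 3 for x in range(len(row) - 2)]


def blur_two_pass(img):
    """Breadth-first: materialize the full horizontal blur, then blur vertically.

    The intermediate `bx` is as tall as the input, so for a large image it is
    written out to memory and read back in later.
    """
    bx = [blur_row(row) for row in img]
    return [
        [(bx[y][x] + bx[y + 1][x] + bx[y + 2][x]) / 3 for x in range(len(bx[0]))]
        for y in range(len(img) - 2)
    ]


def blur_fused(img):
    """Sliding window: keep only the three intermediate rows currently needed.

    Each intermediate row is consumed soon after it is produced, so the
    working set stays small. Assumes the image has at least 3 rows.
    """
    window = [blur_row(img[0]), blur_row(img[1])]
    out = []
    for y in range(len(img) - 2):
        window.append(blur_row(img[y + 2]))  # produce the one new row needed
        a, b, c = window
        out.append([(a[x] + b[x] + c[x]) / 3 for x in range(len(a))])
        window.pop(0)  # the oldest row is never needed again
    return out
```

Both functions return the same image; the difference is only in how long intermediate data has to live between being produced and being consumed, which is the kind of trade-off the slide is pointing at.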
