Lecture 24: High Performance Image Processing & Halide (18)
Zyy7390
This is closely related to 61c where utilizing locality is crucial to better parallelism (SIMD, OpenMP) performance.
gprechter
To elaborate on how this organization of the code is important, if I recall correctly, rearranging for loops that go across an array of values is important for accessing large array data structures stored in memory. Particularly, the computer's fastest section of memory is it's caches. With large arrays, portions of the array can be larger than a cache block. This can cause many misses to the cache when iterating through an array. Especially if the array is laid out in row major order, by staying at the same row, it is more likely that there will be more cache hits when iterating through the array; while skipping around to different rows may be in different memory blocks and may cause compulsory misses to the cache. These are tons of these small optimizations that are hard for programmers to think of.
AnastasiaMegabit
It's actually rather jarring to think about the effect a simple reversal of two lines has on the speed. 15x faster is no joke when we are talking about images that are so heavy on computation. As an aside, I was doing a leetcode problem the other day and realized I could use an advanced for loop instead of a traditional for loop and it sped my time up somewhere in the range of 10x-15x. Little changes make big differences.
tyleryath
We touched upon this topic in 61c in one of the projects my semester. We were given the goal of optimizing a numpy like library written in C by using things like OpenMP.
This is closely related to 61c where utilizing locality is crucial to better parallelism (SIMD, OpenMP) performance.
To elaborate on how this organization of the code is important, if I recall correctly, rearranging for loops that go across an array of values is important for accessing large array data structures stored in memory. Particularly, the computer's fastest section of memory is it's caches. With large arrays, portions of the array can be larger than a cache block. This can cause many misses to the cache when iterating through an array. Especially if the array is laid out in row major order, by staying at the same row, it is more likely that there will be more cache hits when iterating through the array; while skipping around to different rows may be in different memory blocks and may cause compulsory misses to the cache. These are tons of these small optimizations that are hard for programmers to think of.
It's actually rather jarring to think about the effect a simple reversal of two lines has on the speed. 15x faster is no joke when we are talking about images that are so heavy on computation. As an aside, I was doing a leetcode problem the other day and realized I could use an advanced for loop instead of a traditional for loop and it sped my time up somewhere in the range of 10x-15x. Little changes make big differences.
We touched upon this topic in 61c in one of the projects my semester. We were given the goal of optimizing a numpy like library written in C by using things like OpenMP.