When one fragment shares the instruction stream of another, does this mean it switches between register states the way thread switching does, or is it more lightweight than that?
williampeng20
I think what the following slides show is that the fragments are operated on by SIMD (Single Instruction, Multiple Data) processing: there are multiple sets of scalar registers, one per fragment, and a single instruction applies the same scalar operation to all of them at once.
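As a rough sketch of that idea in plain C (the `fragments`/`brighten` names are mine, not from the slides), this is exactly the loop shape a vectorizing compiler turns into SIMD:

```c
/* With auto-vectorization enabled (e.g. gcc -O2 -ftree-vectorize),
 * each iteration of the generated loop applies one decoded multiply
 * instruction to a whole vector of elements at once; no register
 * state is saved or restored, unlike a thread switch. */
void brighten(float *fragments, int n, float scale) {
    for (int i = 0; i < n; i++)
        fragments[i] *= scale;
}
```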
SKYSCRAPERS1999
I think it is not thread-level. Instead it may be implemented with vector processors. You could look at CS61C for more background.
kevkang
For more information on how CPUs do vectorization: https://www.codingame.com/playgrounds/283/sse-avx-vectorization/what-is-sse-and-avx
shannonsh
There are CPU-specific intrinsics like _mm_add_pd that can handle vector operations quickly and in parallel. For a memory trip back to 61C, see here: http://inst.eecs.berkeley.edu/~cs61c/fa18/labs/8/
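Since _mm_add_pd came up, here is a minimal runnable sketch of it (SSE2 intrinsics from emmintrin.h; compile with something like `gcc -msse2` on x86). The surrounding program is just for illustration:

```c
#include <emmintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_pd(2.0, 1.0);    /* register holds {1.0, 2.0} */
    __m128d b = _mm_set_pd(20.0, 10.0);  /* register holds {10.0, 20.0} */
    __m128d c = _mm_add_pd(a, b);        /* one instruction, two adds */

    double out[2];
    _mm_storeu_pd(out, c);
    printf("%f %f\n", out[0], out[1]);   /* prints 11.000000 22.000000 */
    return 0;
}
```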
Caozongkai
I think it means that the instructions are shared, but the registers and other state are still separate. So we only need one fetch/decode unit for multiple ALUs.
tigreezy
Instruction stream sharing means that one instruction can be run on multiple ALUs at the same time, because they all receive the same instruction. So instead of shading each fragment separately, we can execute 8 of them simultaneously, because the hardware provides 8 ALUs instead of 1. This is implemented in hardware so that a single instruction operates on 8 data elements at once; see the sketch below.
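To make the 8-wide version concrete, here is a hedged sketch using CPU AVX (8 floats per 256-bit __m256 register) as an analogy for 8 fragments sharing one instruction stream. Actual GPU hardware is organized differently, and the fragment values here are made up. Compile with something like `gcc -mavx`:

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* One lane per "fragment": 8 red values scaled by one factor. */
    float r[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
    float out[8];

    __m256 vr = _mm256_loadu_ps(r);
    __m256 vk = _mm256_set1_ps(2.0f);  /* broadcast the scale factor */
    /* A single fetched/decoded instruction multiplies all 8 lanes. */
    __m256 vo = _mm256_mul_ps(vr, vk);
    _mm256_storeu_ps(out, vo);

    for (int i = 0; i < 8; i++)
        printf("fragment %d: %f\n", i, out[i]);
    return 0;
}
```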