CS184/284A: Lecture Slides

Lecture 26: Image Processing (42)

Unicorn53547

An interesting finding about deep learning architectures: convolutions behave more like high-pass filters, while multi-head self-attention behaves more like a low-pass filter. Paper details here. This matches intuition in some sense, since CNNs normally focus on local, high-frequency features, while ViT-style architectures attend to global, low-frequency features. Actual designs are much more complicated than this intuition, though. Correct me if my intuition is wrong.
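To make the filter analogy concrete, here is a toy sketch and not the paper's experiment. It assumes the extreme case where attention weights are uniform, so attention reduces to a moving average over tokens, and it uses a hand-picked discrete Laplacian as a stand-in for a conv kernel (learned conv kernels can of course have any frequency response). The helper name `magnitude_response` is made up for this illustration.

```python
import numpy as np

def magnitude_response(kernel, n=64):
    """Magnitude of the n-point DFT of a 1D kernel (zero-padded to length n)."""
    return np.abs(np.fft.rfft(kernel, n=n))

# Assumption: uniform attention over 8 tokens, i.e. an 8-tap moving average.
n_tokens = 8
uniform_attention = np.full(n_tokens, 1.0 / n_tokens)

# Assumption: a hand-picked conv kernel (discrete Laplacian), not a learned one.
laplacian = np.array([-1.0, 2.0, -1.0])

avg_resp = magnitude_response(uniform_attention)
lap_resp = magnitude_response(laplacian)

# Index 0 is the DC (lowest) frequency, index -1 is the Nyquist (highest) frequency.
print("averaging: DC gain =", avg_resp[0], " Nyquist gain =", avg_resp[-1])
print("laplacian: DC gain =", lap_resp[0], " Nyquist gain =", lap_resp[-1])
```

In this toy setup the averaging kernel passes DC with gain 1 and has essentially zero gain at the Nyquist frequency (low-pass), while the Laplacian does the opposite: zero gain at DC and its largest gain at Nyquist (high-pass).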

waleedlatif1

One impressive feature of the human body is that our eyes are also able to compute a form of the Fourier transform. The photoreceptors in our eyes detect light and transmit these signals to the brain, with the cells tuned to high spatial frequencies concentrated near the fovea (the center of the retina) and those tuned to lower frequencies in the periphery. Moreover, as we move through the levels of visual processing, some cells are direction- or color-selective, which allows them to detect lines and edges.
