I'm still having a hard time understanding the difference between this correct approach and the previous approach due to the vocabulary. The earlier approach just used perspective projection / barycentric interp (are those two terms interchangeable?). This new correct approach uses affine interp AND perspective interpolation, which the professor described equivalently as using barycentric interp and perspective division. Does that mean affine interp and barycentric interp are interchangeable terms? Sorry I'm a little confused!
Barycentric interpolation occurs in both the second and third images, which is where I think the confusion comes from. The difference is where in the rendering pipeline the interpolation happens. In the (incorrect) affine mode, the 3D vertices of each triangle are projected into 2D screen space, and the barycentric coordinates of an interior point are then computed with respect to these new 2D screen-space vertices. Here the interpolation is an affine function of the screen-space position. In the correct perspective mode, the 3D vertices are also projected into 2D screen space, but to get the barycentric coordinates of an interior point, we trace that point's viewing ray back out into 3D space to where it intersects the triangle, and use the intersection point's barycentric coordinates with respect to the original 3D vertices. Viewed from screen space, this interpolation is a perspective (projective) function of position rather than an affine one. So "barycentric" describes the weights used in both modes, while "affine" and "perspective" describe the space in which those weights are computed.
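In case a concrete example helps, here is a rough Python sketch of the two modes. It is not code from the course: the function names, the pinhole camera at the origin projecting onto the plane z = 1, and the sample triangle are all assumptions made for illustration. The perspective version uses the "barycentric interp plus perspective division" formulation (interpolate attribute/z and 1/z in screen space, then divide), which should give the same answer as tracing the ray back to the 3D triangle.

```python
def project(v):
    """Pinhole projection onto the plane z = 1 (assumed camera at the origin)."""
    x, y, z = v
    return (x / z, y / z)


def barycentric_2d(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. the 2D triangle (a, b, c)."""
    def cross(u, v, w):
        # Twice the signed area of triangle (u, v, w).
        return (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])

    area = cross(a, b, c)
    return (cross(p, b, c) / area, cross(a, p, c) / area, cross(a, b, p) / area)


def affine_interp(p, verts3d, attrs):
    """Affine mode: use the screen-space barycentric weights directly."""
    screen = [project(v) for v in verts3d]
    w = barycentric_2d(p, *screen)
    return sum(wi * ai for wi, ai in zip(w, attrs))


def perspective_interp(p, verts3d, attrs):
    """Perspective mode: interpolate attr/z and 1/z in screen space, then divide."""
    screen = [project(v) for v in verts3d]
    w = barycentric_2d(p, *screen)
    inv_z = [1.0 / v[2] for v in verts3d]
    num = sum(wi * ai * iz for wi, ai, iz in zip(w, attrs, inv_z))
    den = sum(wi * iz for wi, iz in zip(w, inv_z))
    return num / den


# Hypothetical triangle: vertex B is three times farther away than A and C.
verts3d = [(-1.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 1.0)]
attrs = [0.0, 1.0, 0.0]  # e.g. a texture coordinate stored at each vertex

# The 3D midpoint of edge AB is (0, 0, 2), so its true attribute value is 0.5.
pixel = project((0.0, 0.0, 2.0))

affine_value = affine_interp(pixel, verts3d, attrs)
perspective_value = perspective_interp(pixel, verts3d, attrs)
```

For this triangle the affine mode gives roughly 0.75 at that pixel, while the perspective mode gives roughly 0.5, which matches the 3D midpoint. Both functions call the same `barycentric_2d`, and the only difference is the weighting by 1/z and the final division.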