This is very similar to learning key points from images, you essentially force the key points into a bottleneck and take a reconstructive loss on the output.
This is very similar to learning key points from images, you essentially force the key points into a bottleneck and take a reconstructive loss on the output.