With two eyes' perspectives, we can recover depth by measuring the offset (disparity) between matching point pairs. Is there an algorithm that can recover depth from only one perspective?
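To make the two-eye case concrete, here is a minimal sketch of the usual depth-from-disparity relation, Z = f * B / d. It assumes a rectified pair of pinhole cameras; the function name and the numbers (800 px focal length, 6.4 cm baseline) are made up for illustration.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres of a point seen by two rectified pinhole cameras.

    focal_px     -- focal length in pixels (assumed equal for both cameras)
    baseline_m   -- distance between the two camera centres in metres
    disparity_px -- horizontal offset between the matching points in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: a large offset means a close point, a small offset a distant one.
near = depth_from_disparity(800.0, 0.064, 16.0)
far = depth_from_disparity(800.0, 0.064, 2.0)
print(near, far)
```

With only one view there is no disparity to plug in, which is why single-image methods have to lean on other cues.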
Franklin-Zhang0
That's an active research direction, and some impressive work has already been published. For example, there are algorithms (e.g. WorDepth: Variational Language Prior for Monocular Depth Estimation) that use text information to reduce the ambiguity in an image's depth. It's similar to how we humans process an image: first we recognize what the object is, and since we have prior knowledge of its scale, we can use that prior to estimate the object's depth.
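The "known scale" intuition can be sketched with plain pinhole geometry. This is not how WorDepth works internally; it is only a toy illustration, with a hypothetical function name and made-up numbers, of how knowing an object's real size pins down its distance from a single image.

```python
def depth_from_known_size(focal_px: float, real_height_m: float, image_height_px: float) -> float:
    """Estimate distance to an object of known real-world height.

    Pinhole model: image_height_px = focal_px * real_height_m / depth,
    so depth = focal_px * real_height_m / image_height_px.
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * real_height_m / image_height_px


# Hypothetical example: we recognize a person (assume about 1.7 m tall)
# who spans 170 pixels in an image taken with a 1000 px focal length.
print(depth_from_known_size(1000.0, 1.7, 170.0))
```

Without the size prior, a 1.7 m person at 10 m and a 0.85 m child at 5 m would produce the same 170-pixel silhouette, which is the ambiguity the prior removes.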
rcorona
I think the work mentioned above discussing language-derived priors sounds super interesting. To add on to this thread, I think there's also work on monocular depth estimation by enforcing temporal consistency through a video (e.g. https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/). My high-level understanding is that although a single image might not give sufficient information for depth, changes observed over time should yield cues (such as parallax) for determining the depth.
agao25
It seems like there's quite a lot of ongoing research on this. I'm curious how some of the algorithms that @Franklin-Zhang0 mentioned work, and about the simple case of how these headsets combine the left-eye and right-eye perspectives. Or is it that the two images, adjusted by an algorithm, are simply displayed, and our bodies (mainly our eyes) "figure it out"?
I think it's interesting that rendering the two eyes separately is absolutely necessary in VR for scenes to appear 3D. From my understanding, if you render the same view to both eyes in a VR headset, not only would the scene not look 3D, it would look like a "flat" picture? And if you moved from side to side, it would stay 2D and form an incomprehensible/unrealistic scene.
jasonTelanoff
Building off of rendering for both eyes: for ray tracing, would the path tracing algorithm send out rays for both eyes, or is there a clever algorithm for estimating the lighting while still only sending out rays from one point?
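I don't know what any particular renderer or headset does here, but the straightforward option is to treat each eye as its own camera and trace primary rays from both. A minimal sketch of that setup, where the function names and the 6.4 cm eye separation are assumptions for illustration:

```python
import math


def eye_origins(head, right_dir, ipd):
    """Return (left_eye, right_eye) positions, offset half the IPD along right_dir."""
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head, right_dir))
    right_eye = tuple(h + half * r for h, r in zip(head, right_dir))
    return left_eye, right_eye


def primary_ray(origin, target):
    """Return (origin, unit direction) for a ray from origin toward target."""
    d = tuple(t - o for t, o in zip(target, origin))
    length = math.sqrt(sum(c * c for c in d))
    if length == 0:
        raise ValueError("target coincides with the ray origin")
    return origin, tuple(c / length for c in d)


# Head at the world origin, looking down -z, with +x to the right.
left_eye, right_eye = eye_origins((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.064)

# The same scene point needs a different primary ray from each eye.
scene_point = (0.0, 0.0, -1.0)
_, left_dir = primary_ray(left_eye, scene_point)
_, right_dir = primary_ray(right_eye, scene_point)
print(left_dir, right_dir)
```

Because the two origins differ, what each eye sees first (and what is occluded) differs too, so the view from one eye cannot simply be copied to the other.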
carolyn-wang
I did a project on this in a previous class that changed the projection of one image to match a second image's orientation. That made it possible to stitch the images together to create a panorama.
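Assuming the reprojection in a project like this was a homography (a 3x3 projective transform, which is the usual tool for panorama stitching), applying it to a pixel looks roughly like the sketch below. The matrices are made-up examples, not values from any real image pair.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to return from homogeneous to pixel coordinates.
    return xh / w, yh / w


# A pure translation is the simplest homography: shift by (+5, -2).
shift = [[1.0, 0.0, 5.0],
         [0.0, 1.0, -2.0],
         [0.0, 0.0, 1.0]]
print(apply_homography(shift, 3.0, 4.0))

# A nonzero bottom row adds perspective foreshortening.
tilt = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.001, 0.0, 1.0]]
print(apply_homography(tilt, 1000.0, 500.0))
```

In a real stitching pipeline the matrix would be estimated from matched feature points between the two images rather than written by hand.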
GarciaEricS
If a person were really at this viewpoint, would the two images really look so different? I can't tell if this is exaggerated so we can understand vergence, but the two perspectives seem very different. Perhaps our two eyes' views really are that different and we just don't notice it in our daily lives.
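A rough way to check this is to compute the angle between the two eyes' lines of sight to a point straight ahead. The sketch below assumes a 6.4 cm eye separation and a point centred between the eyes; both are illustrative assumptions, not values taken from the slide.

```python
import math


def vergence_angle_deg(ipd_m: float, distance_m: float) -> float:
    """Angle in degrees between the two eyes' lines of sight to a point straight ahead."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / distance_m))


# Compare an object held close to the face with one across the room.
for distance in (0.25, 1.0, 10.0):
    print(distance, round(vergence_angle_deg(0.064, distance), 2))
```

Under these assumptions the two views differ by more than ten degrees for an object about 25 cm away but by well under one degree at 10 m, so how different the images look depends heavily on how close the object is.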
https://www.nature.com/articles/s41598-022-24450-9 https://pubmed.ncbi.nlm.nih.gov/10098818/