The LED on the outside of the headset indicates which direction the headset is facing. It can help observers tell its position.
Most VR devices today are worn on the head, which can put a strain on the neck. The headset design could be improved by moving toward a glasses-like form factor.
If you're looking for more in-depth material, I stumbled on Ren's original dissertation, which covers a lot of this in more detail! https://people.eecs.berkeley.edu/~ren/thesis/renng-thesis.pdf
The design here is really intriguing with how it uses two doublet lenses with an aperture stop in the middle. The first lens handles spherical aberration, while the second lens corrects the astigmatism created by the first. The resulting lens curvature and vignetting make for a really pretty aesthetic.
I really prefer a blurred background for some reason. This is also very popular in more "professional" photos. Does blurring the background put more focus on the actual subject of the photo? It's interesting to see how a technical challenge has now sort of added to the artistic and creative choices of photography.
It's really cool how Ren did so much work with this crazy technology.
Does computational refocusing involve focusing after an image has already been taken? If so, how does it differ from simply focusing while taking the image? Can it be modeled so that the two are the same? That would be extremely helpful.
You definitely can and probably need to. Most of these headsets allow glasses to fit comfortably thanks to the foam padding on the sides. However, it would be cool to see a diopter adjustment setting so that users could use the headset without their glasses.
I wonder how graphics will evolve to accurately portray the inaccuracies in human vision, especially since they differ between individuals. For example, double vision tends to occur at varying distances for different individuals. My vision gets blurry if something gets too close (will they model this, or simply assume everyone has perfect vision?).
That's a really interesting question. I would read into retinal projection. It essentially draws the display directly onto the retina of your eye, thus eliminating some hardware constraints of the actual headset. I would be curious to see how this tech evolves and if it is adopted in the future, as the applications for such tech seem infinite.
I wonder how this design affects chromatic aberration, and how one would go about correcting for it.
I think it's interesting how we only have 3 types of cone cells. Logically speaking, if we had more than 3, our vision would be more accurate to the real world. However, I've read that having more "receptors," so to speak, causes worse spatial understanding. It's interesting to think how we would see the world if we had, say, 16 types of cone cells like the mantis shrimp.
Super cool to see the representation on the right break down the complexity of the diagram on the left. It makes it a lot easier to understand how the sensor plane and lens plane work in a 4D light field.
I also really like Google Cardboard, but does the phone have enough display resolution / fps to support a realistic VR experience? I have tried Google Cardboard but never the Oculus, so I was wondering how stark the difference is.
On the flip side of VR, it's cool to see how these are built similarly to, yet differently from, Snap's AR Spectacles. However, it's a shame that the weight and cost of these devices still keep them from being affordable and accessible.
This also makes me really curious about how early VR lenses interacted with environments and how they worked. Would they scan the QR code to trigger different functions?
This takes me back to laser games where you point at markers to score points. It's interesting how these targets can represent points in space to simulate a completely different environment. Kinda makes me think about the concept of point clouds.
It's really interesting to think about how proximity and gaze affect facial temperature. This also makes me think about the concept of personal space in VR and how perception with respect to physical sensitivity is handled in VR imaging.
I think the Kinect shown here uses infrared light to recover the depth information, and so it is able to reconstruct 3D information from that.
Some of the works presented on these next few slides (e.g. editing lighting, style GANs) seem like they would make great tools for artists. I wonder if they'll ever work well enough that they'll become just another tool in digital art software, and whether there's still a long way to go until that happens. (As cool as these results are, there usually seem to be some limitations that prevent the most current deep learning research from being used in everyday applications)
It's really interesting that NeRF works so well on complex image inputs with a relatively simple neural network architecture. Were any architectures other than a fully-connected network considered, and do we currently have any insight into why they might not work as well / not be as desirable?
I'm curious, how do graphing calculators like Desmos handle implicit functions? Do they just sample a bunch of coordinates and check whether they satisfy the equation, kind of like a Monte Carlo method? Or is there a better/more efficient way of doing this?
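For what it's worth, here is a minimal sketch of one sampling approach. This is an assumption for illustration, not how Desmos actually works: instead of testing random points for exact equality, evaluate f on a regular grid and keep the cells where f changes sign across the corners, since the curve f(x, y) = 0 must pass through those cells. The function name `implicit_cells` and the grid size are made up for this example.

```python
import numpy as np

def implicit_cells(f, xmin, xmax, ymin, ymax, n=200):
    """Return centers of grid cells that the curve f(x, y) = 0 passes through.

    Illustrative sketch only: a cell is kept when f is <= 0 at one corner
    and >= 0 at another, i.e. the sign changes somewhere inside the cell.
    """
    xs = np.linspace(xmin, xmax, n + 1)
    ys = np.linspace(ymin, ymax, n + 1)
    X, Y = np.meshgrid(xs, ys)
    F = f(X, Y)

    # Values of f at the four corners of every cell.
    c00, c01 = F[:-1, :-1], F[:-1, 1:]
    c10, c11 = F[1:, :-1], F[1:, 1:]
    lo = np.minimum(np.minimum(c00, c01), np.minimum(c10, c11))
    hi = np.maximum(np.maximum(c00, c01), np.maximum(c10, c11))
    crossing = (lo <= 0) & (hi >= 0)

    # Cell centers, used here as a crude estimate of points on the curve.
    cx = 0.5 * (X[:-1, :-1] + X[1:, 1:])
    cy = 0.5 * (Y[:-1, :-1] + Y[1:, 1:])
    return np.column_stack([cx[crossing], cy[crossing]])

# Unit circle: x^2 + y^2 - 1 = 0
circle_pts = implicit_cells(lambda x, y: x**2 + y**2 - 1.0, -2, 2, -2, 2)
```

Compared to random sampling, this only needs one evaluation per grid corner and never relies on hitting the curve exactly. Interpolating along the cell edges (as in marching squares) would give smoother line segments than the cell centers used here.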
Fun fact: in Avengers: Endgame, Tony Stark used the Möbius strip as a metaphor for understanding time travel, which later inspired his solution for time travel!
Similarly, a piecewise spline is how you could draw independent curves that connect in Figma, where each segment is connected but can be toggled to curve along its own tangent.
If anything, I'm learning that a lot of graphics algorithms are a lot simpler conceptually than I initially thought. Kinda goes back to a question I asked in class about why we don't use a better or more advanced method: probably because the simpler and more intuitive solution works.
I think Figma uses a similar algorithm to allow designers to draw curves. Each point added along a line has 2 toggles that can be used to configure a tangent that translates to how much the line curves at that point.
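As a rough illustration of the idea (not Figma's actual implementation, which I don't know), a cubic Bézier segment can be evaluated from two endpoints and two handle points. The handles play the role of the tangent toggles: the curve leaves `p0` in the direction of `p1` and arrives at `p3` from the direction of `p2`. The function name and sample points below are made up for this sketch.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter values t in [0, 1].

    p0, p3 are the endpoints; p1, p2 are the handle (control) points.
    Returns an array of shape (len(t), dimension).
    """
    t = np.asarray(t, dtype=float)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Five samples along one segment; dragging p1 or p2 changes how much it bends.
curve = cubic_bezier((0, 0), (1, 2), (3, 2), (4, 0), np.linspace(0, 1, 5))
```

Chaining several of these segments end to end, with the handles on either side of a shared point kept collinear, is one way to get a smooth piecewise curve.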
I think it's so cool how your brain tries to preserve the image it's trying to see but still gets it wrong. Makes me question how reliable my brain is at visualizing reality.
AWB is definitely a helpful tool for taking photos and making scenes look generally correct in color. But different images can come out with different colors if the subject is changing color. For instance, this photographer shot 3 images of a carnival ride. The ride keeps changing colors, which affects how AWB modifies the colors of the surrounding objects.
I didn't even notice the banana turning green at first. It took a minute to recognize in the recording as well.
What sort of biological advantage does colorblindness add? Seems like it limits quality of life more than it improves it. And how do colorblind people enjoy movies?
@rianadon Definitely think this has some biological basis. We often notice the composition of a scene before details such as color or texture (from viewing only). Since color isn't the first thing we focus on, it's easier for a change in color to go unnoticed than a change in the composition of an image.
Is there a Moore's Law that applies to pixels? Surely as pixels get smaller and smaller, there is a physical limit to how small they can be. Are we close to the limit or is there still lots of room for improvement?
In photography, this is known as graininess. It's definitely nice to have a little of it since it makes an image look more realistic and not "too perfect." But too much of it is really annoying because it's distracting. Personally, I prefer images with as little noise as possible since I'm pretty nitpicky about the digital contents of my images.
What sorts of algorithms are used to determine that a certain part of an image is the most appealing? It's not like machines have eyes and can see an image the same way humans can.
How is gamut improved on displays? At what point is improving gamut impossible - either by physical or biological limitations?
Hmm, I can see how metamers are very useful in the computing world, where we could find an alternative color composition that is easier to produce than something otherwise complex. I guess it just seems kind of repetitive in the real world? Why have several colors that "look" the same when the world may look more appealing or interesting if we could detect more colors?
I think it's really cool that color is an n-dimensional vector projected onto 3 dimensions. It's amazing how the human eye, a biological entity, is able to simplify an otherwise incredibly complex world into 3 general pieces of data.
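To make the projection concrete, here is a small sketch that treats a spectrum as a vector of samples and the three cone types as the rows of a 3 x n matrix. The Gaussian sensitivity curves are hypothetical stand-ins, not measured cone data. Because the matrix has far more columns than rows, many different spectra land on the same 3 numbers, which is exactly what a metamer is.

```python
import numpy as np

wavelengths = np.linspace(400, 700, 31)  # nm, sampled every 10 nm

def gaussian(center, width):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# HYPOTHETICAL sensitivity curves (rough L, M, S shapes), not real cone data.
S = np.stack([gaussian(570, 50), gaussian(540, 45), gaussian(445, 30)])  # 3 x 31

# An arbitrary smooth spectrum.
spectrum_a = 1.0 + 0.5 * np.sin(wavelengths / 40.0)

# The last rows of Vt from a full SVD span the null space of S:
# directions in spectrum space that the three sensors cannot see.
_, _, Vt = np.linalg.svd(S)
null_dir = Vt[-1]

# A physically different spectrum with (numerically) identical responses.
spectrum_b = spectrum_a + 0.2 * null_dir

response_a = S @ spectrum_a  # 3 numbers
response_b = S @ spectrum_b  # the same 3 numbers, up to rounding
```

The 31-dimensional spectra differ, but their 3-dimensional projections match, so under these assumed sensitivities the two would look like the same color.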
I think jumping would be a little more complicated since we'd need to factor in gravity and other forces involved. It'd be similar to how a compressed spring aimed upward is released so that it expands and, in doing so, bounces into the air, then comes back onto the ground in a compressing motion.
It makes sense why springs could be used to model the attraction and repulsion between particles. A stretched spring contracting models how particles attract, while a compressed spring expanding models the repulsion between 2 particles.
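A minimal sketch of that idea using Hooke's law, assuming an ideal spring with no damping (the function name and numbers are just for illustration): the force on one particle points toward the other when the spring is stretched past its rest length, and away from it when the spring is compressed.

```python
import numpy as np

def spring_force(pos_a, pos_b, rest_length, stiffness):
    """Force on particle a from an ideal spring connecting it to particle b.

    Assumes the two particles are not at exactly the same position.
    """
    delta = np.asarray(pos_b, dtype=float) - np.asarray(pos_a, dtype=float)
    dist = np.linalg.norm(delta)
    direction = delta / dist  # unit vector from a toward b
    # Positive when stretched (pull toward b), negative when compressed.
    return stiffness * (dist - rest_length) * direction

stretched = spring_force((0, 0), (2.0, 0), rest_length=1.0, stiffness=10.0)
compressed = spring_force((0, 0), (0.5, 0), rest_length=1.0, stiffness=10.0)
```

Here `stretched` points in +x (a is pulled toward b) and `compressed` points in -x (a is pushed away from b). The force on particle b is simply the negative of the force on a.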
This reminds me of 16b when we were talking about feedback stabilization. In order to keep a system stable, aka not blowing up, we pass in a modified input that in a way minimizes the error that grows exponentially. Could a similar approach be used here?
To add, cartoon characters never seem to die. I'm not sure how many times Tom has been shot in the face with a shotgun, or how many times I as a kid believed humans could have faces flattened or shaped funny because somehow they're moldable, thanks to the over-exaggeration in cartoon scenes. It certainly got the point across. Maybe a little too well.
This is really interesting to think about. I'm used to remembering each scene in an animation as the character having 1 general motion. In the case of the squirrel, it would be jumping and not necessarily causing the next action. Just that in the next scene, it's landing or jumping. The fact that I didn't notice goes to show the skill of the animators. They created scenes with secondary motion that doesn't distract from the primary motion.
I think something underrated is the amount of work animators put in to produce scenes in movies. A 1 minute animation could be over 300 scenes drawn alone. In fact, 1 Attack on Titan episode (roughly 20 minutes) is around 5000 - 10000 scenes alone. https://www.youtube.com/watch?v=Tvj-XnVKQI8
This reminds me of how Lewis Hine, a photographer in the early 1900s, used a shallow depth of field to create compelling images that isolated his subjects and highlighted the cruelty of the child labor industry. His photos convinced the government to later abolish child labor in the United States entirely, since each photo told such a compelling story. Full video describing his work can be found here! https://www.youtube.com/watch?v=ddiOJLuu2mo&t=15s
I'd imagine that the slowed shutter speed would increase aliasing in the photos too. I've taken photos where subjects have this purple outline around them because the shutter speed was too slow and they were in motion when the image was captured.
As an avid photographer, I resonate deeply with this statement. It's super easy to get caught up in trying to get the best equipment, thinking the best tools will get you the best resulting photo. However, it's definitely more about how you use a lens than the quality of the lens. A tool is pretty useless in the hands of an inexperienced user.
For the shape of the initial geometry, are there any trade-offs when we increase or decrease its number of edges?
Interesting to see the depth sensor data from the iPhone. Compared to the cameras on the iPhone, does the depth sensor have a lower resolution? How high should the resolution of the depth sensor be so that we can use the data effectively?
It is quite interesting to see how various designs have enabled bokeh imagery on modern phones: Google's software approach, Apple's dual lens followed by LiDAR. I wonder what the tradeoffs are between each of these techniques.
I wonder if there is a technique that allows focus to be placed specifically on an entire object. In this case the head and tail of the dragon are out of focus.
This effect is really quite interesting. DJI has implemented this in their latest Mavic drones which have produced some cool shots of popular landmarks.
The feature points can be determined automatically through some algorithms. Generally, we want points that are prominent in both images and easy to localize. One easy way is to simply detect corners, where two lines meet. We can find where corners are by using edge detection algorithms. From there, we do feature matching.
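As one concrete way to find corners, here is a small sketch of a Harris-style corner response in plain NumPy. This is a specific technique I'm swapping in for illustration, not necessarily what the slide's method uses; the 3x3 window and `k=0.05` are arbitrary choices. The response is strongly positive where the image gradient varies in two directions (a corner), negative along a straight edge, and zero in flat regions.

```python
import numpy as np

def box_sum(a, r=1):
    """Sum of values in a (2r+1) x (2r+1) window around each pixel (zero padded)."""
    padded = np.pad(a, r)
    out = np.zeros_like(a, dtype=float)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(image, k=0.05):
    """Harris-style corner response: det(M) - k * trace(M)^2 per pixel."""
    iy, ix = np.gradient(image.astype(float))  # derivatives along rows, columns
    sxx = box_sum(ix * ix)
    syy = box_sum(iy * iy)
    sxy = box_sum(ix * iy)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

# Toy image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0

R = harris_response(img)
best = np.unravel_index(np.argmax(R), R.shape)  # lands on a corner of the square
```

In a real pipeline you would threshold the response, keep local maxima as feature points, and then compute a descriptor around each one for the feature matching step.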
How is the 3D geometry reconstructed from 2D data? Does it use lenses of various focal lengths?
Is there a formula for the expected number of steps to get within a factor of epsilon between an initial geometry and target geometry?
How are the locations of points on the point cloud determined? Are they chosen specifically in locations with high variance?
As someone who has used VR before, lag is so much more obvious than in any other medium. Any slight gap/change in framerate, or delay between movement and what you actually see, not only ruins the immersion but can also lead to motion sickness. It's pretty interesting that there are some headsets that don't directly plug into high-powered systems and use Bluetooth (given that Bluetooth is a pretty big bottleneck).
In response to reconstructing things using just one camera (or one photo), I recently took a class where we read a paper on it. The idea of the paper was that we would classify objects in the image so that we could have some sort of baseline to depict depth, because otherwise it's pretty difficult to define depth. The intuition for this was based on how we think. If we were given an image, we would imagine the depth in the image by using prior knowledge of the objects in that image. Pretty interesting stuff, and it's cool that it sort of came up in this class as well.
The VR headsets I've tried have this distortion feature and I wonder if this distortion is what makes VR feel so uncomfortable to use. I wonder if undistorting the image would cause too much lag to be worth it.
Another realm of NeRF is camera optimization, e.g., iNeRF and BARF.
How does VR determine where someone's eye is trying to focus? I imagine that for VR to work well, being able to shift focus is very important.
Here is the project page and here is the project page to HyperNeRF, a similar paper.
I agree. Here's the original NeRF project page https://www.matthewtancik.com/nerf.
Here is a relevant link to GANcraft https://nvlabs.github.io/GANcraft/.
@gowenong It really depends on the priors built into the method. There are volume rendering methods that can render a 3D scene from only one view (these are currently limited to simple scenes containing a single object like a chair or table). There is no optimal amount; more is almost always better. Most NeRF methods use 10-100 images.
@gowenong It is pretty arbitrary. A common shape to use is a sphere in 3D (a circle in 2D, which is shown here). People have explored data-dependent initial geometries.
@adityaramkumar, this is just my perspective on your question, but I believe that traditional mathematical functions are too "simple" in the sense that we cannot accurately represent the complexities of objects/scenes we'd like to model. Not only that, but finding mathematical functions to best match these objects is a difficult task in itself. Therefore, I think these neural networks chip at that problem by steadily being able to better model the objects through rules, features and heuristics, examples of which are seen from this slide.
Does 360 degree mean you can't look up or down? That would be like 4pi steradians, right?
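A quick way to check the steradian part: the solid angle of a full 360-degree horizontal band between two elevation angles is 2π(sin θ_max − sin θ_min), so the whole sphere (−90° to +90°) comes out to 4π. The sketch below is just that formula; the function name and the ±30° example are made up for illustration, and whether a given "360" capture covers the full vertical range is a separate question it doesn't answer.

```python
import math

def band_solid_angle(elev_min_deg, elev_max_deg):
    """Solid angle (steradians) of a full 360-degree band between two elevations."""
    lo = math.radians(elev_min_deg)
    hi = math.radians(elev_max_deg)
    return 2.0 * math.pi * (math.sin(hi) - math.sin(lo))

full_sphere = band_solid_angle(-90, 90)  # every direction, including up and down
limited = band_solid_angle(-30, 30)      # 360 degrees around, but only +/-30 vertically
```

So `full_sphere` is 4π steradians, while a 360-degree panorama with a limited vertical range covers less: the ±30° band is only half the sphere.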
@melodysifry Maybe in the future they can set a camera in front of a user (or even very small ones in the headset) to track their facial expressions, thereby allowing facial expressions to be rendered in VR meetings.
It seems that voxels are basically 3D pixels stored as cubes. I think one benefit of voxels would be that calculating volume of objects in space is probably much easier, since meshes probably have more unique shapes to them. I do think that if the size of a voxel gets small enough, they probably do better at truly modeling the world around us than a typical mesh.
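On the volume point, here is a small sketch of why it's easy with voxels: if the shape is stored as a boolean occupancy grid, the volume is just the number of occupied voxels times the volume of one voxel. The sphere, the helper names, and the voxel sizes are all assumptions chosen so the answer can be compared against 4/3·π·r³.

```python
import numpy as np

def voxelize_sphere(radius, voxel_size):
    """Boolean occupancy grid: True where a voxel's center lies inside the sphere."""
    n = int(np.ceil(2 * radius / voxel_size))
    centers = (np.arange(n) + 0.5) * voxel_size - radius
    x, y, z = np.meshgrid(centers, centers, centers, indexing="ij")
    return x ** 2 + y ** 2 + z ** 2 <= radius ** 2

def voxel_volume(occupancy, voxel_size):
    """Volume estimate: occupied voxel count times the volume of one voxel."""
    return occupancy.sum() * voxel_size ** 3

grid = voxelize_sphere(radius=1.0, voxel_size=0.05)
estimate = voxel_volume(grid, 0.05)
exact = 4.0 / 3.0 * np.pi  # analytic volume of a unit sphere
```

The estimate gets closer to the exact value as the voxels shrink, but the grid grows cubically (40³ cells here), which is the usual memory trade-off against meshes.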
It's so interesting to see the intersection between neural networks and this class! It's also cool to see how a simple linear classifier could be used for determining something like occupancy in a scene.
You can follow the GANcraft project at this link
Are 2 views enough for volume rendering? Or is there an optimal amount?
It's NeRF or nothin'
Is there a reason why a decagon is chosen as the initial geometry or is it arbitrary? If it's arbitrary, is there a shape that is more optimal for initial geometry?
VR Video does seem to have a very high memory footprint, even over traditional video. Are there any techniques or research being done to reduce this?
Is using a neural network preferable in any way to just a traditional mathematical function if it works? I would assume that a traditional higher degree polynomial may be better due to the lower memory footprint.
It's interesting to understand how depth sensors work. Our eyes are also depth sensors in and of themselves. What if we had only one camera, how much depth could we reconstruct using just software/inferring things?
Adding on to these applications, I also remember during the pandemic lockdown when UC Berkeley had a virtual campus tour over a reconstruction of the campus on a Minecraft server. I feel that players could also use VR in this situation, and the simplified world model would allow players to have more freedom of movement and circumvent the "waypoints" restriction.
This distortion is something I've noticed whenever I've tried on a VR headset. Of the VR systems that are on the market today, which ones if any actually implement the potential solutions listed here on the bottom of the slide?
Our eyes have the ability to selectively focus on objects that are at different depths, e.g. when I focus on my computer screen, everything in the background is out of focus, and when I focus on an object past my screen, my screen goes out of focus. Is this something that VR will ever be able to recreate, given that everything that is potentially in the foreground or background is flattened onto a single display?
It's surprising to hear that VR research goes as far back as 1968! Given all the current hype around it, I was under the impression that it's a relatively new concept, but it's interesting to know that VR efforts actually go this far back.
This reminds me of how 3D glasses at movie theaters isolate the left and right eye perspectives to create this sort of depth effect.
The one drawback I see for this in terms of VR as it stands now is that as long as VR headsets are involved, a VR simulated meeting like this will never be able to replicate the experience of seeing other people's facial expressions and body language. In this sense, even though sitting in a 3D meeting room might feel more authentic than sitting at home on Zoom, the key component that makes in-person meetings what they are is still missing.
While there's definitely a lot of hype around both AR and VR, I feel like AR has more immediate potential with consumers. VR as it stands now is less accessible given that you need an expensive headset to experience it at home, and AR has already proven its immediate potential with Pokemon Go blowing up how it did (even though the hype has almost completely fizzled out since then). One cool application of AR I can think of in our daily lives is "previewing" what new buildings/structures could look like. Oftentimes when there's a big new urban construction project, there'll be a printed picture displayed of what the space will look like after the construction project is done, complete with imaginary people going about their business in this new space. I can imagine an AR alternative to this where people could scan a QR code and see what this new space would look like in 3D from their phones.
This is an interesting problem and shows how challenging it is to simulate human perception. Our eye focuses on different points as it rotates within the eye socket, and the computing system should change the focus point correspondingly in order to deceive our brain.
Gaussian blurring is also used for reducing the size of an image. Here is an excerpt from Wikipedia: "When downsampling an image, it is common to apply a low-pass filter to the image prior to resampling. This is to ensure that spurious high-frequency information does not appear in the downsampled image (aliasing). Gaussian blurs have nice properties, such as having no sharp edges, and thus do not introduce ringing into the filtered image."
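Here is a small sketch of that blur-then-resample idea in plain NumPy. The helper names, the `sigma = factor / 2` rule of thumb, and the checkerboard test image are my own choices for illustration, not something from the Wikipedia excerpt. The checkerboard is a worst case: it alternates every pixel, so taking every second pixel without filtering returns a solid color.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def downsample(image, factor, prefilter=True):
    """Keep every factor-th pixel, optionally low-pass filtering first."""
    if prefilter:
        image = blur(image, sigma=factor / 2.0)  # rule-of-thumb sigma (assumption)
    return image[::factor, ::factor]

# 1-pixel checkerboard: half the pixels are 0 and half are 1.
checker = np.indices((64, 64)).sum(axis=0) % 2

naive = downsample(checker, 2, prefilter=False)   # aliased: every sample is 0
filtered = downsample(checker, 2, prefilter=True)  # close to the true average, 0.5
```

The naive result claims the image is entirely black, while the prefiltered result reports the mid-gray you would actually perceive from a distance.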
The result of this filter looks very similar to a filter that beautifies people's faces in images and videos. I wonder if a similar filter is used.
Totally agree with Crystal. Human brains are accustomed to extracting motion / mapping information from a large FOV. A small FOV hinders that ability and can make one feel dizzy.
I think Gaussian blur is used in practice. For example, you can find it in photo editing tools like Photoshop or libraries like OpenCV.
Yes. Missing any of the three cone pigments could result in color blindness. And each cone type responds to a specific range of wavelengths.
I have a similar concern, but I would imagine elementwise integer division might be slightly faster than setting a threshold (which might be an if statement?).
A moderately amusing anecdote in this vein is that, when using the Oculus Quest headset, it has a passthrough mode which lets you see through the cameras (used to orient yourself or set up a virtual boundary). While in this mode, the controllers look fairly normal, but if you happen to glance at the controllers for someone else's Quest, you see these little lights on them. I always guessed that these were somehow related to tracking, but I was never sure of why they were only ever on controllers belonging to other headsets. Now my best guess is just that the Quest disables some IR lights on the controller to help mask some of the internal workings of the system, even though that's how it tracks the controller positions (since the only cameras are on the headset itself).
Google Cardboard seems like such a good idea because even if it's not perfect or as good as another VR headset, it provides an economical alternative for accessing VR content. Most people already have phones, and as long as the apps they are using support VR, they just need a bit of cardboard to transform it into a VR setup.
I wonder how the human visual field of view interacts with the FOV you can set in games. I'm not sure exactly how this works for VR, but in most normal games on a screen you can easily change the FOV a fairly large amount in the settings. Do VR games have to match natural human FOV for it to look good, and if we were able to play around with the FOV, would it look very disorienting/weird?
6 degrees of freedom is important because it lets us simulate the motion parallax effect, which essentially says that objects closer to us move faster across our retina. (We need to measure our head motion across x, y, z to display the correct scene.) This motion parallax effect is essential for 3D perception. A YouTube video demonstrates this: https://www.youtube.com/watch?v=Jd3-eiid-Uw&t=195s
Yeah, I guess one example of where ISO gain is very important is nighttime photography, where there is very little light coming into the sensor, hence why we see so many grainy pixels.
I believe so: as we move the sensor closer to the lens, we would see more of the image.
On a separate note, to me it is very impressive to see the zoom on some of these cameras. I remember growing up playing around with small 2 or 4 MP cameras around my house, and even in good lighting conditions I would get blurry images that I could not zoom in on. Now, it's crazy to see a camera like the one shown here with such an impressive zoom range and high level of detail.
Texture details are super interesting. I'm curious how texture can be used to generate new samples that are different from the original texture sample. It would be interesting to see randomness / random algorithms that try and approximate unique textures that all sort of look similar.
I would think that unblurring would be difficult: since you are averaging or smoothing values, it may be hard to get that level of detail back. It may be similar to upscaling, but I would guess it is hard to do.
I believe 3D glasses in 3D movies rely on this binocular cue to create a sense of depth. For every scene in the movie there are two images corresponding to the left and right perspectives. The left is only seen by the left eye and the right only by the right eye through light filtering. Our brain then fuses the two images together to form a sense of depth.
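The geometry behind that binocular cue can be sketched with the standard pinhole stereo relation, depth = focal length × baseline / disparity, assuming two parallel, rectified views. The function name and the numbers (a 65 mm baseline, roughly the distance between human eyes, and a 1000 px focal length) are illustrative assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (meters) of a point seen by two parallel cameras.

    disparity_px is how far the point shifts horizontally between the
    left and right images; a larger shift means the point is closer.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px

near = depth_from_disparity(focal_px=1000.0, baseline_m=0.065, disparity_px=65.0)
far = depth_from_disparity(focal_px=1000.0, baseline_m=0.065, disparity_px=10.0)
```

With these numbers, a 65 px shift corresponds to a point about 1 m away and a 10 px shift to about 6.5 m. A 3D movie works in the other direction: it picks the shift between the left and right images so that the brain infers the intended depth.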