@JunoLee128 Yup! I've also seen Bezier curves in Maya, a computer graphics application used to make 3D animation! Specifically, they're used to simulate movement, and we can use these Bezier curves to smoothly move our objects around in a way that seems human-like.
I encourage everyone to also check out ray marching! It serves the same purpose as ray tracing, but instead of directly calculating an intersection, it "marches" forward iteratively, step by step, UNTIL it hits an object, OR until it shoots off into space without intersecting anything (in which case, there is no intersection).
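For anyone curious, here's a minimal Python sketch of the idea (sphere tracing, a common ray-marching variant), using a signed-distance function for a unit sphere as a made-up stand-in scene:

```python
import numpy as np

# Hypothetical signed-distance function for a unit sphere at the origin.
def sdf_sphere(p, radius=1.0):
    return np.linalg.norm(p) - radius

def ray_march(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    """Sphere tracing: step along the ray by the distance to the nearest
    surface until we either get close enough (hit) or march off to infinity."""
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:          # close enough to the surface: report a hit
            return t
        t += d               # safe to advance by the nearest-surface distance
        if t > max_dist:     # shot off into space without intersecting anything
            break
    return None

# Example: a ray starting at z = -5 pointing toward the origin hits the unit sphere.
hit = ray_march(np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, 1.0]), sdf_sphere)
print(hit)  # ~4.0
```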
@andreisito I think it has more to do with perspective? For instance, with how Goofy is OVER the dog, it makes the situation very clear that Goofy is mad at the dog and is yelling at it.
@el-refai From different clips that I've seen, it looks like the battery usually fits inside their pocket, and they can just walk around with a cord connecting the Vision Pro to the battery, so maybe not too cumbersome?
The Fresnel lens is valuable in motion picture production not just for its ability to concentrate light into a brighter beam than conventional lenses, but also for ensuring relatively consistent light intensity across the entire width of the beam.
Is there a way to extend sRGB so that it can display the gamut that Apple's P3 can display? For example, as we saw in some example questions, can we add another color primary or light intensity into our equations?
In what ways can we modify the eye to perhaps see things that are currently outside the scope of human vision? Are there certain limitations in the shaping of our eye that cannot be overcome, even with modifications?
This set of slides documenting the effects of different focal lengths and camera positions puts into perspective how big an impact the technology used to capture an image has. I wonder what it would take to model these effects digitally; for example, if we take an existing image shot at some focal length, can we modify it to accurately display the same scene as if it were shot at a different focal length?
It's interesting that sampling on the area of the light source produces clearer, more accurate lighting and shadows than sampling on the hemisphere. This makes sense to me because hemisphere sampling is less precise at a high level: most sampled directions never reach the light, so only a small fraction of samples contribute, whereas rays sampled from the light's area are aimed at the light from the start.
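A quick back-of-the-envelope check of that intuition, using a made-up setup where a small light sits directly overhead and subtends a 5-degree half-angle cone as seen from the shading point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniformly sample directions on the upper hemisphere (z >= 0).
def sample_hemisphere(n):
    z = rng.random(n)                      # cos(theta) uniform in [0, 1)
    phi = 2 * np.pi * rng.random(n)
    r = np.sqrt(1 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# Hypothetical small area light: directly overhead, 5-degree half-angle cone.
cos_cutoff = np.cos(np.radians(5.0))

dirs = sample_hemisphere(1_000_000)
hit_light = dirs[:, 2] > cos_cutoff        # direction falls inside the cone
print(f"{hit_light.mean():.4%} of hemisphere samples reach the light")
# ~0.4% -- the rest contribute nothing, which is why hemisphere sampling gives
# noisy shadows compared to sampling the light's area directly.
```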
Though I am not an artist myself, I can see the potential of art being crafted from different angles, and being able to paint while viewing the work in VR would allow the artist to realize it more easily.
How does the circle of confusion compare to the relationship between the focal plane and focal point, as images are in focus when the latter two align?
Here's a paper which uses a neural network to dynamically animate a controllable character. Something I find super interesting about this is that the model is capable of adapting to the geometry of the environment such that its movements appear more natural:
I think a relatively uniform environment or a lack of distinctive features can be a small issue for Inside-Out Tracking, but increasing the number of infrared LEDs can solve that at the cost of more computation.
Since something like walking is so commonly seen with our eyes, we tend to be more critical if the walking seems off. The same goes for human faces; when I see an animated human face and an animated dog face, I'm more likely to think the dog face is accurately depicted, simply because I'm less familiar with how a real dog's face moves.
This kind of technology captures not just the visual elements of the concert from all angles but also the audio, which is crucial for a full VR experience. Such setups are designed to recreate the live experience as closely as possible for viewers at home, giving the sensation of being at the concert. The VR camera system usually contains multiple lenses (as shown in the image) that cover different angles, stitching the footage together to create a seamless spherical video that allows users to look around as if they were in the venue.
It's interesting to see how something we learned early on in the semester is still very prominently used in our final project. Not just linear interpolation across triangles, but interpolation in general seems very important throughout graphics.
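As a refresher, here's a tiny sketch of that triangle interpolation (barycentric coordinates), with made-up vertex positions and colors:

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    """Return (alpha, beta, gamma) such that p = alpha*a + beta*b + gamma*c."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    beta = (d11 * d20 - d01 * d21) / denom
    gamma = (d00 * d21 - d01 * d20) / denom
    return 1.0 - beta - gamma, beta, gamma

# Triangle with one red, one green, and one blue vertex.
a, b, c = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

alpha, beta, gamma = barycentric_coords(np.array([0.25, 0.25]), a, b, c)
print(alpha * colors[0] + beta * colors[1] + gamma * colors[2])  # interpolated color
```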
While reading about popular filters, I came across a filter for making images or videos appear hand-drawn, called the cartoon filter. It uses Sobel edge detection to identify edges and enhance them further. Then it blurs and simplifies colors and details to simulate the less realistic hand-drawn effect. Lastly, it blends a bit of the original image back in.
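Here's a rough sketch of that pipeline for a grayscale image; the threshold, blur sigma, tone levels, and blend weight are made-up values, and real cartoon filters typically use fancier smoothing such as bilateral filtering:

```python
import numpy as np
from scipy import ndimage

def cartoon_filter(gray):
    """Toy cartoon effect for a grayscale image with values in [0, 1]."""
    # 1) Sobel edge detection to find and strengthen outlines.
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    edges = np.hypot(gx, gy)
    edge_mask = edges > 0.5                     # made-up threshold

    # 2) Blur and quantize to simplify colors/details.
    smooth = ndimage.gaussian_filter(gray, sigma=2.0)
    quantized = np.round(smooth * 6) / 6        # 6 tone levels, chosen arbitrarily

    # 3) Darken the detected edges and blend a little of the original back in.
    cartoon = np.where(edge_mask, 0.0, quantized)
    return 0.85 * cartoon + 0.15 * gray

img = np.random.rand(64, 64)                    # stand-in for a real image
print(cartoon_filter(img).shape)
```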
Found this good article that explains the intuitive differences between brightness, chroma, and hue. Hue pretty much just gives the color value, while chroma represents the intensity and saturation, basically how much grayness is mixed in...
To make VR experiences as realistic and comfortable as possible, the camera views (and thus the rendered images) need to be dense enough to approach the resolution of human vision. This could potentially reduce the screen door effect, which is the visible fine lines separating pixels on the display, and improve the overall visual fidelity of the VR experience. However, achieving this is technically challenging and requires more advanced display technology and rendering techniques to handle the increased computational load.
"Spherical stereo" typically refers to stereo vision systems that capture and process images in a spherical coordinate system, often for applications like 360-degree panoramic imaging or immersive virtual reality.
6-DOF (Six Degrees of Freedom) head pose estimation refers to the process of determining the position and orientation of a user's head in three-dimensional space using six degrees of freedom: three for translation (moving in x, y, and z directions) and three for rotation (pitch, yaw, and roll).
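Concretely, a 6-DOF pose can be packed into a single 4x4 rigid transform. Here's a little sketch; the yaw-pitch-roll composition order I picked is one common convention, not necessarily what any particular headset uses:

```python
import numpy as np

def pose_matrix(tx, ty, tz, pitch, yaw, roll):
    """Build a 4x4 rigid transform from six degrees of freedom:
    three translations (x, y, z) and three rotations (pitch, yaw, roll)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw about y
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    T = np.eye(4)
    T[:3, :3] = Ry @ Rx @ Rz
    T[:3, 3] = [tx, ty, tz]
    return T

# Head at (0, 1.7, 0) meters, turned 30 degrees to the left.
print(pose_matrix(0, 1.7, 0, 0, np.radians(30), 0))
```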
The Oculus Rift uses a tracking system to monitor the movement of the headset and controllers in physical space. One of the key components of this tracking system is the constellation tracking system, which uses infrared LEDs on the headset and controllers along with an infrared camera to track their position and orientation.
It's interesting to see that the brain interpolates neighboring rod and cone information to account for the lack of information in the blind spot. Instead of doing this purely spatially, the brain also takes advantage of the fact that the eye cannot remain still and slightly moves every split second, thus moving the blind spot. This allows for an effectively smaller blind spot as the brain compares input over time.
Compared with AR, VR equipment requires more components to enhance the user's immersion, so it will be heavier. But when we realize we're wearing a device, it breaks the immersion a little bit. The optical waveguide technology I learned about can guide images directly to the user's eyes, and it is even possible to image directly on the retina so that the user sees a virtual image overlaid on the real world. However, there are still considerable technical challenges, and I hope display devices can be miniaturized in the future.
Understanding the distribution of photoreceptors helps in designing VR displays that mimic how the human eye perceives the real world. Since the central vision is cone-rich and thus more sensitive to detail and color, VR systems often use techniques like foveated rendering, where the highest resolution and processing power are concentrated in the area where the user’s gaze is directed, typically the center of the visual field. This not only creates a more realistic visual experience but also optimizes computational resources as the periphery, which is more rod-dominated and less sensitive to detail, is rendered at a lower resolution. This biological insight drives the technological advancement of VR, leading to more efficient and realistic visual simulations.
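A toy illustration of the foveated rendering idea; the radii and block sizes below are made-up numbers, and real systems use eye tracking plus a much smoother falloff:

```python
import numpy as np

def foveation_factor(px, py, gaze, inner_radius=100, outer_radius=300):
    """Return the shading block size for the pixel at (px, py):
    full detail near the gaze point, coarser shading toward the periphery."""
    dist = np.hypot(px - gaze[0], py - gaze[1])
    if dist < inner_radius:
        return 1      # fovea: shade every pixel
    elif dist < outer_radius:
        return 2      # mid periphery: shade 2x2 blocks
    else:
        return 4      # far periphery: shade 4x4 blocks

gaze = (640, 360)     # assume eye tracking says the user looks at screen center
print(foveation_factor(650, 365, gaze))   # 1 -> full detail
print(foveation_factor(1200, 700, gaze))  # 4 -> coarse shading, saves compute
```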
I feel like the frequency domain representation would be less efficient, since there might be infinite frequencies? But in the slide, it seems like the frequency domain is the same size as the spatial domain.
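A quick sanity check of that observation: for a discrete signal, the DFT has exactly as many coefficients as there are samples, so the two representations really are the same size and fully interconvertible.

```python
import numpy as np

signal = np.random.rand(256)
spectrum = np.fft.fft(signal)
print(signal.shape, spectrum.shape)     # (256,) (256,) -- same size
recovered = np.fft.ifft(spectrum).real
print(np.allclose(recovered, signal))   # True: nothing is lost going back and forth
```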
We mostly covered triangle meshes in this class. However, when working with Blender, I found that they only use quad-meshes. Is there an advantage that quad-meshes have over triangle meshes?
Recent developments have focused on improving the efficiency and accuracy of ray sampling in the context of volume rendering and neural radiance fields (NeRF). There's a shift from classical methods, which rely on a uniform distribution of ray sampling that doesn't necessarily reflect the real-world surfaces and can miss capturing high-frequency details such as sharp edges in images. The new strategies aim to rectify this by optimizing the distribution of sampled rays.
The latest methods involve pixel and depth-guided ray sampling strategies. Pixel-guided strategies focus on the color variation between pixels, using this information to guide non-uniform ray sampling and emphasize areas with greater detail by detecting higher standard deviations in the color of pixel neighborhoods. This strategy ensures that more sampling is done in areas of an image that are rich in detail, thus more critical to the visual outcome.
Depth-guided strategies, on the other hand, address the variations in depth within a scene, which are particularly challenging in regions with rapid changes. These techniques aim to sample more densely in areas with more significant depth variations to avoid issues like blurring of three-dimensional objects at their edges.
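Here's a minimal sketch of the pixel-guided idea described above, drawing rays with probability proportional to the local color standard deviation so detailed regions get more samples; the window size and uniform-mixing weight are made-up parameters:

```python
import numpy as np
from scipy import ndimage

def pixel_guided_sampling(image, n_rays, window=5, uniform_mix=0.1, rng=None):
    """Sample (row, col) ray locations weighted by local standard deviation."""
    if rng is None:
        rng = np.random.default_rng()
    mean = ndimage.uniform_filter(image, size=window)
    sq_mean = ndimage.uniform_filter(image ** 2, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))

    # Mix with a uniform component so flat regions still get some samples.
    weights = (1 - uniform_mix) * std.ravel() + uniform_mix * std.mean()
    probs = weights / weights.sum()

    flat_idx = rng.choice(image.size, size=n_rays, replace=True, p=probs)
    return np.unravel_index(flat_idx, image.shape)   # rows, cols of sampled rays

gray = np.random.rand(64, 64)        # stand-in for a training image (grayscale)
rows, cols = pixel_guided_sampling(gray, n_rays=1024)
print(rows.shape, cols.shape)
```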
After finishing my final project, which involved Unity meshes, I'm confused as to why everyone doesn't use the half-edge data structure to represent meshes. I had a lot of difficulty iterating over the mesh efficiently without half-edges. Is creating a half-edge mesh computationally intensive?
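For reference, here's the bookkeeping a half-edge structure carries (a bare-bones sketch). My guess is that building and maintaining all of these pointers on meshes that engines hand you as flat vertex/index buffers is the main cost, not the traversal itself:

```python
from dataclasses import dataclass

# Every edge is split into two directed half-edges, each storing several pointers.
@dataclass
class HalfEdge:
    vertex: int                 # index of the vertex this half-edge points to
    twin: "HalfEdge" = None     # opposite half-edge across the shared edge
    next: "HalfEdge" = None     # next half-edge around the same face
    face: int = -1              # index of the face this half-edge borders

# Once built, traversing a vertex's one-ring is just: h = h.twin.next, repeated.
```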
b(t) is the function to evaluate the Bezier curve. 2) The "slope" in this case is w.r.t. the parameter t. It is a vector that points in the tangent direction. I'm not sure where the constant of 3 comes from; I think it's related to the algebra of the Bernstein polynomials somehow.
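If it helps, here's a quick numerical check suggesting the 3 simply comes from differentiating the cubic Bernstein form: the cubic's exponent comes down via the power/product rule. The control points below are arbitrary:

```python
import numpy as np

# B(t)  = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
# B'(t) = 3[(1-t)^2 (P1-P0) + 2(1-t)t (P2-P1) + t^2 (P3-P2)]
P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])  # arbitrary control points

def bezier(t, P):
    return ((1-t)**3 * P[0] + 3*(1-t)**2*t * P[1]
            + 3*(1-t)*t**2 * P[2] + t**3 * P[3])

def bezier_derivative(t, P):
    return 3 * ((1-t)**2 * (P[1]-P[0]) + 2*(1-t)*t * (P[2]-P[1]) + t**2 * (P[3]-P[2]))

t, h = 0.4, 1e-6
finite_diff = (bezier(t + h, P) - bezier(t - h, P)) / (2 * h)
print(np.allclose(finite_diff, bezier_derivative(t, P), atol=1e-5))  # True
```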
To my understanding, the speed at which the tracking occurs is of great significance to the comfort of the user. From what I've heard, many modern systems not only perform explicit tracking but also attempt to make predictive steps to reduce the perceived delay. Will this be touched on?
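To illustrate what I mean by a predictive step, here's a toy sketch that extrapolates yaw forward by an assumed render-to-photon latency; real systems predict the full 6-DOF pose with filters, so this is just the flavor of the idea:

```python
import numpy as np

def predict_yaw(yaw, yaw_velocity, latency_s=0.015):
    """Extrapolate head yaw by the (assumed) display latency using the
    most recent angular-velocity estimate."""
    return yaw + yaw_velocity * latency_s

current_yaw = np.radians(10.0)       # measured head yaw
yaw_velocity = np.radians(120.0)     # head turning at 120 deg/s
print(np.degrees(predict_yaw(current_yaw, yaw_velocity)))  # ~11.8 degrees
```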
What are some of the applications of Bezier curves outside of rendering? I saw someone use a Bezier curve to model a wind force on a cloth and found that super interesting.
Given the visual cues provided by panel displays & enhanced by VR/AR, how do these methods compare wrt the user's perception of depth and space in a virtual environment?
Higher resolution helps make VR feel real, but it needs more power. Do VR games change resolution or field of view on the fly to match what's happening, or is it fixed?
Given the various weaknesses, I was just wondering if there was some way to integrate automation to make kinematic animation more efficient / still consistent with physical laws?
One thing I'd worry about with the widespread adoption of VR is an overdependence on it--if users are prone to medical conditions like epilepsy or migraines, it could seem like another way to divide the population.
Even with better algorithms for fixing images, do they save time compared to older methods? And does this mean we can use simpler cameras or less processing?
L* represents lightness from black to white, while a* corresponds to the axis between green and red and b* to the axis between blue and yellow. This lets us see more differences in color.
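To make the axes concrete, here's a sketch of the standard XYZ-to-CIELAB conversion with a D65 white point (the inputs at the bottom are just sample values):

```python
import numpy as np

# L* is lightness (0 = black, 100 = white), a* runs green(-) to red(+),
# b* runs blue(-) to yellow(+).
WHITE = np.array([95.047, 100.0, 108.883])  # D65 reference white (X, Y, Z)

def f(t):
    delta = 6 / 29
    return np.where(t > delta**3, np.cbrt(t), t / (3 * delta**2) + 4 / 29)

def xyz_to_lab(xyz):
    x, y, z = f(np.asarray(xyz) / WHITE)
    return np.array([116 * y - 16, 500 * (x - y), 200 * (y - z)])

print(xyz_to_lab([41.24, 21.26, 1.93]))      # sRGB red: large +a*, +b*
print(xyz_to_lab([95.047, 100.0, 108.883]))  # white: L* = 100, a* = b* = 0
```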
One concern I have with VR itself is how most VR experiences, though immersive, aren't significantly transformative to the use case they're being applied to. For example, here, I'm not sure if a VR teleconference would really be that much better than a Zoom meeting, because in either setting you can't physically touch or share any objects or the setting around you in your call.
I wonder if early virtual reality research had any health issues associated with it -- I would imagine the early choice of materials and use of various wavelengths wasn't the most properly studied.
It's interesting to see how monocular cues like shading and perspective can provide a 3D experience. In terms of rendering efficiency, do techniques that simulate depth perception, like occlusion and perspective, require significantly more computational power compared to 2D rendering? Also, for VR/AR applications, how do head-tracking algorithms differentiate between intentional head movements and accidental shakes to ensure consistent perspective rendering?
Are there any attempts to widen the triangle to match the full visible spectrum? I guess colors could be more vivid if we could have darker reds and blues. Or is it a limitation of the hardware we have today?
Really curious how 360-degree cameras work: do they stitch together photos, or are frames of video being saved and stitched together into a seamless 360 view?
Is there research done on very large lightfield displays not meant to be mounted on the head? This might allow higher resolution per degree while covering a larger space by placing the displays farther away.
I am just curious, but are there names for the colors that require negative red light to achieve? We probably won't be able to see them, but I was just wondering if there are color codes/color names for the whole space of r, g, b wavelengths.
Is there a special way to interpolate timesteps in animations? Let's say you want a scene to slow down to a crawl. Would we be expanding sin/cos waves to achieve this?
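One common approach (my assumption, not necessarily how the course does it) is to keep the keyframes fixed and remap time through an easing curve, which is essentially that sin/cos idea:

```python
import numpy as np

def ease_out_sine(t):
    """Normalized time t in [0, 1] -> eased time; the slope approaches 0 as t -> 1,
    so playback decelerates toward the end without changing the poses themselves."""
    return np.sin(t * np.pi / 2)

ts = np.linspace(0.0, 1.0, 6)
eased = ease_out_sine(ts)
print(np.diff(eased))  # the steps shrink toward the end -> the scene slows to a crawl
```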
The bilateral filter has several key parameters, like the standard deviations of the Gaussian functions in the spatial domain and the value (intensity) domain, which need to be carefully selected to adapt to different images and noise conditions. But I'm a little curious how sensitive the result is to these parameters, and whether manually choosing them can actually reach an optimal solution.
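For anyone who wants to play with those parameters, here's a naive (and slow) grayscale bilateral filter sketch; sigma_space controls how far away neighbors can be, sigma_value how different in intensity they can be and still count, and the defaults are arbitrary:

```python
import numpy as np

def bilateral_filter(img, sigma_space=2.0, sigma_value=0.1, radius=4):
    """Brute-force bilateral filter for a 2D grayscale image in [0, 1]."""
    H, W = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial_w = np.exp(-(xs**2 + ys**2) / (2 * sigma_space**2))   # fixed spatial kernel
    padded = np.pad(img, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 2*radius + 1, j:j + 2*radius + 1]
            # Range kernel: down-weight neighbors whose intensity differs a lot.
            value_w = np.exp(-(patch - img[i, j])**2 / (2 * sigma_value**2))
            w = spatial_w * value_w
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out

noisy = np.clip(np.random.rand(32, 32) * 0.1 + 0.5, 0, 1)  # stand-in noisy image
print(bilateral_filter(noisy).shape)
```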
I think it's very interesting to see how the number of photons hitting the sensor during exposure can make the image clear or noisy. This variance, which follows a Poisson distribution, is common in nature for modeling random events. It's fascinating to realize that shot noise is an inherent part of the imaging process, highlighting the limits of precision imposed by the laws of physics.
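A quick simulation of that: since photon counts are Poisson, the signal-to-noise ratio only grows like the square root of the mean photon count (the counts below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for mean_photons in [10, 100, 10_000]:
    counts = rng.poisson(mean_photons, size=100_000)
    snr = counts.mean() / counts.std()   # for Poisson, mean/std -> sqrt(mean)
    print(f"{mean_photons:>6} photons/pixel -> SNR ~ {snr:.1f} (sqrt = {np.sqrt(mean_photons):.1f})")
```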
@jananisriram On the next 2 slides, it states that for the image zn = 0.3m and zf = 0.6m, so I think it is highly image-dependent on what near and far features you want to capture.
I read that the size of the markers can affect what features are captured. To track at large distances, you need larger markers, which also help with visibility. However, to capture intricate features like facial expressions, you need a larger number of markers, and hence they need to be smaller. As such, I'm curious how the Oculus Rift is able to compromise on this and whether any post-processing is needed.
Many image editing software packages feature automated lens distortion correction tools that can detect and correct distortions based on metadata embedded in the image file, which streamlines the process for photographers.
I think what's really interesting about HDR is how it enhances image sensors. I've heard there are multiple methods for image sensing that don't destroy a single pixel. However, the downside is the limitation in adopting advanced methods in mainstream devices! So, I guess the question is how such technologies might eventually trickle down to consumer-level cameras?
If we can change the viewpoint, how to simulate the occluded parts is a really important question. I am thinking about using generative AI to fill in those parts, though it might be super expensive.
The number of possible solutions increases with the number of independently moving parts. Couldn't we define inverse kinematics this way? Give each arm a set position so that we have a well-defined solution.
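To see how constraining things pins down a solution, here's a sketch for a 2-link planar arm: once you fix the end-effector target and an elbow-up/down choice, the joint angles follow analytically (link lengths below are made up):

```python
import numpy as np

def two_link_ik(x, y, l1=1.0, l2=1.0, elbow_up=True):
    """Analytic inverse kinematics for a planar arm with two links of lengths l1, l2."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1**2 - l2**2) / (2 * l1 * l2)
    cos_elbow = np.clip(cos_elbow, -1.0, 1.0)        # clamp unreachable targets
    theta2 = np.arccos(cos_elbow) * (1 if elbow_up else -1)
    theta1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(theta2), l1 + l2 * np.cos(theta2))
    return theta1, theta2

t1, t2 = two_link_ik(1.2, 0.8)
# Forward-kinematics check: the chosen solution actually reaches the target.
ex = np.cos(t1) + np.cos(t1 + t2)
ey = np.sin(t1) + np.sin(t1 + t2)
print(round(ex, 3), round(ey, 3))   # ~1.2, ~0.8
```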
This looks amazing. When the single banana is blue, we think it is blue. However, when we add a color filter over the whole picture, it looks yellow again. It is so cool how our brain controls our perception of color.
I find it interesting to see how this function elucidates the varying sensitivities of the human eye to different wavelengths of light. The graph here is especially intriguing as it compares the visual responses of the human eye under low-light and normal daylight conditions. For example, we can calculate how sensitive the eye is to a certain wavelength of light and how bright or dark it will appear.
Recently, I have seen some techniques that use machine learning models to reduce noise in images. We need to collect images containing noise and corresponding images with no or less noise as training data, which requires a lot of high-quality images. I think a very interesting approach is to manually add noise to high-quality images to generate large amounts of training data.
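Here's a sketch of that data-generation trick, pairing clean images with synthetically degraded versions; the noise model and levels are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_synthetic_noise(clean, gaussian_sigma=0.02, photons=500):
    """Simulate a noisy capture of a clean image in [0, 1]: Poisson shot noise
    plus Gaussian read noise (both with made-up strengths)."""
    shot = rng.poisson(clean * photons) / photons
    read = rng.normal(0.0, gaussian_sigma, clean.shape)
    return np.clip(shot + read, 0.0, 1.0)

clean_batch = np.random.rand(8, 64, 64)        # stand-in for real clean images
noisy_batch = add_synthetic_noise(clean_batch)
print(clean_batch.shape, noisy_batch.shape)    # (noisy, clean) training pairs
```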
I find it very interesting that one eye has trichromacy, which is the normal state of human vision, allowing the perception of a full spectrum of colors. As someone with red-green color blindness, it's intriguing to consider how I cannot see certain colors while others can. It reminds me of the diversity among humans and how our senses shape our understanding of reality.
I read that active optical motion capture works better in a variety of lighting conditions and is more robust, but can be more expensive. However, the electronics involved in active systems are heavier and use more power, which requires actors to wear a battery pack and thus can be inconvenient.
I think this is really interesting. I always assumed that these were issues that animators would have to go over and clean up. I wonder how well this works when objects are more complicated (have an intricate bump map, for example). I could see it potentially causing strange movements on the bump map as the bones move.
How can cameras like these account for the information lost within the inside of the sphere? Is there a way for an interior camera to help patch this together?
Interesting to see how diffusion models have evolved from these initial pattern-matching image completions. The semantically guided Adobe Photoshop fill is a pretty impressive example.
I wonder if there are algorithms that have the ability to use the surrounding features to reextract a real-life saturation from pixels based on their surroundings or if there is too much info loss.
All of these head-mounted displays are really cool, but the biggest issue I feel with these, especially considering how high-fidelity the Vision Pro is now, is the battery life. For people to use this it needs to last a long time, which requires a large battery, but these batteries can't be too large or else they'll be cumbersome to move around with. Really curious as to how companies plan to address this.
I'm really curious as to how you guys went about adding the liquid to this scene. Was it essentially just treated as a solid object but then you gave it a liquid appearance? Very cool nonetheless!
I'm a big fan of this art! I really like how you were able to convey the birds facing each other and get the wings to be really clear without having to do a lot of work with modeling
After doing project 4 and the amount of work to get the textures working I wonder how difficult modelling anisotropic stuff would be like, since it goes from 2d to 4d
To reproduce the color, we want to choose a spectrum s' that, when projected onto the span of the eye's spectral response functions, reproduces the same visual response.
So we can see that the two dots represent different kinds of light spectra, and any spectra that project to the same point in the SML visual response space are metamers, even though they have different locations in the higher-dimensional space.
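Here's a toy sketch of that projection: the SML response is just three inner products of the spectrum with the response curves, and we can construct a second, physically different spectrum with exactly the same projection. All curves below are made-up Gaussians, not real cone sensitivities:

```python
import numpy as np

wavelengths = np.linspace(400, 700, 301)   # nm

def gaussian(center, width):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Toy S, M, L response functions as rows of a 3 x N matrix.
R = np.stack([gaussian(445, 25), gaussian(545, 40), gaussian(575, 45)])

s1 = gaussian(580, 30)                     # one spectrum ("yellow-ish")

# Build a different spectrum from three narrow bands whose SML projection
# matches s1 exactly (note: coefficients can come out slightly negative,
# i.e. not physically realizable, for some choices of bands).
basis = np.stack([gaussian(470, 15), gaussian(530, 15), gaussian(620, 15)])
coeffs = np.linalg.solve(R @ basis.T, R @ s1)
s2 = coeffs @ basis

print(R @ s1)
print(R @ s2)   # same SML response from a different spectrum -> a metamer of s1
```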