EE 267 is a good course to take after this. The two courses overlap a bit, but not too much; EE 267 will help you understand the transforms and lighting/shading better, and it goes deeper into the VR side.
Because smartphones are carrying more and more functionality beyond just making phone calls and texting. For a large portion of that functionality (games, photos, videos, ...), high resolution seems necessary.
Yeah, I think you are correct. Since the shadow map has limited precision, there must be some unsampled regions on or around the dividing line between shadow and non-shadow, so the dividing line can hardly be precise and sharp.
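To make the precision argument concrete, here is a toy 1D sketch of my own (not from the lecture): a "shadow map" stores depth at only a few texels, so when we query it at screen resolution, the shadow boundary can only land on texel boundaries rather than at the occluder's true edge.

```python
import numpy as np

SHADOW_MAP_RES = 8      # deliberately tiny to exaggerate the effect
SCREEN_RES = 64

# An occluder covers x in [0, 0.37) at depth 1; elsewhere the map sees nothing.
occluder_end = 0.37
texel_centers = (np.arange(SHADOW_MAP_RES) + 0.5) / SHADOW_MAP_RES
shadow_map = np.where(texel_centers < occluder_end, 1.0, np.inf)

# Query: a receiver at depth 2 is shadowed wherever the stored depth is closer.
screen_x = (np.arange(SCREEN_RES) + 0.5) / SCREEN_RES
texel_index = np.minimum((screen_x * SHADOW_MAP_RES).astype(int),
                         SHADOW_MAP_RES - 1)
in_shadow = shadow_map[texel_index] < 2.0

# The recovered boundary snaps near the texel edge at 0.375, not the true 0.37.
recovered_boundary = screen_x[~in_shadow][0]
```

The boundary position is quantized to the shadow map's texel grid, which is exactly why the dividing line can't be sharp without more resolution (or filtering such as PCF).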
All the interpolations in this section were constraint-free. However, what if we know there is, e.g., an object in the path that we must avoid in the interpolation? It seems like such problems can very quickly become non-convex.
Is a screen shade (e.g. a partially transparent shade) better thought of as blocking a light source or casting a shadow?
To determine the light sensed at Pixel P1, would you just add the values from each incoming ray?
Are there any nice properties of the Gaussian blur or do we just use it because it's a popular probability density function?
Is what's stored in a shadow map a vec3 representing a direction and a float t representing the depth of first hit in that direction? Given that the direction of the shadow rays might depend on our focal plane, how do we normalize what each point in the texture represents?
Why are floating point operations so much more expensive than integer operations?
When you're rasterizing, the only depth buffer you care about is the one at the point you're trying to render. Doesn't that mean it has the same lack of data dependencies as ray casting?
Are we guaranteed that an edge collapse will always be possible? There are cases where collapsing an edge leads to an invalid mesh.
Along the lines of the above comment, someone may find this WWDC talk on designing the fluid gestural interface of iPhone X interesting: https://developer.apple.com/videos/play/wwdc2018/803/
The principles of animation from the previous lecture often exaggerate natural motion to compensate for a lack of realism. Are actors in motion capture settings trained to follow them?
I think through convolution the (-1, 1) and (-2, 2) pairs extract the differences between neighboring pixels, either horizontally or vertically, and those differences are the gradients. I'm not sure how the effect changes if the matrix is multiplied by some scalar value; maybe it would still extract the gradients, just with the values of the resulting image scaled.
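A quick sketch (my own example, using a simple (-1, 1) difference kernel rather than the exact matrix from the slide) supports this intuition: the kernel responds only where neighboring pixels differ, and scaling the kernel by a scalar scales the extracted gradient by the same factor without changing where the edges are.

```python
import numpy as np

# A tiny image with one vertical edge between columns 1 and 2.
img = np.array([[0., 0., 4., 4.],
                [0., 0., 4., 4.],
                [0., 0., 4., 4.]])

def horizontal_gradient(image, kernel):
    # Correlate each row with a 1-D kernel ('valid' output, no padding).
    k = len(kernel)
    cols = image.shape[1] - k + 1
    out = np.zeros((image.shape[0], cols))
    for j in range(cols):
        out[:, j] = image[:, j:j + k] @ kernel
    return out

g = horizontal_gradient(img, np.array([-1., 1.]))        # responds at the edge
g_scaled = horizontal_gradient(img, 2 * np.array([-1., 1.]))
# g_scaled equals 2 * g: same edges, just larger magnitude.
```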
I think you might have wanted to say top-left?
Yes. The top left is the low-frequency data, which survived.
So does the zeroed-out information in the resulting matrix represent the high-frequency information that was lost?
Why is this the case again? Is this because the shadow map has limited precision and some boundary points are lumped together to one sample on the shadow map?
Derp you are right. Fixed!
A really good convex optimization class that I have taken is CS334A/EE364A (http://web.stanford.edu/class/ee364a/). Although it focuses only on solving convex problems, it still gives a very good basis for understanding optimization in general.
I still don't quite understand how they conducted these experiments. What does it mean for a human to perceive something as 2 times (or 1.319 times) brighter?
It seems like another way to reduce power is just to render fewer pixels. How necessary is it to have such high resolution on phone screens, especially since the triangles we are displaying are large compared to the pixel size?
Huh: it looks like that "precomputed realtime GI" approach in the link is a little more flexible than outright baking the lighting in.
I found this to be a useful explanation: https://docs.unity3d.com/Manual/GIIntro.html
They have a lot of interesting little details. The Light Probes are a cool idea. This bit is particularly strange, since most of the things we've seen have been per-frame:
"…while Precomputed Realtime GI does the final lighting at runtime, it does so iteratively over several frames, so if a big a change is done in the lighting, it will take more frames for it to fully take effect."
I just learned about the HEIC format, which my iPhone uses for photos already:
https://en.wikipedia.org/wiki/High_Efficiency_Image_File_Format
https://iso.500px.com/heif-first-nail-jpegs-coffin/
It uses this HEVC compression, which supposedly can take 50% less space than JPEG at the same quality level. I haven't figured out how it works / what's new about it, though... all the information about it is with respect to video compression stuff, rather than comparison to JPEG.
If I understand correctly, then, there are:
- colors that can be represented as, say, ordinary CSS rgb(), and displayed on my laptop display
- colors that can be represented as CSS rgb() but cannot be properly displayed on my laptop display
- colors that can't be represented as CSS rgb() but can be displayed (if you set the OS color preferences to stretch the gamut, or have a custom graphics application, or something)
- colors that can neither be represented nor displayed

or maybe there are just 3 out of these 4 because they're subsets?
Is there an effort to make all monitors have maximum gamut, or are people satisfied with the current state?
Is there a deep learning attack on this joint motion problem? Deep learning tends to be powerful in areas where humans perform better than computers (playing Go, image classification, natural language processing). Humans are pretty good at joint motion.
The optimization approach here sort of reminds me of the old-school signal processing computer vision stuff, which works pretty well (and predictably), but is outperformed at the high end by brute force learned methods.
I've heard that training is a problem for learning motion in robotics (robots take a while to move, I guess), but maybe not in graphics?
My mental model of OLED displays is that they're giant matrices of these OLED r/g/b elements which can be individually lit up, rather than having a distinct backlight. Is the rolling backlight effect because you need to multiplex all the elements onto relatively few control lines so you don't plug a million wires into your GPU, and the controller then spools the output onto the OLED elements from top to bottom?
Then could you build an OLED display with a different controller that didn't have a rolling backlight?
Basically, if Oculus did custom displays from scratch, could they get around this problem, or is it an inherent part of the display technology?
Why doesn't this effect fall naturally out of having shadows implemented (whether with precomputed shadows or whatever)? Do shadows only render for coarse, big objects? Does exaggerated ambient occlusion just look better than ordinary computed shadows?
Does "most effective" here mean that it saves time on the early Z culling because you can throw out triangles instead of having to render and do a depth comparison? Will it still work, just slower, if you don't sort?
I like the joke, but I guess it's not 100% right to say this is a rejection of perspective in the right 2 examples, since I think those were both games that predated modern 3D hardware?
I assume games like SimCity 2000 used this 'isometric' view so they could run fast enough on old-school 2D graphics technology: some PC framebuffer, or sprites and tiles on the SNES, or whatever. Actually, I'm curious how those 2D graphics acceleration interfaces looked, as opposed to the pipeline we saw in class.
I've heard the joke that home computers are basically graphics processors with a vestigial CPU on the side. It seems pretty true to me, from the Apple II to video game consoles to the Raspberry Pi to smartphones. Whenever you have a consumer machine, you usually want graphics, and when you want graphics, that's often the most demanding thing the machine has to do (low latency, high throughput, specialized math). It's funny that people usually think of the CPU as the main object to program and the GPU as just this add-on to that.
For interactive animations (e.g. iOS, maybe video games?), my sense is that you should favor procedural animation. See here: - https://medium.com/@flyosity/your-spring-animations-are-bad-and-it-s-probably-apple-s-fault-784932e51733 - https://github.com/chenglou/react-motion
Keyframing and motion capture are brittle: if the user stops touching, or drags the thing backward, or varies their finger movement speed, your data is fixed and can't react. You hardcoded that the object moves along this curve for this amount of time, whether the hardcoding happened by entering keyframes or by motion capture. The animation will feel like a pause-play video instead of a living system.
But if you're modeling the animation procedurally, like if the thing is a spring attached to the user's finger, then your simulation naturally responds to user interaction.
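That spring idea can be sketched in a few lines (my own toy example; the stiffness/damping numbers are made up): each frame, the simulated object accelerates toward wherever the finger currently is, so it reacts naturally if the target moves or the user reverses direction.

```python
def spring_step(x, v, target, dt, stiffness=120.0, damping=18.0):
    # Semi-implicit Euler: update velocity from the spring force, then position.
    v += (stiffness * (target - x) - damping * v) * dt
    x += v * dt
    return x, v

x, v = 0.0, 0.0
for _ in range(600):              # ~10 seconds at 60 fps
    x, v = spring_step(x, v, target=1.0, dt=1 / 60)
# x settles near the target (1.0) no matter where it started, and the target
# can be changed on any frame without restarting any canned curve.
```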
Is one approach more parallelizable than the other? It seems like both have a lot of parallelism: rasterization lets you process fragments in parallel for each polygon, while ray casting lets you process rays in parallel. I would guess ray casting is more parallel because it doesn't have the data dependency of the global depth/color buffers: all rays can cast in any order, and you can just collect their colors at the end.
What if the triangle is moving due to interaction and you don't have the prism in advance? (Is triangle-triangle intersection costly in practice? I guess if you have giant models with millions of triangles and you want to find their precise intersections?)
In general, I'm curious if there are 'online' / 'incremental' / ? versions of some of these algorithms. The next lecture focuses on partitioning for acceleration, but if you're doing this in realtime 60 times a second and cameras/objects only move slightly on each frame, maybe there are ways to reuse previous work for acceleration as well?
Highly recommend PSYCH 221 as well. Focuses a bit more on the human visual system and color science but goes fairly in-depth on the engineering side as well. It's an excellent intro course for EE 367.
As stated on the previous slide, rigid body skinning can result in unnatural-looking discontinuities when neighboring mesh vertices are assigned to different bones. By accounting for the weighted influence of other bones when transforming the mesh vertices, we get a smoother transition where the discontinuity used to be.
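A minimal linear blend skinning sketch (my own example, with made-up bone transforms): each vertex is transformed by every influencing bone, and the rigid results are blended by the skinning weights, which is what smooths the transition between bones.

```python
import numpy as np

def skin_vertex(v, bone_matrices, weights):
    # v: (3,) position; bone_matrices: list of 4x4; weights should sum to 1.
    v_h = np.append(v, 1.0)                    # homogeneous coordinates
    blended = sum(w * (M @ v_h) for w, M in zip(weights, bone_matrices))
    return blended[:3]

identity = np.eye(4)
shift = np.eye(4)
shift[0, 3] = 2.0                              # second bone translates +2 in x
v = np.array([1.0, 0.0, 0.0])
# A vertex weighted 50/50 lands halfway between the two rigid results,
# instead of snapping to one bone or the other.
halfway = skin_vertex(v, [identity, shift], [0.5, 0.5])   # [2., 0., 0.]
```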
Perhaps taking the hash of a video and using that for verification?
Some datasets you download for computer vision are large, so to verify they're not corrupted you can take their hash and make sure it equals the published one. We could do the same thing for important videos and check that they match the official video's published hash. Of course, that only works if we have a trusted video source.
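The mechanics are simple; here's a sketch (the filename and "published hash" comparison are hypothetical): hash the file in chunks so it works on multi-gigabyte videos, then compare against a hash distributed by the trusted source.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large videos don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage:
# if file_sha256("press_briefing.mp4") != published_hash:
#     print("video does not match the official release")
```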
Also, Doug Lanman gave a talk at Stanford SCIEN, and more recently one on Focal Surface Displays.
There are obviously some serious ethical issues surrounding this work. While one can make comparisons to the advent of image manipulation, video is often used as a standard of strong legal evidence in many jurisdictions. Faked video could be used to libel or humiliate public figures. Alternatively, one could always claim that damning footage is not real. Especially considering how nation states and adversaries of the United States already manipulate media to promote their own interests, I wonder what sort of measures could be taken to verify a video's authenticity. Curious to hear others' thoughts.
We quantize the DCT coefficients rather than individual pixel intensities because the quantization can be designed around how visible each term is to humans, namely its contrast and spatial frequency.
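A JPEG-style sketch of the idea (the quantization table below is illustrative, not the standard JPEG table): transform an 8x8 block with the 2-D DCT, then divide by a table whose steps grow with frequency, so barely-visible high-frequency detail is stored more coarsely.

```python
import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
# Orthonormal DCT-II basis matrix.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def quantize_block(block, quality=16):
    coeffs = C @ block @ C.T                  # 2-D DCT of the 8x8 block
    # Coarser quantization steps farther from the low-frequency top-left corner.
    Q = quality * (1 + k + n)
    return np.round(coeffs / Q).astype(int)

flat = np.full((N, N), 16.0)                  # a constant (pure DC) block
q = quantize_block(flat)                      # only the DC coefficient survives
```

For a flat block, every AC coefficient quantizes to zero, which is why smooth image regions compress so well.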
There are actually various kinds of color blindness. Most commonly, people with color blindness have anomalous trichromacy, in which all three cone types are present, but one type of cone perceives light slightly out of alignment. This means there is reduced sensitivity to certain wavelengths, and colors such as reds, greens, browns, and oranges can be hard to distinguish.
There is also another type of color blindness, known as dichromacy, in which a cone type is missing entirely.
If you think about the color matching experiment, there are interesting implications. A dichromat would only need two primary lights to complete the color matching experiment successfully, and would consider many stimuli to be metamers that a trichromat would not accept as a match. On the other hand, any stimuli that a trichromat would accept as a match, a dichromat would also accept.
There are actually two ways to compute the gradient: numerically and analytically. The numeric approach is often slow and approximate but easy to implement, whereas the analytic one can be fast and exact, though more susceptible to implementation errors (in doing the calculus). So programmers often compute both, sanity check the two gradients against each other, and then use the analytic gradient.
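That sanity check looks something like this (my own example, using f(x) = sum(x**2) with analytic gradient 2x): compare the analytic gradient against a central-difference estimate, and only trust the analytic one once they agree.

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)

def analytic_grad(x):
    # Derived by hand: d/dx_i sum(x**2) = 2 * x_i.
    return 2 * x

def numeric_grad(x, h=1e-5):
    # Central differences: perturb one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, -2.0, 3.0])
# The two gradients should agree to within the finite-difference error.
assert np.allclose(analytic_grad(x), numeric_grad(x), atol=1e-6)
```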
What do the numbers mean? And how would the effect change if the matrix were multiplied by some scalar value?
Magic Leap is developing light field displays, which will be released this year.