The albedo of a material seems like a good heuristic for truncating the recursion. Once the reflected ray's contribution, scaled by the material's albedo, dips below a chosen lower bound (which could be arbitrary), we could end the recursive calls.
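To make the cutoff idea concrete, here is a minimal sketch, not the course's renderer. It assumes every surface the ray hits has the same scalar albedo, and the names `bounces_before_cutoff`, `threshold`, and `max_depth` are made up for illustration. It only counts how many bounces would be traced before the accumulated contribution drops below the bound.

```python
def bounces_before_cutoff(albedo, throughput=1.0, threshold=0.01,
                          depth=0, max_depth=64):
    """Count the bounces traced before the ray's contribution is negligible.

    Assumption: every hit surface has the same scalar albedo in [0, 1].
    `threshold` is the (arbitrary) lower bound on the contribution, and
    `max_depth` is a safety cap for perfectly reflective surfaces.
    """
    # Contribution the next reflected ray would carry.
    next_throughput = throughput * albedo
    # Stop recursing once that contribution is below the bound,
    # or once the safety cap is reached.
    if next_throughput < threshold or depth >= max_depth:
        return depth
    return bounces_before_cutoff(albedo, next_throughput, threshold,
                                 depth + 1, max_depth)


if __name__ == "__main__":
    for a in (0.2, 0.5, 0.9):
        print(a, bounces_before_cutoff(a))
```

Darker materials end the recursion after only a few bounces, while bright ones keep going much longer. One caveat: a hard cutoff like this always throws away a little energy, so the result comes out slightly darker than it should.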
I'm really curious about how this changes when motion is introduced. As Professor Ng mentioned in class, bounding box optimization can only be so powerful when something needs to be re-rendered that frequently in order to be animated. I wonder if there's some geometric model or data structure that is optimized for ease of animation rather than for desirable topological properties. I'm unsure what such a structure would look like (especially given that character rigs are already so effective), but I can imagine using it to model water: a shader, perhaps taking the camera perspective as a parameter, colors the water based on the height of each part of the mesh as it animates. Sans shadows, that looks like it would yield an effect similar to the one displayed here, so long as the water is shaped correctly. In general, it might be interesting to move the light rendering to the tail end of the pipeline, where light can be conveyed either as a texture (as discussed for environment maps) or as a function, likely of distance from some point, plane, or other basic geometry. For example, leaves on a tree are much more visible when they're further from the tree's central cluster of leaves and more shadowed otherwise.
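As a rough illustration of the height-based water coloring idea, here is a small sketch. Everything in it is an assumption for the sake of the example: the function name `water_color`, the height range, and the two RGB colors are made up, and it ignores the camera entirely. It linearly blends from a dark trough color to a light crest color according to vertex height.

```python
def water_color(height, min_height=-1.0, max_height=1.0,
                trough=(0.0, 0.2, 0.4), crest=(0.6, 0.9, 1.0)):
    """Blend between two RGB colors based on a vertex's height.

    Assumptions: heights lie roughly in [min_height, max_height], and
    `trough` / `crest` are hypothetical colors for the lowest and
    highest parts of the water mesh.
    """
    # Normalize the height to [0, 1], clamping anything out of range.
    t = (height - min_height) / (max_height - min_height)
    t = max(0.0, min(1.0, t))
    # Linear interpolation per color channel.
    return tuple(a + (b - a) * t for a, b in zip(trough, crest))


if __name__ == "__main__":
    for h in (-1.0, 0.0, 1.0):
        print(h, water_color(h))
```

Since the color depends only on the current height of each vertex, it can be re-evaluated every frame as the mesh animates without tracing any light at all.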
How does raycasting for a group of pixels improve performance? If the group diverges at any point (bouncing off a curved surface, getting partially intersected), then you have to keep track of all the different rays at the same time.
I don't know how well it would work in practice, but maybe enough groups would stay together (or mostly together) to offer an improvement. I also imagine that, even if they separate, when the separation comes from something like bouncing off a curved surface, the bounce might be a transformation that's more efficient to apply to a group of rays at once than to one ray at a time.
Even if you do have to keep track of different rays at the same time, you would have to do those calculations anyways, so that's 'only' a memory issue and not a speed one.