My guess for why max is used is that it's better to err on the side of blurriness than aliasing.
I understand the intuition that we can find the mipmap level of a point by seeing how far apart (x+1,y) and (x,y+1) land from (x,y) in texture space, since that gives us an idea of the depth, but I don't really understand why this equation works. Why do we take the log of L? I think what Richard said in the previous comment about using max makes sense, though.
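To make the question concrete, here is a minimal Python sketch of the level computation, assuming the slide's formula is the usual D = log2(L), where L is the larger of the two texture-space step lengths and the derivatives are measured in texels per screen pixel. The function name `mip_level`, its parameters, and the clamp to level 0 are my own assumptions, not something from the slide. One way to read the log: each mipmap level halves the resolution, so a texel at level D covers about 2^D level-0 texels along each axis, and log2(L) is the level at which a footprint L texels long shrinks to roughly one texel.

```python
import math

def mip_level(du_dx, dv_dx, du_dy, dv_dy):
    """Return a continuous mipmap level for one screen sample.

    Hypothetical helper: derivatives are assumed to be in texels per pixel.
    """
    # Length of the texture-space step for a one-pixel move in x, then in y.
    len_x = math.hypot(du_dx, dv_dx)
    len_y = math.hypot(du_dy, dv_dy)
    # Use the longer step: a level that is too high blurs, too low aliases.
    L = max(len_x, len_y)
    # Assumption: clamp so magnification (L <= 1) maps to level 0.
    if L <= 1.0:
        return 0.0
    return math.log2(L)

print(mip_level(4.0, 0.0, 0.0, 1.0))  # one pixel in x spans 4 texels
```

With a step of 4 texels in x and 1 texel in y, max picks 4, and log2(4) selects level 2, where those 4 texels have been averaged down to about one.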
I am a bit confused by the right graph. I understand the numerators are du and dv. However, why are the denominators for the right vector all dx? In this case, I am confused about which direction is now x and which is y?
Just forget about the x and y axes in the right graph. Its axes are u and v now. Between the two points joined by that vector, only the screen x coordinate changes, so du/dx and dv/dx just convert that one-pixel step in x into the corresponding change in u and v.
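A small Python sketch of that idea, under the assumption of a made-up affine screen-to-texture mapping (the function `uv` and its coefficients are purely illustrative, as is the helper `texture_step`). It moves one pixel in screen x, then one pixel in screen y, and reports the resulting vector in (u, v) space each time.

```python
def uv(x, y):
    # Hypothetical affine screen-to-texture mapping, chosen only for illustration.
    u = 3.0 * x + 1.0 * y
    v = 0.5 * x + 2.0 * y
    return u, v

def texture_step(x, y, dx, dy):
    """Texture-space vector between screen points (x, y) and (x + dx, y + dy)."""
    u0, v0 = uv(x, y)
    u1, v1 = uv(x + dx, y + dy)
    return u1 - u0, v1 - v0

# One-pixel step in screen x: both components are "per dx".
step_x = texture_step(10.0, 20.0, 1.0, 0.0)  # approximates (du/dx, dv/dx)
# One-pixel step in screen y: both components are "per dy".
step_y = texture_step(10.0, 20.0, 0.0, 1.0)  # approximates (du/dy, dv/dy)

print(step_x, step_y)
```

Both components of `step_x` have dx in the denominator because the step that produced them was taken only in x; the vector itself is drawn in u and v.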