We talked about two ways to evaluate this, but I'm wondering whether one algorithm is preferred in practice, e.g. for speed or numerical stability. Or does the choice depend on the computation model (GPU vs. CPU)?

