Is there a way to pick the optimal bucket size B, or were the results obtained by testing several values of B?
^Same question as above. My guess is that you want fine-grained control over how you choose your buckets. If they are too small, you have far too much calculation to do and you are not taking enough advantage of the GPU's parallel architecture. If they are too large, it just slows most systems down. So I suspect the choice of B is tied to how much parallelism the GPU can exploit.
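If B was in fact tuned empirically, a minimal sketch of that kind of sweep might look like the following. The names `time_run`, `pick_bucket_size`, `run` and the candidate list are hypothetical, not from the paper. The toy workload runs on the CPU in plain Python, so it only illustrates the tuning procedure and says nothing about real GPU behavior.

```python
import time
from typing import Callable, Iterable


def time_run(run: Callable[[int], None], bucket_size: int, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of run(bucket_size) over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(bucket_size)
        best = min(best, time.perf_counter() - start)
    return best


def pick_bucket_size(measure: Callable[[int], float], candidates: Iterable[int]) -> int:
    """Return the candidate B with the lowest measured cost (hypothetical helper)."""
    return min(candidates, key=measure)


if __name__ == "__main__":
    # Hypothetical workload: process n items in buckets of size B.
    n = 200_000
    data = list(range(n))

    def run(bucket_size: int) -> None:
        total = 0
        for i in range(0, n, bucket_size):
            total += sum(data[i:i + bucket_size])

    candidates = [2 ** k for k in range(4, 15)]  # B = 16 ... 16384
    best = pick_bucket_size(lambda b: time_run(run, b), candidates)
    print("fastest B among candidates:", best)
```

On real hardware you would replace `run` with the actual bucketed kernel and expect the fastest B to depend on the device.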