This reminds me of some typical object detection models that are built on a concept called anchors: boxes whose positions and sizes are fixed in advance for an image, and which help locate objects that fall into these predesigned grid cells. Importantly, the size and location parameters of these anchors are critical to both the model's accuracy and its computational efficiency.
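To make the anchor idea concrete, here is a minimal sketch of how a fixed set of anchor boxes might be tiled over an image, loosely in the style of region-proposal detectors. The function name `generate_anchors`, the stride, and the scale and aspect-ratio values are illustrative assumptions, not taken from any particular model. Every grid cell gets one box per (scale, aspect ratio) pair, centered on the cell.

```python
import itertools
import math


def generate_anchors(image_w, image_h, stride, scales, aspect_ratios):
    """Hypothetical helper: tile anchor boxes (x1, y1, x2, y2) over a regular grid.

    One anchor per (grid cell, scale, aspect_ratio); aspect_ratio = height / width.
    """
    anchors = []
    cols = math.ceil(image_w / stride)
    rows = math.ceil(image_h / stride)
    for row, col in itertools.product(range(rows), range(cols)):
        # Center of this grid cell in image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in itertools.product(scales, aspect_ratios):
            # Keep the box area at scale**2 while varying its shape.
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# A 64x32 image with stride 16 gives a 4x2 grid; 2 scales x 3 ratios = 6 anchors per cell.
anchors = generate_anchors(64, 32, 16, scales=(16, 32), aspect_ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 48
```

The number of anchors grows with (cells × scales × ratios), which is exactly why the choice of these parameters trades accuracy against computation.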
Just to make sure I have the right idea: the objective is to balance the time spent traversing grid cells against the computation within each cell, which depends on the number of objects inside it. The goal is a grid that keeps objects apart using as few cells as possible, ideally with each object occupying its own cell. I suspect clustering methods could be useful for carving out the cell boundaries.
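One way to state that tradeoff explicitly is a toy cost model. This is a sketch under assumptions of my own: objects are points in the unit square, every cell costs `cell_cost` to visit (empty or not), and the work inside a cell is one pairwise check per pair of objects sharing it, at `pair_cost` each. The names `grid_cost` and `best_grid` are hypothetical. Under these assumptions, a uniform k × k grid can be tuned by simply trying resolutions and keeping the cheapest.

```python
from collections import Counter


def grid_cost(points, k, cell_cost=1.0, pair_cost=1.0):
    """Hypothetical cost of a k x k uniform grid over the unit square."""
    # Bucket each point into its cell; clamp so x == 1.0 lands in the last cell.
    counts = Counter(
        (min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points
    )
    traversal = cell_cost * k * k  # visiting every cell
    within = pair_cost * sum(n * (n - 1) // 2 for n in counts.values())  # pairwise checks
    return traversal + within


def best_grid(points, max_k=32, **costs):
    """Return (k, cost) for the cheapest resolution in 1..max_k (first one on ties)."""
    return min(
        ((k, grid_cost(points, k, **costs)) for k in range(1, max_k + 1)),
        key=lambda kc: kc[1],
    )


points = [(0.1, 0.1), (0.12, 0.11), (0.9, 0.9), (0.5, 0.5)]
print(best_grid(points, max_k=8))                  # (2, 6.0)
print(best_grid(points, max_k=8, pair_cost=10.0))  # (3, 19.0)
```

Making per-cell work more expensive pushes the optimum toward a finer grid, and cheaper work pushes it coarser. Note that the two nearby points never get separated at these resolutions; a uniform grid is only the simplest partition, and the clustering idea would replace the fixed k × k cells with data-driven regions.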