ActFormer: Scalable Collaborative Perception via Active Queries

Our motivation stems from the idea that how vehicles collaboratively perceive should be closely related to their relative poses. Different camera poses result in varying viewpoints, each capturing unique information. However, conventional collaborative methods often treat all viewpoints equally, overlooking the fact that these camera perspectives offer different insights into the environment—some unique, some overlapping, and some redundant. Consequently, the ego vehicle may not fully capitalize on the diverse perspectives available, leading to indiscriminate collaboration that generates excessive communication and computation. Actually, communication may not be necessary when some partners share very similar observations.