ReferIt3D Benchmarks

Intro

With the ReferIt3D benchmarks, we wish to track and report the ongoing progress in the emerging field of language-assisted understanding and learning in real-world 3D environments. To this end, we investigate the same questions posed in the ReferIt3D paper and compare methods that try to identify a single 3D object among the many objects of a real-world scene, given appropriate referential language.

Specifically, we consider:

How well do such learning methods work when the input language is Natural, as produced by humans referring to the object (Nr3D challenge), vs. template-based, concerning only Spatial relations among the objects of a scene (Sr3D challenge)?
How are such methods affected when we vary the number of distracting instances in the 3D scene that belong to the same class as the target? E.g., in an "Easy" case the system has to find the target among two armchairs, whereas in a "Hard" case it has to find it among at least three.
Last, how do such methods perform when the input language is View-Dependent, e.g., "Facing the couch, pick the ... on your right side", vs. View-Independent, e.g., "It's the ... between the bed and the window"?

In a nutshell, these questions about the object identification problem in 3D environments aim to disentangle the performance characteristics of the compared approaches, in addition to providing a single "aggregate" performance score, as explained in the ReferIt3D paper.
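As a rough illustration of the breakdown described above, the following sketch computes an aggregate accuracy alongside per-subset accuracies. It is not the official evaluation code: the record keys (`correct`, `num_same_class`, `view_dependent`) are hypothetical names chosen for this example, and the Easy/Hard threshold simply follows the armchair example above (two same-class instances is Easy, three or more is Hard).

```python
from collections import defaultdict


def difficulty(num_same_class):
    """Label a case by how many objects of the target's class are in the scene.

    Assumption: the count includes the target itself, so 2 means the target
    plus one distractor ("easy"), and 3 or more means "hard".
    """
    return "easy" if num_same_class <= 2 else "hard"


def accuracy_breakdown(records):
    """Return overall and per-subset accuracy.

    `records` is an iterable of dicts with hypothetical keys:
      correct (bool), num_same_class (int), view_dependent (bool).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        view = "view_dependent" if rec["view_dependent"] else "view_independent"
        # Each prediction counts toward the aggregate score and two subsets.
        for bucket in ("overall", difficulty(rec["num_same_class"]), view):
            totals[bucket] += 1
            hits[bucket] += int(rec["correct"])
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}
```

Reporting the subsets next to the aggregate makes it visible when a method's overall score hides a weakness, for instance on Hard or View-Dependent cases.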

Rules

Please use our published datasets following the official ScanNet train/val splits. Since in these benchmarks we tackle the identification problem among all objects in a scene (and not only among the same-class distractors), when using Nr3D make sure to use only the utterances where the target class is explicitly mentioned (mentions_target_class=True) and which were guessed correctly by the human listener (correct_guess=True).
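For readers applying these two filters themselves, here is a minimal sketch. It assumes the raw Nr3D data is read as a CSV file whose `mentions_target_class` and `correct_guess` columns hold the strings "True" or "False"; the `utterance` column and the helper names are illustrative only, and this is not the filtering code from the official repository.

```python
import csv
import io


def is_true(value):
    """Interpret a CSV cell as a boolean (assumes 'True'/'False' strings)."""
    return str(value).strip().lower() == "true"


def filter_nr3d(rows):
    """Keep rows that mention the target class AND were guessed correctly."""
    return [
        row for row in rows
        if is_true(row["mentions_target_class"]) and is_true(row["correct_guess"])
    ]


def load_filtered(csv_file):
    """Read an open CSV file object and return only the benchmark-valid rows."""
    return filter_nr3d(csv.DictReader(csv_file))


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (hypothetical contents).
    sample = (
        "utterance,mentions_target_class,correct_guess\n"
        "the chair by the door,True,True\n"
        "the one on the left,False,True\n"
        "the tall lamp,True,False\n"
    )
    kept = load_filtered(io.StringIO(sample))
    print(len(kept), "utterance(s) kept")
```

With a real file, the same function can be called on an open handle, e.g. `with open(path, newline="") as f: rows = load_filtered(f)`, where `path` points to wherever the raw Nr3D file was saved.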

To download the pre-processed datasets that reflect exactly the same input we gave to our proposed network (where the filters mentioned above are pre-applied), use the following links:

Otherwise, if you want to download the raw datasets instead, please use the following links: (Nr3D, Sr3D).

Note: The official training/testing code of the ReferIt3D paper takes the raw datasets as input, because it applies the filters mentioned above on the fly.

Reporting new results

If you have new results on Sr3D or Nr3D to report, please send your performance numbers and the accompanying paper link to Panos Achlioptas.