Referential Grounding in Robotics

A number of long-term goals in robotics (e.g., using robots in household settings) require robots to interact with humans. In this research, we explore how robots can match object references in natural language instructions to the objects in the physical world sensed by an RGB-D camera.

 

Dataset

RBT-SCENE: An RGB-D scene dataset collected by our mobile robot. In each scene image, there is a target object that is marked with a bounding box.

NL-INST: Three groups of natural language instructions that use different cues (i.e., name, colour, shape, material, group-based relation, and binary relation) to specify the target objects in RBT-SCENE. NL-INST was collected from 12 people.

RBT-OBJ: A small dataset of RGB-D objects segmented from RBT-SCENE.

Samples For Attribute Learning: name, colour, shape, material
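
To make the role of the attribute cues concrete, here is a minimal Python sketch of how cues taken from an instruction could be compared against the attributes of objects in a scene. It is an illustration only, not the method from the publication below: the `SceneObject` and `AttributeCues` schemas, the `ground` function, and the sample values are all hypothetical, and the group-based and binary relation cues are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SceneObject:
    """One object in a scene with attribute labels (hypothetical schema)."""
    object_id: int
    name: str
    colour: str
    shape: str
    material: str


@dataclass(frozen=True)
class AttributeCues:
    """Attribute cues from an instruction; None means the cue was not mentioned."""
    name: Optional[str] = None
    colour: Optional[str] = None
    shape: Optional[str] = None
    material: Optional[str] = None


ATTRIBUTES = ("name", "colour", "shape", "material")


def match_score(obj: SceneObject, cues: AttributeCues) -> int:
    """Count how many of the mentioned cues agree with the object's attributes."""
    score = 0
    for attribute in ATTRIBUTES:
        wanted = getattr(cues, attribute)
        if wanted is not None and wanted == getattr(obj, attribute):
            score += 1
    return score


def ground(objects: List[SceneObject], cues: AttributeCues) -> List[SceneObject]:
    """Return the objects with the highest match score (ties are kept)."""
    if not objects:
        return []
    scores = [match_score(obj, cues) for obj in objects]
    best = max(scores)
    return [obj for obj, score in zip(objects, scores) if score == best]


# Made-up scene and instruction cues, e.g. for "pick up the red cup".
scene = [
    SceneObject(0, "cup", "red", "cylinder", "plastic"),
    SceneObject(1, "cup", "blue", "cylinder", "ceramic"),
    SceneObject(2, "box", "red", "cuboid", "paper"),
]
cues = AttributeCues(name="cup", colour="red")
targets = ground(scene, cues)
print([obj.object_id for obj in targets])  # prints [0]
```

Exact string matching is used here only to keep the sketch short; a real system would work with classifier outputs learned from the attribute samples and would also need the relation cues to separate objects that share all attributes.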

 

Code

The source code will be made available.

 

Video

Natural language controlled object manipulation

Publications

  1. J. Bao, Y. Jia, Y. Cheng, H. Tang and N. Xi. Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera. Sensors, 2016, 16(12), 2117. PDF HTML