A number of long-term goals in robotics (e.g., using robots in household settings) require robots to interact with humans. In this research, we explore how robots can ground object references in natural language instructions to the objects in the physical world sensed by an RGB-D camera.
Dataset
RBT-SCENE: An RGB-D scene dataset collected by our mobile robot. In each scene image, the target object is marked with a bounding box.
NL-INST: Three groups of natural language instructions that take advantage of different cues (i.e., name, colour, shape, material, group-based relation, and binary relation) to specify the target objects in RBT-SCENE. NL-INST was collected from 12 participants.
RBT-OBJ: A small dataset of RGB-D objects segmented from RBT-SCENE.
Samples For Attribute Learning: name, colour, shape, material
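As an illustration of how the datasets above fit together, the sketch below pairs each NL-INST instruction with its RBT-SCENE scene by a shared scene identifier. The record layout (field names, bounding-box format, cue labels) is hypothetical and for illustration only; it is not the released file format.

```python
from dataclasses import dataclass, field

# Hypothetical record types; field names are illustrative, not the released format.
@dataclass
class Scene:
    scene_id: str
    bbox: tuple  # (x, y, w, h) of the target object in the RGB-D scene image

@dataclass
class Instruction:
    scene_id: str
    text: str
    # Subset of the cue types used in NL-INST:
    # name, colour, shape, material, group-based relation, binary relation
    cues: list = field(default_factory=list)

def pair_instructions(scenes, instructions):
    """Attach each instruction to the scene it refers to, matched by scene_id."""
    by_id = {s.scene_id: s for s in scenes}
    return [(inst, by_id[inst.scene_id])
            for inst in instructions if inst.scene_id in by_id]

# Example usage with made-up records
scenes = [Scene("scene_001", (40, 60, 120, 90))]
instructions = [Instruction("scene_001", "the red mug on the left",
                            ["name", "colour", "binary relation"])]
pairs = pair_instructions(scenes, instructions)
```

Each resulting pair gives an instruction together with the ground-truth bounding box of its target object, which is the supervision signal needed for grounding.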
Code
The source code will be made available.
Video
Natural language controlled object manipulation
Publications