Imitation Learning for Sequential Manipulation Tasks: Leveraging Language and Perception

Title: Imitation Learning for Sequential Manipulation Tasks: Leveraging Language and Perception
Publication Type: Thesis
Year of Publication: 2021
Authors: Kim, D.
Academic Department: Department of Electrical Engineering and Computer Science
Degree: M.Eng.
University: Massachusetts Institute of Technology
Abstract: As robots are increasingly being utilized to perform automated tasks, effective methods for transferring task specifications to robots have become imperative. However, existing techniques for training robots to perform tasks often depend on rote mimicry of human demonstrations and do not generalize well to new tasks or contexts. In addition, learning an end-to-end policy for performing a sequence of operations toward a high-level goal remains a challenge. Transferring sequential task specifications is a difficult objective, as it requires extensive human intervention to establish the structure of the task, including the constraints, objects of interest, and control parameters. In this thesis, we present an imitation learning framework for sequential manipulation tasks that enables humans to easily communicate abstract high-level task goals to the robot without explicit programming or robotics expertise. We introduce natural language input to the system to facilitate the learning of task specifications. During training, a human teacher provides demonstrations and a verbal description of the task being performed. The training process then learns a mapping from the multi-modal inputs to the low-level control policies. During execution, the high-level task instruction is parsed into a list of sub-tasks that the robot has learned to perform. The presented framework is evaluated in a simulated table-top scenario of a robotic arm performing sorting and kitting tasks from natural language commands. The approach developed in this thesis achieved an overall task completion rate of 91.16% on 600 novel task scenes, with a sub-task execution success rate of 96.44% on 1,712 individual “pick” and “place” tasks.
URL: https://dspace.mit.edu/handle/1721.1/139416