In visual navigation, one powerful paradigm is to predict actions from observations
directly. Training such an end-to-end system allows representations that are useful
for downstream tasks to emerge automatically. However, the lack of inductive
bias makes this system data-hungry. We hypothesize that a sufficient representation
of the current view and the goal view for a navigation policy can be learned by
predicting the location and size of a crop of the current view that corresponds
to the goal. We further show that training such a random crop prediction task in a
self-supervised fashion purely on synthetic noise images transfers well to natural
home images.
home images. The learned representation can then be bootstrapped to learn a
navigation policy efficiently with little interaction data. The code is available at
https://yanweiw.github.io/noise2ptz/
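
Below is a minimal sketch (not the authors' released code) of how self-supervised training pairs for the crop-prediction task described above could be generated from synthetic noise images. The function name and the exact target parametrization (normalized crop center and size) are illustrative assumptions.

```python
import numpy as np


def make_noise_crop_pair(img_size=128, min_scale=0.2, max_scale=0.8, rng=None):
    """Return (current_view, goal_view, target) for one training example.

    current_view: a synthetic noise image standing in for the agent's view.
    goal_view:    a random crop of current_view, standing in for the goal view.
    target:       (cx, cy, s) -- crop center and size, normalized to [0, 1],
                  which a prediction network would be trained to regress.
    """
    rng = rng or np.random.default_rng()

    # Synthetic noise image; no natural-image data is needed for pretraining.
    current_view = rng.random((img_size, img_size, 3)).astype(np.float32)

    # Sample a crop size and a center that keeps the crop inside the image.
    s = rng.uniform(min_scale, max_scale)      # crop size as a fraction of the image
    half = s / 2.0
    cx = rng.uniform(half, 1.0 - half)         # normalized crop center x
    cy = rng.uniform(half, 1.0 - half)         # normalized crop center y

    # Convert to pixel coordinates and cut out the crop. In practice the crop
    # would be resized to a fixed network input size; that step is omitted here.
    x0 = int((cx - half) * img_size)
    y0 = int((cy - half) * img_size)
    w = max(1, int(s * img_size))
    goal_view = current_view[y0:y0 + w, x0:x0 + w]

    target = np.array([cx, cy, s], dtype=np.float32)
    return current_view, goal_view, target


# Example usage: a network taking (current_view, goal_view) as input would be
# trained to regress `target`, e.g. with an L2 loss, before the learned
# representation is reused to train the navigation policy.
view, goal, target = make_noise_crop_pair()
```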