An H × W × 31 spatiotemporal heatmap representation that encodes joint positions and limb connectivity across three orthogonal projections and five temporal frames. A 3-channel spectrogram pseudo-image ...
Abstract: Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance ...