Miguel Jaques, Tim Hospedales and I recently published a CVPR paper on learning latent dynamics models for proportional control from pixels. Miguel has written a great blog post about this idea.
This work builds on earlier work on hybrid system identification using proportional controllers (Burke et al., CoRL 2019) and program induction using visual servoing controllers (Burke et al., RSS 2019), but goes substantially further because it is entirely unsupervised, relying only on image sequence observations. The key idea is to constrain a VAE latent space to be physically plausible (Newtonian) so that proportional control can be applied in this space. This allows for switching goal identification and sequential task following, or for dynamic movement primitives to be learned in the latent space.
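To make the idea of proportional control in a Newtonian latent space concrete, here is a minimal sketch. It is not the code from the paper: the function names (`p_control`, `newtonian_step`, `rollout`), the gain, the damping term and the time step are all illustrative assumptions, and the latent position is taken as given rather than produced by a trained encoder.

```python
import numpy as np

def p_control(x, x_goal, gain=1.0):
    # Proportional controller: the action is the scaled latent position error.
    return gain * (x_goal - x)

def newtonian_step(x, v, u, dt=0.1, damping=1.0):
    # Assumed toy dynamics: the action acts as an acceleration on a damped
    # point mass, integrated with a semi-implicit Euler step.
    v_next = v + dt * (-damping * v + u)
    x_next = x + dt * v_next
    return x_next, v_next

def rollout(x0, x_goal, steps=200, gain=1.0, dt=0.1, damping=1.0):
    # Drive the latent position from x0 towards x_goal, starting at rest.
    x = np.asarray(x0, dtype=float)
    x_goal = np.asarray(x_goal, dtype=float)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for _ in range(steps):
        u = p_control(x, x_goal, gain)
        x, v = newtonian_step(x, v, u, dt, damping)
        traj.append(x.copy())
    return np.stack(traj)

if __name__ == "__main__":
    goal = np.array([1.0, -0.5])  # hypothetical goal in a 2-D latent space
    traj = rollout([0.0, 0.0], goal)
    print("final latent position:", traj[-1])
    print("distance to goal:", np.linalg.norm(traj[-1] - goal))
```

In a full system, `x` and `x_goal` would presumably be obtained by encoding the current and goal images, and sequential tasks could be followed by swapping `x_goal` once the error falls below a threshold. The sketch only shows why a latent space with position and velocity semantics makes such a simple controller usable.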
Michael Burke, Yordan Hristov, Subramanian Ramamoorthy, Switching Density Networks for Hybrid System Identification, Conference on Robot Learning (CoRL), 2019. (arXiv link)
Michael Burke, Svetlin Penkov, Subramanian Ramamoorthy, From explanation to synthesis: Compositional program induction for learning from demonstration, Robotics: Science and Systems (R:SS), 2019. (arXiv link)
Miguel Jaques, Michael Burke, Tim Hospedales, NewtonianVAE: Proportional Control and Goal Identification from Pixels via Physical Latent Spaces, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. (arXiv link)