Physics as inverse graphics model

Stronger inductive biases for deep learning

Standard neural network architectures suffer from numerous problems with interpretability, flexibility and generalisation. I believe this is in large part due to a lack of suitably strong inductive biases in models and architectures, and I have recently been pushing (see my job talk at Monash) for the inclusion of stronger biases in deep learning models.

Switching controller front-ends

For example, by embedding known controllers in a model, together with knowledge of the switching structure between them, we gain better performance in settings where hybrid control is required, along with greater interpretability.
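
A minimal sketch of the idea (a hand-coded stand-in, not the learned switching density network from the paper): a bank of known proportional controllers, with a switching rule deciding which controller is active. In the paper the switch is a learned density over controllers; the hard threshold below is an illustrative substitute, and all goals and gains are invented.

```python
def p_controller(x, goal, gain):
    """Known proportional controller: drives state x towards goal."""
    return gain * (goal - x)

goals, gains = [1.0, -0.5], [0.8, 0.8]   # hypothetical sub-task goals and gains
x, active, trace = 0.0, 0, []
for t in range(100):
    if active == 0 and abs(goals[0] - x) < 0.05:
        active = 1                        # hybrid switch: first goal reached, hand over
    x += 0.1 * p_controller(x, goals[active], gains[active])
    trace.append(active)
```

The system reaches the first goal under controller 0, then switches and converges to the second goal, so the behaviour is both hybrid and readable from the controller trace.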

M Burke, Y Hristov, S Ramamoorthy, Switching Density Networks for Hybrid System Identification, Conference on Robot Learning (CoRL), 2019.

Relational representations

Similarly, by autoencoding with light supervision, we can ground perception networks in symbolic concepts that align with natural language for greater interpretability, while allowing for planning and symbolic reasoning.

Y Hristov, D Angelov, M Burke, A Lascarides, S Ramamoorthy, Disentangled Relational Representations for Explaining and Learning from Demonstration, Conference on Robot Learning (CoRL), 2019.

Video to physical parameters

The same idea also allows for physical parameter estimation from video, and for the incorporation of physical dynamics into a model.
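
As a hypothetical illustration of the underlying idea (Vid2Param itself uses a learned video-to-parameter model, not this grid search): estimate a bouncing ball's coefficient of restitution by finding the simulated rollout that best matches heights tracked from video. All numerical values here are invented.

```python
import numpy as np

def simulate(e, g=9.81, dt=0.02, steps=150, h0=1.0):
    """Roll out a bouncing ball; e is the coefficient of restitution."""
    h, v, hs = h0, 0.0, []
    for _ in range(steps):
        v -= g * dt
        h += v * dt
        if h < 0:
            h, v = 0.0, -e * v            # bounce with energy loss
        hs.append(h)
    return np.array(hs)

# "observed" heights, as if tracked from video, with measurement noise
obs = simulate(0.7) + np.random.default_rng(0).normal(0, 0.005, 150)

# analysis-by-synthesis: pick the restitution whose rollout best matches the track
candidates = np.linspace(0.3, 0.95, 66)
errs = [np.mean((simulate(e) - obs) ** 2) for e in candidates]
e_hat = candidates[int(np.argmin(errs))]
```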

M Asenov, M Burke, D Angelov, T Davchev, K Subr, S Ramamoorthy, Vid2Param: Modelling of Dynamics Parameters from Video, Robotics and Automation Letters (RA-L).

Integrated physics

The approaches above inject constraints through training, but for generalisation, we may need even stronger priors built into models. Our work on physics-as-inverse graphics does this by including differentiable physical equations in the model, and exhibits much stronger extrapolation performance.
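
A toy example of why building equations into the model helps: when the model contains the actual free-fall equation, gradients of the prediction error flow directly back to the physical parameter. Everything below (the 1D setup, learning rate, initial guess) is illustrative, not from the paper.

```python
import numpy as np

t = np.linspace(0, 1, 50)
y_obs = 10.0 - 0.5 * 9.81 * t**2          # "observed" positions, e.g. from a tracker

# differentiable physics in the model: y(t) = y0 - 0.5 g t^2, optimise g directly
g, lr = 5.0, 0.1                           # initial guess and step size (illustrative)
for _ in range(200):
    y_pred = 10.0 - 0.5 * g * t**2
    resid = y_pred - y_obs
    grad = np.sum(2 * resid * (-0.5 * t**2))   # analytic dLoss/dg through the physics
    g -= lr * grad
```

Because the decoder is a physical equation rather than a generic network, the recovered parameter is interpretable, and extrapolation beyond the training horizon follows the physics rather than failing.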

M Jacques, M Burke, T Hospedales, Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video, International Conference on Learning Representations (ICLR), 2020.


[Video comparison: the true sequence alongside rollouts from Physics as Inverse Graphics and from interaction networks with inverse graphics.]



Inducing explainable robot programs

End-to-end learning is able to solve a wide range of control problems in robotics. Unfortunately, the resulting systems lack interpretability and are difficult to reconfigure under even minor task changes. For example, a robot inspecting a range of objects needs to be retrained if the order of inspection changes.

We address this by inducing a program from an end-to-end model, using a generative model composed of multiple proportional controllers. Inference under this model is challenging, so we use sensitivity analysis to extract controller goals and gains from the original model. The inferred controller trace (a sequence of controller goal states) is then simplified, and controller-specific grounding networks are trained to predict controller goals from visual inputs, producing an interpretable and reconfigurable program that describes the original learned behaviour.
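
A simplified sketch of the controller-extraction step: if state-action pairs were generated by an unknown proportional controller a = K (g − x), its gain and goal are identifiable because the action is linear in the state. The paper extracts these quantities from the end-to-end network via sensitivity analysis; this toy instead fits them by least squares on synthetic demonstrations.

```python
import numpy as np

def fit_p_controller(x, a):
    """Recover gain K and goal g from pairs obeying a = K (g - x) = -K x + K g."""
    slope, intercept = np.polyfit(x, a, 1)
    K = -slope
    return K, intercept / K

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)                       # demonstrated states
a = 2.0 * (0.5 - x) + rng.normal(0, 0.01, 100)    # demo actions: K=2, g=0.5, plus noise
K, g = fit_p_controller(x, a)
```

Repeating this per trajectory segment yields a sequence of (gain, goal) pairs, which is exactly the kind of controller trace that can then be simplified into a symbolic program.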

M Burke, S Penkov, S Ramamoorthy, From Explanation to Synthesis: Compositional Program Induction for Learning from Demonstration, Robotics: Science and Systems (R:SS), 2019.

Pose estimation for human-robot interaction

Human-robot interaction using gesture recognition typically requires that the 3D pose of a human be tracked in real time, but this can be challenging, particularly when only a single, potentially moving, camera is available. We use a mixture of random walks model for human motion that allows for fast Rao-Blackwellised tracking, and provides a useful mechanism to map from 2D to 3D pose when only a few joint measurements are available.
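
The linear-Gaussian random walk component is what Rao-Blackwellisation exploits: conditioned on the active walk model, the pose posterior is Gaussian and can be filtered in closed form. A one-dimensional sketch of filtering a single such walk (all parameter values are illustrative):

```python
import numpy as np

def kalman_random_walk(z, q=0.01, r=0.1):
    """Kalman filter for x_t = x_{t-1} + N(0, q), observed as z_t = x_t + N(0, r)."""
    m, p, out = z[0], 1.0, []
    for zt in z:
        p = p + q                      # predict: the random walk diffuses
        k = p / (p + r)                # Kalman gain
        m = m + k * (zt - m)           # update with the measurement
        p = (1 - k) * p
        out.append(m)
    return np.array(out)

rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0, 0.1, 200))        # a random-walk joint trajectory
obs = truth + rng.normal(0, 0.3, 200)             # noisy 2D-style measurements
est = kalman_random_walk(obs, q=0.01, r=0.09)
```

In the full model a particle filter handles only the discrete choice of walk model, with each particle carrying a closed-form Gaussian like the one above, which is what makes the tracker fast.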

Pose estimation code is available online. As a useful byproduct, a simplified motion model proves quite effective at estimating missing marker positions in motion capture applications; code for this is also available.


Burke, M. and Lasenby, J., Estimating missing marker positions using low dimensional Kalman smoothing, Journal of Biomechanics, 49(9), 1854–1858, 2016.

Burke, M. G. (2015). Fast upper body pose estimation for human-robot interaction (doctoral thesis).

Burke, M. and Lasenby, J., Single camera pose estimation using Bayesian filtering and Kinect motion priors, 2014.

Burke, M. and Lasenby, J., Fast upper body joint tracking using Kinect pose priors, International Conference on Articulated Motion and Deformable Objects (Best Paper Award), 94–105, 2014.

Gesture recognition

Gesture recognition can be a valuable interface for human-robot interaction, but typical approaches rely on pre-defined dictionaries of signs. These are often not particularly intuitive, so we explored the use of pantomimic gestures for human-robot interaction.

Pantomimic gestures are those that mimic a desired behaviour or action, and can be used to improve gesture recognition. By mapping human actions to robot behaviours using matrix factorisation, we can successfully classify gestures. In fact, recognising human gestures using recordings of robot behaviours turns out to be easier than recognising them using recordings of gestures from other users.
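
A rough sketch of the subspace intuition (not the paper's exact multilinear factorisation): factorise recordings of each behaviour into a low-rank basis, then label a new gesture by the behaviour whose basis reconstructs it best. The "wave" and "push" signals below are invented stand-ins for robot behaviour recordings.

```python
import numpy as np

def subspace(trajs, rank=2):
    """Low-rank basis for a set of 1D trajectory recordings (columns of a matrix)."""
    u, _, _ = np.linalg.svd(np.stack(trajs).T, full_matrices=False)
    return u[:, :rank]

def classify(gesture, bases):
    """Label = behaviour whose subspace leaves the smallest reconstruction residual."""
    resid = [np.linalg.norm(gesture - b @ (b.T @ gesture)) for b in bases]
    return int(np.argmin(resid))

t = np.linspace(0, 1, 30)
wave = [np.sin(2 * np.pi * f * t) for f in (0.9, 1.0, 1.1)]   # toy "wave" recordings
push = [t, t**2, 0.6 * t + 0.4 * t**2]                        # toy "push" recordings
bases = [subspace(wave), subspace(push)]

label = classify(np.sin(2 * np.pi * 1.05 * t), bases)         # a new wave-like gesture
```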


Burke, M. G. (2015). Fast upper body pose estimation for human-robot interaction (doctoral thesis).

M. Burke and J. Lasenby, “Pantomimic Gestures for Human–Robot Interaction,” in IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1225-1237, Oct. 2015.

M. Burke and J. Lasenby, “Multilinear function factorisation for time series feature extraction,” 2013 18th International Conference on Digital Signal Processing (DSP), Fira, 2013, pp. 1-8.

Human (and cereal) following robots

This project focused on feature-based object recognition and tracking using a single camera for a target-following mobile robot. Robot controls are generated so as to maximise the chances of successfully detecting and tracking the object of interest while navigating.

This was a particular challenge at the time (2010 – before convnet fame) as object recognition approaches were extremely unreliable and moving cameras struggled with motion blur. This work was combined with LIDAR-based target tracking and obstacle avoidance to build the CSIR Autonomous Mule. Videos of the work and related publications are listed below.
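
A hypothetical sketch of the gain-scheduling idea from this work: command a turn rate proportional to the bearing error to the target, with the proportional gain scheduled on estimated target distance (far targets tolerate gentle turns, near targets need rapid correction to stay in view). All numerical values are illustrative.

```python
def gain(distance, k_near=2.0, k_far=0.5, d0=3.0):
    """Schedule the turn-rate gain on estimated target distance (values illustrative)."""
    return k_near if distance < d0 else k_far

heading, target_bearing, distance = 0.0, 0.8, 1.5   # radians, metres
for _ in range(100):
    err = target_bearing - heading                   # bearing error from the tracker
    heading += 0.05 * gain(distance) * err           # turn-rate command, dt = 0.05 s
```

Keeping the target centred in the image in this way maximises the chance that detection and tracking continue to succeed while the robot navigates.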



Burke, Michael, and Willie Brink. “Estimating target orientation with a single camera for use in a human-following robot.” Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa (2010).

Burke, Michael. “Laser-Based Target Tracking using Principal Component Descriptors.” Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa (2010).

Burke, Michael, and Willie Brink. “Gain-scheduling control of a monocular vision-based human-following robot.” IFAC Proceedings Volumes 44.1 (2011): 8177–8182.

Burke, Michael Glen. Visual servo control for a human-following robot. Diss. Stellenbosch University, 2011.