Physics as inverse graphics model

Stronger inductive biases for deep learning

Standard neural network architectures have numerous problems with interpretability, flexibility and generalisation. I believe this is in large part due to a lack of strong inductive biases in models and architectures, and I have recently been pushing (see my job talk at Monash) to build stronger biases into deep learning models.

Switching controller front-ends

For example, by embedding known controllers in a model, together with knowledge of a switching structure, we gain better performance in settings where hybrid control is required, along with greater interpretability.

M Burke, Y Hristov, S Ramamoorthy, Switching Density Networks for Hybrid System Identification,  Conference on Robot Learning (CoRL) 2019. (arxiv link)

Relational representations

Similarly, by autoencoding with light supervision, we can ground perception networks in symbolic concepts that align with natural language for greater interpretability, while allowing for planning and symbolic reasoning.

Y Hristov, D Angelov, M Burke, Alex Lascarides, S Ramamoorthy, Disentangled Relational Representations for Explaining and Learning from Demonstration,  Conference on Robot Learning (CoRL) 2019. (arxiv link)

Video to physical parameters

The same idea can allow for parameter estimation from video, and the incorporation of physical dynamics into a model.

M Asenov, M Burke, D Angelov, T Davchev, K Subr, S Ramamoorthy, Vid2Param: Modelling of Dynamics Parameters from Video, Robotics and Automation Letters (RA-L) (arxiv link).

Integrated physics

The approaches above inject constraints through training, but for generalisation, we may need even stronger priors built into models. Our work on physics-as-inverse graphics does this by including differentiable physical equations in the model, and exhibits much stronger extrapolation performance.

M Jacques, M Burke, T Hospedales, Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video, International Conference on Learning Representations (ICLR 2020) (open review link)
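To make the idea concrete, here is a toy 1-D analogue of my own devising (not the paper's model, which wraps differentiable physics between a learned encoder and renderer): the dynamics are hard-coded as a known physical equation, so learning reduces to estimating a handful of physical parameters, and extrapolation comes for free.

```python
import numpy as np

# Toy analogue of physics-as-inverse-graphics: instead of letting a
# network free-form the dynamics, we hard-code the ballistic equation
# y(t) = y0 + v0*t - 0.5*g*t^2 and estimate only (y0, v0, g) from
# noisy observations of a falling object.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 50)
g_true, y0_true, v0_true = 9.81, 1.0, 8.0
y_obs = y0_true + v0_true * t - 0.5 * g_true * t**2 + rng.normal(0, 0.01, t.size)

# The model is linear in its parameters, so least squares recovers them.
A = np.stack([np.ones_like(t), t, -0.5 * t**2], axis=1)
y0_est, v0_est, g_est = np.linalg.lstsq(A, y_obs, rcond=None)[0]

# Because the dynamics are physical, extrapolating beyond the training
# horizon is trustworthy in a way free-form learned dynamics are not.
y_at_3s = y0_est + v0_est * 3.0 - 0.5 * g_est * 3.0**2
```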


Rollout predictions: the true sequence compared with Physics-as-Inverse-Graphics and an interaction networks / inverse graphics baseline.



On inter-sectional bias and its remediation in data-driven models

It’s well established that machine learning has a problem with bias. Our datasets reflect the inequalities and prejudices of our daily lives, and the models we train and deploy exacerbate these even further. We even notice this in robotics (despite the field being generally removed from people), where localisation and mapping systems perfected in green European settings struggle in the browns and yellows of the South African Highveld.

Back in 2016, some colleagues (@nunuska, @vukosi and others) and I at the CSIR submitted a funding proposal (#DecoloniseAI: Identifying and redressing structural and systemic bias in machine learning solutions) to investigate these problems and potentially devise mechanisms to address them. We’d put together a really nice team of machine learning experts, social scientists and ethnographers, and I felt we had a really strong proposal. Unfortunately, it was rejected on the grounds that there was insufficient evidence (beyond the ‘anecdotal’ evidence we’d drawn on when framing the proposal [1,2,3,4]) that this was a problem in society, despite the fact that part of the proposal was precisely to systematically explore and show how widespread it was. Fast forward five years, and I hope the reviewers feel extremely embarrassed.

Faces automatically sorted by skin tone using PCA embeddings, with no meta-data required.

However, I deeply regret being unable to pursue this line of work to the extent we had originally planned. Fortunately, many others were able to do so, most notably Buolamwini and Gebru [5] with their seminal work on Gender Shades (suck on that, proposal reviewer 2). This work, along with conferences like ACM FAccT, has stimulated a movement in machine learning research, turning what was a niche topic into a core pillar of the field.

For my part, I continued to explore some related ideas with colleagues here and there, although in a much scaled-back form due to resource constraints, principally as part of Mkhuseli Ngxande‘s PhD on driver drowsiness detection. Our primary concern was twofold: identifying and remedying bias. On the identification side, we set out to detect intersectionally biased models without needing to know protected (and often questionable) characteristics and traits of the people in our dataset. A common frustration I have with many bias remediation schemes is that they require meta-data about racial or gender classification, which seems both questionable and highly fallible (a bit like needing a pencil test to correct racism). So the first thing we did was explore visualisation to identify bias in face-based driver drowsiness detection algorithms without needing meta-data. It turns out a simple PCA visualisation scheme [6] works quite well here, and was able to highlight clear problems in driver drowsiness detection datasets.

Visualising classification error on face visualisation shows population groups and individuals where additional data is needed.
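The core of the visualisation scheme can be sketched in a few lines. This is a deliberately simplified stand-in using synthetic data, where each “face” is a flattened patch whose mean intensity plays the role of skin tone; the point is that PCA surfaces the dominant factor of variation without any demographic labels.

```python
import numpy as np

# Minimal sketch of the meta-data-free idea: embed faces with PCA and
# inspect them sorted along the leading principal component. The
# synthetic "faces" are flattened patches whose mean intensity stands in
# for skin tone; no labels or protected attributes are used anywhere.
rng = np.random.default_rng(1)
tones = rng.uniform(0.2, 0.9, size=100)                    # unknown to the method
faces = tones[:, None] + rng.normal(0.0, 0.02, (100, 64))  # 100 flattened patches

X = faces - faces.mean(axis=0)              # centre the data before PCA
_, _, Vt = np.linalg.svd(X, full_matrices=False)
embedding = X @ Vt[0]                       # projection onto first component

order = np.argsort(embedding)               # faces sorted by the dominant factor
# Plotting per-face classification error against this ordering exposes
# population groups where a model underperforms, with no demographic labels.
```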

We then went on to explore correcting these problems using a meta-learning data augmentation loop, relying on generative adversarial networks to generate synthetic faces that are close to those our system was failing on [7]. This proved extremely effective in the driver drowsiness detection domain.

A meta-learning loop inspects performance on a validation set, and generates synthetic images that are similar to failure cases, in order to balance the training data.

I think this is a good start, but it has a number of problems. First, it still requires a good, representative, bias-free validation set, which is the main problem we were trying to address in the first place. It also needs a large but fair dataset to train a GAN; although this requires less labelling effort than the drowsiness detection task itself, it remains a potential problem.

But for me the biggest limitation of this work was its lack of nuance, a consequence of our inability to adequately involve the social scientists and ethnographers we had originally hoped to bring in through that early funding proposal. Bias remediation of this kind really just plasters over the symptoms, obscuring but not eliminating the underlying problems. Many have spoken about better and more comprehensive ways to address these issues (this workshop by Gebru and Denton, this thread by @math_rachel), and many researchers dedicated to the field are doing excellent work along these lines.

For my part, these issues reinforce the need for stronger inductive biases in our models to help focus learning, the need to move away from hard decision making and to embrace uncertainty, and the importance of questioning when and whether a data-driven solution is actually needed.


[1]  “New Zealand passport robot thinks this Asian man’s eyes are closed ….” 9 Dec. 2016, Accessed 15 Dec. 2016.

[2]  “Google Photos labels black people as ‘gorillas’ – Telegraph.” 1 Jul. 2015, Accessed 15 Dec. 2016.

[3] Norris, Pippa. Digital divide: Civic engagement, information poverty, and the Internet worldwide. Cambridge University Press, 2001.

[4] Lum, K. and Isaac, W. (2016), To predict and serve?. Significance, 13: 14–19. doi:10.1111/j.1740-9713.2016.00960.x

[5] Buolamwini, Joy, and Timnit Gebru. “Gender shades: Intersectional accuracy disparities in commercial gender classification.” Conference on fairness, accountability and transparency. 2018.

[6] M. Ngxande, J. Tapamo and M. Burke, “Detecting inter-sectional accuracy differences in driver drowsiness detection algorithms,” 2020 International SAUPEC/RobMech/PRASA Conference, Cape Town, South Africa, 2020, pp. 1-6, doi: 10.1109/SAUPEC/RobMech/PRASA48453.2020.9041105

[7] M. Ngxande, J. Tapamo and M. Burke, “Bias Remediation in Driver Drowsiness Detection Systems Using Generative Adversarial Networks,” in IEEE Access, vol. 8, pp. 55592-55601, 2020, doi: 10.1109/ACCESS.2020.2981912.

Switching density networks for hybrid control systems

Hybrid system identification is particularly challenging, especially in the context of visuomotor control. We introduce switching density networks (SDNs), which can identify switching control systems from demonstration data in an end-to-end learning fashion.

We show that SDNs, when paired with a general purpose family of proportional-integral-derivative control laws, can identify the pump, spin and balance controllers required to keep an inverted pendulum upright (see header). We also use them to identify the joint angle goals that make up an inspection task on a PR2 robot, and those needed to open a suitcase.

Switching density networks are particularly useful for options learning, as the controllers they identify can be re-used elsewhere. Importantly, by embedding structure in the network, SDNs become more interpretable and allow for hierarchical learning that is not possible with their close counterparts, mixture density networks.
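The hybrid-control structure an SDN has to identify can be illustrated with a hand-coded toy (my sketch: in the real system the network predicts the switch and the control-law parameters, whereas here both are fixed): a unit-mass double integrator driven by two members of a proportional-derivative family, with a scheduled switch between goals.

```python
# Structural sketch of what an SDN predicts: at each timestep, a discrete
# controller index plus parameters drawn from a shared control-law
# family. The network is replaced here by a hand-coded switch between two
# proportional-derivative goals, purely to show the hybrid structure.

def pd_control(x, v, goal, kp=4.0, kd=2.0):
    """One member of the parameterised family of control laws."""
    return kp * (goal - x) - kd * v

goals = [1.0, -0.5]        # e.g. two joint-angle targets in an inspection task
x, v, dt = 0.0, 0.0, 0.01
trace = []                 # sequence of active controller indices
for step in range(2000):
    idx = 0 if step < 1000 else 1            # stand-in for the SDN's switch
    v += pd_control(x, v, goals[idx]) * dt   # semi-implicit Euler, unit mass
    x += v * dt
    trace.append(idx)
# By the end of each phase the state has settled near that phase's goal.
```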

M Burke, Y Hristov, S Ramamoorthy, Switching Density Networks for Hybrid System Identification,  Conference on Robot Learning (CoRL) 2019. (arxiv link)

Inducing explainable robot programs

End-to-end learning is able to solve a wide range of control problems in robotics. Unfortunately, these systems lack interpretability and are difficult to reconfigure if there is a minor task change. For example, a robot inspecting a range of objects needs to be retrained if the order of inspection changes.

We address this by inducing a program from an end-to-end model, using a generative model consisting of multiple proportional controllers. Inference under this model is challenging, so we use sensitivity analysis to extract controller goals and gains from the original model. The inferred controller trace (a sequence of controller goal states) is then simplified, and controller-specific grounding networks are trained to predict controller goals from visual inputs, producing an interpretable and reconfigurable program describing the originally learned behaviour.
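The trace simplification step amounts to collapsing consecutive repeated controller goals into a compact program, which can be illustrated with a toy example (the controller names here are hypothetical):

```python
from itertools import groupby

# A per-timestep controller trace inferred from an end-to-end model is
# simplified into a program by collapsing consecutive repeats. Each
# remaining entry is a reconfigurable controller call, so re-ordering the
# task means re-ordering the program rather than retraining the model.
trace = ["goto_A", "goto_A", "goto_A", "goto_B", "goto_B",
         "goto_A", "goto_A", "grasp", "grasp", "grasp"]

program = [step for step, _ in groupby(trace)]
```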

Michael Burke, Svetlin Penkov, Subramanian Ramamoorthy, From Explanation to Synthesis: Compositional Program Induction for Learning from Demonstration, Robotics: Science and Systems (R:SS), 2019. (arXiv link)

Finding interesting images

I obtained a young researcher’s establishment grant from the CSIR to investigate what makes images interesting, and to develop algorithms that flag images of potential interest to users. At present, I am exploring the use of pairwise image comparisons to estimate image interest. These interest estimates can be improved for video by imposing temporal smoothness constraints, and improved further by incorporating image content information, using convolutional neural network features within a Gaussian process smoother. A particularly exciting byproduct is a saliency map highlighting content of interest to users.
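A minimal version of the pairwise-comparison idea can be sketched with a Bradley-Terry model (used here as a simple stand-in; the papers below give the fuller probabilistic treatment, including temporal smoothing and CNN features):

```python
import numpy as np

# Estimate per-image "interest" scores from pairwise comparisons.
# comparisons[i] = (winner, loser) over three hypothetical images,
# sampled from a hidden ground-truth interest vector.
rng = np.random.default_rng(2)
true_interest = np.array([0.1, 0.5, 0.9])
comparisons = []
for _ in range(500):
    a, b = rng.choice(3, size=2, replace=False)
    p_a = true_interest[a] / (true_interest[a] + true_interest[b])
    comparisons.append((a, b) if rng.random() < p_a else (b, a))

# Fit scores s_i by gradient ascent on the Bradley-Terry log-likelihood,
# where P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j)).
s = np.zeros(3)
for _ in range(300):
    grad = np.zeros(3)
    for w, l in comparisons:
        p_w = np.exp(s[w]) / (np.exp(s[w]) + np.exp(s[l]))
        grad[w] += 1.0 - p_w
        grad[l] -= 1.0 - p_w
    s += 0.1 * grad / len(comparisons)
    s -= s.mean()                      # remove the shift ambiguity

ranking = list(np.argsort(-s))        # most interesting image first
```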

Unfortunately, generating this overlay is extremely expensive. However, some savings can be made by using a Gaussian process approximation to speed up the generation process.


Michael Burke. 2017. Leveraging Gaussian process approximations for rapid image overlay production. In Proceedings of SAWACMMM’17, Mountain View, CA, USA, October 23, 2017, 6 pages.

M. Burke, “User-driven mobile robot storyboarding: Learning image interest and saliency from pairwise image comparisons“,  eprint arXiv:1706.05850, 2017.

M. Burke, “Image ranking in video sequences using pairwise image comparisons and temporal smoothing“, 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Stellenbosch, 2016, pp. 1-6.

Pose estimation for human-robot interaction

Human-robot interaction using gesture recognition typically requires that the 3D pose of a human be tracked in real time, but this can be challenging, particularly when only a single, potentially moving, camera is available. We use a mixture of random walks to model human motion, which allows for fast Rao-Blackwellised tracking and provides a useful mechanism to map from 2D to 3D pose when only a few joint measurements are available.

Pose estimation code is available here and here. As a useful byproduct, a simplified motion model proves quite effective at estimating missing marker positions for motion capture applications. Code is available here.
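The missing-marker idea can be illustrated with a scalar toy (a simplification of the paper's low-dimensional model, which smooths in a learned subspace): a random-walk Kalman filter simply skips the measurement update on missing frames, and a backward smoothing pass then interpolates those frames using information from both sides of the gap.

```python
import numpy as np

# Scalar marker coordinate modelled as a random walk, filtered with a
# Kalman filter and smoothed with a Rauch-Tung-Striebel pass. Missing
# frames (NaN) skip the measurement update, so the smoother fills them
# in from both neighbours rather than just extrapolating forwards.
z = np.array([0.0, 0.1, 0.2, np.nan, 0.4, 0.5])   # frame 3 is missing
q, r = 0.01, 0.001                                # process / measurement noise

n = len(z)
x_f, p_f, x_p, p_p = np.zeros(n), np.zeros(n), np.zeros(n), np.zeros(n)
x, p = 0.0, 1.0
for k in range(n):
    x_p[k], p_p[k] = x, p + q                     # predict (random walk)
    if np.isnan(z[k]):
        x, p = x_p[k], p_p[k]                     # missing: keep the prediction
    else:
        kgain = p_p[k] / (p_p[k] + r)             # measurement update
        x = x_p[k] + kgain * (z[k] - x_p[k])
        p = (1 - kgain) * p_p[k]
    x_f[k], p_f[k] = x, p

x_s = x_f.copy()                                  # RTS backward pass
for k in range(n - 2, -1, -1):
    c = p_f[k] / p_p[k + 1]
    x_s[k] = x_f[k] + c * (x_s[k + 1] - x_p[k + 1])
# x_s[3] estimates the missing frame, informed by both sides of the gap.
```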


Burke, M. and Lasenby, J., Estimating missing marker positions using low dimensional Kalman smoothing, Journal of Biomechanics , Volume 49 , Issue 9 , 1854 – 1858 (2016).

Burke, M. G. (2015). Fast upper body pose estimation for human-robot interaction (doctoral thesis).

Burke, M. and Lasenby, J., Single camera pose estimation using Bayesian filtering and Kinect motion priors (2014).

Burke, M. and Lasenby, J., Fast upper body joint tracking using kinect pose priors, International Conference on Articulated Motion and Deformable Objects (Best paper award), 94-105, 2014.

Human (and cereal) following robots

This project focused on feature-based object recognition and tracking using a single camera for a target-following mobile robot. Robot controls are generated so as to maximise the chances of successfully detecting and tracking the object of interest while navigating.

This was a particular challenge at the time (2010 – before convnet fame) as object recognition approaches were extremely unreliable and moving cameras struggled with motion blur. This work was combined with LIDAR-based target tracking and obstacle avoidance to build the CSIR Autonomous Mule. Videos of the work and related publications are listed below.



Burke, Michael, and Willie Brink. “Estimating target orientation with a single camera for use in a human-following robot.”, Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa (2010).

Burke, Michael. “Laser-Based Target Tracking using Principal Component Descriptors.”, Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa (2010).

Burke, Michael, and Willie Brink. “Gain-scheduling control of a monocular vision-based human-following robot.”, IFAC Proceedings Volumes 44.1 (2011): 8177-8182.

Burke, Michael Glen. Visual servo control for a human-following robot. Diss. Stellenbosch University, 2011.