A reward signal inferred by PTR, which slowly increases as ultrasound image quality improves. The inferred reward correlates strongly with human image ratings.

Robotic ultrasound scanning: Learning from exploratory demonstrations using probabilistic temporal ranking

Recently, we have been exploring the use of time as a supervisory signal for learning from demonstration. As an example use case, we considered ultrasound scanning, where a technician is required to search for a scanning position and contact force that produce an optimal image. We propose a probabilistic temporal ranking (PTR) model that infers the quality of an image from demonstration image sequences. This reward model can then be used for automated robotic ultrasound scanning, as shown in the video below.

The project page has additional detail and explanation of PTR.
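
The temporal ranking idea can be sketched in a few lines. The example below is a simplified illustration, not the model from the paper: it assumes that frames later in a demonstration tend to be better than earlier ones, and fits a linear reward to synthetic two-dimensional features using a Bradley-Terry style pairwise likelihood. The function names (`temporal_pairs`, `fit_linear_reward`), the synthetic features, and all hyperparameters are hypothetical.

```python
import numpy as np

def temporal_pairs(n_frames, n_pairs, rng):
    """Sample (earlier, later) frame index pairs from one demonstration."""
    i = rng.integers(0, n_frames, size=n_pairs)
    j = rng.integers(0, n_frames, size=n_pairs)
    keep = i != j  # drop pairs that compare a frame with itself
    i, j = i[keep], j[keep]
    return np.minimum(i, j), np.maximum(i, j)

def fit_linear_reward(features, earlier, later, lr=0.1, steps=500):
    """Fit w so that r = features @ w ranks later frames above earlier ones.

    Maximises the mean log-likelihood of sigmoid(r_later - r_earlier)
    by plain gradient ascent.
    """
    w = np.zeros(features.shape[1])
    diff = features[later] - features[earlier]
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))            # P(later is better)
        grad = ((1.0 - p)[:, None] * diff).mean(axis=0)  # d/dw of mean log p
        w += lr * grad
    return w

# Synthetic stand-in for image features (assumption: the first feature
# drifts upward as the demonstrator homes in on a good image, the second
# is pure noise).
rng = np.random.default_rng(0)
T = 50
features = np.column_stack([
    np.linspace(0.0, 1.0, T) + 0.1 * rng.normal(size=T),
    0.1 * rng.normal(size=T),
])

earlier, later = temporal_pairs(T, n_pairs=2000, rng=rng)
w = fit_linear_reward(features, earlier, later)
rewards = features @ w  # inferred reward for each frame of the demonstration
```

Here the learned weight on the drifting feature comes out positive, so the inferred reward tends to rise over the course of the demonstration. Because the ranking is treated as probabilistic rather than strict, occasional pairs in which an earlier frame is actually better do not break the fit, which matters for exploratory demonstrations.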

Michael Burke, Katie Lu, Daniel Angelov, Artūras Straižys, Craig Innes, Kartic Subr, Subramanian Ramamoorthy. Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking, 2020. https://arxiv.org/abs/2002.01240