Robot-assisted feeding can greatly enhance the lives of those with mobility limitations. Modern feeding systems can pick up food and position it in front of a care recipient's mouth for a bite. However, many people with severe mobility constraints cannot lean forward and need food placed directly inside their mouths. This demands precision, especially for those with restricted mouth openings, and the ability to react appropriately to various physical interactions: incidental contacts as the utensil moves inside the mouth, impulsive contacts due to sudden muscle spasms, deliberate tongue maneuvers by the person being fed to guide the utensil, and intentional bites.
In this paper, we propose an inside-mouth bite transfer system that addresses these challenges with two key components: a multi-view mouth perception pipeline robust to tool occlusion, and a control mechanism that employs multimodal time-series classification to discern and react to different physical interactions. We demonstrate the efficacy of these individual components through two ablation studies.
In a full system evaluation, our system successfully fed 13 care recipients with diverse mobility challenges. Participants consistently emphasized the comfort and safety of our inside-mouth bite transfer system, and gave it high technology acceptance ratings — underscoring its transformative potential in real-world scenarios.
Training: We train our multi-view encoder in a self-supervised manner, using a self-curated dataset of unoccluded images and masks of utensils holding various foods. We synthetically add occlusion to each image and use an existing single-view method, which is effective on the unoccluded original, to generate ground-truth supervision.
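A minimal sketch of this data-generation step, assuming RGB images as NumPy arrays, a utensil crop smaller than half the image height, and a `keypoint_detector` callable standing in for the existing single-view method (all of these are illustrative assumptions, not details of our pipeline):

```python
from typing import Callable

import numpy as np


def composite_occlusion(clean_img: np.ndarray,
                        utensil_img: np.ndarray,
                        utensil_mask: np.ndarray,
                        rng: np.random.Generator) -> np.ndarray:
    """Paste a utensil crop at a random offset over a clean face image."""
    occluded = clean_img.copy()
    h, w = utensil_mask.shape
    H, W, _ = clean_img.shape
    # Place the utensil roughly over the lower half of the image (assumption).
    top = int(rng.integers(H // 2, H - h))
    left = int(rng.integers(0, W - w))
    region = occluded[top:top + h, left:left + w]
    m = utensil_mask[..., None].astype(bool)
    region[:] = np.where(m, utensil_img, region)
    return occluded


def make_training_pair(clean_img: np.ndarray,
                       utensil_img: np.ndarray,
                       utensil_mask: np.ndarray,
                       keypoint_detector: Callable[[np.ndarray], np.ndarray],
                       rng: np.random.Generator):
    """Return (synthetically occluded input, keypoint supervision predicted
    on the unoccluded original by the single-view method)."""
    target = keypoint_detector(clean_img)
    occluded = composite_occlusion(clean_img, utensil_img, utensil_mask, rng)
    return occluded, target
```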
Testing: Our mouth perception method accurately identifies face keypoints, even under heavy occlusion, and outperforms baselines from the head perception and robot-assisted feeding literature.
Real-time multi-view mouth perception enables our robot to adapt to voluntary head movement by the user, pause mid-transfer if the user is not ready, and evade involuntary head movements such as muscle spasms.
We validate the necessity of this functionality in an ablation study with 14 participants, where we significantly outperform a baseline that perceives the mouth only once.
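The reactive behavior above (tracking voluntary head motion, pausing when the user is not ready, evading spasms) can be sketched as a per-control-cycle decision. The thresholds, the retract offset, and the fields of `MouthEstimate` below are illustrative assumptions, not values from our system:

```python
from dataclasses import dataclass

import numpy as np

SPASM_SPEED_THRESH = 0.25                     # m/s, assumed head-speed threshold
RETRACT_OFFSET = np.array([0.0, 0.0, -0.10])  # back away 10 cm (assumption)


@dataclass
class MouthEstimate:
    position: np.ndarray   # 3D mouth center in the robot base frame
    velocity: np.ndarray   # estimated head velocity
    is_ready: bool         # e.g., mouth open and facing the utensil


def transfer_step(est: MouthEstimate, utensil_pos: np.ndarray) -> np.ndarray:
    """Return the next Cartesian goal for the utensil tip."""
    if np.linalg.norm(est.velocity) > SPASM_SPEED_THRESH:
        # Involuntary motion (e.g., a spasm): evade by retracting.
        return utensil_pos + RETRACT_OFFSET
    if not est.is_ready:
        # User not ready: pause mid-transfer by holding the current pose.
        return utensil_pos
    # Voluntary head movement: keep tracking the (moving) mouth target.
    return est.position
```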
We take a data-driven approach to multimodal (visual + haptic) physical interaction classification, and evaluate four time-series models: Time-Series Transformers, Temporal Convolutional Networks, Multi-Layer Perceptrons, and SVMs.
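As one illustration, here is a minimal PyTorch sketch of a TCN-style classifier over windows of fused visual and haptic features. The feature dimensions, window length, and layer sizes are assumptions; the four classes follow the interaction types listed above:

```python
import torch
import torch.nn as nn

CLASSES = ["incidental", "impulsive", "tongue_guidance", "bite"]


class InteractionTCN(nn.Module):
    def __init__(self, haptic_dim=6, visual_dim=32, hidden=64):
        super().__init__()
        in_dim = haptic_dim + visual_dim  # per-timestep fused feature size
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
        )
        self.head = nn.Linear(hidden, len(CLASSES))

    def forward(self, x):                # x: (batch, time, features)
        z = self.net(x.transpose(1, 2))  # Conv1d expects (batch, features, time)
        return self.head(z.squeeze(-1))  # (batch, num_classes)


# Usage: classify one window of fused visual + haptic features.
window = torch.randn(1, 50, 6 + 32)      # 50 timesteps (assumed window length)
pred = CLASSES[InteractionTCN()(window).argmax(dim=-1).item()]
```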
The resulting physical interaction-aware controller lets us detect tongue guidance from a new care recipient, remain highly compliant to it, and remember the resulting bite location (Feeding 1), so that the next time we feed them we can autonomously move to that bite location while remaining highly compliant to impulsive motions if they occur (Feeding 2).
We validate this functionality in an ablation study with 14 participants, where we significantly outperform both a baseline that treats any contact as a bite and ablations of our method, on metrics such as satisfaction, safety, comfort, and maximum applied force.
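The reaction policy described above can be summarized as a mapping from the classified interaction to a compliance level and an action. The stiffness values and action names below are illustrative assumptions, not our actual controller parameters:

```python
from dataclasses import dataclass
from typing import List, Optional

STIFF = 400.0  # N/m, nominal Cartesian stiffness (illustrative value)
SOFT = 30.0    # N/m, highly compliant (illustrative value)


@dataclass
class BiteTransferPolicy:
    remembered_bite_pos: Optional[List[float]] = None  # learned in Feeding 1

    def react(self, interaction: str, utensil_pos: List[float]) -> dict:
        """Map a classified interaction to a compliance level and an action."""
        if interaction == "tongue_guidance":
            # Let the tongue steer the utensil and remember where it ends up.
            self.remembered_bite_pos = list(utensil_pos)
            return {"stiffness": SOFT, "action": "follow"}
        if interaction == "impulsive":
            # Sudden spasm: stay highly compliant and retract.
            return {"stiffness": SOFT, "action": "retract"}
        if interaction == "bite":
            return {"stiffness": STIFF, "action": "release_food"}
        # Incidental contact while moving inside the mouth: keep transferring.
        return {"stiffness": STIFF, "action": "continue"}

    def next_goal(self, default_goal: List[float]) -> List[float]:
        """Feeding 2: go straight to the remembered bite location if known."""
        return self.remembered_bite_pos or default_goal
```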
@inproceedings{jenamani2024feelthebite,
  author    = {Jenamani, Rajat Kumar and Stabile, Daniel and Liu, Ziang and Anwar, Abrar and Dimitropoulou, Katherine and Bhattacharjee, Tapomayukh},
  title     = {Feel the Bite: Robot-Assisted Inside-Mouth Bite Transfer using Robust Mouth Perception and Physical Interaction-Aware Control},
  booktitle = {ACM/IEEE International Conference on Human-Robot Interaction (HRI)},
  year      = {2024}
}