FLAIR

Abstract

Robot-assisted feeding has the potential to improve the quality of life for individuals with mobility limitations who are unable to feed themselves independently. However, a large gap remains between the homogeneous, curated plates that existing feeding systems can handle and truly in-the-wild meals. Feeding realistic plates is immensely challenging due to the sheer range of food items that a robot may encounter, each requiring specialized manipulation strategies that must be sequenced over a long horizon to feed an entire meal. An assistive feeding system should not only sequence different strategies efficiently to feed an entire meal, but also be mindful of user preferences given the personalized nature of the task. We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences. In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup, while adhering to the diverse preferences of 42 participants without mobility limitations as evaluated in a user study. We demonstrate the seamless integration of FLAIR with existing bite transfer methods [19, 28], and deploy it across 2 institutions and 3 robots, illustrating its adaptability. Finally, we illustrate the real-world efficacy of our system by successfully feeding a care recipient with severe mobility limitations.

Library of Skills

Using a customized motorized fork with two degrees of freedom, we instantiate a library of vision-parameterized food manipulation skills. These include acquisition skills which attempt to pick up food, and pre-acquisition skills which attempt to rearrange or portion food into bite-sized items for downstream acquisition. Each skill is parameterized by the visual state estimate of a food item, specifically a segmented observation obtained from GroundedSAM.
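One way to picture such a library is as a registry of skills, each tagged as acquisition or pre-acquisition and each carrying a function that turns a segmented food observation into motion parameters. The sketch below is illustrative only: the class names, the skill names `skewer` and `push`, the mask representation, and the centroid-based parameters are all assumptions made for this example, not FLAIR's actual interfaces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class SkillKind(Enum):
    ACQUISITION = "acquisition"          # attempts to pick up food
    PRE_ACQUISITION = "pre_acquisition"  # rearranges or portions food first


@dataclass(frozen=True)
class FoodSegment:
    """Visual state estimate of one food item (hypothetical fields)."""
    label: str
    mask_pixels: Tuple[Tuple[int, int], ...]  # (row, col) pixels inside the mask

    def centroid(self) -> Tuple[float, float]:
        rows = [p[0] for p in self.mask_pixels]
        cols = [p[1] for p in self.mask_pixels]
        return (sum(rows) / len(rows), sum(cols) / len(cols))


@dataclass(frozen=True)
class Skill:
    name: str
    kind: SkillKind
    # Maps a segmented observation to the parameters the motion would need.
    parameterize: Callable[[FoodSegment], dict]


def skewer_params(seg: FoodSegment) -> dict:
    # Assumption: aim the fork at the centroid of the mask.
    return {"target": seg.centroid()}


def push_params(seg: FoodSegment) -> dict:
    # Assumption: start the push at the centroid of the item to move.
    return {"start": seg.centroid()}


# Hypothetical two-skill library; a real one would hold more skills.
LIBRARY: Dict[str, Skill] = {
    "skewer": Skill("skewer", SkillKind.ACQUISITION, skewer_params),
    "push": Skill("push", SkillKind.PRE_ACQUISITION, push_params),
}


def skills_of_kind(library: Dict[str, Skill], kind: SkillKind) -> List[str]:
    """Return the names of all skills of the given kind, sorted."""
    return sorted(name for name, s in library.items() if s.kind == kind)


if __name__ == "__main__":
    sausage = FoodSegment("sausage", ((0, 0), (0, 2), (2, 0), (2, 2)))
    print(LIBRARY["skewer"].parameterize(sausage))
    print(skills_of_kind(LIBRARY, SkillKind.PRE_ACQUISITION))
```

Keeping the parameterization as a function of the segment alone mirrors the statement above that each skill depends only on the visual state estimate of a food item.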

Feeding Utensil Design

Food Manipulation Skills

System Overview

Given a plate observation and user preference in natural language, FLAIR first leverages VLMs (GPT-4V and GroundedSAM) to detect and recognize food items. Next, the segmented food observations along with their corresponding language labels serve as input to a Task Planner which outputs a sequence of skills to pick up each item.

Finally, an LLM (GPT-4) takes as input the summarized task plan and given user preference, and plans a sequence of bites (food items) to pick up using commonsense and chain-of-thought reasoning. FLAIR executes the necessary skills to pick up each next bite in a sequential fashion. Between bites, FLAIR can be combined with a framework for bite transfer to feed a user each acquired bite.
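The hand-off from the Task Planner to the LLM can be sketched as a prompt-building step: the per-item skill sequences are summarized as text and combined with the user's stated preference. The function names, the prompt wording, and the `scoop` skill name below are hypothetical and chosen only for illustration; the actual prompts used by FLAIR are not reproduced here, and the call to the LLM itself is deliberately left out.

```python
from typing import Dict, List


def summarize_task_plan(plan: Dict[str, List[str]]) -> str:
    """Render each food item with the skills needed for its next bite."""
    lines = []
    for item, skills in plan.items():
        sequence = " -> ".join(skills)
        lines.append(f"- {item}: {sequence} ({len(skills)} action(s))")
    return "\n".join(lines)


def build_bite_prompt(plan: Dict[str, List[str]], preference: str) -> str:
    """Combine the summarized task plan and the preference into one prompt.

    The wording is an assumption for this sketch, not FLAIR's prompt.
    """
    return (
        "Items on the plate and the skills needed to pick up the next bite of each:\n"
        f"{summarize_task_plan(plan)}\n"
        f"User preference: {preference}\n"
        "Think step by step, then name the next food item to pick up."
    )


if __name__ == "__main__":
    # Hypothetical task-planner output for a two-item plate.
    plan = {
        "sausage": ["skewer"],
        "mashed potatoes": ["push", "scoop"],
    }
    print(build_bite_prompt(plan, "I have no preference."))
```

Exposing the number of actions per item in the summary gives the language model the information it would need to weigh efficiency against the stated preference.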

Evaluation

We evaluate FLAIR against an Efficiency-Only and a Preference-Only approach across 6 plates. Specifically, we compare how well each method adheres to preferences, and how humanlike each method is, as measured by Likert ratings in a user study across 42 individuals. We also quantify the rate of efficient plate clearance across all approaches in the setting where a user has no preference.

Example Plate Run: Sausage and Mashed Potatoes

Given Preference: "I have no preference."

Compared to FLAIR, an Efficiency-Only approach, which prioritizes as few actions as possible, ends up skewering all the sausage first and only then moving on to the potatoes. This quickly clears most of the plate, but results in no bite variability.

A Preference-Only approach ignores the fact that mashed potatoes are less efficient to pick up than sausage due to the need to push the sausage aside first. This ultimately results in more bite variability, but at the expense of several pushing actions which do not pick up any food.

FLAIR combines the benefits of both approaches by inferring the need to pick up sausage first for efficiency, but switching to potatoes for bite variability once they are uncovered.
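The trade-off described in this example can be made concrete with a toy simulation. Everything in it is an assumption made for illustration: the plate has 4 sausage pieces and 3 potato bites, the potatoes count as covered while more than 2 sausage pieces remain, and each bite of covered potato costs one extra push. The `balanced` policy is a hand-written heuristic standing in for FLAIR's LLM-based planner, not the planner itself, and the resulting counts are not results from the paper.

```python
from typing import Callable, List, Optional, Tuple

SAUSAGE, POTATO = "sausage", "potato"
Policy = Callable[[int, int, bool, Optional[str]], str]


def efficiency_only(sausage: int, potato: int, covered: bool, last: Optional[str]) -> str:
    # Sausage never costs more than potato in this toy, so take it first.
    return SAUSAGE if sausage > 0 else POTATO


def preference_only(sausage: int, potato: int, covered: bool, last: Optional[str]) -> str:
    # Alternate items for variety and ignore the cost of pushing.
    want = SAUSAGE if last == POTATO else POTATO
    if want == POTATO and potato == 0:
        return SAUSAGE
    if want == SAUSAGE and sausage == 0:
        return POTATO
    return want


def balanced(sausage: int, potato: int, covered: bool, last: Optional[str]) -> str:
    # Stand-in heuristic: clear sausage while potatoes are covered, then alternate.
    if potato == 0:
        return SAUSAGE
    if sausage == 0:
        return POTATO
    if covered:
        return SAUSAGE
    return SAUSAGE if last == POTATO else POTATO


def simulate(policy: Policy, sausage: int = 4, potato: int = 3,
             uncover_at: int = 2) -> Tuple[List[str], int, int]:
    """Return (bite sequence, total actions, pushes that picked up no food)."""
    bites: List[str] = []
    actions = pushes = 0
    last: Optional[str] = None
    while sausage + potato > 0:
        covered = sausage > uncover_at
        item = policy(sausage, potato, covered, last)
        if item == POTATO:
            if covered:  # one push before the pickup
                pushes += 1
                actions += 1
            potato -= 1
        else:
            sausage -= 1
        actions += 1  # the pickup itself
        bites.append(item)
        last = item
    return bites, actions, pushes


def switches(bites: List[str]) -> int:
    """Count how often consecutive bites differ (a crude variability proxy)."""
    return sum(a != b for a, b in zip(bites, bites[1:]))


if __name__ == "__main__":
    for policy in (efficiency_only, preference_only, balanced):
        bites, actions, pushes = simulate(policy)
        print(policy.__name__, actions, pushes, switches(bites))
```

Under these made-up settings, the balanced heuristic uses as few actions as the efficiency-only policy while switching items as often as the preference-only one, which is the qualitative behavior the paragraph above attributes to FLAIR.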

Results

Strong Preferences Setting

For users with strong preferences, FLAIR is perceived to adhere to preferences and to behave in a more human-like way than Efficiency-Only across many plate and preference combinations.

No Preferences Setting

In the absence of strong preferences, FLAIR achieves faster plate clearance than Preference-Only, while achieving greater bite variability than Efficiency-Only.

Demonstration of Acquisition + Transfer

FLAIR readily integrates with, and is agnostic to, the choice of bite transfer framework. Its modularity makes it adaptable to various robot platforms, which we demonstrate by replicating FLAIR separately at two institutions and on three robots.

Below, we show integration with outside-mouth transfer and visual servoing to the center of the mouth.

Below, we show an in-mouth system from prior work using real-time facial tracking and a compliant controller to feed a care recipient with Multiple Sclerosis the last requested strawberry dipped in chocolate.

In two Likert questions regarding the system's diversity of skills and ability to follow preferences, this user strongly agreed that FLAIR's diverse bite acquisition skills and adherence to meal preferences are crucial for daily acceptance.

Pre-Print

BibTeX

@article{jenamani2024flair,
  title={FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes},
  author={Jenamani, Rajat Kumar and Sundaresan, Priya and Sakr, Maram and Bhattacharjee, Tapomayukh and Sadigh, Dorsa},
  journal={arXiv preprint arXiv:2407.07561},
  year={2024}
}

FLAIR: Feeding via Long-Horizon AcquIsition of Realistic dishes

FLAIR leverages the few-shot reasoning capabilities of vision-language foundation models to plan and execute long-horizon bite sequences that factor in both efficiency and user preference.

We evaluate FLAIR across a diverse range of plates, from noodle dishes to semi-solid dishes to appetizer and dessert plates with fruits, vegetables, and dips, all subject to a range of user preferences.
