Current approaches to robot teleoperation often require significant mental and physical effort. In this study, we propose a new intent detection framework for teleoperating robotic arms based on human speech and natural eye gaze. Our framework applies instance segmentation to the robot's camera image and predicts the human's intended object by matching eye-gaze data, instance masks, instance classes, and transcribed words. Our experimental results show a prediction accuracy between 90.7% and 98.6%, including cases in which the target objects are duplicated or occluded. The prediction accuracy of the combined eye-gaze and speech inputs outperformed that of eye-gaze input alone (between 79.9% and 89.2%) and speech input alone (between 25.3% and 71.6%). Moreover, we observe that eye-gaze input contributes more than speech input to improving prediction accuracy when two duplicated target objects are present in the scene. Our results from NASA TLX questionnaires show that teleoperating the robotic arms with our proposed framework requires little effort, even when the target objects are duplicated or occluded.
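The matching idea described above can be sketched in a minimal form: score each detected object instance by combining how many gaze samples land inside its segmentation mask with whether its class label appears in the transcribed speech, then pick the highest-scoring instance. All names, data structures, and weights below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the gaze+speech intent-matching idea: each detected
# instance gets a score combining (a) the fraction of gaze samples falling
# inside its segmentation mask and (b) whether its class label appears in the
# transcribed speech. All names and weights are illustrative assumptions.

def predict_intended_object(instances, gaze_points, transcript_words,
                            gaze_weight=0.5, speech_weight=0.5):
    """Return the instance with the highest combined gaze/speech score.

    instances: list of dicts with keys "mask" (set of (x, y) pixels)
               and "class_name" (str).
    gaze_points: list of (x, y) fixation samples from the eye tracker.
    transcript_words: lowercase words transcribed from the user's speech.
    """
    best, best_score = None, float("-inf")
    for inst in instances:
        # Fraction of gaze samples landing on this instance's mask.
        hits = sum(1 for p in gaze_points if p in inst["mask"])
        gaze_score = hits / max(len(gaze_points), 1)
        # Simple speech cue: did the user name this object's class?
        speech_score = 1.0 if inst["class_name"].lower() in transcript_words else 0.0
        score = gaze_weight * gaze_score + speech_weight * speech_score
        if score > best_score:
            best, best_score = inst, score
    return best
```

For example, with two objects in view and the user saying "grab the cup" while fixating on the cup, the cup instance would score highest on both cues. Note that this toy scoring cannot by itself disambiguate two duplicated objects of the same class from speech alone, which is consistent with the abstract's observation that eye gaze matters more in that case.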