Abstract: Recent robot task planners utilize large language models (LLMs) or vision-language models (VLMs) as a failure detector. These methods perform well by leveraging their semantic reasoning ...
A simple prompt pulls up an interface that resembles an online test environment. You get explanations for wrong answers and detailed feedback at the end.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results