Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration

Sen Wang1, Bangwei Liu1, Zhenkun Gao1, Lizhuang Ma1, Xuhong Wang2, Yuan Xie1, Xin Tan1,2*
1East China Normal University, 2Shanghai AI Laboratory
* Corresponding author.
CVPR 2026

Long-term Memory Embodied Exploration collects episodic memories during Multi-goal Navigation and introduces Memory-based Question Answering to unify and evaluate the model's cognitive and decision-making abilities.

Experiment Results

Abstract

An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This not only requires the agent to accurately accomplish given tasks but also to leverage long-term episodic memory to optimize decision-making. However, existing mainstream one-shot embodied tasks primarily focus on task completion results, neglecting the crucial process of exploration and memory utilization. To address this, we propose Long-term Memory Embodied Exploration (LMEE), which aims to unify the agent's exploratory cognition and decision-making behaviors to promote lifelong learning.

We further construct a corresponding dataset and benchmark, LMEE-Bench, incorporating multi-goal navigation and memory-based question answering to comprehensively evaluate both the process and outcome of embodied exploration. To enhance the agent's memory recall and proactive exploration capabilities, we propose MemoryExplorer, a novel method that fine-tunes a multimodal large language model through reinforcement learning to encourage active memory querying. By incorporating a multi-task reward function that includes action prediction, frontier selection, and question answering, our model achieves proactive exploration. Extensive experiments against state-of-the-art embodied exploration models demonstrate that our approach achieves significant advantages in long-horizon embodied tasks.
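To make the multi-task reward described above concrete, the sketch below shows one plausible way to combine the three signals named in the abstract: action prediction, frontier selection, and question answering. The function name, component definitions, and weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of the multi-task reward from the abstract.
# Components and weights are assumptions for illustration only.

def multi_task_reward(action_correct: bool,
                      frontier_gain: float,
                      qa_score: float,
                      w_action: float = 1.0,
                      w_frontier: float = 0.5,
                      w_qa: float = 1.0) -> float:
    """Weighted sum of per-task reward terms.

    action_correct: whether the predicted action matches the target.
    frontier_gain:  assumed normalized gain of the chosen frontier, in [0, 1].
    qa_score:       assumed correctness of the memory-based answer, in [0, 1].
    """
    r_action = 1.0 if action_correct else 0.0
    return w_action * r_action + w_frontier * frontier_gain + w_qa * qa_score
```

In such a scheme the scalar reward would be fed to the reinforcement-learning fine-tuning loop, so the relative weights control how strongly the model is pushed toward memory querying versus exploration.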

Real World Testing (5x Speed)


Citation

      @inproceedings{wang2026explore,
        title={Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration},
        author={Wang, Sen and Liu, Bangwei and Gao, Zhenkun and Ma, Lizhuang and Wang, Xuhong and Xie, Yuan and Tan, Xin},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year={2026}
      }