Enhancing Vision-Language Model Reasoning for
Object Navigation via Active 3D Gaussian Splatting

Wancai Zheng¹, Hao Chen², Xianlong Lu¹, Linlin Ou¹, Xinyi Yu¹*

¹Zhejiang University of Technology.
²Zhejiang University.
*Corresponding author

Object navigation is a core capability of embodied intelligence, enabling an agent to locate target objects in unknown environments. Recent advances in vision–language models (VLMs) have facilitated zero-shot object navigation (ZSON). However, existing methods often rely on scene abstractions that convert environments into semantic maps or textual representations, causing high‑level decision making to be constrained by the accuracy of low‑level perception. In this work, we present 3DGSNav, a novel ZSON framework that embeds 3D Gaussian Splatting (3DGS) as persistent memory for VLMs to enhance spatial reasoning. Through active perception, 3DGSNav incrementally constructs a 3DGS representation of the environment, enabling trajectory-guided free-viewpoint rendering of frontier-aware first-person views. Moreover, we design structured visual prompts and integrate them with Chain-of-Thought (CoT) prompting to further improve VLM reasoning. During navigation, a real‑time object detector filters potential targets, while VLM‑driven active viewpoint switching performs target re‑verification, ensuring efficient and reliable recognition. Extensive evaluations across multiple benchmarks and real‑world experiments on a quadruped robot demonstrate that our method achieves robust and competitive performance against state‑of‑the‑art approaches.