
MathVerse: An All-Around Visual Math Benchmark Designed for an Equitable and In-Depth Evaluation of Multi-modal Large Language Models (MLLMs)

Multimodal large language models (MLLMs) have performed exceptionally well in visual settings and have attracted considerable attention. However, their ability to solve visual math problems has yet to be fully assessed and understood. Mathematics often presents challenges both in understanding complex concepts and in interpreting the visual information crucial for solving problems. In educational contexts…


GeFF: Revolutionizing Robot Perception and Action with Scene-Level Generalizable Neural Feature Fields

You’re walking down a bustling city street, carefully cradling your morning coffee, when a whirring sound catches your attention. Suddenly, a knee-high delivery robot zips past you on the crowded sidewalk. With remarkable dexterity, it smoothly avoids colliding with pedestrians, strollers, and obstructions, deftly plotting a clear path forward. This isn’t some sci-fi scene –…
