3 months ago · Hugging Face Daily Papers · HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models
3 months ago · Hugging Face Daily Papers · MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
3 months ago · AI News CN (Telegram) - English Translation · 🎬 Xiaohu: Veo 3 generates product introduction videos
3 months ago · AI News CN (Telegram) - English Translation · 🎬 Xiaohu: Google Project Astra acts as an AI tutor
3 months ago · Hugging Face Daily Papers · SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
3 months ago · Hugging Face Daily Papers · MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding
3 months ago · Hugging Face Daily Papers · DiSA: Diffusion Step Annealing in Autoregressive Image Generation
3 months ago · Hugging Face Daily Papers · GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
3 months ago · Hugging Face Daily Papers · OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
3 months ago · Hugging Face Daily Papers · VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
3 months ago · Hugging Face Daily Papers · MotionPro: A Precise Motion Controller for Image-to-Video Generation
3 months ago · Hugging Face Daily Papers · Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution