about 2 months ago | Hugging Face Daily Papers | AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
about 2 months ago | Hugging Face Daily Papers | Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
about 2 months ago | Hugging Face Daily Papers | Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills
about 2 months ago | Hugging Face Daily Papers | Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
about 2 months ago | Hugging Face Daily Papers | Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts
about 2 months ago | Hugging Face Daily Papers | TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
about 2 months ago | Hugging Face Daily Papers | Attention, Please! Revisiting Attentive Probing for Masked Image Modeling
about 2 months ago | Hugging Face Daily Papers | ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs