13 days ago | Hugging Face Daily Papers | Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
13 days ago | Hugging Face Daily Papers | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL
13 days ago | Hugging Face Daily Papers | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
13 days ago | Hugging Face Daily Papers | Time Blindness: Why Video-Language Models Can't See What Humans Can?
13 days ago | Hugging Face Daily Papers | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
13 days ago | Hugging Face Daily Papers | ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
13 days ago | Hugging Face Daily Papers | MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
13 days ago | Hugging Face Daily Papers | Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
13 days ago | Hugging Face Daily Papers | Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings
13 days ago | Hugging Face Daily Papers | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
13 days ago | Hugging Face Daily Papers | FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
13 days ago | AI News CN (Telegram) - English Translation | Perplexity's new tools can generate spreadsheets, etc.
13 days ago | AI News CN (Telegram) - English Translation | Researchers believe that large models can neither think nor reason.