11 days ago | Hugging Face Daily Papers | Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
11 days ago | Hugging Face Daily Papers | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
11 days ago | Hugging Face Daily Papers | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
11 days ago | Hugging Face Daily Papers | MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
11 days ago | Hugging Face Daily Papers | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
11 days ago | Hugging Face Daily Papers | OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
11 days ago | Hugging Face Daily Papers | Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach
11 days ago | Hugging Face Daily Papers | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
11 days ago | Hugging Face Daily Papers | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
11 days ago | Hugging Face Daily Papers | Controllable Human-centric Keyframe Interpolation with Generative Prior
11 days ago | Hugging Face Daily Papers | ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
11 days ago | Hugging Face Daily Papers | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
11 days ago | Hugging Face Daily Papers | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
11 days ago | Hugging Face Daily Papers | FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens