3 months ago | Hugging Face Daily Papers | Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
3 months ago | Hugging Face Daily Papers | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
3 months ago | Hugging Face Daily Papers | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
3 months ago | Hugging Face Daily Papers | MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
3 months ago | Hugging Face Daily Papers | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
3 months ago | Hugging Face Daily Papers | Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
3 months ago | Hugging Face Daily Papers | OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
3 months ago | Hugging Face Daily Papers | Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach
3 months ago | Hugging Face Daily Papers | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
3 months ago | Hugging Face Daily Papers | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
3 months ago | Hugging Face Daily Papers | Controllable Human-centric Keyframe Interpolation with Generative Prior
3 months ago | Hugging Face Daily Papers | ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
3 months ago | Hugging Face Daily Papers | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
3 months ago | Hugging Face Daily Papers | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
3 months ago | Hugging Face Daily Papers | FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
3 months ago | Hugging Face Daily Papers | StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
3 months ago | Hugging Face Daily Papers | Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers