6 days ago | Hugging Face Daily Papers | MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
6 days ago | Hugging Face Daily Papers | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
6 days ago | Hugging Face Daily Papers | OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
6 days ago | Hugging Face Daily Papers | Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach
6 days ago | Hugging Face Daily Papers | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
6 days ago | Hugging Face Daily Papers | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
6 days ago | Hugging Face Daily Papers | Controllable Human-centric Keyframe Interpolation with Generative Prior
6 days ago | Hugging Face Daily Papers | ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
6 days ago | Hugging Face Daily Papers | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
6 days ago | Hugging Face Daily Papers | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
6 days ago | Hugging Face Daily Papers | FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
6 days ago | Hugging Face Daily Papers | StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
6 days ago | Hugging Face Daily Papers | Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers