4 days ago | Hugging Face Daily Papers | DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
4 days ago | Hugging Face Daily Papers | RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
4 days ago | Hugging Face Daily Papers | Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
4 days ago | Hugging Face Daily Papers | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
4 days ago | Hugging Face Daily Papers | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
4 days ago | Hugging Face Daily Papers | MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
4 days ago | Hugging Face Daily Papers | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
4 days ago | Hugging Face Daily Papers | OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models