3 months ago | Hugging Face Daily Papers | AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery
3 months ago | Hugging Face Daily Papers | Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
3 months ago | Hugging Face Daily Papers | UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
3 months ago | Hugging Face Daily Papers | Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
3 months ago | Hugging Face Daily Papers | Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
3 months ago | Hugging Face Daily Papers | DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
3 months ago | Hugging Face Daily Papers | Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
3 months ago | Hugging Face Daily Papers | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
3 months ago | Hugging Face Daily Papers | CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
3 months ago | Hugging Face Daily Papers | R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
3 months ago | Hugging Face Daily Papers | Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
3 months ago | Hugging Face Daily Papers | HoliTom: Holistic Token Merging for Fast Video Large Language Models
3 months ago | Hugging Face Daily Papers | MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
3 months ago | Hugging Face Daily Papers | MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
3 months ago | Hugging Face Daily Papers | rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset