25 days ago | Hugging Face Daily Papers | DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response
25 days ago | Hugging Face Daily Papers | MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
25 days ago | Hugging Face Daily Papers | An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning
25 days ago | Hugging Face Daily Papers | Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
26 days ago | Hugging Face Daily Papers | Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
26 days ago | Hugging Face Daily Papers | ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
26 days ago | AI News CN (Telegram) - English Translation | Why did Elon Musk, Jensen Huang and Sam Altman come to show their support? What are these three tycoons after?
26 days ago | Hugging Face Daily Papers | Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
26 days ago | Hugging Face Daily Papers | MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
26 days ago | AI News CN (Telegram) - English Translation | ChatGPT o3 model refused to execute the shutdown command during the test.
26 days ago | Hugging Face Daily Papers | Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition