2025
30
- [Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent
- [CogAgent] An AI That Sees Your Screen Like You Do—And Can Use It For You
- [MVBench] Beyond Still Frames: The Benchmark Testing if AI Truly Understands Time in Videos
- [Deformable 3D Gaussians] Bringing 3D Gaussian Splatting to Life for Real-Time Dynamic Scenes
- Making the Metaverse Real: How Semantic AI and Edge Computing Can Tame Holographic Video
- [LISA++] Making Vision Models Talk and Point at the Same Time
- [LISA] From 'Segment the Car' to 'Segment the Safest Place for a Toddler': LLMs Learn to Reason and See
- Less is More: Recursive Reasoning with Tiny Networks
- [ExGRPO] Teach LLMs to Learn from Experience
- [Apriel-1.5-15B-Thinker] Smart Training, Not Bigger Models
- [Paper2Video] From Paper to Presentation in Minutes
- [TaTToo] Why Do LLMs Fail on Tables?
- [DrivingDojo] Why Can't Self-Driving AIs Turn Left? The Dataset for Smarter World Models
- [The Dragon Hatchling] A New AI Architecture Linking Transformers to the Brain
- [1.58-Bit BitNet] The Era of 1-Bit LLMs Has Begun
- [Wonder3D] From 2D Snap to 3D Asset in 3 Minutes Diffusion
- Smarter, Not Louder: How LLMs Cut Multi-Agent Communication by 53% While Boosting Performance
- Building the Brain of 6G: A Tutorial on Large AI Models and Agentic AI for Intelligent Communications
- [LLaVA-CoT] Teaching AI to Think: Step-by-Step Visual Reasoning
- Real-Time Video Rendering with 4D Gaussian Splatting
- [Depth Anything] How 62 Million Unlabeled Photos Created a New State-of-the-Art Vision Model
- From 1 to N: How Scaling AI Agents with 'Behavior Narratives' Unlocks Near-Human Performance
- Sharing is Caring: How a 'Swarm' of Language Models Learns Faster by Sharing Experiences
- [MMMU] The AI 'College Exam' That Even Top Models Fail
- [RT-DETR] The First End-to-End Detector to Outpace YOLO in Real-Time
- [LLaVA-1.5] How Simple Changes Created a State-of-the-Art Vision-Language Model
- Sending Pictures with (Almost) Zero Bandwidth? A Breakdown of Multi-Modal Semantic Communication with Intelligent Metasurfaces
- When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio
- [SSF2020] Blurring to Compress Better: Google's Scale-Space Flow for Video Compression
- [AutomaticWeightedLoss] Stop Tuning Your Losses: How Uncertainty Can Automatically Balance Multi-Task Learning Models