Make Papers Easy

2026 5

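[VIS-SemCom] Why Doesn't Autonomous Driving Need a Clear View of the Sky? Understanding the New Paradigm of Importance-Aware Semantic Communication
[GenerativeJSCC] Goodbye, Blur! Using StyleGAN to Rescue Wireless Image Transmission at Extremely Low SNR
[DeepJSCC-l] Goodbye, 'Cliff Effect': How Deep Joint Source-Channel Coding Enables Multi-Level Adaptive Wireless Image Transmission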
[LeWM] How to Train Stable World Models from Pixels with Just Two Loss Terms
Fixing the Bjøntegaard Delta with Akima Interpolation

2025 30

[Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent
[CogAgent] An AI That Sees Your Screen Like You Do—And Can Use It For You
[MVBench] Beyond Still Frames: The Benchmark Testing if AI Truly Understands Time in Videos
[Deformable 3D Gaussians] Bringing 3D Gaussian Splatting to Life for Real-Time Dynamic Scenes
Making the Metaverse Real: How Semantic AI and Edge Computing Can Tame Holographic Video
[LISA++] Making Vision Models Talk and Point at the Same Time
[LISA] From 'Segment the Car' to 'Segment the Safest Place for a Toddler': LLMs Learn to Reason and See
Less is More: Recursive Reasoning with Tiny Networks
[ExGRPO] Teach LLMs to Learn from Experience
[Apriel-1.5-15B-Thinker] Smart Training, Not Bigger Models
[Paper2Video] From Paper to Presentation in Minutes
[TaTToo] Why Do LLMs Fail on Tables?
[DrivingDojo] Why Can't Self-Driving AIs Turn Left? The Dataset for Smarter World Models
[The Dragon Hatchling] A New AI Architecture Linking Transformers to the Brain
[1.58-Bit BitNet] The Era of 1-Bit LLMs Has Begun
[Wonder3D] From 2D Snap to 3D Asset in 3 Minutes Diffusion
Smarter, Not Louder: How LLMs Cut Multi-Agent Communication by 53% While Boosting Performance
Building the Brain of 6G: A Tutorial on Large AI Models and Agentic AI for Intelligent Communications
[LLaVA-CoT] Teaching AI to Think: Step-by-Step Visual Reasoning
Real-Time Video Rendering with 4D Gaussian Splatting
[Depth Anything] How 62 Million Unlabeled Photos Created a New State-of-the-Art Vision Model
From 1 to N: How Scaling AI Agents with 'Behavior Narratives' Unlocks Near-Human Performance
Sharing is Caring: How a 'Swarm' of Language Models Learns Faster by Sharing Experiences
[MMMU] The AI 'College Exam' That Even Top Models Fail
[RT-DETR] The First End-to-End Detector to Outpace YOLO in Real-Time
[LLaVA-1.5] How Simple Changes Created a State-of-the-Art Vision-Language Model
Sending Pictures with (Almost) Zero Bandwidth? A Breakdown of Multi-Modal Semantic Communication with Intelligent Metasurfaces
When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio
[SSF2020] Blurring to Compress Better: Google's Scale-Space Flow for Video Compression
[AutomaticWeightedLoss] Stop Tuning Your Losses: How Uncertainty Can Automatically Balance Multi-Task Learning Models