Make Papers Easy

Real-Time Video Rendering with 4D Gaussian Splatting

Paper at a Glance
Paper Title: 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang
Affiliation: Huazhong University of Science and Technology, Huawei Inc.
Published in:...
2025-10-06
CVPR 2024
Computer Vision | 3D Reconstruction | Dynamic Scenes
[Depth Anything] How 62 Million Unlabeled Photos Created a New State-of-the-Art Vision Model

Paper at a Glance
Paper Title: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
Affiliation: HKU, TikTok, CUHK, ZJU
Published in: Conference on Computer Vision and Pattern Recognition (CVPR...
2025-10-06
CVPR 2024
Computer Vision | Foundation Models | Monocular Depth Estimation
From 1 to N: How Scaling AI Agents with 'Behavior Narratives' Unlocks Near-Human Performance

Paper at a Glance
Paper Title: The Unreasonable Effectiveness of Scaling Agents for Computer Use
Authors: Gonzalo Gonzalez-Pumariega, Vincent Tu, Chih-Lun Lee, Jiachen Yang, Ang Li, Xin Eric Wang
Affiliation: Simular Research
Published in: arXiv 2025 (Preprint)
Link to Paper: https://arxiv.org/a...
2025-10-06
arXiv 2025
AI Agents | Large Language Models | Human-Computer Interaction
Sharing is Caring: How a 'Swarm' of Language Models Learns Faster by Sharing Experiences

Paper at a Glance
Paper Title: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Authors: Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Philli...
2025-10-05
arXiv 2025
Reinforcement Learning | Language Models | Distributed Systems
[MMMU] The AI 'College Exam' That Even Top Models Fail

Paper at a Glance
Paper Title: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, ...
2025-10-05
CVPR 2024
Artificial Intelligence | Multimodal AI | Foundation Models
[RT-DETR] The First End-to-End Detector to Outpace YOLO in Real-Time

Paper at a Glance
Paper Title: DETRs Beat YOLOs on Real-time Object Detection
Authors: Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen
Affiliation: Baidu Inc., Peking University
Published in: Conference on Computer Vision and Pattern Recognition (C...
2025-10-05
CVPR 2024
Computer Vision | Deep Learning | Transformers
[LLaVA-1.5] How Simple Changes Created a State-of-the-Art Vision-Language Model

Paper at a Glance
Paper Title: Improved Baselines with Visual Instruction Tuning
Authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Affiliation: University of Wisconsin-Madison, Microsoft Research
Published in: CVPR 2024
Link to Paper: https://openaccess.thecvf.com//content/CVPR2024/html...
2025-10-05
CVPR 2024
Computer Vision | Artificial Intelligence | Multimodal AI
Sending Pictures with (Almost) Zero Bandwidth? A Breakdown of Multi-Modal Semantic Communication with Intelligent Metasurfaces

Paper at a Glance
Paper Title: Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications
Authors: Guojun Huang, Jiancheng An, Lu Gan, Dusit Niyato, Mérouane Debbah, and Tie Jun Cui
Affiliation: University of Electronic Science and Technology of China, Nanyang Technological Univers...
2025-10-05
arXiv 2025
Artificial Intelligence | Wireless Communications | Signal Processing
When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio

Paper at a Glance
Paper Title: When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Authors: Kele Shao, Keda Tao, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, and Huan Wang
Affiliation: A collabor...
2025-10-04
arXiv 2025
Computer Vision | Multimodal AI | Natural Language Processing
[SSF2020] Blurring to Compress Better: Google's Scale-Space Flow for Video Compression

Paper at a Glance
Paper Title: Scale-space flow for end-to-end optimized video compression
Authors: Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici
Affiliation: Google Research, Perception Team
Published in: IEEE/CVF Conference on Computer Vision a...
2025-10-04
CVPR 2020
Video Compression | Computer Vision | Deep Learning