Paper at a Glance
Paper Title: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
Affiliation: HKU, TikTok, CUHK, ZJU
Published in: Conference on Computer Vision and Pattern Recognition (CVPR...
Paper at a Glance
Paper Title: The Unreasonable Effectiveness of Scaling Agents for Computer Use
Authors: Gonzalo Gonzalez-Pumariega, Vincent Tu, Chih-Lun Lee, Jiachen Yang, Ang Li, Xin Eric Wang
Affiliation: Simular Research
Published in: arXiv 2025 (Preprint)
Link to Paper: https://arxiv.org/a...
Paper at a Glance
Paper Title: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Authors: Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Philli...
Paper at a Glance
Paper Title: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, ...
Paper at a Glance
Paper Title: DETRs Beat YOLOs on Real-time Object Detection
Authors: Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen
Affiliation: Baidu Inc., Peking University
Published in: Conference on Computer Vision and Pattern Recognition (C...
Paper at a Glance
Paper Title: Improved Baselines with Visual Instruction Tuning
Authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Affiliation: University of Wisconsin-Madison, Microsoft Research
Published in: CVPR 2024
Link to Paper: https://openaccess.thecvf.com//content/CVPR2024/html...
Paper at a Glance
Paper Title: Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications
Authors: Guojun Huang, Jiancheng An, Lu Gan, Dusit Niyato, Mérouane Debbah, and Tie Jun Cui
Affiliation: University of Electronic Science and Technology of China, Nanyang Technological Univers...
Paper at a Glance
Paper Title: When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Authors: Kele Shao, Keda Tao, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, and Huan Wang
Affiliation: A collabor...
Paper at a Glance
Paper Title: Scale-Space Flow for End-to-End Optimized Video Compression
Authors: Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici
Affiliation: Google Research, Perception Team
Published in: IEEE/CVF Conference on Computer Vision a...
Paper at a Glance
Paper Title: Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
Authors: Alex Kendall, Yarin Gal, Roberto Cipolla
Affiliation: University of Cambridge
Published in: Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Link to Pa...