- Sharing is Caring: How a 'Swarm' of Language Models Learns Faster by Sharing Experiences
  Paper at a Glance Paper Title: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Authors: Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Philli...
- Is GPT-4V a True Expert? A Deep Dive into MMMU, the AI 'College Exam' That Even Top Models Fail
  Paper at a Glance Paper Title: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, ...
- RT-DETR: The First End-to-End Detector to Outpace YOLO in Real-Time
  Paper at a Glance Paper Title: DETRs Beat YOLOs on Real-time Object Detection Authors: Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen Affiliation: Baidu Inc., Peking University Published in: Conference on Computer Vision and Pattern Recognition (C...
- LLaVA-1.5: How Simple Changes Created a State-of-the-Art Vision-Language Model
  Paper at a Glance Paper Title: Improved Baselines with Visual Instruction Tuning Authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee Affiliation: University of Wisconsin-Madison, Microsoft Research Published in: CVPR 2024 Link to Paper: https://openaccess.thecvf.com//content/CVPR2024/html...
- Sending Pictures with (Almost) Zero Bandwidth? A Breakdown of Multi-Modal Semantic Communication with Intelligent Metasurfaces
  Paper at a Glance Paper Title: Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications Authors: Guojun Huang, Jiancheng An, Lu Gan, Dusit Niyato, Mérouane Debbah, and Tie Jun Cui Affiliation: University of Electronic Science and Technology of China, Nanyang Technological Univers...
- When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio
  Paper at a Glance Paper Title: When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios Authors: Kele Shao, Keda Tao, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, and Huan Wang. Affiliation: A collabor...
- Blurring to Compress Better: A Deep Dive into Google's Scale-Space Flow for Video Compression
  Paper at a Glance Paper Title: Scale-space flow for end-to-end optimized video compression Authors: Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici Affiliation: Google Research, Perception Team Published in: IEEE/CVF Conference on Computer Vision a...
- Stop Tuning Your Losses: How Uncertainty Can Automatically Balance Multi-Task Learning Models
  Paper at a Glance Paper Title: Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics Authors: Alex Kendall, Yarin Gal, Roberto Cipolla Affiliation: University of Cambridge Published in: Conference on Computer Vision and Pattern Recognition (CVPR), 2018 Link to Pa...