Computer Vision
2025
10
- [Deformable 3D Gaussians] Bringing 3D Gaussian Splatting to Life for Real-Time Dynamic Scenes
- [LISA++] Making Vision Models Talk and Point at the Same Time
- [LISA] From 'Segment the Car' to 'Segment the Safest Place for a Toddler': LLMs Learn to Reason and See
- [DrivingDojo] Why Can't Self-Driving AIs Turn Left? The Dataset for Smarter World Models
- [Wonder3D] From 2D Snap to 3D Asset in 3 Minutes Diffusion
- [LLaVA-CoT] Teaching AI to Think: Step-by-Step Visual Reasoning
- Real-Time Video Rendering with 4D Gaussian Splatting
- [Depth Anything] How 62 Million Unlabeled Photos Created a New State-of-the-Art Vision Model
- [RT-DETR] The First End-to-End Detector to Outpace YOLO in Real-Time
- [LLaVA-1.5] How Simple Changes Created a State-of-the-Art Vision-Language Model