Computer Vision
2025 (8)
- Teaching AI to Think: A Deep Dive into LLaVA-CoT's Step-by-Step Visual Reasoning
- Real-Time Video Rendering with 4D Gaussian Splatting
- Depth Anything: How 62 Million Unlabeled Photos Created a New State-of-the-Art Vision Model
- RT-DETR: The First End-to-End Detector to Outpace YOLO in Real-Time
- LLaVA-1.5: How Simple Changes Created a State-of-the-Art Vision-Language Model
- When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio
- Blurring to Compress Better: A Deep Dive into Google's Scale-Space Flow for Video Compression
- Stop Tuning Your Losses: How Uncertainty Can Automatically Balance Multi-Task Learning Models