Multimodal AI
2025
10
- [Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent
- [CogAgent] An AI That Sees Your Screen Like You Do—And Can Use It For You
- [MVBench] Beyond Still Frames: The Benchmark Testing if AI Truly Understands Time in Videos
- [LISA++] Making Vision Models Talk and Point at the Same Time
- [Apriel-1.5-15B-Thinker] Smart Training, Not Bigger Models
- [Paper2Video] From Paper to Presentation in Minutes
- [LLaVA-CoT] Teaching AI to Think: Step-by-Step Visual Reasoning
- [MMMU] The AI 'College Exam' That Even Top Models Fail
- [LLaVA-1.5] How Simple Changes Created a State-of-the-Art Vision-Language Model
- When Tokens Talk Too Much: A Guide to Compressing AI Inputs from Images, Videos, and Audio