![[Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent](https://i.imgur.com/B8z4eT2.png)
# [Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent

## Paper at a Glance
- Paper Title: Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
- Authors: Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr
- Affiliation: University of Waterloo, National University of Singapore, University of Oxford
- Published in: arXiv, 2025
- Link to Paper: https://arxiv.org/abs/2505.21497
- Project Page: https://paper2poster.github.io
## The Gist of It: TL;DR
In one sentence: This paper introduces Paper2Poster, the first benchmark and metric suite for automatically generating academic posters from scientific papers, and proposes PosterAgent, a multi-agent system that uses a top-down, visual-feedback loop to transform long-form papers into concise, editable `.pptx` posters.
## Why It Matters: The Big Picture
Anyone in research knows the pre-conference scramble: your paper is accepted, and now you have to distill dozens of pages of dense text, figures, and tables into a single, visually compelling A0 poster. This task is more art than science, requiring summarization, design sense, and a knack for spatial arrangement. While AI has made strides in generating slide decks, posters remain a uniquely hard problem. A slide deck can spread information across many simple layouts, but a poster must condense everything onto one canvas, demanding a complex interplay of text and graphics without becoming a cluttered mess.
Current large language models (LLMs) and vision-language models (VLMs), on their own, are not up to the task. They struggle to reason about spatial constraints, leading to text overflowing its boundaries or poorly aligned elements. Furthermore, how do we even measure if an AI-generated poster is “good”?
This is where `Paper2Poster` comes in. The authors make two key contributions:
- A Benchmark: They create the first comprehensive dataset and evaluation framework specifically for the paper-to-poster task.
- An Agent: They build `PosterAgent`, a system that intelligently mimics the human workflow of creating a poster, from high-level planning down to fine-grained visual tweaks.
## The Core Idea: How It Works
`PosterAgent` breaks down the daunting task of poster creation into a structured, three-step pipeline that combines global planning with local, visually grounded refinement.
### 1. The Problem They’re Solving
The core challenge is multimodal context compression. A 20-page paper with 20,000 tokens and 20+ figures must be transformed into a single page with ~1,500 tokens and ~8 figures. This requires not just summarizing text, but also selecting the right visuals, arranging them logically, and ensuring the final layout is both readable and aesthetically pleasing. End-to-end models like GPT-4o can generate beautiful images, but as the study shows, the embedded text is often nonsensical, and the informational content gets lost.
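To make the compression budget concrete, here is a minimal sketch. The numbers are the approximate figures quoted above, not exact counts from the paper, and the helper is purely illustrative:

```python
# Rough sketch of the compression budget for the paper-to-poster task.
# The constants mirror the approximate figures quoted in the text.
PAPER_TOKENS, PAPER_FIGURES = 20_000, 20
POSTER_TOKENS, POSTER_FIGURES = 1_500, 8

def compression_ratios() -> tuple[float, float]:
    """Return (text, figure) compression ratios implied by the budget."""
    return PAPER_TOKENS / POSTER_TOKENS, PAPER_FIGURES / POSTER_FIGURES

text_ratio, figure_ratio = compression_ratios()
print(f"~{text_ratio:.0f}x text compression, {figure_ratio:.1f}x figure selection")
# → ~13x text compression, 2.5x figure selection
```

In other words, the agent must discard over 90% of the paper's text while keeping the poster self-contained, which is why naive summarization is not enough.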
### 2. The Key Innovation
The standout idea is the visual-in-the-loop, multi-agent framework. Instead of trying to generate the entire poster in one shot, `PosterAgent` acts like a team of specialists. A Parser organizes the content, a Planner sketches the layout, and a Painter-Commenter duo works iteratively to perfect each section, with the Commenter acting as a “critic” that provides visual feedback. This mirrors how a human designer would draft, review, and revise their work.
### 3. The Method, Step-by-Step
As illustrated in Figure 4 of the paper, the process unfolds in three stages:
- Parser (Global Organization): The agent first ingests the raw paper PDF. Using tools like MARKER, it converts the paper into Markdown and then uses an LLM to distill it into a structured “asset library.” This library contains paragraph-level summaries for each section (Introduction, Methods, etc.) and all the extracted figures and tables.
- Planner (Layout Generation): Next, the Planner acts as a high-level designer. It semantically matches each visual asset (e.g., a results graph) to its corresponding text summary. Then, based on the estimated length of the content for each section, it generates a binary-tree layout that maps out the poster’s panels, preserving reading order and ensuring a balanced composition.
- Painter-Commenter Loop (Local Refinement): This is where the magic happens. For each panel defined by the Planner:
  - The Painter takes the section summary and figure, distills the text into concise bullet points, and generates `python-pptx` code to render a draft of that panel.
  - The Commenter, a VLM, then “looks” at the rendered image of the panel. Using a “zoom-in” focus and guided by examples of good and bad layouts, it provides targeted feedback like “text is overflowing” or “layout is too blank.”
  - This feedback is sent back to the Painter, which revises the content and code. The loop continues until the Commenter signals that the panel is “good to go.”

This iterative process ensures each part of the poster is coherent and visually sound before the final, editable `.pptx` file is assembled.
## Key Experimental Results
The authors introduce a powerful new metric called PaperQuiz, where various VLMs act as “readers” with different expertise levels (from student to professor). These AI readers try to answer multiple-choice questions about the original paper based only on the generated poster. A higher score means the poster is better at conveying the core content.
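Conceptually, PaperQuiz boils down to mean accuracy over multiple-choice questions, averaged across reader models. A minimal sketch, where the reader is a stub standing in for a real VLM call and the question schema is assumed, not the paper's exact format:

```python
# Minimal sketch of a PaperQuiz-style score: each "reader" (a VLM in
# the paper, a plain function here) answers multiple-choice questions
# about the paper while seeing only the poster image; the score is
# mean accuracy over all (reader, question) pairs.
from typing import Callable

Question = dict  # assumed shape: {"question": ..., "options": [...], "answer": "B"}

def paperquiz_score(poster_image: bytes,
                    questions: list[Question],
                    readers: list[Callable[[bytes, Question], str]]) -> float:
    correct = sum(
        reader(poster_image, q) == q["answer"]
        for reader in readers
        for q in questions
    )
    return correct / (len(readers) * len(questions))

# Toy usage: a single "reader" that always answers "B".
always_b = lambda img, q: "B"
qs = [{"question": "?", "options": ["A", "B"], "answer": "B"},
      {"question": "?", "options": ["A", "B"], "answer": "A"}]
print(paperquiz_score(b"", qs, [always_b]))  # → 0.5
```

In the paper, the readers differ in simulated expertise (student through professor), so a single poster gets scored by how well it informs a whole range of audiences rather than one idealized reader.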
- Pixel-level generation is not enough: GPT-4o’s image generation (`4o-Image`) created visually appealing posters but scored poorly on `PaperQuiz` and had terrible text quality (high perplexity). This shows that aesthetics alone don’t make a good scientific poster.
- `PaperQuiz` is a robust metric: The scores from the `PaperQuiz` metric showed strong correlation with human evaluations (Figure 6), confirming it’s a reliable proxy for how effectively a poster communicates information.
- `PosterAgent` excels in communication and efficiency: `PosterAgent` consistently achieved the highest `PaperQuiz` scores, outperforming all other baselines (Table 2). The fully open-source version, `PosterAgent-Qwen`, surpassed more resource-intensive systems while using 87% fewer tokens. As shown in Figure 7, this translates to an astonishingly low cost of just $0.005 per poster.


## A Critical Look: Strengths & Limitations
### Strengths / Contributions
- A Foundational Benchmark: The `Paper2Poster` benchmark, and especially the `PaperQuiz` metric, provides the community with the first standardized way to measure progress on this complex task. This is a crucial contribution that will enable future research.
- Elegant and Effective Agent Design: The top-down `Parser -> Planner -> Painter/Commenter` architecture is a smart solution to the spatial-reasoning problem. The “visual-in-the-loop” refinement step is a clever mechanism for correcting layout errors that plague single-shot generation methods.
- Highly Practical and Accessible: By producing an editable `.pptx` file at an extremely low cost, the authors have created a tool with real-world utility for researchers. The open-source variant further democratizes this capability.
### Limitations / Open Questions
- Dependency on Conventional Paper Structure: The Parser’s success seems tied to the standard IMRAD (Introduction, Methods, Results, and Discussion) structure of scientific papers. It’s unclear how it would perform on papers with non-traditional formats or from different academic disciplines.
- The Aesthetic Gap: While the generated posters are functionally excellent, they remain visually generic (see Figure 8b). They lack the creative, high-impact visual design choices that distinguish the best human-made posters. The VLM-as-Judge “Engagement” score for `PosterAgent` still trails human-designed posters.
- Reliability of VLM Feedback: The framework’s success, particularly in the Painter-Commenter loop and in evaluation, hinges on the visual-reasoning capabilities of state-of-the-art VLMs. The paper notes that GPT-4o was a better “Commenter” than open-source alternatives, suggesting that the quality of this feedback loop may be a bottleneck.
Contribution Level: Significant Improvement. This paper carves out a new and important problem space for AI. While not introducing a new foundational model, it presents a highly effective system and, more importantly, a robust benchmark to measure success. The `Paper2Poster` framework and the `PosterAgent` solution together represent a major step forward in AI-powered scientific communication.
## Conclusion: Potential Impact
`Paper2Poster` and `PosterAgent` offer a glimpse into a future where researchers can offload tedious design tasks to intelligent AI assistants. This work moves beyond simple text generation to tackle a complex, multimodal, and layout-sensitive problem. By creating a practical, efficient, and open tool, the authors have not only advanced the state of generative AI but also provided a tangible benefit to the scientific community. The next steps will likely involve improving the aesthetic creativity of the agent and extending its capabilities to even more diverse and complex document types.
- Title: [Paper2Poster] This AI Agent Turns Your 22-Page Paper into a Conference Poster for Less Than a Cent
- Author: Jellyfish
- Created at: 2025-10-14 17:09:42
- Updated at: 2025-10-14 08:13:55
- Link: https://makepaperseasy.com/posts/20251014170942/
- License: This work is licensed under CC BY-NC-SA 4.0.