Fixing the Bjøntegaard Delta with Akima Interpolation
Paper at a Glance
- Paper Title: Beyond Bjøntegaard: Limits of Video Compression Performance Comparisons
- Authors: Christian Herglotz, Matthias Kränzler, Ruben Mons, André Kaup
- Affiliation: Multimedia Communications and Signal Processing, Friedrich-Alexander University Erlangen-Nürnberg (FAU)
- Published in: 2022 IEEE International Conference on Image Processing (ICIP)
- Link to Paper: arXiv:2202.12565
- Project Page: GitHub Repository
The Gist of It: TL;DR
In one sentence: This paper shows that the standard “Bjøntegaard Delta” (BD-rate) calculation can become significantly inaccurate when applied to modern metrics like VMAF or decoding energy, and proposes replacing the traditional interpolation methods with Akima interpolation, which keeps approximation errors below 1.5%.
Why It Matters: The Big Picture
For the past two decades, the “gold standard” for claiming one video codec is better than another has been the Bjøntegaard Delta (BD). This metric calculates the average bitrate difference between two codecs at the same quality level. If you claim your new algorithm saves 5% bitrate, you calculated that using BD-rate.
However, the original BD method was designed for Rate vs. PSNR (Peak Signal-to-Noise Ratio). Today, researchers evaluate codecs using diverse metrics: SSIM, VMAF, and even hardware-dependent factors like decoding energy. The problem? No one had checked whether the math behind BD, specifically the curve fitting, actually works for these new, often less smooth metrics. If the underlying math is shaky, a claimed “1% improvement” might actually be a calculation error. This paper audits the industry standard and proposes a fix to ensure future research is statistically valid.
The Core Idea: How It Works
The core of the paper investigates the “interpolation” step of the BD-rate calculation.
1. The Problem: Connect-the-Dots Gone Wrong
In video coding experiments, you usually generate only 4 data points (Rate-Distortion points) for a specific video sequence. To compare two codecs, you need to calculate the area between their curves. Since you only have 4 dots, you have to guess the shape of the line connecting them (interpolation).
Standard methods use Cubic Splines or PCHIP (Piecewise Cubic Hermite Interpolating Polynomial). The authors found that while these work okay for smooth PSNR curves, they can fail spectacularly for other metrics. For example, when plotting Decoding Energy vs. VMAF, a cubic spline might “overshoot,” creating a fictional curve that suggests the codec performs much better or worse than it actually does.
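The overshoot is easy to reproduce. Below is a toy sketch (illustrative data, not from the paper) comparing SciPy's three interpolators on a step-like curve with a flat run, the kind of shape a saturating VMAF or energy curve can produce:

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator, Akima1DInterpolator

# Step-like data with a flat run: flat, a jump, then flat again
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

cs = CubicSpline(x, y)             # classic cubic spline (not-a-knot)
pchip = PchipInterpolator(x, y)    # monotonicity-preserving piecewise cubic
akima = Akima1DInterpolator(x, y)  # Akima's oscillation-resistant fit

xs = np.linspace(0.0, 4.0, 401)
# The spline dips below the data minimum of 0, inventing a dip that isn't there
print(f"cubic spline minimum: {cs(xs).min():.3f}")
# Akima keeps the flat run flat
print(f"akima max deviation on flat run: {np.abs(akima(xs[xs <= 2.0])).max():.3f}")
```

PCHIP also avoids the dip here (it preserves monotonicity by construction); the paper's point is that Akima proved the most accurate overall across the tested metrics.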
2. The Innovation: Akima Interpolation
The authors propose switching to Akima Interpolation. This is a specific method of curve fitting that is less prone to oscillation. Unlike a cubic spline, which tries to be very smooth (sometimes creating wide loops to connect points), Akima fits a curve that is tighter to the local data points and respects sudden changes in direction without “wiggling.”
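Swapping Akima into the BD-rate pipeline is a one-line change with SciPy. The helper below is a hypothetical sketch, not the paper's released script; it follows the usual BD recipe of integrating log-rate over the overlapping quality range:

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator

def bd_rate_akima(rates_anchor, quality_anchor, rates_test, quality_test):
    """Average bitrate difference (%) of 'test' vs. 'anchor' at equal quality.
    Quality values (PSNR, SSIM, VMAF, ...) must be strictly increasing."""
    f_a = Akima1DInterpolator(quality_anchor, np.log10(np.asarray(rates_anchor, float)))
    f_t = Akima1DInterpolator(quality_test, np.log10(np.asarray(rates_test, float)))
    lo = max(min(quality_anchor), min(quality_test))  # overlapping quality range
    hi = min(max(quality_anchor), max(quality_test))
    # Mean log-rate gap over the shared interval (exact piecewise-cubic integral)
    avg_diff = (f_t.integrate(lo, hi) - f_a.integrate(lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Sanity check: halving the bitrate at every quality point gives -50%
print(bd_rate_akima([1, 2, 4, 8], [30, 35, 40, 45], [0.5, 1, 2, 4], [30, 35, 40, 45]))
```

The traditional variant would use `scipy.interpolate.CubicSpline` or `PchipInterpolator` in place of `Akima1DInterpolator`; everything else stays the same.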
3. The Methodology: Checking the Ground Truth
To prove this, the authors didn’t just guess. They established a ground truth:
- Dense Encoding: Instead of the standard 4 points, they encoded video sequences at virtually all available quality settings (Quantization Parameters 22 through 37). This gave them the “true” curve shape.
- Simulation: They then took only the standard 4 points from that dense set and tried to reconstruct the curve using different interpolation methods (Cubic Spline, PCHIP, Akima).
- Error Measurement: They measured the difference between the reconstructed curve and the actual dense data points (as shown in Figure 2 of the paper). This quantified exactly how wrong the traditional BD calculation could be.

Key Experimental Results
The authors evaluated these interpolation methods using HEVC and VVC codecs across various sequences.
- Finding 1: Traditional Methods Fail on New Metrics. When calculating BD-rate for SSIM vs. Bitrate, the standard Cubic Spline (CSI) method resulted in a maximum relative error of over 110% in extreme cases. This means the calculated performance was completely unreliable.
- Finding 2: PCHIP is Better, but not Best. PCHIP, which is often used in standardization documents, reduced errors significantly compared to basic splines but still showed errors up to ~13% for Energy metrics.
- Finding 3: Akima is the Most Robust. The proposed Akima interpolation consistently outperformed all other methods across all tested pairs (PSNR, SSIM, VMAF, Energy). It kept the mean interpolation error below 1.5% in all scenarios.
- Finding 4: The paper concludes that while PCHIP is generally stable, Akima is the safest bet for a universal interpolation method, especially when dealing with non-standard performance metrics like decoding energy.
A Critical Look: Strengths & Limitations
Strengths / Contributions
- Audit of a Foundational Tool: The paper scrutinizes a tool (BD-rate) that the entire industry takes for granted, highlighting hidden flaws that could invalidate research findings.
- Rigorous Ground Truth: By generating dense Rate-Distortion points (encoding at every QP), the authors provided an indisputable baseline for accuracy, rather than relying on theoretical assumptions.
- Practical Solution: They don’t just identify the problem; they provide a drop-in replacement (Akima) and even released Python scripts to help the community switch easily.
Limitations / Open Questions
- Threshold of Significance: The paper establishes an accuracy bound of roughly 1.5% for Akima. This implies that if a researcher finds a coding gain of only 0.5% or 1%, it might still be indistinguishable from interpolation noise, even with the new method.
- Limited Dataset: The evaluation used 6 sequences and 2 codecs (HM and VTM). While representative, edge cases in other content types (e.g., screen content, gaming) or different encoders (e.g., AV1, VP9) might exhibit different curve behaviors.
Contribution Level: Significant Improvement. This paper does not invent a new codec, but it refines the ruler used to measure all of them. It is an essential read for anyone conducting rigorous performance evaluations in video coding.
Conclusion: Potential Impact
This research serves as a crucial sanity check for the video compression community. As the industry moves toward AI-based compression and green computing (energy-aware coding), metrics are becoming more complex and less “smooth” than traditional PSNR. Adopting Akima-based BD-rate (or “ABD”) ensures that reported gains are real engineering victories, not just artifacts of bad math. Standardization bodies and researchers should consider updating their evaluation toolkits to reflect these findings.
- Title: Fixing the Bjøntegaard Delta with Akima Interpolation
- Author: Jellyfish
- Created at: 2026-01-20 17:17:21
- Updated at: 2026-01-20 08:26:27
- Link: https://makepaperseasy.com/posts/20260120171721/
- License: This work is licensed under CC BY-NC-SA 4.0.