

<?xml version="1.0" encoding="UTF-8"?>
<record>
  <title>Complexity-Invariant Rate-Distortion Gains of Transformer-Based Neural Image Codecs: A Stratified Evaluation Framework</title>
  <journal>Digital Signal Processing and Artificial Intelligence for Automatic Learning</journal>
  <author>Maleerat Maliyaem</author>
  <volume>5</volume>
  <issue>1</issue>
  <year>2026</year>
  <doi>https://doi.org/10.6025/dspaial/2026/5/1/18-31</doi>
  <url>https://www.dline.info/dspai/fulltext/v5n1/dspaiv5n1_2.pdf</url>
  <abstract>This study investigates the rate-distortion-complexity tradeoffs of modern neural image codecs, with emphasis
on practical deployment in resource-constrained environments such as edge and augmented reality devices.
While neural compression models often surpass classical standards (e.g., HEVC, VVC) in rate-distortion
performance, their high decoding complexity, particularly from autoregressive entropy models, hinders real-world
adoption. The authors address this by evaluating three representative architectures on the Kodak
dataset (24 natural RGB images, 768×512): a hyperprior baseline, an autoregressive context model, and a
transformer-based codec. To ensure robust analysis, images were objectively stratified into low-, medium-,
and high-complexity bins based on Sobel-based gradient energy.
Results demonstrate that transformer-based codecs achieve approximately 44% BD-rate improvement
over the hyperprior baseline, whereas autoregressive models yield approximately 30% savings. Critically,
these gains remain consistent across all complexity levels (variation &lt;1.5 percentage points), indicating
architectural robustness rather than content-specific optimization. At 0.62 bits per pixel, transformers deliver
a 2.5 dB PSNR advantage with visibly superior texture and edge preservation. All performance differences
were statistically significant (p &lt; 0.001). The findings underscore a paradigm shift from pure rate-distortion
optimization toward balanced rate-distortion-complexity design. Transformer architectures, with their
capacity for global context modeling, emerge as particularly promising for next-generation standards where
bandwidth efficiency and visual fidelity must coexist with computational constraints. The study establishes
a reproducible evaluation framework grounded in objective complexity metrics and rigorous statistical
validation, offering a methodological foundation for future codec development and benchmarking.</abstract>
</record>
