The First Workshop on Benchmarking and Expanding AI Multimodal Approaches at CVPR 2025

BEAM2025

Workshop Time: June 11, 8:30 AM - 12:30 PM

Workshop Location: Music City Center and Online (Hybrid) | Room 202 C

The event will feature discussions on the evolution and future of multimodal AI benchmarks, exploring methodologies and their real-world applications. It will conclude with a panel discussion offering insights into the challenges and solutions involved in creating robust multimodal benchmarks and into fostering interdisciplinary collaboration and innovation in AI benchmarking standards.

As a continuation of the 1st Workshop on Evaluation for Generative Foundation Models at CVPR 2024, the 1st Workshop on Benchmarking and Expanding AI Multimodal Approaches at CVPR 2025 aims to provide a forum to discuss ongoing efforts in industry and academia, share best practices, and engage the community in working towards a more comprehensive AI evaluation framework that incorporates audio, visual, and textual inputs and addresses the limitations of current unimodal benchmarks.

Workshop Schedule

Opening Remarks

8:30 - 8:40 EDT

Invited Talk 1

TBA
8:40 - 9:15 EDT

Invited Talk 2

TBA
9:15 - 9:50 EDT

Oral Presentation

9:50 - 10:05 EDT

Poster Session & Coffee Break

10:05 - 10:45 EDT

Invited Talk 3

TBA
10:45 - 11:20 EDT

Invited Paper Presentation from the Main Conference

TBA
11:20 - 11:40 EDT

Benchmark Promotion

11:40 - 11:45 EDT

Invited Talk 4

TBA
11:45 - 12:20 EDT

Closing Remarks

12:20 - 12:30 EDT


Invited Speakers

Huaxiu Yao

Assistant Professor at University of North Carolina at Chapel Hill

Andre Araujo

Staff Research Scientist / Tech Lead Manager at Google DeepMind

Sherry Tongshuang Wu

Assistant Professor at Carnegie Mellon University

Katerina Fragkiadaki

JPMorgan Chase Associate Professor at Carnegie Mellon University


Accepted Papers

  • Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models; Andrés Villa, Juan León, Alvaro Soto, Bernard Ghanem.

  • Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model; Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen.

  • Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models; Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S.-H. Gary Chan, Hongyang Zhang.

  • TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark; Forouzan Fallah, Maitreya Patel, Agneet Chatterjee, Vlad Morariu, Chitta Baral, Yezhou Yang.

  • Choosing 'Right' from Wrong: A Closer Look at Selection Bias in Spatial Multiple-Choice Questions in Large Multimodal Models; Giselle Zeno, Nour Jedidi, Steven Gomez.

  • Quantum Federated Learning for Multimodal Data: A Modality-Agnostic Approach; Atit Pokharel, Ratun Rahman, Thomas Morris, Dinh C. Nguyen.

  • Revisiting Multi-Modal LLM Evaluation; Jian Lu, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan.

  • MerCulture: A Comprehensive Benchmark to Evaluate Vision-Language Models on Cultural Understanding in Singapore; Pranav Tushar, Eshan Pandey, Lyka Diane Bala Austria, Loo Yin Yin, Jing Hao Lim, Indriyati Atmosukarto, Donny Soh Cheng Lock.

  • KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language; Yoonshik Kim, Jaeyoon Jung.


Program Committee

George Z. Wei

Website Chair & Dataset Chair

CMU

Avik Kuthiala

Dataset Chair

Plus


Organizers

Laszlo A. Jeni

CMU (Primary Contact)

Morteza Ziyadi

Amazon

Hao Yang

Amazon

Xu Zhang

Amazon (Primary Contact)

Yang Zou

Amazon

Zhaowei Cai

Amazon

Maria Zontak

Amazon

Davide Modolo

Amazon

Ashwin Swaminathan

Amazon

Liuyue Xie

CMU

Mosam Dabhi

CMU

Xiang Yue

CMU

Ce Zheng

CMU

Rohan Choudhury

CMU

Ananya Bal

CMU

© 2025 BEAM2025

We thank Jalpc for the Jekyll template