LAVA
ACM MM 2025 Workshop and Grand Challenge on
Large Vision – Language Model Learning and Applications
Dublin, Ireland 27 - 31.10.2025
Workshop Schedule
Morning Session
  • --:-- - Opening Remarks
  • --:-- - Keynote Talk: Dr. Md. Mamunur Rashid (The King Abdulaziz Center for World Culture - Ithra) Cross-Modal Trust: Evaluating LVLMs for Safeguarding Health Information
  • --:-- - Janak Kapuriya: Enhancing Scientific Visual Question Answering via Vision-Caption aware Supervised Fine-Tuning
  • --:-- - Jiadong Yan (online): Few-shot Anomaly Detection based on Long Short Text Interactive Contrastive Learning
  • --:-- - Tun-Yuan Chang: Harvesting Temporal Correlation in Large Vision-Language Models: Using Pose Estimation as a Case Study
  • --:-- - Nam Nguyen Xuan (online): StructCon-ST: Connectivity-Aware Spatio-Temporal Fine-Grained Image Analysis
  • --:-- - Jun Wan (pre-recorded video): Hierarchical Temporal Views for Policy Optimization in Multimodal Video Reasoning
Afternoon Session
  • --:-- - Keynote Talk: Dr. Seitaro Shinagawa (SB Intuitions, online) Sarashina2-Vision: Toward Vision-Language Models for Understanding Japanese Figures and Conceptual/Explanatory Diagrams
  • --:-- - Daichi Sato: LAVA Grand Challenge Introduction
  • --:-- - SYSUpporter team: HEAR: A Holistic Extraction and Agentic Reasoning Framework for Document Understanding
  • --:-- - Woof team: AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings
  • --:-- - nsbsk team: Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering
  • --:-- - char team: Two-Stage Approach Using a Pretrained Language Model for Question Answering on Japanese Document Images
Accepted Papers
Workshop Proceedings
  1. Jiadong Yan, Quan Zhang, Yifan Zhou, Tianle Yang, Ke Zhang: Few-shot Anomaly Detection based on Long Short Text Interactive Contrastive Learning
  2. Anwar Dilawar Shaikh, Janak Kapuriya, Arnav Goel, Medha Hira, Apoorv Singh, Jay Saraf, Sanjana Sanjeev, Vaibhav Nauriyal, Avinash Anand, Zhengkui Wang, Rajiv Ratn Shah: Enhancing Scientific Visual Question Answering via Vision-Caption aware Supervised Fine-Tuning
  3. Tun-Yuan Chang, Kenneth Chandra, Cheng-Hsin Hsu: Harvesting Temporal Correlation in Large Vision-Language Models: Using Pose Estimation as a Case Study
Fast Track
  1. Phuoc-Nguyen Bui, Khanh-Binh Nguyen, Hyunseung Choo: Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models
Non-archived Papers
  1. Song-Li Wu: LAMDA: Leveraging Multi-Scale and Dynamic Alignment for Robust Referring Video Object Segmentation
  2. Jun Wan, Kexin Lv, An Guo: Hierarchical Temporal Views for Policy Optimization in Multimodal Video Reasoning
  3. Song-Li Wu: Bridging the Modal Gap: A Targeted Patch Refinement and Residual Preservation Framework for Efficient Referring Expression Segmentation
Call for Papers

We invite submissions on large vision-language models (LVLMs) to the Second Workshop on Large Vision – Language Model Learning and Applications (LAVA 2025). Accepted papers will be presented at the workshop and published in the ACM MM 2025 workshop proceedings. We accept short papers (non-archived) of up to 4 pages in ACM MM format, excluding references, and long papers (archived) of up to 8 pages in ACM MM format, excluding references. Submissions must adhere to the ACM MM submission policies.

Workshop topics include, but are not limited to:

  • Data preprocessing and prompt engineering in LVLMs
  • Training/Compressing LVLMs
  • Self-supervised and/or unsupervised, few-/zero-shot learning in LVLMs
  • Generative AI
  • Trustworthy/explainable LVLM learning
  • Security and privacy in LVLMs
  • LVLM evaluation and benchmarking
  • LVLMs for downstream tasks
  • LVLMs in virtual reality, mixed reality
  • Applications of LVLMs
  • LVLMs and other modalities
  • LVLMs for low resources
Submit your paper here.
Important Dates
  • Paper submission deadline: 2025/7/11 (extended from 2025/6/15)
  • ACM MM fast track submission: 2025/7/11
  • For fast track, please include the main conference reviews, your response, and the meta-review in your submission.
  • Acceptance notification: 2025/8/1
  • Camera-ready: 2025/8/11
  • Workshop date: 2025/10/27-28
Organizers
Duc Minh Vo
SB Intuitions, Japan
Huy H. Nguyen
SB Intuitions, Japan
Trung-Nghia Le
University of Science, Vietnam
Akihiro Sugimoto
National Institute of Informatics, Japan
Hideki Nakayama
University of Tokyo, Japan
Minh-Triet Tran
University of Science, Vietnam
Trung-Hieu Hoang
University of Illinois Urbana-Champaign, US

Contact: lava-workshop(at)googlegroups.com