LAVA
ACM MM 2026 Grand Challenge on
Large Vision–Language Model Learning and Applications
LAVA Challenge 2026 — Rio de Janeiro, Brazil  ·  10–14 November 2026
Overview

We are pleased to announce the 3rd LAVA Grand Challenge, to be held in conjunction with ACM Multimedia 2026. Building on the success of our previous challenges in 2024 and 2025, this year's edition introduces two major extensions:

  • Multilingual Expansion: While previous editions focused on Japanese, the 2026 challenge expands coverage to a broader set of languages, with a particular emphasis on low-resource and underrepresented languages.
  • Evidence-Grounded Answering: In addition to selecting the correct answer, participants are now required to provide evidence for their answer — such as the page number(s) where the supporting information can be found. This reflects the growing real-world demand for AI systems that can not only answer questions but also justify their responses with traceable references.

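As a purely hypothetical illustration (the official submission format has not yet been announced; field names and structure below are assumptions), an evidence-grounded answer might pair each selected option with the page numbers that support it:

```python
# Hypothetical sketch only: the official LAVA 2026 submission format is TBA.
# This illustrates the idea of an evidence-grounded answer record: the chosen
# answer option plus the PDF page(s) containing the supporting information.
import json


def make_answer_record(question_id: str, answer: str, evidence_pages: list[int]) -> str:
    """Serialize one evidence-grounded answer as a JSON line (illustrative format)."""
    record = {
        "question_id": question_id,            # assumed identifier field
        "answer": answer,                      # the selected answer option
        "evidence_pages": sorted(evidence_pages),  # pages that justify the answer
    }
    return json.dumps(record, ensure_ascii=False)


print(make_answer_record("q-0001", "B", [14, 12]))
```

The key point the requirement makes is traceability: a grader (or end user) can open the cited pages and verify the answer, which is what distinguishes evidence-grounded answering from answer selection alone.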
The challenge targets the document understanding capabilities of Vision-Language Models (VLMs) on multilingual PDF documents and invites researchers, engineers, and practitioners worldwide to participate.

Register your team for LAVA Challenge 2026 here!

Background

In recent years, the rapid advancement of Large Language Models (LLMs) has significantly expanded the range of natural language processing applications across daily life and business contexts. In particular, the emergence of Vision-Language Models (VLMs), which jointly process visual and textual inputs, has enabled systems to tackle more complex multimodal tasks.

However, processing and understanding practical documents such as PDF files remains a challenging task. Their structural complexity continues to hinder widespread adoption in real-world business settings. PDF documents present several key challenges:

  • Document Length: PDF files can span tens to hundreds of pages, making it difficult to extract relevant information across long contexts and synthesize correct answers.
  • Complex Layouts: Unlike plain text, PDF documents contain a rich variety of content, including tables, charts, and diagrams (e.g., line graphs, maps), which are often arranged in non-linear and complex layouts. Standard OCR-based approaches and text-only LLMs struggle to effectively process such visual structures. In many cases, multiple elements within the document must be referenced simultaneously, requiring sophisticated document-level understanding.

To address these challenges, specialized VLMs for document understanding — such as mPLUG-DocOwl — have recently been proposed. Moreover, general-purpose VLMs developed by organizations like OpenAI and Google are beginning to demonstrate improved performance on document images in addition to natural scenes, indicating growing versatility.

Nevertheless, these advancements are largely driven by the availability of English-language datasets. For Japanese and other non-English languages, usable and openly licensed PDF or document image datasets are still scarce, which contributes to the relatively low performance of VLMs in these languages. Specifically for Japanese, the only known VQA dataset for document images is JDocQA.

The 2026 challenge directly addresses this gap by extending the scope from Japanese to a broader range of languages, and by requiring models to not only answer questions correctly but also ground their answers with explicit evidence references — a capability increasingly demanded in enterprise and government applications.

Task Details

🔔 Details of the task will be announced at a later date. Please check back for updates or register your interest below.

📄 Publication Opportunity

  • 🏆The top 3 solutions will be invited to submit a paper to the Grand Challenge track — your chance to have your winning method published!
  • 📝Paper length: 6 pages + up to 2 additional pages for references only.
  • 📚Accepted Grand Challenge papers will be included in the ACM MM 2026 main conference proceedings.
  • 🎟️At least one main-conference full registration is required per accepted paper.
Important Dates

🔔 Specific dates will be announced at a later date. Please check back for updates or register your team in advance.

Event                                        Date
Registration opens                           TBA
Dataset release                              TBA
Registration deadline                        TBA
Results & report submission deadline         TBA
Paper submission deadline (by invitation)    TBA
Notification of results                      TBA
Camera-ready submission                      TBA
Grand Challenge at ACM MM 2026               TBA
Organizers
Duc Minh Vo
SB Intuitions, Japan
Akihiro Sugimoto
National Institute of Informatics, Japan
Hideki Nakayama
University of Tokyo, Japan
Khan Md Anwarus Salam
SoftBank, Japan
Daichi Sato
University of Tokyo, Japan
Takara Taniguchi
University of Tokyo, Japan
Kaito Baba
University of Tokyo, Japan

Contact: lava-workshop(at)googlegroups.com