The competition is now live!
The LAVA Challenge 2026 Kaggle page is officially open. Join now and compete!
Overview
We are pleased to announce the 3rd LAVA Grand Challenge, to be held in conjunction with ACM Multimedia 2026. Building on the success of our previous challenges in 2024 and 2025, this year's edition introduces two major extensions:
- Multilingual Expansion: While previous editions focused on Japanese, the 2026 challenge expands coverage to a broader set of languages, with a particular emphasis on low-resource and underrepresented languages.
- Evidence-Grounded Answering: In addition to selecting the correct answer, participants are now required to provide evidence for their answer — such as the page number(s) where the supporting information can be found. This reflects the growing real-world demand for AI systems that can not only answer questions but also justify their responses with traceable references.
The challenge targets the document understanding capabilities of Vision-Language Models (VLMs) on multilingual PDF documents and invites researchers, engineers, and practitioners worldwide to participate.
Task Details
This competition is a multilingual Document Visual Question Answering (Document VQA) task with evidence grounding. Given a PDF document and a question about its content, participants must:
- Answer the question by reading and understanding the document.
- Ground the answer — identify the page(s) of the PDF that contain the evidence needed to answer the question.
Each question requires reading one or more pages of the PDF and interpreting a variety of elements such as text paragraphs, tables, figures, and photographs. The dataset contains questions in Japanese and Vietnamese, reflecting the multilingual focus of this challenge.
Document VQA is a challenging task because it demands both visual understanding (interpreting the layout and structure of a rendered page) and language understanding (comprehending the question and formulating a correct answer). The evidence grounding requirement adds a further layer of difficulty: models must not only produce a correct answer but also justify it by pinpointing the exact page(s) from which the answer is derived.
Participants are encouraged to develop and evaluate Vision-Language Models (VLMs) or multimodal pipelines capable of handling multilingual, multi-page PDF documents in an open-ended question answering setting.
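As a purely illustrative sketch of the two requirements above (answering and grounding), a per-question prediction could be bundled as follows. The field names here are hypothetical; the actual submission format is defined on the Kaggle competition page.

```python
# Illustrative only: field names are hypothetical, not the official
# submission schema from the Kaggle competition page.
def make_prediction(question_id: str, answer: str, evidence_pages: list[int]) -> dict:
    """Bundle an answer with the 1-indexed PDF pages that support it."""
    return {
        "question_id": question_id,
        "answer": answer,
        "evidence_pages": sorted(set(evidence_pages)),  # deduplicate and sort
    }

pred = make_prediction("q001", "Hanoi", [5, 3, 5])
```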
📄 Publication Opportunity
- 🏆 The top 3 solutions will be invited to submit a paper to the Grand Challenge track — your chance to have your winning method published!
- 📝 Paper length: 6 pages + up to 2 additional pages for references only.
- 📚 Accepted Grand Challenge papers will be included in the ACM MM 2026 main conference proceedings.
- 🎟️ At least one main-conference full registration is required per accepted paper.
- 🎁 Tentative prize: The top 3 winning teams will each receive a conference fee waiver (one per team).
Evaluation Criterion
Each question is evaluated on two aspects: answer correctness (VQA Score) and evidence grounding (Grounding Score). The final score is the average of these two.
1. VQA Score
Answer correctness is evaluated using LLM-as-a-Judge (Gemma-3 1B), which determines whether a predicted answer is semantically equivalent to the ground truth. This approach tolerates minor variations in phrasing, formatting, and representation.
2. Grounding Score
The predicted and ground-truth evidence page numbers are compared as sets, and the score is computed from their overlap.
3. Overall Score
The overall score for each question is the average of the VQA Score and the Grounding Score.
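As intuition for the two-part scoring above, here is a minimal sketch. Using Jaccard overlap for the Grounding Score is our assumption, not the official definition; only the averaging of the two components is stated by the organizers.

```python
def grounding_score(pred_pages: set[int], gt_pages: set[int]) -> float:
    """Overlap between predicted and ground-truth evidence page sets.
    NOTE: Jaccard similarity is an assumption; the official metric may differ."""
    if not pred_pages and not gt_pages:
        return 1.0
    return len(pred_pages & gt_pages) / len(pred_pages | gt_pages)

def overall_score(vqa_score: float, grounding: float) -> float:
    """Per-question final score: the average of the two components."""
    return (vqa_score + grounding) / 2.0

# Correct answer (VQA = 1.0), one of two predicted pages is right (0.5).
score = overall_score(1.0, grounding_score({2, 3}, {3}))
```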
Rules
1. Use of Open Models and Data
Participants must use publicly available (open) models and datasets only.
If you create a new dataset specifically for this competition, you are required to:
- Publish it as a Kaggle Dataset.
- Explicitly announce its existence in the competition Discussion (Issues) tab, so all participants have equal access.
2. Inference Environment Constraints
⚙️ Requirement
Participants must ensure that their inference pipeline completes within 2 hours on a single A100 GPU (40 GB VRAM).
There is no restriction on training — you may use any hardware and time budget for training. The constraint applies to inference only.
Background & Rationale
Ideally, this competition would be hosted as a Code Competition to enforce a unified inference environment for all participants. However, due to Kaggle platform limitations, Code Competitions cannot be held under Community Competitions. As an alternative, we are standardizing the environment by specifying the above inference constraint.
The A100 (40 GB) constraint is based on the hardware the organizers will use to verify submitted code. We understand that not all participants own an A100, but because GPU performance varies significantly across generations, we had to set the constraint relative to the organizers' verification environment.
💡 If You Do Not Have an A100
- Look up the approximate performance ratio between your GPU and an A100.
- Scale the 2-hour budget accordingly: runtime grows as speed drops, so a slower GPU gets a proportionally larger wall-clock allowance (e.g., if your GPU is roughly half as fast as an A100, the same pipeline may take up to ~4 hours on your GPU and still fit the 2-hour A100 budget).
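Put as arithmetic, the conversion is a single division; the speed ratio is something you look up for your own card (benchmark comparisons vary, so treat it as an estimate):

```python
def budget_on_my_gpu(a100_budget_hours: float, my_speed_vs_a100: float) -> float:
    """Translate the A100 inference budget into a wall-clock budget on another GPU.

    my_speed_vs_a100: your GPU's throughput relative to an A100
    (e.g. 0.5 if it is roughly half as fast). Runtime scales inversely
    with speed, so a slower GPU gets a proportionally larger budget.
    """
    return a100_budget_hours / my_speed_vs_a100

limit = budget_on_my_gpu(2.0, 0.5)  # half-speed GPU: up to 4.0 hours
```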
The 40 GB VRAM limit was chosen because it is neither too tight nor too loose — it should be achievable for most modern large models without requiring extreme optimization.
3. Code Submission for Top Finishers
After the competition ends, top-ranked participants are required to submit their code to the organizers for reproducibility verification.
To ensure reproducibility, please follow these practices:
- Set random seeds for all stochastic operations (model initialization, data shuffling, sampling, etc.).
- Use Docker to containerize your environment. You will be asked to submit a Dockerfile along with your code.
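As a starting point for the seeding practice above, a helper like the following covers the common sources of randomness. The torch calls apply only if your pipeline uses PyTorch, which is why they are guarded.

```python
import os
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness for reproducible runs."""
    # Affects hash randomization only in subprocesses launched afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # only relevant if your pipeline uses PyTorch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_seed(42)
```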
4. Dataset Licenses
The dataset used in this competition contains a mix of Japanese and Vietnamese text.
🇯🇵 Japanese Data
The Japanese PDF annotation data is released under the CC BY 4.0 license.
🇻🇳 Vietnamese Data
The Vietnamese data is primarily sourced from Viet Nam Government News, the Viet Nam Government Portal, Vietnam News Agency, and other copyrighted sources. All content remains fully protected under copyright law.
Participants may freely access, view, cite, download, and print the materials for reference purposes. However, altering or modifying any content or images in any form is strictly prohibited. If you republish or redistribute any information, you must clearly attribute the original source (e.g., "Government Portal", "Viet Nam Government News", or link to www.chinhphu.vn).
© Viet Nam Government Portal. All rights reserved.
© Viet Nam Government News – Viet Nam Government Portal. All rights reserved.
© Vietnam News Agency. All rights reserved.
Important Dates
| Event | Date |
|---|---|
| Dataset release on Kaggle page | 2026/4/22 (passed) |
| Challenge closes | 2026/5/31 |
| Results, report, and Docker container submission deadline | 2026/6/7 |
| Paper submission deadline | 2026/6/25 |
| Notification of results | 2026/7/16 |
| Camera-ready submission | 2026/8/6 |
| Grand Challenge at ACM MM 2026 | TBD |
Presentation Policy
⚠️ On-site Attendance Required
ACM Multimedia 2026 is an on-site event only: all papers and contributions must be presented in person by one of the authors, and remote presentations will not be hosted or allowed. Papers and contributions not presented on-site will be treated as no-shows and removed from the conference proceedings. More details will be provided for handling unfortunate situations in which none of the authors is able to attend the conference physically.
Organizers
Contact: lava-workshop(at)googlegroups.com