LAVA
ACCV Workshop on
Large Vision-Language Model Learning and Applications
December 8-12, 2024
Hanoi, Vietnam

There has been significant recent advancement in AI and machine learning, particularly with the development of large language models such as GPT. These models have demonstrated their prowess in harnessing extensive text data to achieve remarkable performance across various natural language understanding tasks. As a result, there is growing interest in extending these capabilities to additional modalities such as images, audio, and video, giving rise to large vision-language models (LVLMs) like GPT-4V. The LAVA workshop aims to unleash the full potential of research in LVLMs by emphasizing the convergence of diverse modalities, including text, images, and video. Furthermore, the workshop provides a platform for exploring the practical applications of LVLMs across a broad spectrum of domains, such as healthcare, education, entertainment, transportation, and finance.

Accepted Papers
Workshop track
  1. Duc-Tuan Luu, Viet-Tuan Le, Duc Minh Vo: Questioning, Answering, and Captioning for Zero-Shot Detailed Image Caption
  2. Rento Yamaguchi, Keiji Yanai: Exploring Cross-Attention Maps in Multi-modal Diffusion Transformers for Training-Free Semantic Segmentation
  3. Viet-Tham Huynh, Trong-Thuan Nguyen, Thao Thi Phuong Dao, Tam V. Nguyen, Minh-Triet Tran: DermAI: A Chatbot Assistant for Skin Lesion Diagnosis Using Vision and Large Language Models
  4. Felix Hsieh, Huy Hong Nguyen, April Pyone Maung Maung, Dmitrii Usynin, Isao Echizen: Mitigating Backdoor Attacks using Activation-Guided Model Editing
Challenge track
  1. (1st prize) Thanh-Son Nguyen, Viet-Tham Huynh, Van-Loc Nguyen, Minh-Triet Tran: An Approach to Complex Visual Data Interpretation with Vision-Language Models
  2. (2nd prize) Gia-Nghia Tran, Duc-Tuan Luu, Dang-Van Thin: Exploring Visual Multiple-Choice Question Answering with Pre-trained Vision-Language Models
  3. (3rd prize) Trong Hieu Nguyen Mau, Binh Truc Nhu Nguyen, Vinh Nhu Hoang, Minh-Triet Tran, Hai-Dang Nguyen: Enhancing Visual Question Answering with Pre-trained Vision-Language Models: An Ensemble Approach at the LAVA Challenge 2024
Call for Papers

We invite submissions on large vision-language models (LVLMs) to the LAVA workshop. Accepted papers will be presented at the workshop, and archival papers will be published in the ACCV workshop proceedings. We accept short papers (non-archival) of up to 7 pages in ACCV format, excluding references, and long papers (archival) of up to 14 pages in ACCV format, excluding references. Submissions must adhere to the ACCV submission policies.

The topics in this workshop will include but are not limited to:

  • Data preprocessing and prompt engineering in LVLMs
  • Training/Compressing LVLMs
  • Self-supervised, unsupervised, and few-/zero-shot learning in LVLMs
  • Generative AI
  • Trustworthy/explainable LVLM learning
  • Security and privacy in LVLMs
  • LVLMs evaluation and benchmarking
  • LVLMs for downstream tasks
  • LVLMs in virtual reality, mixed reality
  • Applications of LVLMs
  • LVLMs and other modalities
  • LVLMs for low-resource settings
Submit your paper here.
LAVA Challenge
Challenge Overview: The primary goal of this challenge is to advance the capability of large vision-language models to accurately interpret and understand complex visual data such as Data Flow Diagrams (DFDs), Class Diagrams, Gantt Charts, and Building Design Drawings. We invite AI researchers, data scientists, and practitioners with interest and experience in natural language processing, computer vision, and multimodal learning to join this workshop challenge. Participants can register as individuals or as teams, and are required to develop a model that answers questions about the input visual data; a baseline sketch is shown below.
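In practice, the task is visual question answering (VQA) over diagram images: given an image and a question, the model must produce the answer. The following is a minimal baseline sketch using an off-the-shelf open LVLM through the Hugging Face Transformers library; the checkpoint (llava-hf/llava-1.5-7b-hf), the prompt template, and the sample file/question are illustrative assumptions, not part of the challenge specification.

```python
# Minimal VQA baseline sketch for diagram images.
# The model choice and prompt template are assumptions for illustration,
# not the official challenge baseline.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed off-the-shelf open LVLM

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def answer(image_path: str, question: str) -> str:
    """Ask a single question about a single diagram image."""
    image = Image.open(image_path).convert("RGB")
    # LLaVA-1.5 expects an <image> placeholder inside its prompt.
    prompt = f"USER: <image>\n{question}\nASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output_ids = model.generate(**inputs, max_new_tokens=64)
    # The decoded string echoes the prompt; the answer follows "ASSISTANT:".
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical usage on a challenge-style sample:
# print(answer("gantt_chart.png", "Which task finishes last?"))
```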

Datasets:

  • Public dataset: We will release our dataset collected from the internet. It contains about 3,000 samples.
  • Private dataset: The TASUKI team (SoftBank) provides this dataset. It contains about 1,100 samples.
The data contain both English and Japanese text. You can download the data here. Please carefully read the Terms and Conditions for further information about the license, the data, and submission instructions.
Participants are required to submit their results to our evaluation system to receive scores.


Register your team information here. We will send the link to download the dataset to registered participants.

Metric:
We will evaluate submissions using the MMMU evaluation protocol.
Final score = 0.3 × public score + 0.7 × private score
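As a worked example of this weighting (a minimal sketch; the helper name is ours, not part of the official evaluation code):

```python
def final_score(public: float, private: float) -> float:
    """Weighted leaderboard score: public set counts 30%, private set 70%."""
    return 0.3 * public + 0.7 * private

# e.g. final_score(0.83, 0.84) -> 0.837, reported as 0.84 in the results below
```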
Results
Team Name  | Public score | Private score | Final score
WAS        | 0.85         | 0.85          | 0.85
MMLAB-UIT  | 0.83         | 0.84          | 0.84
V1olet     | 0.82         | 0.82          | 0.82

Prizes and Travel Grants: Travel grants are available for winning teams (one grant per team). Prizes will be announced later.
Computational Resources: Participants from the University of Tokyo may use SoftBank Beyond AI SANDBOX GPUs.

Submit your report here.
Submit your results here.
Important Dates
  • Challenge track opened: 2024/8/15
  • Test set released: 2024/8/30
  • Challenge track closed: 2024/9/30
  • Regular paper submission deadline: 2024/9/30
  • Challenge track paper submission deadline: 2024/10/15
  • Acceptance notification: 2024/10/18 (updated from 2024/10/30)
  • Camera-ready deadline: 2024/10/20 (updated from 2024/11/15)
  • Workshop date: 2024/12/8 (Afternoon)
Workshop Schedule
  • 13:30 - Opening Remarks
  • 13:40 - Keynote Talk 1
  • 14:40 - Poster session + Coffee break
  • 15:50 - Keynote Talk 2
  • 16:40 - Challenge Award
  • 16:55 - Closing Remarks
Speakers
Asim Munawar
IBM Research, US
April Pyone Maung Maung
National Institute of Informatics, Japan
Organizers
Duc Minh Vo
University of Tokyo, Japan
Huy H. Nguyen
National Institute of Informatics, Japan
Trung-Nghia Le
University of Science, Vietnam
Akihiro Sugimoto
National Institute of Informatics, Japan
Hideki Nakayama
University of Tokyo, Japan
Minh-Triet Tran
University of Science, Vietnam
Khan Md Anwarus Salam
SoftBank, Japan

Contact: lava-workshop(at)googlegroups.com
Technical Supporters
Duy-Nam Ly
University of Science, Vietnam
Trong-Le Do
University of Science, Vietnam
Daichi Sato
The University of Tokyo, Japan
TASUKI Team
SoftBank, Japan

Workshop Sponsors
BeyondAI