Recent advances in AI and machine learning, particularly the development of large language models (LLMs) such as GPT, have demonstrated how extensive text data can be harnessed to achieve remarkable performance across a wide range of natural language understanding tasks. As a result, there is growing interest in extending these capabilities to additional modalities such as images, audio, and video, giving rise to large vision-language models (LVLMs) such as GPT-4V. The LAVA workshop aims to unleash the full potential of research on LVLMs by emphasizing the convergence of diverse modalities, including text, images, and video. The workshop also provides a platform for exploring practical applications of LVLMs across a broad spectrum of domains, such as healthcare, education, entertainment, transportation, and finance.
We invite submissions of papers on large vision-language models (LVLMs) to the LAVA workshop. Accepted papers will be presented at the workshop, and archival papers will be published in the ACCV workshop proceedings. We accept short papers (non-archival) of up to 7 pages in ACCV format, excluding references, and long papers (archival) of up to 14 pages in ACCV format, excluding references. Submissions must adhere to the ACCV submission policies.
The topics in this workshop will include but are not limited to:
- Data preprocessing and prompt engineering in LVLMs
- Training and compressing LVLMs
- Self-supervised, unsupervised, and few-/zero-shot learning in LVLMs
- Generative AI
- Trustworthy and explainable LVLM learning
- Security and privacy in LVLMs
- LVLMs evaluation and benchmarking
- LVLMs for downstream tasks
- LVLMs in virtual reality, mixed reality
- Applications of LVLMs
- LVLMs and other modalities
- LVLMs for low-resource settings
Datasets:
- Public dataset: We will release a dataset collected from the internet, containing about 3,000 samples.
- Private dataset: Provided by the TASUKI team (SoftBank), containing about 1,100 samples.
Register your team information here. We will send the link to download the dataset to registered participants.
Metric:
Results on each dataset will be scored using the MMMU evaluation metric.
Final score = 0.3 × (public dataset score) + 0.7 × (private dataset score)
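For clarity, here is a minimal sketch of how the weighted final score above is combined; the function name and the example values are illustrative only and are not part of the official evaluation code.

```python
# Minimal sketch of the weighted final score described above.
# The function name and example values are illustrative, not official code.

def final_score(public_score: float, private_score: float) -> float:
    """Combine the per-dataset MMMU scores with the announced 0.3/0.7 weights."""
    return 0.3 * public_score + 0.7 * private_score

# Example: an MMMU score of 55.0 on the public set and 48.0 on the private set
# yields 0.3 * 55.0 + 0.7 * 48.0 = 50.1.
print(round(final_score(55.0, 48.0), 2))  # 50.1
```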
Prizes and Travel Grants: Travel grants are available for winning teams (one grant per team). Prizes will be announced later.
Computational Resources: Participants from the University of Tokyo may use SoftBank Beyond AI SANDBOX GPUs.
Submit your report here.
Submit your results here.
Important Dates:
- Challenge track opened: 2024/8/15
- Test set released: 2024/8/30
- Challenge track closed: 2024/9/30
- Regular paper submission deadline: 2024/9/30
- Challenge track paper submission deadline: 2024/10/15
- Acceptance notification: 2024/10/30
- Camera-ready deadline: 2024/11/15
- Workshop date: TBA
Contact: lava-workshop(at)googlegroups.com