VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training
Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang
| Rank | Date | Model | TR R@1 | TR R@5 | TR R@10 | IR R@1 | IR R@5 | IR R@10 | NLVR2 Acc | VQA Acc | VG Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nov. 16, 2021 | X-VLM (ByteDance AI Lab) | 64.11 | 88.62 | 94.46 | 51.53 | 79.00 | 86.71 | 74.16 | 51.55 | 55.00 |
| 2 | Nov. 3, 2021 | METER (UCLA and Microsoft) | 62.11 | 87.47 | 92.23 | 45.66 | 75.12 | 85.28 | 73.47 | 53.76 | - |
| 3 | Jul. 16, 2021 | ALBEF (Salesforce Research) | 64.11 | 88.40 | 93.96 | 50.08 | 77.72 | 85.68 | 73.17 | 51.52 | 24.30 |
| 4 | Feb. 4, 2021 | VL-T5 (UNC Chapel Hill) | - | - | - | - | - | - | 73.84 | 46.31 | 27.89 |
| 5 | Sep. 25, 2019 | UNITER (Microsoft Dynamics 365 AI Research) | 36.32 | 63.81 | 75.13 | 29.73 | 56.00 | 66.93 | 66.60 | 47.79 | 36.86 |
| 6 | Aug. 20, 2019 | LXMERT (UNC Chapel Hill) | - | - | - | - | - | - | 65.24 | 46.18 | - |
| 7 | Aug. 6, 2019 | ViLBERT (Georgia Institute of Technology) | - | - | - | 27.51 | 53.07 | 63.87 | 66.53 | 48.38 | 54.91 |

TR/IR: text retrieval / image retrieval (Recall@K); NLVR2, VQA, VG: accuracy (%).
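For reference, the TR/IR Recall@K columns count a query as correct when its ground-truth match appears among the top-K retrieved candidates. Below is a minimal NumPy sketch of this standard metric; the function name `recall_at_k` and the diagonal ground-truth convention are illustrative assumptions, not the VLUE evaluation scripts.

```python
import numpy as np

def recall_at_k(sim: np.ndarray, ks=(1, 5, 10)):
    """Compute Recall@K from a query-by-candidate similarity matrix.

    Assumes the ground-truth match for query i is candidate i,
    the usual convention for paired image-text retrieval data.
    """
    n = sim.shape[0]
    # Rank candidates for each query by descending similarity.
    ranking = np.argsort(-sim, axis=1)
    # Position of the ground-truth candidate in each query's ranking.
    gt_rank = np.array([np.where(ranking[i] == i)[0][0] for i in range(n)])
    # Fraction of queries whose match lands in the top K, as a percentage.
    return {f"R@{k}": float(np.mean(gt_rank < k)) * 100 for k in ks}

# Toy usage: 4 paired queries/candidates with near-diagonal similarities.
rng = np.random.default_rng(0)
sim = np.eye(4) + 0.1 * rng.random((4, 4))
print(recall_at_k(sim))
```

This sketch only illustrates how the R@1/R@5/R@10 columns are defined; use the evaluation scripts released in the repo for official numbers.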
Submit to this leaderboard: You are welcome to test your vision-language models on the VLUE benchmark! The labeled OOD test sets and evaluation scripts are released in the repo. After obtaining results, email the VLUE team to have your model included in the VLUE Leaderboard. Your email should contain the information displayed in the leaderboard (i.e., a paper link/description and results on both the original and OOD test sets).
```bibtex
@article{zhou2022vlue,
  author        = {Wangchunshu Zhou and Yan Zeng and Shizhe Diao and Xinsong Zhang},
  title         = {VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training},
  journal       = {CoRR},
  volume        = {abs/2205.15237},
  year          = {2022},
  archivePrefix = {arXiv},
  eprint        = {2205.15237}
}
```