Benchmarks in Natural Language Processing (NLP)

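Benchmarks help assess the performance of pretrained language models across a range of tasks. A benchmark usually consists of one or more datasets for each task. Below is a list of NLP benchmarks, grouped into general, language-specific, and domain-specific benchmarks. In the tables, NLU stands for natural language understanding and NLG for natural language generation.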

General NLP benchmarks

| Benchmark | Category | Description, Paper and Link |
|---|---|---|
| GLUE | NLU | Benchmark for NLU. Paper and Link. |
| XGLUE | Cross-lingual NLU and NLG | Benchmark for cross-lingual NLU and NLG. Paper and Link. |
| SuperGLUE | NLU | Benchmark for NLU. Paper and Link. |
| LinCE | Code-switching | Benchmark for code-switching NLU. Paper and Link. |
| GENIE | NLG | Benchmark for NLG. Paper and Link. |
| Long Range Arena | Efficient Transformers | Benchmark for efficient Transformers. Paper and Link. |
| GEM | NLG | Benchmark for NLG. Paper and Link. |
| CodeXGLUE | Code understanding and generation | Benchmark for code intelligence. Paper and Link. |
| GLUECoS | Code-switching | Benchmark for code-switched NLP. Paper and Link. |
| DialoGLUE | Dialogue | Benchmark for task-oriented dialogue. Paper and Link. |
| XTREME | Cross-lingual NLU | Benchmark for cross-lingual NLU. Paper and Link. |

Language-specific NLP benchmarks

| Benchmark | Category | Description, Paper and Link |
|---|---|---|
| RussianSuperGLUE | Russian NLU | Benchmark for Russian NLU. Paper and Link. |
| IndicGLUE | Indic NLU | Benchmark for NLU in Indian languages. Paper and Link. |
| CLUE | Chinese NLU | Benchmark for Chinese NLU. Paper and Link. |
| IndoNLU | Indonesian NLU | Benchmark for Indonesian NLU. Paper and Link. |

Domain-specific NLP benchmarks

| Benchmark | Category | Description, Paper and Link |
|---|---|---|
| BLUE | Biomedical NLU | Benchmark for biomedical NLU. Paper and Link. |
| BLURB | Biomedical NLU | Benchmark for biomedical NLU. Paper and Link. |
| ChineseBLUE | Chinese biomedical NLU | Benchmark for Chinese biomedical NLU. Paper and Link. |
| PharmKG | Biomedical knowledge graph | Benchmark for biomedical knowledge graphs. Paper and Link. |