ASRU2023 ML-SUPERB Challenge
Abstract
This challenge
- Benchmarks the generalizability of Self-Supervised Learning (SSL) speech models in multilingual scenarios.
- Introduces a new evaluation framework that combines S3PRL and ESPnet.
- Focuses on four tasks
- Monolingual: ASR
- Multilingual: ASR, LID, ASR + LID
- Has three tracks
- ML-SUPERB Track:
- Two leaderboards for submissions
- Public-set: Participants will use publicly available datasets and submit their prediction files.
- Hidden-set: Participants will use data collected specifically for the new-language track and test their models on this unseen data.
- New-language Track: Participants can submit additional ML-SUPERB-style data to the benchmark, facilitating the evaluation of SSL models on new languages.
- Research Paper Track: Participants are encouraged to submit research papers that utilize the ML-SUPERB evaluation framework for their experiments and analysis.
Resources
Track1: ML-SUPERB Track
The ML-SUPERB track of the challenge is designed for participants to submit their self-supervised models to the ML-SUPERB benchmark.
We offer two leaderboards for submission:
- Public-set
- Brief information: The public dataset is publicly available, and participants are required to submit prediction files for public-benchmark submissions.
- Evaluation specifics:
- Fixed downstream model architectures (CTC-based transformer network).
- Frozen upstream SSL model.
- No limitations on downstream training except for the above constraints (e.g., hyperparameters, optimizers, etc., are free to choose).
- Hidden-set: Data for this set will be collected by the New-Language track and will be used to test the candidate models.
- The organizers will try different learning rates for each downstream task while keeping all other hyperparameters fixed. The best results will be presented on the leaderboard.
Important note
The primary goal of this challenge is to encourage innovation and foster a less competitive environment.
- Participants have the freedom to choose whether they want their submissions to appear on the leaderboard. By default, submissions will not be displayed on the leaderboard.
- As the focus is on extending the SSL model to multilingual scenarios, continual pre-training from existing self-supervised models on speech (or other modalities) is also encouraged.
- In addition to model accuracy, metrics such as the number of parameters and operations will be considered to capture the computation efficiency of the proposed approaches. Algorithmic improvements from diverse perspectives are highly encouraged.
Submission to Track1
- Participants are required to submit an upstream model to the hidden-set benchmark.
- (Optional but strongly recommended) Participants are encouraged to submit a prediction file to the public-set benchmark. If not submitted, the submitted upstream model for the hidden-set will be used by the organizer to evaluate on the public-set benchmark.
- (Optional) System description paper
- To verify that the submitted upstream model follows the challenge policy, participants are encouraged to submit a system description paper in the ASRU submission format without a page limit. The paper should describe the method used for their submissions and include, at minimum, the following information:
- SSL objectives
- Model architecture
- Pre-training data
- Parameter size for each submission.
Track2: New-language Track
To enhance multilingual research, we have introduced a new-language track that allows participants to submit their own data sources as part of the benchmark.
- What data would be suitable for submission?
- There are generally no specific requirements for data submissions to this track as long as it is legally permissible for use in the challenge.
- We encourage submissions of data from either different languages, recording conditions, or speech types (e.g., read speech, conversational speech, lecture speech, etc.).
- Even if the language is already covered by the current ML-SUPERB public-set data, we still encourage submissions if the data has unique properties.
Two Options for Data Submission
We understand the sensitivity and potential legal issues surrounding some data when sharing it publicly. Therefore, we offer two options for data submission:
- Internal hidden-set evaluation only:
- By selecting this option, the submitted data will be exclusively used by the organizers for the hidden-set evaluation of the challenge and will not be included in future versions of ML-SUPERB as a public data source. If any legal agreements or signatures are required, please contact the organizers directly.
- Future version of ML-SUPERB:
- By selecting this option, the submitted data will not only be used for the hidden-set evaluation of this challenge but also considered for inclusion in future versions of the ML-SUPERB public benchmark.
- Please note that this option requires data holders' approvals for a free license for derivations and commercial use. We require permission for derivations to use the data in creating the future version of ML-SUPERB's public-set, and commercial permission to encourage participation from industry.
ML-SUPERB-style data
- The submission is expected to be a zipped file named
LanguageSubmission_[YourTeamName].zip
with the following structure:
- ML_SUPERB
- [Data_Source_Name1]
- [Lang_ID1]
- transcript_10min_dev.txt
- transcript_10min_test.txt
- transcript_10min_train.txt
- transcript_1h_train.txt
- wav
- [Data_Source_Name1]_[Lang_ID1]_000001.wav
- [Data_Source_Name1]_[Lang_ID1]_000002.wav
- ...
- [Lang_ID2]
- transcript_10min_dev.txt
- transcript_10min_test.txt
- transcript_10min_train.txt
- transcript_1h_train.txt
- wav
- [Data_Source_Name1]_[Lang_ID2]_000001.wav
- [Data_Source_Name1]_[Lang_ID2]_000002.wav
- ...
- ...
- The
transcript_{10min, 1h}_{train, dev, test}.txt
files should follow this format (fields separated by <TAB>):
[Data_Source_Name]_[Lang_ID]_000001 [Original_File (if available)] [Transcript]
For example:
LAD_eng_000257 rbu_z0001_371 Some of the countries have surveys for multiple years.
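The transcript format above can be parsed with a short sketch like the following. The function name `parse_transcript` is hypothetical, and it assumes the `[Original_File]` column may be absent, per the "(if available)" note:

```python
# Hypothetical sketch: parse an ML-SUPERB transcript file whose lines are
# tab-separated as [Utterance_ID] [Original_File] [Transcript].

def parse_transcript(path):
    """Return a dict mapping utterance IDs to (original_file, transcript)."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            # The Original_File column is optional, so allow 2 or 3 fields.
            if len(fields) == 3:
                utt_id, orig_file, text = fields
            elif len(fields) == 2:
                utt_id, text = fields
                orig_file = None
            else:
                raise ValueError(f"Malformed line: {line!r}")
            entries[utt_id] = (orig_file, text)
    return entries
```

Running it over the example line above would yield one entry keyed by `LAD_eng_000257`.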
- The audio files should be in
- 16-bit signed-integer PCM (WAV) format
- Single channel (mono)
- 16000 Hz sampling rate
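These audio requirements can be checked before packaging with the Python standard library's `wave` module. The helper name `check_wav_format` is hypothetical:

```python
# Hypothetical sketch: verify that a WAV file matches the required format
# (16-bit signed PCM, mono, 16 kHz) using only the Python standard library.
import wave

def check_wav_format(path):
    """Return a list of problems; an empty list means the file is compliant."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"sample width is {8 * w.getsampwidth()}-bit, expected 16-bit")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getframerate() != 16000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 16000 Hz")
    return problems
```

Note that `wave` only opens uncompressed PCM WAV files, so a file that opens at all is already PCM; non-compliant files can be converted with standard tools before submission.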
Submission to Track2
- The data submission should adhere to the current ML-SUPERB style of data, including:
- Three 10-minute sets (for train/dev/test) and a 1-hour set (for train). There should be no overlap across different sets.
- The transcription for train/dev/test should not overlap.
- We understand that special treatment may be required for certain transcriptions. If there are any specific considerations, please communicate them via email. Otherwise, we will follow the default processing scripts provided in the codebase. For example, for Mandarin and Japanese, we may prefer to use phonemized transcriptions as it may not be possible to cover all symbolic characters in the train set.
- Instead of directly sending the data to the organizers, we request participants to submit the data with a link for downloading (e.g., huggingface, Zenodo, Google Drive, Dropbox, or a direct URL compatible with wget). Please ensure that the data is accompanied by the appropriate license for its use in this challenge.
- Data description paper
- The submission to the new-language track requires a data description paper in ASRU format that:
- Describes the language used in the submitted data.
- Provides details about how the data was collected.
- Conducts an investigation of the submitted data in the ML-SUPERB monolingual ASR task by testing suggested (or more) SSL models from the resources list.
- Submission Form
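The no-overlap requirement between the train/dev/test sets can be checked with a sketch like the one below. The helper names (`read_ids`, `find_overlaps`) are hypothetical; it assumes the transcript filenames from the structure described earlier, and it does not flag overlap between the two train sets, since only cross-split (train vs. dev/test) overlap is prohibited:

```python
# Hypothetical sketch: check that no utterance ID appears in more than one
# split of a language directory, using the transcript filenames above.
import os

SPLITS = [
    "transcript_10min_train.txt",
    "transcript_10min_dev.txt",
    "transcript_10min_test.txt",
    "transcript_1h_train.txt",
]

def read_ids(path):
    """Collect the utterance IDs (first tab-separated field) from one file."""
    with open(path, encoding="utf-8") as f:
        return {line.split("\t")[0] for line in f if line.strip()}

def find_overlaps(lang_dir):
    """Return pairs of split files that share utterance IDs."""
    ids = {name: read_ids(os.path.join(lang_dir, name))
           for name in SPLITS if os.path.exists(os.path.join(lang_dir, name))}
    names = list(ids)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # The 10min and 1h train sets may legitimately share data,
            # so skip that pair; all other pairs must be disjoint.
            if "train" in a and "train" in b:
                continue
            if ids[a] & ids[b]:
                overlaps.append((a, b))
    return overlaps
```

An empty return value means the splits are disjoint and safe to package.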
Track3: Research Paper Track
The Research Paper Track is dedicated to accepting research papers that utilize the ML-SUPERB evaluation framework.
Submission to Track3
Participants are required to submit a research paper following the guidelines specified by ASRU. The submission should demonstrate the use of the ML-SUPERB evaluation framework within the paper.
Organizers
Shinji Watanabe (CMU)
Jiatong Shi (CMU)
William Chen (CMU)
Dan Berrebbi (CMU)
Hung-yi Lee (NTU)
Shang-Wen Li (Meta)
Abdelrahman Mohamed (Rembrand)
Hidden-set Committee
Jiatong Shi
William Chen
Dan Berrebbi
superb.announcement@gmail.com