ASRU2023 ML-SUPERB Challenge
Abstract
This challenge
- Benchmarks the generalizability of Self-Supervised Learning (SSL) speech models in multilingual scenarios.
- Introduces a new evaluation framework that combines S3PRL and ESPnet.
- Focuses on four tasks
- Monolingual: ASR
- Multilingual: ASR, LID, ASR + LID
- Has three tracks
- ML-SUPERB Track:
- Two leaderboards for submissions
- Public-set: Participants will use publicly available datasets and submit their prediction files.
- Hidden-set: Participants will use data collected specifically for the new-language track and test their models on this unseen data.
- New-language Track: Participants can submit additional ML-SUPERB-style data to the benchmark, facilitating the evaluation of SSL models on new languages.
- Research Paper Track: Participants are encouraged to submit research papers that utilize the ML-SUPERB evaluation framework for their experiments and analysis.
Resources
Track1: ML-SUPERB Track
The ML-SUPERB track of the challenge is designed for participants to submit their self-supervised models to the ML-SUPERB benchmark.
We offer two leaderboards for submission:
- Public-set
- Brief information: The public dataset is publicly available, and participants are required to submit prediction files for public-benchmark submissions.
- Evaluation specifics:
- Fixed downstream model architectures (CTC-based transformer network).
- Frozen upstream SSL model.
- No limitations on downstream training except for the above constraints (e.g., hyperparameters, optimizers, etc., are free to choose).
- Hidden-set: Data for this set will be collected by the New-Language track and will be used to test the candidate models.
- The organizers will try different learning rates for each downstream task while keeping all other hyperparameters fixed. The best results will be presented on the leaderboard.
Important note
The primary goal of this challenge is to encourage innovation and foster a less competitive environment.
- Participants have the freedom to choose whether they want their submissions to appear on the leaderboard. By default, submissions will not be displayed on the leaderboard.
- As the focus is on extending the SSL model to multilingual scenarios, continual pre-training from existing self-supervised models on speech (or other modalities) is also encouraged.
- In addition to model accuracy, metrics such as the number of parameters and operations will be considered to capture the computation efficiency of the proposed approaches. Algorithmic improvements from diverse perspectives are highly encouraged.
Submission to Track1
- Participants are required to submit an upstream model to the hidden-set benchmark.
- (Optional but strongly recommended) Participants are encouraged to submit a prediction file to the public-set benchmark. If not submitted, the submitted upstream model for the hidden-set will be used by the organizer to evaluate on the public-set benchmark.
- (Optional) System description paper
- To verify that the submitted upstream model follows the challenge policy, participants are encouraged to submit a system description paper in the ASRU submission format without a page limit. The paper should describe the method used for their submissions and include, at minimum, the following information:
- SSL objectives
- Model architecture
- Pre-training data
- Parameter size for each submission.
Track2: New-language Track
To enhance multilingual research, we have introduced a new-language track that allows participants to submit their own data sources as part of the benchmark.
- What data would be suitable for submission?
- There are generally no specific requirements for data submissions to this track as long as it is legally permissible for use in the challenge.
- We encourage submissions of data from either different languages, recording conditions, or speech types (e.g., read speech, conversational speech, lecture speech, etc.).
- Even if the language is already covered by the current ML-SUPERB public-set data, we still encourage submissions if the data has unique properties.
Two Options for Data Submission
We understand the sensitivity and potential legal issues surrounding some data when sharing it publicly. Therefore, we offer two options for data submission:
- Internal hidden-set evaluation only:
- By selecting this option, the submitted data will be exclusively used by the organizers for the hidden-set evaluation of the challenge and will not be included in future versions of ML-SUPERB as a public data source. If any legal agreements or signatures are required, please contact the organizers directly.
- Future version of ML-SUPERB:
- By selecting this option, the submitted data will not only be used for the hidden-set evaluation of this challenge but also considered for inclusion in future versions of the ML-SUPERB public benchmark.
- Please note that this option requires data holders' approvals for a free license for derivations and commercial use. We require permission for derivations to use the data in creating the future version of ML-SUPERB's public-set, and commercial permission to encourage participation from industry.
ML-SUPERB-style data
- The submission is expected to be a zipped file named
LanguageSubmission_[YourTeamName].zip
with the following structure:
- ML_SUPERB
- [Data_Source_Name1]
- [Lang_ID1]
- transcript_10min_dev.txt
- transcript_10min_test.txt
- transcript_10min_train.txt
- transcript_1h_train.txt
- wav
- [Data_Source_Name1]_[Lang_ID1]_000001.wav
- [Data_Source_Name1]_[Lang_ID1]_000002.wav
- ...
- [Lang_ID2]
- transcript_10min_dev.txt
- transcript_10min_test.txt
- transcript_10min_train.txt
- transcript_1h_train.txt
- wav
- [Data_Source_Name1]_[Lang_ID2]_000001.wav
- [Data_Source_Name1]_[Lang_ID2]_000002.wav
- ...
- ...
- The
transcript_{10min, 1h}_{train, dev, test}.txt
files should follow this format (fields separated by <TAB>):
[Data_Source_Name]_[Lang_ID]_000001 [Original_File (if available)] [Transcript]
For example:
LAD_eng_000257 rbu_z0001_371 Some of the countries have surveys for multiple years.
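The transcript format above can be parsed with a short sketch like the following. The function name `parse_transcript` is hypothetical, and it assumes the `[Original_File]` column may be absent, per the "(if available)" note:

```python
# Hypothetical sketch: parse an ML-SUPERB transcript file whose lines are
# tab-separated as [Utterance_ID] [Original_File] [Transcript].

def parse_transcript(path):
    """Return a dict mapping utterance IDs to (original_file, transcript)."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            # The Original_File column is optional, so allow 2 or 3 fields.
            if len(fields) == 3:
                utt_id, orig_file, text = fields
            elif len(fields) == 2:
                utt_id, text = fields
                orig_file = None
            else:
                raise ValueError(f"Malformed line: {line!r}")
            entries[utt_id] = (orig_file, text)
    return entries
```

Running it over the example line above would yield one entry keyed by `LAD_eng_000257`.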
- The audio files should be in
- 16-bit signed-integer PCM (WAV) format
- Single channel (mono)
- 16000 Hz sampling rate
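These audio requirements can be checked before packaging with the Python standard library's `wave` module. The helper name `check_wav_format` is hypothetical:

```python
# Hypothetical sketch: verify that a WAV file matches the required format
# (16-bit signed PCM, mono, 16 kHz) using only the Python standard library.
import wave

def check_wav_format(path):
    """Return a list of problems; an empty list means the file is compliant."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"sample width is {8 * w.getsampwidth()}-bit, expected 16-bit")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getframerate() != 16000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 16000 Hz")
    return problems
```

Note that `wave` only opens uncompressed PCM WAV files, so a file that opens at all is already PCM; non-compliant files can be converted with standard tools before submission.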
Submission to Track2
- The data submission should adhere to the current ML-SUPERB style of data, including:
- Three 10-minute sets (for train/dev/test) and a 1-hour set (for train). There should be no overlap across different sets.
- The transcription for train/dev/test should not overlap.
- We understand that special treatment may be required for certain transcriptions. If there are any specific considerations, please communicate them via email. Otherwise, we will follow the default processing scripts provided in the codebase. For example, for Mandarin and Japanese, we may prefer to use phonemized transcriptions as it may not be possible to cover all symbolic characters in the train set.
- Instead of directly sending the data to the organizers, we request participants to submit the data with a link for downloading (e.g., huggingface, Zenodo, Google Drive, Dropbox, or a direct URL compatible with wget). Please ensure that the data is accompanied by the appropriate license for its use in this challenge.
- Data description paper
- The submission to the new-language track requires a data description paper in ASRU format that:
- Describes the language used in the submitted data.
- Provides details about how the data was collected.
- Conducts an investigation of the submitted data in the ML-SUPERB monolingual ASR task by testing suggested (or more) SSL models from the resources list.
- Submission Form
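The no-overlap requirement between the train/dev/test sets can be checked with a sketch like the one below. The helper names (`read_ids`, `find_overlaps`) are hypothetical; it assumes the transcript filenames from the structure described earlier, and it does not flag overlap between the two train sets, since only cross-split (train vs. dev/test) overlap is prohibited:

```python
# Hypothetical sketch: check that no utterance ID appears in more than one
# split of a language directory, using the transcript filenames above.
import os

SPLITS = [
    "transcript_10min_train.txt",
    "transcript_10min_dev.txt",
    "transcript_10min_test.txt",
    "transcript_1h_train.txt",
]

def read_ids(path):
    """Collect the utterance IDs (first tab-separated field) from one file."""
    with open(path, encoding="utf-8") as f:
        return {line.split("\t")[0] for line in f if line.strip()}

def find_overlaps(lang_dir):
    """Return pairs of split files that share utterance IDs."""
    ids = {name: read_ids(os.path.join(lang_dir, name))
           for name in SPLITS if os.path.exists(os.path.join(lang_dir, name))}
    names = list(ids)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # The 10min and 1h train sets may legitimately share data,
            # so skip that pair; all other pairs must be disjoint.
            if "train" in a and "train" in b:
                continue
            if ids[a] & ids[b]:
                overlaps.append((a, b))
    return overlaps
```

An empty return value means the splits are disjoint and safe to package.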
Track3: Research Paper Track
The Research Paper Track is dedicated to accepting research papers that utilize the ML-SUPERB evaluation framework.
Submission to Track3
Participants are required to submit a research paper following the guidelines specified by ASRU. The submission should demonstrate the use of the ML-SUPERB evaluation framework within the paper.
Organizers
Shinji Watanabe (CMU)
Jiatong Shi (CMU)
William Chen (CMU)
Dan Berrebbi (CMU)
Hung-yi Lee (NTU)
Shang-Wen Li (Meta)
Abdelrahman Mohamed (Rembrand)
Hidden-set Committee
Jiatong Shi
William Chen
Dan Berrebbi
superb.announcement@gmail.com