2nd Edition of the Spoken CALL Shared Task

About the task

The Spoken CALL Shared Task is an initiative to create an open challenge dataset for speech-enabled CALL systems, jointly organised by the University of Geneva, the University of Birmingham, Radboud University and the University of Cambridge. The task is based on data collected from a speech-enabled online tool which has been used to help young Swiss German teens practise their English conversation skills. Items are prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task is to label pairs as “accept” or “reject”, accepting responses which are grammatically and linguistically correct, so as to match a set of hidden gold standard judgements as closely as possible. Resources are provided so that a scratch system can be constructed with a minimal investment of effort, and in particular without necessarily using a speech recogniser.

The first edition of the task was announced at LREC 2016, with training data released in July 2016 and test data in March 2017, and attracted 20 entries from 9 groups. Results, including seven papers, were presented at the SLaTE workshop in August 2017. Full details, including links to resources, results and papers, can be found on the Shared Task home page.

Following the success of the original task, we are organising a second edition. We will approximately double the amount of training data, provide new test data, and release improved versions of the accompanying resources. In particular, we will make generally available the open source Kaldi recogniser developed by the University of Birmingham, which achieved the best performance on the original task, together with versions of the training and test data pre-processed through this recogniser. Results will be presented in a special session at Interspeech 2018.

The task is now closed. The annotated test data is now available in the downloads tab and the anonymised task results in the results tab.

The papers published at Interspeech 2018 are now available in the Interspeech 2018 papers tab.

Schedule

~~late Oct, 2017~~

~~Release new resources and updated version of original training data~~

~~7 Feb 2018~~

~~Release test data~~

~~14 Feb 2018~~

~~Deadline for submission of entries~~

~~16 March 2018~~

~~Interspeech abstract submission deadline~~

~~23 Mar 2018~~

~~Interspeech submission deadline~~

2-6 Sep 2018

Presentation and discussion of results at Interspeech special session 10

Subscribe to the Newsletter

Your email address will never be shared with any third parties, and we will use it only to keep you updated on shared task resources, notifications and deadlines.

Shared task papers published at Interspeech 2018

Overview paper

Overview of the 2018 Spoken CALL Shared Task. Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik, Xizi Wei

Submitted systems

The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task. Dominik Jülg, Mario Kunstek, Cem Philipp Freimoser, Kay Berkling, Mengjie Qian
Liulishuo's System for the Spoken CALL Shared Task 2018. Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang, Yang Liu
An Optimization Based Approach for Solving Spoken CALL Shared Task. Mohammad Ateeq, Abualsoud Hanani, Aziz Qaroush
The University of Birmingham 2018 Spoken CALL Shared Task Systems. Mengjie Qian, Xizi Wei, Peter Jančovič, Martin Russell
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task. Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan, Aoife Cahill

Shared task results

The Liulishuo team submitted the highest scoring entry in the second edition of the Spoken CALL Shared Task.

Liulishuo's System for the Spoken CALL Shared Task 2018. Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang, Yang Liu

Results for anonymised submissions

Id	T	Pr	R	F	SA	IRej	CRej	D
LLL	Text	0.742	0.984	0.846	0.760	0.305	0.016	19.088
HHH	Speech	0.758	0.975	0.853	0.772	0.342	0.025	13.492
KKK	Text	0.777	0.967	0.862	0.787	0.399	0.033	11.965
GGG	Speech	0.773	0.967	0.859	0.782	0.381	0.033	11.424
III	Speech	0.764	0.967	0.853	0.774	0.364	0.033	10.909
FFF	Speech	0.893	0.936	0.914	0.871	0.689	0.064	10.764
DDD	Speech	0.896	0.935	0.915	0.873	0.700	0.065	10.714
BaselinePerfectRec	Text	0.961	0.913	0.936	0.907	0.889	0.087	10.256
EEE	Speech	0.885	0.924	0.904	0.856	0.669	0.076	8.804
OOO	Text	0.759	0.955	0.846	0.764	0.362	0.045	7.993
JJJ	Text	0.797	0.941	0.863	0.793	0.458	0.059	7.804
RRR	Text	0.842	0.920	0.880	0.823	0.592	0.080	7.397
QQQ	Text	0.840	0.916	0.876	0.818	0.588	0.084	7.001
MMM	Text	0.794	0.933	0.858	0.785	0.445	0.067	6.677
BBB	Text	0.882	0.889	0.886	0.832	0.673	0.111	6.079
AAA	Text	0.881	0.889	0.885	0.831	0.672	0.111	6.068
NNN	Text	0.798	0.921	0.855	0.783	0.470	0.079	5.971
CCC	Text	0.873	0.891	0.882	0.825	0.643	0.109	5.885
PPP	Text	0.802	0.912	0.853	0.784	0.497	0.088	5.648
Baseline	Text	0.916	0.855	0.884	0.834	0.777	0.145	5.343

T = track
Pr = precision
R = recall
F = F-measure
SA = scoring average
IRej = rejection rate on incorrect responses
CRej = rejection rate on correct responses
D = D-measure (differential response score)

The scores were calculated using the scoring script.

Test data

View second edition test data online

This data is also available as a csv file in the downloads tab.

Task instructions

Speech-processing task

In the speech version of the CALL shared task, each item consists of

an identifier
a German text prompt
an audio file containing an English language response

The data was collected from an online CALL tool used to help young Swiss German students improve their English fluency.

The task is to create software that will decide whether each response is appropriate (accept) or inappropriate (reject) in the context of the prompt. This will presumably require a combination of speech recognition and text processing methods. A response is considered appropriate if it both responds to the prompt in terms of meaning and is also correct English. For example, if the prompt is

"Frag: rote Stiefel"

("Ask for: red boots"), then "I would like some red boots" or "Red boots, please" are appropriate responses. "Give me brown boots" is inappropriate because it has the wrong meaning. "I wants red boots" is inappropriate because it is incorrect English.

The task is open-ended; there are many potentially appropriate responses to each prompt.

In this version of the task, no explicit attention is paid to quality of pronunciation.

Format of training data

The training data has been created in two tranches. The original data, created for the first edition of the task, was hand-annotated by three native speakers. The original speech task training release directory contains the following resources:

A set of 5221 audio files
A CSV file of metadata in the format (headers and example lines)

Id	Prompt	Wavfile	Transcription	language	meaning
11336	Frag: rote Stiefel	11336.wav	i'd like red boots	correct	correct
7068	Frag: Wie viel kostet es?	7068.wav	how many is it	incorrect	incorrect
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bills	incorrect	correct

There is one line for each audio file. The specific CSV format is UTF-8, tab separated.

The 'language' column contains the word "correct" if the response has been judged fully correct in terms of both language and meaning by the human annotators. The 'meaning' column contains the word "correct" if the response has been judged correct in terms of meaning, but not necessarily language.
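As a starting point, the metadata can be read with Python's standard csv module. The sketch below is only an illustration and is not part of the release: it assumes that fields are never quoted (hence QUOTE_NONE), and it derives a gold accept/reject label from the 'language' column, since only responses that are fully correct should be accepted. The helper names load_metadata and gold_judgement are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_metadata(stream: TextIO) -> List[Dict[str, str]]:
    """Hypothetical helper: read a UTF-8, tab-separated metadata file into row dicts."""
    # Assumption: fields are unquoted, so any quote characters are read literally.
    return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


def gold_judgement(row: Dict[str, str]) -> str:
    """Hypothetical helper: 'language' == 'correct' means correct in both
    language and meaning, which is the condition for accepting a response."""
    return "accept" if row["language"] == "correct" else "reject"


# The three example lines from the table above, held in memory for illustration.
SAMPLE = (
    "Id\tPrompt\tWavfile\tTranscription\tlanguage\tmeaning\n"
    "11336\tFrag: rote Stiefel\t11336.wav\ti'd like red boots\tcorrect\tcorrect\n"
    "7068\tFrag: Wie viel kostet es?\t7068.wav\thow many is it\tincorrect\tincorrect\n"
    "8774\tFrag: Ich möchte die Rechnung\t8774.wav\ti want the bills\tincorrect\tcorrect\n"
)

for row in load_metadata(io.StringIO(SAMPLE)):
    print(row["Id"], gold_judgement(row))
```

Run as shown, this prints 11336 accept, 7068 reject and 8774 reject. For a real metadata file, open it with open(path, encoding="utf-8", newline="") and pass the file handle to load_metadata.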

For the new data added in the second edition, we were able to leverage systems developed for the first edition to create an improved annotation process, in which each item was annotated both by machines and by humans. The metadata for the second edition contains an extra column, Trace, summarising the annotation information, and has been divided into three groups of descending reliability (A highest, C lowest). A brief summary of the annotation process is given in the Shared Task 2 release notes.

The second edition speech task training release directory contains the following resources:

A set of 6698 audio files
Three CSV files of metadata in the format (headers and example lines)

Id	Prompt	Wavfile	Transcription	language	meaning	Trace
11336	Frag: rote Stiefel	11336.wav	i'd like red boots	correct	correct	H: 3-1 M: 3-0
7068	Frag: Wie viel kostet es?	7068.wav	how many is it	incorrect	incorrect	H: 0-4 M: 0-0
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bills	incorrect	correct	H: 2-2 M: 0-3

Again, there is one line for each audio file. The specific CSV format is UTF-8, tab separated.

The 'language' column contains the word "correct" if the response has been judged fully correct in terms of both language and meaning. The 'meaning' column contains the word "correct" if the response has been judged correct in terms of meaning, but not necessarily language.
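The meaning of the Trace values is documented in the Shared Task 2 release notes rather than here, so the sketch below does not interpret the numbers. It only splits a value such as "H: 3-1 M: 3-0" into its H and M number pairs (presumably the human and machine annotation summaries). It assumes, from the example lines alone, that every Trace value has this layout. The function name parse_trace is hypothetical.

```python
import re
from typing import Dict, Tuple

# Layout inferred from the example lines only: "H: <int>-<int> M: <int>-<int>".
TRACE_RE = re.compile(r"H:\s*(\d+)-(\d+)\s+M:\s*(\d+)-(\d+)")


def parse_trace(trace: str) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: split a Trace value into its H and M number pairs,
    without assigning any meaning to the numbers."""
    match = TRACE_RE.fullmatch(trace.strip())
    if match is None:
        raise ValueError(f"unrecognised Trace value: {trace!r}")
    h1, h2, m1, m2 = (int(g) for g in match.groups())
    return {"H": (h1, h2), "M": (m1, m2)}


print(parse_trace("H: 2-2 M: 0-3"))  # {'H': (2, 2), 'M': (0, 3)}
```

Raising an error on unexpected values, rather than silently skipping them, makes it easy to notice if the real files use a layout that the example lines do not show.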

Format of test data

The speech task test release directory, released on 7 Feb 2018, contains the following resources:

A set of 1000 audio files
A CSV file of metadata in the format (headers and example lines)

Id	Prompt	Wavfile
11336	Frag: rote Stiefel	11336.wav
7068	Frag: Wie viel kostet es?	7068.wav
8774	Frag: Ich möchte die Rechnung	8774.wav

i.e. like the training data but without transcriptions, judgements or annotation information. The specific CSV format is UTF-8, tab separated.

Format of answers

Groups who wish to submit an entry to the shared task should submit a CSV file, produced by running their system over the test data. The format should be the same as the test data, but with an extra column called Judgement added in which the possible values are 'accept' and 'reject'. For example:

Id	Prompt	Wavfile	Judgement
11336	Frag: rote Stiefel	11336.wav	accept
7068	Frag: Wie viel kostet es?	7068.wav	reject
8774	Frag: Ich möchte die Rechnung	8774.wav	reject

There should be one line for each line in the test data. The specific CSV format is once more UTF-8, tab separated.

Answer spreadsheets should be submitted by email to johanna.gerlach@unige.ch and emmanuel.rayner@unige.ch.
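The sketch below shows one way to produce an answer file: it copies the test rows in their original order and appends a Judgement column. It is an illustration only. The name write_submission is hypothetical, and the sketch assumes that no field contains a tab or newline. Because it copies whatever columns the test rows have, the same helper would also work for the text task, whose test data has an extra RecResult column.

```python
import io
from typing import Dict, List, Sequence, TextIO

VALID_JUDGEMENTS = {"accept", "reject"}


def write_submission(rows: Sequence[Dict[str, str]],
                     judgements: Dict[str, str],
                     out: TextIO) -> None:
    """Hypothetical helper: write the test rows with an extra Judgement column.

    rows: test-data rows in original order, all with the same keys
          (e.g. from csv.DictReader).
    judgements: maps each Id to 'accept' or 'reject'.
    """
    if not rows:
        raise ValueError("no test rows to write")
    columns: List[str] = list(rows[0].keys())
    out.write("\t".join(columns + ["Judgement"]) + "\n")
    for row in rows:
        label = judgements[row["Id"]]  # KeyError if an item was left unjudged
        if label not in VALID_JUDGEMENTS:
            raise ValueError(f"invalid judgement {label!r} for Id {row['Id']}")
        out.write("\t".join([row[c] for c in columns] + [label]) + "\n")


test_rows = [
    {"Id": "11336", "Prompt": "Frag: rote Stiefel", "Wavfile": "11336.wav"},
    {"Id": "7068", "Prompt": "Frag: Wie viel kostet es?", "Wavfile": "7068.wav"},
]
buf = io.StringIO()
write_submission(test_rows, {"11336": "accept", "7068": "reject"}, buf)
print(buf.getvalue(), end="")
```

To write a real file, open it with open(path, "w", encoding="utf-8", newline="") so that the tab-separated lines are written without newline translation. Failing loudly on a missing or invalid judgement helps guarantee one output line per test line.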

Scoring metric

The metric used to score the results is based on three intuitions:

The system should reject incorrect answers as often as possible, and reject correct answers as seldom as possible.
The more pronounced the difference between the system's response to incorrect as opposed to correct answers, the more useful it will be.
Some system errors are more serious than others. In particular, it is worse for the system to accept a sentence which is incorrect in terms of meaning than it is to accept one which is correct in terms of meaning but incorrect in terms of language.

The metric is defined as follows (there is further discussion in §4.1 of the LREC 2016 paper and §5 of the SLaTE 2017 paper):

Each system response falls into one of five categories:

Correct Reject: the student's answer is incorrect, the system rejects.
Correct Accept: the student's answer is correct, the system accepts.
False Reject: the student's answer is correct, the system rejects.
Plain False Accept: the student's answer is correct in meaning but incorrect English, the system accepts.
Gross False Accept: the student's answer is incorrect in meaning, the system accepts.

Let CR, CA, FR, PFA and GFA be the number of utterances in each of the above categories, and let FA = PFA + k.GFA, where k is a weighting factor that gives gross false accepts more weight than plain false accepts. The differential response score D is then defined as the ratio of the weighted rejection rate on incorrect utterances to the rejection rate on correct utterances:

D = [CR/(CR + FA)] / [FR/(FR + CA)] = CR(FR + CA) / [FR(CR + FA)]

We will use D as the metric for evaluating the quality of systems competing in the shared task, with the weighting factor k set equal to 3.

Important: In order to prevent "gaming" of the metric, entries are required to reject at least 25% of all incorrect responses. This should not pose problems for normal systems.
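The definition of D translates directly into a few lines of code. The sketch below is an illustration, not the official scoring script. It uses the cross-multiplied form CR(FR + CA) / [FR(CR + FA)] with k = 3, and it checks the 25% rule under the assumption that "incorrect responses" means CR + PFA + GFA, counted without weighting. The class and function names, and the counts in the example, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for the five category counts."""
    cr: int   # Correct Reject
    ca: int   # Correct Accept
    fr: int   # False Reject
    pfa: int  # Plain False Accept
    gfa: int  # Gross False Accept


def d_measure(c: Counts, k: float = 3.0) -> float:
    """D = CR(FR + CA) / [FR(CR + FA)], with FA = PFA + k*GFA."""
    fa = c.pfa + k * c.gfa
    if c.fr == 0:
        # No correct answer was rejected; treated as infinite here (an assumption).
        return math.inf
    return c.cr * (c.fr + c.ca) / (c.fr * (c.cr + fa))


def rejects_enough_incorrect(c: Counts, minimum: float = 0.25) -> bool:
    """25% rule, assuming incorrect responses = CR + PFA + GFA (unweighted)."""
    incorrect = c.cr + c.pfa + c.gfa
    return incorrect > 0 and c.cr / incorrect >= minimum


example = Counts(cr=60, ca=80, fr=20, pfa=10, gfa=10)  # invented counts
print(d_measure(example))                 # 3.0
print(rejects_enough_incorrect(example))  # True
```

In the example, FA = 10 + 3 × 10 = 40, so the weighted rejection rate on incorrect utterances is 60/100 = 0.6, the rejection rate on correct utterances is 20/100 = 0.2, and D = 0.6/0.2 = 3.0.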

Recognition resources

A baseline recogniser for the task, built using the popular Kaldi platform, is available from the downloads tab.

Grammar resources

A sample grammar, based on the one in the app used to collect the data, is provided as part of the release. The grammar is in XML format, and associates each possible prompt with

a translation of the prompt into English
a set of possible responses.

A typical entry looks like this:

<prompt_unit>
 <prompt>Sag: Ich möchte am Montagmorgen abreisen</prompt>
 <translated_prompt>Ask for: I want to leave on monday morning</translated_prompt>
 <response>i need to leave on monday morning</response>
 <response>i need to leave on monday morning please</response>
 <response>i should like to leave on monday morning</response>
 <response>i should like to leave on monday morning please</response>
 <response>i want to leave on monday morning</response>
 <response>i want to leave on monday morning please</response>
 <response>i would like to leave on monday morning</response>
 <response>i would like to leave on monday morning please</response>
 <response>i'd like to leave on monday morning</response>
 <response>i'd like to leave on monday morning please</response>
</prompt_unit>

Important: the sample grammar is NOT INTENDED TO BE COMPLETE. As already noted, the task is open-ended.
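The grammar can be loaded into a prompt-to-responses map with Python's standard xml.etree.ElementTree module. The sketch below is an illustration only. The element names prompt_unit, prompt and response come from the entry above, but the enclosing <grammar> element in the sample is hypothetical, because the name of the real root element is not given here. The function name load_grammar is also hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set


def load_grammar(xml_text: str) -> Dict[str, Set[str]]:
    """Hypothetical helper: map each <prompt> text to the set of its <response> texts."""
    root = ET.fromstring(xml_text)
    grammar: Dict[str, Set[str]] = {}
    # iter() visits the root and all its descendants, so the name of the
    # element that wraps the prompt_unit elements does not matter.
    for unit in root.iter("prompt_unit"):
        prompt = (unit.findtext("prompt") or "").strip()
        responses = {(r.text or "").strip() for r in unit.findall("response")}
        grammar.setdefault(prompt, set()).update(responses)
    return grammar


# Abbreviated copy of the entry above; the <grammar> wrapper is hypothetical.
SAMPLE_XML = """<grammar>
 <prompt_unit>
  <prompt>Sag: Ich möchte am Montagmorgen abreisen</prompt>
  <translated_prompt>Ask for: I want to leave on monday morning</translated_prompt>
  <response>i want to leave on monday morning</response>
  <response>i would like to leave on monday morning please</response>
  <response>i'd like to leave on monday morning</response>
 </prompt_unit>
</grammar>"""

grammar = load_grammar(SAMPLE_XML)
print(sorted(grammar["Sag: Ich möchte am Montagmorgen abreisen"]))
```

For the released file, ET.parse(path).getroot() gives the root element, and the same loop applies. Because the grammar is incomplete, a response that is missing from the grammar is not necessarily wrong.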

Text-processing task

In the text version of the CALL shared task, each item consists of

an identifier
a German text prompt
an audio file containing an English language response
a text string representing the 1-best result of performing speech recognition on the audio file

The data was collected from an online CALL tool used to help young Swiss German students improve their English fluency.

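The task is to create software that will decide whether each response is appropriate (accept) or inappropriate (reject) in the context of the prompt. This will require some kind of text processing method. A response is considered appropriate if it both responds to the prompt in terms of meaning and is also correct English. For example, if the prompt is

"Frag: rote Stiefel"

("Ask for: red boots"), then "I would like some red boots" or "Red boots, please" are appropriate responses. "Give me brown boots" is inappropriate because it has the wrong meaning. "I wants red boots" is inappropriate because it is incorrect English.

The task is open-ended; there are many potentially appropriate responses to each prompt.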

Format of training data

The training data has been created in two tranches. The original data, created for the first edition of the task, was hand-annotated by three native speakers. The original text task training release directory contains the following resources:

A set of 5221 audio files
A CSV file of metadata in the format (headers and example lines)

Id	Prompt	Wavfile	RecResult	Transcription	language	meaning
11336	Frag: rote Stiefel	11336.wav	i'd like red boots	i'd like red boots	correct	correct
7068	Frag: Wie viel kostet es?	7068.wav	how many is it	how many is it	incorrect	incorrect
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bill	i want the bills	incorrect	correct

There is one line for each audio file. The specific CSV format is UTF-8, tab separated.

The second edition text task training release directory contains the following resources:

A set of 6698 audio files
Three CSV files of metadata in the format (headers and example lines)

Id	Prompt	Wavfile	RecResult	Transcription	language	meaning	Trace
11336	Frag: rote Stiefel	11336.wav	i'd like red boots	i'd like red boots	correct	correct	H: 3-1 M: 3-0
7068	Frag: Wie viel kostet es?	7068.wav	how many is it	how many is it	incorrect	incorrect	H: 0-4 M: 0-0
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bill	i want the bills	incorrect	correct	H: 2-2 M: 0-3

Again, there is one line for each audio file. The specific CSV format is UTF-8, tab separated.

Because speech recognition is often inaccurate, there may not always be enough information to make a correct decision. For example, the third utterance should be rejected, since the student replied with a grammatically incorrect sentence ("i want the bills"). However, the recogniser has corrected the error ("i want the bill"), so there is no way to determine this from the recognition result.
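In the training data, both RecResult and the human Transcription are available. One rough way to gauge how often the recognition result hides information in this way is to list the items where the two disagree. The sketch below is an illustration only. The normalisation it applies (lower-casing and collapsing whitespace) is an assumption, and a real comparison may need more. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace before comparing strings (an assumption)."""
    return " ".join(text.lower().split())


def recognition_mismatches(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: Ids whose RecResult differs from the Transcription."""
    return [row["Id"] for row in rows
            if normalise(row["RecResult"]) != normalise(row["Transcription"])]


# RecResult/Transcription pairs copied from the example lines above.
rows = [
    {"Id": "11336", "RecResult": "i'd like red boots", "Transcription": "i'd like red boots"},
    {"Id": "7068", "RecResult": "how many is it", "Transcription": "how many is it"},
    {"Id": "8774", "RecResult": "i want the bill", "Transcription": "i want the bills"},
]
print(recognition_mismatches(rows))  # ['8774']
```

Only item 8774 differs, and it is exactly the case described above, where the recogniser has hidden the student's error.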

Format of test data

The text task test release directory, released on 7 Feb 2018, contains the following resources:

A set of 1000 audio files
A CSV file of metadata in the format (headers and example lines)

Id	Prompt	Wavfile	RecResult
11336	Frag: rote Stiefel	11336.wav	i'd like red boots
7068	Frag: Wie viel kostet es?	7068.wav	how many is it
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bill

i.e. like the training data but without transcriptions or judgements. The specific CSV format is UTF-8, tab separated.

Format of answers

Groups who wish to submit an entry to the shared task should submit a CSV file, produced by running their system over the test data. The format should be the same as the test data, but with an extra column called Judgement added in which the possible values are 'accept' and 'reject'. For example:

Id	Prompt	Wavfile	RecResult	Judgement
11336	Frag: rote Stiefel	11336.wav	i'd like red boots	accept
7068	Frag: Wie viel kostet es?	7068.wav	how many is it	reject
8774	Frag: Ich möchte die Rechnung	8774.wav	i want the bill	reject

There should be one line for each line in the test data. The specific CSV format is once more UTF-8, tab separated.

Answer spreadsheets should be submitted by email to johanna.gerlach@unige.ch and emmanuel.rayner@unige.ch.

Scoring metric

The metric used to score the results is based on three intuitions:

The system should reject incorrect answers as often as possible, and reject correct answers as seldom as possible.
The more pronounced the difference between the system's response to incorrect as opposed to correct answers, the more useful it will be.
Some system errors are more serious than others. In particular, it is worse for the system to accept a sentence which is incorrect in terms of meaning than it is to accept one which is correct in terms of meaning but incorrect in terms of language.

The metric is defined as follows (there is further discussion in §4.1 of the LREC 2016 paper and §5 of the SLaTE 2017 paper):

Each system response falls into one of five categories:

Correct Reject: the student's answer is incorrect, the system rejects.
Correct Accept: the student's answer is correct, the system accepts.
False Reject: the student's answer is correct, the system rejects.
Plain False Accept: the student's answer is correct in meaning but incorrect English, the system accepts.
Gross False Accept: the student's answer is incorrect in meaning, the system accepts.

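Let CR, CA, FR, PFA and GFA be the number of utterances in each of the above categories, and let FA = PFA + k.GFA, where k is a weighting factor that gives gross false accepts more weight than plain false accepts. The differential response score D is then defined as the ratio of the weighted rejection rate on incorrect utterances to the rejection rate on correct utterances:

D = [CR/(CR + FA)] / [FR/(FR + CA)] = CR(FR + CA) / [FR(CR + FA)]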

We will use D as the metric for evaluating the quality of systems competing in the shared task, with the weighting factor k set equal to 3.

Important: In order to prevent "gaming" of the metric, entries are required to reject at least 25% of all incorrect responses. This should not pose problems for normal systems.

Grammar resources

A sample grammar, based on the one in the app used to collect the data, is provided as part of the release. The grammar is in XML format, and associates each possible prompt with

a translation of the prompt into English
a set of possible responses.

A typical entry looks like this:

<prompt_unit>
 <prompt>Sag: Ich möchte am Montagmorgen abreisen</prompt>
 <translated_prompt>Ask for: I want to leave on monday morning</translated_prompt>
 <response>i need to leave on monday morning</response>
 <response>i need to leave on monday morning please</response>
 <response>i should like to leave on monday morning</response>
 <response>i should like to leave on monday morning please</response>
 <response>i want to leave on monday morning</response>
 <response>i want to leave on monday morning please</response>
 <response>i would like to leave on monday morning</response>
 <response>i would like to leave on monday morning please</response>
 <response>i'd like to leave on monday morning</response>
 <response>i'd like to leave on monday morning please</response>
</prompt_unit>

Important: the sample grammar is NOT INTENDED TO BE COMPLETE. As already noted, the task is open-ended.

Baseline system resource

A Python3 script which carries out a baseline version of the text task is provided as part of the release. The script reads the sample XML grammar and a training data spreadsheet, then scores each item in the spreadsheet by matching the prompt and recognition result against the appropriate record in the grammar. If the recognition result is listed in the grammar as a possible response for the prompt, it is accepted, otherwise it is rejected. The results are written out as a new spreadsheet.

The files used (resource grammar, input spreadsheet and output spreadsheet) are defined at the top of the script.

Note that the script does not run under Python 2.x.
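The matching rule described above can be sketched in a few lines. This is a re-implementation of the described behaviour for illustration, not the released script. It takes the grammar as a prompt-to-responses dictionary and uses exact string matching after stripping surrounding whitespace, which may differ from whatever normalisation the released script applies. The function name baseline_judgement is hypothetical.

```python
from typing import Dict, Set


def baseline_judgement(prompt: str, rec_result: str,
                       grammar: Dict[str, Set[str]]) -> str:
    """Hypothetical helper: accept iff the recognition result is listed
    as a possible response for the prompt."""
    responses = grammar.get(prompt.strip(), set())  # unknown prompt -> reject
    return "accept" if rec_result.strip() in responses else "reject"


# Tiny hand-made grammar for illustration (prompt -> set of responses).
grammar = {
    "Sag: Ich möchte am Montagmorgen abreisen": {
        "i want to leave on monday morning",
        "i'd like to leave on monday morning please",
    }
}
print(baseline_judgement("Sag: Ich möchte am Montagmorgen abreisen",
                         "i want to leave on monday morning", grammar))   # accept
print(baseline_judgement("Sag: Ich möchte am Montagmorgen abreisen",
                         "i wants to leave on monday morning", grammar))  # reject
```

Because the grammar is not intended to be complete, this rule rejects any correct response that the grammar does not list.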

Further information

The ideas behind the shared task are elaborated further in the following two papers:

Original paper: Baur, Claudia, Johanna Gerlach, Manny Rayner, Martin Russell, and Helmer Strik. (2016). "A Shared Task for Spoken CALL?". Proc LREC 2016, Portorož, Slovenia.

Summary of first edition: Baur, Claudia, Cathy Chua, Johanna Gerlach, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei. (2017). "Overview of the 2017 Spoken CALL Shared Task". Proc SLaTE 2017 workshop, Stockholm, Sweden.

If you have questions, please contact us at {johanna.gerlach,emmanuel.rayner}@unige.ch

Available downloads

Annotated test data

Test data with annotations (in the same format as the training data).

Speech-processing task test data

Test data will be released on Feb 7 2018

Speech processing task test data

Test data for the speech task. See task instructions tab for more info about the data format.

Test data audio files

Download this data set if you wish to participate in the speech-processing task. It includes all the audio files for the test data.

Text-processing task test data

Test data will be released on Feb 7 2018

Text processing task test data

Test data for the text processing task. See task instructions tab for more info about the data format.

Speech-processing task downloads

Training data audio files

Download this data set if you wish to participate in the speech-processing task. It includes all the audio files for the training data.

Training data for speech-processing task

Download this data set if you wish to participate in the speech-processing task. It includes three tab-separated metadata files with: prompt, wavfile, transcription, language judgement, meaning judgement.

Kaldi baseline system (522MB)

Baseline Kaldi system. This is system JJJ, which achieved the highest score in the first edition of the shared task; see "The University of Birmingham 2017 SLaTE CALL Shared Task Systems", Mengjie Qian, Xizi Wei, Peter Jančovič and Martin Russell.

Text-processing task downloads

Training data for text-processing task

Download this data set if you wish to participate in the text-processing task. It includes three tab-separated files with: prompt, recognition result (produced by the highest scoring system from the first edition of the shared task), transcription, language judgement, meaning judgement. The audio files can be downloaded separately under 'Speech-processing task downloads' if required.

Baseline system

Python3 system which carries out a baseline version of the text task by matching the prompt and recognition result against the appropriate record in the sample grammar.

Common downloads (resources usable for both tasks)

Reference grammar

Sample grammar in XML format which associates each possible prompt with a translation of the prompt into English and a set of possible responses.

Data from first edition of the task

The transcription scheme has been improved since the first task and all transcriptions have been updated accordingly. Please make sure to download the updated data below to get the new transcriptions.

First edition training data - audio files

Training data audio files.

First edition training data - metadata for speech-processing task

Updated version of the training metadata: tab-separated file with prompt, transcription, language judgement, meaning judgement

First edition training data - metadata for text-processing task

Updated version of the training metadata: tab-separated file with prompt, recognition result (produced by JJJ), transcription, language judgement, meaning judgement.

First edition test data - audio files

Test data audio files.

First edition test data - annotated

Updated version of the test metadata with transcriptions and language and meaning annotations.

Submitting task results

Results should be submitted by email to johanna.gerlach@unige.ch and emmanuel.rayner@unige.ch.

Please submit a CSV result file in the format specified in the task instructions tab.

Groups may submit up to three entries for each task. When ranking the results, only the best entry from each group will be included.

Entries may use any available material (not just the material in the Spoken CALL Shared Task v. 2 training release) for training and tuning. In particular, you may use material from the original Spoken CALL Shared Task training and test releases in any way you think appropriate, and there is no explicit development set.

As with the first edition of the Spoken CALL Shared Task, test set prompts will not necessarily all appear in the training set, but they will all be in the reference grammar.

Submission deadline: February 14, 2018, 23:59 CET.

Interactive demo

This tab presents a toy interactive demo showing how the shared task works. On each line, you can see a German text prompt, a link to an audio file with the student's response to the prompt, and a recognition result. The task is to decide whether the response is appropriate or not. The radio buttons in the "Accept/reject" column let you choose whether to accept or reject the student's response. The final columns (displayed when you press "Show score") give a transcription and gold standard judgements from native speakers of English. The "language" column indicates whether the response is fully correct, both in terms of meaning and in terms of being correct English. The "meaning" column only indicates whether the meaning is right, and is therefore a weaker criterion of correctness.

There are two versions of the task:

For the speech version, you should imagine that you are a system which includes a speech recogniser. You listen to the audio file and decide, on the basis of what you hear, whether to accept or reject. For example, the prompt for the first line is "Frag : Ticket zum Trafalgar Square" ("Ask for: ticket to Trafalgar Square"), and the audio file has the response "A ticket to the Trafalgar Square". This is incorrect (because of the superfluous "the"), so it should be rejected. The second line has the prompt "Frag : Zimmer für 7 Nächte" ("Ask for: room for 7 nights"), and the audio file has the response "hello". This is completely wrong and should be rejected. The third line has the same prompt as the second, and the audio file has the response "A room for seven nights". This is completely correct, so it should be accepted.
For the text version, you should imagine that the speech recognition has already been done for you. You should NOT listen to the audio file, but make your decision only on the basis of what you see in the "Recognition result" column. If speech recognition was incorrect, you may not have enough information to make a good decision. For example, the second line has the prompt "Frag : Zimmer für 7 Nächte" ("Ask for: room for 7 nights"), and the recognition result is "hello". The obvious decision is to reject, which is correct. However, in the third line, the prompt is the same and the recognition result is "an room for seven nights". This looks slightly wrong, since the first word should be "a" rather than "an". In fact, there has been a recognition error: the student answered appropriately, and the correct decision is to accept.


The demo scores your selections by counting Correct Rejects (CR), Correct Accepts (CA), False Rejects (FR), Plain False Accepts (PFA) and Gross False Accepts (GFA), and then computing:

Weighted rejection rate on incorrect utterances = CR / (CR + PFA + k.GFA), with k = 3

Rejection rate on correct utterances = FR / (FR + CA)

Differential response score = Weighted rejection rate on incorrect utterances / Rejection rate on correct utterances

Organisers

(in alphabetical order)

Claudia Baur, FTI/TIM, Université de Genève

Andrew Caines, University of Cambridge

Cathy Chua, Independent researcher

Johanna Gerlach, FTI/TIM, Université de Genève

Mengjie Qian, Department of Electronic, Electrical and Systems Engineering, University of Birmingham

Manny Rayner, FTI/TIM, Université de Genève

Martin Russell, Department of Electronic, Electrical and Systems Engineering, University of Birmingham

Helmer Strik, Centre for Language Studies (CLS), Radboud University Nijmegen

Xizi Wei, Department of Electronic, Electrical and Systems Engineering, University of Birmingham
