German Speech Corpus aligned with CTC segmentation

Alignments for Librivox and the Spoken Wikipedia Corpus (SWC), produced with CTC segmentation:

Dataset    Length   Speakers   Utterances
SWC        210h     363        78214
Librivox   804h     251        368532
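
As a rough sanity check on these figures, the average utterance duration per dataset can be derived from the length and utterance count. The snippet below is only a sketch: the dictionary layout and the function name `mean_utterance_seconds` are illustrative assumptions, and because the lengths are rounded to whole hours, the results are approximate.

```python
# Figures copied from the dataset table; lengths are rounded to whole hours,
# so every derived value below is an approximation.
CORPORA = {
    "SWC": {"hours": 210, "speakers": 363, "utterances": 78214},
    "Librivox": {"hours": 804, "speakers": 251, "utterances": 368532},
}


def mean_utterance_seconds(hours, utterances):
    """Return the average utterance duration in seconds (hypothetical helper)."""
    return hours * 3600 / utterances


for name, stats in CORPORA.items():
    avg = mean_utterance_seconds(stats["hours"], stats["utterances"])
    print(f"{name}: about {avg:.1f} s per utterance")

# Combined size of both aligned datasets.
total_hours = sum(s["hours"] for s in CORPORA.values())
total_utterances = sum(s["utterances"] for s in CORPORA.values())
print(f"Total: {total_hours} h, {total_utterances} utterances")
```

Under these assumptions, the utterances average somewhat under ten seconds in both datasets, and the two datasets together amount to roughly 1014 hours.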

 

The pre-processed text and alignments are available at https://github.com/lumaku/german-corpus-aligned

Source of the audio files:

See the Downloads section for a pre-trained model.

Downloads

Reference

The full paper is available as a preprint at https://arxiv.org/abs/2007.09127 and in its published version at https://doi.org/10.1007/978-3-030-60276-5_27.

To cite this work:

@InProceedings{ctcsegmentation,
    author="K{\"u}rzinger, Ludwig and Winkelbauer, Dominik and Li, Lujun and Watzel, Tobias and Rigoll, Gerhard",
    editor="Karpov, Alexey
    and Potapova, Rodmonga",
    title="CTC-Segmentation of Large Corpora for German End-to-End Speech Recognition",
    booktitle="Speech and Computer",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="267--278",
    abstract="Recent end-to-end Automatic Speech Recognition (ASR) systems demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements in those systems, those models grew in terms of depth, parameters and model capacity. However, these models also require more training data to achieve comparable performance.",
    isbn="978-3-030-60276-5"
}