The speech group has developed a C/C++ system for large vocabulary continuous speech recognition. The system is language-independent, but it is particularly useful for languages such as Finnish, Estonian, or Turkish, in which words consist of several morphemes. To test the system, contact Mikko Kurimo or try the www demo.
Tools for language modeling
TheanoLM is a neural network language modeling toolkit implemented using Theano, a Python library for evaluating mathematical expressions.
The Finnish Parliament corpus contains 2269 hours of transcribed Finnish Parliament sessions from 2008 to 2016, aligned using AaltoASR. The data is available from the Language Bank of Finland.
The DSP corpus contains spontaneous conversations recorded and transcribed by over 200 students of Aalto University. The data is available for research from the Language Bank of Finland.
Isolated Finnish words spoken by 59 speakers, about 260 words each, collected at Helsinki University of Technology in 1999. A direct link to the data is available here.