Speech-to-speech Translator Evaluation (SSTE)

The SSTE project estimated the functional accuracy-in-context of an automatic speech-to-speech interpreter. Such an interpreter can be implemented as three component processes in series: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). Tasso Partners selected one commercially available system, Jibbigo®, running on iPhone hardware, and tested its accuracy for two language pairs: Spanish-to-English and Chinese-to-English. Tasso Partners also developed a method for estimating a functionally relevant accuracy rate appropriate to the expected uses of a speech-to-speech interpreter.

The first series of SSTE experiments (in the travel domain) indicated that with Spanish or Chinese as L1 (source language) going into English as L2 (target language), conventional accuracy in a quiet environment is surprisingly high, at about 70-90% correct (10-30% word error rate, WER).

A second series of SSTE experiments indicated that a more functionally appropriate measure, one that reflects communication-in-context, nearly halves the measured error rate. The functional measure presents the text output from the Jibbigo® system along with a skeletal context {speaker, addressee, location} but no prior verbal context. When these contextualized incorrect translations were presented to human judges through Amazon’s Mechanical Turk service, about 45% of them were judged functionally equivalent to the correct translation in the target L2. Thus, for example, an ASR × MT performance of 87% correct may correspond to a functional correct rate of 93%.

Subsequent data collected and analyzed by Chris Frederick at Stanford University suggest that the Google speech translator performs with accuracy similar to Jibbigo®’s in the travel domain, and better in other domains of discourse.
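The arithmetic behind the functional measure can be sketched as follows. This is a minimal illustration, not the project's scoring code; it simply credits the fraction of nominally incorrect outputs that judges rate as functionally equivalent, using the figures quoted above (87% conventional accuracy, ~45% equivalence rate).

```python
def functional_accuracy(conventional_accuracy: float, equivalence_rate: float) -> float:
    """Adjust a conventional accuracy rate by crediting the fraction of
    nominally incorrect outputs that human judges rate as functionally
    equivalent to the correct translation in context."""
    error_rate = 1.0 - conventional_accuracy
    # Only errors NOT judged functionally equivalent remain as errors.
    residual_error = error_rate * (1.0 - equivalence_rate)
    return 1.0 - residual_error

# Figures quoted above: 87% correct, ~45% of errors judged equivalent.
print(round(functional_accuracy(0.87, 0.45), 2))  # → 0.93
```

With a 13% conventional error rate and 45% of those errors forgiven in context, the residual error is about 7%, which matches the 87% → 93% example in the text.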
