AI in - Speech
If one field of AI has truly made its presence felt, it is speech, and this is especially true in media technology, as we shall see in this session. Speech-to-speech conversion is best known for its remarkable ability to transform one voice into another, and Brazilian researchers will examine it here. They have taken the technique further, however, with a whole suite of fascinating applications already in use on broadcast TV: reviving archival voices; rescuing noisy-location dialogue; correcting spoken errors without re-recording; crafting new voices with a chosen age or accent; and crafting human-to-animal voices from animal sounds.
The performance of TV subtitles has been the subject of both many complaints and much research. In a novel approach to measuring subtitling performance, generative AI speech-to-text tools are used alongside natural language processing to quantify and monitor timing errors and word omissions. This represents a big advance in the field. Join us to find out how.
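To give a flavour of what quantifying timing errors and word omission can mean in practice, here is a minimal sketch. It is not the method presented in the session: the function name `subtitle_metrics`, the input shapes, and the sample data are all hypothetical, and a plain sequence alignment from Python's standard library stands in for the speech-to-text and natural language processing tools mentioned above. It assumes a speech-to-text tool has already produced spoken words with timestamps.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lower-case a subtitle cue and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def subtitle_metrics(asr_words, cues):
    """Return (omission_rate, mean_delay_seconds).

    asr_words: list of (word, spoken_time_sec), assumed to come from an ASR tool.
    cues:      list of (display_start_sec, text) for the broadcast subtitles.
    Both input formats are assumptions made for this sketch.
    """
    spoken = [word.lower() for word, _ in asr_words]

    # Flatten subtitle text into words, remembering which cue each came from.
    shown, shown_cue = [], []
    for idx, (_, text) in enumerate(cues):
        for tok in tokenize(text):
            shown.append(tok)
            shown_cue.append(idx)

    # Align the spoken word sequence with the subtitled word sequence.
    matcher = SequenceMatcher(a=spoken, b=shown, autojunk=False)

    matched = 0
    first_spoken_time = {}  # cue index -> time its first matched word was spoken
    for block in matcher.get_matching_blocks():
        matched += block.size
        for k in range(block.size):
            cue_idx = shown_cue[block.b + k]
            first_spoken_time.setdefault(cue_idx, asr_words[block.a + k][1])

    # Spoken words with no counterpart in the subtitles count as omitted.
    omission_rate = 1 - matched / len(spoken) if spoken else 0.0

    # Delay: how long after the speech each cue appeared on screen.
    delays = [cues[i][0] - t for i, t in sorted(first_spoken_time.items())]
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return omission_rate, mean_delay


# Made-up example data, for illustration only.
asr_words = [("the", 0.0), ("weather", 0.3), ("today", 0.7), ("is", 1.0),
             ("really", 1.2), ("sunny", 1.6), ("and", 2.5), ("warm", 2.8)]
cues = [(1.5, "The weather today is sunny"), (4.0, "and warm")]

omission_rate, mean_delay = subtitle_metrics(asr_words, cues)
print(f"omission rate: {omission_rate:.3f}, mean delay: {mean_delay:.2f} s")
```

In this made-up data, one of the eight spoken words ("really") never appears in the subtitles, and each cue is displayed 1.5 seconds after its first matched word was spoken. A production system would also need to handle paraphrase, ASR errors, and punctuation, which this sketch ignores.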
A supporting paper for this session is also available. It investigates the use of automatic speech recognition (ASR) technologies to produce a reference-free measurement of dialogue intelligibility.