By emphasizing words acoustically, speakers can signal which concepts they wish to contrast. This feature of speech, known as focus, is pervasive in English, yet it is inadequately modeled in state-of-the-art speech technologies. The challenge addressed by this EArly-concept Grant for Exploratory Research (EAGER) is that phonetic emphasis is often difficult to identify independently of semantic contrast: words whose meanings are focused are usually realized with increased acoustic prominence, but not all increased acoustic prominence is due to focus. The project is innovative in using both speech recorded in a laboratory under controlled conditions and naturally occurring speech, such as that found in podcasts and videos. Judgments of focus location in both kinds of speech are collected from ordinary, non-expert listeners through online crowdsourcing. Using the comparative construction (for example, "He liked it better than I did" or "I like it better now than I did"), in which focus can be verified independently, computational procedures are developed to mimic the judgments of subjects who read, but do not listen to, the utterance under investigation. The findings will inform research in speech synthesis and automatic speech recognition. Commercial applications may include aids for deaf and hearing-impaired people, robot assistants for the elderly, language instruction, and speech therapy.

In a previous proof-of-concept study, the researcher collected utterances of "than I did" from laboratory experiments and from transcribed podcasts available on the web. Machine learning classifiers (linear discriminant analysis and support vector machines) were trained to detect focus from acoustic features alone, including measures of fundamental frequency, duration, and intensity.
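The classification step can be illustrated with a minimal, dependency-free sketch of a two-class linear discriminant. It assumes each token (for example, the pronoun in "than I did") has already been reduced to three hypothetical summary features: peak F0 in Hz, duration in ms, and mean intensity in dB. The feature set, the equal-prior decision threshold, and the toy values are assumptions for illustration only, not the study's actual features or data; the support vector machine classifier is not shown.

```python
"""Two-class Fisher linear discriminant for focus detection (illustrative sketch).

Assumption: each token is summarized by three hypothetical acoustic features,
[peak F0 (Hz), duration (ms), mean intensity (dB)]. Label 1 = focused, 0 = not.
"""
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(a, b) -> List[Vector]:
    """Pooled within-class covariance with n_a + n_b - 2 degrees of freedom."""
    d = len(a[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (a, b):
        m = mean(rows)
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = len(a) + len(b) - 2
    return [[c / dof for c in row] for row in cov]


def solve(matrix, rhs) -> Vector:
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class FisherLDA:
    """Binary linear discriminant analysis assuming equal class priors."""

    def fit(self, X, y) -> "FisherLDA":
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        m_pos, m_neg = mean(pos), mean(neg)
        # Discriminant direction w = S^-1 (m_pos - m_neg).
        self.w = solve(pooled_covariance(pos, neg), [p - q for p, q in zip(m_pos, m_neg)])
        # Decision threshold halfway between the projected class means.
        self.threshold = (dot(self.w, m_pos) + dot(self.w, m_neg)) / 2
        return self

    def predict(self, X) -> List[int]:
        return [int(dot(self.w, x) > self.threshold) for x in X]


if __name__ == "__main__":
    # Invented toy values for illustration only; not data from the study.
    focused = [[220, 180, 73], [240, 200, 72], [230, 170, 74], [250, 210, 75]]
    unfocused = [[180, 110, 66], [175, 120, 65], [190, 100, 67], [185, 130, 68]]
    model = FisherLDA().fit(focused + unfocused, [1] * 4 + [0] * 4)
    print(model.predict([mean(focused), mean(unfocused)]))  # [1, 0]
```

Because the pooled covariance is positive definite, the projected focused-class mean always lies above the midpoint threshold and the unfocused-class mean below it; individual tokens near the boundary can still be misclassified.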
In the comparative construction, the location of focus can be determined independently of prosody by observing whether the subjects of the main and comparative clauses are co-referent. This research generalizes the earlier study to variants of the comparative with different pronouns and auxiliaries, and introduces updated methods of acoustic feature extraction and classification. Next, a verification dataset is created to reject annotations from participants who mark non-focal prominence as focus or who misidentify the location of focus. Finally, classifiers are trained to detect focus on pronouns and auxiliaries in contexts other than the comparative, using the crowd-sourced annotations to infer the correct location of focus independently of prosody.
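The co-reference criterion and the annotator screening step can be sketched together. Here each verification item's expected focus is derived from whether the two subjects share a referent, and an annotator is kept only if their marks match that expected label often enough. The referent labels, the two-way "pronoun"/"auxiliary" choice, the assumption that co-referent subjects place focus on the auxiliary, and the 0.8 accuracy cutoff are all hypothetical simplifications, not the project's stated criteria.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class VerificationItem:
    """A comparative utterance whose focus is fixed by co-reference (hypothetical schema)."""
    item_id: str
    main_subject: str         # referent of the main-clause subject
    comparative_subject: str  # referent of the comparative-clause subject

    def gold_focus(self) -> str:
        # Different referents ("He liked it better than I did"): the pronoun is contrasted.
        # Same referent ("I like it better now than I did"): the pronoun is not contrasted;
        # this sketch assumes focus then falls on the auxiliary.
        if self.main_subject != self.comparative_subject:
            return "pronoun"
        return "auxiliary"


def screen_annotators(
    items: Iterable[VerificationItem],
    annotations: Dict[str, Dict[str, str]],
    min_accuracy: float = 0.8,  # hypothetical cutoff
) -> Set[str]:
    """Return IDs of annotators whose marks agree with the gold focus often enough.

    annotations maps annotator ID -> {item ID -> marked focus location}.
    Marking a prominent but non-focal pronoun as focused counts as an error.
    """
    gold = {item.item_id: item.gold_focus() for item in items}
    kept = set()
    for annotator, marks in annotations.items():
        scored = [mark == gold[item_id] for item_id, mark in marks.items() if item_id in gold]
        if scored and sum(scored) / len(scored) >= min_accuracy:
            kept.add(annotator)
    return kept


if __name__ == "__main__":
    items = [
        VerificationItem("v1", main_subject="he", comparative_subject="speaker"),
        VerificationItem("v2", main_subject="speaker", comparative_subject="speaker"),
    ]
    annotations = {
        "A": {"v1": "pronoun", "v2": "auxiliary"},
        "B": {"v1": "pronoun", "v2": "pronoun"},
    }
    print(screen_annotators(items, annotations))  # {'A'}
```

Annotator "B" marks the non-contrastive pronoun in item v2 as focused, scores 0.5 on the verification items, and is rejected; the surviving annotators' judgments on non-comparative utterances could then serve as training labels.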
Acoustics; Address; Classification; commercial application; comparative; crowdsourcing; Data; Data Set; Discriminant Analysis; Elderly; Frequencies; Grant; Hearing Impaired Persons; hearing impairment; innovation; Instruction; Internet; Judgment; Laboratories; laboratory experiment; Language; Location; Machine Learning; Measures; Methods; Modeling; Names; Participant; podcast; Procedures; Research; Research Personnel; Robot; Semantics; Speech; speech recognition; Speech Therapy; Technology; Training; Update; Variant