SST 1986 Abstracts

Page numbers refer to nominal page numbers assigned to each paper for purposes of citation.

Text-To-Speech And Speech Synthesis

Pages Authors Title and Abstract PDF
2--7 S.J. Butler Articulatory And Cognitive Constraints In Speech Synthesis

 

Abstract  The representation of constraints on speech production in speech synthesis models is considered. Constraints acting at a cognitive level and the processes which give rise to them are focused upon. The acoustic properties of vowel spaces are shown to be a product of this type of constraint.

 

PDF
8--13 K. C. Zhou Principles Of Implementing A Text-To-Speech System For Tonal Chinese (Mandarin)

 

Abstract  Unlike other rule synthesis systems for non-tonal languages, a CTS system (Chinese Text-to-Speech or Chinese Type-Speaker) gives particular attention to the tone variation for each syllable. The principles and the implementation of CTS, in both software and hardware, are presented in this paper.

 

PDF
14--19 R.Mannell and J.E.Clark Text-To-Speech Rule And Dictionary Development

 

Abstract  The development and evaluation of a set of letter-to-sound rules and their associated lexicon and suffix stripping rules is outlined. These rule/databases form part of a complete text-to-speech system being developed at the Speech, Hearing and Language Research Centre.

 

PDF
20--25 M.J. McAllister, J. Laver, J.M. McAllister A Preprocessor Algorithm For The CSTR Text To Speech System

 

Abstract  In any Text-to-Speech (TTS) system there will be textual strings presented as input which fail to conform to the typographical norm on which the design of the system’s dictionaries and rule sets has been based. For example, in this article the majority of the space-bounded strings (conventionally termed orthographic words) are composed entirely of lower case letters; a non-trivial minority, however, contain other characters. The signalling of sentence-initial words by the capitalisation of their first letter and the addition of a full stop to sentence-final words are ubiquitous examples of such anomalies. Examples of less commonplace non-standard strings are dates, times, acronyms, abbreviations and mathematical formulae. The interpretation of these and any other textual anomalies depends on their accurate detection and classification. The remainder of this paper comprises a brief description of an algorithm designed to accomplish this identification process and the conceptual framework which underlies it. This algorithm forms part of a text preprocessor module in the CSTR system. The function of this preprocessor is to correct anomalous items to orthographic forms in lower case letters, so that the other modules of the TTS system can convert them to phonemic form in a routine way.

 

PDF

Speech Disorders

Pages Authors Title and Abstract PDF
28--33 F.M. Bottell and J.E. Clark /R/ Misarticulation In Children's Speech

 

Abstract  The speech productions of children who were identified as producing /w/ for /r/ substitutions were subjected to spectrographic analysis in an attempt to validate the assumption that such children do not contrast between /r/ and /w/ phonemes in their speech. Analysis revealed that the misarticulating children actually produced similar acoustic distinctions between their /w/ and intended /r/ as did a matched control group. Any differences were of degree rather than type. The presence of acoustic contrasts suggests that the classification of errors as substitutions may be incorrect but can probably be attributed to the language specific phonological biases which constrain the perceptions of listeners.

 

PDF
34--39 Megan D. Neilson and Peter D. Neilson Speech Motor Control And Stuttering: A Computational Model Of Adaptive Sensory-Motor Processing

 

Abstract  A theoretical account of stuttering is presented in which an inadequacy of neuronal resources for sensory-motor information processing is seen as the basis of the disorder. It is proposed that stutterers are deficient in the processing resources normally responsible for determining and adaptively maintaining the internal models which subserve speech production. A general description of such computational processes is detailed in the form of circuitry for an adaptive controller which can calibrate itself to control any variable, nonlinear, dynamic, multiple input, multiple output system.

 

PDF
40--45 J. Ingram, B. Murdoch and H. Chenery Physiological, Acoustic And Perceptual Aspects Of Hypokinetic Dysarthria

 

Abstract  The methodology of a study of speech disorders in Parkinson's disease is described and illustrated with three case studies.

 

PDF
46--51 Janis L. van Doorn, Nicholas J. O'Dwyer, Peter D. Neilson Dysarthric Speech In Cerebral Palsy: A Hornet's Nest Of Acoustic And Electromyographic Data

 

Abstract  Speech waveforms were recorded simultaneously with electromyographic (EMG) signals from fourteen articulator muscles in cerebral palsied and normal subjects during a test sentence of continuous speech. This experiment has led to two parallel investigations of cerebral palsied and normal speech, using acoustic features for one study, and EMG activity of the articulator muscles for the other. Abnormal features found to be characteristic of cerebral palsied speech are reported here for both investigations, and related to articulatory dysfunction in dysarthria of cerebral palsy.

 

PDF

Speech Perception

Pages Authors Title and Abstract PDF
54--59 P.J. Blamey and G.M. Clark A Model Of Auditory-Visual Speech Perception

 

Abstract  A mathematical model relating the probabilities of correctly recognizing speech features, phonemes, and words was tested using data from the clinical trial of a multiple-channel cochlear implant. A monosyllabic word test was presented to the patients in the conditions hearing alone (H), lipreading alone (L), and hearing plus lipreading (HL). The model described the data quite well in each condition. The model was extended to predict the HL scores from the feature recognition probabilities in the H and L conditions. The model may be useful for the evaluation of automatic speech recognition devices as well as hearing impaired people.

 

PDF
60--65 Louis C.W. Pols Psychophysics, Speech Perception, And Automatic Speech Recognition

 

Abstract  Research in psychophysics is mainly concentrated on the perception of basic characteristics of relatively simple, stationary, isolated sounds. Speech, however, is a complex, dynamic, acoustic signal, embedded in context, and linguistically meaningful. It is a temptation to try to bridge the gap between psychoacoustics and speech perception in terms of stimuli, methods, and models used, thus contributing to our knowledge for improving automatic speech recognition as well.

 

PDF
66--71 Phillip Dermody, Kerrie Mackie and Richard Katsch Initial Speech Sound Processing In Spoken Word Recognition

 

Abstract  The present study uses the gating paradigm to investigate initial speech sound (ISS) processing in spoken word recognition. Results are presented for spoken words and consonant-vowel (CV) syllables, which both show very early recognition of the ISS. Acoustic analyses of the ISS show similarities between the words and syllables and are consistent with the templates proposed by Stevens and Blumstein (1978). It is suggested that the time course of ISS perception indicates the need to change present models of spoken word recognition.

 

PDF
72--77 U Thein-Tun Cue-Trading Relations For Initial Stop Voicing Contrast At Different Linguistic Levels

 

Abstract  The cue-trading relations between the intensity/duration of VOT and the intensity of the following vowel for the /d/-/t/ distinction were investigated at auditory, phonetic, syllable, word and sentence levels. The results demonstrate that from the syllable level onward, the higher the linguistic level of cue processing the less categorical were the identifications and the more effective was the cue-trading. The results also suggest that the speech mode of processing is a special mode of information processing in the sense that the cue-trading relationship of multiple cues signifying one phonemic contrast is most effective in the speech mode but not in the sense that cue-trading does not exist at non-linguistic levels.

 

PDF
78--83 P. Standen Abstract And Literal Aspects Of Lexical Memory For Speech

 

Abstract  Many models of human speech recognition propose a lexicon where each word is represented by a single fixed string of abstract phonetic or phonemic elements. In contrast, recent models of long term memory require the storage of literal features of individual word tokens, including speaker's voice. In an experiment using the repetition priming phenomenon as an indicator of lexical access, it is found that the lexicon is largely insensitive to variation in speaker's voice, even when an unfamiliar accent is present.

 

PDF

Speech Technology In Telecommunications

Pages Authors Title and Abstract PDF
86--91 R. A. Seidl A Perspective On Telecommunication Services With Relevance To Speech Processing

 

Abstract  Speech processing covers many areas and may be applied not only to facilitate communication between individuals via some communications infrastructure but also to provide "user friendly" access to, and control of, communication services. This paper concentrates on the role of speech processing in the end-to-end communication between individuals. The notion of transparency is introduced to provide a basis for the classification of telecommunication services, and the role of speech processing is discussed against this perspective.

 

PDF
92--97 R. Linggard Review Of British Telecom Speech Research: Objectives And Strategy, 1986.

 

Abstract  Speech Research and Development at British Telecom Research Labs is carried out in Division R18. This unit, of about 90 Scientists, Engineers and Technicians, undertakes long-term research, product development, and design of prototype equipment. This paper describes and discusses the objectives and strategies of British Telecom Speech Research in 1986, and outlines some of the projects now in hand.

 

PDF
98--103 N. Duong An Experimental Study Of Residual Excited Linear Predictive Coder For Telecommunication Purposes

 

Abstract  This paper describes a computer simulation study of the quality problems associated with RELP. Some observations are given on the appropriate conditions for LPC analysis.

 

PDF
104--109 R. A. Seidl Voice Response Techniques For Telecommunications Applications

 

Abstract  A wide range of services can be implemented by making use of voice response techniques. This paper provides an overview of the alternative technologies applicable to such services and discusses the parameters affecting their performance. Some issues which relate not to the technology itself but rather to its use are also presented.

 

PDF

Speech Analysis

Pages Authors Title and Abstract PDF
112--117 Robert E. Bogner and Radin B. Ikram Yet More On Speech Splicing

 

Abstract  Speech waveforms are being cut and spliced to effect time scale modifications without pitch distortion, for data compression or time scale change. Suitable instants at which to effect the cutting and splicing are found by use of multi-dimensional representations of the waveforms. Residual splicing errors are reduced by tapering the abutting sections.

 

PDF
118--123 M.A. Jack, G. Duncan, A.M. Sutherland and J. Laver Signal Processing In Acoustic Phonetic Analysis Of Speech.

 

Abstract  This paper describes improved formant estimation and pitch tracking signal processing algorithms. Formant estimation is based on linear predictive coding techniques (LPC) using off-axis spectral estimation coupled with a progressive increase in vocal tract model order. Pitch tracking is based on a multi-feature investigation of the time-domain speech waveform, optimised for accurate measurement of individual pitch periods.

 

PDF
124--129 J. R. Glass and V. W. Zue Signal Representation For Acoustic Segmentation

 

Abstract  This paper describes an experiment designed to explore the relative merits of different spectral representations for acoustic segmentation. Conventional spectral representations, such as those produced by wideband discrete Fourier transform and by linear prediction, were compared to those based on auditory modeling. Our analysis of 1,000 sentences from 100 speakers indicates that the representations based on auditory modeling appear to be superior.

 

PDF
130--135 Phil Rose The Normalisation Of Tone

 

Abstract  Some considerations in the normalisation of tone are discussed, and some problems in application demonstrated on the fundamental frequency data of 6 speakers of a variety of Wu Chinese.

 

PDF
136--141 R.E.E. Robinson Longterm Spectrum Analyser

 

Abstract  A Long Term Spectrum Analyser is described. It is based on a microcomputer and a set of real time multichannel filters and RMS detectors. The device can sample at rates as fast as 20 complete audio spectrums per second for periods in excess of 1 hour and produce machine readable output. The device was designed and built and is in use at Macquarie University.

 

PDF

Assessment Of Speech Technology Applications

Pages Authors Title and Abstract PDF
156--161 Margaret Grocke, Iain Macleod and Bruce Millar The Intelligibility Of Synthetic Speech In An Australian Classroom

 

Abstract  Speech synthesizers appropriate for classroom use, such as the Type-n-Talk and Intex-Talker, use extensive sets of pronunciation rules to convert ASCII text into speech. The intelligibility of an Intex-Talker synthesizer was evaluated by presenting students in grades 3-5 with lists of recorded words to recognise under three different conditions: male human speech; direct Intex-Talker output, in which only obvious mis-pronunciations were corrected; and modified Intex-Talker output, in which an attempt was made to produce a closer approximation to Australian English. As a result of this evaluation, we conclude that current classroom-style synthesizers (as represented by the Intex-Talker) are dramatically inferior to human speech with isolated individual words, and are too poor to use in this mode unless there are contextual or other clues as to the identity of the spoken word.

 

PDF
144--149 Kerrie Mackie, Phillip Dermody and Richard Katsch Assessing The Intelligibility, Quality And Encoding Speed Of Processed Speech

 

Abstract  The present study examines speech transmission evaluation measures designed to assess the intelligibility, quality and speed of encoding of processed speech. These measures are compared with a suprathreshold intelligibility test modelled on traditional speech evaluation measures. The results show that speech stimuli which are not differentiated by the traditional test methodology are differentiated by the other measures. These results indicate the value of including more sensitive tests of speech intelligibility in evaluation protocols for speech transmission evaluation.

 

PDF
162--167 K. C. Zhou Preliminary Evaluation Of A Chinese Text-To-Speech System

 

Abstract  As a message formation system, Text-to-Speech (TS) is language dependent. A preliminary evaluation method named Rhyme and Disyllable Test (RDT) has been proposed especially for tonal Chinese (Mandarin) and used to evaluate the author's Chinese Text-to-Speech system (CTS).

 

PDF
168--173 P. F. Duke and J. S. Spicer The Assessment Of Isolated Word Speech Recognizers

 

Abstract  A review of current proposals on techniques for the assessment of isolated word speech recognizers is given. This is followed by error performance tests on four low to medium cost speech recognizers, using both high quality and carbon microphones, and a vocabulary suited to the voice control of a telecommunication service. The results showed that one unit gave about 98% accuracy but the others had somewhat lower performance.

 

PDF
150--155 P.C.Koob and J.E.Clark An Investigation Into The Formulation Of A Speech Intelligibility Test Using Item Difficulty Criteria

 

Abstract  A simple method of formulating speech intelligibility tests in such a way as to maximise sensitivity and optimise difficulty is described. Some preliminary results from an investigation into the use of this method are presented.

 

PDF

Speech Aids For The Disabled

Pages Authors Title and Abstract PDF
176--181 M. Walsh, L. Westphal, M. Darveniza The Operation Of A Versatile Phoneme-Based Speech Aid Using A Special High Bandwidth Person-Aid Interface

 

Abstract  A novel method of man-machine communication is described. This is used to control a multi-purpose communication aid designed for persons with an acquired speech disability. The hardware and software required to realize the complete system are briefly described. The results of recent field trials are also given, indicating the ease of use of the system.

 

PDF

Speech Signal Processing Part I

Pages Authors Title and Abstract PDF
184--189 M.J. Flaherty Algorithmic Issues Of Adaptive Differential Pulse Code Modulation With Reference To CCITT Recommendation G.721

 

Abstract  A description of the CCITT 32 kbit/s adaptive differential pulse code modulation algorithm, G.721 (June 1986 version), is presented with sufficient information to better appreciate its derivation.

 

PDF
190--195 A.N. Johnson and A.B. Bradley Adaptive Transform Coding Of Speech Incorporating Time Domain Aliasing Cancellation

 

Abstract  A new transform coder that incorporates time domain aliasing cancellation is described. It contains 56 uniform 62.5 Hz channels that are adaptively quantized. Objective test results for this coder at bitrates of 16 kbit/sec, 12 kbit/sec and 9.6 kbit/sec are presented for two analysis/synthesis windows. Both windows are designed to meet the stated time domain aliasing cancellation constraints. The first window is a maximum length 128 point window. The second window is a 68 point trapezoidal window similar to that used in existing transform coders.

 

PDF
196--201 A. Maheswaran, R. E. Bogner A New Time-Scale Warping Algorithm For Single Dimensional And Multidimensional Speech Parameter Contours

 

Abstract  In this paper a new sample association approach to be known as the Hilbert Warping (HW) algorithm is described. This algorithm is chosen from the observation that signals of similar form but with different time scales appear as similar trajectories when represented by suitable two dimensional plots in the X-Y plane, and overcomes difficulties such as identification of signal endpoints and assumptions about the smooth nature of warping that are permissible, associated with dynamic programming algorithms. The HW algorithm can be applied to both single dimensional and multi-dimensional signals as in dynamic programming algorithms.

 

PDF

Speech Research Hardware And Software Part I

Pages Authors Title and Abstract PDF
204--209 Michael Wagner and John Fulcher An IBM-PC Based Speech Research Workstation

 

Abstract  A microcomputer-based speech research workstation with analog-to-digital and digital-to-analog conversion hardware and software, a specially designed antialias filter and software packages for speech editing, homomorphic analysis and linear predictive analysis and synthesis is described in this paper.

 

PDF
210--215 R.E.E. Robinson Data Acquisition Subsystem

 

Abstract  A data acquisition system is described. The system connects to a DEC VAX 750 computer and allows it to perform A/D and D/A operations at rates up to 100 kHz. It can also perform multichannel low speed operations for physiological channels. It allows for raster graphic and vector graphic displays. The system is partially complete and currently in use at the Speech, Hearing and Language Research Centre at Macquarie University.

 

PDF
216--221 H.S.J.Purvis The Development Of A General Purpose Speech Editor

 

Abstract  This paper describes a general purpose speech editing and analysis program being developed at Macquarie University by the author. A brief history of the project is given. The design philosophy is explained. The general structure of the data and programs is examined.

 

PDF
222--227 J. M. Harrison, Y. C. Tong and P. M. Sorenson A Real-Time Laboratory Computer-Based Speech Processor For Cochlear Implant Research

 

Abstract  The hardware and software development for a real-time laboratory computer-based speech processor for cochlear implant research is described. It is envisaged that this processor will also be suitable for other areas of research such as electro-neurophysiology.

 

PDF

Speaker Characteristics

Pages Authors Title and Abstract PDF
228--233 J.B.Millar Quantification Of Speaker Variability

 

Abstract  An approach to the quantification of the speaker dimension of speech is described. This utilises four phonetically motivated dimensions and a hierarchy of measurements in each, ranging from the macro-acoustics of long-term statistics to the micro-acoustics of the organisation of the syllable. A 33-speaker database of spoken Australian English is used to demonstrate the application of some of the macro-acoustic measurements and their importance for the development of robust speech processing for the Australian bionic ear project.

 

PDF
234--239 J. Pittam The Long-Term Measurement Of Voice Quality: A Comparison Of Acoustic Measures

 

Abstract  A long-term acoustic measure of voice quality is developed. Five measures, based on the long-term average spectrum, are compared for their ability to discriminate breathy voice, creaky voice, nasal voice, tense voice and whispery voice. A discriminant analysis technique is used. These measures utilise the Mel and Bark scales as well as 'equal-Hertz' intervals.

 

PDF
240--245 Michael G. Barlow and Michael Wagner Effects Of Acoustic Parameter Alteration Upon Perceived Speaker Characteristics

 

Abstract  A set of experiments to observe the relationship between certain acoustic parameters and perceived speaker characteristics is described. Linear predictive analysis was performed upon a sentence spoken by a group of three speakers. Utterances of each speaker were then altered via pitch variance, time alignment and pitch interchange. These altered sentences were then resynthesised and played back to an audience of listeners. The listeners' responses to these sentences were recorded and analysed.

 

PDF
246--251 J. Ingram and J. Pittam Acoustic Correlates Of Accent Change: Vietnamese Schoolchildren Acquiring Australian English

 

Abstract  Changes in vowel formant trajectories in four vocalic nuclei for 10 Vietnamese immigrant schoolchildren acquiring Australian English are reported. Changes are in conformity with predictions based on a comparative phonetic/phonological analysis of the two languages.

 

PDF

Speech Technology Applications

Pages Authors Title and Abstract PDF
266--271 Margaret Grocke and Iain Macleod Use Of Synthetic Speech In Support Of Students With Special Needs

 

Abstract  Computer-based exercises have shown promise as an instructional technique for students with special needs. Many of these students are pre-literate, hence the use of a conventional keyboard and display as a means of student/machine interaction is not appropriate. In this context, synthetic speech becomes an invaluable tool for communicating information and giving feedback to students. Groups of students with severe reading difficulties, as well as students learning English as a Second Language, have made extensive use of a computer-based reading exercise developed at the ANU. The encouraging results obtained from evaluation studies with these two groups are in part attributable to effective use of synthetic speech as a component of this exercise.

 

PDF
254--259 A.B. Bradley Development Considerations For A Medium To High Security Voice Scrambling System

 

Abstract  This paper considers the application of modern digital signal processing technology to the task of voice waveform scrambling. Design considerations for realisation of traditional time domain and frequency domain segmentation and permutation scrambling strategies are described. Performance data based upon real time implementation of these scrambling techniques is presented. Frequency domain segmentation, permutation and reconstruction is achieved by means of a time aliased, overlap and add DFT implementation on a single monolithic digital signal processing device. A number of enhancements are discussed, each of which offers certain performance improvements in terms of residual intelligibility, cryptographic strength, distortion or synchronisation dependency.

 

PDF
260--265 R. Mannell Australian English And The Votrax SC-02 Chip.

 

Abstract  Votrax SC-02 vowels are analysed on a speech spectrograph and are plotted on F1/F2 and F3/F? planes against General Australian English vowels and diphthongs facilitating the selection of the nearest SC-02 equivalents. SC-02 equivalents of the consonants are also selected. Development of a text-to-speech system is briefly considered.

 

PDF
272--277 P.J. Blamey, R.C. Dowell, A.M. Brown, P.M. Seligman and G.M. Clark A Formant-Estimating Speech Processor For Cochlear Implant Patients

 

Abstract  A simple formant-estimating speech processor has been developed to make use of the "hearing" produced by electrical stimulation of the auditory nerve with a multiple-channel cochlear implant. Thirteen implant patients were trained and evaluated with a processor that presented the second formant frequency, fundamental frequency, and amplitude envelope of the speech. Nine patients were trained and evaluated with a processor that presented the first formant frequency and amplitude as well. The second group performed significantly better in discrimination tasks and word and sentence recognition through hearing alone. The second group showed a significantly greater improvement when hearing and lipreading was compared with lipreading alone in a speech tracking task.

 

PDF
278--283 M. O'Kane, D. Mead, J. Newmarch, R. Byrne and R. Stanton The Dicma Project

 

Abstract  The outline of a speech understanding project which has recently commenced is presented. This project involves the construction of an automatic dictating machine which will accept continuous speech input. A notable feature of the system is its reliance on information it generates from user-provided words.

 

PDF

Speech Signal Processing Part II

Pages Authors Title and Abstract PDF
286--291 R. Linggard and E. Ambikairajah A Computational Model Of The Basilar Membrane

 

Abstract  A digital filter simulation of basilar membrane vibration is described, in which the cochlea is represented as a cascade of 128 digital filters. The parameters of each filter are derived from the mechanical characteristics of the membrane at corresponding points. Using the model, it is possible to simulate, at each point, the waveform of the sound pressure in the cochlear fluid, as well as the deflection of the membrane itself. Thus it is possible to calculate the frequency response of the membrane at every point relative to input at the stapes. In addition, the deflection of the membrane along its length at any instant in time can be obtained. The significant advantage of this model is its relatively rapid computation time.

 

PDF
292--297 R. Togneri, Y. Attikiouzel Fast, Stable Solution Of The Two-Dimensional Cochlear Model

 

Abstract  The two dimensional model of the cochlea is solved in the time-domain. The use of the bi-linear transformation leads to faster and stable responses over previous methods. Plots of basilar membrane velocity are presented for an active model both as a function of distance and frequency.

 

PDF
298--303 D. Nandagopal, D.A.H. Johnson, J.P.T. Koljonen Modified Pole-Zero Decomposition Of Speech

 

Abstract  A modified method of pole-zero modelling of speech using cepstral coefficients is described. Cepstral coefficients are extracted from linear prediction coefficients (LPC) by assuming the signal to be minimum phase. Pole-zero decomposition is done by splitting the signal into a pole part and a zero part, by using the group delay properties of the signal. The pole and zero parts are then modelled using LPC.

 

PDF
304--309 M.A. Magdy and J.F. Chicharo A Modified Recursive Solution For The Linear Predictive Coding Of Speech

 

Abstract  In this paper a new algorithm is proposed to update recursively the parameter vector of a linear predictor for speech coding. The proposed method is based on modified Cholesky (LD) factorization of an augmented covariance matrix. In essence the algorithm suggested is a modified version of the method presented by Ljung and Soderstrom (1983), with distinct advantages. Firstly, the predictor parameter vector is directly obtained from the first column of the L factor, hence eliminating a computational step. Secondly, the given algorithm can readily and simply be implemented using standard systolic array structures (Jover and Kailath, 1986).

 

PDF

Automatic Speech Recognition

Pages Authors Title and Abstract PDF
310--315 Michael Wagner Speech Recognition Experiments With The Syllable Inventory Of Standard Chinese

 

Abstract  This paper explores the possibility of using automatic speech recognition as a front-end to a computer for Chinese character processing. A speech recognition experiment has been performed on the complete inventory of second-tone words of Standard Chinese. Two recordings made 48 hours apart were used as test and reference sets. Distances within word clusters are shown to frequently exceed inter-cluster distances for the inventory of 260 syllables.

 

PDF
316--321 Edmund M.-K. Lai & Y. Attikiouzel A Comparison Of Several Coarse Phonetic Classification Schemes Preliminary Results

 

Abstract  The performances of three different coarse phonetic classification schemes for a very large lexicon are measured in terms of the number of unique cohorts, average cohort size and expected cohort size. It was found that the seven-way classification proposed by the authors is good for relatively short words with 3 or 4 phonemes.

 

PDF
322--327 M. O'Kane and J. Gillis Efficient Derivation Of Formant-Like Information From Speech Waveforms

 

Abstract  A computationally fast technique for deriving spectral information including formant information from speech waveforms is presented. Its use in automatic recognition of vowels and some consonants in continuous speech is briefly described and the relationship of the spectral display obtained using the new technique to the Bark scale is discussed.

 

PDF
328--333 D. Mead and M. O'Kane A Declarative Approach To Continuous Speech Recognition

 

Abstract  Arguments for constructing continuous speech recognition systems declaratively rather than procedurally are given, and re-construction of a particular continuous speech recognition system, FOPHO, using a declarative approach is described.

 

PDF
334--339 A.L. Harvey, H.J.N. Lee Use Of Speech Recognition Systems In Noisy Environments

 

Abstract  This paper discusses the use of discrete utterance speaker dependent recognisers in acoustically noisy industrial environments. Aspects discussed are training of the recogniser system, spectral subtraction, type of features to be extracted, and the effect of vocabulary size. The author worked with Intel's speech recognition group for six months during the early stages of a voice data entry contract with General Motors.

 

PDF

Speech Research Hardware And Software Part II

Pages Authors Title and Abstract PDF
342--349 J.E. Clark, C.D. Summerfield, R.H. Mannell A High Performance Digital Hardware Synthesiser

 

Abstract  This paper describes the phonetic aspects of the development of a high performance real time digital hardware synthesiser based on five formants in parallel connection, a nasal formant and a zero. The hardware realisation is based on a TMS320 digital signal processing chip, with a Z80 ancillary processor to manage data communications and control.

 

PDF
348--353 C.D. Summerfield(*) A Review Of VLSI Structures For The Implementation Of Formant Speech Synthesisers(1)

 

Abstract  This paper reviews the VLSI structures for the implementation of formant speech synthesisers. It concentrates on the application of bit-serial arithmetic structures for the implementation of signal processing functions.

 

PDF
354--359 C.D. Summerfield and J.E. Clark Implementation Of A High Performance Formant Speech Synthesiser(1)

 

Abstract  This paper describes the hardware structure and the real-time software developed to implement the parallel formant speech synthesiser described by Clark, Summerfield and Mannell (1986). Central to the speech synthesiser operation is a single TMS32010 signal processor microprocessor device which performs the synthesis calculation in real-time.

 

PDF
360--365 K. C. Zhou Implementation Of An LSI Tone Generator Chip For Chinese Speech Synthesis

 

Abstract  A custom LSI chip has been designed and manufactured to meet the requirements of the Chinese Text-to-Speech (CTS) system using a number of pre-settable counters. The Tone Generator of Chinese (TGC) accepts a specified 3-bit tone number code and generates tone pulse trains for the four basic and three sandhied tones. Fabricated in a 5 µm nMOS process, the manufactured chip contains 1170 transistors and is 3 mm x 2.445 mm.

 

PDF
366--371 B.J. Stone, D. Mead, and M. O'Kane Digital Audio Processor Interface

 

Abstract  The paper describes a computer interface for a digital audio processor, or pulse-code modulation adaptor, which records and plays back digital audio signals on videocassettes. The total cost of the prototype interface (in Multibus form), with digital audio processor and videocassette unit, was below $A2000. Adoption of this all-digital technique for speech database storage and shipping by the Australasian speech research community is advocated.

 

PDF
 

Copyright © ASSTA