Kazuhiro Kondo

Affiliation

Yamagata University
Graduate School of Science and Engineering
Electrical Engineering

Curriculum Vitae

Available here.

Current research project proposals

Active Speech Cancellation Systems for Cellular Speech
Cellular phones have become ubiquitous in most developed countries. This has created a new type of problem: we are often bombarded from all directions by speech from people chatting on their cell phones. This speech is not intended for us but only for the person at the other end of the call, so once the handset microphone has picked it up, its radiation into the surroundings serves no purpose. It also raises privacy concerns. It would therefore be beneficial if we could control the radiation of this speech into the surrounding space, or at least mitigate it to some degree.
Active noise cancellation (ANC) has attracted great interest in recent years owing to advances in digital signal processors (DSPs). It has been applied successfully to, for example, fan noise radiating through ducts, jet engine noise, and road noise in automobiles, to name just a few. We considered applying similar techniques to the control of speech.
It is generally considered that global control of aperiodic (unpredictable) noise is not possible; cancellation is achieved only within a "zone of silence" around the error microphone. Since speech is quasi-periodic, and therefore partly predictable, we will attempt to control the radiation of speech "globally", at least to some extent.
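The role of predictability can be illustrated with a toy example. The sketch below is not our cancellation system: it is a plain adaptive linear predictor trained with the least-mean-squares (LMS) rule, and it ignores the acoustic secondary path that a practical ANC controller (typically filtered-x LMS) must model. The filter order, step size, and test signals are arbitrary assumptions. A periodic stand-in for voiced speech becomes almost perfectly predictable after adaptation, whereas white noise does not.

```python
import math
import random

def lms_prediction_error(signal, order=16, mu=0.01):
    """Run an adaptive linear predictor over `signal` and return e[n] for every n.

    The prediction of x[n] is y[n] = sum_k w[k] * x[n-1-k]; after each sample
    the taps are updated with the LMS rule w <- w + mu * e[n] * past.
    """
    w = [0.0] * order  # predictor taps, initially zero
    errors = []
    for n, x in enumerate(signal):
        # the `order` most recent past samples (zeros before the signal starts)
        past = [signal[n - 1 - k] if n - 1 - k >= 0 else 0.0 for k in range(order)]
        y = sum(wk * pk for wk, pk in zip(w, past))  # predicted sample
        e = x - y                                    # prediction error
        w = [wk + mu * e * pk for wk, pk in zip(w, past)]
        errors.append(e)
    return errors

def mean_power(x):
    return sum(v * v for v in x) / len(x)

# Periodic stand-in for voiced speech: period of 40 samples plus its 3rd harmonic.
voiced = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(2 * math.pi * 3 * n / 40)
          for n in range(4000)]
# Unpredictable stand-in: white Gaussian noise with a fixed seed.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

for name, sig in (("voiced", voiced), ("noise", noise)):
    residual = mean_power(lms_prediction_error(sig)[-1000:]) / mean_power(sig)
    print(f"{name}: residual/input power after adaptation = {residual:.4f}")
```

For the periodic signal the residual ratio falls close to zero, while for white noise it stays near one. Real speech has a slowly drifting pitch, so in practice the predictable portion is only partial.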
Data hiding for speech and audio signals
As network bandwidth has increased, vast amounts of digital speech, music, and video content are flowing over networks. However, a large proportion of this content is illegal, violating copyrights. We are therefore attempting to control such illegal content through copyright information embedded imperceptibly into the digital content itself. This embedded information is called a digital watermark, by analogy with classic watermarking. First-generation digital watermarks are now available for portable digital music players. However, these watermarks are still not robust to compression, noise, channel transmission, and intentional alteration. Our goal is thus to propose robust digital watermarking techniques for speech and audio signals.
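To make the idea of embedding and extracting a watermark concrete, here is a minimal sketch of generic spread-spectrum watermarking, a textbook technique and not necessarily the one we propose. The secret key seeds a pseudo-noise (PN) sequence; each payload bit adds that sequence, with sign +1 or -1, to one segment of the host at a low level, and the detector recovers the bit from the sign of the correlation. The chip count, embedding strength `alpha`, and host signal are illustrative assumptions; a practical system would also shape the watermark psychoacoustically so that it stays inaudible.

```python
import math
import random

def pn_sequence(key, length):
    """Pseudo-noise +/-1 chips generated from a secret key (used as the seed)."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]

def embed(host, bits, key, chips_per_bit=4096, alpha=0.05):
    """Add alpha * (+/-1) * PN chips to the host, one bit per segment."""
    n_total = chips_per_bit * len(bits)
    if len(host) < n_total:
        raise ValueError("host signal too short for the payload")
    pn = pn_sequence(key, n_total)
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for n in range(i * chips_per_bit, (i + 1) * chips_per_bit):
            marked[n] += alpha * sign * pn[n]
    return marked

def extract(signal, n_bits, key, chips_per_bit=4096):
    """Correlate each segment with the same PN chips; the sign gives the bit."""
    pn = pn_sequence(key, chips_per_bit * n_bits)
    bits = []
    for i in range(n_bits):
        segment = range(i * chips_per_bit, (i + 1) * chips_per_bit)
        corr = sum(signal[n] * pn[n] for n in segment)
        bits.append(1 if corr > 0 else 0)
    return bits
```

Robustness comes from the correlation gain: the watermark term grows with the number of chips per bit, while the host's contribution to the correlation grows only with its square root, so spending more samples per bit trades payload for resistance to noise.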
Super-directional speakers and audio spotlights
Parametric speakers can deliver audible sound in a very narrow beam by modulating an ultrasonic carrier wave with the audible sound. As the modulated ultrasonic wave travels through the air, the audible sound is gradually demodulated by the nonlinearity of the air, which becomes significant when the ultrasonic wave is radiated at extremely high levels. We use parametric speakers to deliver speech only to the intended listener by tracking the listener's head position and steering the beam toward it.
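The self-demodulation can be approximated by Berktay's far-field solution, under which the audible pressure is proportional to the second time derivative of the squared envelope, $p(t) \propto \frac{d^2}{dt^2}E^2(t)$. The sketch below (an illustration, not our signal processing chain) compares conventional AM, $E = 1 + m\,s(t)$, with square-root preprocessing, $E = \sqrt{1 + m\,s(t)}$, for a 1 kHz tone. The sample rate, tone, and modulation index are assumptions, and the carrier itself is omitted because only the envelope enters the approximation.

```python
import math

FS = 48_000  # envelope sample rate in Hz (assumption)
F0 = 1_000   # audible test tone in Hz (assumption)
N = 4_800    # 0.1 s, an integer number of tone periods

def tone(n):
    return math.sin(2 * math.pi * F0 * n / FS)

def envelope_plain_am(m):
    """Conventional AM envelope E = 1 + m s(t)."""
    return [1 + m * tone(n) for n in range(N)]

def envelope_sqrt_am(m):
    """Square-root preprocessed envelope E = sqrt(1 + m s(t)), valid for m <= 1."""
    return [math.sqrt(1 + m * tone(n)) for n in range(N)]

def berktay_demod(env):
    """Audible pressure ~ d^2/dt^2 E^2, via central differences on a periodic signal."""
    e2 = [e * e for e in env]
    return [(e2[(n + 1) % N] - 2 * e2[n] + e2[n - 1]) * FS * FS for n in range(N)]

def amplitude(x, f):
    """Amplitude of the sinusoidal component of x at frequency f (single DFT bin)."""
    re = sum(v * math.cos(2 * math.pi * f * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

def second_harmonic_ratio(env):
    """2nd-harmonic amplitude relative to the fundamental in the demodulated sound."""
    p = berktay_demod(env)
    return amplitude(p, 2 * F0) / amplitude(p, F0)
```

With plain AM the squared envelope contains $m^2 s^2(t)$, so the second-harmonic distortion is about $m$ (roughly 0.5 for `m = 0.5`); with square-root preprocessing $E^2 = 1 + m\,s(t)$ exactly and the distortion essentially vanishes.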
We can also generate audible sound within a limited area, rather than along an entire beam, by splitting the modulated signal into multiple sub-bands and radiating each sub-band from a separate parametric speaker. Audible sound is then generated only where all the beams intersect.
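Why sound appears only at the intersection can be seen from the quadratic nonlinearity alone. In this minimal sketch, we assume the simplest possible split: speaker A radiates the carrier and speaker B radiates a single sideband tone, and the air nonlinearity is modeled crudely as squaring the total pressure. The frequencies and sample rate are illustrative assumptions. Each beam alone produces no 1 kHz component; only the cross term $2ab$ of $(a+b)^2$, present where both beams overlap, contains the difference frequency.

```python
import math

FS = 400_000  # simulation sample rate in Hz (assumption)
FC = 40_000   # ultrasonic carrier in Hz (assumption)
FA = 1_000    # audible tone to reproduce in Hz (assumption)
N = 4_000     # 10 ms: integer numbers of periods of every component

def sine(f):
    return [math.sin(2 * math.pi * f * n / FS) for n in range(N)]

carrier = sine(FC)        # radiated by speaker A
sideband = sine(FC + FA)  # radiated by speaker B

def audible_amplitude(x, f=FA):
    """Amplitude of the component of x at frequency f (single DFT bin)."""
    re = sum(v * math.cos(2 * math.pi * f * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

def square_law(x):
    """Crude model of the air nonlinearity: keep only the quadratic term."""
    return [v * v for v in x]

only_a = audible_amplitude(square_law(carrier))   # inside beam A only
only_b = audible_amplitude(square_law(sideband))  # inside beam B only
both = audible_amplitude(square_law([a + b for a, b in zip(carrier, sideband)]))
print(f"A only: {only_a:.2e}, B only: {only_b:.2e}, A and B: {both:.3f}")
```

Squaring either tone alone yields only DC and twice its own frequency, both far outside the audible band, whereas the cross term yields $\cos(2\pi F_A t)$ with unit amplitude.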
Speech intelligibility testing methods and their prediction for Japanese
Japanese speech intelligibility is commonly tested with monosyllable listening tests. However, these tests require subjects to choose the correct syllable from among 100 valid Japanese syllables, which is not easy and may require training. It has long been known that the results of these tests are unstable, being affected by listener fatigue, individual differences, noise, and channel characteristics. Accordingly, tests that try to overcome these shortcomings have been developed for English and a few Romance languages. In one such test, the Diagnostic Rhyme Test (DRT), the subject hears a word and then selects it from a pair of candidate words that differ only in their initial phone. The DRT is known to be stable and robust. Its word pairs are also arranged so that specific phonetic features, such as voicing or nasality, are tested systematically and efficiently. We have defined a similar testing methodology for Japanese.
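Because each DRT trial is a two-alternative forced choice, scores are conventionally corrected for guessing as $100\,(R - W)/T$, where $R$ and $W$ are the numbers of right and wrong responses and $T = R + W$; pure guessing then scores about 0. The sketch below tallies this score per phonetic feature. The `(feature, correct)` trial format and the function names are hypothetical conveniences, not part of any published test material.

```python
def drt_score(right, wrong):
    """Chance-corrected DRT score in percent: 100 * (R - W) / T with T = R + W."""
    total = right + wrong
    if total == 0:
        raise ValueError("no trials")
    return 100.0 * (right - wrong) / total

def drt_by_feature(trials):
    """trials: iterable of (feature, correct) pairs, e.g. ("voicing", True).

    Returns a dict mapping each feature to its chance-corrected score.
    """
    counts = {}
    for feature, correct in trials:
        r, w = counts.get(feature, (0, 0))
        counts[feature] = (r + 1, w) if correct else (r, w + 1)
    return {f: drt_score(r, w) for f, (r, w) in counts.items()}
```

For example, 90 right and 10 wrong answers give 80 rather than 90, since about 10 of the right answers are attributed to lucky guesses.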
However, the DRT still requires a panel of listeners to evaluate the test speech, which can be very time-consuming and expensive. We therefore attempted to estimate the intelligibility of test speech from its acoustic properties alone, without human subjects. We obtained reasonably accurate estimates by mapping the difference (degradation) between the reference (clean) speech and the test speech to intelligibility. We then attempted to estimate intelligibility without the clean speech by using deep learning to estimate the clean speech from the test speech and mapping the difference between the estimated clean speech and the test speech, achieving almost the same accuracy as with the clean reference speech.
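The reference-based (intrusive) approach can be outlined as "measure the degradation, then map it to a score". In the sketch below, segmental SNR stands in for the degradation measure and a logistic curve stands in for the mapping; both are assumptions chosen for illustration, not the features or model used in our work, and the `midpoint` and `slope` values are hypothetical placeholders that would in practice be fitted to DRT scores from listening tests.

```python
import math

def segmental_snr(ref, test, frame=256, lo=-10.0, hi=35.0):
    """Mean frame-wise SNR (dB) of `test` against the clean `ref`, clipped to [lo, hi]."""
    eps = 1e-12  # avoids division by zero for silent or identical frames
    snrs = []
    for start in range(0, min(len(ref), len(test)) - frame + 1, frame):
        s = ref[start:start + frame]
        y = test[start:start + frame]
        sig = sum(v * v for v in s)
        err = sum((a - b) ** 2 for a, b in zip(s, y))
        snrs.append(min(hi, max(lo, 10 * math.log10((sig + eps) / (err + eps)))))
    return sum(snrs) / len(snrs)

def estimate_intelligibility(ref, test, midpoint=0.0, slope=0.3):
    """Map segmental SNR to a 0-100 score with a logistic curve (hypothetical parameters)."""
    d = segmental_snr(ref, test)
    return 100.0 / (1.0 + math.exp(-slope * (d - midpoint)))
```

In the non-intrusive variant described above, `ref` would be replaced by clean speech estimated from the test signal itself, leaving the rest of the pipeline unchanged.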

Contact Info

Snail mail:

Graduate School of Science and Engineering
Yamagata University
4-3-16 Jonan, Yonezawa, Yamagata 992-8510 JAPAN

email:

kkondo_at_yz.yamagata-u.ac.jp

WWW

http://spandaudiolab.yz.yamagata-u.ac.jp/~kkondo

Tel./Fax

+81-238-26-3312

Questions? Comments?