Kazuhiro Kondo
Japanese page
Affiliation
Yamagata
University
Graduate School of Science and Engineering
Electrical Engineering
Curriculum Vitae
Available
here.
Current research project
proposals
- Active Speech Cancellation Systems for
Cellular Speech
Cellular phones have become ubiquitous in most
developed countries. This has created a new type of problem: we
are often bombarded from all directions by the speech of people chatting
away on their cell phones. This speech is clearly not intended for us, only for
the person on the receiving end of the call, and thus serves no purpose once the
microphone in the handset has picked it up. It also raises privacy concerns.
It would therefore be beneficial if we could control the radiation of this
speech into the surrounding space, or at least mitigate it to
some degree.
Active Noise Cancellation has received great interest this decade with
the advancement of Digital Signal Processors. There have been some successful
applications of this technology, e.g. control of fan noise radiation through
ducts, jet engine radiation control, and road noise control in automobiles,
to name just a few. We are considering applying similar techniques to the control
of speech.
It is generally considered that global control of
non-periodic/unpredictable noise radiation is not possible except for a
"zone of silence" around an error-detection microphone. Since speech is
quasi-periodic, we will attempt to "globally" control radiation of speech,
at least to some level.
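As a rough illustration of the adaptive filtering that underlies active cancellation, the following minimal sketch uses a least-mean-squares (LMS) filter to cancel a toy quasi-periodic signal. Everything in it is an assumption made for illustration: the function name lms_cancel, the tap count, the step size, the 8 kHz sample rate, and the simple delay-and-scale acoustic path. It is not the system proposed in this project, and it ignores the loudspeaker-to-microphone (secondary) path that a practical system would have to model, for example with filtered-x LMS.

```python
import math


def lms_cancel(reference, primary, num_taps=8, step=0.02):
    """Return the residual left after adaptively cancelling `primary`.

    `reference` is the signal picked up near the talker, `primary` is what
    reaches the error microphone. Both names are illustrative assumptions.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, primary):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains at the error microphone
        # LMS update: move the weights along the instantaneous gradient
        weights = [w + step * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def power(signal):
    """Mean squared value of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


FS = 8000  # sample rate in Hz (assumed)
N = 4000   # number of samples (assumed)

# Toy quasi-periodic "speech": a 200 Hz tone plus its second harmonic
reference = [
    math.sin(2 * math.pi * 200 * n / FS) + 0.5 * math.sin(2 * math.pi * 400 * n / FS)
    for n in range(N)
]
# Hypothetical acoustic path: a 3-sample delay and a gain of 0.8
primary = [0.0] * 3 + [0.8 * r for r in reference[:-3]]

residual = lms_cancel(reference, primary)
print("power before:", power(primary[-1000:]))
print("power after: ", power(residual[-1000:]))
```

In this idealized, noise-free setting the residual power falls well below the original power once the filter has adapted. Real speech is only quasi-periodic and real acoustic paths vary, so the attainable reduction would be considerably smaller.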
- Data hiding for speech and audio signals
With the increase in network bandwidth, vast amounts
of digital speech, music, and video content are flowing through the
network. However, a large proportion of this content is illegal,
violating copyrights. Accordingly, we are attempting to control this
illegal content through copyright information embedded imperceptibly into
the digital content. This information is called a digital watermark, by
analogy with classic watermarking. First-generation digital watermarks
are now available for portable digital music players. However, these
watermarks are still not robust to compression, noise, channel
transmission, and human alterations. Thus, the goal is to propose robust
digital watermarking techniques for speech and audio signals.
- Super-directional speakers and audio spotlights
Parametric speakers can deliver audible sound in a very narrow beam by
modulating an ultrasonic carrier wave with audible sound. The audible sound
is gradually demodulated as the modulated signal travels through the
air, due to the nonlinearity that arises when the modulated ultrasonic wave
is emitted at extremely high levels. We use this parametric speaker
to convey speech information to only the intended listener by tracking
the listener's head position and steering the beam towards it.
We can also generate audible sound in a limited area, not a beam, by splitting
the modulated signal into multiple sub-bands, and generating these sub-bands
from separate parametric speakers. The audible sound is generated at a
position where all beams intersect.
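The modulation and demodulation described above can be sketched numerically. The block below amplitude-modulates an ultrasonic carrier with an audible tone, then recovers the tone by squaring the signal and averaging over one carrier period. This is only a minimal sketch under stated assumptions: the sample rate, carrier frequency, tone frequency, modulation depth, and all function names are illustrative, and the squaring step is a crude stand-in for the nonlinearity of air. It does not model beam formation, sub-band splitting, or the frequency-dependent response of real self-demodulation.

```python
import math

FS = 200_000      # sample rate in Hz (assumed)
CARRIER = 40_000  # ultrasonic carrier frequency in Hz (assumed)
TONE = 1_000      # audible test tone in Hz (assumed)
DEPTH = 0.5       # modulation depth (assumed)
PERIOD = FS // CARRIER  # samples per carrier cycle (5 with these values)


def modulate(audio):
    """Double-sideband amplitude modulation of the ultrasonic carrier."""
    return [
        (1.0 + DEPTH * a) * math.cos(2 * math.pi * CARRIER * n / FS)
        for n, a in enumerate(audio)
    ]


def square_law_demodulate(signal):
    """Square the signal, then average over one carrier period.

    Squaring produces a baseband term that follows the envelope; the
    moving average removes the component at twice the carrier frequency.
    """
    squared = [s * s for s in signal]
    smoothed = [
        sum(squared[i - PERIOD + 1:i + 1]) / PERIOD
        for i in range(PERIOD - 1, len(squared))
    ]
    mean = sum(smoothed) / len(smoothed)
    return [v - mean for v in smoothed]


def correlation(x, y):
    """Pearson correlation over the common length of two sequences."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


audio = [math.sin(2 * math.pi * TONE * n / FS) for n in range(2000)]
recovered = square_law_demodulate(modulate(audio))
# Skip PERIOD // 2 samples of audio to compensate for the averaging delay
similarity = correlation(recovered, audio[PERIOD // 2:])
print("correlation with original tone:", similarity)
```

The recovered signal correlates strongly with the original tone but is not identical to it, because squaring the envelope also adds a second-harmonic distortion term that grows with the modulation depth.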
- Speech intelligibility testing methods and intelligibility prediction for Japanese
Japanese speech intelligibility is commonly tested
through monosyllable hearing tests. However, these tests require the subjects to choose the correct
syllable among 100 valid Japanese syllables, which is not easy and may
require training. It has long been known that the results of these tests are unstable
and are affected by listener fatigue, individual variation, noise, and channel characteristics. Accordingly, tests have been developed in
English and a few Romance languages which try to overcome these shortcomings.
In the Diagnostic Rhyme Test (DRT), one such test, a subject is presented with
a word, after which the subject attempts to select the correct word from
a pair of candidates that differ only in the initial phone. DRT is known
to be stable and robust. The test is also arranged to probe specific phone
characteristics in a systematic and efficient way. We have defined a similar
testing methodology for Japanese.
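Because each DRT trial is a choice between two candidates, a listener who guesses at random is still right about half the time. Scores for such two-alternative tests are therefore commonly corrected for chance as 100 x (correct - wrong) / total. The sketch below assumes this common formula. The function name drt_score and the example counts are hypothetical, and the scoring actually used for the Japanese test may differ.

```python
def drt_score(num_correct, num_wrong):
    """Chance-corrected score in percent for a two-alternative test.

    Pure guessing yields about 0, perfect identification yields 100.
    """
    total = num_correct + num_wrong
    if total == 0:
        raise ValueError("at least one response is required")
    return 100.0 * (num_correct - num_wrong) / total


# Hypothetical listener: 90 correct and 10 wrong out of 100 word pairs
print(drt_score(90, 10))  # 80.0
# Pure guessing: half right, half wrong
print(drt_score(50, 50))  # 0.0
```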
However, DRT testing still requires a panel of listeners to evaluate the
test speech, which can be very time-consuming and expensive.
Accordingly, we attempted to estimate the intelligibility of test speech
from its acoustic properties alone, without human subjects.
We were able to obtain a fairly reasonable estimate by mapping the difference
between the reference (clean) speech and the test (degraded) speech to intelligibility.
We then attempted to estimate the intelligibility without the clean speech
by using deep learning to estimate the clean speech and mapping the difference
between the estimated clean speech and the test speech. This gave almost the
same accuracy as was possible with the clean reference speech.
Contact Info
Snail
mail:
Graduate School of Science and Engineering
Yamagata University
4-3-16 Jonan, Yonezawa,
Yamagata 992-8510 JAPAN
email:
kkondo_at_yz.yamagata-u.ac.jp
WWW
http://spandaudiolab.yz.yamagata-u.ac.jp/~kkondo
Tel./Fax
+81-238-26-3312
Questions? Comments?
email
me.
Last Revision: April 7, 2022