
Human Computer Interaction

We consistently seek improvements in user interfaces so that systems can be accessed in a wide range of situations and by people with disabilities. IBM Research - Tokyo is well ahead of the industry in this area, particularly in voice technology and accessibility technology for people with visual impairments.

Competency fields

Robust Speech Recognition

Our target is to develop speech recognition technology that surpasses human ability in recognizing spoken words. One approach involves studying noise reduction methods, echo cancellation, and target speech enhancement and detection by using multiple microphones. Another approach starts from a language processing viewpoint, from which we are studying non-fluency modeling and the acquisition of unknown words to improve the accuracy of transcriptions of spontaneous speech. We are also developing an advanced technology for speech comprehension, focusing on the robust retrieval of POIs (points of interest) from speech.

Totally Trainable TTS (T4S)

Our target is to develop a totally trainable Text-to-Speech system. Conventional TTS output is intelligible but unattractive to most users because the characteristics of the original voice are lost. We are using a newly developed stochastic approach from speech recognition and training the stochastic models on the prosodic features of speech. This method can produce more natural and human-like synthetic voices, though the applicable domains are still limited. To improve naturalness further, we are also studying technology for synthesizing emotional speech.

Speech Analytics

Beyond tools for analyzing written text, technologies for analyzing spoken conversations with clients are required for CRM and for compliance checks in various business scenarios. Recognizing natural speech within a dialogue is regarded as a difficult problem, but we have made significant progress in the last few years. In addition to more accurate transcription of conversations with time indexes, we are now developing various application technologies such as audio segmentation and classification, emotion detection, and a turn-taking overview tool.
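As a rough illustration of what a turn-taking overview could compute from a time-indexed transcript, here is a minimal sketch. The segment format (speaker label, start time, end time), the function name `turn_taking_overview`, and the sample data are all assumptions made for illustration; they do not describe the tool under development.

```python
from collections import defaultdict


def turn_taking_overview(segments):
    """Summarize speaker turns from time-indexed transcript segments.

    segments: list of (speaker, start_sec, end_sec) tuples, sorted by
    start time. This input format is a hypothetical one for this sketch.
    """
    talk_time = defaultdict(float)
    turns = defaultdict(int)
    previous_speaker = None
    for speaker, start, end in segments:
        talk_time[speaker] += end - start
        # A new turn begins whenever the speaker changes.
        if speaker != previous_speaker:
            turns[speaker] += 1
            previous_speaker = speaker
    return {s: {"turns": turns[s], "talk_time": talk_time[s]} for s in talk_time}


# Hypothetical conversation between an agent and a client.
segments = [
    ("agent", 0.0, 4.0),
    ("client", 4.0, 10.0),
    ("client", 10.5, 12.5),
    ("agent", 12.5, 15.0),
]
overview = turn_taking_overview(segments)
for speaker, stats in overview.items():
    print(speaker, stats["turns"], stats["talk_time"])
```

Consecutive segments by the same speaker are counted as a single turn, so a short pause within one speaker's utterance does not inflate the turn count.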

Next Generation Office Document Editor

No existing office editor really allows multiple authors to edit, share ideas, and discuss effectively. Our research focuses on new collaboration models that integrate Web-based office editors with other collaborative Web applications. To achieve this goal, we first created a new Web office editor that can be freely combined with any Web application by using an open standard office document format (OpenDocument Format, ODF) as its programming model.

Social Computing for Accessibility

Beyond developer participation, user participation is also necessary to improve the actual usability of webpages for diverse users. This project is initially focused on a new Web service using a kind of social network, which allows disabled users and volunteers to collaborate to improve the accessibility of webpages without changing the original content. The "Social Accessibility Project", a Web service for visually impaired people and volunteers, is already available.

Accessibility research

Text Processing Assistance Tools

Globally integrated companies need to exchange documents written in many languages, and documents shared with customers must be of high quality. For such situations, we are working on natural language processing technologies for document critiquing, translation, and sanitizing or masking.

Global Innovation Outlook

IBM is creating new opportunities for business and society.