KEY TAKEAWAYS
  • To develop AI that can understand, act and communicate like a human, NTT is embedding the values of ‘tolerance’ and ‘sincerity’ into its AI systems.
  • Evolving the sensing and communicating abilities of AI, i.e. being able to see, listen and speak, into higher-order thinking – synthesis and contextualization of information beyond simple recollection of facts – requires development of a number of supporting technologies.
  • Since 2015, NTT has been working on ‘angle-free object search’ technology, which makes it possible to perform recognition and search using images taken by smartphones and cameras, in place of the human eye – i.e. recognize objects, even with limited visual data, with high accuracy. 
  • Creating natural conversation is one of the key challenges in achieving truly human-like AI. NTT was the first in Japan to adopt a technology known as WFST (Weighted Finite State Transducer), making it possible to recognize the closest word from a range of 10 million, or 100 times as many as before. 
  • With Totto, NTT’s humanoid robot, NTT is using dialogue processing technology, high-accuracy speech recognition and highly-realistic speech synthesis to create natural conversation with human personality.
  • NTT is also working on a simulation environment to search for measures to avoid traffic problems, using groundbreaking spatio-temporal multidimensional collective data analysis – i.e. using AI to predict the movement of people and objects in a given area.
  • Challenges persist – particularly in keeping costs down for training data and hardware, and in making the inner workings of AI decisions clear and discoverable. Transfer learning, new infrastructures and whitebox AI are set to overcome these challenges.

For all the advances we have seen in big data analysis, machine learning and deep learning, we still haven’t achieved what has long been the holy grail of artificial intelligence researchers: creating AI that can truly think like a human. What will it take to get us to this next level?

At the heart of our research at NTT is the person behind the technology – not the tools themselves, but the human beings that use them. Perhaps in no other field is this more apposite than artificial intelligence. Instilling human-like qualities and abilities into AI is, we believe, the key to making AI capable of higher-order thinking.

Commercialization is already underway in many areas, and certain functions such as media processing have achieved results that actually surpass human abilities. However, if AI as intelligent as humans is the goal, that goal has not yet been reached. For the next generation of AI technology, we are focusing on developing the ability to understand human values – in other words how to make judgments regarding the particular context of an individual’s thoughts or scenario, going beyond simple recollection of facts and into creative, empathetic analysis.

Tolerance and sincerity

By developing AI that understands diverse human values, and considers those values itself, we are expanding the range of uses of AI. Today’s technology can handle simple sentences, as in customer support chatbots, but these systems can only output answers that depend on the information they are given. If AI can solve problems by drawing on its own experience, it will be capable of communicating with humans on a deeper level, expanding the choice of actions.

And if the AI can predict the values of the other party and respond in a way that reflects its own values, then the AI will be capable of richer and more meaningful conversations. This ability is becoming important for introducing AI into fields such as counseling and facilities for the elderly.

Going beyond fact recollection means being able to synthesize and analyze multiple viewpoints, so AI based on values must have the flexibility to accept a variety of ideas – to be ‘tolerant’. To gain trust from human users, the AI must also act in a consistent manner and one which is respectful of the particular viewpoint of the individual – to be ‘sincere’.

In order for AI to think on a deeper level, NTT will continue to pursue AI research with a focus on incorporating these values: recognizing diversity of thought (tolerance), and respecting that diversity while behaving both flexibly and consistently (sincerity).

From sensing to thinking

Combining the sensing and communicating abilities of AI – being able to see, listen and speak – into the ability to think analytically, and realize higher-order rational thinking, requires development of a number of supporting technologies.

How do we make AI really see? Since 2015, NTT has been working on angle-free object search technology, which makes it possible to recognize objects, even with limited visual data, with high accuracy. 

To be able to perform recognition and search using images taken by smartphones and cameras, in place of the human eye, AI tools must be able to recognize objects, whether rigid or not, and viewed from any angle. But non-rigid objects – such as soft packaging products and clothing – have different patterns of ‘deformation’ than rigid 3D objects, and therefore the appearance of the images changes greatly, resulting in lower recognition accuracy.

Conventionally, image recognition technologies (including those based on deep learning) require the preparation of many database images in order to recognize ‘deformed’ (non-rigid, non-recognizable) objects, resulting in an expensive and labor-intensive process. But with our groundbreaking image matching technology, we can identify objects with just 1/10th of the number of images needed by conventional methods.

Angle-free object search technology is already being used in public spaces in Japan to give foreign visitors directions or tourism information in their native language, by pointing their smartphone camera at a signpost, display board, or individual product. This technology has the potential to revolutionize industries such as retail.

To hone AI’s ability to listen and speak, NTT is conducting case study research on speech recognition and spoken dialogue technology, while engaged in discussions with Professor Noriko Arai of the National Institute of Informatics, as well as Professor Hiroshi Ishiguro of Osaka University, a leading android researcher.

One of our projects, for instance, is focused on technologies that will enable AI systems to not only hold natural conversations with humans, but also to accurately ‘read the real intent’ behind someone’s words. To help us do this, we created an android helper called ‘Totto’, named after Japanese actress and TV personality, Tetsuko Kuroyanagi. Using dialogue processing technology that realizes natural conversation with human personality, in addition to high-accuracy speech recognition and highly-realistic speech synthesis, users can communicate as if they were talking with Ms. Kuroyanagi herself.

NTT has been researching speech recognition for more than half a century. In the beginning, this technology could only recognize extremely stilted speech, with a limited vocabulary. NTT was the first in Japan to adopt a technology known as WFST (Weighted Finite State Transducer), making it possible to recognize the closest word from a range of 10 million, or 100 times as many as before. 
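
To make the idea of a weighted finite state transducer more concrete, here is a minimal sketch in Python. It is not NTT's recognizer: the arc table, the symbols and the costs are all hypothetical, and a production system would combine far larger transducers for acoustics, pronunciation and language. The sketch only shows the core idea, that each arc reads an input symbol, may emit a word, and carries a cost, and that recognition means finding the cheapest path through the machine.

```python
import heapq

# Hypothetical toy transducer (illustrative values only).
# ARCS[state] = list of (input_symbol, output_word, cost, next_state)
ARCS = {
    0: [("k", "", 0.0, 1)],
    1: [("a", "", 0.0, 2)],
    2: [("t", "cat", 0.4, 3), ("t", "cut", 1.2, 3), ("p", "cap", 0.7, 3)],
}
FINAL_STATES = {3}


def best_transcription(symbols):
    """Return (cost, words) for the cheapest accepting path, or None."""
    # Dijkstra search over (state, input position); costs add along a path.
    queue = [(0.0, 0, 0, ())]
    seen = set()
    while queue:
        cost, state, pos, words = heapq.heappop(queue)
        if pos == len(symbols) and state in FINAL_STATES:
            return cost, list(words)
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(symbols):
            continue  # input consumed but state is not final
        for in_sym, out_word, arc_cost, nxt in ARCS.get(state, []):
            if in_sym == symbols[pos]:
                new_words = words + ((out_word,) if out_word else ())
                heapq.heappush(queue, (cost + arc_cost, nxt, pos + 1, new_words))
    return None


result = best_transcription(["k", "a", "t"])  # picks "cat" over "cut"
```

Because the weights live on the arcs, enlarging the vocabulary means adding arcs rather than changing the search, which is one reason this family of models scales to very large word lists.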

Thanks to Totto, we’re able to create natural conversations, even in noisy public areas. Using deep learning technology, NTT won first place in an international competition on the accuracy of voice recognition in noisy public places using mobile terminals.

Finally, there is our development of spatio-temporal multidimensional collective data analysis – i.e. using AI to predict the movement of people and objects in a given area.

Thanks to improvements in sensing technology and the rapid spread of smartphone applications and devices making up the Internet of Things (IoT), it is becoming possible to measure diverse types of data such as the movement of vehicles and things, human behavior, and environmental changes from just about anywhere. However, it is extremely difficult to extract, and appropriately apply, significant information lying latent in combinations of such massive amounts of data.

NTT is researching and developing AI technologies targeting IoT, aiming to obtain information on everything under the sun – things, people, the environment – from diverse types of data collected and stored in real space and cyberspace, perform instantaneous event detection, analysis, and prediction based on that information, and feed the results back to the real world. Spatio-temporal multidimensional collective data analysis is vital for this, and is needed to model the space and time relationships among data with multiple dimensions and attributes. In this way, the place and time of a future event can be predicted.
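
As a rough illustration of what indexing data by both place and time can look like, the Python sketch below builds an average profile per (area, hour) cell and picks the cell with the highest expected count. This is a deliberately naive baseline and not NTT's analysis method; the function names, the record layout and the sample figures are assumptions made for the example.

```python
from collections import defaultdict


def hourly_profile(observations):
    """Average count per (area, hour) cell over all observed days.

    Each observation is a hypothetical record: (area, day, hour, count).
    """
    totals = defaultdict(float)
    days = defaultdict(set)
    for area, day, hour, count in observations:
        totals[(area, hour)] += count
        days[(area, hour)].add(day)
    return {key: totals[key] / len(days[key]) for key in totals}


def predict_peak(profile):
    """Return the (area, hour) cell with the highest expected count."""
    return max(profile, key=profile.get)


# Made-up sample data for illustration only.
observations = [
    ("station", 1, 8, 120),
    ("station", 2, 8, 100),
    ("park", 1, 8, 30),
    ("station", 1, 14, 40),
]
profile = hourly_profile(observations)
peak = predict_peak(profile)
```

A real system would also model how neighbouring areas and adjacent time slots influence each other, which is where the multidimensional analysis described above comes in.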

This technology is being considered for introduction to on-demand buses and car sharing, which operate by predicting traffic demand. There are also attempts being made to develop this technology in order to relieve congestion and traffic jams in urban areas. Specifically, NTT is working on a simulation environment using information on the flow of people and vehicles, and searching for guidance measures to avoid traffic problems.

Three challenges – and how we overcome them

These projects, whilst different, face the same three key issues that have to be solved if we are to take artificial intelligence to the next level.

Reducing the amount of training data

First, we have to find ways to reduce the amount of training data needed to develop AI systems – otherwise it will become prohibitively expensive to develop the specialized systems we need.

Generally when developing AI, it is necessary to prepare a large amount of training data. As the fields of application increase and subdivide, it is necessary to collect, aggregate and analyze data for each field of application, so cost increases dramatically. One way we could cut back on costs is to limit the amount of data we need to collect and analyze, by using transfer learning that enables data gathered for one AI system to be used by another. Using this method can improve accuracy even if the volume of sample data is limited.
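
The Python sketch below shows one very simple form of this idea, under stated assumptions: a line is fitted to plentiful data from a source task, then its slope is reused for a related target task that has only one sample, so that only the intercept has to be estimated. The function names and the data are hypothetical, and real transfer learning usually reuses the layers of a neural network rather than a single slope.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def transfer_intercept(slope, target_points):
    """Reuse the source slope; re-estimate only the intercept on target data."""
    n = len(target_points)
    return sum(y - slope * x for x, y in target_points) / n


# Source task: plenty of data (made up for illustration), y = 2x + 1.
source = [(x, 2 * x + 1) for x in range(10)]
slope, source_intercept = fit_line(source)

# Target task: a single sample from y = 2x + 5. Fitting a full line here
# is impossible (zero variance in x), but the transferred slope makes it solvable.
target = [(3, 11.0)]
target_intercept = transfer_intercept(slope, target)
```

The point of the design is that the expensive part of the model is learned once and shared, while each new field of application only needs enough data to adjust what is specific to it.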

Whiteboxing AI

Second, we need to make AI processes more transparent and understandable to humans. Currently, ‘blackbox AI’ – where the inner workings of an AI system are opaque and unexaminable – means that AI-generated decisions are often mistrusted. The common approach is for AI to learn from a huge volume of data in a mechanized manner in order to derive results, but, in these cases where the processing inside the AI is blackboxed, it is difficult for humans to understand the learning process – and difficult for users to trust the results.

This is why researchers are working on ‘whitebox AI’ that humans can properly interrogate to learn why a particular outcome has been reached. Whiteboxing shows the full process leading to output results. In order to achieve this, in 2016 the Defense Advanced Research Projects Agency (DARPA) of the United States Department of Defense launched XAI, a research and development investment project to realize ‘explainable AI.’
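
One simple way to picture an interrogable model is a linear score whose output can be broken down feature by feature. The Python sketch below is an assumption-laden illustration, not a description of DARPA's XAI program or of any NTT system; the weights, feature names and values are invented for the example.

```python
def explain_score(weights, bias, features):
    """Return the score and each feature's contribution, largest effect first."""
    contributions = {
        name: weights[name] * value for name, value in features.items()
    }
    score = bias + sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return score, ranked


# Hypothetical model and input, for illustration only.
weights = {"income": 0.5, "debt": -2.0}
score, reasons = explain_score(weights, 1.0, {"income": 4.0, "debt": 1.5})
# reasons lists "debt" first because it moved the score the most.
```

Deep learning models do not decompose this neatly, which is why explaining their decisions is an open research problem rather than a solved one.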

High-speed, high-performance hardware

Third, there’s a need for high-performance hardware. AI workloads must execute a large number of calculations in complex combinations, and execute them in parallel, so current general-purpose CPUs and GPUs will either fail to keep up or require huge computing resources. For this reason, in the future it will be important to develop frameworks and architectures specialized for AI computation.

///

Artificial intelligence is an exciting field and one with the potential to transform so many areas of our lives. Together with its partners, NTT’s research team is developing cutting-edge technologies that will create more seamless, human-like interactions between people and machines, paving the way for the evolution of AI and the next step on our journey to a more egalitarian, tech-enabled, prosperous society.

Originally published by NTT R&D.