An investigation is opened: an "AI detective" will tackle hard problems when data is scarce
Russian experts have developed a method that lets artificial intelligence systems draw conclusions and make decisions when data is scarce. The method, proposed by scientists from Kazan, helps extract unique information from a data set as quickly as possible. In particular, it has already been used to assess the quality of drinking water based on a small sample of children's blood test results. According to the experts, this made it possible to take measures to improve water quality. Read on to find out how the Russian innovation works.
Why AI makes mistakes when there is a lack of data
When information is scarce, AI-based computational models often produce unreliable results. This is because they do not "think" critically but look for the statistically most likely answer. When there is not enough data, neural networks falter.
To solve this problem, scientists from Kazan National Research Technical University named after A.N. Tupolev (KNRTU-KAI) have developed a new way of building such models, based on introducing "detective" methods into how AI works.
"In practice, there are tasks where obtaining information is subject to physical, legal and other restrictions. This applies, for example, to personal data," said Svetlana Novikova, one of the developers and a professor in the Department of Applied Mathematics and Computer Science at KNRTU-KAI. "Some materials are also hard to obtain because they are inaccessible, such as samples from the ocean depths or from the surface of other planets. In addition, models are often difficult to build because the relationships are poorly understood or because details and measurements are missing."
In such cases, she explained, AI fills in the gaps by generating new data from what is available. The less data there is, the higher the probability that the final conclusions will be distorted. So when information is in short supply, the proposed approach aims to make the system act like a detective who asks questions and builds logical chains, revealing hidden connections.

According to the scientist, the model's operating principle is based on the idea of resonance. When the system receives information, it compares it with the templates stored in its memory. If they are similar, there is "consonance", and the incoming data is assigned to the same class. If there is no similarity, the new information itself becomes a template. The improvement made by the KNRTU-KAI specialists lies in how the uniqueness of the data is assessed.
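The resonance principle described above can be illustrated with a minimal sketch. It is not the KNRTU-KAI implementation, which the article does not detail: the cosine similarity measure, the `vigilance` threshold and the function names are all assumptions chosen for illustration.

```python
from math import sqrt


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(sample, templates, vigilance=0.9):
    """Return the index of the class that `sample` resonates with.

    `vigilance` is a hypothetical similarity threshold. If no stored
    template is similar enough, the sample itself is stored as a new
    template and the index of that new class is returned.
    """
    best_index, best_score = None, -1.0
    for index, template in enumerate(templates):
        score = similarity(sample, template)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score >= vigilance:
        return best_index  # "consonance": same class as the matching template
    templates.append(list(sample))  # no match: the sample becomes a template
    return len(templates) - 1


templates = []
samples = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
labels = [classify(s, templates) for s in samples]
print(labels, len(templates))
```

In this toy run the second sample resonates with the template created by the first, while the third is dissimilar to everything in memory and founds a class of its own.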
"As a rule, expanding the sample is desirable and useful for improving a model's accuracy. With rare data, however, adding new records to the set must be weighed carefully, since even a single measurement can unbalance the model. The decision on whether to include it in the system is made through an additional expert assessment," Svetlana Novikova said.
To do this, the scientist explained, the system sorts information into classes using many features at once, each assigned a different "weight". If the required "weight" is not reached, the "newcomer" falls outside the patterns stored in the model's memory.
The proposed approach not only increases the model's accuracy but also lets a person trace how the AI reaches its decisions. This is important for building trust in such systems.
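One way to picture the weighted check is the sketch below. The article does not say how features are compared or how the weights are set, so the per-feature `tolerance`, the integer weights, the `required_weight` threshold and both function names are assumptions made purely for illustration.

```python
def weighted_match(sample, template, weights, tolerance=0.1):
    """Sum the weights of the features on which sample and template agree.

    Two feature values "agree" when they differ by at most `tolerance`
    (an assumed rule, not taken from the article).
    """
    return sum(
        weight
        for s, t, weight in zip(sample, template, weights)
        if abs(s - t) <= tolerance
    )


def fits_template(sample, template, weights, required_weight):
    """True if the sample gathers enough weight to match the template."""
    return weighted_match(sample, template, weights) >= required_weight


template = (0.5, 0.2, 0.8)
weights = (5, 3, 2)         # hypothetical importance of each feature
required_weight = 7         # hypothetical threshold

close = (0.55, 0.25, 0.1)   # agrees on the two most important features
far = (0.9, 0.9, 0.8)       # agrees only on the least important feature

print(fits_template(close, template, weights, required_weight))
print(fits_template(far, template, weights, required_weight))
```

Because the score is a plain sum of named feature weights, a person can see which features carried a decision, which fits the traceability the developers emphasize.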
What is the next step in the development of artificial intelligence
"The new algorithms have proved effective in solving practical problems. For example, they were used to analyze the content of zinc-containing compounds in the blood of children aged 1 to 14 in Kazan. The purpose of the study was to establish the relationship between place of residence, the quality of drinking water and possible health threats," Svetlana Novikova said.
She explained that zinc can enter water supply systems when drinking water sources are contaminated by industrial wastewater or when water is in prolonged contact with old galvanized pipes. Metal content above the permissible limits poses a danger to human health.
According to the expert, a total of 240 samples with anonymized data were examined. Despite the small sample, the proposed method made it possible to build accurate models and identify the relationship between the zinc content in children's bodies and their place of residence.
"The problem of data scarcity is one of the most acute in modern applied analytics. There are fields where 'millions of observations' are not always available: medical research, ecology, industry, space, rare events, personal data," Anna Pyataeva, head of the Artificial Intelligence Center at Siberian Federal University, told Izvestia. "As soon as the sample becomes sparse, classical machine learning approaches 'crumble'. This can be seen in any industry. In particular, this gap is typical of popular AI chatbots."
She added that research laboratories and specialized teams are currently the ones working with limited data sets. Making such tools widely available will accelerate the development of artificial intelligence algorithms. In fact, this is the next step: the transition from "models for everything" to models that can work where there is objectively little data. The market for such solutions is already taking shape.
For his part, Yuri Vizilter, scientific director of the MIPT Institute of Artificial Intelligence, noted that deep neural networks begin to train and perform reliably only when there are at least tens of thousands of examples. More is better, of course: hundreds of thousands or millions. As a result, in the small-sample regime there is a practically important gap that special methods should fill.
"The reliability of models depends heavily on the quality of the source data and the correctness of the models themselves. With small samples, the risk of overfitting increases, when the AI focuses not on analyzing and generalizing information but on 'memorizing' it. Small samples are also often not representative enough," said Alexander Nesterov, senior researcher in the Multimodal AI Architectures group at the Strong AI in Medicine Laboratory of the AIRI Institute.
According to him, AI tools for analyzing small samples are especially in demand when working with corporate data.
Translated by Yandex Translate