Russian scientists have created a new method for understanding AI decisions

Thu, 04/10/2025 - 10:00

Photo: IZVESTIA/Pavel Volkov

Scientists from T-Bank AI Research, the company's artificial intelligence (AI) laboratory, have developed a method that makes it possible to determine at what point an AI model begins to give incorrect or undesirable answers, and to correct them.

According to T-Technologies, the results of the study are publicly available. They will also be presented at the International Conference on Learning Representations (ICLR), which will be held in Singapore from April 24 to 28.

"Our task is not only to make models better, but also to understand how they work from the inside. Imagine a smart voice assistant that recommends movies. Suddenly it starts giving unreliable or even rude answers. Today, changing its behavior is a difficult task, because existing methods do not give us a clear understanding of exactly where the problem arose. Our research in AI interpretability is aimed at making such failures visible and quick to fix without costly retraining of the model," said Nikita Balagansky, head of the LLM Foundations research group at T-Bank AI Research.

The SAE Match method is aimed at making the operation of AI more transparent and understandable: a person will be able to track how the model processes information and why it makes certain decisions. In particular, this makes it possible to control the text generation process directly, rather than simply imposing external constraints or training the model on new data, which requires substantial computational resources.

The scientists believe that this discovery will play an important role in the adoption of AI in critical areas such as medicine, finance and security.
