Russian researchers have found a way to identify and eliminate errors in AI-generated responses

Researchers at the Artificial Intelligence (AI) laboratory of T-Bank AI Research have developed a new way to interpret and steer language models, based on the SAE Match method. The technique makes it possible to directly intervene in errors and hallucinations of a large language model during text generation, T-Bank's research laboratory reported.

Language models such as ChatGPT build their responses with a multi-layer architecture in which each layer processes information and "passes" it on. Until recently, researchers could only record which features (or concepts) appear in these layers, without understanding exactly how they evolve from layer to layer.
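The per-layer features mentioned here are typically surfaced with sparse autoencoders (SAEs), which decompose a layer's hidden state into a larger set of sparse, more interpretable activations. Below is a minimal sketch with toy random weights; the dimensions, weight names, and ReLU encoder are illustrative assumptions, not the implementation from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: model hidden width, and an overcomplete feature dictionary.
d_model, n_features = 8, 32

# Stand-in SAE weights; a real SAE is trained on the model's activations.
W_enc = rng.standard_normal((d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.standard_normal((n_features, d_model)) / np.sqrt(n_features)

def sae_encode(h):
    """Hidden state -> sparse, non-negative feature activations (ReLU)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def sae_decode(f):
    """Feature activations -> approximate reconstruction of the hidden state."""
    return f @ W_dec

h = rng.standard_normal(d_model)   # hidden state at one layer
f = sae_encode(h)                  # which "concepts" fired, and how strongly
h_hat = sae_decode(f)              # reconstruction the SAE is trained to match
```

Roughly speaking, SAE Match addresses the follow-up question: given feature dictionaries trained at two different layers, which feature in one corresponds to which in the next.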

The new method reveals where the model sourced its answer (the context of the request or its internal knowledge) and lets developers control its behavior, preventing incorrect responses. It requires no additional computing resources, so any company can adopt it. Errors can be corrected directly at a specific location, avoiding the high cost of further model training.

Experiments showed that particular features can be enhanced or suppressed at different stages of processing, changing the style, topic, or tone of the generated text. This is especially important for building safe and ethical AI-based solutions, for example for filtering unwanted topics in chatbots without retraining them.
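One way to picture such enhance/suppress experiments: encode a layer's hidden state into sparse features, rescale the feature associated with an unwanted concept, and decode the result back into the hidden state. The sketch below uses toy random weights; the function names and scaling rule are illustrative assumptions, not the SAE Match procedure itself:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_features = 8, 32

# Toy weights standing in for a trained sparse autoencoder.
W_enc = rng.standard_normal((d_model, n_features))
W_dec = rng.standard_normal((n_features, d_model))

def steer(h, feature_idx, scale):
    """Rescale one feature's activation and decode back into the hidden state.

    scale = 0.0 fully suppresses the feature; scale > 1.0 amplifies it.
    """
    f = np.maximum(h @ W_enc, 0.0)   # sparse feature activations (ReLU)
    f[feature_idx] *= scale
    return f @ W_dec

h = rng.standard_normal(d_model)
h_off = steer(h, feature_idx=3, scale=0.0)   # feature 3 suppressed
h_up = steer(h, feature_idx=3, scale=2.0)    # feature 3 amplified
```

The modified hidden state is then passed on to the following layers, which is what changes the style or topic of the generated text without any retraining.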

The results of the study were presented at the International Conference on Machine Learning (ICML), held in Vancouver on July 13-19. ICML is one of the leading conferences in machine learning and artificial intelligence.

Translated by the Yandex Translate service
