Training should not be prohibited: how to adapt AI to the Russian cultural context

Generative neural networks are widely used by Russians. These AI-based tools help translate texts, write on various topics, create images, and generate ideas. However, content created by domestic neural networks sometimes fails to reflect the Russian cultural code. Experts and market representatives told Izvestia about the causes of this phenomenon and ways to address it.

The result is only as good as the data

Neural networks work by processing large amounts of data and identifying the most likely patterns, which can be combinations of words, symbols, or image elements. After analyzing a huge amount of content and identifying many such patterns, a neural network can independently produce a result that is meaningful from the user's point of view, whether text or an image.
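The idea of "identifying the most likely patterns" can be illustrated with a deliberately tiny sketch. It is not how production neural networks are built: it only counts which word follows which in a made-up corpus, and the function names and sample sentences are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the most frequent continuation, or None if the word was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training data: the model can only reflect what is in here.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the cat slept",
]
model = train_bigrams(corpus)

print(most_likely_next(model, "the"))  # the continuation seen most often after "the"
print(most_likely_next(model, "dog"))  # a word absent from the data has no continuation
```

Even in this toy version, the output depends entirely on what the training data contains: a continuation that dominates the corpus dominates the result, and a word the corpus never mentions cannot be handled at all.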

According to Alexander Gorny, co-founder of AiAcademy and the ShareAI entrepreneurial club, the lack of Russian-language training content is the reason domestic neural networks can produce results that do not correspond to Russian cultural values.

"Generative neural networks create images and texts based on the materials available for training, mainly from open Internet sources. At the same time, it is important to understand that the majority of content on the Internet is in English, so it is quite natural that this influences the content generated by neural networks," the expert explained.

The lack of high-quality Russian-language data for training neural networks is also noted by other market participants.

"The neural network does not create information on its own; it analyzes existing data. If the AI more often creates an image of strawberries in response to the 'strawberry' query, it means that such images prevail on the Internet. The problem is not training errors, but the lack of high-quality Russian-language training content. There is also an imbalance in Runet: some of the information is poorly optimized and indexed for AI search," the press service of VisionLabs, a developer of artificial intelligence-based technologies, told Izvestia.

Yandex shares a similar opinion.

"Neural networks learn from a large amount of open data, so they can sometimes make mistakes when processing queries and generating results. We are constantly improving the models to make the results more accurate. Users can send examples of inaccurate generations, which helps to improve the training process," the company said.

Limitation is not a solution

According to experts and market representatives, limiting the use of foreign data for training domestic neural networks will only worsen the problem.

"The more data is used to train neural networks, the higher the quality of their responses and their effectiveness. Limiting training to only Russian-language materials will lead to a deterioration in the work of neural networks, which will force users to turn to foreign analogues, where the local context is taken into account even less," Gorny said.

Market representatives believe that the solution to the problem is to increase the volume of high-quality Russian-language content for training neural networks.

"First, it is necessary to increase the volume of high-quality Russian-language content so that neural networks can rely on accurate data. Second, it should be optimized for AI search (AI Search Optimization): images should be labeled correctly, with meta tags and structured descriptions. Third, duplicating key information in English will help global AI systems work more correctly with Russian terms. You can't just 'reconfigure' the neural network, because that will disrupt its operation. The only solution is to close the data gaps, and each resource should do this independently," VisionLabs' press service commented.
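As a rough illustration of the labeling and duplication steps VisionLabs describes, a site could attach a structured description to an image in both Russian and English. The sketch below uses the schema.org ImageObject vocabulary in JSON-LD form; the helper name, the URL, and the sample texts are hypothetical, and whether a particular AI system actually reads such markup is an assumption, not something the article states.

```python
import json


def build_image_metadata(url, name_ru, name_en, description_ru, description_en):
    """Build a JSON-LD description of an image with Russian and English text.

    Hypothetical helper: the field layout follows schema.org's ImageObject,
    with JSON-LD language-tagged values for the bilingual parts.
    """
    return {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": url,
        "name": [
            {"@language": "ru", "@value": name_ru},
            {"@language": "en", "@value": name_en},
        ],
        "description": [
            {"@language": "ru", "@value": description_ru},
            {"@language": "en", "@value": description_en},
        ],
    }


# Illustrative values only; the URL and texts are made up.
metadata = build_image_metadata(
    url="https://example.com/images/samovar.jpg",
    name_ru="Самовар",
    name_en="Samovar",
    description_ru="Традиционный русский самовар на столе",
    description_en="A traditional Russian samovar on a table",
)

# ensure_ascii=False keeps the Cyrillic text readable in the output.
json_ld = json.dumps(metadata, ensure_ascii=False, indent=2)
print(json_ld)
```

The resulting JSON could be embedded in a page inside a `<script type="application/ld+json">` element, which is the usual way to publish schema.org markup.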

"The dominance of English-language content creates an asymmetry in the capabilities of large language models. As a result, Western models gain an advantage: there is simply more training content that reflects their cultural code. It is important for us to digitize existing Russian data and provide developers with free access to it. This will help train neural networks to better understand the cultural context," Gorny said.

According to the expert, this will help to increase the competitiveness of the Russian language on the Internet and strengthen the position of domestic developers in competition with Western creators of neural networks.

Translated by Yandex Translate
