Estonian Language Institute tested artificial intelligence: Gemini scored nearly 89% in word usage assessment

A test by the Estonian Language Institute showed that large language models can mostly distinguish between neutral, colloquial, and derogatory uses of words. Google's Gemini performed best, with assessments suitable for dictionary inclusion in nearly 89 percent of cases. Lydia Risberg presented the results.

2026-06-01T05:26:28.019Z Technology

The Estonian Language Institute conducted a test to determine how well large language models can distinguish between different uses of words—neutral, colloquial, and derogatory. The results proved surprisingly promising.

Gemini emerged as the most successful language model, with assessments suitable for use in dictionaries in approximately 89 percent of cases. This shows that artificial intelligence can detect certain linguistic nuances with remarkable accuracy. The results were presented by Lydia Risberg, an employee of the Estonian Language Institute, in the institute's language tweet series.

The test results are particularly significant for lexicography: linguists and dictionary compilers are increasingly looking for ways to use artificial intelligence tools to support language description. If language models can reliably distinguish the registers of word usage, this could speed up the compilation and updating of lexical databases.

At the same time, an accuracy of nearly 89 percent still means that roughly one assessment in nine is not suitable, which limits how far artificial intelligence can be relied upon in linguistic decision-making. The peculiarities of the Estonian language and its smaller training datasets compared with major world languages remain a challenge for all language models.
