MULTIVERSE: Fighting fakes

Creating fakes of varying quality has become a “cute” craze among many news outlets. I hope that one day children will be taught to spot fakes starting in first grade.

For now, that job falls to neural networks, which are being trained to do it (and quite well).

Main hypothesis

The MULTIVERSE paper tests a rather intuitive hypothesis:

Genuine (non-fake) news will be covered in different forms, in different languages, and on platforms with different viewpoints, yet the underlying facts will stay the same (even if they are presented from different angles)
Fake news, by contrast, will not be able to satisfy all of these conditions

Fake detection algorithm

Extract the text of the news article under study and translate it into foreign languages (English, French, German, Spanish, Russian)
Search for similar articles in those languages (using the translated title)
Compare the information in the found articles with the article under study and reach a “verdict”
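The three steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: text extraction is assumed to be done already, only the title translation (used for the search) is shown, and the `translate`, `search`, and `compare` functions are hypothetical placeholders passed in by the caller, so any machine translation service or search API could be plugged in.

```python
from typing import Callable, List

# Target languages named in the description of the method.
TARGET_LANGUAGES = ["en", "fr", "de", "es", "ru"]

# Hypothetical function signatures, to be supplied by the caller.
Translate = Callable[[str, str], str]       # (text, lang) -> translated text
Search = Callable[[str, str], List[str]]    # (query, lang) -> texts of found articles
Compare = Callable[[str, List[str]], bool]  # (original text, alternatives) -> True if judged true


def detect_fake(title: str, text: str, translate: Translate,
                search: Search, compare: Compare) -> bool:
    """Return True if the article is judged true, False if judged fake.

    Assumes the article's title and body text have already been extracted.
    """
    alternatives: List[str] = []
    for lang in TARGET_LANGUAGES:
        # Step 1: translate the title into the target language.
        query = translate(title, lang)
        # Step 2: look for similar articles in that language by the translated title.
        alternatives.extend(search(query, lang))
    # Step 3: compare the found articles with the original and reach a verdict.
    return compare(text, alternatives)


# Toy usage with stub functions (no real translation or search involved).
if __name__ == "__main__":
    verdict = detect_fake(
        "Example headline",
        "Example body text",
        translate=lambda s, lang: f"[{lang}] {s}",
        search=lambda q, lang: [q] if lang in ("en", "de", "fr") else [],
        compare=lambda original, alts: len(alts) >= 3,
    )
    print(verdict)
```

Passing the steps in as functions keeps the sketch independent of any particular translation or search provider.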

How is the “verdict” made?

In the experiments, the best results came from combining the following features:

– cosine distance between the text embeddings of the original article and each alternative article, extracted with the multilingual mBERT model

– reliability of the news source (calculated from its Alexa Rank)
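Both features can be illustrated in plain Python. This sketch assumes the mBERT embeddings have already been computed (here they are just lists of floats). The conversion of an Alexa Rank into a reliability score in `[0, 1]` is a hypothetical choice made for illustration only; the source does not say how the rank was turned into a feature.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def source_reliability(alexa_rank: int) -> float:
    """Hypothetical mapping: rank 1 -> 1.0; larger (worse) ranks move toward 0."""
    if alexa_rank < 1:
        raise ValueError("Alexa Rank starts at 1")
    return 1.0 / (1.0 + math.log10(alexa_rank))
```

With this mapping, a site ranked 10 gets 0.5 and a site ranked 100 gets one third; the logarithm keeps very popular and moderately popular sites from being separated too sharply.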

The “signal” obtained from these two features is fed into classifiers built on large pre-trained models (BERT, RoBERTa).
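The source does not specify how the two features are combined with the BERT or RoBERTa classifier. One common option, assumed here, is to concatenate them to the model's pooled text representation before a final logistic layer. In this sketch the pooled vector is a stand-in for a real [CLS] embedding, and all weights are made-up placeholders rather than trained values.

```python
import math
from typing import List, Sequence


def fuse_features(pooled: Sequence[float], distance: float, reliability: float) -> List[float]:
    """Append the two extra signals to the text representation (assumed design)."""
    return list(pooled) + [distance, reliability]


def classify(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic output: estimated probability that the news is true."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Illustration with made-up numbers: a 3-dim "pooled" vector plus the 2 extra features.
x = fuse_features([0.2, -0.1, 0.4], distance=0.15, reliability=0.8)
p_true = classify(x, weights=[0.5, 0.3, -0.2, -2.0, 1.5], bias=0.0)
```

In a real model the final layer would be trained jointly with (or on top of) the transformer, so that the weights on the distance and reliability features are learned from data.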

If the content of most of the articles found in alternative sources matches the content of the news under study, the news is classified as true, and vice versa.
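The decision rule in the previous paragraph amounts to a majority vote over the alternative articles. In this sketch an article is assumed to “match” the original when its cosine distance is below a threshold; the default of 0.3 is an arbitrary assumption, as are the choices to treat a tie as fake and to return fake when no alternative articles are found at all.

```python
from typing import Sequence


def majority_verdict(distances: Sequence[float], threshold: float = 0.3) -> bool:
    """Return True ("true news") if most alternative articles match the original.

    `distances` holds the cosine distance between the original article and each
    found article; `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    if not distances:
        # No alternative coverage found: nothing supports the story (assumed rule).
        return False
    matches = sum(1 for d in distances if d < threshold)
    # Strict majority required, so a tie counts as fake.
    return matches > len(distances) / 2
```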

Results

Combining the proposed approach with large models gives a significant improvement over previous results on fake news datasets such as: