Producing fakes of varying quality has become a “cute” craze among many news outlets. I hope that one day children will be taught to spot fakes from the first grade.

For now, it is neural networks that are being trained to do this, and they do it quite well.

Main hypothesis

The MULTIVERSE paper tests a rather intuitive hypothesis:

Fake detection algorithm

How is the “verdict” made?

In the experiments, the best results came from combining the following features:

– cosine distance between the text vectors of the original news item and of the alternative news items, extracted with the multilingual BERT model (mBERT)

– reliability of the news source (calculated using Alexa Rank)

The “signal” obtained from these two features is added to classifiers based on large pre-trained models (BERT, RoBERTa).
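As a rough illustration of the two features, the sketch below computes a cosine distance and a source reliability score for each alternative article. It assumes the mBERT vectors have already been extracted and are passed in as plain lists of floats. The function names are illustrative, and the log-scaled mapping in `reliability_from_rank` is a hypothetical choice: the text only says that reliability is calculated using Alexa Rank, not how.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def reliability_from_rank(rank: int, max_rank: int = 1_000_000) -> float:
    """Map a traffic rank to a score in [0, 1]; rank 1 gives 1.0.

    Assumption: this log scaling is only for illustration and is not
    taken from the paper.
    """
    clipped = min(max(rank, 1), max_rank)
    return 1.0 - math.log(clipped) / math.log(max_rank)


def evidence_features(
    original_vec: Sequence[float],
    alternative_vecs: Sequence[Sequence[float]],
    source_ranks: Sequence[int],
) -> List[List[float]]:
    """Build one [distance, reliability] pair per alternative article."""
    if len(alternative_vecs) != len(source_ranks):
        raise ValueError("need exactly one rank per alternative article")
    return [
        [cosine_distance(original_vec, vec), reliability_from_rank(rank)]
        for vec, rank in zip(alternative_vecs, source_ranks)
    ]
```

In a full pipeline these pairs would be passed to the classifier as extra inputs alongside the text representation. How exactly they are merged with the BERT or RoBERTa features is not specified here.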

If the content of most of the articles found in alternative sources matches the content of the news item being checked, the item is recognized as true. Otherwise, it is recognized as fake.
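The majority rule can be sketched in a few lines. This is a simplified illustration rather than the paper's actual decision procedure: it assumes each alternative article is represented by its cosine distance to the original, and that a distance at or below a hypothetical `threshold` of 0.5 counts as "matching".

```python
from typing import Optional, Sequence


def majority_verdict(
    distances: Sequence[float], threshold: float = 0.5
) -> Optional[bool]:
    """Return True if most alternative articles match the original.

    Assumption: an article "matches" when its cosine distance to the
    original is at most `threshold`. Returns None when no alternative
    articles were found, because the rule cannot decide in that case.
    """
    if not distances:
        return None
    matching = sum(1 for d in distances if d <= threshold)
    # Strict majority: a tie is not enough to call the news true.
    return matching * 2 > len(distances)
```

For example, `majority_verdict([0.1, 0.2, 0.9])` treats two of the three articles as matching and returns `True`, while an empty list returns `None`.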

Results

Combining the proposed approach with large models gives a significant improvement over previous results on fake news datasets such as: