Linguistic Evaluation of Machine-Generated ‘Real’ and ‘Fake’ News

In this project, we create datasets of machine-generated “real news” and machine-generated “fake news” by using GPT-Neo 1.3B to generate text from inputs drawn from the LIAR dataset, which contains 12.8K short statements.
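The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes the Hugging Face `transformers` library and the `EleutherAI/gpt-neo-1.3B` checkpoint, and the mapping of LIAR labels to "real" and "fake" prompts, the helper names `split_by_label` and `generate_articles`, and the sampling settings are all hypothetical choices made for the example.

```python
# Assumption: which LIAR truthfulness labels seed "real" vs. "fake" news.
# The project may use a different mapping; the middle labels are dropped here.
REAL_LABELS = {"true", "mostly-true"}
FAKE_LABELS = {"false", "pants-fire"}


def split_by_label(rows):
    """Partition (label, statement) pairs into real and fake prompt lists."""
    real, fake = [], []
    for label, statement in rows:
        if label in REAL_LABELS:
            real.append(statement)
        elif label in FAKE_LABELS:
            fake.append(statement)
    return real, fake


def generate_articles(prompts, max_new_tokens=200):
    """Continue each prompt with GPT-Neo 1.3B (hypothetical settings)."""
    # Imported lazily so split_by_label works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
    outputs = []
    for prompt in prompts:
        result = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,      # sample rather than decode greedily
            temperature=0.9,     # assumed value, not taken from the project
        )
        outputs.append(result[0]["generated_text"])
    return outputs
```

Calling `split_by_label` on the LIAR rows and passing each resulting list to `generate_articles` would yield two parallel corpora produced by the same model, which is what makes the later comparison meaningful.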

The project has two main goals:

1) to assess whether this approach is an effective way to create comparable machine-generated real and fake news, and 2) to ascertain if there are any detectable stylometric or linguistic differences between real and fake news generated in this way.
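To make the second goal concrete, a stylometric comparison might start from a few surface features computed per text. The sketch below is only an assumption about what such features could look like; the function name `stylometric_features`, the chosen features, and the simple regex tokenization are illustrative and are not taken from the project.

```python
import re


def stylometric_features(text):
    """Compute a few simple surface features for one text."""
    # Naive tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Distinct words divided by total words, a rough lexical-diversity measure.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }
```

Averaging these values over the generated real corpus and the generated fake corpus gives a first, coarse view of whether the two differ in style.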

Read more in the report or on GitHub.