Digital News Analysis Platform

Text Classification of Digital News Articles Based on Their Bias, Reliability and Category with the Use of Natural Language Processing and Machine Learning

Try it out

Features

Performs semantic analysis of an article to determine its category, bias and reliability.

  • Detects the category of an article with 98.0% accuracy.
  • Detects the bias of an article with 73.2% accuracy.
  • Detects the reliability of an article with 74.4% accuracy.
  • Helps users judge whether an article can be trusted.

About

The rapid spread of digital misinformation has become serious enough that the World Economic Forum considers it a critical threat to human society [1]. A report by Freedom House [2] shows that the problem undermines democracy and poses a significant threat to the notion of the Internet as a liberating technology. Studies have shown that people struggle to identify fake news [3, 4], which highlights the need for an automated tool that classifies the bias and reliability of a digital news article. The aim of this project was to develop a classifier that combines Natural Language Processing and Machine Learning to identify the bias and reliability of a news article, and so help combat misinformation.

Initially, two web scrapers were developed: the first retrieved the list of all sources on Media Bias/Fact Check (MBFC) [5] together with their bias and reliability ratings, and the second fetched articles from the sites listed on MBFC. The data were then pre-processed and formatted for text classification, and several supervised text classification models were trained on the resulting datasets. Features were extracted from the article title and content using term frequency-inverse document frequency (TF-IDF), which indicates how important a word is to a document in a corpus. Ten-fold cross-validation was used to evaluate the accuracy of each model: at every iteration, the dataset was split randomly into 70% for training and 30% for validation. After cross-validation was completed, the program output the mean accuracy and standard deviation of each model. The project mainly used two datasets: the first consists of 6755 left, 12740 right and 4734 centre bias articles, and the second consists of 15505 reliable, 10973 semi-reliable and 1983 unreliable articles. The imbalance of these datasets was accounted for by calculating class weights before training.
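To make the feature extraction and class weighting steps concrete, here is a minimal sketch using only the Python standard library. It is not the project's code. The TF-IDF variant (term count divided by document length, times ln(N / df)), the whitespace tokeniser, and the "balanced" weighting formula n_samples / (n_classes * class_count) are all assumptions, and the function names are hypothetical. Only the class counts come from the description above.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def balanced_class_weights(class_counts):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


# Class counts as reported for the two datasets.
bias_counts = {"left": 6755, "right": 12740, "centre": 4734}
reliability_counts = {"reliable": 15505, "semi-reliable": 10973, "unreliable": 1983}

bias_weights = balanced_class_weights(bias_counts)
reliability_weights = balanced_class_weights(reliability_counts)
```

Under this weighting, the rarer a class is, the larger its weight, so the unreliable and centre classes count for more per article during training than the reliable and right classes. A term that appears in every document receives a TF-IDF weight of zero under this variant, which is why common words contribute little to the features.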

In all cases, the best model was the Linear Support Vector Machine, which yielded the highest mean accuracy with the smallest standard deviation. Furthermore, for bias and reliability the cross-validation score did not reach a plateau, which suggests that the mean accuracy is likely to increase if more data can be collected.
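The way models are compared here (mean accuracy and standard deviation over ten random 70/30 splits) can be sketched as follows. This reads the evaluation as repeated random splitting, which is an assumption about the procedure. The function names and the toy data are hypothetical, and the majority-class baseline is only a placeholder for a real classifier such as a linear SVM.

```python
import random
import statistics


def repeated_holdout(samples, labels, fit_predict,
                     rounds=10, train_fraction=0.7, seed=0):
    """Score a model over several random train/validation splits.

    fit_predict(train_x, train_y, valid_x) must return predicted labels.
    Returns (mean accuracy, standard deviation of accuracy).
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = int(len(indices) * train_fraction)
    scores = []
    for _ in range(rounds):
        rng.shuffle(indices)
        train, valid = indices[:cut], indices[cut:]
        predicted = fit_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in valid],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, valid))
        scores.append(correct / len(valid))
    return statistics.mean(scores), statistics.stdev(scores)


def majority_baseline(train_x, train_y, valid_x):
    """Placeholder model: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(valid_x)


# Toy data only; real inputs would be TF-IDF features and MBFC labels.
texts = [f"article {i}" for i in range(100)]
labels = ["right"] * 60 + ["left"] * 40
mean_acc, std_acc = repeated_holdout(texts, labels, majority_baseline)
```

Running each candidate model through the same function with the same seed gives every model identical splits, so differences in mean accuracy and standard deviation reflect the models rather than the sampling.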

References

[1] World Economic Forum [2012], Global Risks 2012, World Economic Forum, Cologny/Geneva, Switzerland. OCLC: 775784625.
[2] Freedom on the Net 2017 [2017], Technical report, Freedom House. URL: https://freedomhouse.org/report-types/freedom-net
[3] Knapp, J. [2017], `What is Fake News?'. URL: https://guides.libraries.psu.edu/c.php?g=620262&p=4319238 (Accessed 2 October 2018).
[4] Stecula, D. [2017], `Fake news might be harder to spot than most people believe'. URL: https://www.washingtonpost.com/news/monkey-cage/wp/2017/07/10/fake-news-might-be-harder-to-spot-than-most-people-believe/ (Accessed 8 October 2018).
[5] Media Bias/Fact Check [2018]. URL: https://mediabiasfactcheck.com/


Analyse

Disclaimer: Results may be inaccurate, as the platform is at a very experimental stage. Accuracy may improve in the future.

Contact

Stelios Ioannou

Computer Science Student
"Driven by innovative tech and exciting projects."