The rapid spread of digital misinformation has become so severe that the World Economic Forum considers it a critical threat to human society [1]. A report by Freedom House [2] shows that this problem undermines democracy and threatens the notion of the Internet as a liberating technology. Studies have shown that people struggle to identify fake news [3, 4], which highlights the need for an automated tool that classifies the bias and reliability of a digital news article. The aim of this project was to develop a classifier that combines Natural Language Processing and Machine Learning to identify the bias and reliability of a news article, and so help combat misinformation.
Initially, two web scrapers were developed: the first retrieved the list of sources, with their bias and reliability ratings, from Media Bias/Fact Check (MBFC) [5], and the second fetched articles from the sites listed on MBFC. The data were then preprocessed and formatted for text classification, and several supervised text-classification models were trained on the resulting datasets. Features were extracted from each article's title and content using term frequency-inverse document frequency (TF-IDF), which indicates how important a word is to a document in a corpus. Ten-fold cross-validation was used to evaluate the accuracy of each model: at every iteration, the dataset was split randomly into 70% for training and 30% for validation. Once cross-validation was complete, the program reported the mean accuracy and standard deviation of each model. The project mainly used two datasets: the first consists of 6755 left, 12740 right and 4734 centre bias articles, and the second consists of 15505 reliable, 10973 semi-reliable and 1983 unreliable articles. The imbalance of these datasets was taken into account by calculating class weights before training.
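The three numerical steps described above (TF-IDF weighting, repeated random 70/30 splits, and class weighting) can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the function names are hypothetical, the TF-IDF formula is one common textbook variant, and the class-weight formula (total samples divided by the number of classes times the class count) is an assumed "balanced" heuristic, since the text does not say which formula was used. Only the article counts are taken from the text.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def balanced_class_weights(class_counts):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


def random_splits(n_samples, n_iterations=10, train_fraction=0.7, seed=0):
    """Yield (train_indices, validation_indices) for each random split."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# Article counts as reported in the text.
BIAS_COUNTS = {"left": 6755, "right": 12740, "centre": 4734}
RELIABILITY_COUNTS = {"reliable": 15505, "semi-reliable": 10973, "unreliable": 1983}

if __name__ == "__main__":
    print(balanced_class_weights(BIAS_COUNTS))
    print(balanced_class_weights(RELIABILITY_COUNTS))
```

Under this heuristic the rarest class (centre for bias, unreliable for reliability) receives the largest weight, so errors on it cost more during training. A term that occurs in every document gets a TF-IDF weight of zero, which is why very common words contribute little to the features.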
In all cases, the best model was the Linear Support Vector Machine, which yielded the highest mean accuracy with the smallest standard deviation. Furthermore, for both bias and reliability, the cross-validation score did not reach a plateau, which suggests that the mean accuracy would increase further if more data were collected.
[1] World Economic Forum, Global Risks 2012, World Economic Forum, Cologny/Geneva, Switzerland. OCLC: 775784625.
[2] Freedom on the Net 2017, Technical report, Freedom House. URL: https://freedomhouse.org/report-types/freedom-net
[3] Knapp, J., 'What is Fake News?'. URL: https://guides.libraries.psu.edu/c.php?g=620262&p=4319238 (Accessed 2 October 2018).
[4] Stecula, D., 'Fake news might be harder to spot than most people believe'. URL: https://www.washingtonpost.com/news/monkey-cage/wp/2017/07/10/fake-news-might-be-harder-to-spot-than-most-people-believe/ (Accessed 8 October 2018).
[5] Media Bias/Fact Check. URL: https://mediabiasfactcheck.com/
Disclaimer: The results may not be accurate, as the project is at a very experimental stage. Results may become more accurate in the future.