No model training required. A news article is labelled authentic or manipulated from an authenticity score that measures how consistent it is with trusted reporting.
Clean text, tokenise, remove stop words, stem.
CRF-based NER pulls persons, locations, organisations + top keywords.
Form a query, crawl trusted/priority sources, retrieve the best-matching article.
Compare on four features against the trusted article.
Mean of features vs. threshold → authentic / fake.
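The first steps of the pipeline above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word set, the pass-through stemmer, and the names `preprocess` and `build_query` are hypothetical, and the entity list is assumed to come from the NER step rather than being computed here.

```python
import unicodedata
from collections import Counter

# Hypothetical stop-word set; a real system would load a full Bengali list.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}


def clean(text):
    """Replace punctuation and symbol characters with spaces."""
    return "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch
        for ch in text.lower()
    )


def preprocess(text, stem=lambda word: word):
    """Clean, tokenise, drop stop words, then stem each remaining token.

    `stem` defaults to a pass-through; plug in a Bengali stemmer here.
    """
    tokens = clean(text).split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_query(entities, tokens, top_k=3):
    """Join the named entities with the top_k most frequent tokens."""
    keywords = [word for word, _ in Counter(tokens).most_common(top_k)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    terms = list(dict.fromkeys(list(entities) + keywords))
    return " ".join(terms)


tokens = preprocess("The minister of Dhaka visited Dhaka, and the minister spoke.")
query = build_query(["Rahman"], tokens, top_k=2)
```

The resulting query string would then be sent to whatever crawler or search interface covers the trusted sources; that part is not shown here.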
Share of the article's persons, locations and organisations that also appear in the trusted news.
Cosine similarity between the Doc2Vec vectors of the article and the trusted coverage.
Difference in Bengali VADER compound sentiment between the article and the trusted coverage; manipulated news often skews tone.
Source reputation (Media Bias/Fact Check + registered-source check) and news coverage.
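A minimal sketch of how the four features might be combined into an authenticity score. Several things here are assumptions, not details from the project: every feature is scaled to [0, 1], the sentiment difference is turned into an agreement score as 1 - |difference| / 2, the source score is supplied as a ready-made number, and the threshold of 0.5 is a placeholder. All function names are hypothetical.

```python
import math


def entity_overlap(article_entities, trusted_entities):
    """Share of the article's entities that also appear in the trusted article."""
    article = set(article_entities)
    if not article:
        return 0.0
    return len(article & set(trusted_entities)) / len(article)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (e.g. Doc2Vec outputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentiment_agreement(compound_a, compound_b):
    """Assumed mapping: compound scores lie in [-1, 1], so the gap is at most 2."""
    return 1.0 - abs(compound_a - compound_b) / 2.0


def authenticity_score(features):
    """Mean of the feature values."""
    return sum(features) / len(features)


def label(score, threshold=0.5):
    """Placeholder threshold; the real value would be tuned on labelled data."""
    return "authentic" if score >= threshold else "manipulated"


features = [
    entity_overlap(["A", "B", "C", "D"], ["A", "B", "X"]),  # 2 of 4 shared
    cosine_similarity([1.0, 0.0], [1.0, 0.0]),               # identical vectors
    sentiment_agreement(0.5, -0.5),                          # gap of 1.0
    1.0,                                                     # assumed source score
]
score = authenticity_score(features)
verdict = label(score)
```

Taking the plain mean weights the four features equally; a weighted mean would be a natural variation if some features prove more reliable than others.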
On a balanced dataset of 7,000 Bangla articles, the full four-feature model reaches about 91% accuracy for fake-news detection, and it remains effective on partially manipulated news, where style-based methods struggle.