Method · Bangla Fake News Detection

Preprocess

Clean text, tokenise, remove stop words, stem.

Extract knowledge

CRF-based NER pulls persons, locations, organisations + top keywords.

Search trusted news

Form a query, crawl trusted/priority sources, read best match.

Match consistency

Compare on four features against the trusted article.

Score & label

Mean of features vs. threshold → authentic / fake.

Authenticity = ( x₁ + x₂ + x₃ + x₄ ) / 4 → label by threshold

x₁

Knowledge consistency

Share of the article's persons, locations and organisations that also appear in the trusted news.
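As a rough illustration, the share described above can be computed as a set overlap. This is a minimal sketch: the function name `knowledge_consistency`, the whitespace/case normalisation, and the choice to return 0 when the article has no entities are assumptions, not details given here. The entity lists would come from the NER step.

```python
def knowledge_consistency(article_entities, trusted_entities):
    """Share of the article's entities that also appear in the trusted news."""
    # Normalise lightly so trivial spacing/case differences do not break matches.
    article = {e.strip().lower() for e in article_entities if e.strip()}
    trusted = {e.strip().lower() for e in trusted_entities if e.strip()}
    if not article:
        # Assumption: with no entities to check, nothing can be confirmed.
        return 0.0
    return len(article & trusted) / len(article)


# Hypothetical entity lists (persons, locations, organisations) from both texts.
article_entities = ["ঢাকা", "চট্টগ্রাম", "সাকিব আল হাসান", "বুয়েট"]
trusted_entities = ["ঢাকা", "সাকিব আল হাসান", "বিসিবি"]
x1 = knowledge_consistency(article_entities, trusted_entities)
```

Here two of the article's four entities are found in the trusted article, so `x1` is 0.5.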

x₂

Semantic similarity

Cosine similarity between the Doc2Vec vectors of the article and the trusted coverage.
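The cosine step itself is simple once both documents have been embedded. The sketch below assumes the two Doc2Vec vectors are already available as plain lists of floats; the function name and the zero-vector handling are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector carries no signal, so report 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional stand-ins for the two document vectors.
article_vec = [0.2, 0.7, 0.1]
trusted_vec = [0.4, 1.4, 0.2]
x2 = cosine_similarity(article_vec, trusted_vec)
```

Cosine similarity ranges over [-1, 1]. If the feature is to be averaged with scores in [0, 1], negative values may need clipping to 0, which is a design choice not specified here.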

x₃

Sentiment consistency

Difference in Bengali VADER compound sentiment between the article and the trusted coverage; manipulated news often skews tone.
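One way to turn the compound-score difference into a feature in [0, 1] is sketched below. The mapping `1 - |difference| / 2` is an assumption chosen for illustration (compound scores lie in [-1, 1], so their absolute difference lies in [0, 2]); the exact formula is not stated here. The compound scores are assumed to be already computed by the sentiment analyser.

```python
def sentiment_consistency(article_compound, trusted_compound):
    """Map the gap between two compound scores in [-1, 1] to a score in [0, 1]."""
    for score in (article_compound, trusted_compound):
        if not -1.0 <= score <= 1.0:
            raise ValueError("compound scores must lie in [-1, 1]")
    # Assumed mapping: identical tone -> 1.0, opposite extremes -> 0.0.
    return 1.0 - abs(article_compound - trusted_compound) / 2.0


# Hypothetical compound scores: the article is upbeat, the trusted piece mildly negative.
x3 = sentiment_consistency(0.6, -0.2)
```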

x₄

Source credibility

Source reputation (Media Bias/Fact Check + registered-source check) and news coverage.
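Putting the four features together, the scoring rule stated above (mean of the features, then a threshold) can be sketched as follows. The threshold value of 0.5 and the rule that a score at or above the threshold counts as authentic are placeholders; no threshold is specified here.

```python
def authenticity_score(x1, x2, x3, x4):
    """Unweighted mean of the four consistency features, each expected in [0, 1]."""
    features = (x1, x2, x3, x4)
    for x in features:
        if not 0.0 <= x <= 1.0:
            raise ValueError("each feature must lie in [0, 1]")
    return sum(features) / 4


def label(score, threshold=0.5):
    """Label an article from its score; the 0.5 default is a hypothetical value."""
    return "authentic" if score >= threshold else "fake"


# Hypothetical feature values for one article.
score = authenticity_score(0.5, 0.8, 0.6, 0.9)
verdict = label(score)
```

With these illustrative inputs the mean is about 0.7, which the placeholder threshold labels as authentic. In practice the threshold would be tuned on labelled data.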

Reported performance

On the balanced dataset of 7,000 Bangla articles, the full four-feature model reaches about 91% accuracy for fake-news detection and remains effective on partially manipulated news, where style-based methods struggle.

The knowledge-driven pipeline