NirVorJogyo: Knowledge-Driven Detection
How it works

The knowledge-driven pipeline

No model training required. A news article is labelled authentic or manipulated from an authenticity score that measures how consistent it is with trusted reporting.

01

Preprocess

Clean text, tokenise, remove stop words, stem.
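As a rough illustration of this step, the sketch below tokenises text with a regular expression, drops stop words and strips a few common suffixes. The stop-word set, the suffix list and the names `preprocess` and `stem` are hypothetical placeholders, not the resources the detector itself uses; a real system would rely on curated Bangla word lists and a proper stemmer.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"এবং", "ও", "এই", "যে", "করে"}
SUFFIXES = ("গুলো", "দের", "রা", "টি", "কে")

# A token is a run of Bengali-block characters or a run of ASCII letters/digits;
# punctuation and whitespace are discarded.
TOKEN_RE = re.compile(r"[\u0980-\u09FF]+|[A-Za-z0-9]+")


def stem(token):
    """Crude suffix stripping: remove the first matching suffix, keeping at least 2 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenise, remove stop words, then stem."""
    tokens = TOKEN_RE.findall(text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

Stop words are removed before stemming here so that the stop-word list can be written in surface form; the opposite order would also be a reasonable choice.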

02

Extract knowledge

CRF-based NER extracts persons, locations and organisations, plus the top keywords.

03

Search trusted news

Form a query, crawl trusted/priority sources, and read the best match.
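One plausible way to form the query is to join the extracted entities with the most frequent remaining tokens. The sketch below assumes frequency-based keyword selection and a plain space-separated query string; `top_keywords`, `build_query` and the default `k=5` are illustrative names and values, and the crawling step itself is not shown.

```python
from collections import Counter


def top_keywords(tokens, k=5):
    """Return the k most frequent tokens (ties keep first-seen order)."""
    return [word for word, _ in Counter(tokens).most_common(k)]


def build_query(entities, tokens, k=5):
    """Join named entities and top keywords into one search query, skipping duplicates."""
    seen = set()
    terms = []
    for term in list(entities) + top_keywords(tokens, k):
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return " ".join(terms)


tokens = ["flood", "dhaka", "flood", "relief", "flood", "relief"]
query = build_query(["Dhaka", "Red Crescent"], tokens, k=2)
```

Entities are placed first on the assumption that names of people, places and organisations narrow a news search more than generic keywords do.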

04

Match consistency

Compare on four features against the trusted article.

05

Score & label

Mean of features vs. threshold → authentic / fake.

Authenticity = (x₁ + x₂ + x₃ + x₄) / 4  →  label by threshold

x₁

Knowledge consistency

Share of the article's persons, locations and organisations that also appear in the trusted news.
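Read literally, this feature is the fraction of the article's entities that are also found in the trusted article. A minimal sketch, assuming entities are compared as case-insensitive strings and that an article with no entities scores 0 (both are assumptions, as is the function name):

```python
def knowledge_consistency(article_entities, trusted_entities):
    """Share of the article's entities that also appear in the trusted article."""
    article = {e.strip().lower() for e in article_entities if e.strip()}
    trusted = {e.strip().lower() for e in trusted_entities if e.strip()}
    if not article:
        # Assumption: with no entities to check there is no supporting evidence.
        return 0.0
    return len(article & trusted) / len(article)


x1 = knowledge_consistency(
    ["Dhaka", "WHO", "Rahim", "Karim"],  # entities in the article under test
    ["dhaka", "who", "Selim"],           # entities in the trusted article
)
```

Here two of the four article entities are confirmed by the trusted article, so `x1` is 0.5.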

x₂

Semantic similarity

Cosine similarity between the Doc2Vec vectors of the article and the trusted coverage.
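Given two document vectors, the similarity itself is a short computation. The sketch below takes the vectors as plain lists of floats and does not show how the Doc2Vec model produces them; returning 0 for a zero-length vector is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an all-zero vector carries no information, so score 0.
        return 0.0
    return dot / (norm_u * norm_v)
```

Cosine similarity ranges from -1 to 1, so a pipeline that averages it with features in [0, 1] may want to clip or rescale it first; whether the detector does so is not stated here.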

x₃

Sentiment consistency

Difference between the Bengali VADER compound sentiment of the article and of the trusted coverage; manipulated news often skews tone.
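VADER compound scores lie in [-1, 1], so the largest possible gap between two of them is 2. One way to turn the difference into a consistency score in [0, 1] is shown below; this particular normalisation and the function name are assumptions, not necessarily the exact form the detector uses.

```python
def sentiment_consistency(article_compound, trusted_compound):
    """Map the gap between two compound scores (each in [-1, 1]) to a score in [0, 1]."""
    for score in (article_compound, trusted_compound):
        if not -1.0 <= score <= 1.0:
            raise ValueError("compound scores must lie in [-1, 1]")
    # Identical tone gives 1.0; opposite extremes give 0.0.
    return 1.0 - abs(article_compound - trusted_compound) / 2.0
```

For example, compound scores of 0.6 and -0.2 differ by 0.8, which gives a consistency of 0.6 under this normalisation.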

x₄

Source credibility

Source reputation (Media Bias/Fact Check + registered-source check) and news coverage.
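With all four features in hand, the Authenticity formula above reduces to a mean and a comparison. In the sketch below the threshold value of 0.5 and the choice of `>=` at the boundary are placeholders, since the page does not state them; the function names are likewise illustrative.

```python
def authenticity_score(x1, x2, x3, x4):
    """Unweighted mean of the four consistency features."""
    return (x1 + x2 + x3 + x4) / 4


def label(score, threshold=0.5):
    """Label an article from its authenticity score (threshold is a hypothetical default)."""
    return "authentic" if score >= threshold else "fake"


score = authenticity_score(0.8, 0.7, 0.9, 0.6)
verdict = label(score)
```

With these example feature values the mean is 0.75, which is above the placeholder threshold, so the article would be labelled authentic.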

Reported performance

On the balanced dataset of 7,000 Bangla articles, the full four-feature model reaches about 91% accuracy for fake-news detection and remains effective on partially manipulated news, where style-based methods struggle.
