ROUGE (metric)

Updated on Feb 24, 2026

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.

Metrics

The following five evaluation metrics are available.

ROUGE-N: N-gram-based co-occurrence statistics.

ROUGE-L: Longest Common Subsequence (LCS) based statistics. The longest common subsequence naturally takes sentence-level structural similarity into account and automatically identifies the longest in-sequence co-occurring words, which need not be consecutive.

ROUGE-W: Weighted LCS-based statistics that favor LCSes whose matched words are consecutive.

ROUGE-S: Skip-bigram-based co-occurrence statistics. A skip-bigram is any pair of words in their sentence order, with arbitrary gaps allowed between them.

ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
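The counting-based metrics in the list above (ROUGE-N, ROUGE-S and ROUGE-SU) can be sketched in Python for a single candidate and a single reference. This is a minimal illustration, not the official package: the whitespace tokenizer, the helper names (`tokenize`, `ngrams`, `skip_bigrams`, `precision_recall_f1`, `rouge_n`, `rouge_s`, `rouge_su`), the unlimited skip distance, and the clipped multiset matching are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase and split on whitespace only
    # (no stemming or stopword removal).
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams, stored as tuples.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def skip_bigrams(tokens: list[str]) -> Counter:
    # Every pair of words in sentence order, with any gap between them.
    return Counter(combinations(tokens, 2))


def precision_recall_f1(candidate: Counter, reference: Counter) -> tuple[float, float, float]:
    # Clipped matching: a unit counts at most min(candidate count, reference count) times.
    overlap = sum((candidate & reference).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(candidate.values())
    recall = overlap / sum(reference.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 2) -> tuple[float, float, float]:
    return precision_recall_f1(ngrams(tokenize(candidate), n), ngrams(tokenize(reference), n))


def rouge_s(candidate: str, reference: str) -> tuple[float, float, float]:
    return precision_recall_f1(skip_bigrams(tokenize(candidate)), skip_bigrams(tokenize(reference)))


def rouge_su(candidate: str, reference: str) -> tuple[float, float, float]:
    # Pool skip-bigrams and unigrams (1-tuples) into one multiset of counting units.
    def units(text: str) -> Counter:
        tokens = tokenize(text)
        return skip_bigrams(tokens) + ngrams(tokens, 1)

    return precision_recall_f1(units(candidate), units(reference))


if __name__ == "__main__":
    cand = "the cat was found under the bed"
    ref = "the cat was under the bed"
    print(rouge_n(cand, ref, n=1))  # precision 6/7, recall 1.0
    print(rouge_n(cand, ref, n=2))  # precision 4/6, recall 4/5
```

For example, against the reference `police killed the gunman`, the candidate `police kill the gunman` shares 3 of the reference's 6 skip-bigrams, so its ROUGE-S recall in this sketch is 0.5. The sketch handles only one reference; scoring against a set of references needs an extra aggregation step.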
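ROUGE-L and ROUGE-W in the list above both rest on a longest-common-subsequence dynamic program. The sketch below is illustrative only and assumes whitespace tokenization, an F-measure with a `beta` parameter (default 1), and, for ROUGE-W, a weighting function f(k) = k**alpha with an illustrative default `alpha` of 1.2; weighted scores are normalized by f(length) and mapped back through the inverse f^-1(v) = v**(1/alpha). All function names here are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    # F = (1 + beta^2) * P * R / (R + beta^2 * P); larger beta weights recall more.
    if precision == 0 or recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


def lcs_length(x: list[str], y: list[str]) -> int:
    # table[i][j] holds the LCS length of x[:i] and y[:j].
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def weighted_lcs(x: list[str], y: list[str], alpha: float) -> float:
    # c[i][j] is the weighted score; w[i][j] is the length of the run of
    # consecutive matches ending at (i, j). f(k) = k**alpha rewards longer runs.
    def f(k: int) -> float:
        return k ** alpha

    c = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    w = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                k = w[i - 1][j - 1]
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            else:
                # No match: carry the better score forward; w stays 0 (run broken).
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[-1][-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> tuple[float, float, float]:
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, cand)
    precision, recall = lcs / len(cand), lcs / len(ref)
    return precision, recall, f_measure(precision, recall, beta)


def rouge_w(candidate: str, reference: str, alpha: float = 1.2,
            beta: float = 1.0) -> tuple[float, float, float]:
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0, 0.0, 0.0
    score = weighted_lcs(ref, cand, alpha)
    # Normalize by f(length), then map back with f^-1(v) = v**(1/alpha).
    precision = (score / len(cand) ** alpha) ** (1 / alpha)
    recall = (score / len(ref) ** alpha) ** (1 / alpha)
    return precision, recall, f_measure(precision, recall, beta)


if __name__ == "__main__":
    reference = "A B C D E F G"
    for candidate in ("A B C D H I K", "A H B K C I D"):
        # Same ROUGE-L recall (LCS = 4), different ROUGE-W recall.
        print(rouge_l(candidate, reference)[1], rouge_w(candidate, reference, alpha=2.0)[1])
```

For the reference `A B C D E F G`, the candidates `A B C D H I K` and `A H B K C I D` share an LCS of length 4, so ROUGE-L gives both a recall of 4/7. With `alpha = 2`, ROUGE-W recall is 4/7 for the first candidate (one run of four consecutive matches) but only 2/7 for the second (four isolated matches), which is the preference for consecutive matches that distinguishes ROUGE-W.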

ROUGE can be downloaded from the berouge download page.

References

ROUGE (metric) Wikipedia

(Text) CC BY-SA