Metrics API

evret.metrics.base

Base interface for retrieval evaluation metrics.

Metric

Bases: ABC

Base class for metrics evaluated at a top-k cutoff.

For query i with retrieved labels R_i and expected labels G_i, each metric computes a per-query score at k and then averages:

score = (1 / |Q|) * sum(metric_i(R_i[:k], G_i))
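The averaging rule above can be sketched as a standalone function. This is an illustration only, not the library's code: `mean_score_at_k` and `hit` are hypothetical names, and returning 0.0 for an empty batch is an assumption.

```python
from typing import Callable, Collection, Sequence


def mean_score_at_k(
    per_query_metric: Callable[[Sequence[str], Collection[str]], float],
    retrieved_by_query: Sequence[Sequence[str]],
    expected_by_query: Sequence[Collection[str]],
    k: int,
) -> float:
    """Trim each ranking to its top k, score it, and average over queries."""
    if len(retrieved_by_query) != len(expected_by_query):
        raise ValueError("retrieved and expected batches must have the same length")
    if not retrieved_by_query:
        return 0.0  # assumption: an empty batch scores 0.0
    total = sum(
        per_query_metric(retrieved[:k], expected)
        for retrieved, expected in zip(retrieved_by_query, expected_by_query)
    )
    return total / len(retrieved_by_query)


def hit(retrieved: Sequence[str], expected: Collection[str]) -> float:
    """Hypothetical per-query metric: 1.0 if any expected label was retrieved."""
    return 1.0 if set(retrieved) & set(expected) else 0.0


# Query 1 hits at rank 1; query 2 only hits at rank 2, which k=1 cuts off.
batch_score = mean_score_at_k(hit, [["a", "b"], ["c", "d"]], [{"a"}, {"d"}], k=1)
print(batch_score)  # 0.5
```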

name property

Metric display name including cutoff.

score(retrieved_by_query, expected_by_query)

Score a batch of queries by averaging per-query metric values.

score_query(retrieved_doc_ids, expected_answers) abstractmethod

Score a single query.

top_k(retrieved_doc_ids)

Return the retrieval list trimmed to metric cutoff.

evret.metrics.hit_rate

Hit Rate metric implementation.

HitRate

Bases: Metric

Binary metric: a query scores 1 if at least one relevant document appears in the top k, and 0 otherwise.

Formula: HitRate@k = (1 / |Q|) * sum(1[relevant_i ∩ retrieved_i[:k] != ∅])
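A minimal sketch of the formula above, written as plain functions rather than the library's class. The names `hit_at_k` and `hit_rate_at_k` are hypothetical, and returning 0.0 for an empty batch is an assumption.

```python
from typing import Collection, Sequence


def hit_at_k(retrieved: Sequence[str], relevant: Collection[str], k: int) -> float:
    """Return 1.0 if the top-k list shares at least one ID with the relevant set."""
    return 1.0 if set(retrieved[:k]) & set(relevant) else 0.0


def hit_rate_at_k(
    retrieved_by_query: Sequence[Sequence[str]],
    relevant_by_query: Sequence[Collection[str]],
    k: int,
) -> float:
    """Average the per-query hit indicator over all queries."""
    hits = [
        hit_at_k(retrieved, relevant, k)
        for retrieved, relevant in zip(retrieved_by_query, relevant_by_query)
    ]
    return sum(hits) / len(hits) if hits else 0.0  # assumption: empty batch -> 0.0


rankings = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
relevant = [{"d3"}, {"d4"}]
# With k=2 the first query misses (d3 is at rank 3) and the second hits at rank 1.
print(hit_rate_at_k(rankings, relevant, k=2))  # 0.5
```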

evret.metrics.recall

Recall@K metric implementation.

Recall

Bases: Metric

Coverage metric over expected answers.

Formula: Recall@k = |relevant ∩ retrieved[:k]| / |relevant|
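The per-query formula can be illustrated as follows. `recall_at_k` is a hypothetical helper, and scoring a query with no relevant items as 0.0 is an assumption; the source does not say how the library treats that case.

```python
from typing import Collection, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Collection[str], k: int) -> float:
    """Fraction of the relevant set that appears in the top-k retrieved IDs."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumption: avoid dividing by zero when nothing is relevant
    found = relevant_set & set(retrieved[:k])
    return len(found) / len(relevant_set)


ranking = ["d1", "d2", "d3", "d4"]
gold = {"d2", "d4", "d9"}
# At k=3 only d2 is found (1 of 3); at k=4 both d2 and d4 are found (2 of 3).
recall_3 = recall_at_k(ranking, gold, k=3)
recall_4 = recall_at_k(ranking, gold, k=4)
```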

evret.metrics.precision

Precision@K metric implementation.

Precision

Bases: Metric

Purity metric over retrieved top-k documents.

Formula: Precision@k = |relevant ∩ retrieved[:k]| / k
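A short sketch of the formula above. Note that the denominator is k, not the number of documents actually returned, so a ranking shorter than k is penalized. `precision_at_k` is a hypothetical name and the rejection of non-positive k is an assumption.

```python
from typing import Collection, Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Collection[str], k: int) -> float:
    """Number of relevant IDs in the top k, divided by k itself."""
    if k <= 0:
        raise ValueError("k must be a positive integer")  # assumption
    found = set(relevant) & set(retrieved[:k])
    return len(found) / k


# Two of the top four are relevant, but only one of the top two.
print(precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3"}, k=2))  # 0.5
# Only two documents were returned, yet the denominator stays at k=4.
print(precision_at_k(["d1", "d2"], {"d1"}, k=4))  # 0.25
```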

evret.metrics.mrr

MRR@K metric implementation.

MRR

Bases: Metric

Mean Reciprocal Rank at top-k: the per-query reciprocal rank, averaged over all queries.

Formula: RR@k = 1 / rank_first_relevant if a hit exists in top-k, else 0.
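As an illustration of the rule above, the reciprocal rank of one query can be computed by scanning the top k for the first relevant ID. `reciprocal_rank_at_k` is a hypothetical helper, not the library's method.

```python
from typing import Collection, Sequence


def reciprocal_rank_at_k(
    retrieved: Sequence[str], relevant: Collection[str], k: int
) -> float:
    """Return 1 / rank of the first relevant ID in the top k, or 0.0 if none."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


rankings = [["d7", "d2", "d5"], ["d1", "d3"]]
gold = [{"d2", "d5"}, {"d9"}]
per_query = [reciprocal_rank_at_k(r, g, k=3) for r, g in zip(rankings, gold)]
print(per_query)  # [0.5, 0.0]
mrr = sum(per_query) / len(per_query)
print(mrr)  # 0.25
```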

evret.metrics.ndcg

nDCG@K metric implementation.

NDCG

Bases: Metric

Normalized Discounted Cumulative Gain at top-k.

evret.metrics.err

ERR@K metric implementation.

ERR

Bases: Metric

Expected Reciprocal Rank with cascade model for graded relevance.

Formula: ERR@k = Σ(i=1 to k) [ (1/i) × R(i) × Π(j=1 to i-1)(1 - R(j)) ]

where R(i) = (2^grade_i - 1) / 2^max_grade and grade_i is the relevance grade of the document at rank i.
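The cascade in the formula can be sketched with a running "user is still reading" probability. This is a hedged illustration: `err_at_k` is a hypothetical function that accepts only a grade mapping, it treats unlisted documents as grade 0, and it rejects out-of-range grades; the library may handle these cases differently.

```python
from typing import Mapping, Sequence


def err_at_k(
    retrieved: Sequence[str],
    grades: Mapping[str, int],
    k: int,
    max_grade: int = 4,
) -> float:
    """Expected Reciprocal Rank over the top k under the cascade model."""
    score = 0.0
    p_continue = 1.0  # probability the user was not satisfied at earlier ranks
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        grade = grades.get(doc_id, 0)  # assumption: unlisted documents are grade 0
        if not 0 <= grade <= max_grade:
            raise ValueError(f"grade {grade} is outside [0, {max_grade}]")
        satisfaction = (2**grade - 1) / 2**max_grade  # R(i) in the formula
        score += p_continue * satisfaction / rank
        p_continue *= 1.0 - satisfaction
    return score


# The only relevant document has the top grade and sits at rank 2:
# R(2) = 15/16, so the score is (1/2) * 15/16.
print(err_at_k(["a", "b"], {"b": 4}, k=2))  # 0.46875
```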

name property

Metric display name including cutoff and max_grade.

__init__(k, max_grade=4)

Initialize ERR metric.

Parameters:

k (int, required): Rank cutoff position.

max_grade (int, default 4): Maximum relevance grade. Grades should be in the range [0, max_grade].

score_query(retrieved_doc_ids, expected_answers)

Score a single query using ERR.

Parameters:

retrieved_doc_ids (Sequence[str], required): Ordered list of retrieved document IDs.

expected_answers (Collection[str] | dict[str, int], required): Either a set/list of expected answer IDs (binary relevance) or a dict mapping doc_id → relevance grade (0 to max_grade).

Returns:

float: ERR score in range [0, 1].

evret.metrics.rbp

RBP@K metric implementation.

RBP

Bases: Metric

Rank-Biased Precision with geometric persistence weighting.

Formula: RBP(p) = (1 - p) × Σ(i=1 to k) [ p^(i-1) × rel(i) ]
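For binary relevance, the formula above reduces to a weighted count of hits, with weight p^(i-1) at rank i. The sketch below is an illustration under that assumption; `rbp_at_k` is a hypothetical name and graded relevance is left out.

```python
from typing import Collection, Sequence


def rbp_at_k(
    retrieved: Sequence[str], relevant: Collection[str], k: int, p: float = 0.8
) -> float:
    """Rank-Biased Precision over the top k with binary relevance."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    relevant_set = set(relevant)
    # enumerate() starts at 0, so p**index equals p^(i-1) for the 1-based rank i.
    weighted_hits = sum(
        p**index
        for index, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant_set
    )
    return (1.0 - p) * weighted_hits


# Hits at ranks 1 and 3 with p=0.5: (1 - 0.5) * (1 + 0.25)
print(rbp_at_k(["a", "b", "c"], {"a", "c"}, k=3, p=0.5))  # 0.625
```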

expected_search_depth property

Expected number of positions a user examines.

Expected depth = 1 / (1 - p)
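The closed form follows from the geometric user model: the user stops at rank i with probability (1 - p) × p^(i-1), and the mean of that distribution is 1 / (1 - p). The sketch below checks the closed form against a truncated series; both function names are hypothetical.

```python
import math


def expected_search_depth(p: float) -> float:
    """Closed-form expected number of ranks examined for persistence p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (1.0 - p)


def series_search_depth(p: float, max_rank: int = 1000) -> float:
    """Truncated series: sum over ranks of rank * P(user stops at that rank)."""
    return sum(rank * (1.0 - p) * p ** (rank - 1) for rank in range(1, max_rank + 1))


closed_form = expected_search_depth(0.8)
numeric = series_search_depth(0.8)
print(math.isclose(closed_form, numeric, rel_tol=1e-9))  # True
```

With the default p = 0.8 the modeled user examines about five positions on average; p = 0.5 gives two.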

name property

Metric display name including cutoff and persistence.

__init__(k, p=0.8)

Initialize RBP metric.

Parameters:

k (int, required): Rank cutoff position.

p (float, default 0.8): Persistence parameter (0 < p < 1). A higher p models a more patient user who examines deeper ranks; a lower p models an impatient user who focuses on the top ranks.

Raises:

ValueError: If p is not in the valid range (0, 1).

compute_residual(num_retrieved)

Compute residual for incomplete rankings.

The residual represents the upper bound contribution from unseen ranks (k+1, k+2, ...) if all were relevant.

Residual = p^k

Parameters:

num_retrieved (int, required): Number of documents actually retrieved.

Returns:

float: Residual value (upper bound on unseen contribution).
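The residual is the RBP mass left beyond the evaluated depth: if every unseen rank were relevant, the tail (1 - p) × Σ(i > depth) p^(i-1) would sum to p^depth. The sketch below compares that closed form with a truncated tail sum. The names are hypothetical, and using a single `depth` argument is an assumption; the source does not spell out how k and num_retrieved interact.

```python
import math


def rbp_residual(p: float, depth: int) -> float:
    """Upper bound on the RBP contribution of ranks beyond `depth`."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return p**depth


def tail_mass(p: float, depth: int, max_rank: int = 1000) -> float:
    """Truncated tail sum of the RBP weights for ranks depth+1 .. max_rank."""
    return (1.0 - p) * sum(p ** (rank - 1) for rank in range(depth + 1, max_rank + 1))


print(rbp_residual(0.5, 3))  # 0.125
print(math.isclose(rbp_residual(0.8, 10), tail_mass(0.8, 10), rel_tol=1e-9))  # True
```

A score plus its residual gives an upper bound on what the full, untruncated ranking could achieve.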

score_query(retrieved_doc_ids, expected_answers)

Score a single query using RBP.

Parameters:

retrieved_doc_ids (Sequence[str], required): Ordered list of retrieved document IDs.

expected_answers (Collection[str] | dict[str, int], required): Either a set/list of expected answer IDs (binary relevance) or a dict mapping doc_id → relevance grade. For graded relevance, grades are normalized to [0, 1].

Returns:

float: RBP score in range [0, 1].

evret.metrics.average_precision

Average Precision@K metric implementation.

AveragePrecision

Bases: Metric

Average Precision at top-k cutoff.