Package API

evret

Evret package.

AveragePrecision

Bases: Metric

Average Precision at top-k cutoff.
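The package's internals are not shown here, but the standard Average Precision at a top-k cutoff can be sketched in plain Python. The function name and the normalization by min(|relevant|, k) are illustrative assumptions; some AP variants divide by |relevant| instead.

```python
def average_precision_at_k(retrieved, relevant, k):
    """AP@k: mean of precision@i over the ranks i (1-based) where a
    relevant document appears in the top-k results."""
    hits = 0
    precision_sum = 0.0
    for i, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / i
    if not relevant:
        return 0.0
    return precision_sum / min(len(relevant), k)
```

For example, with retrieved ["a", "b", "c", "d"] and relevant {"a", "c"} at k=4, the relevant hits occur at ranks 1 and 3 (precisions 1 and 2/3), giving AP@4 = (1 + 2/3) / 2 = 5/6.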

BaseRetriever

Bases: ABC

Abstract retriever interface used by evaluation pipelines.

batch_retrieve(queries, k)

Retrieve for each query using the same cutoff k.

retrieve(query, k) abstractmethod

Return top-k results for a single query.
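A minimal sketch of this contract, mirroring the documented method names: `retrieve()` is abstract and `batch_retrieve()` fans out over queries with a shared cutoff. The toy keyword backend and the plain doc-id return type are illustrative assumptions; the real package returns `RetrievalResult` objects, whose fields are not listed on this page.

```python
from abc import ABC, abstractmethod


class BaseRetriever(ABC):
    """Abstract retriever: subclasses implement retrieve();
    batch_retrieve() has a sequential default."""

    @abstractmethod
    def retrieve(self, query: str, k: int):
        """Return top-k results for a single query."""

    def batch_retrieve(self, queries, k: int):
        # Same cutoff k applied to every query.
        return [self.retrieve(q, k) for q in queries]


class KeywordRetriever(BaseRetriever):
    """Toy backend: rank documents by shared-token count with the query."""

    def __init__(self, docs: dict):
        self.docs = docs  # doc_id -> text

    def retrieve(self, query: str, k: int):
        q_tokens = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q_tokens & set(self.docs[d].lower().split())),
            reverse=True,
        )
        return scored[:k]
```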

ChromaRetriever

Bases: BaseRetriever

Retrieve documents from ChromaDB with a unified Evret interface.

DocumentExample dataclass

One document entry in an evaluation dataset.

EvaluationDataset dataclass

Evaluation dataset containing query examples and optional documents.

EvaluationResults dataclass

Aggregated metric results for an evaluation run.

summary()

Return metric summary map.

to_csv(path)

Write metric rows as CSV (metric, score).

to_dict()

Return serializable representation of this result object.

to_json(path)

Write results as JSON.

Evaluator

Run a list of metrics over a retriever and dataset.

Uses pluggable Judge system for text-based relevance matching.

Parameters:

    retriever (BaseRetriever, required):
        Retriever to evaluate.
    metrics (Sequence[Metric], required):
        List of metrics to compute.
    judge (Judge | None, default None):
        Relevance judge (defaults to TokenOverlapJudge if None).

Examples:

>>> from evret import Evaluator, HitRate, Recall
>>> from evret.judges import TokenOverlapJudge, SemanticJudge, LLMJudge
>>>
>>> # Default: TokenOverlapJudge
>>> evaluator = Evaluator(retriever, [HitRate(k=4), Recall(k=4)])
>>>
>>> # Custom judge
>>> evaluator = Evaluator(
...     retriever,
...     [Recall(k=4)],
...     judge=SemanticJudge(threshold=0.8)
... )

EvretError

Bases: Exception

Base exception for Evret.

EvretValidationError

Bases: ValueError, EvretError

Raised when user input or data format is invalid.

HitRate

Bases: Metric

Binary top-k relevance presence metric.

Formula: HitRate@k = (1 / |Q|) * sum(1[relevant_i ∩ retrieved_i[:k] != ∅])
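The formula above can be sketched directly in plain Python (the function name is illustrative, not the package's API):

```python
def hit_rate_at_k(retrieved_per_query, relevant_per_query, k):
    """HitRate@k = (1/|Q|) * sum over queries of
    1[relevant_i intersects retrieved_i[:k]]."""
    hits = sum(
        1
        for retrieved, relevant in zip(retrieved_per_query, relevant_per_query)
        if set(retrieved[:k]) & set(relevant)
    )
    return hits / len(retrieved_per_query)
```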

Judge

Bases: ABC

Base interface for relevance judges.

All judges implement a simple contract:

  - judge(context) → bool (is relevant?)
  - batch_judge(contexts) → list[bool] (batch evaluation)

Subclasses should override judge() and optionally batch_judge() for optimized batch processing.

name abstractmethod property

Judge display name for logging/debugging.

batch_judge(contexts)

Batch evaluation of multiple contexts.

Default implementation calls judge() for each context sequentially. Override this method for optimized batch processing (e.g., vectorized operations, async API calls, etc.).

Parameters:

    contexts (list[JudgmentContext], required):
        List of judgment contexts.

Returns:

    list[bool]: List of boolean judgments (same order as input).

judge(context) abstractmethod

Return True if retrieved_text is relevant to expected_text given query.

Parameters:

    context (JudgmentContext, required):
        Judgment context with query and texts.

Returns:

    bool: True if retrieved text is relevant, False otherwise.
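The Judge contract above can be sketched with local stand-ins (the class and field names mirror this page; `ExactMatchJudge` is a hypothetical subclass for illustration, not part of the package):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JudgmentContext:
    query: str
    expected_text: str
    retrieved_text: str


class Judge(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        """Judge display name for logging/debugging."""

    @abstractmethod
    def judge(self, context: JudgmentContext) -> bool:
        """Return True if retrieved_text is relevant to expected_text."""

    def batch_judge(self, contexts):
        # Default: sequential; override for vectorized or async backends.
        return [self.judge(c) for c in contexts]


class ExactMatchJudge(Judge):
    """Relevant only when retrieved text equals expected text
    (case- and whitespace-insensitive)."""

    @property
    def name(self) -> str:
        return "exact_match"

    def judge(self, context: JudgmentContext) -> bool:
        return (
            context.retrieved_text.strip().lower()
            == context.expected_text.strip().lower()
        )
```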

JudgmentContext dataclass

Context passed to judge for relevance decision.

Attributes:

    query (str): User query text.
    expected_text (str): Expected/ground-truth relevant text.
    retrieved_text (str): Retrieved candidate text to judge.

LangChainRetrieverAdapter

Bases: LangChainBaseRetriever

Bridge Evret and LangChain retrievers.

LlamaIndexRetrieverAdapter

Bases: BaseRetriever

Wrap an Evret retriever as a LlamaIndex-compatible retriever.

MRR

Bases: Metric

Mean Reciprocal Rank at top-k, averaged over queries.

Formula: RR@k = 1 / rank_first_relevant if a hit exists in top-k, else 0.
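The per-query reciprocal rank above, averaged over queries, can be sketched as (function name is illustrative):

```python
def mrr_at_k(retrieved_per_query, relevant_per_query, k):
    """MRR@k: mean over queries of 1/rank of the first relevant hit
    in the top-k, or 0 if no relevant document appears."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_per_query, relevant_per_query):
        for rank, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(retrieved_per_query)
```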

MilvusRetriever

Bases: BaseRetriever

Retrieve documents from Milvus with a unified Evret interface.

NDCG

Bases: Metric

Normalized Discounted Cumulative Gain at top-k.
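A minimal binary-relevance sketch of NDCG@k, assuming the common 1/log2(rank + 1) discount; the package's exact gain and discount choices are not shown on this page.

```python
import math


def ndcg_at_k(retrieved, relevant, k):
    """Binary-gain NDCG@k: DCG discounts hits by 1/log2(rank + 1);
    the ideal DCG places all relevant documents first."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal if ideal else 0.0
```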

OptionalDependencyError

Bases: ImportError, EvretError

Raised when an optional dependency required by a feature is missing.

Precision

Bases: Metric

Fraction of the retrieved top-k documents that are relevant.

Formula: Precision@k = |relevant ∩ retrieved[:k]| / k
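The formula above in plain Python (function name is illustrative; note the denominator is k, even when fewer than k documents are returned):

```python
def precision_at_k(retrieved, relevant, k):
    """Precision@k = |relevant ∩ retrieved[:k]| / k."""
    return len(set(retrieved[:k]) & set(relevant)) / k
```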

QdrantRetriever

Bases: BaseRetriever

Retrieve documents from Qdrant with a unified Evret interface.

QueryExample dataclass

One query item in an evaluation dataset.

Supports two evaluation patterns:

  1. Classic IR: Provide relevant_doc_ids (pre-labeled document identifiers).
  2. Judge-based: Provide expected_answers (answer text snippets for the judge to match).

Use relevant_doc_ids when you have pre-labeled ground truth document IDs. Use expected_answers when you want a judge to determine relevance by comparing against expected answer text.
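The two patterns can be sketched with a local stand-in dataclass. The `relevant_doc_ids` and `expected_answers` field names come from this page; the `query` field name and the default values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueryExample:
    query: str  # assumed field name, not confirmed by this page
    relevant_doc_ids: list = field(default_factory=list)
    expected_answers: list = field(default_factory=list)


# Pattern 1 (classic IR): pre-labeled ground-truth document IDs.
classic = QueryExample(query="capital of France", relevant_doc_ids=["doc_17"])

# Pattern 2 (judge-based): answer snippets for a judge to match
# against retrieved text.
judged = QueryExample(query="capital of France", expected_answers=["Paris"])
```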

Recall

Bases: Metric

Fraction of all relevant documents that appear in the retrieved top-k.

Formula: Recall@k = |relevant ∩ retrieved[:k]| / |relevant|
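The formula above in plain Python (function name is illustrative; the empty-relevant-set guard is an assumption to avoid division by zero):

```python
def recall_at_k(retrieved, relevant, k):
    """Recall@k = |relevant ∩ retrieved[:k]| / |relevant|."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)
```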

RetrievalResult dataclass

Standard retrieval output for all retriever backends.

TokenOverlapJudge

Bases: Judge

Fast keyword/token-based relevance matching.

Suitable for exact/fuzzy text matching without semantic understanding. Uses token overlap with configurable thresholds to determine relevance.

Algorithm
  1. Try exact match
  2. Try substring containment
  3. Check token overlap with minimum token and ratio thresholds
  4. Optionally boost with query token overlap
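The four steps above can be sketched as a standalone function. The exact/substring checks follow the listed algorithm; the precise boost semantics in step 4 (and the relaxed ratio used here) are assumptions, not the package's actual implementation.

```python
def token_overlap_relevant(expected, retrieved, query="",
                           min_tokens=2, overlap_ratio=0.6, query_boost=True):
    """Decide relevance by the four documented steps."""
    e, r = expected.lower().strip(), retrieved.lower().strip()
    # Step 1: exact match.
    if e == r:
        return True
    # Step 2: substring containment (either direction).
    if e in r or r in e:
        return True
    # Step 3: token overlap against minimum-count and ratio thresholds.
    e_tokens, r_tokens = set(e.split()), set(r.split())
    shared = e_tokens & r_tokens
    if (e_tokens and len(shared) >= min_tokens
            and len(shared) / len(e_tokens) >= overlap_ratio):
        return True
    # Step 4: optional boost — query tokens present in the retrieved text
    # relax the ratio threshold (relaxation factor is an assumption).
    if query_boost and query and e_tokens:
        boosted = shared | (set(query.lower().split()) & r_tokens)
        if (len(boosted) >= min_tokens
                and len(boosted) / len(e_tokens) >= overlap_ratio * 0.8):
            return True
    return False
```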

Examples:

>>> judge = TokenOverlapJudge()  # Default settings
>>> judge = TokenOverlapJudge(min_tokens=3, overlap_ratio=0.7)
>>> judge = TokenOverlapJudge(min_tokens=2, overlap_ratio=0.6, query_boost=False)

Parameters:

    min_tokens (int, default 2):
        Minimum shared tokens required.
    overlap_ratio (float, default 0.6):
        Minimum overlap ratio in [0, 1].
    query_boost (bool, default True):
        Allow query tokens to relax the threshold.

name property

Judge display name.

__init__(min_tokens=2, overlap_ratio=0.6, query_boost=True)

Initialize token overlap judge with configurable thresholds.

judge(context)

Judge using token overlap algorithm.

Parameters:

    context (JudgmentContext, required):
        Judgment context with query and texts.

Returns:

    bool: True if retrieved text matches expected text.

WeaviateRetriever

Bases: BaseRetriever

Retrieve documents from Weaviate with a unified Evret interface.