
Dataset Format

Evret supports JSON and CSV for evaluation datasets.

JSON Format

The top-level object has two keys:

  • queries: required list
  • documents: optional list

Query Item Fields

  • query_id or id: string (required)
  • query_text or query: string (required)
  • relevant_doc_ids: list of strings (optional)
      • Use when you have pre-labeled document IDs as ground truth
      • For classic IR evaluation with exact doc ID matching
  • expected_answers: list of strings (optional)
      • Use when you want a judge to determine relevance
      • Store gold supporting text snippets that judges match against retrieved docs
  • relevant_docs: list of strings (deprecated, backward compatible)
      • Legacy field name, automatically mapped to relevant_doc_ids

Note: Provide either relevant_doc_ids or expected_answers for a query, not both.
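
The field rules above can be sketched as a small normalization step. This is a minimal illustration, not Evret's actual loader code: the helper name normalize_query_item is hypothetical, and it assumes the canonical name is preferred over its alias when both are present.

```python
from typing import Any


def normalize_query_item(item: dict[str, Any]) -> dict[str, Any]:
    """Resolve field aliases and enforce the either/or relevance rule.

    Hypothetical helper for illustration only; it is not part of Evret.
    """
    # Accept either spelling of the required fields.
    query_id = item.get("query_id", item.get("id"))
    query_text = item.get("query_text", item.get("query"))
    if query_id is None or query_text is None:
        raise ValueError("a query item needs an ID and a query text")

    # The deprecated relevant_docs field maps to relevant_doc_ids.
    doc_ids = item.get("relevant_doc_ids", item.get("relevant_docs"))
    answers = item.get("expected_answers")
    if doc_ids and answers:
        raise ValueError("provide relevant_doc_ids or expected_answers, not both")

    normalized: dict[str, Any] = {
        "query_id": str(query_id),
        "query_text": str(query_text),
    }
    if doc_ids:
        normalized["relevant_doc_ids"] = list(doc_ids)
    if answers:
        normalized["expected_answers"] = list(answers)
    return normalized
```

Running a check like this before loading a file can surface a query that mixes both relevance styles early, with a readable error message.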

Document Item Fields

  • doc_id: string
  • text: string
  • metadata: object, optional

JSON Example with Expected Answers (Judge-Based Evaluation)

{
  "queries": [
    {
      "query_id": "q1",
      "query_text": "does a flight above 500 dollars need manager approval",
      "expected_answers": [
        "Flights above 500 dollars require manager approval before booking business travel."
      ]
    },
    {
      "query_id": "q2",
      "query_text": "what hotel reimbursement limit applies to business travel",
      "expected_answers": [
        "Hotel reimbursement is capped at 180 dollars per night unless finance approves an exception."
      ]
    }
  ],
  "documents": [
    {
      "doc_id": "travel_policy_2",
      "text": "Flights above 500 dollars require manager approval before booking business travel.",
      "metadata": {
        "source": "travel_policy.md",
        "section": "flight_approval"
      }
    },
    {
      "doc_id": "travel_policy_3",
      "text": "Hotel reimbursement is capped at 180 dollars per night unless finance approves an exception.",
      "metadata": {
        "source": "travel_policy.md",
        "section": "hotel_cap"
      }
    }
  ]
}

JSON Example with Document IDs (Classic IR Evaluation)

{
  "queries": [
    {
      "query_id": "q1",
      "query_text": "does a flight above 500 dollars need manager approval",
      "relevant_doc_ids": ["travel_policy_2"]
    },
    {
      "query_id": "q2",
      "query_text": "what hotel reimbursement limit applies to business travel",
      "relevant_doc_ids": ["travel_policy_3"]
    }
  ],
  "documents": [
    {
      "doc_id": "travel_policy_2",
      "text": "Flights above 500 dollars require manager approval before booking business travel.",
      "metadata": {
        "source": "travel_policy.md",
        "section": "flight_approval"
      }
    },
    {
      "doc_id": "travel_policy_3",
      "text": "Hotel reimbursement is capped at 180 dollars per night unless finance approves an exception.",
      "metadata": {
        "source": "travel_policy.md",
        "section": "hotel_cap"
      }
    }
  ]
}

CSV Format

Required columns:

  • query_text or query
  • a relevance column: relevant_doc_ids (for classic IR evaluation), expected_answers (for judge-based evaluation), or the legacy relevant_docs

Optional columns:

  • query_id or id

Note: The old relevant_docs column is still supported for backward compatibility.

The relevance field (relevant_doc_ids, expected_answers, or legacy relevant_docs) can be:

  • JSON list string like "[\"Flights above 500 dollars require manager approval before booking business travel.\"]"
  • Comma separated values when the labels are short and unambiguous
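
The two accepted cell shapes can be told apart with a simple check. The sketch below is an assumption about how such a cell might be parsed, not Evret's implementation; parse_relevance_cell is a hypothetical name. It also shows why the JSON list form is safer for long answers: a comma inside a plain cell would split one label into two.

```python
import json


def parse_relevance_cell(cell: str) -> list[str]:
    """Turn one CSV relevance cell into a list of labels.

    Hypothetical helper for illustration only; it is not part of Evret.
    """
    text = cell.strip()
    if not text:
        return []
    # A cell that starts with "[" is treated as a JSON list string.
    if text.startswith("["):
        return [str(value) for value in json.loads(text)]
    # Otherwise fall back to comma-separated values.
    return [part.strip() for part in text.split(",") if part.strip()]
```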

CSV Example with Expected Answers

query_id,query_text,expected_answers
q1,does a flight above 500 dollars need manager approval,"[""Flights above 500 dollars require manager approval before booking business travel.""]"
q2,what hotel reimbursement limit applies to business travel,"[""Hotel reimbursement is capped at 180 dollars per night unless finance approves an exception.""]"

CSV Example with Document IDs

query_id,query_text,relevant_doc_ids
q1,does a flight above 500 dollars need manager approval,"[""travel_policy_2""]"
q2,what hotel reimbursement limit applies to business travel,"[""travel_policy_3""]"
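
The doubled quotes in the examples above are standard CSV escaping of a JSON list string. A minimal sketch of generating such a file with Python's standard csv and json modules follows; the build_csv helper and the rows variable are illustrative names, not part of Evret.

```python
import csv
import io
import json

# Illustrative rows: (query_id, query_text, relevant_doc_ids)
rows = [
    ("q1", "does a flight above 500 dollars need manager approval", ["travel_policy_2"]),
    ("q2", "what hotel reimbursement limit applies to business travel", ["travel_policy_3"]),
]


def build_csv(rows) -> str:
    """Serialize rows to CSV text, storing each ID list as a JSON list string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["query_id", "query_text", "relevant_doc_ids"])
    for query_id, query_text, doc_ids in rows:
        # json.dumps produces ["..."]; csv.writer quotes the cell and
        # doubles the inner quotes automatically.
        writer.writerow([query_id, query_text, json.dumps(doc_ids)])
    return buffer.getvalue()


csv_text = build_csv(rows)
```

To write the result to disk, pass a file opened with open("eval_data.csv", "w", newline="") to csv.writer in place of the StringIO buffer.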

Loader Methods

from evret import EvaluationDataset

dataset_json = EvaluationDataset.from_json("eval_data.json")
dataset_csv = EvaluationDataset.from_csv("eval_data.csv")