Hit Rate
What It Measures
Hit Rate answers one question:
"Did we retrieve at least one relevant document in the top-k results?"
It ignores where the first hit appears: a relevant document at rank 1 and one at rank 5 count the same. A query scores 1 when any relevant item appears in the top-k results, and 0 otherwise.
Mathematical Formula
For query \(i\):
\[
\operatorname{Hit@}k_i =
\begin{cases}
1, & \text{if } G_i \cap R_i^{(k)} \neq \varnothing \\
0, & \text{otherwise}
\end{cases}
\]
Across all queries:
\[
\operatorname{HitRate@}k =
\frac{1}{|Q|}
\sum_{i=1}^{|Q|}
\operatorname{Hit@}k_i
\]
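The two formulas translate almost line for line into code. The sketch below is a minimal illustration, not Evret's implementation: `hit_at_k` and `hit_rate_at_k` are hypothetical names, ids are assumed to be hashable values such as strings, and raising on an empty query set is a choice made here rather than something the formula specifies.

```python
from typing import AbstractSet, Hashable, Sequence


def hit_at_k(relevant: AbstractSet[Hashable], retrieved: Sequence[Hashable], k: int) -> float:
    """Hit@k for one query: 1.0 if G_i and R_i^(k) intersect, else 0.0."""
    top_k = set(retrieved[:k])  # R_i^(k): the first k retrieved ids
    return 1.0 if relevant & top_k else 0.0  # non-empty intersection means a hit


def hit_rate_at_k(
    relevant_per_query: Sequence[AbstractSet[Hashable]],
    retrieved_per_query: Sequence[Sequence[Hashable]],
    k: int,
) -> float:
    """HitRate@k: the mean of Hit@k over every query in Q."""
    if len(relevant_per_query) != len(retrieved_per_query):
        raise ValueError("need one relevant set and one ranking per query")
    if not relevant_per_query:
        raise ValueError("Q must contain at least one query")  # avoids dividing by |Q| = 0
    hits = [hit_at_k(g, r, k) for g, r in zip(relevant_per_query, retrieved_per_query)]
    return sum(hits) / len(hits)
```

For example, `hit_rate_at_k([{"d2"}, {"d7"}], [["d1", "d2"], ["d3", "d4"]], k=2)` returns `0.5`: the first query hits `d2` at rank 2, the second query misses.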
Formula Breakdown
- \(Q\): set of evaluation queries
- \(G_i\): ground truth relevant ids for query \(i\)
- \(R_i^{(k)}\): first \(k\) retrieved ids for query \(i\)
- \(G_i \cap R_i^{(k)}\): relevant ids found in the top-\(k\)
- \(\varnothing\): empty set
Evret returns 0.0 for a query when its relevant set is empty or when none of its top-k retrieved ids is relevant.
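Both zero cases can be made explicit in a per-query function. This is a hypothetical sketch of the behavior described above, not Evret's code; `hit_at_k` is an illustrative name. The empty-set check is technically redundant (an empty \(G_i\) can never intersect \(R_i^{(k)}\)), but it states the rule directly.

```python
from typing import AbstractSet, Hashable, Sequence


def hit_at_k(relevant: AbstractSet[Hashable], retrieved: Sequence[Hashable], k: int) -> float:
    # Empty ground truth: there is nothing to hit, so the query scores 0.0.
    if not relevant:
        return 0.0
    # Look only at the first k ids; slicing is safe even if fewer than k were retrieved.
    # any() stops at the first relevant id, since its rank does not matter for Hit@k.
    return 1.0 if any(doc_id in relevant for doc_id in retrieved[:k]) else 0.0


print(hit_at_k(set(), ["d1", "d2"], k=5))   # 0.0: empty relevant set
print(hit_at_k({"d3"}, ["d1", "d2"], k=5))  # 0.0: no retrieved id matches
print(hit_at_k({"d2"}, ["d1", "d2"], k=5))  # 1.0: d2 appears at rank 2
```

Because HitRate@k divides by \(|Q|\), a query with an empty relevant set that stays in the evaluation set counts as a miss and lowers the average; depending on what the evaluation should capture, filtering such queries out beforehand may be preferable.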
Worked Example
Given:
k = 5
retrieved@5 = [d1, d4, d2, d9, d8]
relevant = {d2, d8, d10}
Step 1: intersection is {d2, d8}.
Step 2: intersection is not empty, so query score is 1.
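The two steps can be reproduced in a few lines of plain Python, with strings standing in for the document ids:

```python
k = 5
retrieved = ["d1", "d4", "d2", "d9", "d8"]
relevant = {"d2", "d8", "d10"}

# Step 1: intersect the relevant ids with the top-k retrieved ids.
intersection = relevant & set(retrieved[:k])
# Step 2: a non-empty intersection means the query scores 1.
score = 1 if intersection else 0

print(sorted(intersection), score)  # ['d2', 'd8'] 1
```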
If the per-query scores over 4 queries are [1, 0, 1, 1], the final Hit Rate is:
\[
\frac{1 + 0 + 1 + 1}{4} = 0.75
\]
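Averaging the per-query scores is the last step of the HitRate@k formula. In plain Python, with the four scores from the example:

```python
scores = [1, 0, 1, 1]  # per-query Hit@k values
hit_rate = sum(scores) / len(scores)  # (1 + 0 + 1 + 1) / |Q|, with |Q| = 4
print(hit_rate)  # 0.75
```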
When To Use
- First debugging step in information retrieval
- Fast check after changing chunking or embeddings
- CI sanity check to catch obvious retrieval regressions
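For the CI use case, one option is a test that fails when HitRate@k on a fixed evaluation set drops below a threshold. Everything here is hypothetical: the evaluation data, the hard-coded rankings standing in for real retriever output, the 0.9 threshold, and the `test_`-prefixed function name (a convention pytest uses to discover tests). In practice the rankings would come from the retriever under test.

```python
from typing import AbstractSet, Hashable, Sequence


def hit_rate_at_k(
    relevant_per_query: Sequence[AbstractSet[Hashable]],
    retrieved_per_query: Sequence[Sequence[Hashable]],
    k: int,
) -> float:
    """Mean Hit@k over all queries, following the HitRate@k formula."""
    if len(relevant_per_query) != len(retrieved_per_query):
        raise ValueError("need one relevant set and one ranking per query")
    hits = [
        1.0 if relevant & set(retrieved[:k]) else 0.0
        for relevant, retrieved in zip(relevant_per_query, retrieved_per_query)
    ]
    return sum(hits) / len(hits)


# Hypothetical fixed evaluation set, e.g. checked into the repository.
EVAL_RELEVANT = [{"d2"}, {"d5", "d6"}, {"d9"}]
# Hypothetical retriever output; a real test would call the retriever here.
EVAL_RETRIEVED = [["d1", "d2", "d3"], ["d6", "d7", "d8"], ["d4", "d5", "d9"]]
MIN_HIT_RATE = 0.9  # hypothetical threshold; tune it to the project's baseline


def test_hit_rate_does_not_regress() -> None:
    score = hit_rate_at_k(EVAL_RELEVANT, EVAL_RETRIEVED, k=3)
    assert score >= MIN_HIT_RATE, f"HitRate@3 dropped to {score:.2f}"
```

Because Hit Rate only flips when a query loses every relevant id from its top-k, a check like this catches obvious breakages but will not flag a relevant document sliding from rank 1 to rank 3.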