Adaptive Pruning for Late-Interaction Retrieval
via Multi-Armed Bandit Theory
Roi Pony, Adi Raz, Oshri Naparstek, Idan Friedman, Udi Barzelay
IBM Research Israel
Multi-vector late-interaction retrievers such as ColBERT achieve state-of-the-art retrieval quality, but their query-time cost is dominated by exhaustively computing token-level interactions (MaxSim) for every candidate document. We introduce Col-Bandit, a query-time pruning algorithm that casts reranking as a finite-population Top-K identification problem. Col-Bandit maintains uncertainty-aware bounds over partially observed document scores and adaptively reveals only the entries needed to determine the top results. It operates as a zero-shot, drop-in layer requiring no index modifications or retraining.
Walk through the late-interaction scoring step by step. Each cell in the score matrix is one MaxSim operation — and Col-Bandit's target for pruning.
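To make the cost of full scoring concrete, here is a minimal NumPy sketch of exhaustive late-interaction reranking. The function and variable names (`maxsim`, `full_score_matrix`, `query_embs`, `docs`) are illustrative, not taken from any released code.

```python
# Minimal sketch of exhaustive late-interaction (MaxSim) scoring.
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """One cell of the score matrix: the max similarity between a single
    query-token embedding (d,) and all token embeddings of one document
    (num_doc_tokens, d); embeddings are assumed L2-normalized."""
    return float(np.max(doc_emb @ query_emb))

def full_score_matrix(query_embs: np.ndarray, docs) -> np.ndarray:
    """Exhaustive ColBERT-style reranking: fill every (document, query-token) cell."""
    T = query_embs.shape[0]              # number of query tokens
    S = np.empty((len(docs), T))         # N x T score matrix
    for i, doc_emb in enumerate(docs):
        for t in range(T):
            S[i, t] = maxsim(query_embs[t], doc_emb)
    return S

# A document's full late-interaction score is the sum of its row: S.sum(axis=1).
```

Every one of the N × T cells is computed up front; that exhaustive fill is exactly the work Col-Bandit tries to avoid.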
Col-Bandit reframes late-interaction reranking as a top-K identification problem, using confidence bounds to adaptively prune the score matrix at query time.
Full ColBERT computes all N × T MaxSim values. For 1,000 documents and 32 query tokens, that is 32,000 MaxSim evaluations, even for documents that are clearly not in the top-K.
Col-Bandit treats each document as a bandit arm and samples its MaxSim values adaptively via LUCB, focusing compute on competitive candidates and skipping obvious losers. It stops once the top-K is separated from the rest (see the sketch below).
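For intuition, the sketch below runs an LUCB-style top-K identification loop over partially revealed score rows. It is a simplification under stated assumptions: each unrevealed MaxSim cell is assumed to lie in a known interval `[cell_min, cell_max]` (plausible for normalized embeddings), which stands in for the paper's uncertainty-aware bounds, and helper names such as `reveal_cell` are hypothetical.

```python
# LUCB-style top-k identification over a lazily revealed N x T MaxSim matrix.
import numpy as np

def lucb_top_k(docs, query_embs, k, cell_min=0.0, cell_max=1.0):
    """Return indices of the top-k documents, computing as few cells as possible.
    Assumes k < number of documents and every MaxSim value is in [cell_min, cell_max]."""
    N, T = len(docs), query_embs.shape[0]
    revealed = np.zeros((N, T), dtype=bool)      # which cells have been computed
    partial = np.zeros(N)                        # running sum of revealed cells per doc
    rng = np.random.default_rng(0)

    def reveal_cell(i):
        # Compute one not-yet-revealed MaxSim cell for document i.
        t = int(rng.choice(np.flatnonzero(~revealed[i])))
        partial[i] += float(np.max(docs[i] @ query_embs[t]))
        revealed[i, t] = True

    for i in range(N):                           # warm start: one cell per document
        reveal_cell(i)

    while True:
        missing = T - revealed.sum(axis=1)       # cells still unknown per document
        lower = partial + missing * cell_min     # pessimistic total score
        upper = partial + missing * cell_max     # optimistic total score
        top = np.argsort(-lower)[:k]             # current top-k by lower bound
        rest = np.setdiff1d(np.arange(N), top)
        weakest = top[np.argmin(lower[top])]     # champion easiest to dethrone
        strongest = rest[np.argmax(upper[rest])] # most threatening challenger
        if lower[weakest] >= upper[strongest]:   # top-k provably separated: stop
            return top
        for i in (weakest, strongest):           # refine only the two ambiguous arms
            if not revealed[i].all():
                reveal_cell(i)
```

The stopping rule compares the weakest current champion against the strongest challenger: once the champion's lower bound clears the challenger's upper bound, the top-K is separated under the assumed cell bounds and no further cells need to be computed.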
Evaluated on 5 BEIR text datasets with ColBERTv2 and Jina-ColBERT-v2, and on 4 REAL-MM-RAG multimodal datasets with Granite Vision Embedding 3.2.
Overlap@K measures ranking fidelity: the fraction of documents in Col-Bandit's top-K that match full ColBERT's top-K.
An Overlap@5 of 90% means that, on average, 4.5 of the top-5 documents are correctly identified.
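A direct reading of that definition in code (illustrative only, not the paper's evaluation script):

```python
def overlap_at_k(pruned_top_k, full_top_k):
    """Fraction of full ColBERT's top-K that the pruned top-K also contains."""
    return len(set(pruned_top_k) & set(full_top_k)) / len(full_top_k)

# Sharing 4 of 5 documents gives Overlap@5 = 0.8
print(overlap_at_k([3, 7, 21, 9, 4], [3, 7, 21, 9, 50]))  # 0.8
```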
Compute is the percentage of score-matrix cells that Col-Bandit must evaluate to reach 90% and 95% Overlap@K. Savings is the resulting compute reduction vs. full ColBERT (100% of cells).
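The two columns are reciprocal: for instance, computing only 13% of the cells (first row of the table below) corresponds to roughly a 100 / 13 ≈ 7.7× reduction in MaxSim compute.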
| Model | Overlap@1 Compute (90% / 95%) | Overlap@1 Savings (90% / 95%) | Overlap@5 Compute (90% / 95%) | Overlap@5 Savings (90% / 95%) |
|---|---|---|---|---|
| ColBERTv2 (BEIR) | 13% / 14% | 7.7× / 7.1× | 28% / 33% | 3.6× / 3.1× |
| Jina-ColBERT-v2 (BEIR) | 11% / 14% | 9.1× / 7.1× | 26% / 34% | 3.8× / 3.0× |
| Granite Vision (REAL-MM-RAG) | 16% / 18% | 6.3× / 5.9× | 31% / 41% | 3.2× / 2.5× |
Replay an actual Col-Bandit execution step by step. The left shows ColBERT's full scoring; the right shows Col-Bandit's adaptive approach.
@misc{pony2026colbanditzeroshotquerytimepruning,
title = {Col-Bandit: Zero-Shot Query-Time Pruning
for Late-Interaction Retrieval},
author = {Roi Pony and Adi Raz and Oshri Naparstek
and Idan Friedman and Udi Barzelay},
year = {2026},
eprint = {2602.02827},
archivePrefix = {arXiv},
primaryClass = {cs.IR},
url = {https://arxiv.org/abs/2602.02827}
}