Col-Bandit

Adaptive Pruning for Late-Interaction Retrieval
via Multi-Armed Bandit Theory

Roi Pony, Adi Raz, Oshri Naparstek, Idan Friedman, Udi Barzelay
IBM Research Israel

Multi-vector late-interaction retrievers such as ColBERT achieve state-of-the-art retrieval quality, but their query-time cost is dominated by exhaustively computing token-level interactions (MaxSim) for every candidate document. We introduce Col-Bandit, a query-time pruning algorithm that casts reranking as a finite-population Top-K identification problem. Col-Bandit maintains uncertainty-aware bounds over partially observed document scores and adaptively reveals only the entries needed to determine the top results. It operates as a zero-shot, drop-in layer requiring no index modifications or retraining.

Paper · Code (coming soon)
Background and Motivation

How ColBERT Scores Documents

Walk through the late-interaction scoring step by step. Each cell in the score matrix is one MaxSim operation — and Col-Bandit's target for pruning.
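The scoring step can be sketched in a few lines of NumPy. This is a minimal illustration of MaxSim late-interaction scoring, not the paper's implementation; the shapes, document lengths, and helper names here are all illustrative assumptions.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Score one document: for each query token, take the max similarity
    over the document's tokens, then sum over query tokens (MaxSim)."""
    # query_embs: (T, d), doc_embs: (L, d); rows assumed L2-normalized
    sims = query_embs @ doc_embs.T          # (T, L) token-token similarities
    return float(sims.max(axis=1).sum())    # max per query token, summed

def unit(x: np.ndarray) -> np.ndarray:
    """L2-normalize rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q = unit(rng.normal(size=(32, 128)))        # 32 query tokens, dim 128
docs = [unit(rng.normal(size=(rng.integers(50, 200), 128)))
        for _ in range(1000)]               # 1000 candidate documents

# Exhaustive reranking fills the full N x T score matrix:
# 1000 documents x 32 query tokens = 32,000 MaxSim entries.
scores = [maxsim_score(Q, D) for D in docs]
top5 = np.argsort(scores)[::-1][:5]
```

Every cell of the N × T matrix is computed here regardless of how uncompetitive a document is; that exhaustiveness is what Col-Bandit removes.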

How It Works

Col-Bandit Algorithm

Col-Bandit reframes late-interaction reranking as a top-K identification problem, using confidence bounds to adaptively prune the score matrix at query time.

Standard ColBERT

Computes all N × T MaxSim values. For 1000 documents and 32 query tokens, that's 32,000 MaxSim computations — even for documents clearly not in the top-K.

SCORE MATRIX H [N×T]
100% of cells computed

Col-Bandit

Treats each document as a bandit arm. Samples tokens adaptively via LUCB — focusing on competitive candidates and ignoring obvious losers. Stops when top-K is separated.

SCORE MATRIX H [N×T]
~16% of cells → same top-K
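The adaptive procedure can be sketched as a finite-population LUCB loop: each document is an arm, each "pull" reveals one cell of the score matrix, and sampling concentrates on the least-certain current winner and the most-threatening current loser until the top-K is separated. This is a simplified, hedged sketch in the spirit of the algorithm, not the paper's code; the confidence radius, `beta`, and the reveal order are illustrative assumptions.

```python
import numpy as np

def lucb_topk(H: np.ndarray, k: int, beta: float = 0.5):
    """Identify the top-k rows of H by revealing cells adaptively.
    H is the (N, T) MaxSim matrix we pretend is initially unknown.
    Returns (identified top-k index set, number of cells revealed)."""
    N, T = H.shape
    revealed = [[] for _ in range(N)]                 # observed cells per doc
    order = [np.random.permutation(T) for _ in range(N)]

    def pull(i):                                      # reveal one new cell
        revealed[i].append(H[i, order[i][len(revealed[i])]])

    for i in range(N):                                # one cell per document
        pull(i)
    pulls = N

    while True:
        means = np.array([np.mean(r) for r in revealed])
        n = np.array([len(r) for r in revealed])
        # Confidence radius; exactly zero once a row is fully revealed,
        # since the population of T cells is then known (finite population).
        rad = np.where(n < T, beta * np.sqrt(np.log(1 + pulls) / n), 0.0)
        lower, upper = (means - rad) * T, (means + rad) * T

        ranked = np.argsort(means)[::-1]
        top, rest = ranked[:k], ranked[k:]
        weakest = top[np.argmin(lower[top])]          # least-certain winner
        strongest = rest[np.argmax(upper[rest])]      # strongest challenger
        if lower[weakest] >= upper[strongest]:        # top-k separated: stop
            return set(top.tolist()), pulls
        for i in (weakest, strongest):                # LUCB: sample both
            if len(revealed[i]) < T:
                pull(i)
                pulls += 1

np.random.seed(0)
H = np.random.default_rng(1).random((200, 32))        # toy score matrix
topk, cost = lucb_topk(H, k=5)
fraction = cost / H.size                              # cells actually revealed
```

With a heuristic `beta` the returned set is approximate rather than guaranteed, which mirrors the Overlap@K-vs-compute trade-off reported below; tightening the bounds buys fidelity at the cost of more revealed cells.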
Experiments

Results

Evaluated on 5 BEIR text datasets (ColBERTv2, Jina-ColBERT-v2) and 4 REAL-MM-RAG multimodal datasets (Granite Vision Embedding 3.2).

Overlap@K measures ranking fidelity: the fraction of documents in Col-Bandit's top-K that match full ColBERT's top-K. An Overlap@5 of 90% means that, on average, 4.5 of the 5 returned documents match the exact top-5.
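The metric itself is a one-liner; this tiny sketch (function name illustrative) makes the definition concrete.

```python
def overlap_at_k(pruned_topk, full_topk):
    """Fraction of the pruned ranker's top-K that appears in the
    full ranker's top-K (order-insensitive set overlap)."""
    return len(set(pruned_topk) & set(full_topk)) / len(full_topk)

# 4 of the 5 document IDs agree -> Overlap@5 = 0.8
print(overlap_at_k([3, 7, 9, 12, 40], [3, 7, 9, 12, 15]))  # -> 0.8
```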

Table 1: Col-Bandit Efficiency Across Models

Compute shows the percentage of score matrix cells that Col-Bandit must compute to achieve 90% and 95% Overlap@K. Savings shows the compute reduction vs. full ColBERT (100% compute).

| Model | Overlap@1 Compute (90% / 95%) | Overlap@1 Savings (90% / 95%) | Overlap@5 Compute (90% / 95%) | Overlap@5 Savings (90% / 95%) |
|---|---|---|---|---|
| ColBERTv2 (BEIR) | 13% / 14% | 7.7× / 7.1× | 28% / 33% | 3.6× / 3.1× |
| Jina-ColBERT-v2 (BEIR) | 11% / 14% | 9.1× / 7.1× | 26% / 34% | 3.8× / 3.0× |
| Granite Vision (REAL-MM-RAG) | 16% / 18% | 6.3× / 5.9× | 31% / 41% | 3.2× / 2.5× |
- 11-18% compute for Overlap@1 — range across models: 11-16% (90%), 14-18% (95%)
- 26-41% compute for Overlap@5 — range across models: 26-31% (90%), 33-41% (95%)
- 5.9×-9.1× savings for Overlap@1 — range across models: 6.3×-9.1× (90%), 5.9×-7.1× (95%)
- 2.5×-3.8× savings for Overlap@5 — range across models: 3.2×-3.8× (90%), 2.5×-3.1× (95%)
Interactive

Watch It Run

Replay an actual Col-Bandit execution step by step. The left shows ColBERT's full scoring; the right shows Col-Bandit's adaptive approach.

Reference

Citation

@misc{pony2026colbanditzeroshotquerytimepruning,
  title   = {Col-Bandit: Zero-Shot Query-Time Pruning
             for Late-Interaction Retrieval},
  author  = {Roi Pony and Adi Raz and Oshri Naparstek
             and Idan Friedman and Udi Barzelay},
  year    = {2026},
  eprint  = {2602.02827},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url     = {https://arxiv.org/abs/2602.02827}
}