Vector Index vs Brute Force

How much does ArangoDB's approximate search cost in accuracy and time?

Generated: 2026-04-10 19:23

Motivation

When deploying a vector search system in production, you rarely compute exact nearest neighbors. Instead, databases like ArangoDB use approximate nearest neighbor (ANN) indexes that trade a small amount of accuracy for significantly faster search at scale.

But how much accuracy do you actually lose? And is the speed gain worth it?

In a companion experiment (Dense vs Token-Level Retrieval), we established exact brute-force baselines for 14 embedding models across two multi-hop QA datasets. This report takes the best-performing dense embedding variants and asks: what happens when we replace exact search with ArangoDB's APPROX_NEAR_COSINE?

How we measure

For each question, we run both methods side-by-side:

  1. ArangoDB APPROX_NEAR_COSINE — sends the query vector to the remote ArangoDB cluster, which searches using its IVF vector index and returns the top-100 results. This includes network latency.
  2. Local brute force — computes query_vec @ all_chunks.T locally in memory. With unit-normalized embeddings, this numpy dot product is exact cosine similarity. No network, no approximation.

Both methods receive the same query vector and return the same format (top-100 ranked chunk IDs with scores). We then compute retrieval metrics on both and measure the difference (loss). Timing is measured per-query for both methods, and we separately measure the network round-trip time (RTT) to isolate the actual index computation time from network overhead.
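The local brute-force side can be sketched in a few lines of numpy (illustrative only; `query_vec` and `all_chunks` stand in for the real embedding arrays, which are assumed unit-normalized so the dot product equals cosine similarity):

```python
import numpy as np

def brute_force_top_k(query_vec: np.ndarray, all_chunks: np.ndarray, k: int = 100):
    """Exact nearest-neighbor search by full scan.

    query_vec:  (d,) unit-normalized query embedding
    all_chunks: (N, d) unit-normalized chunk embeddings
    Returns indices and scores of the top-k chunks by cosine similarity.
    """
    scores = all_chunks @ query_vec            # (N,) cosine similarities
    top = np.argpartition(-scores, k)[:k]      # unordered top-k in O(N)
    top = top[np.argsort(-scores[top])]        # sort only the k winners
    return top, scores[top]
```

Using argpartition instead of a full sort keeps the scan at O(N) plus an O(k log k) sort of the winners, which is why brute force stays fast at 102K chunks.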

Metrics

Recall@K

What fraction of relevant documents appear in the top-K? E.g., need 3 docs, found 2 in top-100 → Recall = 0.67

Success@K

Did we find all relevant documents in top-K? Binary: 1 or 0. The strictest metric — missing even one means failure.

MRR First

1/rank of the first relevant document found. Higher = first relevant result appears earlier.

MRR Last

1/rank of the last relevant document. Zero if any are missing. Measures how deep you must look to have everything.

The loss shown in this report is the relative percentage drop in each metric when using the approximate index instead of exact brute force. E.g., "Recall loss 5%" means the index finds 5% fewer of the relevant documents than exact search does.
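Concretely, the four metrics and the loss computation can be written as follows (a sketch; `retrieved` is a ranked list of chunk IDs and `relevant` the gold set, both hypothetical names):

```python
def retrieval_metrics(retrieved, relevant, k=100):
    """Recall@K, Success@K, MRR First, and MRR Last for one question."""
    top_k = retrieved[:k]
    ranks = [i + 1 for i, doc in enumerate(top_k) if doc in relevant]  # 1-based hit ranks
    found = len(ranks)
    return {
        "recall": found / len(relevant),
        "success": 1.0 if found == len(relevant) else 0.0,
        "mrr_first": 1.0 / ranks[0] if ranks else 0.0,
        # MRR Last is zero unless *every* relevant doc was found
        "mrr_last": 1.0 / ranks[-1] if found == len(relevant) else 0.0,
    }

def loss_pct(exact: float, approx: float) -> float:
    """Relative percentage drop of the approximate index vs. exact search."""
    return 100.0 * (exact - approx) / exact if exact else 0.0
```

For example, with three relevant docs where the index finds one at rank 1 and one at rank 4 but misses the third: Recall = 2/3, Success = 0, MRR First = 1.0, MRR Last = 0.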

Index Configuration

ArangoDB uses IVF (Inverted File) vector indexes powered by FAISS to perform approximate nearest neighbor search. Instead of comparing a query against every document (brute force), it partitions vectors into clusters and only searches the nearest clusters.

This is faster for large collections but introduces approximation error — relevant documents that fall into clusters not searched will be missed. The key parameters that control this speed-accuracy tradeoff are:

nLists

Number of clusters (Voronoi cells) to partition the vector space into. FAISS recommends 4×√N to 16×√N. More clusters = smaller buckets = faster search, but at a fixed nProbe, a higher risk of missing neighbors.

nProbe

How many clusters to search per query. Higher = slower but more accurate. nProbe=1 is fastest (only the closest cluster is checked); nProbe=nLists scans every cluster and is equivalent to brute force.
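The IVF mechanics can be illustrated with a minimal numpy sketch (not ArangoDB's actual implementation, which uses FAISS; centroids here come from crude sampling rather than real k-means training):

```python
import numpy as np

def ivf_search(query, vectors, centroids, assignments, n_probe, k=10):
    """Approximate top-k: scan only the n_probe clusters nearest the query.

    vectors:     (N, d) unit-normalized database vectors
    centroids:   (nLists, d) cluster centers
    assignments: (N,) index of each vector's nearest centroid
    """
    # Pick the n_probe clusters whose centroids are closest to the query
    probe = np.argsort(-(centroids @ query))[:n_probe]
    candidates = np.where(np.isin(assignments, probe))[0]
    # Exact cosine scoring, but only inside the probed clusters
    scores = vectors[candidates] @ query
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]
```

With n_probe equal to the number of clusters, every vector is scanned and the result matches brute force; vectors in unprobed clusters are invisible to the query, which is exactly the source of the approximation error.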

Network RTT

Round-trip latency to the ArangoDB server. Adds a fixed overhead to every query regardless of index configuration. Measured via persistent HTTP connection.

Questions evaluated: 6,655
Chunks in collection: 101,958
nLists: 3,192 (midpoint of the FAISS-recommended range)
nProbe: 159 (5% of nLists)

What we index

The ArangoDB collection musique_chunks stores 101,958 documents, each containing 9 dense embedding fields of different sizes (from 256 to 3072 dimensions, produced by OpenAI text-embedding-3-small and text-embedding-3-large). We create a separate vector index on each embedding field, all with identical IVF parameters (nLists=3,192, nProbe=159). This lets us see how the approximation loss varies with embedding dimensionality while keeping everything else constant.

FAISS recommended nLists range for N=101,958: [1,277 – 5,109]. We use the midpoint (3,192). Network RTT measured at ~32ms on persistent connection. The emb_large_3072 index is excluded — creating it with nLists=6,797 previously caused RocksDB corruption on a primary pod; not reattempted with current configuration.
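The parameter choice follows directly from the FAISS rule of thumb; the arithmetic can be sketched as below (truncating to whole numbers is our assumption — the report's upper bound of 5,109 rounds up instead):

```python
import math

N = 101_958                              # chunks in the collection
root = math.sqrt(N)                      # ≈ 319.3
lo, hi = int(4 * root), int(16 * root)   # FAISS-recommended nLists range
n_lists = (lo + hi) // 2                 # midpoint of the range
n_probe = n_lists // 20                  # 5% of nLists
```

This yields nLists = 3,192 and nProbe = 159, the values used for every index in this experiment.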

Timing Comparison

Average query time breakdown. "Index only" = APPROX total minus network RTT (~32ms). "Index vs BF" shows how many times slower the index computation is compared to local brute force (<1 means index is faster, >1 means brute force is faster).

Timing Details Table
Key Finding: Network RTT (~32ms) dominates the total APPROX query time (~45-55ms). The actual index search takes only ~10-22ms, comparable to local brute force (~5-20ms). At higher dimensions (1536d) the index and brute force nearly tie. The vector index provides no speed advantage at 102K documents — its value would appear at millions of documents where brute force becomes too slow.
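The "Index only" and "Index vs BF" columns are simple derived quantities. A sketch with representative numbers from the ranges above (illustrative, not measured values):

```python
RTT_MS = 32.0  # measured network round-trip to the ArangoDB server

def index_only_ms(approx_total_ms: float) -> float:
    """Strip the fixed network overhead from the APPROX query time."""
    return approx_total_ms - RTT_MS

def index_vs_bf(approx_total_ms: float, bf_ms: float) -> float:
    """How many times slower the index computation is than local brute
    force: <1 means the index is faster, >1 means brute force is faster."""
    return index_only_ms(approx_total_ms) / bf_ms
```

For instance, a 48 ms APPROX query leaves 16 ms of actual index work after subtracting RTT; against a 16 ms brute-force scan the ratio is 1.0, i.e., a tie.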

Metric Loss

How much accuracy does the approximate index lose compared to exact brute force? The loss percentage is reported for every embedding variant at each K.

Full Metrics Table
Key Finding: With nLists=3,192 and nProbe=159, the approximate index loses 5-7% in Recall@100 and 7-10% in Success@100 compared to exact search. MRR First is least affected (2-5%). The "large" model variants lose less than "small" (~5% vs ~7%), possibly because higher-quality embeddings cluster more coherently. Previous configuration (nLists=6,797, nProbe=10) lost 30-55% — proper tuning matters enormously.