Multi-Index RAG pipeline — Extensible architecture — Building with the Claude API

Real RAG systems có 3-5 indexes: - Semantic (embeddings) - BM25 (keyword) - Title/metadata specific - Domain-specific (finance, legal terms)

Bạn sẽ học được

Design Retriever wrap multiple SearchIndex
Add index types (keyword, domain-specific) without refactor
Understand RRF math for ranking fusion
Ship extensible RAG architecture

SearchIndex protocol

Use Python Protocol để define interface:

Any class with these 2 methods = valid SearchIndex.

VectorIndex, BM25Index both conform.

from typing import Protocol, List, Tuple, Dict, Any


class SearchIndex(Protocol):
    def add_document(self, content: str, metadata: Dict[str, Any] = None) -> None:
        ...
    
    def search(self, query: str, k: int = 5) -> List[Tuple[Dict, float]]:
        ...

Retriever — Universal coordinator

class Retriever:
    """Wraps multiple SearchIndex, fuses results via RRF."""
    
    def __init__(self, *indexes: SearchIndex):
        if len(indexes) == 0:
            raise ValueError("At least one index required")
        self._indexes = list(indexes)
    
    def add_document(self, content: str, metadata: Dict = None):
        """Add to all indexes."""
        for index in self._indexes:
            index.add_document(content, metadata)
    
    def add_documents(self, contents: List[str], metadatas: List[Dict] = None):
        """Batch add to all."""
        if metadatas is None:
            metadatas = [{}] * len(contents)
        for content, meta in zip(contents, metadatas):
            for index in self._indexes:
                index.add_document(content, meta)
    
    def search(
        self,
        query: str,
        k: int = 5,
        k_rrf: int = 60
    ) -> List[Tuple[Dict, float]]:
        """RRF fusion across all indexes."""
        # Collect from each
        all_results = [
            index.search(query, k=k*2)  # over-fetch, fuse cuts
            for index in self._indexes
        ]
        
        # Score by RRF
        scores: Dict[str, float] = {}
        docs_by_id: Dict[str, Dict] = {}
        
        for index_results in all_results:
            for rank, (doc, _) in enumerate(index_results):
                # Use content hash as ID (or metadata if consistent)
                doc_id = hash(doc["content"])
                docs_by_id[doc_id] = doc
                scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k_rrf + rank + 1)
        
        # Sort by combined score
        sorted_ids = sorted(scores.keys(), key=lambda x: -scores[x])[:k]
        return [(docs_by_id[doc_id], scores[doc_id]) for doc_id in sorted_ids]

Usage

# Create indexes
vector_idx = VectorIndex()
bm25_idx = BM25Index()

# Wrap
retriever = Retriever(vector_idx, bm25_idx)

# Add documents once, propagates to both
retriever.add_documents(chunks_content, metadatas)

# Search fused
results = retriever.search("INC-2023-Q4-011 resolution", k=5)

Add new index — no refactor

Want add TitleIndex (search doc titles exact)?

Retriever code unchanged. Power of protocol-based design.

class TitleIndex:
    """Search by exact title match."""
    
    def __init__(self):
        self.docs = []
    
    def add_document(self, content, metadata=None):
        self.docs.append({"content": content, "metadata": metadata or {}})
    
    def search(self, query, k=5):
        query_lower = query.lower()
        results = []
        for doc in self.docs:
            title = doc["metadata"].get("title", "").lower()
            if query_lower in title:
                results.append((doc, 1.0))
        return results[:k]


# Plug in
retriever = Retriever(vector_idx, bm25_idx, title_idx)

RRF deep dive

Formula

Where:

Why this formula?

Example

3 indexes, document D appears:

RRF(D) = 1/61 + 1/63 + 0 = 0.0324

Tuning k_rrf

Default 60 works for most.

rank_i(d) = rank of document d in index i (1-indexed)
k_rrf = constant, typical 60
Divide by rank → diminishing returns. Being #1 much better than #10.
Constant k_rrf → smooth, avoid divide-by-zero.
Sum across indexes → documents appear in multiple → higher combined.
Index 1: rank 1 → 1/61
Index 2: rank 3 → 1/63
Index 3: not in top K → 0
k_rrf=1: top-1 heavily weighted
k_rrf=60: balanced
k_rrf=100+: very smooth, more equal

RRF_score(d) = Σ_i  1 / (k_rrf + rank_i(d))

Hybrid search improvements

1. Index-specific weights

Trust some indexes more:

Example: vector=1.0, bm25=0.5 → semantic more trusted.

2. Query-dependent routing

Some queries best for specific index:

def search(self, query, k=5, weights=None):
    if weights is None:
        weights = [1.0] * len(self._indexes)
    
    scores = {}
    for index, weight, results in zip(self._indexes, weights, all_results):
        for rank, (doc, _) in enumerate(results):
            doc_id = hash(doc["content"])
            scores[doc_id] = scores.get(doc_id, 0) + weight / (60 + rank + 1)
    # ...

2. Query-dependent routing

3. Re-ranking with Claude

After retrieve top K (larger), use Claude rerank:

def smart_search(query, retriever):
    # If query has IDs / specific codes, favor BM25
    if re.search(r"[A-Z]{2,}-\d+", query):
        return retriever.search(query, weights=[0.3, 1.0, 0.2])
    # Else balanced
    return retriever.search(query, weights=[1.0, 1.0, 0.5])

3. Re-ranking with Claude

Costs extra Claude call per query but boosts quality.

def claude_rerank(query, candidates, k=5):
    candidates_text = "\n\n".join(
        f"[{i}] {c['content']}" for i, c in enumerate(candidates)
    )
    
    prompt = f"""Rank these candidates by relevance to query.

Query: {query}

<candidates>
{candidates_text}
</candidates>

Return JSON: {{"ranking": [top_indices]}}"""
    
    response = anthropic.messages.create(...)
    ranking = json.loads(response.content[0].text)
    return [candidates[i] for i in ranking["ranking"][:k]]

Production architecture

Each component swappable + testable.

User query
    │
    ▼
Query rewrite (optional, Claude)
    │
    ▼
┌─── Retriever ──────────────────┐
│                                │
│  Vector Index  ← 20 chunks     │
│  BM25 Index    ← 20 chunks     │
│  Title Index   ← 20 chunks     │
│                                │
│  RRF fusion    → top 10        │
└────────────────────────────────┘
    │
    ▼
Claude rerank (optional)
    │
    ▼
Top 3-5 chunks
    │
    ▼
Prompt + Claude
    │
    ▼
Answer + citations

Eval each component

Typical:

Significant improvement from fusion.

Vector: 75%
BM25: 68%
Hybrid: 87%

def eval_retrieval_pipeline(queries_with_expected, retriever):
    """Measure recall@K."""
    correct_at_k = {k: 0 for k in [1, 3, 5, 10]}
    
    for test in queries_with_expected:
        results = retriever.search(test["query"], k=10)
        retrieved_sources = [doc["metadata"]["source"] for doc, _ in results]
        
        for k in correct_at_k:
            if test["expected_source"] in retrieved_sources[:k]:
                correct_at_k[k] += 1
    
    n = len(queries_with_expected)
    return {k: correct_at_k[k] / n for k in correct_at_k}


# Compare
accuracy_vector = eval_retrieval_pipeline(queries, Retriever(vector_idx))
accuracy_bm25 = eval_retrieval_pipeline(queries, Retriever(bm25_idx))
accuracy_hybrid = eval_retrieval_pipeline(queries, Retriever(vector_idx, bm25_idx))

print(f"Vector only: recall@5 = {accuracy_vector[5]:.0%}")
print(f"BM25 only: recall@5 = {accuracy_bm25[5]:.0%}")
print(f"Hybrid: recall@5 = {accuracy_hybrid[5]:.0%}")

Anti-patterns

❌ Same index inside Retriever twice

Doesn't add value, inflates scores.

Fix: Unique indexes.

❌ Not all documents in all indexes

Doc in vector but not BM25 → can't match exact terms.

Fix: Retriever.add propagates to ALL indexes.

❌ Scoring scales mismatch

Mix raw scores từ different indexes → biased ranking.

Fix: Use RRF (ranks), not raw scores.

❌ Over-engineering

Start với 1 index. Add BM25 if eval shows gap. Don't jump to 5 indexes Day 1.

Fix: Eval-driven.

Áp dụng ngay

Bài tập 1: Implement Retriever (30 phút)

Wrap vector + BM25 in Retriever. Test RRF fusion gives reasonable ranking.

Bài tập 2: Add TitleIndex (20 phút)

Custom index for exact title match. Plug into Retriever. Test.

Tóm tắt

🎯 SearchIndex protocol unified interface.

🎯 Retriever wraps multiple indexes, RRF fuses.

🎯 RRF scoring uses ranks (not raw scores) → scale-invariant.

🎯 Extensible: add new index type without refactor.

🎯 Hybrid typically 10-15% higher recall than best single index.

Nội dung này có hữu ích không?