Intermediate · Tutorial · Claude API · Source: Anthropic

Router Query Engine: Automatically Choosing the Right Index

Minh Tuấn, CTO, Transform Group

26/03/2026 · 6 min read



When your RAG system has several different indexes (one for technical documentation, one for FAQs, one for pricing data), how do you know which question should query which index? LlamaIndex's Router Query Engine solves this problem: it uses Claude to analyze the question and automatically route it to the most suitable query engine.

How does the router work?

The router receives a question, analyzes its content, then selects one or more query engines to run the query against. The first two options below describe how many engines are selected; the last two describe how the selection is made:

Single Selection Router: picks the single best engine (fast, economical)
Multi Selection Router: picks several engines and synthesizes their results (more comprehensive)
LLM-based Router: uses an LLM to decide (flexible, context-aware)
Embedding-based Router: uses cosine similarity (faster, cheaper)
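
To make the difference between single and multi selection concrete, here is a minimal sketch in plain Python. It does not use LlamaIndex; `ToolChoice`, `select_single`, `select_multi`, the scores, and the 0.5 threshold are all hypothetical names and values chosen for illustration, standing in for whatever relevance signal an LLM or embedding selector would produce.

```python
from dataclasses import dataclass


@dataclass
class ToolChoice:
    """A candidate tool with a relevance score (hypothetical structure)."""
    name: str
    score: float


def select_single(choices):
    """Single selection: return only the best-scoring tool name."""
    return max(choices, key=lambda c: c.score).name


def select_multi(choices, threshold=0.5):
    """Multi selection: return every tool at or above the threshold, best first."""
    kept = [c for c in choices if c.score >= threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return [c.name for c in kept]


# Illustrative scores for a question about chatbot cost
choices = [
    ToolChoice("technical_documentation", 0.2),
    ToolChoice("pricing_and_plans", 0.9),
    ToolChoice("use_cases_examples", 0.7),
]

print(select_single(choices))
print(select_multi(choices))
```

Single selection costs one downstream query; multi selection costs one query per selected engine plus a synthesis step, which is the trade-off behind "fast" versus "more comprehensive".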

Installation

pip install llama-index llama-index-llms-anthropic llama-index-embeddings-voyageai

import os
from llama_index.core import Settings, VectorStoreIndex, SummaryIndex, Document
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.voyageai import VoyageEmbedding

Settings.llm = Anthropic(model="claude-opus-4-5", max_tokens=2048)
Settings.embed_model = VoyageEmbedding(
    model_name="voyage-3",
    voyage_api_key=os.environ.get("VOYAGE_API_KEY")
)

print("Router Query Engine ready")

Creating multiple indexes

Example: a customer support system with three different kinds of documents:

# 1. Technical Documentation
tech_docs = [
    Document(text="""API Authentication: All requests to the Claude API
    must include an x-api-key header carrying the API key. Format: 'x-api-key: YOUR_KEY'.
    Never put the API key in client-side code. Use environment variables."""),
    Document(text="""Rate Limiting: The Claude API applies rate limits by tier.
    Tier 1: 50 RPM, 50K TPM. Tier 2: 1000 RPM, 100K TPM.
    When rate limited, you receive HTTP 429. Implement exponential backoff."""),
    Document(text="""Error Handling: HTTP 400 = Bad Request (check request format).
    HTTP 401 = Unauthorized (check API key). HTTP 429 = Rate Limited.
    HTTP 500 = Server Error (retry after delay). Always handle errors gracefully."""),
    Document(text="""Streaming: Use client.messages.stream() to receive tokens
    incrementally. Supports SSE (Server-Sent Events). Noticeably reduces perceived latency.""")
]

# 2. Pricing and Plans
pricing_docs = [
    Document(text="""Claude Haiku: Input $0.25/MTok, Output $1.25/MTok.
    The fastest and cheapest model. Suited to high-volume production tasks,
    classification, extraction, and simple Q&A."""),
    Document(text="""Claude Sonnet: Input $3/MTok, Output $15/MTok.
    A good balance between intelligence and speed. Use it for coding, analysis, and
    complex tasks that need better reasoning than Haiku."""),
    Document(text="""Claude Opus: Input $15/MTok, Output $75/MTok.
    The most capable model, best for research, complex reasoning, and tasks
    that need the highest intelligence. Not suited to high-volume production."""),
    Document(text="""Free tier: $5 credit on sign-up. Tier progression is based on
    spend and time. Tier 4 requires at least $40 spend and 7 days of usage.""")
]

# 3. Use Cases and Examples
usecase_docs = [
    Document(text="""Chatbot: Use claude-haiku-4-5 for speed. Keep the conversation
    history in the messages array. The system prompt defines persona and rules.
    Rate limit: cap the turns per session to prevent abuse."""),
    Document(text="""Code Generation: claude-opus-4-5 is best for complex code.
    Provide full context: language, framework, existing code.
    Use tool use to execute and test code automatically."""),
    Document(text="""Document Analysis: Send the PDF as base64 in vision messages.
    The 200K context window is enough for long documents. Extract structured data
    with JSON mode. Combine with RAG for large knowledge bases.""")
]

# Build one index per document set, plus a summary index over everything
tech_index = VectorStoreIndex.from_documents(tech_docs)
pricing_index = VectorStoreIndex.from_documents(pricing_docs)
usecase_index = VectorStoreIndex.from_documents(usecase_docs)
summary_index = SummaryIndex.from_documents(tech_docs + pricing_docs + usecase_docs)

print("All indexes created")

LLM Router — Intelligent Selection

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.core.tools import QueryEngineTool

# Create query engine tools; the selector reads these descriptions to choose a route
tech_tool = QueryEngineTool.from_defaults(
    query_engine=tech_index.as_query_engine(similarity_top_k=3),
    name="technical_documentation",
    description="Technical documentation on API authentication, rate limiting, error handling, "
                "and streaming. Use for questions about implementation, errors, and API limits."
)

pricing_tool = QueryEngineTool.from_defaults(
    query_engine=pricing_index.as_query_engine(similarity_top_k=3),
    name="pricing_and_plans",
    description="Pricing, tiers, and subscription plans for the Claude API. "
                "Use for questions about cost, price comparison between models, or upgrading plans."
)

usecase_tool = QueryEngineTool.from_defaults(
    query_engine=usecase_index.as_query_engine(similarity_top_k=3),
    name="use_cases_examples",
    description="Example use cases and implementation guides for chatbots, code generation, "
                "and document analysis. Use for 'how do I build...' questions."
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    name="comprehensive_summary",
    description="Summarizes information across all documents. Use for general questions, "
                "broad comparisons, or when an overview of the Claude API is needed."
)

# Create a router with an LLM selector (single selection)
single_router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(llm=Settings.llm),
    query_engine_tools=[tech_tool, pricing_tool, usecase_tool, summary_tool],
    verbose=True
)

# Run a query
print("\n=== Test Single Router ===")
q1 = "What is the rate limit of the Claude API?"
print(f"\nQ: {q1}")
r1 = single_router.query(q1)
print(f"A: {r1.response}")

Multi-Selector Router

When a question spans multiple topics, the multi-selector queries several indexes and combines their answers:

# Router with multi-selection
multi_router = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(llm=Settings.llm),
    query_engine_tools=[tech_tool, pricing_tool, usecase_tool],
    verbose=True
)

print("\n=== Test Multi Router ===")
q2 = "I want to build a chatbot with the Claude API. How much does it cost and how do I do it?"
print(f"\nQ: {q2}")
r2 = multi_router.query(q2)
print(f"A: {r2.response}")

Embedding-based Router (faster, cheaper)

from llama_index.core.selectors import EmbeddingSingleSelector

# The embedding selector makes no LLM call; it routes by cosine similarity
embedding_router = RouterQueryEngine(
    selector=EmbeddingSingleSelector.from_defaults(),
    query_engine_tools=[tech_tool, pricing_tool, usecase_tool],
    verbose=True
)

print("\n=== Test Embedding Router ===")
q3 = "How much does Claude Opus cost?"
print(f"\nQ: {q3}")
r3 = embedding_router.query(q3)
print(f"A: {r3.response}")
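
The idea behind embedding-based selection can be sketched without any library: embed the query, embed each tool description, and pick the tool whose vector has the highest cosine similarity to the query. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds or thousands of dimensions), and `route_by_embedding` is a hypothetical helper, not a LlamaIndex function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def route_by_embedding(query_vec, tool_vecs):
    """Return the tool name whose description vector is closest to the query."""
    return max(tool_vecs, key=lambda name: cosine_similarity(query_vec, tool_vecs[name]))


# Toy description embeddings (assumed values, one axis per topic)
TOOL_VECS = {
    "technical_documentation": [1.0, 0.0, 0.0],
    "pricing_and_plans": [0.0, 1.0, 0.0],
    "use_cases_examples": [0.0, 0.0, 1.0],
}

# Toy embedding of a pricing question
query_vec = [0.1, 0.9, 0.2]
print(route_by_embedding(query_vec, TOOL_VECS))
```

Because the tool description vectors can be computed once and cached, each routing decision needs only one query embedding and a handful of dot products, which is why this approach is cheaper than asking an LLM.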

Custom Router Logic

Customize the routing logic for special use cases:

def keyword_router(query, tools):
    """Simple keyword-based router; returns the name of the best-matching tool."""
    query_lower = query.lower()
    available = {tool.metadata.name for tool in tools}

    selected = []
    if any(kw in query_lower for kw in ["price", "pricing", "cost", "how much", "tier"]):
        selected.append(("pricing_and_plans", 1.0))

    if any(kw in query_lower for kw in ["error", "auth", "rate limit", "implement"]):
        selected.append(("technical_documentation", 0.9))

    if any(kw in query_lower for kw in ["build", "example", "chatbot", "code"]):
        selected.append(("use_cases_examples", 0.8))

    # Fall back to the summary tool when no keyword matches
    if not selected:
        selected.append(("comprehensive_summary", 0.7))

    # Keep only tools that were actually passed in, then sort by confidence
    selected = [s for s in selected if s[0] in available]
    selected.sort(key=lambda x: x[1], reverse=True)
    return selected[0][0] if selected else None  # Name of the best tool

# Test keyword router
test_queries = [
    "What is the rate limit?",
    "How much does Claude Sonnet cost?",
    "How do I build a chatbot with Claude?",
    "Give me an overview of the Claude API"
]

for q in test_queries:
    # summary_tool is included so the fallback route is a valid choice
    selected = keyword_router(q, [tech_tool, pricing_tool, usecase_tool, summary_tool])
    print(f"Query: '{q}' -> Route to: {selected}")
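
A custom router like the one above only returns a tool name, so something still has to turn that name into an actual query. The sketch below shows one possible dispatch step using stand-in engines (plain functions) so that it runs without LlamaIndex; `ENGINES`, `toy_route`, and `dispatch` are hypothetical names introduced for this example.

```python
# Stand-in engines: in the article these would be the query engines
# behind tech_tool, pricing_tool, usecase_tool and summary_tool.
ENGINES = {
    "pricing_and_plans": lambda q: f"[pricing] {q}",
    "technical_documentation": lambda q: f"[tech] {q}",
    "comprehensive_summary": lambda q: f"[summary] {q}",
}


def toy_route(query):
    """Hypothetical one-rule router used only to drive the example."""
    q = query.lower()
    if "cost" in q:
        return "pricing_and_plans"
    if "error" in q:
        return "technical_documentation"
    return "unknown_tool"  # deliberately unmapped to show the fallback


def dispatch(query, route_fn, engines, fallback="comprehensive_summary"):
    """Resolve the routed name to an engine and run the query on it."""
    name = route_fn(query)
    if name not in engines:
        name = fallback  # never fail just because the router named a missing tool
    return name, engines[name](query)


print(dispatch("How much does Opus cost?", toy_route, ENGINES))
print(dispatch("Tell me about Claude", toy_route, ENGINES))
```

With the article's objects, the mapping could plausibly be built from each tool's `metadata.name` to its `query_engine`, and the engine would be invoked with `.query(question)` instead of being called directly.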

Monitoring and Analytics

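import time
from collections import Counter

class MonitoredRouter:
    """Router wrapper that tracks latency and route usage."""

    def __init__(self, router):
        self.router = router
        self.route_counts = Counter()
        self.query_times = []

    def query(self, question):
        start = time.perf_counter()
        response = self.router.query(question)
        self.query_times.append(time.perf_counter() - start)

        # Assumption: RouterQueryEngine stores the selector's decision under
        # response.metadata["selector_result"]; nothing is counted if it is absent.
        result = (getattr(response, "metadata", None) or {}).get("selector_result")
        for selection in getattr(result, "selections", []):
            self.route_counts[selection.index] += 1

        return response

    def stats(self):
        if not self.query_times:
            return {}
        avg_time = sum(self.query_times) / len(self.query_times)
        return {
            "total_queries": len(self.query_times),
            "avg_latency_ms": avg_time * 1000,
            "route_counts": dict(self.route_counts),
        }

monitored = MonitoredRouter(single_router)
monitored.query("How much does Claude Haiku cost?")
print(f"\nStats: {monitored.stats()}")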

Conclusion

Router Query Engine solves the "query routing" problem: making sure each question is handled by the most suitable index. An LLM-based router understands context better, while an embedding-based router is faster and cheaper. In production, combine the two: use the embedding router for clear-cut cases and the LLM router for complex ones.
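
One way to read "clear-cut" is as a margin test: if the best embedding score beats the runner-up by a comfortable gap, trust the cheap route; otherwise pay for an LLM decision. The sketch below is an assumption about how such a hybrid could be wired, not something LlamaIndex ships; `hybrid_route`, the `min_margin` value of 0.15, and the stub `fake_llm_select` are all hypothetical.

```python
def hybrid_route(scores, llm_select, min_margin=0.15):
    """Route by embedding score when the winner is clear, else defer to an LLM.

    scores: dict mapping tool name to similarity score.
    llm_select: callable taking the ranked tool names and returning one name.
    Returns (tool_name, decided_by).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")

    if best_score - runner_up >= min_margin:
        return best_name, "embedding"  # clear winner, no LLM call needed
    return llm_select([name for name, _ in ranked]), "llm"


def fake_llm_select(candidates):
    """Stub standing in for an LLM-based selector call."""
    return "technical_documentation"


clear = {"technical_documentation": 0.2, "pricing_and_plans": 0.8, "use_cases_examples": 0.3}
ambiguous = {"technical_documentation": 0.55, "pricing_and_plans": 0.6, "use_cases_examples": 0.1}

print(hybrid_route(clear, fake_llm_select))
print(hybrid_route(ambiguous, fake_llm_select))
```

The margin threshold is a tuning knob: a larger value sends more traffic to the LLM router, trading cost and latency for routing accuracy.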

Next steps: explore SubQuestion Engine to break complex questions into parallel sub-queries, or read about Multi-Document Agent for more complex orchestration.
