Intermediate · Tutorial · Claude API · Source: Anthropic

Multi-Document Agent: Querying Multiple Documents with LlamaIndex

Minh Tuấn, CTO, Transform Group

26/03/2026 · 6 min read



When you need to answer questions that span several documents, such as "Compare the Q1 and Q2 policies" or "Find every mention of customer X across the reports", a single query engine is not enough. LlamaIndex's Multi-Document Agent pattern addresses this by creating an agent that coordinates multiple indexes and chooses which ones to query for each question.

Combined with Claude, a Multi-Document Agent can synthesize, compare, and analyze information from multiple sources, producing a comprehensive answer without the user needing to know where the data lives.

Multi-Document Agent Architecture

The system has three layers:

Document Agents: each document gets its own agent, with its own indexes and query engines
Top-level Agent: the orchestrating agent that decides which documents to query
Tool Registry: the list of all document agents, exposed as tools
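The three layers can be sketched in plain Python before bringing in LlamaIndex. This is only an illustration of how the pieces fit together: the document agents are stub functions, and `keyword_route` is a hypothetical stand-in for the decision an LLM-backed top-level agent would make. None of these names come from LlamaIndex.

```python
from typing import Callable, Dict, List

# A "document agent" is modeled as a plain function: question -> answer.
DocumentAgent = Callable[[str], str]


def make_document_agent(title: str, text: str) -> DocumentAgent:
    """Layer 1: one agent per document (stub: always returns the full text)."""
    def answer(question: str) -> str:
        return f"{title}: {text}"
    return answer


def build_tool_registry(docs: Dict[str, Dict[str, str]]) -> Dict[str, DocumentAgent]:
    """Layer 3: every document agent registered under a tool name."""
    return {
        f"query_{doc_id}": make_document_agent(d["title"], d["text"])
        for doc_id, d in docs.items()
    }


def keyword_route(question: str, tool_names: List[str]) -> List[str]:
    """Hypothetical router: select tools whose topic word appears in the question."""
    q = question.lower()
    return [name for name in tool_names if name[len("query_"):] in q]


def top_level_agent(registry: Dict[str, DocumentAgent], question: str) -> List[str]:
    """Layer 2: decide which tools to call, then collect their answers."""
    selected = keyword_route(question, list(registry))
    return [registry[name](question) for name in selected]


docs = {
    "sales": {"title": "Sales report", "text": "Revenue: 5.2 billion"},
    "marketing": {"title": "Marketing report", "text": "CAC: 12.5 million"},
    "hr": {"title": "HR report", "text": "Headcount: 87"},
}
registry = build_tool_registry(docs)
answers = top_level_agent(registry, "Compare sales and marketing")
print(answers)
```

In the real system the routing decision and the final synthesis are both made by the LLM; the registry structure is the part that carries over unchanged.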

Setup

pip install llama-index llama-index-llms-anthropic llama-index-embeddings-voyageai

import os
from llama_index.core import Settings, VectorStoreIndex, SummaryIndex, Document
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.voyageai import VoyageEmbedding
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent import ReActAgent

Settings.llm = Anthropic(model="claude-opus-4-5", max_tokens=2048)
Settings.embed_model = VoyageEmbedding(
    model_name="voyage-3",
    voyage_api_key=os.environ.get("VOYAGE_API_KEY")
)
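The setup above reads `VOYAGE_API_KEY` from the environment, and the Anthropic client normally picks up its key from `ANTHROPIC_API_KEY`. A small pre-flight check, sketched below, can surface a missing key before any index is built. The helper name `missing_api_keys` is hypothetical, and the list of required variables is an assumption based on this setup.

```python
import os

# Environment variables this tutorial's setup is assumed to rely on.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "VOYAGE_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```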

Step 1: Prepare Multiple Documents

# Simulate several reports from different departments
documents_data = {
    "bao_cao_kinh_doanh_q1": {
        "title": "Q1 2024 Business Report",
        "text": """Q1 2024 revenue: VND 5.2 billion, up 23% from Q1 2023.
        Best-selling product: Enterprise Plan (45% of revenue).
        New customers: 234 businesses. Churn rate: 2.1%.
        Key markets: Ho Chi Minh City (40%), Hanoi (35%), Da Nang (15%).
        Sales team reached 118% of KPI. Q2 pipeline estimated at VND 8.5 billion."""
    },
    "bao_cao_ky_thuat_q1": {
        "title": "Q1 2024 Engineering Report",
        "text": """Q1 system uptime: 99.95%. Total incidents: 3 (2 minor, 1 major).
        Major incident on 15/2: 45 minutes of downtime caused by a database migration error.
        Average API response time: 145ms (down from 210ms in Q4 2023).
        Deployed 47 new features. Reduced tech debt by 30%.
        Team grew from 12 to 18 engineers. Hiring 5 more."""
    },
    "bao_cao_nhan_su_q1": {
        "title": "Q1 2024 HR Report",
        "text": """Total headcount in Q1: 87 (up 15 from Q4 2023).
        New hires: 18 (Sales: 5, Engineering: 8, Marketing: 3, Support: 2).
        Departures: 3. Turnover rate: 3.4% (better than the 5% industry benchmark).
        Personnel cost: VND 1.8 billion/month. Satisfaction score: 8.2/10.
        Average training hours: 12 hours/person/month."""
    },
    "bao_cao_marketing_q1": {
        "title": "Q1 2024 Marketing Report",
        "text": """Leads generated in Q1: 1,240 (up 56% from Q1 2023).
        CAC (Customer Acquisition Cost): VND 12.5 million.
        Conversion rate: 18.9% (lead to customer).
        Top channels: Google Ads (35%), Content Marketing (28%), Referral (22%).
        Blog traffic: 450K visits/month. Newsletter subscribers: 28,000.
        Events held: 3 webinars (890 attendees in total)."""
    }
}

# Convert to LlamaIndex Documents, one per report
documents = {
    key: Document(
        text=data["text"],
        metadata={"title": data["title"], "doc_id": key, "quarter": "Q1_2024"}
    )
    for key, data in documents_data.items()
}
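The reports above are hard-coded for the demo. If the reports live on disk instead, a dictionary with the same shape could be built from a folder of text files. The sketch below assumes a made-up convention (first line is the title, the rest is the body) and a hypothetical helper name `load_documents_data`; adapt both to your own files.

```python
import tempfile
from pathlib import Path


def load_documents_data(folder):
    """Build a documents_data-style dict from every .txt file in `folder`.

    Assumed convention: the first line of each file is the title and the
    remaining lines are the report text. The file stem becomes the doc_id.
    """
    data = {}
    for path in sorted(Path(folder).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines:
            continue  # skip empty files
        data[path.stem] = {
            "title": lines[0].strip(),
            "text": "\n".join(lines[1:]).strip(),
        }
    return data


# Demo with a temporary folder so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "bao_cao_demo.txt").write_text(
        "Demo report\nLine one.\nLine two.", encoding="utf-8"
    )
    loaded = load_documents_data(tmp)

print(loaded)
```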

Step 2: Create the Document Agents

Each document gets two kinds of index: a VectorStoreIndex (for specific questions) and a SummaryIndex (for summaries):

def build_document_agent(doc_id, document):
    """Build an agent for one specific document."""

    # Vector index for semantic search
    vector_index = VectorStoreIndex.from_documents([document])
    vector_engine = vector_index.as_query_engine(similarity_top_k=3)

    # Summary index for summarizing the whole document
    summary_index = SummaryIndex.from_documents([document])
    summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")

    doc_title = document.metadata["title"]

    # Tools available to this document agent
    doc_tools = [
        QueryEngineTool.from_defaults(
            query_engine=vector_engine,
            name=f"search_{doc_id}",
            description=f"Search for specific information in: {doc_title}. "
                       f"Use for questions about figures, events, and details."
        ),
        QueryEngineTool.from_defaults(
            query_engine=summary_engine,
            name=f"summarize_{doc_id}",
            description=f"Summarize the full content of: {doc_title}. "
                       f"Use when an overview is needed."
        )
    ]

    # Document-level agent
    doc_agent = ReActAgent.from_tools(
        tools=doc_tools,
        llm=Settings.llm,
        verbose=False
    )

    return doc_agent

# Build all document agents
print("Building document agents...")
doc_agents = {}
for doc_id, document in documents.items():
    doc_agents[doc_id] = build_document_agent(doc_id, document)
    print(f"  Built agent for: {documents_data[doc_id]['title']}")

print(f"\nTotal document agents: {len(doc_agents)}")
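Tool names here are built directly from `doc_id`, which works because the demo IDs contain only ASCII letters, digits, and underscores. With real file names that is not guaranteed, and tool-calling APIs usually restrict which characters a tool name may contain. The sketch below normalizes names under an assumed rule (letters, digits, `_`, `-`, at most 64 characters); `safe_tool_name` is a hypothetical helper, so check the limits of the API you actually use.

```python
import re

MAX_TOOL_NAME_LEN = 64  # assumed limit; verify against your LLM provider


def safe_tool_name(prefix, doc_id):
    """Build a tool name using only letters, digits, underscores, and hyphens."""
    raw = f"{prefix}_{doc_id}"
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", raw)  # replace disallowed characters
    cleaned = re.sub(r"_+", "_", cleaned)          # collapse runs of underscores
    return cleaned[:MAX_TOOL_NAME_LEN]


print(safe_tool_name("search", "bao_cao_kinh_doanh_q1"))
print(safe_tool_name("summarize", "Q1 report (final).pdf"))
```

Note that truncation can make two long names collide; if that matters, append a short hash of the original `doc_id`.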

Step 3: Create the Top-level Agent

The top-level agent coordinates all document agents and decides which documents to query:

from llama_index.core.tools import FunctionTool

def create_top_level_tools(doc_agents, documents_data):
    """Create the top-level agent's tools from the document agents."""
    top_tools = []

    for doc_id, doc_agent in doc_agents.items():
        doc_title = documents_data[doc_id]["title"]

        # Factory function so each tool captures its own agent
        def make_query_fn(agent):
            def query_document(question: str) -> str:
                """Query the document agent and return its answer."""
                return agent.chat(question).response
            return query_document

        # Create a tool that calls the document agent
        tool = FunctionTool.from_defaults(
            fn=make_query_fn(doc_agent),
            name=f"query_{doc_id}",
            description=f"Query information from: {doc_title}. "
                       f"Contains Q1 2024 data."
        )
        top_tools.append(tool)

    return top_tools

top_tools = create_top_level_tools(doc_agents, documents_data)

# Top-level orchestrator agent
top_agent = ReActAgent.from_tools(
    tools=top_tools,
    llm=Settings.llm,
    verbose=True,
    system_prompt="""You are a smart AI analyst. You can query several different reports.
    When answering a complex question:
    1. Identify which reports need to be consulted
    2. Query each relevant report
    3. Combine the information into a comprehensive answer
    Always answer in Vietnamese."""
)

print("Top-level agent ready!")
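The `make_query_fn` factory in `create_top_level_tools` matters more than it looks. Python closures look up loop variables when they are called, not when they are defined, so functions created directly inside a loop can all end up pointing at the last item. The standalone sketch below shows the pitfall and the factory fix using plain strings in place of agents; the function names are illustrative only.

```python
def make_handlers_buggy(names):
    """Every lambda shares the same `name` variable, resolved at call time."""
    handlers = []
    for name in names:
        handlers.append(lambda question: f"{name} <- {question}")
    return handlers


def make_handlers_fixed(names):
    """A factory gives each handler its own bound value."""
    def make_handler(bound_name):
        def handler(question):
            return f"{bound_name} <- {question}"
        return handler
    return [make_handler(name) for name in names]


names = ["sales", "engineering", "hr"]
buggy = [h("revenue?") for h in make_handlers_buggy(names)]
fixed = [h("revenue?") for h in make_handlers_fixed(names)]

print(buggy)  # every handler reports "hr"
print(fixed)  # each handler reports its own name
```

Without the factory, every `query_*` tool would silently call the same document agent.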

Step 4: Multi-Document Queries

# Simple question - one document
print("=== Q1: Simple ===")
response = top_agent.chat("What was the revenue in Q1 2024?")
print(f"Answer: {response.response}\n")

# Complex question - several documents
print("=== Q2: Cross-document ===")
response = top_agent.chat(
    "Compare the business and engineering results for Q1 2024. "
    "Is there anything noteworthy?"
)
print(f"Answer: {response.response}\n")

# Combined analysis question
print("=== Q3: Combined analysis ===")
response = top_agent.chat(
    "What was the personnel cost in Q1? How does it relate to revenue?"
)
print(f"Answer: {response.response}")

Sample Results

When asked "Compare Q1 revenue and marketing costs", the agent will:

Query the business report: "Q1 revenue is 5.2 billion"
Query the marketing report: "CAC = 12.5 million, 234 new customers"
Calculate: "Marketing cost ≈ 2.9 billion (234 x 12.5 million), about 56% of revenue"
Synthesize a complete answer with insights
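The calculation step can be checked by hand. The snippet below reproduces it with the figures from the sample reports; note that multiplying CAC by the number of new customers estimates customer acquisition spend, which the sample answer treats as the marketing cost.

```python
# Figures taken from the sample reports (VND)
revenue = 5_200_000_000   # Q1 revenue: 5.2 billion
cac = 12_500_000          # customer acquisition cost: 12.5 million
new_customers = 234

acquisition_spend = cac * new_customers
share_of_revenue = acquisition_spend / revenue

print(f"Estimated spend: {acquisition_spend / 1e9:.3f} billion VND")
print(f"Share of revenue: {share_of_revenue:.2%}")
```

This gives 2.925 billion VND and 56.25% of revenue, consistent with the rounded "≈ 2.9 billion" and "about 56%" in the sample answer.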

Optimization: Parallel Query

With many documents, querying them one after another is slow. LlamaIndex agents expose an async achat method, so the queries can run concurrently:

import asyncio

async def parallel_query(doc_agents, question):
    """Query every document agent concurrently and collect the answers."""
    # achat is the async counterpart of chat; create one coroutine per agent
    tasks = [agent.achat(question) for agent in doc_agents.values()]

    # return_exceptions=True keeps one failing agent from aborting the rest
    results = await asyncio.gather(*tasks, return_exceptions=True)

    summaries = {}
    for doc_id, result in zip(doc_agents, results):
        if isinstance(result, Exception):
            print(f"  Query failed for {doc_id}: {result}")
        else:
            summaries[doc_id] = result.response

    return summaries

# Get an overview of all documents in parallel
# (in a notebook with a running event loop, use `await parallel_query(...)` instead)
summaries = asyncio.run(
    parallel_query(doc_agents, "Summarize the Q1 2024 highlights")
)
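The concurrency pattern can be tried without any API calls. The sketch below uses a hypothetical `StubAgent` whose `achat` returns a plain string (real agents return a response object) and adds a semaphore to cap how many queries run at once, which can help with provider rate limits. The `max_concurrency` value is an arbitrary assumption.

```python
import asyncio


class StubAgent:
    """Stand-in for a document agent; only mimics the async achat call."""

    def __init__(self, name, delay=0.01, fail=False):
        self.name = name
        self.delay = delay
        self.fail = fail

    async def achat(self, question):
        await asyncio.sleep(self.delay)  # simulate network latency
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return f"{self.name}: answer to {question!r}"


async def query_all(agents, question, max_concurrency=2):
    """Query all agents concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def query_one(agent):
        async with semaphore:
            return await agent.achat(question)

    results = await asyncio.gather(
        *(query_one(agent) for agent in agents.values()),
        return_exceptions=True,
    )

    answers, errors = {}, {}
    for doc_id, result in zip(agents, results):
        if isinstance(result, Exception):
            errors[doc_id] = str(result)  # keep failures visible
        else:
            answers[doc_id] = result
    return answers, errors


agents = {
    "sales": StubAgent("sales"),
    "hr": StubAgent("hr", fail=True),
    "marketing": StubAgent("marketing"),
}
answers, errors = asyncio.run(query_all(agents, "Q1 highlights"))
print(answers)
print(errors)
```

Returning the errors alongside the answers lets the caller decide whether a partial result is acceptable.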

Conclusion

Multi-Document Agent addresses one of the biggest challenges in RAG: answering questions that require information from several different sources. With LlamaIndex and Claude, you can build a data analysis system that summarizes reports, compares documents, and surfaces cross-document insights.

Next, explore Router Query Engine to route each question automatically to the most suitable index, or read about the SubQuestion Engine for breaking complex questions into sub-questions.
