Intermediate · Guide · Claude API · Source: Anthropic

RAG Agent with LangChain + Pinecone + Claude

Minh Tuấn, CTO, Transform Group

26/03/2026 · 5 min read


1 LangChain Expression Language (LCEL): LCEL chains components with the | (pipe) operator, much like Unix pipes: chain = retriever | prompt_template | llm | output_parser. Each step takes the previous step's output as its input.
2 Creating the Pinecone vector store: create the index if it does not exist yet (dimension 1024, cosine metric, serverless on AWS us-east-1), then build the vector store from a list of LangChain Document objects.
3 Building the RAG chain with LCEL: a similarity retriever with k = 4 feeds a prompt template that tells the model to answer from the provided context.
4 RAG agent with tools: upgrading from a RAG chain to a RAG agent lets the model decide when it needs to retrieve, using create_tool_calling_agent and a retriever wrapped as a @tool.
5 Conversation history with memory: a wrapper class passes chat_history to agent_executor.invoke and appends each turn to the history after the call.


LangChain is one of the most widely used frameworks for LLM applications, providing abstractions for building RAG pipelines, agents, and chains. Combined with Pinecone and Claude, it lets you build a production-ready RAG agent using the LangChain Expression Language (LCEL), a composable pipe-style syntax.

LangChain Expression Language (LCEL)

LCEL lets you chain components with the | (pipe) operator, similar to Unix pipes:

chain = retriever | prompt_template | llm | output_parser

Each step takes the previous step's output as its input. The result is code that is easy to read, composable, and has built-in support for streaming and async.
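To make the pipe idea concrete, here is a minimal standard-library sketch of how a `|` operator can compose steps. The `Step` class and the toy stages are hypothetical stand-ins for illustration only; they are not LangChain's implementation, which adds streaming, async, and batching on top of the same basic idea.

```python
class Step:
    """Hypothetical stand-in for a runnable component (not LangChain's class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Toy stages standing in for retriever, prompt template, LLM, and output parser.
retrieve = Step(lambda question: {
    "question": question,
    "context": "Claude has a 200K-token context window.",
})
build_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: {"text": prompt.upper()})  # pretends to be a model
parse = Step(lambda message: message["text"])

chain = retrieve | build_prompt | fake_llm | parse
result = chain.invoke("How large is the context window?")
print(result)
```

Because each stage only sees the previous stage's output, swapping one stage (for example a different parser) does not require touching the others.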

Installation

pip install langchain langchain-anthropic langchain-pinecone langchain-voyageai pinecone-client

import os
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

# Initialize the LLM
llm = ChatAnthropic(
    model="claude-opus-4-5",
    anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY"),
    max_tokens=2048
)

# Initialize the embedding model
embeddings = VoyageAIEmbeddings(
    voyage_api_key=os.environ.get("VOYAGE_API_KEY"),
    model="voyage-3"
)

# Initialize the Pinecone client
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

print("LangChain + Pinecone + Claude ready")

Creating the Pinecone Vector Store

from langchain_core.documents import Document as LCDocument

# Create the Pinecone index if it does not exist yet
index_name = "langchain-rag-demo"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

# Build the vector store from documents
def create_vector_store(documents):
    """
    documents: list of LangChain Document objects

    Each LangChain Document has:
    - page_content: the text content
    - metadata: dict with additional information
    """
    vector_store = PineconeVectorStore.from_documents(
        documents=documents,
        embedding=embeddings,
        index_name=index_name,
        namespace="default"
    )
    return vector_store

# Sample documents
sample_docs = [
    LCDocument(
        page_content="The Claude API supports streaming responses to improve UX. "
                    "Use client.messages.stream() to receive text token by token.",
        metadata={"source": "api-docs", "category": "technical", "version": "2024"}
    ),
    LCDocument(
        page_content="Claude API rate limits: Tier 1 = 50 RPM, Tier 4 = 4000 RPM. "
                    "Upgrade tiers by adding credit or reaching usage milestones.",
        metadata={"source": "api-docs", "category": "limits", "version": "2024"}
    ),
    LCDocument(
        page_content="Claude has a 200K-token context window. This is one of the "
                    "largest context windows among current commercial LLMs.",
        metadata={"source": "model-docs", "category": "capabilities", "version": "2024"}
    ),
    LCDocument(
        page_content="Function calling (tool use) lets Claude call external APIs. "
                    "Define tools in the 'tools' parameter; Claude decides when to use them.",
        metadata={"source": "api-docs", "category": "features", "version": "2024"}
    )
]

vector_store = create_vector_store(sample_docs)
print("Vector store created with sample documents")
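The index above is created with metric="cosine" and dimension=1024; the dimension has to match the length of the vectors the embedding model returns, otherwise upserts are rejected. As a rough illustration of what a cosine-similarity search does, the sketch below ranks a few toy 3-dimensional vectors in plain Python. The vectors, IDs, and the `top_k` helper are made up for the example; Pinecone performs this at scale with approximate nearest-neighbor indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, vectors, k=2):
    """Return the IDs of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (real ones would have 1024 dimensions here).
toy_vectors = {
    "streaming": [1.0, 0.0, 0.0],
    "rate-limits": [0.0, 1.0, 0.0],
    "context-window": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]
ranked = top_k(query, toy_vectors, k=2)
print(ranked)
```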

Building the RAG Chain with LCEL

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Retriever backed by the vector store
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# Prompt template
rag_prompt = ChatPromptTemplate.from_template("""You are a helpful AI assistant.
Answer the question based on the provided context.
If the context is not sufficient, say so plainly.

Context:
{context}

Question: {question}

Answer:""")

def format_docs(docs):
    """Format retrieved documents into a single string."""
    return "\n\n---\n\n".join(
        f"[{doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )

# Build the RAG chain with LCEL
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Run the chain
response = rag_chain.invoke("How many tokens is Claude's context window?")
print(f"Answer: {response}")
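The dict at the start of rag_chain acts as a fan-out step: the same input question goes to every value in the dict, and the results are collected under the matching keys before reaching the prompt. The sketch below mimics that with plain functions. `Doc`, `fake_retriever`, and `build_prompt_inputs` are hypothetical stand-ins, not LangChain APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_retriever(question):
    # Stand-in for retriever.invoke: returns the same documents for any question.
    return [
        Doc("Claude has a 200K-token context window.", {"source": "model-docs"}),
        Doc("Streaming is available through the API."),  # no source metadata
    ]


def format_docs(docs):
    """Join documents into one string, labelling each with its source."""
    return "\n\n---\n\n".join(
        f"[{doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )


def build_prompt_inputs(question):
    # Both branches receive the same question, like the dict step in rag_chain.
    return {
        "context": format_docs(fake_retriever(question)),  # retriever branch
        "question": question,                               # passthrough branch
    }


inputs = build_prompt_inputs("How large is the context window?")
print(inputs["context"])
```

The resulting dict has exactly the keys the prompt template expects, which is why the template variables are named {context} and {question}.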

Streaming with LCEL

# Stream the response chunk by chunk
print("Streaming response:")
for chunk in rag_chain.stream("Does the Claude API support streaming?"):
    print(chunk, end="", flush=True)
print()  # Trailing newline
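In practice you often want to show chunks as they arrive and still keep the full answer, for example to log it or append it to chat history. The sketch below shows one way to do that. `fake_stream` is a hypothetical generator standing in for rag_chain.stream(...), which yields string chunks when the chain ends in StrOutputParser.

```python
def fake_stream(text, chunk_size=8):
    """Hypothetical stand-in for rag_chain.stream(...): yields string chunks."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def stream_and_collect(chunks):
    """Print each chunk as it arrives and return the concatenated answer."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_answer = stream_and_collect(
    fake_stream("Yes, the Claude API supports streaming responses.")
)
```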

RAG Agent with Tools

Upgrading from a RAG chain to a RAG agent lets the model decide when it needs to retrieve:

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Wrap the retriever as a RAG tool
@tool
def search_knowledge_base(query: str) -> str:
    """
    Search the knowledge base for information about the Claude API and technical documentation.
    Use this to look up Claude features, limits, or usage.
    """
    docs = retriever.invoke(query)
    return format_docs(docs)

@tool
def calculate(expression: str) -> str:
    """
    Evaluate a simple arithmetic expression.
    Examples: '2 + 2', '100 * 0.15', '50000 / 12'
    """
    try:
        # Caution: eval() with empty builtins is not a real sandbox;
        # do not expose it to untrusted input as-is.
        result = eval(expression, {"__builtins__": {}})
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {e}"

# Agent tools
agent_tools = [search_knowledge_base, calculate]

# Agent prompt
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a smart AI assistant.
    Use the tools to answer questions as accurately as possible.
    Always answer in Vietnamese."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create the agent
agent = create_tool_calling_agent(llm, agent_tools, agent_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=agent_tools,
    verbose=True,
    max_iterations=5
)

# Run the agent
result = agent_executor.invoke({
    "input": "What is the rate limit for Claude Tier 1? "
             "Which tier do I need for 10,000 requests/day?",
    "chat_history": []
})
print(f"\nFinal answer: {result['output']}")
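The calculate tool evaluates model-supplied text with eval, and stripping __builtins__ does not make that safe against crafted input. One alternative, sketched below under the assumption that only +, -, *, / on numeric literals are needed, walks the parsed expression tree and rejects anything else. `safe_eval` is a hypothetical helper, not part of LangChain.

```python
import ast
import operator

# Whitelisted operators; anything outside these tables is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate +, -, *, / on numeric literals; raise ValueError otherwise."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain int/float literals only (bool, str, etc. are refused).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("2 + 2"))
print(safe_eval("50000 / 12"))
```

Inside the tool, the eval call could be swapped for safe_eval(expression); function calls, attribute access, and names then fail with ValueError instead of executing.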

Conversation History with Memory

from langchain_core.messages import HumanMessage, AIMessage

class ConversationalRAGAgent:
    """RAG agent with conversation memory."""

    def __init__(self, agent_executor):
        self.agent_executor = agent_executor
        self.chat_history = []

    def chat(self, user_input):
        """Chat with memory."""
        result = self.agent_executor.invoke({
            "input": user_input,
            "chat_history": self.chat_history
        })

        # Update the history
        self.chat_history.extend([
            HumanMessage(content=user_input),
            AIMessage(content=result["output"])
        ])

        return result["output"]

    def reset(self):
        """Reset conversation."""
        self.chat_history = []

# Create the conversational agent
conv_agent = ConversationalRAGAgent(agent_executor)

# Multi-turn conversation
responses = [
    conv_agent.chat("What is the Claude API rate limit?"),
    conv_agent.chat("What about Tier 4?"),  # Relies on the earlier "rate limit" context
    conv_agent.chat("If I make 50 requests/minute, is Tier 1 enough?")
]

for r in responses:
    print(f"\n{r}")
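Note that chat_history in ConversationalRAGAgent grows with every turn, and the whole list is sent on each call, so long conversations cost more tokens over time. A simple mitigation is to keep only the most recent turns. The sketch below uses (role, text) tuples as stand-ins for HumanMessage and AIMessage; `trim_history` and the max_turns value are assumptions for illustration.

```python
def trim_history(chat_history, max_turns=3):
    """Keep only the last `max_turns` (human, AI) message pairs."""
    if max_turns <= 0:
        return []
    # Each turn contributes two messages: one human, one AI.
    return chat_history[-2 * max_turns:]


history = []
for turn in range(1, 6):
    history.extend([("human", f"question {turn}"), ("ai", f"answer {turn}")])
    history = trim_history(history, max_turns=3)

print(len(history))
print(history[0])
```

The same slice could be applied to self.chat_history right after the extend call in chat(); summarizing older turns is another option when early context matters.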

Production Setup: Async and Caching

from langchain_core.caches import InMemoryCache
from langchain.globals import set_llm_cache
import asyncio

# Enable caching to avoid repeated API calls for identical prompts
set_llm_cache(InMemoryCache())

# Async invocation for higher throughput
async def batch_queries(questions):
    """Run several questions concurrently."""
    tasks = [rag_chain.ainvoke(q) for q in questions]
    results = await asyncio.gather(*tasks)
    return results

questions = [
    "How large is Claude's context window?",
    "How do streaming responses work?",
    "What is tool use in Claude?"
]

answers = asyncio.run(batch_queries(questions))
for q, a in zip(questions, answers):
    print(f"Q: {q}")
    print(f"A: {a[:200]}...")
    print()
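asyncio.gather starts every request at once, which can run into rate limits when the question list is long. A common pattern is to cap concurrency with a semaphore. The sketch below shows that pattern; `fake_ainvoke` is a hypothetical stand-in for rag_chain.ainvoke, and max_concurrency=2 is an arbitrary value chosen for the example.

```python
import asyncio


async def fake_ainvoke(question):
    """Hypothetical stand-in for rag_chain.ainvoke: simulates a slow call."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def bounded_batch(questions, max_concurrency=2):
    """Run all questions, with at most `max_concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def run_one(question):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_ainvoke(question)
            finally:
                in_flight -= 1

    # gather preserves input order in its results.
    answers = await asyncio.gather(*(run_one(q) for q in questions))
    return answers, peak


answers, peak = asyncio.run(
    bounded_batch([f"q{i}" for i in range(5)], max_concurrency=2)
)
print(answers)
print(peak)
```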

Conclusion

LangChain + Pinecone + Claude is a practical stack for production RAG applications. LCEL keeps the code clean and composable, Pinecone handles vector search at scale, and Claude generates the answers. The pipeline suits chatbots, Q&A systems, and knowledge management tools.

Next steps: Explore Multi-Modal RAG with LlamaIndex + Claude Vision to handle images in RAG, or read about Router Query Engine for intelligent query routing.
