Walkthrough: Implementing sampling end-to-end (MCP: Advanced Topics)

What you will learn

Open and run the sample project sampling.zip from Anthropic.
Understand the code flow: the server sends create_message and the client handles it in a callback.
Inspect the raw JSON sampling request/response with MCP Inspector.
Modify the project to add a system prompt, a multi-turn conversation, and model preferences.
Debug three common failure modes: missing API key, sampling timeout, content type mismatch.

Project overview

sampling/
├── .gitignore
├── client.py          # Client with sampling_callback
├── pyproject.toml
├── README.md
└── server.py          # Server using ctx.session.create_message()

Difference from the notifications/ project: the Anthropic API key is required on the client side, not on the server.

Setup:

cd ~/mcp-advanced
unzip sampling.zip
cd sampling/

# Copy the env template
cp .env.example .env
# Open .env and set ANTHROPIC_API_KEY=sk-ant-api-...

# Install dependencies
uv sync

server.py: Creating the sampling request

# server.py
from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent
from pydantic import Field

mcp = FastMCP(name="sampling-demo")


@mcp.tool(
    name="summarize",
    description="Summarize a piece of text"
)
async def summarize(
    text_to_summarize: str = Field(description="The text to summarize"),
    *,
    ctx: Context,
) -> str:
    # Build prompt
    prompt = f"""Please summarize the following text concisely in 2-3 sentences:

{text_to_summarize}
"""

    # Request sampling from the client
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=500,
        system_prompt="You are a helpful research assistant. Keep summaries concise.",
    )

    # Process result
    if result.content.type == "text":
        return result.content.text
    else:
        raise ValueError(f"Expected text, got {result.content.type}")


if __name__ == "__main__":
    mcp.run()

Dissect

1. No anthropic import on the server

Note that the server code contains no from anthropic import .... The server knows nothing about the LLM provider. It only builds the prompt and delegates generation to the client.

2. ctx.session.create_message(...)

This is the sampling entry point. The SDK wraps the call in a sampling/createMessage JSON-RPC message and sends it to the client.

3. Validating the result

The client may return a different content type (image, embedded file), so the server must verify the type before using the result.

if result.content.type == "text":
    return result.content.text

4. max_tokens=500, system_prompt="..."

The server chooses these parameters. The client may set additional ones (temperature, top_p) through its own SDK call, and the server has no control over those.
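The content-type check from point 3 can be factored into a small helper so every tool reports the same error. This is a minimal sketch, not part of the sample project: the name extract_text is hypothetical, and SimpleNamespace objects stand in for real MCP content objects so the snippet runs without the SDK installed.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Return the text of a sampling result, or raise a descriptive error.

    `content` is any object with a `type` attribute, mirroring the shape of
    MCP content objects. The helper name is hypothetical.
    """
    content_type = getattr(content, "type", None)
    if content_type == "text":
        return content.text
    if content_type == "image":
        # Include the MIME type so the log shows what the client actually sent.
        mime = getattr(content, "mimeType", "unknown")
        raise ValueError(f"Expected text, got image ({mime})")
    raise ValueError(f"Expected text, got {content_type!r}")


# Stand-ins for real content objects, used only for this demonstration.
text_result = SimpleNamespace(type="text", text="A short summary.")
image_result = SimpleNamespace(type="image", mimeType="image/png", data="...")

print(extract_text(text_result))
```

In a tool, the last lines of summarize would then reduce to return extract_text(result.content).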

client.py: Callback using the Anthropic SDK

# client.py
import asyncio
import os
from anthropic import AsyncAnthropic
from dotenv import load_dotenv

from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters
from mcp.shared.context import RequestContext
from mcp.types import (
    CreateMessageRequestParams,
    CreateMessageResult,
    TextContent,
)

load_dotenv()  # load variables from .env
anthropic = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


async def sampling_callback(
    context: RequestContext,
    params: CreateMessageRequestParams,
) -> CreateMessageResult:
    """Convert MCP sampling request → Anthropic API call → return result."""

    # Convert messages format (non-text content is reduced to an empty string)
    anthropic_messages = []
    for msg in params.messages:
        anthropic_messages.append({
            "role": msg.role,  # "user" or "assistant"
            "content": msg.content.text if msg.content.type == "text" else "",
        })

    # Call Claude. MCP Python SDK 1.x exposes the request fields in camelCase.
    response = await anthropic.messages.create(
        model="claude-sonnet-5",
        max_tokens=params.maxTokens,
        system=params.systemPrompt or "",
        messages=anthropic_messages,
    )

    # Extract text
    text = response.content[0].text if response.content else ""

    # Return result
    return CreateMessageResult(
        role="assistant",
        model="claude-sonnet-5",
        content=TextContent(type="text", text=text),
    )


async def run():
    server_params = StdioServerParameters(
        command="uv", args=["run", "server.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(
            read, write,
            sampling_callback=sampling_callback,
        ) as session:
            await session.initialize()

            # Example input
            long_text = """
            The Model Context Protocol is a standardized way for AI applications
            to access external tools, data, and context. It was open-sourced by
            Anthropic in November 2024 and has seen rapid adoption across the
            industry. MCP servers can expose tools, resources, and prompts to
            AI clients, enabling rich integrations with services like GitHub,
            Slack, databases, and custom enterprise systems.
            """

            result = await session.call_tool(
                name="summarize",
                arguments={"text_to_summarize": long_text},
            )

            print("=== SUMMARY ===")
            print(result.content[0].text)


if __name__ == "__main__":
    asyncio.run(run())

Dissect

1. load_dotenv() loads the API key from .env

A secure pattern: the key is never hardcoded in the source.

2. Conversion MCP ↔ Anthropic

The two formats are similar but not identical. The conversion code here is deliberately minimal. In production, use a more robust helper function (handling image content, tool use, and so on).

3. Model hardcoded as claude-sonnet-5

The client controls the model and the server does not know which one is used. You can switch to claude-haiku-4-5-20251001 for lower cost, or claude-opus-4-8 for higher quality.

4. Return CreateMessageResult

The type is strict. If the callback returns the wrong type, the SDK raises an error.
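The more robust conversion mentioned in point 2 might start from a helper like the one sketched here, which maps text and image content to Anthropic content blocks and fails loudly on anything else instead of silently sending an empty string. The function names are hypothetical, the attribute names (type, text, data, mimeType) are assumed to match MCP content objects, and SimpleNamespace stand-ins are used so the sketch runs without either SDK.

```python
from types import SimpleNamespace


def to_anthropic_block(content) -> dict:
    """Map one MCP-style content object to an Anthropic content block."""
    if content.type == "text":
        return {"type": "text", "text": content.text}
    if content.type == "image":
        # MCP carries base64 data plus a MIME type; Anthropic nests both under "source".
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": content.mimeType,
                "data": content.data,
            },
        }
    raise ValueError(f"Unsupported content type: {content.type}")


def to_anthropic_messages(messages) -> list:
    """Convert a list of MCP-style sampling messages to Anthropic message dicts."""
    return [
        {"role": m.role, "content": [to_anthropic_block(m.content)]}
        for m in messages
    ]


# Stand-in message, shaped like a SamplingMessage with text content.
demo = [SimpleNamespace(role="user", content=SimpleNamespace(type="text", text="Hi"))]
print(to_anthropic_messages(demo))
```

Raising on unknown content types is a design choice: it surfaces a conversion gap in the client log rather than letting the model answer an empty prompt.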

Running it

cd sampling/
uv run client.py

Typical output:

=== SUMMARY ===
The Model Context Protocol (MCP) is an open-source standard created by Anthropic
in November 2024 that enables AI applications to connect with external tools,
data sources, and services. MCP has gained rapid industry adoption and allows
servers to expose capabilities to AI clients for rich integrations.

Behind the scenes:

1. The client spawns the server as a subprocess.
2. Handshake.
3. The client calls summarize.
4. The server receives the call, builds the prompt, and emits sampling/createMessage.
5. The client callback converts the request and calls the Anthropic API.
6. The Anthropic response is converted and returned to the server.
7. The server returns the summary as the tool result.
8. The client prints it.

Time: 3-8 seconds, most of it Anthropic API latency.

Inspecting raw JSON with Inspector

Run the server under Inspector instead of client.py:

npx @modelcontextprotocol/inspector uv run server.py

To test sampling through Inspector:

1. In the Inspector UI, open the "Sampling" tab and configure a backend (Anthropic/OpenAI/...) with an API key.
2. In the "Tools" tab, call summarize with some input text.
3. Watch the message panel for the sampling request and its response.

Compared with a tool call, this is the same id-matched request/response pattern but in the opposite direction: the server sends the request and the client returns the response. The two messages look like this:

// Server → Client (sampling request)
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {"type": "text", "text": "Please summarize..."}
      }
    ],
    "maxTokens": 500,
    "systemPrompt": "You are a helpful research assistant..."
  }
}

// Client → Server (sampling response)
{
  "jsonrpc": "2.0",
  "id": 5,
  "result": {
    "role": "assistant",
    "model": "claude-sonnet-5",
    "content": {"type": "text", "text": "The Model Context Protocol..."}
  }
}
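To make the id matching concrete, the request/response pair can be rebuilt with plain dictionaries. This is an illustrative sketch only: the SDK builds these messages for you, and the helper names build_sampling_request and is_response_to are hypothetical.

```python
import json


def build_sampling_request(request_id, prompt, max_tokens, system_prompt=None) -> dict:
    """Build a sampling/createMessage JSON-RPC request as a plain dict."""
    params = {
        "messages": [
            {"role": "user", "content": {"type": "text", "text": prompt}}
        ],
        "maxTokens": max_tokens,
    }
    if system_prompt is not None:
        params["systemPrompt"] = system_prompt
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "sampling/createMessage",
        "params": params,
    }


def is_response_to(request: dict, response: dict) -> bool:
    """A response answers a request when the ids match and it has a result or error."""
    return response.get("id") == request["id"] and (
        "result" in response or "error" in response
    )


request = build_sampling_request(5, "Please summarize...", 500)
print(json.dumps(request, indent=2))
```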

Modifying the project: hands-on exercises

Exercise 1: Multi-turn conversation

Change the tool so that the second request carries multi-turn context:

@mcp.tool()
async def analyze_with_followup(
    text: str,
    followup_question: str,
    *,
    ctx: Context,
) -> str:
    # Reuse the same prompt in both turns so the replayed history is accurate
    analysis_prompt = f"Analyze this text: {text}"

    # Turn 1: initial analysis
    result1 = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=analysis_prompt)
            )
        ],
        max_tokens=1000,
    )

    # Turn 2: follow-up question, replaying turn 1 as context
    result2 = await ctx.session.create_message(
        messages=[
            SamplingMessage(role="user", content=TextContent(type="text", text=analysis_prompt)),
            SamplingMessage(role="assistant", content=TextContent(type="text", text=result1.content.text)),
            SamplingMessage(role="user", content=TextContent(type="text", text=followup_question)),
        ],
        max_tokens=500,
    )

    return f"Analysis:\n{result1.content.text}\n\nFollowup:\n{result2.content.text}"

Run it and observe: two sampling requests are sent, and each one triggers a separate Claude call.

Exercise 2: Model preferences

Add a hint that tells the client which model to prefer. Use the Pydantic types (the SDK validates with Pydantic):

from mcp.types import ModelPreferences, ModelHint

result = await ctx.session.create_message(
    messages=[...],
    max_tokens=500,
    model_preferences=ModelPreferences(
        hints=[ModelHint(name="claude-haiku-4-5")],  # prefer a faster model
        # MCP Python SDK 1.x names these fields in camelCase
        costPriority=0.9,
        speedPriority=0.8,
        intelligencePriority=0.3,
    ),
)

Note: the current client code hardcodes claude-sonnet-5 and therefore ignores the hint. To respect it, change the callback:

async def sampling_callback(context, params):
    # Honor the first model hint if one is present.
    # modelPreferences is a Pydantic object (or None), not a dict.
    prefs = params.modelPreferences
    hints = prefs.hints if prefs and prefs.hints else []
    model = hints[0].name if hints and hints[0].name else "claude-sonnet-5"

    response = await anthropic.messages.create(
        model=model,
        ...
    )
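A hint is a preference, not a model id the client is guaranteed to have, so a callback may want to map hints onto the models it actually supports. The sketch below treats each hint as a substring to match against a list of available model ids and falls back to a default. The function name pick_model and the available list are assumptions for illustration.

```python
def pick_model(hint_names, available, default):
    """Return the first available model id that contains a hint, else the default.

    Hints are tried in order, so earlier hints take priority.
    """
    for hint in hint_names:
        if not hint:
            continue  # skip empty or missing hint names
        for model_id in available:
            if hint in model_id:
                return model_id
    return default


# Model ids taken from this walkthrough; the list itself is an assumption.
available = ["claude-sonnet-5", "claude-haiku-4-5-20251001"]
print(pick_model(["claude-haiku-4-5"], available, default="claude-sonnet-5"))
```

Inside the callback, hint_names would be built from the hints list, for example [h.name for h in hints].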

Exercise 3: Error handling

Test with an invalid API key:

# Set a dummy key
export ANTHROPIC_API_KEY=sk-invalid
uv run client.py

Observe the exception stack trace. Then improve the callback so that it returns a graceful error:

async def sampling_callback(context, params):
    try:
        response = await anthropic.messages.create(...)
        return CreateMessageResult(...)
    except Exception as e:
        # Return error result instead of raising
        return CreateMessageResult(
            role="assistant",
            model="error",
            content=TextContent(type="text", text=f"[Sampling error: {e}]")
        )

The tool then receives the error text instead of crashing.

Exercise 4: Redact sensitive data

Assume text_to_summarize contains email addresses. Redact them before sampling:

import re

@mcp.tool()
async def summarize(text_to_summarize: str, *, ctx: Context) -> str:
    # Redact emails
    redacted = re.sub(r'\S+@\S+', '[EMAIL]', text_to_summarize)

    # Sample with the redacted text
    result = await ctx.session.create_message(
        messages=[SamplingMessage(role="user", content=TextContent(type="text", text=f"Summarize: {redacted}"))],
        max_tokens=500,
    )
    return result.content.text

Observe: Claude can still summarize, but it never sees the real email addresses, so privacy is preserved.
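One detail of the \S+@\S+ pattern is worth seeing in isolation: it also swallows punctuation attached to the address. The sketch below compares it with a tighter pattern. EMAIL_RE and redact_emails are hypothetical names, and the tighter pattern is a common approximation rather than a complete email grammar.

```python
import re

# Tighter approximation: local part, "@", domain labels, and a final TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace email-like substrings with a placeholder, keeping punctuation."""
    return EMAIL_RE.sub("[EMAIL]", text)


sample = "Contact bob@example.com, or alice@corp.io."
print(re.sub(r"\S+@\S+", "[EMAIL]", sample))  # Contact [EMAIL] or [EMAIL]
print(redact_emails(sample))                  # Contact [EMAIL], or [EMAIL].
```

Either version hides the addresses. The tighter pattern only matters when the surrounding punctuation should survive for the summary to read naturally.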

Common failure modes and how to debug them

Failure 1: ANTHROPIC_API_KEY not loaded

Symptom: Exception AuthenticationError: No API key provided. With the os.environ["ANTHROPIC_API_KEY"] lookup in client.py, a missing variable surfaces even earlier, as a KeyError at startup.

Root cause: .env was not loaded, or the environment variable name is wrong.

Fix:

cat .env  # verify the file has the right format
# Expected:
# ANTHROPIC_API_KEY=sk-ant-api03-...

# Test loading:
python -c "from dotenv import load_dotenv; load_dotenv(); import os; print(os.environ.get('ANTHROPIC_API_KEY', 'NOT_SET')[:10])"

Failure 2: Sampling request timeout

Symptom: The tool call hangs with no progress.

Root cause:

The client callback crashes silently.
Network issue reaching Anthropic.
max_tokens is too large, so Claude takes a long time to respond.

Fix:

Add try/except in the callback with clear logging.
Add a timeout:

  response = await asyncio.wait_for(
      anthropic.messages.create(...),
      timeout=60
  )

Failure 3: Content type mismatch

Symptom: The server raises ValueError: Expected text, got image.

Root cause: The client callback returns a content type other than text, but the server code only handles text.

Fix: Handle more content types on the server, or steer the client toward text-only output with an explicit prompt.

Failure 4: Rate limit from Anthropic

Symptom: Error rate_limit_error in the callback.

Fix:

Retry with exponential backoff.
Upgrade the Anthropic tier.
Cache sampling responses.
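The timeout from Failure 2 and the exponential backoff from Failure 4 can be combined in one wrapper around the API call. This is a sketch under stated assumptions: call_with_retry is a hypothetical helper, make_call is any zero-argument function returning an awaitable, and the broad except Exception should be narrowed to the provider's rate-limit and timeout errors in real code.

```python
import asyncio


async def call_with_retry(make_call, *, attempts=3, base_delay=1.0, timeout=60.0):
    """Await make_call() with a per-attempt timeout and exponential backoff.

    Delays between attempts grow as base_delay * 2**attempt.
    The last error is re-raised once all attempts are used up.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except Exception as exc:  # assumption: narrow this in production
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_error


# Demonstration with a fake call that fails twice, then succeeds.
calls = {"n": 0}


async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate_limit_error")
    return "ok"


print(asyncio.run(call_with_retry(flaky, base_delay=0.001)))
```

In the callback, make_call could be a lambda that wraps anthropic.messages.create(...) with the arguments already used there.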

Integrating with Claude Desktop

When this server is attached to Claude Desktop (a real client), Claude Desktop handles sampling itself. This assumes a client build that advertises the sampling capability during initialization.

1. Configure the server in claude_desktop_config.json:

   {
     "mcpServers": {
       "sampling-demo": {
         "command": "uv",
         "args": ["run", "/path/to/sampling/server.py"]
       }
     }
   }

2. Claude Desktop spawns the server.
3. When the server sends sampling/createMessage, Claude Desktop uses the user's own Claude model (the user's Claude Pro subscription), so the server needs no API key of its own.

Try it:

1. Configure the server.
2. In Claude, ask: "Use the summarize tool to summarize this passage: [paste long text]."
3. Claude calls the tool, the tool requests sampling, and Claude Desktop answers with the user's current Claude model.

This is the magic moment of sampling: the user needs no API key and no extra setup, only the Claude Pro subscription they already have.
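If the config file already lists other servers, the sampling-demo entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, working on an in-memory dict: add_server is a hypothetical helper, and reading or writing the actual claude_desktop_config.json is left out because its location depends on the platform.

```python
import json


def add_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of config with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Existing config with one unrelated (made-up) server entry.
existing = {"mcpServers": {"other": {"command": "node", "args": ["other.js"]}}}
updated = add_server(
    existing, "sampling-demo", "uv", ["run", "/path/to/sampling/server.py"]
)
print(json.dumps(updated, indent=2))
```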

Anti-patterns from the walkthrough

❌ Hardcoding the API key in code

What it looks like: anthropic = AsyncAnthropic(api_key="sk-ant-...") in client.py.

The right way: .env + os.environ. Add .env to .gitignore.

❌ Not validating the content type

What it looks like: Assuming result.content is always text.

The right way: Check .type before accessing .text.

❌ Logging the full prompt and response in production

What it looks like: print(prompt) exposes potentially sensitive data.

The right way: Log a hash, or sample at a low rate. Keep a separate audit log protected by ACLs.

❌ Not handling clients that do not support sampling

What it looks like: The server calls create_message, the client does not implement it, and the result is an unclear exception.

The right way: Check the capability and provide a fallback path.
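For the logging anti-pattern, "log a hash" can be as simple as recording a short fingerprint and the prompt length in place of the prompt text. A minimal sketch with hypothetical helper names (prompt_fingerprint, log_sampling) and an assumed logger name:

```python
import hashlib
import logging

logger = logging.getLogger("sampling")


def prompt_fingerprint(prompt: str) -> str:
    """Short, stable identifier for a prompt that does not reveal its text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def log_sampling(prompt: str, max_tokens: int) -> str:
    """Log metadata about a sampling request and return the fingerprint."""
    fp = prompt_fingerprint(prompt)
    logger.info(
        "sampling request fp=%s chars=%d max_tokens=%d", fp, len(prompt), max_tokens
    )
    return fp


logging.basicConfig(level=logging.INFO)
log_sampling("Please summarize the following text...", 500)
```

The same fingerprint appears for identical prompts, which is enough to correlate log lines or spot repeated requests without storing the content.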

Apply it now

Exercise 1: Extend the project into a translate tool (30 minutes)

Step 1: Clone the sampling/ project.

Step 2: Replace the summarize tool with translate(text, target_language).

Step 3: Prompt engineering: use a system prompt such as "You are a professional translator. Translate precisely, keep formatting, do not add commentary."

Step 4: Run it and test with three languages.

Step 5: Observe:

Quality compared with Google Translate? ___________
Average time per translation? ___________
Token consumption (check the Anthropic console)? ___________

Exercise 2 (optional): Chain two MCP servers

Build two servers:

Server A: tool fetch_url(url), which only fetches and does not use sampling.
Server B: tool research(topic), which uses Server A plus sampling to summarize.

The client connects to both servers. Observe: research calls fetch_url first, then runs sampling on the result. This is the chained MCP pattern from lesson 10.8.

Walkthrough summary

🎯 The sampling/ project has 5 files. It differs from notifications/ in that it needs a .env with an API key.

🎯 The server does not import the Anthropic SDK: it is agnostic to the LLM provider.

🎯 The client callback converts between MCP and Anthropic formats. The conversion is short, but it must verify the content type.

🎯 max_tokens and system_prompt are important parameters: they affect quality and cost.

🎯 Inspector makes the server→client request visible, with the JSON for both directions.

🎯 Claude Desktop integration: the user needs no API key because the Claude Pro subscription covers it. This is the killer feature of sampling.

References

sampling.zip: the official sample project (get the link from the original Anthropic Academy course, or see python-sdk/examples)
Anthropic Messages API: SDK reference
MCP Python SDK: sampling example
