State, stateless_http and json_response: the two flags that decide production (MCP: Advanced Topics)

> "Why does my code run perfectly on stdio and on a single HTTP instance, but once I scale to 5 replicas behind a load balancer, sampling stops working, the progress bar disappears, and users complain that tools hang?"

What you will learn

Explain why horizontal scaling is hard for a stateful MCP server, and the theory behind it.
List exactly the 5 features that are disabled when stateless_http=True is turned on.
Distinguish the effect of stateless_http (session structure) from that of json_response (streaming).
Apply a decision framework to pick the right flags for a specific deployment scenario.
Know the alternatives if you do not want to turn the flag on but still need to scale (sticky session, tier architecture).

Why scaling MCP is hard: an architecture story

Starting point: 1 instance, 10 users

On day one, your MCP server runs in 1 container with 10 users. Each user opens 1 session with 1 primary SSE connection. The server holds 10 SSE connections in RAM and maps each connection to an mcp-session-id. Everything is fine.

The problem: 10,000 users want in at the same time

The server is overloaded, CPU at 100%. The classic fix is horizontal scaling: deploy multiple replicas behind a load balancer.

And this is where things start to break. The reason: by default a load balancer does not route by session, so which request lands on which instance is effectively random.

Recall lesson 10.4: the client needs 2 channels to the server:

GET primary SSE: a long-lived connection.
POST tool call: one per tool invocation.

Suppose:

The GET SSE of user A lands on pod 1 (pod 1 holds the connection).
The POST tool call of user A lands on pod 2.

┌─────────────────────────────────────────────────────────┐
│                                                         │
│   USER A                                                │
│    │                                                    │
│    │ GET SSE  ───▶ Pod 1 (holds connection)             │
│    │                                                    │
│    │ POST tool ──▶ Pod 2 (wants to send progress)       │
│    │                   │                                │
│    │                   ▼                                │
│    │              ❌ No connection in Pod 2              │
│    │                                                    │
│    │              ❌ Progress never reaches user         │
│    │              ❌ Sampling never reaches user         │
│                                                         │
└─────────────────────────────────────────────────────────┘

Pod 2 handles the tool call and wants to send progress over the primary SSE, but that SSE lives on pod 1. Pod 2 has no way to send it, so the progress is lost. When the tool then needs sampling, the sampling request must travel over the primary SSE on pod 1; pod 2 cannot reach it, so sampling fails.

This is a coordination problem between pods.

The 2 solutions the protocol designers weighed

Option A: Share state between pods.

Use Redis, Postgres LISTEN/NOTIFY, or a message queue. Pod 2 publishes progress to Redis pub/sub; pod 1 subscribes, reads it, and forwards it to the client.

Workable but complex:

More complex infrastructure (add Redis, add operational load).
One extra hop of latency.
New failure modes (what if Redis goes down?).

Option B: Remove the state and accept losing a few features.

stateless_http=True. No session, no primary SSE, no routing problem. You trade features for simplicity.

The SDK ships option B out of the box (1 flag) and leaves option A for developers to implement themselves if they need it.

Here is the hard part: if you want both (scale + full features), you have to build the coordination infrastructure yourself. The spec does not help.
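The routing failure described in this section can be reproduced without any network at all. The following is a minimal sketch, not SDK code: the `Pod` class, its methods and the session name are hypothetical, and each pod is modeled as nothing more than an in-memory set of the sessions whose primary SSE it holds.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """One replica; remembers which sessions have their primary SSE here."""
    name: str
    sse_sessions: set[str] = field(default_factory=set)

    def open_primary_sse(self, session_id: str) -> None:
        self.sse_sessions.add(session_id)

    def send_progress(self, session_id: str) -> bool:
        # A pod can only push to SSE connections held in its own memory.
        return session_id in self.sse_sessions


pods = [Pod("pod-1"), Pod("pod-2")]

# GET (primary SSE) of user A lands on pod 1.
pods[0].open_primary_sse("session-A")

# POST tools/call of user A lands on pod 2, which tries to report progress.
delivered_from_pod2 = pods[1].send_progress("session-A")
delivered_from_pod1 = pods[0].send_progress("session-A")

print(delivered_from_pod2)  # False: pod 2 has no connection to user A
print(delivered_from_pod1)  # True: only the pod holding the SSE can deliver
```

Option A amounts to giving `send_progress` a way to reach the other pod's set; option B amounts to deleting `sse_sessions` altogether.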

stateless_http=True: what you lose when you turn it on

Put simply, this flag means: "I will treat every HTTP request as standalone, track no session ID, and keep no persistent SSE".

When it is on:

┌─────────────────────────────────────────────────────────┐
│   BEFORE (stateless_http=False, default)                │
│   ──────────────────────────────────                    │
│   ✅ mcp-session-id issued during initialize             │
│   ✅ GET /mcp SSE long-lived (primary channel)           │
│   ✅ Progress notifications                              │
│   ✅ Log notifications                                   │
│   ✅ Sampling (server→client request)                    │
│   ✅ Roots (server→client request)                       │
│   ✅ Resource updated notifications                      │
│   ✅ Subscriptions / list_changed events                 │
│                                                         │
│   AFTER (stateless_http=True)                           │
│   ─────────────────────                                 │
│   ❌ No session ID                                       │
│   ❌ No primary SSE (GET /mcp returns 405)               │
│   ❌ No progress                                         │
│   ❌ No log streaming                                    │
│   ❌ No sampling                                         │
│   ❌ No roots requests from the server                   │
│   ❌ No subscriptions                                    │
│                                                         │
│   What do you get in return?                            │
│   ✅ Client does NOT need initialize (direct tool call)  │
│   ✅ Horizontal scale is trivial: any pod will do        │
│   ✅ Simple infrastructure: POST only, no SSE            │
│                                                         │
└─────────────────────────────────────────────────────────┘

Special note: initialize is skipped too. The client sends tools/call directly and the server handles it. The pattern looks more like a REST API than MCP.

The golden rule of stateless_http=True

When this flag is on, treat your MCP server as a collection of REST API tools with JSON Schema input/output. Features that depend on bidirectional communication are unavailable. And that is OK: many use cases need nothing more.
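To make the "standalone request" idea concrete, here is a sketch of what a client might build for a single stateless tool call. It uses only the standard library and does not send anything. The URL is a hypothetical local endpoint and `add` is the sample tool from the exercise later in this lesson; the JSON-RPC envelope and the dual Accept header follow the Streamable HTTP transport as I understand it, so verify against the spec version you target.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust host, port and path to your deployment.
MCP_URL = "http://localhost:8000/mcp"


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a standalone JSON-RPC tools/call POST."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The client advertises that it accepts both response formats.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_tool_call("add", {"a": 2, "b": 3})
body = json.loads(req.data)
print(req.get_method(), req.full_url)
print(body["method"], body["params"])
# No mcp-session-id header is set: a stateless server tracks none.
```

Because the request carries everything the server needs, any pod behind the load balancer can answer it.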

json_response=True: turning off streaming

The second flag is much simpler:

Default (False): the tool call response goes over an SSE stream, which can emit logs and progress before the final result.
True: the tool call response is plain JSON in a single piece, with no stream. Only the final result is returned.

┌────────────────────────────────────────────────────────┐
│                                                        │
│  json_response=False (default)                         │
│  ──────────────────────                                │
│  POST /mcp                                             │
│    Content-Type: text/event-stream (response)          │
│  event: message                                        │
│  data: {progress 20%}                                  │
│  event: message                                        │
│  data: {progress 50%}                                  │
│  event: message                                        │
│  data: {log: "processing..."}                          │
│  event: message                                        │
│  data: {CallToolResult}                                │
│  (stream close)                                        │
│                                                        │
│  json_response=True                                    │
│  ─────────────────                                     │
│  POST /mcp                                             │
│    Content-Type: application/json (response)           │
│  {                                                     │
│    "jsonrpc": "2.0", "id": 7,                          │
│    "result": { ... CallToolResult ... }                │
│  }                                                     │
│  (no progress, no log during execution)                │
│                                                        │
└────────────────────────────────────────────────────────┘

When should you turn on json_response=True?

Your proxy/API gateway does not pass SSE through correctly.
The client you control is a CLI or a backend service that needs no progress UI.
Your tools are fast anyway (< 1-2s), so progress adds no value.

Key point: this flag only affects the POST response

The primary SSE (if stateless_http=False) still works. The server can still send sampling requests over the primary SSE. Only the stream specific to the tool call is lost.
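From the client's point of view, the two response shapes differ only in how the final result is wrapped. The sketch below shows one way a hand-written client could normalize them. `extract_result` is a hypothetical helper, not an SDK function, and it assumes each SSE event carries its JSON on a single `data:` line, which is a simplification of the full SSE format.

```python
import json


def extract_result(content_type: str, body: str) -> dict:
    """Return the final JSON-RPC response from either response shape."""
    if content_type.startswith("application/json"):
        return json.loads(body)

    if content_type.startswith("text/event-stream"):
        final = None
        for line in body.splitlines():
            if not line.startswith("data:"):
                continue  # skip "event:" lines and blank separators
            message = json.loads(line[len("data:"):].strip())
            # Notifications (progress, log) carry no "id"; the response does.
            if "id" in message and ("result" in message or "error" in message):
                final = message
        if final is None:
            raise ValueError("stream ended without a JSON-RPC response")
        return final

    raise ValueError(f"unexpected Content-Type: {content_type}")


sse_body = (
    'event: message\n'
    'data: {"jsonrpc": "2.0", "method": "notifications/progress", '
    '"params": {"progressToken": "t1", "progress": 50, "total": 100}}\n'
    '\n'
    'event: message\n'
    'data: {"jsonrpc": "2.0", "id": 7, "result": {"content": []}}\n'
    '\n'
)
json_body = '{"jsonrpc": "2.0", "id": 7, "result": {"content": []}}'

streamed = extract_result("text/event-stream", sse_body)
plain = extract_result("application/json", json_body)
print(streamed == plain)  # True: same final result, only the framing differs
```

The progress notification in the streamed body is exactly what a client loses when the server switches to json_response=True.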

Comparison matrix of the 4 combinations

stateless_http	json_response	Server→client?	Progress/Log?	Scale?	When to use
False (default)	False (default)	✅ Full	✅ Full	⚠️ Needs sticky session	Single instance / small-scale prod
False	True	✅ Via primary SSE	⚠️ Only via primary SSE	⚠️ Needs sticky session	Proxy handles streamed POST responses poorly, sampling still needed
True	False	❌	⚠️ Only inside the POST response stream	✅ Trivial	Rare: tool needs in-call logs, server needs no server→client requests
True	True	❌	❌	✅ Trivial	Large-scale public API, no bidirectional needed

Row 3 (stateless_http=True + json_response=False) is valid but less common: you keep streaming for the POST response (log / intermediate data inside the tool call) but drop the primary SSE (sampling, global progress). Use it when a tool needs logs but the server needs no bidirectional requests.

Row 4 is the "MCP as REST API" mode: the simplest and most scalable, but also the least capable.

Decision framework: how to choose the flags

This question-and-answer flow helps you decide:

┌─────────────────────────────────────────────────────────┐
│                                                         │
│   1. Do you run on more than 1 instance?                │
│      ├── NO → stateless_http=False, single is fine.     │
│      └── YES ↓                                          │
│                                                         │
│   2. Can you use session affinity (sticky session)?     │
│      ├── YES → stateless_http=False + sticky.           │
│      │         Keeps full features and scales.          │
│      └── NO ↓                                           │
│                                                         │
│   3. Does the server need sampling / roots / progress?  │
│      ├── YES → Build a Redis pub/sub layer to share     │
│      │         state. Complex, needs new infra.         │
│      │         OR split tiers: separate stateful pool.  │
│      └── NO → stateless_http=True. Live without them.   │
│                                                         │
│   4. Does your proxy/gateway support SSE?               │
│      ├── YES → json_response=False, stream tool result. │
│      └── NO → json_response=True, accept missing        │
│               progress/log inside tools.                │
│                                                         │
└─────────────────────────────────────────────────────────┘

Rules of thumb:

Dev local → all False. Test the full feature set.
Staging / small prod → all False + single instance. Monitor throughput.
Large prod → ask the 4 questions above.
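The four questions of the flow can also be written down as a small function, which is handy as a checklist in a deployment script or a design review. This is only a sketch of the flow as drawn here; the function name, parameter names and note strings are made up for illustration.

```python
def choose_flags(
    multi_instance: bool,
    sticky_available: bool,
    needs_bidirectional: bool,
    proxy_supports_sse: bool,
) -> dict:
    """Walk the four questions of the decision flow, in order."""
    if not multi_instance:
        stateless, note = False, "single instance"
    elif sticky_available:
        stateless, note = False, "sticky session"
    elif needs_bidirectional:
        stateless, note = False, "shared state (e.g. Redis pub/sub) or a separate stateful tier"
    else:
        stateless, note = True, "stateless, any pod can serve any request"

    return {
        "stateless_http": stateless,
        # Question 4 is independent of the first three.
        "json_response": not proxy_supports_sse,
        "note": note,
    }


# Large public API: many pods, no affinity, no sampling, gateway buffers SSE.
print(choose_flags(True, False, False, False))
# Small internal deployment: one instance behind an SSE-friendly proxy.
print(choose_flags(False, False, True, True))
```

Note that the third branch keeps stateless_http=False: needing bidirectional features without sticky sessions does not change the flag, it changes how much infrastructure you have to build around it.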

Sticky session: workaround number 1

If you want to keep full features and still scale, sticky session (session affinity) is the most common solution.

How it works: the load balancer uses a cookie or a hash of the session ID to "pin" client A to pod 1 for the whole session. Both examples below pin by client IP, which is the simplest form of affinity.

nginx configuration:

upstream mcp_backend {
    ip_hash;  # hash by client IP
    server pod1:8000;
    server pod2:8000;
    server pod3:8000;
}

Kubernetes Service configuration:

# fragment: metadata, selector and ports omitted
apiVersion: v1
kind: Service
spec:
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600

Trade-offs:

✅ Keeps full features and scales across pods.
⚠️ A pod restart (deploy, rollout) drops the connections of the users on that pod. You have to implement graceful reconnect.
⚠️ Load distribution is uneven (client A stays connected to pod 1 forever, so pod 1 can run hot).
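The "hash of the session ID" variant of sticky routing mentioned in this section boils down to a deterministic mapping from session ID to pod. Below is a minimal sketch of that mapping; `pick_pod` is a hypothetical helper, and a real load balancer implements this internally rather than in application code.

```python
import hashlib


def pick_pod(session_id: str, pods: list[str]) -> str:
    """Map a session ID to one pod, deterministically."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pods)
    return pods[index]


pods = ["pod1:8000", "pod2:8000", "pod3:8000"]

first = pick_pod("session-A", pods)
second = pick_pod("session-A", pods)
print(first == second)  # True: the same session always maps to the same pod
```

One design caveat: with plain modulo hashing, adding or removing a pod remaps most sessions at once. Consistent hashing reduces that churn, which matters for the rollout trade-off described above.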

Tier architecture: workaround number 2

Instead of 1 deployment that handles everything, split into 2 tiers:

                      ┌──────────────┐
                      │ Load Balancer│
                      └──────┬───────┘
              ┌──────────────┴────────────────┐
              ▼                               ▼
     ┌─────────────────┐             ┌─────────────────┐
     │ STATELESS TIER  │             │ STATEFUL TIER   │
     │ (10-50 pods)    │             │ (2-5, sticky)   │
     │                 │             │                 │
     │ Quick tools:    │             │ Heavy tools:    │
     │ - lookup        │             │ - research      │
     │ - transform     │             │ - summarize     │
     │ - search        │             │ - chain calls   │
     │                 │             │                 │
     │ stateless_http  │             │ stateless_http  │
     │     = True      │             │     = False     │
     └─────────────────┘             └─────────────────┘

The client calls the right tier based on the tool:

Simple tools → stateless tier (high scale, low cost).
Complex tools that need sampling/progress → stateful tier.

More complex, but production-realistic for a large SaaS.

Dev vs production mismatch: bug source number 1

Anti-pattern: dev on stdio, deploy on HTTP, never test HTTP in dev

The developer writes the code with mcp.run() defaulting to stdio, tests locally with Claude Desktop, and everything works.

Then production is deployed with mcp.run(transport="streamable-http", stateless_http=True). The tool that uses sampling fails. The bug takes 2 weeks to debug because the code path is different.

Recommendation: always test the prod transport in dev

# server.py
import os

# `mcp` is assumed to be the FastMCP server instance created earlier in this file.
# Depending on the SDK version, the two flags may have to be passed to the
# FastMCP(...) constructor instead of run(); check the version you have installed.

transport = os.environ.get("MCP_TRANSPORT", "stdio")
stateless = os.environ.get("MCP_STATELESS", "false").lower() == "true"
json_resp = os.environ.get("MCP_JSON_RESPONSE", "false").lower() == "true"

if __name__ == "__main__":
    if transport == "stdio":
        mcp.run()
    else:
        mcp.run(
            transport=transport,
            stateless_http=stateless,
            json_response=json_resp,
        )

Run CI against all 3 configs:

MCP_TRANSPORT=stdio pytest tests/
MCP_TRANSPORT=streamable-http pytest tests/
MCP_TRANSPORT=streamable-http MCP_STATELESS=true pytest tests/

The bug is caught right in CI instead of leaking into prod.

Examples by industry

🏢 B2B SaaS offering MCP to enterprise customers

Pain: There are both simple tools (search tickets) and complex tools (summarize week, which needs sampling). Enterprise customers expect full features.

Solution:

Single tier with sticky session (Kubernetes ClientIP affinity).
5 pods in the pool, 60s graceful shutdown.
Max 100 concurrent sessions per pod → near-linear scaling.
Result: 500 concurrent customers, full features, no bugs related to state routing.

🔍 Open-source DevTool maintainer

Pain: The MCP server is public with a free tier. The maintainer does not want to pay for sampling out of pocket.

Solution:

stateless_http=True. Sampling is lost entirely; in its place, docs guide users to enable sampling on the client.
Accept having no progress. Design tools to run in < 3s.
Deploy on Cloudflare Workers (serverless), which scales out on demand.
Result: 50k active users, $0 server cost. Trade-off: the UX is not as polished as with a stateful server.

🛠️ Internal platform team exposing MCP to the data team

Pain: Data analysts use MCP so Claude can query the warehouse. A query can take 2-5 minutes. Progress is needed.

Solution:

Single instance (only 50 users total).
stateless_http=False, json_response=False. Full features.
Use TCP keep-alive + SSE heartbeat to avoid timeouts.
Result: full features, the simplest setup, no further scaling needed.

🔌 Payment provider exposing MCP for checkout

Pain: Stateless is mandatory because their load balancer does not support sticky session. But the payment tool needs sampling to summarize the cart.

Solution:

Instead of sampling, the server calls Claude directly with its own subscription key.
stateless_http=True, json_response=True. Trade-off: pay the API cost, gain simplicity.
Rate limiting to prevent abuse.
Result: deployment done in 2 weeks. Predictable cost ($0.02/call).

Anti-patterns

❌ Turning on stateless_http=True just because it is "safer"

Symptom: A dev reads the docs, sees the flag can be turned on, and turns it on "just to be sure".

The right way: This flag reduces capability; it does not make anything "safer". Turn it on only when you really need horizontal scaling and have no sticky session.

❌ Not logging the flag config in the startup log

Symptom: Production runs with the wrong config and nobody knows.

The right way: Log it clearly when the server starts, and remember to write to stderr (stdout is the JSON message channel of the stdio transport, so printing prose there breaks the protocol):

import sys
# transport, stateless and json_resp are the values read from the environment in server.py
print(f"[MCP] transport={transport} stateless={stateless} json_response={json_resp}",
      file=sys.stderr)

Every time a user reports a bug, check the log to see what the config was at that moment.

❌ Treating json_response=True as a "performance optimization"

Symptom: "JSON is lighter than SSE, right?" → turn on json_response=True to "speed things up".

The right way: The SSE overhead is negligible. The real difference is user experience (visible progress vs. black box). Choose based on UX, not micro-optimization.

❌ Flipping a flag without retesting everything

Symptom: Production switches from stateful to stateless with no regression testing.

The right way: Every flag change = a full test matrix. Test the tools that depend on bidirectional communication especially carefully.

❌ Forgetting graceful shutdown when using sticky session

Symptom: During a deploy rollout a pod is killed and 100 clients lose their connection abruptly.

The right way:

SIGTERM → stop accepting new connections, drain existing sessions.
Client SDK retry logic → auto reconnect to another pod (losing soft state is OK).
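The "stop accepting, then drain" step from the graceful shutdown anti-pattern can be sketched with plain asyncio. `SessionDrainer` is a hypothetical helper, not part of the MCP SDK; it only tracks which sessions are live and refuses new ones after shutdown begins. How you hook it into your HTTP server is left open.

```python
import asyncio


class SessionDrainer:
    """Tracks live sessions and refuses new ones once shutdown has begun."""

    def __init__(self) -> None:
        self.shutting_down = False
        self.active: set[str] = set()
        self._drained = asyncio.Event()

    def try_accept(self, session_id: str) -> bool:
        if self.shutting_down:
            return False  # caller rejects the request so the client retries elsewhere
        self.active.add(session_id)
        return True

    def release(self, session_id: str) -> None:
        self.active.discard(session_id)
        if self.shutting_down and not self.active:
            self._drained.set()

    def begin_shutdown(self) -> None:
        self.shutting_down = True
        if not self.active:
            self._drained.set()

    async def wait_drained(self, timeout: float) -> bool:
        try:
            await asyncio.wait_for(self._drained.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> tuple[bool, bool]:
    drainer = SessionDrainer()
    drainer.try_accept("session-A")
    drainer.begin_shutdown()  # what a SIGTERM handler would call
    accepted_late = drainer.try_accept("session-B")
    # Simulate session A finishing shortly after shutdown started.
    asyncio.get_running_loop().call_later(0.05, drainer.release, "session-A")
    drained = await drainer.wait_drained(timeout=1.0)
    return accepted_late, drained


accepted_late, drained = asyncio.run(demo())
print(accepted_late, drained)  # False True
```

On Unix, one way to wire this up is `loop.add_signal_handler(signal.SIGTERM, drainer.begin_shutdown)`, with the `wait_drained` timeout kept below the pod's termination grace period.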

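Advanced tips

Tip 1: Feature flag detection on the client side

After the handshake, the client can inspect the capabilities object in the initialize result (it sits next to serverInfo, not inside it). If entries such as resource subscriptions or logging are missing, the client can infer that the server is likely running in stateless mode and adapt its UI accordingly. Sampling is a capability declared by the client, not the server, so it cannot be detected this way.

Tip 2: Use tool metadata as a hint

The server can flag which tools depend on bidirectional communication in the description:

@mcp.tool(
    name="summarize",
    description="Summarize long text. Requires sampling support.",
)
async def summarize(text: str) -> str:
    ...  # tool body not shown in the original

A client that sees the keyword "sampling" can fall back or fail gracefully.

Tip 3: Redis pub/sub pattern to keep bidirectional on multi-pod

If you decide to build option A (share state), the commonly used pattern is:

import redis.asyncio as redis  # redis-py asyncio client
r = redis.Redis()
# On pod 1 (holds the SSE connection); sse_writer is a hypothetical helper:
async def forward_to_client(sid, sse_writer):
    pubsub = r.pubsub()
    await pubsub.subscribe(f"session:{sid}")
    async for message in pubsub.listen():
        if message["type"] == "message":  # skip subscribe confirmations
            await sse_writer.send(message["data"])
# On pod 2 (handles the tool call):
async def publish_progress(sid, progress_message_json):
    await r.publish(f"session:{sid}", progress_message_json)

It adds ~1-5ms of latency but allows full features + full scale.

Tip 4: Observe 2 separate metrics

Distinguish:

mcp_sessions_active (number of sessions that currently have a primary SSE).
mcp_requests_total{route="tools/call"} (number of POST tool requests).

The active/total ratio tells you whether the transport is effectively stateful or stateless.

Tip 5: Read the SDK source for the details

When in doubt, read python-sdk/src/mcp/server/streamable_http.py. The SDK is the source of truth: the behavior of the flags is implemented explicitly in the code.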
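For Tip 1, a client-side check might look like the sketch below. `has_capability` is a hypothetical helper, and the sample initialize result is hand-written for illustration; a real server returns more fields, and which capabilities disappear in stateless mode should be confirmed against the server you actually run.

```python
def has_capability(init_result: dict, *path: str) -> bool:
    """True if the nested capability at `path` is present and not false."""
    node = init_result.get("capabilities", {})
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False


# Hypothetical initialize result; real servers return more fields.
init_result = {
    "protocolVersion": "2025-03-26",
    "serverInfo": {"name": "demo", "version": "0.1.0"},
    "capabilities": {"tools": {"listChanged": False}, "logging": {}},
}

print(has_capability(init_result, "tools"))                   # True
print(has_capability(init_result, "logging"))                 # True
print(has_capability(init_result, "resources", "subscribe"))  # False
print(has_capability(init_result, "tools", "listChanged"))    # False
```

A UI could use the third check to hide a "watch this resource" button instead of letting the subscription fail at call time.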

Apply it now

Exercise 1: Test the 4 combinations (30 minutes)

Step 1: Create a server with 2 tools:

add(a, b): fast.
research(topic): await asyncio.sleep(3) + ctx.report_progress(50, 100) + ctx.info("researching...").

Step 2: Run the server with env vars controlling the 2 flags:

# MCP_TRANSPORT must be set: with the server.py shown earlier, the default is stdio and both flags are ignored.

# Case 1: both False
MCP_TRANSPORT=streamable-http MCP_STATELESS=false MCP_JSON_RESPONSE=false uv run server.py

# Case 2: only stateless=True
MCP_TRANSPORT=streamable-http MCP_STATELESS=true MCP_JSON_RESPONSE=false uv run server.py

# Case 3: only json_response=True
MCP_TRANSPORT=streamable-http MCP_STATELESS=false MCP_JSON_RESPONSE=true uv run server.py

# Case 4: both True
MCP_TRANSPORT=streamable-http MCP_STATELESS=true MCP_JSON_RESPONSE=true uv run server.py

Step 3: For each case, call research from MCP Inspector and record:

Does progress show up? ___________
Do logs appear? ___________
Is the tool result returned correctly? ___________

Step 4: Fill in the table:

Case (stateless_http / json_response)	Progress	Log	Result OK	Use for
False / False				___________
True / False				___________
False / True				___________
True / True				___________

Exercise 2 (optional): Write a decision doc for the team

Write a 1-page markdown document that states:

The current production setup (1 instance? multi? sticky?).
Whether the tools need bidirectional communication.
Why the flag config was chosen.

This document will save you 6 months from now, when you onboard a new developer or change infra.

Lesson summary

🎯 Scaling MCP = scaling a stateful system: not as simple as scaling a REST API, because of the primary SSE connection.

🎯 stateless_http=True = bidirectional disabled: you lose sampling, roots, progress, log, subscriptions. In return, scaling is trivial.

🎯 json_response=True = streaming POST responses disabled: you lose progress/log inside tool calls. The primary SSE is unaffected.

🎯 4 combinations, 4 trade-offs: choose based on the number of instances, whether sticky session is available, and whether you need bidirectional.

🎯 Sticky session is the most important workaround: it keeps full features + multi-pod scale. Trade-off: rollouts get more complex.

🎯 Test matrix in CI: every combination needs a regression test. Dev on stdio and deploy on HTTP without testing HTTP = asking for bugs.

🎯 This is a leading source of bugs in MCP production: many post-deploy bugs that developers report on forums trace back to a wrong flag config or to a mismatch with the transport tested locally.

References

MCP Streamable HTTP: stateless mode
Python SDK: streamable_http.py, the source code that implements the 2 flags
Kubernetes Session Affinity docs
nginx Sticky Session modules
