Advanced · Guide · Claude API · Source: Anthropic

Smart Model Routing: Automatically Choosing Haiku/Sonnet/Opus per Task

Minh Tuấn, CTO, Transform Group

28/03/2026 · 13 min read



Not every request needs the most powerful model. A question like "What day is it today?" does not need Opus, but Haiku is not enough to analyze a legal contract. Smart Model Routing picks a suitable model automatically, which can cut costs by 60-80% while maintaining quality.

Why model routing?

Claude models differ significantly in capability and cost:

Haiku: fastest and cheapest. Suited to simple tasks: classification, extraction, short answers
Sonnet: balances speed and quality. Suited to most tasks: writing content, analysis, code
Opus: most capable and most expensive. Needed only for complex tasks: multi-step reasoning, deep analysis, complex creative work

Cost comparison

Suppose an application handles 100,000 requests per day. If you use Opus for all of them:

100% Opus: very high cost, even though 70% of requests do not need that much power
With routing: 60% Haiku + 30% Sonnet + 10% Opus = roughly 70-80% lower total cost

Beyond cost, and arguably more important, routing also improves response times. Haiku typically answers in 200-500 ms, while Opus can take 5-15 seconds. Making a user wait 10 seconds for a simple question is a poor experience.
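To see where a savings figure like this comes from, here is a back-of-the-envelope calculation. It is a sketch that assumes the per-1K-token input prices used later in this article ($0.001, $0.003 and $0.015 for Haiku, Sonnet and Opus, a 1:3:15 ratio) and roughly equal token counts per request; real traffic will differ, and current prices should be checked on Anthropic's pricing page.

```python
# Per-1K-input-token prices as used in this article's CostTracker (assumption:
# verify against Anthropic's current pricing before relying on them).
PRICE_PER_1K = {"haiku": 0.001, "sonnet": 0.003, "opus": 0.015}


def blended_price(mix: dict[str, float]) -> float:
    """Weighted average price per 1K tokens for a traffic mix (shares sum to 1)."""
    return sum(share * PRICE_PER_1K[model] for model, share in mix.items())


def savings_vs(mix: dict[str, float], baseline: str = "opus") -> float:
    """Fraction of cost saved compared with sending every request to `baseline`."""
    return 1 - blended_price(mix) / PRICE_PER_1K[baseline]


mix = {"haiku": 0.6, "sonnet": 0.3, "opus": 0.1}
print(f"blended: ${blended_price(mix):.4f} per 1K tokens")
print(f"savings vs all-Opus: {savings_vs(mix):.0%}")
```

Under these assumptions the mix costs $0.003 per 1K tokens, one fifth of the all-Opus price, i.e. about 80% savings. Output prices in this article follow the same 1:3:15 ratio, so including output tokens does not change the percentage; in practice Opus requests tend to be longer, which pulls the real savings down.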

Classification-based Routing

The most common approach: use a lightweight model (Haiku) to classify the complexity of each request, then route it to the appropriate model.

Architecture overview

# Flow:
# User Request -> Classifier (Haiku) -> Route Decision -> Target Model -> Response

# Classifier output:
# - "simple"  -> Haiku  (classification, short answers, extraction)
# - "medium"  -> Sonnet (writing content, analysis, code generation)
# - "complex" -> Opus   (complex reasoning, deep analysis, creative work)

Python implementation

import anthropic
from enum import Enum
from dataclasses import dataclass

class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"

@dataclass
class RouteDecision:
    complexity: Complexity
    model: str
    reason: str
    confidence: float

class ModelRouter:
    """Router that automatically picks a model based on request complexity."""

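    # Model IDs as used throughout this article; check Anthropic's model
    # list for the IDs currently available to your account.
    MODEL_MAP = {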
        Complexity.SIMPLE: "claude-haiku-4-20250514",
        Complexity.MEDIUM: "claude-sonnet-4-20250514",
        Complexity.COMPLEX: "claude-opus-4-20250514",
    }

    COST_PER_1K_INPUT = {
        "claude-haiku-4-20250514": 0.001,
        "claude-sonnet-4-20250514": 0.003,
        "claude-opus-4-20250514": 0.015,
    }

    def __init__(self):
        self.client = anthropic.Anthropic()
        self.stats = {"simple": 0, "medium": 0, "complex": 0}

    def classify(self, user_message: str) -> RouteDecision:
        """Use Haiku to classify the complexity of a request."""

        classification_prompt = f"""Classify the complexity of the following request.
Answer with EXACTLY one of three levels: simple, medium, complex.

Classification rules:
- simple: questions with a short answer, classification, information extraction,
  short translations, simple calculations
- medium: writing 1-3 paragraphs of content, text analysis, explaining
  concepts, writing simple code, summarization
- complex: multi-step reasoning, complex analysis, writing complex code,
  comparing many factors, long creative writing, legal/financial analysis

Request: {user_message}

Answer in this format:
COMPLEXITY: [simple|medium|complex]
CONFIDENCE: [0.0-1.0]
REASON: [brief reason]"""

        response = self.client.messages.create(
            model="claude-haiku-4-20250514",
            max_tokens=100,
            messages=[{"role": "user", "content": classification_prompt}]
        )

        result = response.content[0].text
        return self._parse_classification(result)

    def _parse_classification(self, result: str) -> RouteDecision:
        """Parse Haiku's classification output, defaulting to medium."""
        complexity_str = "medium"
        confidence = 0.8
        reason = ""

        for line in result.strip().splitlines():
            line = line.strip()
            if line.startswith("COMPLEXITY:"):
                # Tolerate replies such as "[simple]" or "simple."
                complexity_str = line.split(":", 1)[1].strip().strip("[]. ").lower()
            elif line.startswith("CONFIDENCE:"):
                try:
                    confidence = float(line.split(":", 1)[1].strip())
                except ValueError:
                    confidence = 0.8
            elif line.startswith("REASON:"):
                reason = line.split(":", 1)[1].strip()

        # Fall back to MEDIUM if the label is not one of the three known values
        try:
            complexity = Complexity(complexity_str)
        except ValueError:
            complexity = Complexity.MEDIUM
        model = self.MODEL_MAP[complexity]

        return RouteDecision(
            complexity=complexity,
            model=model,
            reason=reason,
            confidence=confidence
        )

    def route(self, user_message: str, system_prompt: str = "") -> str:
        """Classify the request and send it to the matching model."""

        # Step 1: classify
        decision = self.classify(user_message)
        self.stats[decision.complexity.value] += 1

        print(f"Routing to {decision.model} "
              f"(complexity: {decision.complexity.value}, "
              f"confidence: {decision.confidence:.0%})")

        # Step 2: send the request to the chosen model
        messages = [{"role": "user", "content": user_message}]
        kwargs = {
            "model": decision.model,
            "max_tokens": 4096,
            "messages": messages,
        }
        if system_prompt:
            kwargs["system"] = system_prompt

        response = self.client.messages.create(**kwargs)

        return response.content[0].text

    def get_stats(self) -> dict:
        """Return routing statistics."""
        total = sum(self.stats.values())
        if total == 0:
            return self.stats
        return {
            k: {"count": v, "percentage": f"{v/total*100:.1f}%"}
            for k, v in self.stats.items()
        }
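Because the classifier's reply is free text, it helps to test the parsing step offline, without calling the API. The function below is a standalone, hypothetical variant of `_parse_classification` that returns plain values instead of a `RouteDecision`; it assumes the same `COMPLEXITY/CONFIDENCE/REASON` format as the prompt above, falls back to `medium` on an unknown label, and additionally clamps the confidence to the 0-1 range.

```python
VALID_LABELS = {"simple", "medium", "complex"}


def parse_classification(text: str) -> tuple[str, float, str]:
    """Parse COMPLEXITY/CONFIDENCE/REASON lines; defaults are ("medium", 0.8, "")."""
    label, confidence, reason = "medium", 0.8, ""
    for raw in text.strip().splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # skip lines without a "KEY: value" shape
        key, value = key.strip().upper(), value.strip()
        if key == "COMPLEXITY":
            candidate = value.strip("[]. ").lower()
            if candidate in VALID_LABELS:
                label = candidate
        elif key == "CONFIDENCE":
            try:
                # Clamp to [0, 1] in case the model returns e.g. "95" or "-1"
                confidence = min(max(float(value), 0.0), 1.0)
            except ValueError:
                pass  # keep the default confidence
        elif key == "REASON":
            reason = value
    return label, confidence, reason


reply = "COMPLEXITY: [simple]\nCONFIDENCE: 0.9\nREASON: short translation"
print(parse_classification(reply))
```

Keeping the parser pure like this lets you collect real classifier replies from logs and replay them as regression tests.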

Using the router

# Initialize the router
router = ModelRouter()

# Different requests are routed to different models

# Simple request -> Haiku
result1 = router.route("Translate into English: Xin chào")
# Expected: Routing to claude-haiku-4-20250514 (complexity: simple, confidence: ...)

# Medium request -> Sonnet
result2 = router.route("Write an email apologizing to a customer for a 3-day delivery delay")
# Expected: Routing to claude-sonnet-4-20250514 (complexity: medium, confidence: ...)

# Complex request -> Opus
result3 = router.route(
    "Analyze the pros and cons of 5 popular microservices architectures, "
    "compare them with a monolith for a Vietnamese startup with 10 developers, "
    "and propose the most suitable architecture with a 2-year migration roadmap"
)
# Expected: Routing to claude-opus-4-20250514 (complexity: complex, confidence: ...)

# View routing statistics
print(router.get_stats())

Complexity Scoring: a more advanced approach

Instead of relying only on Haiku for classification, you can combine several signals into a complexity score computed locally, without any API call:

import re
import unicodedata

class ComplexityScorer:
    """Compute a request's complexity score from several signals."""

    # Keywords signaling high complexity (Vietnamese, written without diacritics):
    # analyze, compare, evaluate, design, architecture, strategy, optimize,
    # debug, review code, refactor, security, performance
    COMPLEX_KEYWORDS = [
        "phan tich", "so sanh", "danh gia", "thiet ke",
        "kien truc", "chien luoc", "toi uu", "debug",
        "review code", "refactor", "bao mat", "hieu suat"
    ]

    # Keywords signaling a simple request: translate, short summary, what is,
    # definition, convert, calculate, list, format
    SIMPLE_KEYWORDS = [
        "dich", "tom tat ngan", "la gi", "dinh nghia",
        "chuyen doi", "tinh", "liet ke", "format"
    ]

    @staticmethod
    def _normalize(message: str) -> str:
        """Lowercase and strip Vietnamese diacritics so "phân tích" matches "phan tich"."""
        decomposed = unicodedata.normalize("NFD", message.lower())
        stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
        return stripped.replace("đ", "d")  # "đ" has no decomposition in NFD

    def score(self, message: str) -> float:
        """Return a complexity score from 0.0 (simple) to 1.0 (complex)."""
        text = self._normalize(message)
        scores = []

        # 1. Message length (0.0 - 0.3)
        word_count = len(message.split())
        if word_count < 20:
            scores.append(0.0)
        elif word_count < 50:
            scores.append(0.1)
        elif word_count < 150:
            scores.append(0.2)
        else:
            scores.append(0.3)

        # 2. Complex keywords (0.0 - 0.3)
        complex_count = sum(1 for kw in self.COMPLEX_KEYWORDS if kw in text)
        scores.append(min(complex_count * 0.1, 0.3))

        # 3. Simple keywords lower the score (-0.2 - 0.0)
        simple_count = sum(1 for kw in self.SIMPLE_KEYWORDS if kw in text)
        scores.append(max(-simple_count * 0.1, -0.2))

        # 4. Multi-step instructions (0.0 - 0.2): numbered items such as "1."
        #    or step words ("buoc" = step, "sau do" = then, "tiep theo" = next,
        #    "cuoi cung" = finally)
        step_indicators = len(re.findall(
            r'(\d+\.|buoc|sau do|tiep theo|cuoi cung)', text
        ))
        scores.append(min(step_indicators * 0.05, 0.2))

        # 5. Attached code/data (0.0 or 0.15)
        has_code = '```' in message or 'def ' in message or 'function ' in message
        scores.append(0.15 if has_code else 0.0)

        total = max(0.0, min(1.0, sum(scores)))
        return round(total, 2)

    def get_model(self, score: float) -> str:
        """Pick a model based on the complexity score."""
        if score < 0.3:
            return "claude-haiku-4-20250514"
        elif score < 0.7:
            return "claude-sonnet-4-20250514"
        else:
            return "claude-opus-4-20250514"

Combining the two approaches

class HybridRouter:
    """Combine rule-based scoring with LLM classification."""

    def __init__(self):
        self.scorer = ComplexityScorer()
        self.llm_router = ModelRouter()

    def route(self, message: str) -> str:
        # Step 1: rule-based scoring (free and fast)
        score = self.scorer.score(message)

        # If the score is very low or very high, skip the Haiku call
        if score < 0.15:
            model = "claude-haiku-4-20250514"
            print(f"Fast route (score={score}): {model}")
        elif score > 0.85:
            model = "claude-opus-4-20250514"
            print(f"Fast route (score={score}): {model}")
        else:
            # Gray zone: ask Haiku for a more accurate classification
            decision = self.llm_router.classify(message)
            model = decision.model
            print(f"LLM route (score={score}, "
                  f"llm={decision.complexity.value}): {model}")

        # Step 2: send the request, reusing the client ModelRouter already created
        response = self.llm_router.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": message}]
        )
        return response.content[0].text
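The threshold logic in `HybridRouter.route` can be pulled out into a pure function so it is testable without network calls. In this sketch `hybrid_pick` and the stub classifier are hypothetical names; the classifier is passed in as a callable so the (paid) Haiku call only happens in the gray zone, and the 0.15/0.85 thresholds mirror the ones above.

```python
from typing import Callable

# Model IDs as used in this article.
HAIKU = "claude-haiku-4-20250514"
OPUS = "claude-opus-4-20250514"


def hybrid_pick(score: float, classify: Callable[[], str],
                low: float = 0.15, high: float = 0.85) -> tuple[str, bool]:
    """Return (model, used_llm). `classify` is only called in the gray zone."""
    if score < low:
        return HAIKU, False
    if score > high:
        return OPUS, False
    return classify(), True


calls = []


def fake_classifier() -> str:
    # Stand-in for ModelRouter.classify(message).model; records each call
    calls.append(1)
    return "claude-sonnet-4-20250514"


print(hybrid_pick(0.05, fake_classifier))  # fast path: classifier not called
print(hybrid_pick(0.50, fake_classifier))  # gray zone: classifier called once
print(len(calls))
```

Counting calls in the stub is a cheap way to check that the fast paths really avoid the extra classification round-trip.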

Fallback Chains

When the chosen model does not return a good result (output too short, a refusal, or an unsatisfied user), you need to fall back to a stronger model:

class FallbackRouter:
    """Router with an automatic fallback mechanism."""

    CHAIN = [
        "claude-haiku-4-20250514",
        "claude-sonnet-4-20250514",
        "claude-opus-4-20250514",
    ]

    def __init__(self):
        self.client = anthropic.Anthropic()

    def route_with_fallback(
        self,
        message: str,
        start_model: str,
        min_response_length: int = 50,
        max_retries: int = 2
    ) -> dict:
        """Send a request, moving up the chain when the response is unusable."""
        start_idx = self.CHAIN.index(start_model)
        end_idx = min(start_idx + max_retries + 1, len(self.CHAIN))

        for i in range(start_idx, end_idx):
            model = self.CHAIN[i]

            try:
                response = self.client.messages.create(
                    model=model,
                    max_tokens=4096,
                    messages=[{"role": "user", "content": message}]
                )

                result = response.content[0].text

                # Quality check: treat very short output as a failure
                if len(result) >= min_response_length:
                    return {
                        "model_used": model,
                        "fallback_count": i - start_idx,
                        "response": result,
                        "input_tokens": response.usage.input_tokens,
                        "output_tokens": response.usage.output_tokens,
                    }
                print(f"{model}: response too short ({len(result)} chars), "
                      f"falling back to a stronger model...")

            except Exception as e:
                print(f"{model}: error - {e}, falling back...")

        # Every attempted model errored or returned a response that was too short
        return {
            "model_used": self.CHAIN[end_idx - 1],
            "fallback_count": end_idx - 1 - start_idx,
            "response": "Unable to process this request.",
            "error": True
        }
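The same fallback idea can be written with the model call and the quality check injected as functions, which makes it easy to test and to swap in checks other than length (for example, detecting a refusal). This is a sketch, not the article's implementation: `run_with_fallback`, the fake backend and the 50-character rule are illustrative assumptions.

```python
from typing import Callable

CHAIN = [
    "claude-haiku-4-20250514",
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
]


def run_with_fallback(call_model: Callable[[str], str], start_model: str,
                      is_good: Callable[[str], bool]) -> dict:
    """Try models from start_model upward until is_good(response) holds."""
    attempts = []
    for model in CHAIN[CHAIN.index(start_model):]:
        try:
            text = call_model(model)
        except Exception as exc:  # with the real SDK, consider anthropic.APIError
            attempts.append((model, f"error: {exc}"))
            continue
        if is_good(text):
            return {"model_used": model, "response": text, "attempts": attempts}
        attempts.append((model, "rejected"))
    return {"model_used": None, "response": None, "attempts": attempts}


# Fake backend: Haiku answers too briefly, Sonnet gives a full answer.
fake_answers = {
    "claude-haiku-4-20250514": "OK.",
    "claude-sonnet-4-20250514": "A longer, complete answer " * 3,
}
result = run_with_fallback(
    call_model=lambda m: fake_answers[m],
    start_model="claude-haiku-4-20250514",
    is_good=lambda t: len(t) >= 50,
)
print(result["model_used"], result["attempts"])
```

Returning the list of attempts makes it straightforward to log the fallback rate discussed under monitoring.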

A/B Testing Models

Before settling on a routing strategy, run A/B tests to find out which model best fits each type of task in your application:

import random
import json
import time
from datetime import datetime

class ModelABTest:
    """A/B testing across Claude models."""

    def __init__(self, test_name: str, models: list, split: list = None):
        self.test_name = test_name
        self.models = models
        self.split = split or [1.0 / len(models)] * len(models)
        self.results = []

    def select_model(self) -> str:
        """Pick a model according to the split ratios."""
        rand = random.random()
        cumulative = 0
        for model, ratio in zip(self.models, self.split):
            cumulative += ratio
            if rand <= cumulative:
                return model
        return self.models[-1]

    def run_test(self, message: str, evaluate_fn=None) -> dict:
        """Run one test and record the result."""
        model = self.select_model()
        client = anthropic.Anthropic()

        start_time = time.time()
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": message}]
        )
        latency = time.time() - start_time

        result_text = response.content[0].text

        record = {
            "timestamp": datetime.now().isoformat(),
            "model": model,
            "message_preview": message[:100],
            "response_length": len(result_text),
            "latency_seconds": round(latency, 2),
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
        }

        # Score quality if an evaluate function was provided
        if evaluate_fn:
            record["quality_score"] = evaluate_fn(message, result_text)

        self.results.append(record)
        return {"response": result_text, "metadata": record}

    def get_summary(self) -> dict:
        """Aggregate the A/B test results."""
        summary = {}
        for model in self.models:
            model_results = [r for r in self.results if r["model"] == model]
            if not model_results:
                continue

            summary[model] = {
                "count": len(model_results),
                "avg_latency": round(
                    sum(r["latency_seconds"] for r in model_results) / len(model_results), 2
                ),
                "avg_response_length": round(
                    sum(r["response_length"] for r in model_results) / len(model_results)
                ),
                "avg_tokens": round(
                    sum(r["output_tokens"] for r in model_results) / len(model_results)
                ),
            }

            # Keep 0.0 scores; only skip records without a quality score
            quality_scores = [
                r["quality_score"] for r in model_results
                if r.get("quality_score") is not None
            ]
            if quality_scores:
                summary[model]["avg_quality"] = round(
                    sum(quality_scores) / len(quality_scores), 2
                )

        return summary

Running the A/B test

# Create a test comparing Haiku and Sonnet for summarization
test = ModelABTest(
    test_name="summarization_test",
    models=["claude-haiku-4-20250514", "claude-sonnet-4-20250514"],
    split=[0.5, 0.5]  # 50/50
)

# Test messages (in practice, run around 100 real requests)
test_messages = [
    "Summarize the following article in 3 sentences: ...",
    "Summarize this email into 1 paragraph: ...",
    # ... add more test messages
]

for msg in test_messages:
    test.run_test(msg)

# View the results
summary = test.get_summary()
print(json.dumps(summary, indent=2))

# The results help you decide:
# If Haiku summarizes well enough -> use Haiku for this task (3x cheaper per token at this article's prices)
# If Sonnet is significantly better -> the extra cost is worth it
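The two rules in the comments above can be made explicit as a threshold on the quality gap. This sketch assumes `get_summary()` was produced with an `evaluate_fn`, so each model has an `avg_quality` on a 0-1 scale; `pick_cheaper_if_close` is a hypothetical helper, the 0.05 margin is an arbitrary placeholder, and the summary values below are made up for illustration.

```python
def pick_cheaper_if_close(summary: dict, cheap: str, strong: str,
                          max_quality_gap: float = 0.05) -> str:
    """Keep the cheaper model unless the stronger one's avg quality is clearly higher.

    `summary` has the shape returned by ModelABTest.get_summary(), with an
    "avg_quality" entry for both models.
    """
    gap = summary[strong]["avg_quality"] - summary[cheap]["avg_quality"]
    return strong if gap > max_quality_gap else cheap


# Made-up summary values, for illustration only
summary = {
    "claude-haiku-4-20250514": {"count": 50, "avg_quality": 0.82},
    "claude-sonnet-4-20250514": {"count": 50, "avg_quality": 0.85},
}
print(pick_cheaper_if_close(summary, "claude-haiku-4-20250514",
                            "claude-sonnet-4-20250514"))
```

With small samples a gap of a few hundredths can be pure noise, so it is worth checking the `count` per model before acting on the result.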

Measuring cost savings

To demonstrate the value of routing, you need to measure actual costs:

class CostTracker:
    """Track API costs with and without routing."""

    # USD per 1K tokens (input/output), as used throughout this article
    PRICING = {
        "claude-haiku-4-20250514": {"input": 0.001, "output": 0.005},
        "claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
        "claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
    }

    def __init__(self):
        self.requests = []

    def log_request(self, model: str, input_tokens: int, output_tokens: int):
        cost = (
            (input_tokens / 1000) * self.PRICING[model]["input"] +
            (output_tokens / 1000) * self.PRICING[model]["output"]
        )
        self.requests.append({
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost": cost
        })

    def compare_with_single_model(self, baseline_model: str) -> dict:
        """Compare routing cost against using a single model for everything."""
        actual_cost = sum(r["cost"] for r in self.requests)

        # Cost if the baseline model had handled every request
        baseline_cost = sum(
            (r["input_tokens"] / 1000) * self.PRICING[baseline_model]["input"] +
            (r["output_tokens"] / 1000) * self.PRICING[baseline_model]["output"]
            for r in self.requests
        )

        savings = baseline_cost - actual_cost
        savings_pct = (savings / baseline_cost * 100) if baseline_cost > 0 else 0

        return {
            "total_requests": len(self.requests),
            "routing_cost": f"${actual_cost:.4f}",
            f"all_{baseline_model}_cost": f"${baseline_cost:.4f}",
            "savings": f"${savings:.4f}",
            "savings_percentage": f"{savings_pct:.1f}%",
        }
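A small worked example of the formula used in `log_request` and `compare_with_single_model`, with three made-up requests. The prices are copied from the table above and may not match current pricing; `request_cost` is a hypothetical standalone helper.

```python
# Per-1K-token prices copied from the CostTracker.PRICING table above.
PRICING = {
    "claude-haiku-4-20250514": {"input": 0.001, "output": 0.005},
    "claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
    "claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, same formula as CostTracker.log_request."""
    p = PRICING[model]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]


# Hypothetical log entries: (model, input_tokens, output_tokens)
log = [
    ("claude-haiku-4-20250514", 500, 200),
    ("claude-sonnet-4-20250514", 1000, 800),
    ("claude-opus-4-20250514", 2000, 1500),
]
routed = sum(request_cost(m, i, o) for m, i, o in log)
all_opus = sum(request_cost("claude-opus-4-20250514", i, o) for _, i, o in log)
print(f"routed ${routed:.4f} vs all-Opus ${all_opus:.4f} "
      f"({1 - routed / all_opus:.1%} saved)")
```

Here routing costs $0.159 against $0.24 for all-Opus, only about a third less, because the single long Opus request dominates the bill. Savings grow as the share of short Haiku/Sonnet traffic grows.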

Domain-specific Routing

Besides complexity, you can route based on the request's domain. Each domain has its own suitable model:

class DomainRouter:
    """Route to a model based on the request's domain."""

    DOMAIN_MODELS = {
        # Domain -> (default_model, description)
        "translation": ("claude-haiku-4-20250514", "Simple translation uses Haiku"),
        "summarization": ("claude-haiku-4-20250514", "Short summaries use Haiku"),
        "code_generation": ("claude-sonnet-4-20250514", "Code generation uses Sonnet"),
        "content_writing": ("claude-sonnet-4-20250514", "Content writing uses Sonnet"),
        "analysis": ("claude-sonnet-4-20250514", "Analysis uses Sonnet"),
        "legal_review": ("claude-opus-4-20250514", "Legal review uses Opus"),
        "architecture": ("claude-opus-4-20250514", "Architecture design uses Opus"),
        "research": ("claude-opus-4-20250514", "Deep research uses Opus"),
    }

    def __init__(self):
        self.client = anthropic.Anthropic()
        self.classifier = ModelRouter()

    def detect_domain(self, message: str) -> str:
        """Use Haiku to determine the request's domain."""
        prompt = f"""Classify the following request into ONE of these domains:
translation, summarization, code_generation, content_writing,
analysis, legal_review, architecture, research

Request: {message}

Answer with ONLY ONE word (the domain name):"""

        response = self.client.messages.create(
            model="claude-haiku-4-20250514",
            max_tokens=20,
            messages=[{"role": "user", "content": prompt}]
        )
        domain = response.content[0].text.strip().lower()
        # Fall back to "analysis" (Sonnet) when the reply is not a known domain
        return domain if domain in self.DOMAIN_MODELS else "analysis"

    def route(self, message: str) -> dict:
        """Route the request to the model that matches its domain."""
        domain = self.detect_domain(message)
        model, reason = self.DOMAIN_MODELS[domain]

        response = self.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": message}]
        )

        return {
            "domain": domain,
            "model": model,
            "reason": reason,
            "response": response.content[0].text,
        }
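Classifier replies are not always a bare label; they may carry punctuation or a prefix such as `Domain: translation`, and an exact-match check sends those to the fallback domain. A more forgiving parser, sketched here as a hypothetical helper, returns the first known domain name found in the reply; the domain set is copied from `DOMAIN_MODELS`.

```python
import re

DOMAINS = {
    "translation", "summarization", "code_generation", "content_writing",
    "analysis", "legal_review", "architecture", "research",
}


def normalize_domain(reply: str, default: str = "analysis") -> str:
    """Return the first known domain name in a free-text reply, else `default`."""
    # Split the reply into word-like tokens, e.g. "Legal_review." -> ["legal_review"]
    for token in re.findall(r"[a-z_]+", reply.lower()):
        if token in DOMAINS:
            return token
    return default


print(normalize_domain("Legal_review."))
print(normalize_domain("Domain: translation"))
```

Taking the first match is a simplification: a reply that mentions two domains resolves to whichever appears first.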

Combining domain and complexity routing

In practice, you should combine both approaches. For example, translation usually goes to Haiku, but translating a complex legal document needs Sonnet or Opus. Domain routing gives you the baseline, and complexity scoring adjusts up or down from there.

class SmartRouter:
    """Combine domain routing with complexity scoring."""

    def __init__(self):
        self.domain_router = DomainRouter()
        self.complexity_scorer = ComplexityScorer()
        self.client = anthropic.Anthropic()

    def route(self, message: str) -> dict:
        # Step 1: detect the domain and its default model
        domain = self.domain_router.detect_domain(message)
        base_model, _ = DomainRouter.DOMAIN_MODELS[domain]

        # Step 2: compute the complexity score
        score = self.complexity_scorer.score(message)

        # Step 3: adjust the model based on complexity
        models_tier = [
            "claude-haiku-4-20250514",
            "claude-sonnet-4-20250514",
            "claude-opus-4-20250514",
        ]
        base_tier = models_tier.index(base_model)

        # High complexity: move up one tier
        if score > 0.7 and base_tier < 2:
            final_model = models_tier[base_tier + 1]
            reason = f"Upgraded from {base_model} because complexity={score}"
        # Low complexity: move down one tier
        elif score < 0.2 and base_tier > 0:
            final_model = models_tier[base_tier - 1]
            reason = f"Downgraded from {base_model} because complexity={score}"
        else:
            final_model = base_model
            reason = f"Kept {base_model}"

        response = self.client.messages.create(
            model=final_model,
            max_tokens=4096,
            messages=[{"role": "user", "content": message}]
        )

        return {
            "domain": domain,
            "complexity_score": score,
            "model": final_model,
            "reason": reason,
            "response": response.content[0].text,
        }
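The adjustment step in `SmartRouter.route` can be isolated and tested on its own. This sketch mirrors the thresholds above (one tier up above 0.7, one tier down below 0.2); `adjust_tier` is a hypothetical helper name.

```python
TIERS = [
    "claude-haiku-4-20250514",
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
]


def adjust_tier(base_model: str, score: float,
                up: float = 0.7, down: float = 0.2) -> str:
    """Move one tier up for high complexity, one down for low, else keep base."""
    i = TIERS.index(base_model)
    if score > up and i < len(TIERS) - 1:
        return TIERS[i + 1]
    if score < down and i > 0:
        return TIERS[i - 1]
    return base_model


# A translation request (Haiku baseline) that scores as very complex
print(adjust_tier("claude-haiku-4-20250514", 0.8))
# An architecture request (Opus baseline) that scores as trivial
print(adjust_tier("claude-opus-4-20250514", 0.1))
```

Because the adjustment moves at most one tier, a request whose domain baseline is Haiku can reach Sonnet but never Opus; if that matters for your traffic, the domain table is the place to change it.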

Routing with Streaming

With streaming, you cannot wait for the complete result before evaluating its quality. Instead, route first and then stream the result:

def route_and_stream(message: str):
    """Route the request, then stream the response."""
    router = HybridRouter()

    # Classification works exactly as before
    score = router.scorer.score(message)
    if score < 0.15:
        model = "claude-haiku-4-20250514"
    elif score > 0.85:
        model = "claude-opus-4-20250514"
    else:
        decision = router.llm_router.classify(message)
        model = decision.model

    # Stream the response
    client = anthropic.Anthropic()
    with client.messages.stream(
        model=model,
        max_tokens=4096,
        messages=[{"role": "user", "content": message}]
    ) as stream:
        print(f"[Streaming from {model}]")
        for text in stream.text_stream:
            print(text, end="", flush=True)
        print()  # final newline

        # Read usage info once the stream has finished, while it is still open
        final_message = stream.get_final_message()

    print(f"Tokens: {final_message.usage.input_tokens} in, "
          f"{final_message.usage.output_tokens} out")

Production Considerations

When deploying routing in production, keep the following in mind:

Latency budget

Classifying with Haiku adds 200-500 ms. If your application needs responses in under 1 second, use rule-based scoring (free) instead of LLM classification.

Caching classification

# Cache classification results for similar messages
import hashlib

class CachedRouter(ModelRouter):
    def __init__(self):
        super().__init__()
        self.cache = {}  # unbounded: grows with every distinct key

    def classify(self, message: str) -> RouteDecision:
        # Build the cache key from the first 200 characters of the message,
        # so messages that share the same opening reuse the same decision
        key = hashlib.md5(message[:200].encode()).hexdigest()

        if key in self.cache:
            return self.cache[key]

        decision = super().classify(message)
        self.cache[key] = decision
        return decision
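If memory growth or prefix collisions are a concern, one alternative is a size-bounded LRU cache keyed on the whole message after light normalization. This is a sketch under those assumptions: `LRUClassificationCache`, its `max_size` default and the whitespace/case normalization are illustrative choices, and `collections.OrderedDict` provides the LRU ordering.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LRUClassificationCache(Generic[T]):
    """Bounded LRU cache for classification results keyed by normalized message."""

    def __init__(self, max_size: int = 10_000):
        self.max_size = max_size
        self._data: "OrderedDict[str, T]" = OrderedDict()

    @staticmethod
    def key(message: str) -> str:
        # Collapse whitespace and case so trivially different messages share a key
        normalized = " ".join(message.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, message: str, compute: Callable[[str], T]) -> T:
        k = self.key(message)
        if k in self._data:
            self._data.move_to_end(k)  # mark as most recently used
            return self._data[k]
        value = compute(message)
        self._data[k] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


calls = []


def fake_classify(message: str) -> str:
    # Stand-in for an LLM classification call; records each invocation
    calls.append(message)
    return "simple"


cache = LRUClassificationCache(max_size=2)
cache.get_or_compute("Hello  World", fake_classify)
cache.get_or_compute("hello world", fake_classify)  # same normalized key: cache hit
print(len(calls))
```

Hashing the full text trades a lower hit rate for never reusing a decision across messages that merely start the same way.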

Monitoring and alerts

Track the routing distribution: a sudden jump in Opus traffic may mean the classifier is malfunctioning
Track the fallback rate: if it is too high, the routing strategy needs adjusting
Track daily costs and compare them against the baseline
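The first two checks can be turned into a simple function over one window of routing logs. A sketch: `routing_alerts` is a hypothetical helper, and its thresholds (alert when the Opus share exceeds twice its baseline, or when more than 10% of requests needed a fallback) are placeholder assumptions to tune against your own traffic.

```python
def routing_alerts(routed_models: list[str], fallbacks: int,
                   baseline_opus_share: float,
                   opus_spike_factor: float = 2.0,
                   max_fallback_rate: float = 0.1) -> list[str]:
    """Flag an Opus-share spike or a high fallback rate for one time window."""
    total = len(routed_models)
    if total == 0:
        return []
    alerts = []
    # Count requests routed to any Opus model ID
    opus_share = sum("opus" in m for m in routed_models) / total
    if opus_share > baseline_opus_share * opus_spike_factor:
        alerts.append(f"Opus share {opus_share:.0%} vs baseline {baseline_opus_share:.0%}")
    fallback_rate = fallbacks / total
    if fallback_rate > max_fallback_rate:
        alerts.append(f"fallback rate {fallback_rate:.0%}")
    return alerts


window = (["claude-haiku-4-20250514"] * 6
          + ["claude-sonnet-4-20250514"]
          + ["claude-opus-4-20250514"] * 3)
for alert in routing_alerts(window, fallbacks=2, baseline_opus_share=0.1):
    print("ALERT:", alert)
```

In this window 30% of requests went to Opus against a 10% baseline and 2 of 10 needed a fallback, so both checks fire.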

Summary

Smart Model Routing is one of the most important techniques for optimizing cost when using the Claude API in production. Key points:

Use Haiku to classify requests and route each to the appropriate model, saving 60-80% of cost
Combine rule-based scoring and LLM classification for higher accuracy
Build fallback chains to safeguard output quality
Run A/B tests to find the best routing strategy for each task type
Measure actual costs to demonstrate the ROI of the routing system


Writer for a Claude AI knowledge platform for Vietnamese readers. Software engineer with more than 20 years of experience, passionate about AI and sharing technology knowledge.
