{"product_id":"usage-cost-api-theo-doi-chi-phi-claude-api-real-time","title":"Usage \u0026 Cost API: Tracking Claude API Costs in Real Time","description":"\n\u003cp\u003eAs API usage grows, the question is no longer \"does the API work?\" but \"how much am I spending, and on what?\" The \u003cstrong\u003eClaude Usage API\u003c\/strong\u003e provides granular data on token consumption and cost, enough to build a professional cost dashboard and set up alerts before a surprise bill arrives at the end of the month.\u003c\/p\u003e\n\n\u003ch2\u003eAnthropic Usage API Overview\u003c\/h2\u003e\n\n\u003cp\u003eAnthropic provides a Usage API at \u003ccode\u003ehttps:\/\/api.anthropic.com\/v1\/usage\u003c\/code\u003e that lets you query:\u003c\/p\u003e\n\n\u003cul\u003e\n  \u003cli\u003eToken usage by day, week, or month\u003c\/li\u003e\n  \u003cli\u003eBreakdown by model (haiku vs sonnet vs opus)\u003c\/li\u003e\n  \u003cli\u003eInput, output, and cache tokens reported separately\u003c\/li\u003e\n  \u003cli\u003eCost estimates based on current pricing\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport anthropic\nimport httpx\nimport json\nimport os\nfrom datetime import datetime, timedelta\n\nclass UsageTracker:\n    def __init__(self, api_key: str = None):\n        # Fall back to the environment variable when no key is passed in\n        self.api_key = api_key or os.environ.get(\"ANTHROPIC_API_KEY\")\n        self.base_url = \"https:\/\/api.anthropic.com\/v1\"\n        self.headers = {\n            \"x-api-key\": self.api_key,\n            \"anthropic-version\": \"2023-06-01\",\n            \"content-type\": \"application\/json\"\n        }\n\n    def get_usage(self, start_date: str, end_date: str = None) -\u0026gt; dict:\n        \"\"\"\n        start_date, end_date: format YYYY-MM-DD\n        Returns usage data broken down by model and date\n        \"\"\"\n        end_date = end_date or datetime.now().strftime(\"%Y-%m-%d\")\n\n        response = httpx.get(\n            f\"{self.base_url}\/usage\",\n            headers=self.headers,\n          
  params={\n                \"start_time\": f\"{start_date}T00:00:00Z\",\n                \"end_time\": f\"{end_date}T23:59:59Z\",\n                \"granularity\": \"daily\"  # daily | hourly\n            }\n        )\n        response.raise_for_status()\n        return response.json()\n\n    def get_current_month_usage(self) -\u0026gt; dict:\n        \"\"\"Get usage for the current month\"\"\"\n        today = datetime.now()\n        first_of_month = today.replace(day=1).strftime(\"%Y-%m-%d\")\n        return self.get_usage(first_of_month)\n\n    def get_last_30_days(self) -\u0026gt; dict:\n        \"\"\"Get usage for the last 30 days\"\"\"\n        end = datetime.now()\n        start = end - timedelta(days=30)\n        return self.get_usage(\n            start.strftime(\"%Y-%m-%d\"),\n            end.strftime(\"%Y-%m-%d\")\n        )\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eCalculating Cost\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003e# Reference pricing (USD per million tokens); check Anthropic.com for the latest prices\nMODEL_PRICING = {\n    \"claude-opus-4-5\": {\n        \"input\": 15.0,\n        \"output\": 75.0,\n        \"cache_write\": 18.75,\n        \"cache_read\": 1.5\n    },\n    \"claude-sonnet-4-5\": {\n        \"input\": 3.0,\n        \"output\": 15.0,\n        \"cache_write\": 3.75,\n        \"cache_read\": 0.3\n    },\n    \"claude-haiku-4-5\": {\n        \"input\": 0.8,\n        \"output\": 4.0,\n        \"cache_write\": 1.0,\n        \"cache_read\": 0.08\n    }\n}\n\ndef calculate_cost(usage_data: dict) -\u0026gt; dict:\n    \"\"\"Compute cost from usage data\"\"\"\n    total_cost = 0\n    model_costs = {}\n\n    for entry in usage_data.get(\"data\", []):\n        model = entry.get(\"model\", \"unknown\")\n        pricing = MODEL_PRICING.get(model, {\"input\": 0, \"output\": 0, \"cache_write\": 0, \"cache_read\": 0})\n\n        input_tokens = entry.get(\"input_tokens\", 0)\n        output_tokens = entry.get(\"output_tokens\", 0)\n        
cache_write = entry.get(\"cache_creation_input_tokens\", 0)\n        cache_read = entry.get(\"cache_read_input_tokens\", 0)\n\n        cost = (\n            (input_tokens \/ 1_000_000) * pricing[\"input\"] +\n            (output_tokens \/ 1_000_000) * pricing[\"output\"] +\n            (cache_write \/ 1_000_000) * pricing[\"cache_write\"] +\n            (cache_read \/ 1_000_000) * pricing[\"cache_read\"]\n        )\n\n        if model not in model_costs:\n            model_costs[model] = {\n                \"input_tokens\": 0, \"output_tokens\": 0,\n                \"cache_write\": 0, \"cache_read\": 0, \"cost_usd\": 0\n            }\n\n        model_costs[model][\"input_tokens\"] += input_tokens\n        model_costs[model][\"output_tokens\"] += output_tokens\n        model_costs[model][\"cache_write\"] += cache_write\n        model_costs[model][\"cache_read\"] += cache_read\n        model_costs[model][\"cost_usd\"] += cost\n        total_cost += cost\n\n    return {\n        \"total_cost_usd\": round(total_cost, 4),\n        \"by_model\": {k: {**v, \"cost_usd\": round(v[\"cost_usd\"], 4)} for k, v in model_costs.items()}\n    }\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eTrack Usage Per-Request\u003c\/h2\u003e\n\n\u003cp\u003eBeyond the Usage API, track each request directly in your code to get granular data:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport anthropic\nimport sqlite3\nfrom datetime import datetime, timedelta\n\n# Setup SQLite database\ndef setup_usage_db(db_path: str = \"claude_usage.db\"):\n    conn = sqlite3.connect(db_path)\n    conn.execute(\"\"\"\n        CREATE TABLE IF NOT EXISTS api_calls (\n            id INTEGER PRIMARY KEY AUTOINCREMENT,\n            timestamp TEXT,\n            model TEXT,\n            feature TEXT,\n            user_id TEXT,\n            input_tokens INTEGER,\n            output_tokens INTEGER,\n            cache_write_tokens INTEGER,\n            cache_read_tokens INTEGER,\n            cost_usd REAL,\n            latency_ms 
INTEGER,\n            success INTEGER\n        )\n    \"\"\")\n    conn.commit()\n    return conn\n\nclass TrackedClaudeClient:\n    \"\"\"Wrapper around the Anthropic client with automatic usage tracking\"\"\"\n\n    def __init__(self, db_path: str = \"claude_usage.db\"):\n        self.client = anthropic.Anthropic()\n        self.db = setup_usage_db(db_path)\n\n    def create(self, feature: str = \"unknown\", user_id: str = \"anonymous\", **kwargs) -\u0026gt; anthropic.types.Message:\n        \"\"\"\n        Drop-in replacement for client.messages.create()\n        Automatically tracks usage after each call\n        \"\"\"\n        start_time = datetime.now()\n\n        try:\n            response = self.client.messages.create(**kwargs)\n\n            # Extract usage\n            usage = response.usage\n            model = kwargs.get(\"model\", \"unknown\")\n            pricing = MODEL_PRICING.get(model, {\"input\": 0, \"output\": 0, \"cache_write\": 0, \"cache_read\": 0})\n\n            input_tokens = usage.input_tokens\n            output_tokens = usage.output_tokens\n            cache_write = getattr(usage, 'cache_creation_input_tokens', 0) or 0\n            cache_read = getattr(usage, 'cache_read_input_tokens', 0) or 0\n\n            cost = (\n                (input_tokens \/ 1_000_000) * pricing[\"input\"] +\n                (output_tokens \/ 1_000_000) * pricing[\"output\"] +\n                (cache_write \/ 1_000_000) * pricing[\"cache_write\"] +\n                (cache_read \/ 1_000_000) * pricing[\"cache_read\"]\n            )\n\n            latency_ms = int((datetime.now() - start_time).total_seconds() * 1000)\n\n            self._log_usage(\n                model=model,\n                feature=feature,\n                user_id=user_id,\n                input_tokens=input_tokens,\n                output_tokens=output_tokens,\n                cache_write=cache_write,\n                cache_read=cache_read,\n                cost_usd=cost,\n         
       latency_ms=latency_ms,\n                success=1\n            )\n\n            return response\n\n        except Exception:\n            # Record the failed call (zero tokens, zero cost), then re-raise\n            latency_ms = int((datetime.now() - start_time).total_seconds() * 1000)\n            self._log_usage(\n                model=kwargs.get(\"model\", \"unknown\"),\n                feature=feature,\n                user_id=user_id,\n                input_tokens=0, output_tokens=0,\n                cache_write=0, cache_read=0,\n                cost_usd=0, latency_ms=latency_ms, success=0\n            )\n            raise\n\n    def _log_usage(self, **kwargs):\n        self.db.execute(\"\"\"\n            INSERT INTO api_calls\n            (timestamp, model, feature, user_id, input_tokens, output_tokens,\n             cache_write_tokens, cache_read_tokens, cost_usd, latency_ms, success)\n            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)\n        \"\"\", (\n            datetime.now().isoformat(),\n            kwargs[\"model\"], kwargs[\"feature\"], kwargs[\"user_id\"],\n            kwargs[\"input_tokens\"], kwargs[\"output_tokens\"],\n            kwargs[\"cache_write\"], kwargs[\"cache_read\"],\n            kwargs[\"cost_usd\"], kwargs[\"latency_ms\"], kwargs[\"success\"]\n        ))\n        self.db.commit()\n\n    def get_stats(self, days: int = 7) -\u0026gt; list:\n        \"\"\"Query usage statistics from the local DB; returns one dict per (model, feature) group\"\"\"\n        since = (datetime.now() - timedelta(days=days)).isoformat()\n        cursor = self.db.execute(\"\"\"\n            SELECT\n                model,\n                feature,\n                COUNT(*) as calls,\n                SUM(input_tokens) as total_input,\n                SUM(output_tokens) as total_output,\n                SUM(cost_usd) as total_cost,\n                AVG(latency_ms) as avg_latency,\n                SUM(CASE WHEN success=0 THEN 1 ELSE 0 END) as errors\n            FROM api_calls\n            WHERE timestamp \u0026gt; ?\n            GROUP BY model, feature\n            ORDER BY 
total_cost DESC\n        \"\"\", (since,))\n        return [dict(zip([col[0] for col in cursor.description], row)) for row in cursor.fetchall()]\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eBudget Alerts System\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport smtplib\nfrom email.mime.text import MIMEText\n\nclass BudgetMonitor:\n    def __init__(self, monthly_budget_usd: float, alert_thresholds: list = None):\n        self.monthly_budget = monthly_budget_usd\n        self.thresholds = alert_thresholds or [0.5, 0.75, 0.9, 1.0]  # 50%, 75%, 90%, 100%\n        self.tracker = UsageTracker()\n        self.alerted_thresholds = set()\n\n    def check_and_alert(self):\n        \"\"\"Check spending and send an alert if needed\"\"\"\n        usage = self.tracker.get_current_month_usage()\n        cost = calculate_cost(usage)\n        current_spend = cost[\"total_cost_usd\"]\n        spend_ratio = current_spend \/ self.monthly_budget\n\n        print(f\"Current spend: ${current_spend:.2f} \/ ${self.monthly_budget:.2f} ({spend_ratio*100:.1f}%)\")\n\n        for threshold in self.thresholds:\n            if spend_ratio \u0026gt;= threshold and threshold not in self.alerted_thresholds:\n                self._send_alert(current_spend, threshold)\n                self.alerted_thresholds.add(threshold)\n\n        return {\n            \"current_spend\": current_spend,\n            \"budget\": self.monthly_budget,\n            \"remaining\": self.monthly_budget - current_spend,\n            \"percentage_used\": round(spend_ratio * 100, 1)\n        }\n\n    def _send_alert(self, current_spend: float, threshold: float):\n        \"\"\"Send an email\/Slack alert\"\"\"\n        message = f\"\"\"\nCLAUDE API BUDGET ALERT\n\nSpending has reached {threshold*100:.0f}% of monthly budget.\n\nCurrent: ${current_spend:.2f}\nBudget: ${self.monthly_budget:.2f}\nRemaining: ${self.monthly_budget - current_spend:.2f}\n\nPlease review API usage at: https:\/\/console.anthropic.com\n\"\"\"\n     
   print(f\"ALERT: {message}\")\n        # Implement email\/Slack\/webhook notification here\n\n    def get_burn_rate_projection(self) -\u0026gt; dict:\n        \"\"\"Project end-of-month cost from the current burn rate\"\"\"\n        import calendar  # local import: only needed for the month length\n\n        today = datetime.now()\n        days_elapsed = today.day\n        days_in_month = calendar.monthrange(today.year, today.month)[1]  # actual days in this month\n\n        usage = self.tracker.get_current_month_usage()\n        current_spend = calculate_cost(usage)[\"total_cost_usd\"]\n\n        daily_rate = current_spend \/ days_elapsed if days_elapsed \u0026gt; 0 else 0\n        projected_month_total = daily_rate * days_in_month\n        days_until_budget_exhausted = (self.monthly_budget - current_spend) \/ daily_rate if daily_rate \u0026gt; 0 else float('inf')\n\n        return {\n            \"daily_burn_rate\": round(daily_rate, 4),\n            \"projected_month_total\": round(projected_month_total, 2),\n            \"over_budget\": projected_month_total \u0026gt; self.monthly_budget,\n            \"days_until_exhausted\": round(days_until_budget_exhausted, 1) if days_until_budget_exhausted != float('inf') else None\n        }\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eCost Dashboard (Terminal)\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003edef print_cost_report(tracked_client: TrackedClaudeClient, days: int = 7):\n    \"\"\"Print a cost report to the terminal\"\"\"\n    stats = tracked_client.get_stats(days)\n\n    print(f\"\\n{'='*60}\")\n    print(f\"CLAUDE API USAGE REPORT - Last {days} days\")\n    print(f\"{'='*60}\")\n\n    total_cost = sum(s[\"total_cost\"] for s in stats)\n    total_calls = sum(s[\"calls\"] for s in stats)\n    total_errors = sum(s[\"errors\"] for s in stats)\n\n    print(f\"Total Cost: ${total_cost:.4f}\")\n    print(f\"Total API Calls: {total_calls:,}\")\n    print(f\"Error Rate: {(total_errors\/total_calls*100):.1f}%\" if total_calls \u0026gt; 0 else \"No calls\")\n\n    print(f\"\\n{'Feature':\u0026lt;20} {'Model':\u0026lt;20} {'Calls':\u0026gt;8} 
{'Cost':\u0026gt;10} {'Avg Latency':\u0026gt;12}\")\n    print(\"-\" * 72)\n\n    for stat in stats[:20]:  # Top 20\n        print(\n            f\"{stat['feature']:\u0026lt;20} \"\n            f\"{stat['model']:\u0026lt;20} \"\n            f\"{stat['calls']:\u0026gt;8,} \"\n            f\"${stat['total_cost']:\u0026gt;9.4f} \"\n            f\"{stat['avg_latency']:\u0026gt;10.0f}ms\"\n        )\n\n    print(f\"\\nTop cost driver: {stats[0]['feature'] if stats else 'N\/A'}\")\n\n# Usage\nclient = TrackedClaudeClient()\n\n# Use it like the normal client\nresponse = client.create(\n    feature=\"blog_generation\",\n    user_id=\"user_123\",\n    model=\"claude-haiku-4-5\",\n    max_tokens=1000,\n    messages=[{\"role\": \"user\", \"content\": \"Write a short blog post about Vietnam tech scene\"}]\n)\n\nprint_cost_report(client, days=7)\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eOptimizing Cost Based on Usage Data\u003c\/h2\u003e\n\n\u003cp\u003eOnce you have usage data, these are the most common optimizations:\u003c\/p\u003e\n\n\u003cul\u003e\n  \u003cli\u003e\n\u003cstrong\u003eModel right-sizing\u003c\/strong\u003e: If feature X uses Opus but only needs simple summarization, switch to Haiku. This can cut cost by 10-20x.\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003ePrompt Caching\u003c\/strong\u003e: Are long system prompts sent many times? Enable caching to cut the cost of cached input tokens by up to 90%.\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eOutput length control\u003c\/strong\u003e: Set an appropriate \u003ccode\u003emax_tokens\u003c\/code\u003e. 
Many features do not need 4000 tokens.\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eBatch processing\u003c\/strong\u003e: Instead of N individual calls, use the Batch API (50% discount) for non-urgent tasks.\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eInput compression\u003c\/strong\u003e: Summarize long documents before sending them instead of sending the full text.\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2\u003eSummary\u003c\/h2\u003e\n\n\u003cp\u003eCost visibility is the first step toward optimizing API spending. With the Usage API, per-request tracking, and budget alerts, you always know where the money is going and can act as soon as spending rises abnormally.\u003c\/p\u003e\n\n\u003cp\u003eSee also: \u003ca href=\"\/en\/collections\/nang-cao\"\u003ePrompt Caching\u003c\/a\u003e and \u003ca href=\"\/en\/collections\/nang-cao\"\u003eSpeculative Caching\u003c\/a\u003e, the two most effective cost-reduction techniques.\u003c\/p\u003e\n","brand":"Minh Tuấn","offers":[{"title":"Default Title","offer_id":47721899229396,"sku":null,"price":0.0,"currency_code":"VND","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0821\/0264\/9044\/files\/usage-cost-api-theo-doi-chi-phi-claude-api-real-time.jpg?v=1774521777","url":"https:\/\/claude.vn\/en\/products\/usage-cost-api-theo-doi-chi-phi-claude-api-real-time","provider":"CLAUDE.VN","version":"1.0","type":"link"}