{"product_id":"fine-tuning-claude-tren-aws-bedrock-hướng-dẫn-từng-bước","title":"Fine-tuning Claude on AWS Bedrock: A Step-by-Step Guide","description":"\n\u003cp\u003ePrompt engineering and RAG cover roughly 90% of use cases. But when you need Claude to adapt \u003cem\u003efully\u003c\/em\u003e to your organization's domain-specific language, style, or format, \u003cstrong\u003efine-tuning\u003c\/strong\u003e is the next step. AWS Bedrock is the official platform for fine-tuning Claude in a managed, secure way.\u003c\/p\u003e\n\n\u003ch2\u003eFine-tuning vs Prompt Engineering: When Should You Fine-tune?\u003c\/h2\u003e\n\n\u003cp\u003eFine-tuning is NOT always the right answer. Weigh the alternatives carefully before investing in this more complex process:\u003c\/p\u003e\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n\u003cth\u003eSituation\u003c\/th\u003e\n\u003cth\u003eBetter solution\u003c\/th\u003e\n\u003c\/tr\u003e\n  \u003c\/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNeed a specific output format\u003c\/td\u003e\n\u003ctd\u003ePrompt engineering + structured output\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNeed new domain knowledge\u003c\/td\u003e\n\u003ctd\u003eRAG: inject the knowledge into the context\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNeed a distinctive writing style\u003c\/td\u003e\n\u003ctd\u003eFine-tuning (if the style is consistent and the dataset is large)\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNeed lower latency\u003c\/td\u003e\n\u003ctd\u003ePrompt caching + smaller model\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNeed consistent domain jargon + tone\u003c\/td\u003e\n\u003ctd\u003eFine-tuning (good use case)\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eHave 1000+ consistent examples\u003c\/td\u003e\n\u003ctd\u003eFine-tuning (enough data)\u003c\/td\u003e\n\u003c\/tr\u003e\n  
\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cp\u003e\u003cstrong\u003eRule of thumb:\u003c\/strong\u003e If you can get good results with \u0026lt; 10 examples in the prompt, use few-shot prompting instead of fine-tuning.\u003c\/p\u003e\n\n\u003ch2\u003eStep 1: Set Up AWS Bedrock\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport boto3\nimport json\nimport os\n\n# Set up AWS credentials\nsession = boto3.Session(\n    aws_access_key_id=os.environ.get(\"AWS_ACCESS_KEY_ID\"),\n    aws_secret_access_key=os.environ.get(\"AWS_SECRET_ACCESS_KEY\"),\n    region_name=\"us-west-2\"  # Claude fine-tuning is region-limited (us-west-2 at the time of writing); verify current availability\n)\n\nbedrock_client = session.client('bedrock')\nbedrock_runtime = session.client('bedrock-runtime')\n\n# Check which models are available for fine-tuning\ndef list_available_base_models():\n    \"\"\"List Claude models that support fine-tuning\"\"\"\n    response = bedrock_client.list_foundation_models(\n        byProvider=\"Anthropic\",\n        byOutputModality=\"TEXT\",\n        byCustomizationType=\"FINE_TUNING\"\n    )\n    for model in response['modelSummaries']:\n        print(f\"Model: {model['modelId']}\")\n        print(f\"  Name: {model['modelName']}\")\n        print(f\"  Status: {model['modelLifecycle']['status']}\")\n    return response['modelSummaries']\n\navailable_models = list_available_base_models()\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 2: Prepare Training Data\u003c\/h2\u003e\n\n\u003cp\u003eFine-tuning requires data in a specific JSONL format. 
Each line is one training example:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003e# Raw prompt\/completion pairs (kept in Vietnamese, the target style); prepare_training_data() converts them to the Bedrock format\ntraining_examples = [\n    {\n        \"prompt\": \"Viết email từ chối ứng viên nhẹ nhàng cho vị trí Senior Developer\",\n        \"completion\": \"\"\"Kính gửi [Tên ứng viên],\n\nCảm ơn bạn đã dành thời gian tham gia quá trình tuyển dụng cho vị trí Senior Developer tại [Công ty].\n\nSau khi xem xét kỹ lưỡng, chúng tôi quyết định tiếp tục với ứng viên có profile phù hợp hơn với nhu cầu hiện tại của team. Quyết định này không hề dễ dàng vì bạn có nhiều điểm mạnh đáng ghi nhận.\n\nChúng tôi sẽ lưu CV của bạn để xem xét cho các cơ hội phù hợp trong tương lai. Chúc bạn thành công trong sự nghiệp.\n\nTrân trọng,\n[Tên HR]\"\"\"\n    },\n    {\n        \"prompt\": \"Viết email xác nhận offer letter cho ứng viên đã được nhận\",\n        \"completion\": \"\"\"Kính gửi [Tên ứng viên],\n\nThay mặt Ban Giám đốc [Công ty], tôi vui mừng thông báo bạn đã được chính thức chấp nhận vào vị trí [Chức vụ].\n\nChi tiết offer:\n- Vị trí: [Chức vụ]\n- Mức lương: [Số tiền] VND\/tháng\n- Ngày bắt đầu: [Ngày]\n- Địa điểm: [Văn phòng]\n\nVui lòng xác nhận nhận email này và phản hồi trong vòng 3 ngày làm việc. Chúng tôi sẽ gửi hợp đồng chính thức sau khi nhận được xác nhận của bạn.\n\nChào mừng bạn đến với đội ngũ [Công ty]!\n\nTrân trọng,\n[Tên HR]\"\"\"\n    }\n    # ... 
at least 100-1000 examples for effective fine-tuning\n]\n\ndef prepare_training_data(examples: list, output_path: str):\n    \"\"\"Write the examples to a JSONL file for Bedrock fine-tuning\"\"\"\n    with open(output_path, 'w', encoding='utf-8') as f:\n        for example in examples:\n            # Bedrock conversational format: one JSON object per line with a \"messages\" list\n            # (an optional top-level \"system\" string is also accepted)\n            formatted = {\n                \"messages\": [\n                    {\"role\": \"user\", \"content\": example[\"prompt\"]},\n                    {\"role\": \"assistant\", \"content\": example[\"completion\"]}\n                ]\n            }\n            # Escaped newline so each example ends up on its own line\n            f.write(json.dumps(formatted, ensure_ascii=False) + \"\\n\")\n\n    print(f\"Prepared {len(examples)} training examples -\u0026gt; {output_path}\")\n    return output_path\n\ntraining_file = prepare_training_data(training_examples, \"\/tmp\/training_data.jsonl\")\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 3: Check Training Data Quality\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003edef validate_training_data(file_path: str) -\u0026gt; dict:\n    \"\"\"Validate training data before uploading\"\"\"\n    issues = []\n    stats = {\n        \"total_examples\": 0,\n        \"avg_prompt_length\": 0,\n        \"avg_completion_length\": 0,\n        \"min_completion_tokens\": float('inf'),\n        \"max_completion_tokens\": 0\n    }\n\n    prompt_lengths = []\n    completion_lengths = []\n\n    with open(file_path, 'r', encoding='utf-8') as f:\n        for line_num, line in enumerate(f, 1):\n            try:\n                example = json.loads(line.strip())\n                messages = example.get(\"messages\", [])\n\n                if len(messages) \u0026lt; 2:\n                    issues.append(f\"Line {line_num}: Need at least 2 messages (user + assistant)\")\n                    continue\n\n                user_msg = next((m for m in messages if m[\"role\"] == \"user\"), None)\n                
assistant_msg = next((m for m in messages if m[\"role\"] == \"assistant\"), None)\n\n                if not user_msg:\n                    issues.append(f\"Line {line_num}: Missing user message\")\n                if not assistant_msg:\n                    issues.append(f\"Line {line_num}: Missing assistant message\")\n\n                # Check lengths\n                user_len = len(user_msg[\"content\"]) if user_msg else 0\n                asst_len = len(assistant_msg[\"content\"]) if assistant_msg else 0\n\n                if user_len \u0026lt; 10:\n                    issues.append(f\"Line {line_num}: Prompt too short ({user_len} chars)\")\n                if asst_len \u0026lt; 20:\n                    issues.append(f\"Line {line_num}: Completion too short ({asst_len} chars)\")\n\n                prompt_lengths.append(user_len)\n                completion_lengths.append(asst_len)\n                stats[\"total_examples\"] += 1\n\n            except json.JSONDecodeError as e:\n                issues.append(f\"Line {line_num}: JSON parse error: {e}\")\n\n    if prompt_lengths:\n        stats[\"avg_prompt_length\"] = sum(prompt_lengths) \/ len(prompt_lengths)\n    if completion_lengths:\n        stats[\"avg_completion_length\"] = sum(completion_lengths) \/ len(completion_lengths)\n        stats[\"min_completion_tokens\"] = min(completion_lengths)\n        stats[\"max_completion_tokens\"] = max(completion_lengths)\n\n    print(f\"Validation Results:\")\n    print(f\"  Total examples: {stats['total_examples']}\")\n    print(f\"  Avg prompt: {stats['avg_prompt_length']:.0f} chars\")\n    print(f\"  Avg completion: {stats['avg_completion_length']:.0f} chars\")\n    print(f\"  Issues found: {len(issues)}\")\n    for issue in issues[:5]:  # Show first 5 issues\n        print(f\"  - {issue}\")\n\n    return {\"stats\": stats, \"issues\": issues, \"valid\": len(issues) == 0}\n\nvalidation = 
validate_training_data(training_file)\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 4: Upload Data to S3\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport boto3\nfrom datetime import datetime\n\ns3_client = session.client('s3')\n\ndef upload_training_data(local_path: str, bucket: str, prefix: str = \"bedrock-ft\") -\u0026gt; str:\n    \"\"\"Upload training data to S3\"\"\"\n    timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n    s3_key = f\"{prefix}\/training\/{timestamp}\/training_data.jsonl\"\n\n    s3_client.upload_file(local_path, bucket, s3_key)\n    s3_uri = f\"s3:\/\/{bucket}\/{s3_key}\"\n    print(f\"Uploaded to: {s3_uri}\")\n    return s3_uri\n\nS3_BUCKET = \"my-bedrock-finetuning-bucket\"\ntraining_s3_uri = upload_training_data(training_file, S3_BUCKET)\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 5: Create the Fine-tuning Job\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003edef create_finetuning_job(\n    training_data_uri: str,\n    output_s3_uri: str,\n    # Prefer an ID returned by list_available_base_models(); the fine-tunable variant may carry a suffix such as \":200k\"\n    base_model_id: str = \"anthropic.claude-3-haiku-20240307-v1:0:200k\",\n    job_name: str = None,\n    num_epochs: int = 2,\n    batch_size: int = 8,\n    learning_rate_multiplier: float = 1.0\n) -\u0026gt; str:\n    \"\"\"Create and start the fine-tuning job\"\"\"\n\n    if not job_name:\n        job_name = f\"claude-ft-{datetime.now().strftime('%Y%m%d-%H%M%S')}\"\n\n    response = bedrock_client.create_model_customization_job(\n        jobName=job_name,\n        customModelName=f\"{job_name}-model\",  # required: name of the resulting custom model\n        baseModelIdentifier=base_model_id,\n        customizationType=\"FINE_TUNING\",\n        roleArn=os.environ.get(\"BEDROCK_ROLE_ARN\"),  # IAM role with Bedrock + S3 access\n        trainingDataConfig={\n            \"s3Uri\": training_data_uri\n        },\n        outputDataConfig={\n            \"s3Uri\": output_s3_uri\n        },\n        hyperParameters={\n            \"epochCount\": str(num_epochs),\n            \"batchSize\": str(batch_size),\n            # Claude 3 Haiku on Bedrock takes a multiplier, not an absolute learning rate\n            \"learningRateMultiplier\": str(learning_rate_multiplier)\n        }\n    )\n\n    job_arn = 
response['jobArn']\n    print(f\"Fine-tuning job created: {job_arn}\")\n    return job_arn\n\njob_arn = create_finetuning_job(\n    training_data_uri=training_s3_uri,\n    output_s3_uri=f\"s3:\/\/{S3_BUCKET}\/bedrock-ft\/output\/\",\n    num_epochs=3,\n    batch_size=8,\n    learning_rate_multiplier=1.0\n)\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 6: Monitor Training Progress\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport time\n\ndef monitor_training_job(job_arn: str, poll_interval: int = 60) -\u0026gt; dict:\n    \"\"\"Monitor the fine-tuning job until it finishes\"\"\"\n    print(f\"Monitoring job: {job_arn}\")\n    start_time = datetime.now()\n\n    while True:\n        response = bedrock_client.get_model_customization_job(\n            jobIdentifier=job_arn\n        )\n\n        status = response['status']\n        elapsed = (datetime.now() - start_time).total_seconds() \/ 60\n\n        print(f\"[{elapsed:.0f}min] Status: {status}\")\n\n        if status == \"Completed\":\n            print(f\"Training completed!\")\n            print(f\"Output model ARN: {response.get('outputModelArn')}\")\n            return {\n                \"status\": \"completed\",\n                \"model_arn\": response.get('outputModelArn'),\n                \"duration_minutes\": elapsed\n            }\n\n        elif status in [\"Failed\", \"Stopped\"]:\n            print(f\"Training {status}\")\n            print(f\"Reason: {response.get('failureMessage', 'Unknown')}\")\n            return {\"status\": status.lower(), \"error\": response.get('failureMessage')}\n\n        elif status in [\"InProgress\", \"Stopping\"]:\n            # Log training metrics if available\n            metrics = response.get('trainingMetrics', {})\n            if metrics:\n                print(f\"  Training loss: {metrics.get('trainingLoss', 'N\/A')}\")\n\n        time.sleep(poll_interval)\n\nresult = monitor_training_job(job_arn)\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 7: 
Evaluate Fine-tuned Model\u003c\/h2\u003e\n\n\u003cpre\u003e\u003ccode\u003edef evaluate_model(\n    finetuned_model_arn: str,\n    test_prompts: list[dict],\n    base_model_id: str = \"anthropic.claude-3-haiku-20240307-v1:0\"\n) -\u0026gt; dict:\n    \"\"\"Compare the fine-tuned model against the base model\"\"\"\n\n    results = []\n\n    for test in test_prompts:\n        # Call fine-tuned model (custom models are typically served through\n        # Provisioned Throughput, so pass that ARN here if on-demand invocation fails)\n        ft_response = bedrock_runtime.invoke_model(\n            modelId=finetuned_model_arn,\n            body=json.dumps({\n                \"anthropic_version\": \"bedrock-2023-05-31\",\n                \"max_tokens\": 1000,\n                \"messages\": [{\"role\": \"user\", \"content\": test[\"prompt\"]}]\n            })\n        )\n        ft_result = json.loads(ft_response['body'].read())\n        ft_output = ft_result['content'][0]['text']\n\n        # Call base model\n        base_response = bedrock_runtime.invoke_model(\n            modelId=base_model_id,\n            body=json.dumps({\n                \"anthropic_version\": \"bedrock-2023-05-31\",\n                \"max_tokens\": 1000,\n                \"messages\": [{\"role\": \"user\", \"content\": test[\"prompt\"]}]\n            })\n        )\n        base_result = json.loads(base_response['body'].read())\n        base_output = base_result['content'][0]['text']\n\n        results.append({\n            \"prompt\": test[\"prompt\"],\n            \"expected\": test.get(\"expected\", \"\"),\n            \"finetuned_output\": ft_output,\n            \"base_output\": base_output\n        })\n\n    # Use Claude to score the results\n    scored_results = score_with_claude(results)\n    return scored_results\n\ndef score_with_claude(results: list) -\u0026gt; dict:\n    \"\"\"Use Claude as a judge to evaluate the results\"\"\"\n    import anthropic\n    client = anthropic.Anthropic()\n\n    scores = {\"finetuned\": [], \"base\": []}\n\n    for result in results[:5]:  # Sample 5 for evaluation\n        response = 
client.messages.create(\n            model=\"claude-opus-4-5\",\n            max_tokens=500,\n            messages=[{\n                \"role\": \"user\",\n                \"content\": f\"\"\"Compare these two responses to the prompt. Rate each 1-10 for quality, style-match, and accuracy.\n\nPrompt: {result['prompt']}\nExpected style: {result.get('expected', 'Professional Vietnamese business email')}\n\nResponse A (Fine-tuned): {result['finetuned_output'][:300]}\nResponse B (Base model): {result['base_output'][:300]}\n\nScore format: A:X B:Y (just numbers)\"\"\"\n            }]\n        )\n        text = response.content[0].text\n        # Parse scores\n        try:\n            parts = text.split()\n            a_score = float([p for p in parts if p.startswith(\"A:\")][0].split(\":\")[1])\n            b_score = float([p for p in parts if p.startswith(\"B:\")][0].split(\":\")[1])\n            scores[\"finetuned\"].append(a_score)\n            scores[\"base\"].append(b_score)\n        except (IndexError, ValueError):\n            # Skip judge replies that do not follow the \"A:X B:Y\" format\n            continue\n\n    avg_ft = sum(scores[\"finetuned\"]) \/ len(scores[\"finetuned\"]) if scores[\"finetuned\"] else 0\n    avg_base = sum(scores[\"base\"]) \/ len(scores[\"base\"]) if scores[\"base\"] else 0\n\n    print(f\"Fine-tuned model score: {avg_ft:.1f}\/10\")\n    print(f\"Base model score: {avg_base:.1f}\/10\")\n    print(f\"Improvement: {avg_ft - avg_base:+.1f} points\")\n\n    return {\"finetuned_score\": avg_ft, \"base_score\": avg_base, \"improvement\": avg_ft - avg_base}\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eFine-tuning Costs\u003c\/h2\u003e\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n\u003cth\u003eComponent\u003c\/th\u003e\n\u003cth\u003ePricing (indicative)\u003c\/th\u003e\n\u003c\/tr\u003e\n  \u003c\/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n\u003ctd\u003eTraining tokens\u003c\/td\u003e\n\u003ctd\u003e$0.008 per 1K training tokens\u003c\/td\u003e\n\u003c\/tr\u003e\n    
\u003ctr\u003e\n\u003ctd\u003eInference (fine-tuned)\u003c\/td\u003e\n\u003ctd\u003eSimilar to base model + provisioned throughput\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eStorage (S3)\u003c\/td\u003e\n\u003ctd\u003eStandard S3 rates\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eMinimum dataset\u003c\/td\u003e\n\u003ctd\u003e100+ examples (1000+ recommended)\u003c\/td\u003e\n\u003c\/tr\u003e\n  \u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cp\u003e\u003cstrong\u003eTypical training cost:\u003c\/strong\u003e 1000 examples x ~500 tokens avg = 500K tokens, or about $4 USD per epoch at the rate above. Training tokens are billed once per epoch, so a 3-epoch job comes to about $12. That is small compared with ongoing prompt engineering time.\u003c\/p\u003e\n\n\u003ch2\u003eSummary\u003c\/h2\u003e\n\n\u003cp\u003eFine-tuning Claude on AWS Bedrock is a clearly structured process: prepare data → validate → upload S3 → create job → monitor → evaluate. Success depends more on the \u003cstrong\u003equality of the training data\u003c\/strong\u003e than on its quantity.\u003c\/p\u003e\n\n\u003cp\u003eOnly fine-tune when you have 500+ high-quality examples, the task has a consistent style\/format that prompt engineering cannot achieve, and the ROI is clear relative to the cost.\u003c\/p\u003e\n\n\u003cp\u003eSee also: \u003ca href=\"\/en\/collections\/nang-cao\"\u003eTool Evaluation in Agent Systems\u003c\/a\u003e to build testing frameworks for both fine-tuned and base models.\u003c\/p\u003e\n\n\u003chr\u003e\n\u003ch3\u003eRelated articles\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"\/en\/products\/batch-processing-x%E1%BB%AD-ly-hang-lo%E1%BA%A1t-request-v%E1%BB%9Bi-claude-api\"\u003eBatch Processing: Handling Bulk Requests with the Claude API\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/en\/products\/rag-v%E1%BB%9Bi-pinecone-claude-vector-database-cho-ai\"\u003eRAG with Pinecone + Claude: A Vector Database for AI\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca 
href=\"\/en\/products\/multi-document-agent-truy-v%E1%BA%A5n-nhi%E1%BB%81u-tai-li%E1%BB%87u-v%E1%BB%9Bi-llamaindex\"\u003eMulti-Document Agent: Querying Multiple Documents with LlamaIndex\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/en\/products\/claude-cho-data-validation-va-data-quality\"\u003eClaude for Data: Validation and Data Quality\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/en\/products\/claude-cho-engineering-debug-va-x%E1%BB%AD-ly-l%E1%BB%97i\"\u003eClaude for Engineering: Debugging and Error Handling\u003c\/a\u003e\u003c\/li\u003e\n\u003c\/ul\u003e","brand":"Minh Tuấn","offers":[{"title":"Default Title","offer_id":47721899557076,"sku":null,"price":0.0,"currency_code":"VND","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0821\/0264\/9044\/files\/fine-tuning-claude-tren-aws-bedrock-h_ng-d_n-t_ng-b_c_2b5df2ad-7ce6-4fc5-866d-655f9321edb2.jpg?v=1774521780","url":"https:\/\/claude.vn\/en\/products\/fine-tuning-claude-tren-aws-bedrock-h%c6%b0%e1%bb%9bng-d%e1%ba%abn-t%e1%bb%abng-b%c6%b0%e1%bb%9bc","provider":"CLAUDE.VN","version":"1.0","type":"link"}