{"product_id":"trich-xuất-json-co-cấu-truc-với-tool-use-khong-cần-regex","title":"Structured JSON Extraction with Tool Use: No Regex Needed","description":"\n\u003cp\u003eOne of the biggest challenges of running an LLM in production is \u003cstrong\u003einconsistent output\u003c\/strong\u003e. Claude might answer \u003cem\u003e\"Sentiment: Positive\"\u003c\/em\u003e, or \u003cem\u003e\"Tích cực\"\u003c\/em\u003e (Vietnamese for \"positive\"), or \u003cem\u003e\"This is a positive review because...\"\u003c\/em\u003e, seemingly at random. That is a nightmare when you are building an automated pipeline.\u003c\/p\u003e\n\n\u003cp\u003eTool Use solves this elegantly: instead of asking Claude to answer in text, you \u003cstrong\u003edefine a dummy tool\u003c\/strong\u003e whose input schema is exactly the JSON structure you need. Claude \"calls the tool\" by filling in that schema, and you read the \u003ccode\u003einput\u003c\/code\u003e field of the resulting \u003ccode\u003etool_use\u003c\/code\u003e block instead of parsing text.\u003c\/p\u003e\n\n\u003cp\u003eThe trick is especially convenient because you do not execute the tool, do not send a tool_result, and do not need a multi-turn loop. 
A single API call is enough, and you read the structured data straight from the response.\u003c\/p\u003e\n\n\u003ch2\u003eExample 1: Structured article summary\u003c\/h2\u003e\n\n\u003cp\u003eSuppose you need to extract information from a news article: the author, the main topics, a short summary, and a quality score.\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003eimport anthropic\n\nclient = anthropic.Anthropic()\n\n# Define the \"tool\": it is really just the output schema you want\narticle_summarizer = {\n    \"name\": \"print_article_summary\",\n    \"description\": \"Summarize the article in a structured format\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"author\": {\n                \"type\": \"string\",\n                \"description\": \"Name of the article author\"\n            },\n            \"topics\": {\n                \"type\": \"array\",\n                \"items\": {\"type\": \"string\"},\n                \"description\": \"List of main topics (at most 5)\"\n            },\n            \"summary\": {\n                \"type\": \"string\",\n                \"description\": \"Concise summary in 2-3 sentences\"\n            },\n            \"quality_score\": {\n                \"type\": \"number\",\n                \"description\": \"Quality score from 0 to 10\"\n            }\n        },\n        \"required\": [\"author\", \"topics\", \"summary\", \"quality_score\"]\n    }\n}\n\narticle_text = \"\"\"\nTac gia: Nguyen Van A\nTieu de: Trien vong AI trong y te Viet Nam 2026\n...noi dung bai viet...\n\"\"\"\n\nresponse = client.messages.create(\n    model=\"claude-opus-4-5\",\n    max_tokens=1024,\n    tools=[article_summarizer],\n    tool_choice={\"type\": \"tool\", \"name\": \"print_article_summary\"},\n    messages=[{\n        \"role\": \"user\",\n        \"content\": f\"Summarize this article: {article_text}\"\n    }]\n)\n\n# Read the result: no text parsing needed.\n# With a forced tool_choice, the response should start with the tool_use block.\ntool_use = response.content[0]\nsummary_data = tool_use.input\n\nprint(f\"Author: 
{summary_data['author']}\")\nprint(f\"Topics: {', '.join(summary_data['topics'])}\")\nprint(f\"Score: {summary_data['quality_score']}\/10\")\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eNote \u003ccode\u003etool_choice={\"type\": \"tool\", \"name\": \"print_article_summary\"}\u003c\/code\u003e: this \u003cstrong\u003eforces\u003c\/strong\u003e Claude to call that specific tool. It cannot reply in plain text or choose a different tool.\u003c\/p\u003e\n\n\u003ch2\u003eExample 2: Named Entity Recognition (NER)\u003c\/h2\u003e\n\n\u003cp\u003eNER is the task of extracting named entities (people, organizations, locations) from text. It used to require a specialized NLP model; now Claude can do it with a few lines of code:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003ener_tool = {\n    \"name\": \"extract_entities\",\n    \"description\": \"Extract named entities from the text\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"people\": {\n                \"type\": \"array\",\n                \"items\": {\"type\": \"string\"},\n                \"description\": \"Names of people mentioned\"\n            },\n            \"organizations\": {\n                \"type\": \"array\",\n                \"items\": {\"type\": \"string\"},\n                \"description\": \"Names of organizations, companies, and agencies\"\n            },\n            \"locations\": {\n                \"type\": \"array\",\n                \"items\": {\"type\": \"string\"},\n                \"description\": \"Names of places, countries, and cities\"\n            },\n            \"dates\": {\n                \"type\": \"array\",\n                \"items\": {\"type\": \"string\"},\n                \"description\": \"Dates and times mentioned\"\n            }\n        },\n        \"required\": [\"people\", \"organizations\", \"locations\", \"dates\"]\n    }\n}\n\ntext = \"\"\"\nNguyen Van Binh, CEO cua VinAI, vua ky ket hop tac\nvoi Google DeepMind tai Ha Noi ngay 15\/3\/2026.\nThoa thuan tri gia 50 
trieu USD nham phat trien\nhe thong AI cho thi truong Dong Nam A.\n\"\"\"\n\nresponse = client.messages.create(\n    model=\"claude-opus-4-5\",\n    max_tokens=512,\n    tools=[ner_tool],\n    tool_choice={\"type\": \"tool\", \"name\": \"extract_entities\"},\n    messages=[{\"role\": \"user\", \"content\": f\"Extract the entities: {text}\"}]\n)\n\nentities = response.content[0].input\n# Example of the expected result (actual output may vary):\n# entities = {\n#   \"people\": [\"Nguyen Van Binh\"],\n#   \"organizations\": [\"VinAI\", \"Google DeepMind\"],\n#   \"locations\": [\"Ha Noi\", \"Dong Nam A\"],\n#   \"dates\": [\"15\/3\/2026\"]\n# }\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eExample 3: Multi-dimensional sentiment analysis\u003c\/h2\u003e\n\n\u003cp\u003eInstead of just Positive\/Negative, you can request a detailed sentiment analysis with scores and reasoning:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003esentiment_tool = {\n    \"name\": \"analyze_sentiment\",\n    \"description\": \"Analyze the sentiment of a text along several dimensions\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"overall_sentiment\": {\n                \"type\": \"string\",\n                \"enum\": [\"very_positive\", \"positive\", \"neutral\", \"negative\", \"very_negative\"]\n            },\n            \"score\": {\n                \"type\": \"number\",\n                \"description\": \"Sentiment score from -1.0 (very negative) to 1.0 (very positive)\"\n            },\n            \"emotions\": {\n                \"type\": \"array\",\n                \"items\": {\n                    \"type\": \"object\",\n                    \"properties\": {\n                        \"emotion\": {\"type\": \"string\"},\n                        \"intensity\": {\"type\": \"number\", \"description\": \"0.0 to 1.0\"}\n                    },\n                    \"required\": [\"emotion\", \"intensity\"]\n                },\n                \"description\": \"Specific emotions detected\"\n            },\n            \"key_phrases\": {\n                \"type\": \"array\",\n             
   \"items\": {\"type\": \"string\"},\n                \"description\": \"Key phrases that drive the sentiment\"\n            },\n            \"reasoning\": {\n                \"type\": \"string\",\n                \"description\": \"Brief explanation of the analysis result\"\n            }\n        },\n        \"required\": [\"overall_sentiment\", \"score\", \"emotions\", \"key_phrases\", \"reasoning\"]\n    }\n}\n\nreview = \"\"\"\nSan pham nay that su vuot ngoai mong doi! Chat lieu tot, giao hang\nnhanh hon du kien. Tuy nhien, huong dan su dung kha kho hieu,\nphai doc nhieu lan moi hieu. Nhin chung, toi rat hai long va\nse mua lai lan sau.\n\"\"\"\n\nresponse = client.messages.create(\n    model=\"claude-opus-4-5\",\n    max_tokens=512,\n    tools=[sentiment_tool],\n    tool_choice={\"type\": \"tool\", \"name\": \"analyze_sentiment\"},\n    messages=[{\"role\": \"user\", \"content\": review}]\n)\n\nresult = response.content[0].input\nprint(f\"Sentiment: {result['overall_sentiment']} (score: {result['score']})\")\nprint(f\"Emotions: {result['emotions']}\")\nprint(f\"Reasoning: {result['reasoning']}\")\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eExample 4: Multi-label text classification\u003c\/h2\u003e\n\n\u003cp\u003eClassify content into several categories at once, a common need in content moderation and tagging:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003eclassifier_tool = {\n    \"name\": \"classify_content\",\n    \"description\": \"Classify content into categories\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"primary_category\": {\n                \"type\": \"string\",\n                \"enum\": [\"technology\", \"business\", \"health\", \"entertainment\", \"sports\", \"politics\", \"other\"]\n            },\n            \"secondary_categories\": {\n                \"type\": \"array\",\n                \"items\": {\n                    \"type\": \"string\",\n                    \"enum\": [\"AI\", 
\"finance\", \"startup\", \"mobile\", \"cloud\", \"education\", \"sustainability\"]\n                },\n                \"description\": \"Secondary categories (can be several)\"\n            },\n            \"target_audience\": {\n                \"type\": \"string\",\n                \"enum\": [\"general\", \"professional\", \"student\", \"developer\", \"executive\"]\n            },\n            \"content_maturity\": {\n                \"type\": \"string\",\n                \"enum\": [\"all_ages\", \"teen\", \"adult\"]\n            },\n            \"confidence\": {\n                \"type\": \"number\",\n                \"description\": \"Classification confidence from 0.0 to 1.0\"\n            }\n        },\n        \"required\": [\"primary_category\", \"secondary_categories\", \"target_audience\", \"content_maturity\", \"confidence\"]\n    }\n}\n\n# Usage is the same as in the examples above\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eExample 5: Extraction with unknown keys\u003c\/h2\u003e\n\n\u003cp\u003eThis is the most interesting case: you do not know the keys in advance. For example, extracting technical specifications from a product description:\u003c\/p\u003e\n\n\u003cpre\u003e\u003ccode\u003eflexible_extractor = {\n    \"name\": \"extract_specifications\",\n    \"description\": \"Extract technical specifications from a product description\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"product_name\": {\"type\": \"string\"},\n            \"specifications\": {\n                \"type\": \"object\",\n                \"description\": \"Key-value pairs of technical specifications. 
Each key is a specification name and each value is its value.\",\n                \"additionalProperties\": {\n                    \"type\": \"string\"\n                }\n            },\n            \"price_vnd\": {\n                \"type\": \"number\",\n                \"description\": \"Product price in VND; omit this field if no price is given\"\n            }\n        },\n        \"required\": [\"product_name\", \"specifications\"]\n    }\n}\n\nproduct_description = \"\"\"\nLaptop Dell XPS 15 9530 - Bao hanh 12 thang\nProcessor: Intel Core i9-13900H, 24 cores\nRAM: 32GB DDR5 4800MHz\nStorage: 1TB NVMe SSD PCIe Gen 4\nDisplay: 15.6 inch OLED, 3.5K 120Hz\nGPU: NVIDIA RTX 4060 8GB\nGia: 45.990.000d\n\"\"\"\n\nresponse = client.messages.create(\n    model=\"claude-opus-4-5\",\n    max_tokens=512,\n    tools=[flexible_extractor],\n    tool_choice={\"type\": \"tool\", \"name\": \"extract_specifications\"},\n    messages=[{\"role\": \"user\", \"content\": product_description}]\n)\n\nspecs = response.content[0].input\nprint(f\"Product: {specs['product_name']}\")\nfor key, value in specs['specifications'].items():\n    print(f\"  {key}: {value}\")\n# Example output (actual keys and values may vary):\n# Product: Dell XPS 15 9530\n#   Processor: Intel Core i9-13900H, 24 cores\n#   RAM: 32GB DDR5 4800MHz\n#   Storage: 1TB NVMe SSD...\n#   ...\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eClaude chooses suitable keys based on the content, so you do not need to know them in advance.\u003c\/p\u003e\n\n\u003ch2\u003eComparison: Tool Use vs. a plain prompt\u003c\/h2\u003e\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n\u003cth\u003eCriterion\u003c\/th\u003e\n\u003cth\u003ePlain prompt\u003c\/th\u003e\n\u003cth\u003eTool Use (Structured)\u003c\/th\u003e\n\u003c\/tr\u003e\n  \u003c\/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n\u003ctd\u003eOutput format\u003c\/td\u003e\n\u003ctd\u003eCan vary\u003c\/td\u003e\n\u003ctd\u003eShaped by the schema\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eParse 
complexity\u003c\/td\u003e\n\u003ctd\u003eNeeds regex\/text parsing\u003c\/td\u003e\n\u003ctd\u003eDirect JSON\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eMissing fields\u003c\/td\u003e\n\u003ctd\u003eNot guaranteed\u003c\/td\u003e\n\u003ctd\u003eRequired fields normally present; validate to be sure\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eType safety\u003c\/td\u003e\n\u003ctd\u003eNone\u003c\/td\u003e\n\u003ctd\u003eGuided by the schema; validate in your code\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNested data\u003c\/td\u003e\n\u003ctd\u003eHard to parse\u003c\/td\u003e\n\u003ctd\u003eEasy\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eToken cost\u003c\/td\u003e\n\u003ctd\u003eLower\u003c\/td\u003e\n\u003ctd\u003eSlightly higher (schema tokens)\u003c\/td\u003e\n\u003c\/tr\u003e\n  \u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003ch2\u003eWhen should you use this technique?\u003c\/h2\u003e\n\n\u003cul\u003e\n  \u003cli\u003e\n\u003cstrong\u003eData extraction pipeline\u003c\/strong\u003e: processing thousands of documents that need consistent output\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eAPI response\u003c\/strong\u003e: returning JSON to the frontend without a transform step\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eDatabase population\u003c\/strong\u003e: automatically filling specific fields\u003c\/li\u003e\n  \u003cli\u003e\n\u003cstrong\u003eMulti-language NLP\u003c\/strong\u003e: NER and sentiment analysis for Vietnamese without a dedicated model\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp\u003eThis technique is especially valuable when you need \u003cstrong\u003ehigh reliability in production\u003c\/strong\u003e. 
Instead of running 1000 requests and hoping Claude always produces the right format, you constrain the output shape with a schema from the start. A validation step on the returned \u003ccode\u003einput\u003c\/code\u003e catches the rare response that still deviates.\u003c\/p\u003e\n\n\u003cp\u003eNext article: building a real Customer Service Agent, where Tool Use is not just a trick but the core mechanism that lets the chatbot query a database and process orders.\u003c\/p\u003e\n","brand":"Minh Tuấn","offers":[{"title":"Default Title","offer_id":47721764683988,"sku":null,"price":0.0,"currency_code":"VND","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0821\/0264\/9044\/files\/trich-xu_t-json-co-c_u-truc-v_i-tool-use-khong-c_n-regex.jpg?v=1774506557","url":"https:\/\/claude.vn\/products\/trich-xu%e1%ba%a5t-json-co-c%e1%ba%a5u-truc-v%e1%bb%9bi-tool-use-khong-c%e1%ba%a7n-regex","provider":"CLAUDE.VN","version":"1.0","type":"link"}