Exercise: Structured data for 3 use cases (Building with the Claude API)

What you will learn

Apply the prefill + stop_sequence pattern to 3 different use cases
Write validation functions for structured output
Handle the edge case where parsing fails
Build a reusable utility function for structured extraction

Assignment

Complete 3 tasks. For each task, write and test the code yourself in the notebook 03_structured.ipynb.

Task 1: Recipe extractor

Input: A text describing a dish (a paragraph in Vietnamese).

Output JSON:

{
  "dish_name": "Phở bò",
  "prep_time_minutes": 30,
  "cook_time_minutes": 120,
  "servings": 4,
  "ingredients": [
    {"name": "xương bò", "amount": 1, "unit": "kg"},
    {"name": "bánh phở", "amount": 400, "unit": "g"}
  ],
  "steps": [
    "Ninh xương 2 giờ",
    "Nêm gia vị..."
  ]
}

Validation:

prep_time_minutes and cook_time_minutes are int
ingredients has at least 3 elements, each with name, amount, unit
steps is a list of strings

Test input:

"Phở bò truyền thống Hà Nội. Chuẩn bị 30 phút, nấu 2 tiếng.
Cho 4 người ăn. Cần 1kg xương bò, 400g bánh phở, hành tím, gừng nướng.
Đầu tiên ninh xương 2 giờ với gừng hành nướng, sau đó lọc lấy nước dùng..."

Starter code:

def extract_recipe(text: str) -> dict:
    messages = [
        {"role": "user", "content": f"Extract recipe data:\n\n{text}"},
        {"role": "assistant", "content": "```json\n"}
    ]
    # TODO: call client.messages.create with stop_sequences
    # TODO: parse json
    # TODO: validate
    pass
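The validation rules for Task 1 can be sketched as a function that collects every broken rule instead of stopping at the first one. This is a minimal sketch, not the course's reference solution: the name recipe_errors and the error messages are hypothetical, and it runs offline on an already parsed dict.

```python
def recipe_errors(recipe: dict) -> list[str]:
    """Return every Task 1 rule the recipe breaks (empty list means valid)."""
    errors = []
    for key in ("prep_time_minutes", "cook_time_minutes"):
        value = recipe.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append(f"{key} must be an int")
    ingredients = recipe.get("ingredients")
    if not isinstance(ingredients, list) or len(ingredients) < 3:
        errors.append("ingredients must be a list with at least 3 items")
    else:
        for i, ing in enumerate(ingredients):
            missing = [k for k in ("name", "amount", "unit")
                       if not isinstance(ing, dict) or k not in ing]
            if missing:
                errors.append(f"ingredients[{i}] is missing {missing}")
    steps = recipe.get("steps")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        errors.append("steps must be a list of strings")
    return errors


# The sample output shown for Task 1 lists only 2 ingredients,
# so it breaks the "at least 3" rule.
sample = {
    "dish_name": "Phở bò",
    "prep_time_minutes": 30,
    "cook_time_minutes": 120,
    "servings": 4,
    "ingredients": [
        {"name": "xương bò", "amount": 1, "unit": "kg"},
        {"name": "bánh phở", "amount": 400, "unit": "g"},
    ],
    "steps": ["Ninh xương 2 giờ", "Nêm gia vị..."],
}
print(recipe_errors(sample))
# → ['ingredients must be a list with at least 3 items']
```

Returning a list of messages makes it easy to log all problems at once or feed them back into a retry prompt.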

Task 2: SQL query generator

Input: Schema + natural language question (in Vietnamese).

Output: Pure SQL string (no markdown, no comments).

Test input:

Schema:
- users(id, name, email, created_at, country)
- orders(id, user_id, amount, status, created_at)

Question: "Liệt kê top 5 user ở Việt Nam có tổng order lớn nhất tháng 3/2026"

Expected output (example):

SELECT u.id, u.name, SUM(o.amount) as total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.country = 'Vietnam'
  AND o.created_at >= '2026-03-01'
  AND o.created_at < '2026-04-01'
  AND o.status = 'completed'
GROUP BY u.id, u.name
ORDER BY total DESC
LIMIT 5;

Validation:

Output contains the keyword SELECT
Output contains no "```" fence and no explanation text
Output ends with ;

Hints:

Prefill the assistant turn with "```sql\n"
stop_sequences=["```"]
temperature=0
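The Task 2 validation rules can be sketched as a small checker that runs on the returned string. This is a minimal sketch under assumptions: the name sql_errors is hypothetical, and "no explanation" is approximated by requiring the text to start with SELECT or WITH, which is a heuristic rather than a rule stated in the exercise.

```python
import re

FENCE = "`" * 3  # the markdown code fence marker


def sql_errors(sql: str) -> list[str]:
    """Return every Task 2 rule the SQL string breaks (empty list means valid)."""
    errors = []
    text = sql.strip()
    if not re.search(r"\bSELECT\b", text, flags=re.IGNORECASE):
        errors.append("missing SELECT keyword")
    if FENCE in text:
        errors.append("contains a markdown fence")
    if not text.endswith(";"):
        errors.append("does not end with ';'")
    # Heuristic (assumption): leading prose such as "Here is the query:"
    # means the model added an explanation before the SQL.
    if not re.match(r"(SELECT|WITH)\b", text, flags=re.IGNORECASE):
        errors.append("starts with something other than SELECT/WITH")
    return errors


print(sql_errors("SELECT id FROM users LIMIT 5;"))
# → []
```

The checks are purely textual; they do not confirm that the query is valid SQL or that it answers the question.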

Task 3: Tweet sentiment batch classifier

Input: A list of 10 tweets.

Output: A list of 10 objects {tweet, sentiment, confidence}.

[
  {"tweet": "Yêu app này quá!", "sentiment": "positive", "confidence": 0.95},
  {"tweet": "Lại crash nữa rồi", "sentiment": "negative", "confidence": 0.92},
  ...
]

Validation:

Length of the output list = length of the input list
Each sentiment is one of {"positive", "negative", "neutral"}
confidence is a float between 0 and 1

Challenge: Batch-process all 10 tweets in 1 call (saves tokens compared with 10 separate calls).

Hints:

Pass the tweets into the user message as a numbered list
Prefill the assistant turn with "```json\n[" (the response continues after the prefill, so prepend the "[" before parsing)
Stop sequence: "```"
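The three validation rules for the batch classifier can be sketched as follows. This is a minimal sketch, not the course's solution: the name classification_errors is hypothetical, and it accepts an integer 0 or 1 as a confidence value because JSON does not distinguish 1 from 1.0.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def classification_errors(tweets: list[str], results: list) -> list[str]:
    """Return every Task 3 rule the parsed results break (empty list means valid)."""
    errors = []
    if len(results) != len(tweets):
        errors.append(f"expected {len(tweets)} items, got {len(results)}")
    for i, item in enumerate(results):
        if not isinstance(item, dict):
            errors.append(f"item {i} is not an object")
            continue
        sentiment = item.get("sentiment")
        if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
            errors.append(f"item {i} has invalid sentiment {sentiment!r}")
        conf = item.get("confidence")
        # Accept int or float, but not bool (a subclass of int in Python).
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0 <= conf <= 1:
            errors.append(f"item {i} has invalid confidence {conf!r}")
    return errors
```

A length mismatch is reported first because it is the most common batch failure: the model may merge or skip items when the list is long.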

Complete skeleton

import json
from anthropic import Anthropic

client = Anthropic()
model = "claude-sonnet-5-20260205"


def call_json(user_prompt: str, max_retries: int = 2) -> dict:
    """Shared utility: call Claude and parse the reply as JSON."""
    for attempt in range(max_retries + 1):
        messages = [
            {"role": "user", "content": user_prompt},
            # Prefill: the reply continues right after the opening fence
            {"role": "assistant", "content": "```json\n"}
        ]
        msg = client.messages.create(
            model=model,
            max_tokens=2000,
            messages=messages,
            stop_sequences=["```"],  # stop before the closing fence
            temperature=0
        )
        raw = msg.content[0].text.strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_retries:
                print(f"Failed after {max_retries} retries: {raw!r}")
                raise
            print(f"Retry {attempt + 1}")


# Task 1
def extract_recipe(text: str) -> dict:
    prompt = f"""Extract the recipe from the following Vietnamese text:

<text>
{text}
</text>

Schema:
- dish_name: string
- prep_time_minutes: int
- cook_time_minutes: int
- servings: int
- ingredients: [{{name, amount, unit}}]
- steps: [string]"""
    return call_json(prompt)


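def validate_recipe(recipe: dict) -> bool:
    """Raise AssertionError if the recipe breaks a Task 1 validation rule."""
    assert isinstance(recipe["prep_time_minutes"], int)
    assert isinstance(recipe["cook_time_minutes"], int)
    assert len(recipe["ingredients"]) >= 3
    for ing in recipe["ingredients"]:
        assert all(k in ing for k in ["name", "amount", "unit"])
    # steps must be a list of strings
    assert isinstance(recipe["steps"], list)
    assert all(isinstance(s, str) for s in recipe["steps"])
    return True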


# Task 2
def generate_sql(schema: str, question: str) -> str:
    messages = [
        {"role": "user", "content": f"""Schema:
{schema}

Question: {question}

Generate clean SQL query."""},
        {"role": "assistant", "content": "```sql\n"}
    ]
    msg = client.messages.create(
        model=model,
        max_tokens=500,
        messages=messages,
        stop_sequences=["```"],
        temperature=0
    )
    return msg.content[0].text.strip()


# Task 3
def classify_tweets(tweets: list) -> list:
    numbered = "\n".join(f"{i+1}. {t}" for i, t in enumerate(tweets))
    prompt = f"""Classify the sentiment of each tweet below. Return a JSON array.

Tweets:
{numbered}

Schema for each element: {{tweet: string, sentiment: "positive|negative|neutral", confidence: 0-1}}"""
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": "```json\n"}
    ]
    msg = client.messages.create(
        model=model,
        max_tokens=2000,
        messages=messages,
        stop_sequences=["```"],
        temperature=0
    )
    return json.loads(msg.content[0].text.strip())


# === Test ===
recipe = extract_recipe("Phở bò truyền thống Hà Nội...")
print(recipe)
validate_recipe(recipe)

sql = generate_sql("users(id, name, country)\norders(id, user_id, amount)",
                    "Top 5 user Vietnam có tổng order nhiều nhất")
print(sql)

tweets = [
    "Yêu app này quá!",
    "Lại crash nữa rồi",
    "OK, không tệ",
    # ... add 7 more tweets
]
results = classify_tweets(tweets)
print(json.dumps(results, ensure_ascii=False, indent=2))

Self-review checklist

[ ] All 3 tasks run without crashing
[ ] Output matches the expected schema
[ ] Validation catches malformed output
[ ] Retry logic is triggered at least once (you can test this by sabotaging the call with temperature=1)
[ ] call_json is a utility that can be reused for other tasks
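One way to tick the retry item without waiting for a real malformed reply is to exercise the retry loop against a stub. The sketch below is an assumption-laden illustration: parse_with_retry and fake_fetch are hypothetical names, and the fetch callable stands in for the API call made inside call_json.

```python
import json
from typing import Callable


def parse_with_retry(fetch: Callable[[], str], max_retries: int = 2):
    """Call fetch() until its result parses as JSON, allowing max_retries retries."""
    for attempt in range(max_retries + 1):
        raw = fetch().strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_retries:
                raise  # out of retries: surface the parse error


# Stub that returns truncated JSON once, then a valid object.
replies = iter(['{"a": 1', '{"a": 1}'])
calls = []


def fake_fetch() -> str:
    calls.append(1)
    return next(replies)


result = parse_with_retry(fake_fetch)
print(result, len(calls))
# → {'a': 1} 2
```

Injecting the fetch function keeps the retry logic testable offline; in the notebook, the callable would wrap the real client call.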

Debugging tips

If JSON parsing keeps failing

Print the raw output to see what Claude returned:

print("Raw:", repr(raw))

Check the quotes: Claude sometimes uses smart quotes (“...”) instead of ASCII quotes ("..."):

raw = raw.replace("\u201c", '"').replace("\u201d", '"')

Check for trailing commas: JSON does not accept {"a": 1,}:

import re
raw = re.sub(r",\s*([}\]])", r"\1", raw)

If the SQL contains markdown

Check that stop_sequences is set correctly
Check that the prefill is "```sql\n" (with the newline)

If the batch classification output has the wrong length

Be explicit in the prompt: "Return exactly 10 items"
Include the expected item count in the prompt
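The quote and trailing-comma repairs from the tips above can be combined into one helper that only touches the text when strict parsing fails. This is a minimal sketch; repair_and_parse is a hypothetical name, and both repairs are blunt text substitutions that could alter string values, which is why they run only as a fallback.

```python
import json
import re


def repair_and_parse(raw: str):
    """Parse JSON strictly first; on failure, apply the two repairs and retry once."""
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Repair 1: replace smart double quotes with ASCII quotes.
    repaired = text.replace("\u201c", '"').replace("\u201d", '"')
    # Repair 2: drop a comma that directly precedes a closing } or ].
    repaired = re.sub(r",\s*([}\]])", r"\1", repaired)
    return json.loads(repaired)  # still raises if the text cannot be repaired


print(repair_and_parse('{"a": 1,}'))
# → {'a': 1}
```

Valid JSON is returned untouched, so a string value that happens to contain ",}" is not modified unless the first parse already failed.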

Summary

🎯 3 tasks = 3 structured output patterns: nested JSON, plain SQL, and a batched JSON array.

🎯 call_json utility: abstracts the common pattern so every later JSON task can reuse it.

🎯 Validation is a safety net: it catches malformed output before it is used downstream.

🎯 Batch processing saves tokens: 10 tweets in 1 call costs fewer tokens than 10 calls with 1 tweet each, because the instructions and schema are sent only once.
