Tuần trước, đây là task thực tế:
- Nhận ra hầu hết AI failures liên quan 2 hoặc nhiều thuộc tính tương tác, không phải 1
- Chẩn đoán các failure pattern phổ biến (hallucinated citations, long-conversation drift, confidently wrong math, agreeable bad premises) bằng cách đặt tên properties liên quan
- Áp dụng targeted fix dựa vào thuộc tính limiting (nút cổ chai)
- Xây diagnostic habit: trước khi reach for prompt fix, hỏi "which properties am I looking at?"
Four properties — nhắc lại nhanh
Trước khi diagnose collision, nhắc 4 properties (nếu cần ôn, quay lại Bài 17.3, 17.6, 17.8, 17.10):
🔮 Next Token Prediction — "Generates what sounds right"
Predicts text-shape, không phải lookup facts.
🌐 Knowledge — "Knows what it was trained on"
Broad but uneven, frozen at cutoff.
🧠 Working Memory — "Attends to what's nearby"
Fixed context window, cliff failure.
🎚️ Steerability — "Follows the loudest instruction"
Pattern-matches instructions, not intent.5 collision patterns phổ biến
Collision 1: Next Token Prediction × Knowledge = Hallucinated specifics
Symptom: Hỏi về niche topic → get paper title, author names, journal, DOI — none real.
Cơ chế:
Ví dụ mạch lạc:
Fix (targeted):
Collision 2: Working Memory × Steerability = Long-conversation drift
Symptom: Set careful constraints at start; 20 messages later, half are being ignored.
Cơ chế:
Ví dụ:
- NTP: Model đang generate text fitting "what a plausible citation looks like"
- Knowledge: Có gap (niche or post-cutoff)
- Model không biết phân biệt what it knows vs what it's inventing
- ✅ Source grounding: force retrieve from real document corpus → model can only cite retrieved
- ✅ Explicit "tag fabricated": "For each citation, tag HIGH confidence (verified in your training) or LOW (inferring)"
- ✅ Use web search + verify: Research mode Claude, Perplexity
- ❌ Don't: prompt "don't fabricate" — model doesn't know it's fabricating
- Working Memory: Early context fading (lost in middle as chat grows)
- Steerability: Follows whatever instructions most salient right now
- Early constraints → distant → less salient → not applied
User: "Cite 3 papers on X (niche topic)"
Model:
- [Plausible-sounding citation 1] ← maybe real
- [Plausible-sounding citation 2] ← fabricated
- [Plausible-sounding citation 3] ← fabricatedCollision 2: Working Memory × Steerability = Long-conversation drift
Fix (targeted):
Collision 3: Next Token Prediction × Steerability = Letter-over-spirit at scale
Symptom: Instructions honored literally, intent consistently missed.
Cơ chế:
Ví dụ:
- ✅ Re-supply critical context: restate constraints near current message
- ✅ Start fresh với essentials up front (better than long drifted chat)
- ✅ Use system prompt for standing constraints (doesn't dilute)
- ✅ Use Projects/Custom Instructions — persists across chats
- ❌ Don't: assume "early instructions still apply"
- NTP: Pattern-match on words
- Steerability: Follow instruction without understanding goal
- Model picks easiest-to-match interpretation
Turn 1: "Always respond in markdown. Never use first person.
Max 200 words."
Turns 2-30: Normal chat, various topics.
Turn 31: Response in prose, uses "I think", 400 words.
Constraints still in context, but attention weight diluted.Collision 3: Next Token Prediction × Steerability = Letter-over-spirit at scale
Fix (targeted):
Collision 4: Knowledge × Steerability = Confidently wrong on niche instructions
Symptom: Give domain-specific instruction → AI follows but produces wrong output because lacks domain knowledge.
Cơ chế:
Ví dụ:
- ✅ State intent per cluster: "These 15 emails: more formal. These 5: warmer."
- ✅ IPO framework (Bài 17.10)
- ✅ Chain-of-thought: ask model reason about each case
- ❌ Don't: rely on single instruction for heterogeneous task
- Knowledge: Gap in specialized domain
- Steerability: Follows instruction confidently anyway
- Result: Confident wrong output (no signal of knowledge gap)
User: "Make these 20 emails more professional."
Model: Adds formal language to all 20.
But: 5 of the 20 had the opposite problem — too formal,
losing warmth with long-term customers.
Model applied one interpretation uniformly.Collision 4: Knowledge × Steerability = Confidently wrong on niche instructions
Fix (targeted):
Collision 5: All 4 — Long multi-step task with specific domain
Symptom: Task is long (Working Memory), domain-specific (Knowledge), multi-step (Steerability drift), requires specifics (NTP hallucination). Everything fails.
Ví dụ: "Review 50-page medical contract between pharma company and CRO, flag compliance issues under 21 CFR Part 11."
Fix (layered):
- ✅ Provide domain context: paste FrameX docs into prompt
- ✅ Ask to verify knowledge: "Do you know FrameX? If not, flag."
- ✅ Show example first: give 1-2 examples of correct FrameX tests
- ❌ Don't: assume model "probably knows"
- Working Memory: 50 pages stretching context
- Knowledge: 21 CFR Part 11 niche + regulatory
- Steerability: Multi-step review drifts
- NTP: Specifics (sections, regulatory references) hallucination risk
- ✅ Chunking (WM fix)
- ✅ RAG with 21 CFR docs (Knowledge fix)
- ✅ Checkpoints per chunk (Steerability fix)
- ✅ "Cite exact section from contract" (NTP fix)
- ✅ Human legal review of output (all fix)
User: "Write a unit test for this Python function using our
internal test framework 'FrameX'."
Model: [writes test with pytest-like syntax that LOOKS like
FrameX but isn't actually FrameX]
Model didn't flag: "I'm not familiar with FrameX."Diagnostic framework: "Which 2 collided?"
Trước khi reach for prompt fix, ask:
┌──────────────────────────────────────────────────────────┐ │ │ │ DIAGNOSTIC CHECKLIST: │ │ │ │ 1. What property question was most relevant? │ │ - Where do answers come from? (NTP) │ │ - What does it know? (Knowledge) │ │ - What's it attending to? (WM) │ │ - How much am I in control? (Steerability) │ │ │ │ 2. What was the FAILURE MODE? │ │ - Fabricated specifics? → NTP │ │ - Stale / wrong facts? → Knowledge │ │ - Ignored earlier info? → Working Memory │ │ - Literal interpretation miss intent? → Steerability │ │ │ │ 3. PAIR them up — most failures involve 2: │ │ - Fabricated + niche domain → NTP × Knowledge │ │ - Drifted + long session → WM × Steerability │ │ - Wrong output + specific instruction → K × S │ │ - Literal + varied cases → NTP × Steerability │ │ │ │ 4. Apply TARGETED fix: │ │ - NTP: verify specifics, use grounding │ │ - Knowledge: add context, use RAG/search │ │ - WM: re-supply context, chunk, start fresh │ │ - Steerability: restate goal, checkpoints, examples │ │ │ └──────────────────────────────────────────────────────────┘
Diagnostic table: Common symptoms → properties → fixes
| Symptom | Properties likely at play | Best fix |
|---|---|---|
| AI cited paper that doesn't exist | NTP × Knowledge | Source grounding or verify specifics |
| After 30 messages AI forgot rules | WM × Steerability | Restate constraints or fresh chat |
| AI confidently wrong on niche term | Knowledge × Steerability | Provide domain context upfront |
| "Made shorter" but cut key info | NTP × Steerability | State goal alongside instruction (Bài 17.10) |
| Wrong math despite clear problem | NTP × Steerability | Code execution |
| AI agreed with wrong premise | Fingerprint (sycophancy) × Steerability | Neutral framing + "disagree if I'm wrong" |
| Long doc analysis missing middle | WM × NTP | Chunk + front-load critical |
| Policy answer from outdated info | Knowledge × WM | Provide current policy in context |
| Multi-step plan goes off rails | Steerability drift + WM | Checkpoints + re-verify goal |
| AI doesn't ask clarifying question | Steerability (+ over-caution) | Instruct: "Ask if ambiguous before proceeding" |
Ví dụ theo ngành
💼 Sales Manager — "AI biến một deal thành 2 deals"
Situation: Sales manager dùng Claude để summarize 20 sales call transcripts cho team review.
Output: Claude cho 15 deal summaries — but 2 của chúng hợp nhất thành 1 deal mới bịa ra, và split 1 real deal thành 2 riêng biệt.
Diagnostic:
Collision: WM × NTP
Fix:
Kết quả: 0 lỗi merge/split. Process 2x faster.
🔍 Research Analyst — "Lit review bịa 3 paper"
Situation: Academic researcher nhờ Claude review 40 papers về emerging field.
Output: Excellent synthesis — but 3/40 paper names cited don't exist.
Diagnostic:
Collision: Knowledge × NTP (classic)
Fix:
Kết quả: 100% citation accuracy.
⚖️ Legal Counsel — "50-page contract review — AI missed 3 critical clauses"
Situation: Senior counsel review 50-page M&A contract via Claude.
Output: Found 8 issues — but missed 3 critical indemnification clauses buried in sections 4-6 (middle).
Diagnostic:
Collision: WM × Steerability
Fix:
Kết quả: Caught all 11 issues (including 3 missed earlier). Deal saved $1.5M risk.
💰 Finance Analyst — "Wrong IRR in memo"
Situation: Analyst asks Claude: "Calculate IRR for project with these cashflows: -100, 30, 35, 45, 50. Write investment memo with IRR + NPV at 10% discount."
Output: "IRR = 18.4%, NPV = $35.72M" (both approximate, off from actual).
Diagnostic:
Collision: NTP × Steerability (brittle arithmetic)
Fix:
Kết quả: Precision 100%. Memo credible.
🎧 Customer Success Manager — "Chat got worse over 30 messages"
Situation: CSM starts chat với Claude to draft renewal proposals for 10 clients. Long session, refining each.
Output: First 5 proposals — excellent, on-brand. Last 5 — drift: inconsistent tone, mixing brand styles, missing structure element from earlier.
Diagnostic:
Collision: WM × Steerability
Fix:
Kết quả: Consistency 10/10 proposals.
🏥 Clinical Research Coordinator — "AI gave wrong drug protocol"
Situation: CRC asks Claude about study drug dosing protocol for Phase III trial.
Output: Claude confidently states protocol. Later, checking with PI: several details wrong.
Diagnostic:
Collision: Knowledge × Steerability
Fix:
Kết quả: 0 errors. CRC can trust outputs.
- Working Memory: 20 transcripts ~80K tokens, pushed into context length limits
- NTP: Generating summary with normalized structure — model "filled pattern" instead of preserving unique deals
- Chunk: 5 transcripts per pass
- Per pass: extract structured output {deal_id, summary, status}
- Aggregate at end
- Verify count (20 in, 20 out)
- Knowledge: emerging field → thin coverage
- NTP: generating citations in academic format → fabrication pattern
- Upload 40 PDFs into Claude Projects
- Prompt: "Synthesize based ONLY on uploaded papers. Cite by [Paper #]. If unsure, flag [NEEDS VERIFY]"
- After draft: verify every [Paper #] references uploaded doc
- No citations from "training memory"
- Working Memory: "lost in the middle" — sections 4-6 deep in attention dead zone
- Steerability: instruction "find issues" was generic, drifted during long review
- Chunk 50 pages into 5 × 10-page sections
- Per chunk: specific instruction — "Find all indemnification, limitation of liability, termination clauses. Flag any deviation from market standard."
- Aggregate at end with dedupe
- NTP: arithmetic weakness (approximating by patterns)
- Steerability: instruction "calculate" was literal but model lacked mechanism for precision
- Code execution: model write Python, execute, use exact number
- Prompt: "Compute using Python. Present exact number. Don't approximate."
- Working Memory: early style guide + examples (Turn 1-3) now deep in history
- Steerability: latest instruction salience trumps earlier structure
- Minor: fingerprint verbosity leaking in as attention diffuses
- Setup Claude Project với style guide + brand examples as standing docs
- Per proposal: new chat trong Project (context clean, style reference loaded)
- If long session: handoff doc every 5 proposals
- Knowledge: Phase III trials are specific; protocol details niche
- Steerability: Claude followed "tell me the protocol" without flagging uncertainty
- Over-caution fingerprint absent when it should've been present (ironically)
- RAG: upload study protocol PDF
- Prompt: "Answer ONLY from uploaded protocol. If info not in protocol, say 'Not in uploaded documents.'"
- Never ask protocol questions without upload
# Output from code execution:
IRR = 18.73%
NPV = $36.23MAnti-patterns
❌ "It's just hallucination" — single-property diagnosis
Tại sao sai: Too generic → fix too generic.
Cách đúng: Name 2 properties. "NTP × Knowledge gap on niche topic" → targeted fix (RAG + grounding).
❌ "Re-prompt until it works"
Tại sao sai: Random prompting. Same failure likely repeat. Waste time.
Cách đúng: Diagnose first. Different failure types need different fixes.
❌ "Model got better, same prompt works now"
Tại sao sai: Maybe. But limitations shifted, not disappeared. Collision patterns still exist.
Cách đúng: Keep diagnostic habit. Test periodically. Boundaries move.
❌ "Single fix = solves all my problems"
Tại sao sai: Different tasks have different property profiles. One fix (e.g., "always use RAG") helps some tasks, hurts others.
Cách đúng: Per-task diagnosis. Right tool for right collision.
❌ "AI will eventually ask me if confused"
Tại sao sai: Model often picks interpretation without asking (Steerability + NTP collision).
Cách đúng: Instruct upfront: "If ambiguous, ask before proceeding."
Mẹo nâng cao
Mẹo 1: "Failure log" với property tagging
Maintain log:
Over weeks, patterns emerge. Apply preemptively.
Mẹo 2: Task complexity matrix before starting
Date | Task | Failure | Properties | Fix applied | Result
2026-03-01 | Lit review | Bịa 3 citations | NTP × Knowledge |
RAG with PDFs | 100% accurate
2026-03-05 | Long contract review | Missed middle clauses |
WM × Steerability | Chunking | Caught all
2026-03-10 | Email redraft | Letter-over-spirit | NTP × Steerability |
IPO framework | BetterMẹo 2: Task complexity matrix before starting
Mẹo 3: "Stress test" bằng cách introduce property issues
To test new prompt:
For each task, rate (1-5):
- Specificity required (NTP risk)
- Domain niche-ness (Knowledge risk)
- Content length (WM risk)
- Instruction complexity (Steerability risk)
Total > 12 → plan for collision. Use RAG + chunking + checkpoints.
Total < 8 → low risk, standard prompt.Mẹo 3: "Stress test" bằng cách introduce property issues
Mẹo 4: "Property-aware review" habit
After every AI output:
Test 1 (WM stress): Add 30 irrelevant turns before.
Test 2 (Knowledge stress): Ask about obscure niche variant.
Test 3 (Steerability stress): Give contradicting constraints.
Test 4 (NTP stress): Ask for 5 very specific facts.
See which breaks first → know your weakest property for this task.Mẹo 4: "Property-aware review" habit
Mẹo 5: Cross-property failure cascades
Some failures cascade:
Quick 30-second review:
✓ Specifics (names, numbers, citations): verified?
✓ Knowledge currency: up-to-date?
✓ All input context attended: nothing missed?
✓ Intent (not just instruction): achieved?
If no → identify collision → fix specifically.Mẹo 5: Cross-property failure cascades
Prevent cascades: catch failures early, one layer at a time.
Initial: Bịa specifics (NTP)
→ Include in longer doc
→ Long doc → WM limit
→ User skim, miss that specifics bịa (NTP amplified)
→ Act on wrong infoÁp dụng ngay
Bài tập 1: Failure Diagnosis (~25 phút)
Lý do: Most real-world AI failures aren't 1 property acting up. They're 2 properties meeting. Naming which 2 changes the fix entirely.
Step 1: Nhìn lại experience với AI (kể cả quan sát trong khóa học này). Identify 2-3 lần AI output thực sự làm bạn disappointed hoặc surprised.
For each, describe trong 1-2 câu: what bạn asked, what bạn got, what was disappointing/surprising.
Step 2: Walk through each event with the AI:
Step 3: Evaluate AI's diagnosis against what bạn giờ biết. Do you agree? If not, push back.
Step 4: For each diagnosis:
If possible, test the adjustment right now on a similar task.
Step 5: Nhìn lại task list từ Bài 17.0 với tất cả annotations:
For tasks gave you trouble most, name which 2 properties were colliding. Write diagnosis next to each.
Bài tập 2: Preemptive collision mapping (optional)
For your top 5 recurring AI tasks, complete:
Apply setups. Track whether collision avoided.
- Property tags (Bài 17.1)
- Verification scores (Bài 17.3)
- Knowledge flags (Bài 17.5)
- Context needs (Bài 17.7)
- Goal statements (Bài 17.9)
| Task | Primary property risk | Secondary property risk | Expected collision | Preemptive setup |
|---|---|---|---|---|
| 1. | ||||
| 2. | ||||
| 3. | ||||
| 4. | ||||
| 5. |
Failure 1: ________________________________
Failure 2: ________________________________
Failure 3: ________________________________Suy ngẫm bài học
- Naming the property pair có change fix bạn'd reach for không? Before this course, bạn có chọn different (less effective) fix không?
- Which property pairing bạn nghĩ you'll gặp most in day-to-day work?
- Có failure nào từ trước mà giờ bạn'd gọi tên differently?
Tóm tắt bài học
🎯 Real-world failures usually involve 2 properties interacting, not 1. Naming both → pointed fix.
🎯 5 diagnostic pairs to recognize:
🎯 Diagnostic habit: before reaching for prompt fix, ask "which properties am I looking at?"
🎯 Targeted fix matrix:
🎯 This diagnostic move is Discernment applied. Bạn evaluate better when bạn know what kind of wrong you're looking at.
🎯 It feeds Delegation too: repeated compound failures on same task-type = signal about what to restructure, break up, or keep cho yourself.
- NTP × Knowledge (hallucinated specifics)
- Working Memory × Steerability (long-conversation drift)
- NTP × Steerability (letter-over-spirit at scale)
- Knowledge × Steerability (confidently wrong niche)
- All 4 (long multi-step specific-domain)
- NTP → grounding, verification
- Knowledge → RAG, search, context
- Working Memory → chunk, re-supply, fresh chat
- Steerability → IPO framework, checkpoints, examples
- Bài 17.3, 17.6, 17.8, 17.10 — 4 properties in detail
- Anthropic — AI Fluency course — Complementary 4D Framework
- Bài 17.12 — Putting it all together