When Properties Collide: Diagnosing Failures Where Two Properties Intersect

Integration & Application · Advanced · 25 minutes

Last week, this was a real task:

What you will learn
  • Recognize that most AI failures involve two or more interacting properties, not one
  • Diagnose common failure patterns (hallucinated citations, long-conversation drift, confidently wrong math, agreeing with bad premises) by naming the properties involved
  • Apply a targeted fix based on the limiting property (the bottleneck)
  • Build the diagnostic habit: before reaching for a prompt fix, ask "which properties am I looking at?"

Four properties: a quick recap

Before diagnosing collisions, recall the four properties (to review, go back to Lessons 17.3, 17.6, 17.8, 17.10):

🔮 Next Token Prediction — "Generates what sounds right"
    Predicts the shape of text; it does not look up facts.

🌐 Knowledge — "Knows what it was trained on"
    Broad but uneven, frozen at cutoff.

🧠 Working Memory — "Attends to what's nearby"
    Fixed context window, cliff failure.

🎚️ Steerability — "Follows the loudest instruction"
    Pattern-matches instructions, not intent.

5 common collision patterns

Collision 1: Next Token Prediction × Knowledge = Hallucinated specifics

Symptom: You ask about a niche topic and get a paper title, author names, journal, and DOI, none of them real.

Mechanism:

  • NTP: The model generates text that fits "what a plausible citation looks like"
  • Knowledge: There is a gap (niche or post-cutoff topic)
  • The model cannot distinguish what it knows from what it is inventing

Example:

User: "Cite 3 papers on X (niche topic)"
Model: 
  - [Plausible-sounding citation 1]     ← maybe real
  - [Plausible-sounding citation 2]     ← fabricated
  - [Plausible-sounding citation 3]     ← fabricated

Fix (targeted):

  • ✅ Source grounding: force retrieval from a real document corpus, so the model can only cite what was retrieved
  • ✅ Explicit confidence tags: "For each citation, tag HIGH confidence (verified in your training) or LOW (inferring)"
  • ✅ Use web search and verify: Claude in Research mode, Perplexity
  • ❌ Don't: prompt "don't fabricate"; the model doesn't know it is fabricating

Collision 2: Working Memory × Steerability = Long-conversation drift

Symptom: You set careful constraints at the start; 20 messages later, half are being ignored.

Mechanism:

  • Working Memory: Early context fades ("lost in the middle" as the chat grows)
  • Steerability: The model follows whichever instructions are most salient right now
  • Early constraints → distant → less salient → not applied

Example:

Turn 1: "Always respond in markdown. Never use first person. 
         Max 200 words."

Turns 2-30: Normal chat, various topics.

Turn 31: Response in prose, uses "I think", 400 words.
Constraints still in context, but attention weight diluted.

Fix (targeted):

  • ✅ Re-supply critical context: restate constraints near the current message
  • ✅ Start fresh with the essentials up front (better than a long, drifted chat)
  • ✅ Use the system prompt for standing constraints (less prone to dilution than an early chat message)
  • ✅ Use Projects/Custom Instructions, which persist across chats
  • ❌ Don't: assume "early instructions still apply"

Collision 3: Next Token Prediction × Steerability = Letter-over-spirit at scale

Symptom: Instructions are honored literally while the intent is consistently missed.

Mechanism:

  • NTP: Pattern-matches on words
  • Steerability: Follows the instruction without understanding the goal
  • The model picks the easiest-to-match interpretation

Example:

User: "Make these 20 emails more professional."
Model: Adds formal language to all 20.

But: 5 of the 20 had the opposite problem: too formal, 
    losing warmth with long-term customers.
    
Model applied one interpretation uniformly.

Fix (targeted):

  • ✅ State intent per cluster: "These 15 emails: more formal. These 5: warmer."
  • ✅ IPO framework (Lesson 17.10)
  • ✅ Chain-of-thought: ask the model to reason about each case
  • ❌ Don't: rely on a single instruction for a heterogeneous task

Collision 4: Knowledge × Steerability = Confidently wrong on niche instructions

Symptom: You give a domain-specific instruction; the AI follows it but produces wrong output because it lacks the domain knowledge.

Mechanism:

  • Knowledge: Gap in a specialized domain
  • Steerability: Follows the instruction confidently anyway
  • Result: Confident wrong output (no signal of the knowledge gap)

Example:

User: "Write a unit test for this Python function using our 
       internal test framework 'FrameX'."
       
Model: [writes test with pytest-like syntax that LOOKS like 
       FrameX but isn't actually FrameX]
       
Model didn't flag: "I'm not familiar with FrameX."

Fix (targeted):

  • ✅ Provide domain context: paste the FrameX docs into the prompt
  • ✅ Ask the model to verify its knowledge: "Do you know FrameX? If not, flag."
  • ✅ Show an example first: give 1-2 examples of correct FrameX tests
  • ❌ Don't: assume the model "probably knows"

Collision 5: All 4: Long multi-step task with specific domain

Symptom: The task is long (Working Memory), domain-specific (Knowledge), multi-step (Steerability drift), and requires specifics (NTP hallucination). Everything fails at once.

Example: "Review 50-page medical contract between pharma company and CRO, flag compliance issues under 21 CFR Part 11."

  • Working Memory: 50 pages stretch the context
  • Knowledge: 21 CFR Part 11 is niche and regulatory
  • Steerability: A multi-step review drifts
  • NTP: Specifics (sections, regulatory references) carry hallucination risk

Fix (layered):

  • ✅ Chunking (WM fix)
  • ✅ RAG with 21 CFR docs (Knowledge fix)
  • ✅ Checkpoints per chunk (Steerability fix)
  • ✅ "Cite exact section from contract" (NTP fix)
  • ✅ Human legal review of the output (covers all four)
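The layered fix for Collision 5 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names `chunk_pages` and `build_chunk_prompt`, the prompt wording, and the placeholder page and regulation text are all assumptions made for the example. It shows three of the layers working together: chunking (WM), pasting reference material into every prompt (Knowledge), and restating the goal per chunk plus requiring a quoted section (Steerability and NTP).

```python
from typing import List


def chunk_pages(pages: List[str], pages_per_chunk: int = 10) -> List[List[str]]:
    """Split page texts into consecutive chunks (Working Memory fix)."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        pages[start:start + pages_per_chunk]
        for start in range(0, len(pages), pages_per_chunk)
    ]


def build_chunk_prompt(
    chunk: List[str],
    chunk_number: int,
    total_chunks: int,
    goal: str,
    reference_text: str,
) -> str:
    """Build one self-contained prompt per chunk.

    The goal is restated in every prompt (Steerability checkpoint), the
    reference excerpt is pasted in (Knowledge fix), and the model is asked
    to cite the contract section for each finding (NTP fix).
    """
    return "\n\n".join([
        f"Goal (restated for chunk {chunk_number} of {total_chunks}): {goal}",
        "Reference material (answer only from this and the contract text):",
        reference_text,
        "Contract text for this chunk:",
        "\n".join(chunk),
        "For each issue, cite the exact section from the contract. "
        "If you cannot quote it, say so instead of guessing.",
    ])


# Hypothetical stand-in data: 50 placeholder pages and a placeholder excerpt.
pages = [f"[text of page {n}]" for n in range(1, 51)]
chunks = chunk_pages(pages, pages_per_chunk=10)
prompts = [
    build_chunk_prompt(
        chunk, number, len(chunks),
        goal="Flag compliance issues under 21 CFR Part 11.",
        reference_text="[relevant 21 CFR Part 11 excerpt]",
    )
    for number, chunk in enumerate(chunks, start=1)
]
print(len(prompts))  # 5
```

Each prompt stands alone, so a chunk can be reviewed in a fresh chat without relying on earlier turns. The human legal review layer stays outside the code.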

Diagnostic framework: "Which 2 collided?"

Before reaching for a prompt fix, ask:

┌──────────────────────────────────────────────────────────┐
│                                                          │
│  DIAGNOSTIC CHECKLIST:                                   │
│                                                          │
│  1. What property question was most relevant?            │
│     - Where do answers come from? (NTP)                  │
│     - What does it know? (Knowledge)                     │
│     - What's it attending to? (WM)                       │
│     - How much am I in control? (Steerability)           │
│                                                          │
│  2. What was the FAILURE MODE?                           │
│     - Fabricated specifics? → NTP                        │
│     - Stale / wrong facts? → Knowledge                   │
│     - Ignored earlier info? → Working Memory             │
│     - Literal reading misses intent? → Steerability      │
│                                                          │
│  3. PAIR them up — most failures involve 2:              │
│     - Fabricated + niche domain → NTP × Knowledge        │
│     - Drifted + long session → WM × Steerability         │
│     - Wrong output + specific instruction → K × S        │
│     - Literal + varied cases → NTP × Steerability        │
│                                                          │
│  4. Apply TARGETED fix:                                  │
│     - NTP: verify specifics, use grounding               │
│     - Knowledge: add context, use RAG/search             │
│     - WM: re-supply context, chunk, start fresh          │
│     - Steerability: restate goal, checkpoints, examples  │
│                                                          │
└──────────────────────────────────────────────────────────┘
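Steps 2 to 4 of the checklist amount to two lookups, which can be sketched as below. The dictionary keys such as `fabricated_specifics` are hypothetical labels invented for this example; the property names and fixes are taken from the checklist itself.

```python
from typing import Dict, List

# Step 2 of the checklist: observed failure mode -> property.
# The key names are hypothetical labels for the four checklist questions.
FAILURE_MODE_TO_PROPERTY: Dict[str, str] = {
    "fabricated_specifics": "NTP",
    "stale_or_wrong_facts": "Knowledge",
    "ignored_earlier_info": "WM",
    "literal_miss_intent": "Steerability",
}

# Step 4 of the checklist: property -> targeted fixes.
PROPERTY_TO_FIXES: Dict[str, List[str]] = {
    "NTP": ["verify specifics", "use grounding"],
    "Knowledge": ["add context", "use RAG/search"],
    "WM": ["re-supply context", "chunk", "start fresh"],
    "Steerability": ["restate goal", "checkpoints", "examples"],
}


def diagnose(observed_failure_modes: List[str]) -> Dict[str, List[str]]:
    """Return the targeted fixes for each property the failure modes implicate."""
    properties: List[str] = []
    for mode in observed_failure_modes:
        prop = FAILURE_MODE_TO_PROPERTY[mode]
        if prop not in properties:  # keep first-seen order, no duplicates
            properties.append(prop)
    return {prop: PROPERTY_TO_FIXES[prop] for prop in properties}


# Step 3: pairing two failure modes names the collision.
result = diagnose(["ignored_earlier_info", "literal_miss_intent"])
print(" × ".join(result))  # WM × Steerability
```

The point of the sketch is the order of operations: name the failure modes first, and only then read off fixes, instead of jumping straight to a rewritten prompt.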

Diagnostic table: Common symptoms → properties → fixes

Symptom | Properties likely at play | Best fix
AI cited paper that doesn't exist | NTP × Knowledge | Source grounding or verify specifics
After 30 messages AI forgot rules | WM × Steerability | Restate constraints or fresh chat
AI confidently wrong on niche term | Knowledge × Steerability | Provide domain context upfront
"Made shorter" but cut key info | NTP × Steerability | State goal alongside instruction (Lesson 17.10)
Wrong math despite clear problem | NTP × Steerability | Code execution
AI agreed with wrong premise | Fingerprint (sycophancy) × Steerability | Neutral framing + "disagree if I'm wrong"
Long doc analysis missing middle | WM × NTP | Chunk + front-load critical
Policy answer from outdated info | Knowledge × WM | Provide current policy in context
Multi-step plan goes off rails | Steerability drift + WM | Checkpoints + re-verify goal
AI doesn't ask a clarifying question | Steerability (+ over-caution) | Instruct: "Ask if ambiguous before proceeding"

Examples by industry

💼 Sales Manager: "AI turned one deal into two"

Situation: A sales manager uses Claude to summarize 20 sales call transcripts for team review.

Output: Claude returned 15 deal summaries, but two deals had been merged into one invented deal, and one real deal had been split into two separate ones.

Diagnostic:

  • Working Memory: 20 transcripts at ~80K tokens, pushing against context length limits
  • NTP: Generating summaries with a normalized structure; the model "filled the pattern" instead of preserving unique deals

Collision: WM × NTP

Fix:

  • Chunk: 5 transcripts per pass
  • Per pass: extract structured output {deal_id, summary, status}
  • Aggregate at the end
  • Verify the count (20 in, 20 out)

Result: 0 merge/split errors. Process 2x faster.

🔍 Research Analyst: "Lit review invented 3 papers"

Situation: An academic researcher asks Claude to review 40 papers in an emerging field.

Output: Excellent synthesis, but 3 of the 40 cited paper names don't exist.

Diagnostic:

  • Knowledge: emerging field → thin coverage
  • NTP: generating citations in academic format → fabrication pattern

Collision: Knowledge × NTP (classic)

Fix:

  • Upload the 40 PDFs into Claude Projects
  • Prompt: "Synthesize based ONLY on uploaded papers. Cite by [Paper #]. If unsure, flag [NEEDS VERIFY]"
  • After the draft: verify that every [Paper #] references an uploaded doc
  • No citations from "training memory"

Result: 100% citation accuracy.

⚖️ Legal Counsel: "50-page contract review: AI missed 3 critical clauses"

Situation: Senior counsel reviews a 50-page M&A contract with Claude.

Output: Found 8 issues, but missed 3 critical indemnification clauses buried in sections 4-6 (the middle).

Diagnostic:

  • Working Memory: "lost in the middle"; sections 4-6 sit deep in the attention dead zone
  • Steerability: the instruction "find issues" was generic and drifted during the long review

Collision: WM × Steerability

Fix:

  • Chunk the 50 pages into 5 × 10-page sections
  • Per chunk: specific instruction: "Find all indemnification, limitation of liability, termination clauses. Flag any deviation from market standard."
  • Aggregate at the end with dedupe

Result: Caught all 11 issues (including the 3 missed earlier). The deal avoided $1.5M in risk.

💰 Finance Analyst: "Wrong IRR in memo"

Situation: Analyst asks Claude: "Calculate IRR for project with these cashflows: -100, 30, 35, 45, 50. Write investment memo with IRR + NPV at 10% discount."

Output: "IRR = 18.4%, NPV = $35.72M" (both approximate, off from actual).

Diagnostic:

  • NTP: arithmetic weakness (approximating by patterns)
  • Steerability: the instruction "calculate" was followed literally, but the model lacked a mechanism for precision

Collision: NTP × Steerability (brittle arithmetic)

Fix:

  • Code execution: the model writes Python, executes it, and uses the computed number
  • Prompt: "Compute using Python. Present exact number. Don't approximate."

# Output from code execution (first cashflow at t = 0):
IRR = 19.74%
NPV = $24.16M

Result: 100% precision. Memo credible.

🎧 Customer Success Manager: "Chat got worse over 30 messages"

Situation: A CSM starts a chat with Claude to draft renewal proposals for 10 clients. Long session, refining each one.

Output: First 5 proposals: excellent, on-brand. Last 5: drift, with inconsistent tone, mixed brand styles, and a structure element from earlier missing.

Diagnostic:

  • Working Memory: the early style guide and examples (Turns 1-3) are now deep in the history
  • Steerability: the salience of the latest instruction trumps the earlier structure
  • Minor: fingerprint verbosity leaking in as attention diffuses

Collision: WM × Steerability

Fix:

  • Set up a Claude Project with the style guide and brand examples as standing docs
  • Per proposal: a new chat in the Project (clean context, style reference loaded)
  • If a long session is unavoidable: a handoff doc every 5 proposals

Result: Consistent across all 10 proposals.

🏥 Clinical Research Coordinator: "AI gave wrong drug protocol"

Situation: A CRC asks Claude about the study drug dosing protocol for a Phase III trial.

Output: Claude confidently states a protocol. Later, on checking with the PI, several details turn out to be wrong.

Diagnostic:

  • Knowledge: Phase III trials are specific; protocol details are niche
  • Steerability: Claude followed "tell me the protocol" without flagging uncertainty
  • The over-caution fingerprint was absent when it should have been present (ironically)

Collision: Knowledge × Steerability

Fix:

  • RAG: upload the study protocol PDF
  • Prompt: "Answer ONLY from uploaded protocol. If info not in protocol, say 'Not in uploaded documents.'"
  • Never ask protocol questions without the upload

Result: 0 errors. The CRC can trust the outputs.
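Returning to the Finance Analyst example, the code-execution fix can be sketched with the standard library alone. This is one possible way to compute the figures, not necessarily what any particular tool runs: the function names `npv` and `irr` and the bisection search are choices made for this sketch, and it assumes the first cashflow (-100) occurs at t = 0 with one cashflow per period after that.

```python
from typing import Sequence


def npv(rate: float, cashflows: Sequence[float]) -> float:
    """Net present value, with cashflows[0] at t = 0 and one flow per period."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows: Sequence[float], low: float = 0.0, high: float = 1.0,
        tolerance: float = 1e-10) -> float:
    """Internal rate of return by bisection: the rate where NPV crosses zero.

    Assumes NPV changes sign exactly once between `low` and `high`, which
    holds for a single initial outflow followed by inflows.
    """
    if npv(low, cashflows) * npv(high, cashflows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Keep the half-interval whose endpoints still bracket the root.
        if npv(low, cashflows) * npv(mid, cashflows) <= 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2.0


cashflows = [-100, 30, 35, 45, 50]
print(f"IRR = {irr(cashflows):.2%}")        # IRR = 19.74%
print(f"NPV = {npv(0.10, cashflows):.2f}")  # NPV = 24.16
```

Running the arithmetic instead of generating it removes the NTP side of the collision: the memo text can still be drafted by the model, but the numbers come from execution.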

Anti-patterns

❌ "It's just hallucination": a single-property diagnosis

Why it's wrong: The diagnosis is too generic, so the fix is too generic.

The right way: Name 2 properties. "NTP × Knowledge gap on niche topic" → targeted fix (RAG + grounding).
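A grounding fix is easier to trust when it can be checked mechanically. The sketch below assumes the [Paper #] and [NEEDS VERIFY] tagging convention from the Research Analyst example, with papers numbered from 1 up to the number of uploaded documents; the function name `check_paper_tags` and the sample draft are hypothetical.

```python
import re
from typing import Set, Tuple


def check_paper_tags(draft: str, uploaded_count: int) -> Tuple[Set[int], int]:
    """Return (invalid paper numbers, number of [NEEDS VERIFY] flags).

    Assumes citations look like [Paper 12] and that uploaded papers are
    numbered 1..uploaded_count.
    """
    cited = {int(n) for n in re.findall(r"\[Paper (\d+)\]", draft)}
    invalid = {n for n in cited if not 1 <= n <= uploaded_count}
    needs_verify = draft.count("[NEEDS VERIFY]")
    return invalid, needs_verify


# Hypothetical draft text for illustration.
draft = (
    "Method A outperforms B [Paper 3]. Results replicate [Paper 41]. "
    "Adoption is growing [NEEDS VERIFY]."
)
invalid, flagged = check_paper_tags(draft, uploaded_count=40)
print(sorted(invalid), flagged)  # [41] 1
```

This only confirms that each tag points at a supplied document. Whether the cited paper actually supports the claim still needs a human read.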

❌ "Re-prompt until it works"

Why it's wrong: This is random prompting. The same failure is likely to repeat, and it wastes time.

The right way: Diagnose first. Different failure types need different fixes.

❌ "Model got better, same prompt works now"

Why it's wrong: Maybe. But the limitations have shifted, not disappeared. Collision patterns still exist.

The right way: Keep the diagnostic habit. Test periodically. Boundaries move.

❌ "Single fix = solves all my problems"

Why it's wrong: Different tasks have different property profiles. One fix (e.g., "always use RAG") helps some tasks and hurts others.

The right way: Per-task diagnosis. The right tool for the right collision.

❌ "AI will eventually ask me if confused"

Why it's wrong: The model often picks an interpretation without asking (Steerability + NTP collision).

The right way: Instruct upfront: "If ambiguous, ask before proceeding."

Advanced tips

Tip 1: "Failure log" with property tagging

Maintain a log:

Date | Task | Failure | Properties | Fix applied | Result

2026-03-01 | Lit review | Invented 3 citations | NTP × Knowledge | RAG with PDFs | 100% accurate

2026-03-05 | Long contract review | Missed middle clauses | WM × Steerability | Chunking | Caught all

2026-03-10 | Email redraft | Letter-over-spirit | NTP × Steerability | IPO framework | Better

Over weeks, patterns emerge. Apply the fixes preemptively.

Tip 2: Task complexity matrix before starting

For each task, rate (1-5):
  - Specificity required (NTP risk)
  - Domain niche-ness (Knowledge risk)
  - Content length (WM risk)
  - Instruction complexity (Steerability risk)

Total > 12 → plan for collision. Use RAG + chunking + checkpoints.
Total < 8 → low risk, standard prompt.

Tip 3: "Stress test" by introducing property issues

To test a new prompt:

Test 1 (WM stress): Add 30 irrelevant turns before.
Test 2 (Knowledge stress): Ask about obscure niche variant.
Test 3 (Steerability stress): Give contradicting constraints.
Test 4 (NTP stress): Ask for 5 very specific facts.

See which breaks first → know your weakest property for this task.

Tip 4: "Property-aware review" habit

After every AI output:

Quick 30-second review:
  ✓ Specifics (names, numbers, citations): verified?
  ✓ Knowledge currency: up-to-date?
  ✓ All input context attended: nothing missed?
  ✓ Intent (not just instruction): achieved?

If no → identify collision → fix specifically.

Tip 5: Cross-property failure cascades

Some failures cascade:

Initial: Invented specifics (NTP)
  → Included in a longer doc
  → Long doc → WM limit
  → User skims, misses that the specifics are invented (NTP amplified)
  → Acts on wrong info

Prevent cascades: catch failures early, one layer at a time.
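The rating rubric from Tip 2 can be sketched as a small scoring function. The thresholds (above 12, below 8) come from the rubric; the function name `collision_risk`, the dimension keys, and the label for the band from 8 to 12, which the rubric does not name, are assumptions made for this example.

```python
from typing import Dict

# Hypothetical keys for the four rated dimensions in the Tip 2 rubric.
RISK_DIMENSIONS = (
    "specificity",             # NTP risk
    "domain_nicheness",        # Knowledge risk
    "content_length",          # WM risk
    "instruction_complexity",  # Steerability risk
)


def collision_risk(ratings: Dict[str, int]) -> str:
    """Sum the four 1-5 ratings and map the total to a risk band."""
    for name in RISK_DIMENSIONS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} must be rated 1-5")
    total = sum(ratings[name] for name in RISK_DIMENSIONS)
    if total > 12:
        return "plan for collision"
    if total < 8:
        return "low risk"
    # The rubric leaves totals of 8-12 unnamed; this label is a placeholder.
    return "in between"


# Hypothetical ratings for a literature review task (total 14).
lit_review = {
    "specificity": 5,
    "domain_nicheness": 4,
    "content_length": 3,
    "instruction_complexity": 2,
}
print(collision_risk(lit_review))  # plan for collision
```

The individual ratings are worth keeping alongside the total, since the two highest ones point at the likely collision pair.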

Apply it now

Exercise 1: Failure Diagnosis (~25 minutes)

Rationale: Most real-world AI failures aren't 1 property acting up. They're 2 properties meeting. Naming which 2 changes the fix entirely.

Step 1: Look back at your experience with AI (including observations from this course). Identify 2-3 times an AI output genuinely disappointed or surprised you.

For each, describe in 1-2 sentences: what you asked, what you got, and what was disappointing or surprising.

Failure 1: ________________________________
Failure 2: ________________________________
Failure 3: ________________________________

Step 2: Walk through each event with the AI:

Step 3: Evaluate the AI's diagnosis against what you now know. Do you agree? If not, push back.

Step 4: For each diagnosis:

If possible, test the adjustment right now on a similar task.

Step 5: Look back at your task list from Lesson 17.0 with all its annotations:

  • Property tags (Lesson 17.1)
  • Verification scores (Lesson 17.3)
  • Knowledge flags (Lesson 17.5)
  • Context needs (Lesson 17.7)
  • Goal statements (Lesson 17.9)

For the tasks that gave you the most trouble, name which 2 properties were colliding. Write the diagnosis next to each.

Exercise 2: Preemptive collision mapping (optional)

For your top 5 recurring AI tasks, complete:

Task | Primary property risk | Secondary property risk | Expected collision | Preemptive setup
1.
2.
3.
4.
5.

Apply the setups. Track whether the collision was avoided.

Lesson reflection

  • Does naming the property pair change the fix you'd reach for? Before this course, would you have chosen a different (less effective) fix?
  • Which property pairing do you think you'll encounter most in day-to-day work?
  • Is there an earlier failure that you would now name differently?

Lesson summary

🎯 Real-world failures usually involve 2 properties interacting, not 1. Naming both → pointed fix.

🎯 5 diagnostic pairs to recognize:

  • NTP × Knowledge (hallucinated specifics)
  • Working Memory × Steerability (long-conversation drift)
  • NTP × Steerability (letter-over-spirit at scale)
  • Knowledge × Steerability (confidently wrong niche)
  • All 4 (long multi-step specific-domain)

🎯 Diagnostic habit: before reaching for a prompt fix, ask "which properties am I looking at?"

🎯 Targeted fix matrix:

  • NTP → grounding, verification
  • Knowledge → RAG, search, context
  • Working Memory → chunk, re-supply, fresh chat
  • Steerability → IPO framework, checkpoints, examples

🎯 This diagnostic move is Discernment applied. You evaluate better when you know what kind of wrong you're looking at.

🎯 It feeds Delegation too: repeated compound failures on the same task type are a signal about what to restructure, break up, or keep for yourself.

References
  • Lessons 17.3, 17.6, 17.8, 17.10: the 4 properties in detail
  • Anthropic, AI Fluency course: complementary 4D Framework
  • Lesson 17.12: Putting it all together