Tag: AI internals

  • How Claude Decides What Tool to Call
  • How Prompt Caching Cuts AI Costs by 90%

    The 90% Discount Most API Users Never Claim: Anthropic’s prompt cache cuts the cost of cached input tokens by 90%, yet most developers sending requests to Claude, GPT, or Gemini have never configured it. Prompt caching, which Anthropic launched in August 2024, reduces input token costs from $3 per million to $0.30 per million for cached portions on Claude…

  • How MCP Servers Exchange Tools With Claude

    Claude doesn’t actually call your tools directly. When you type a message in Claude Desktop and it responds by reading a file, querying a database, or hitting an API, there’s an entire protocol running behind the scenes that brokers every single tool call. That protocol is the Model Context Protocol (MCP), and understanding how it…

  • How Claude Code Hooks Are Triggered

    Claude doesn’t actually use traditional code hooks, and that distinction changes everything about how you should design applications around it. Key Facts Most People Don’t Know: Claude’s API processes requests through a 3-tier prompt classification system that categorizes inputs in under 47 milliseconds before routing to specialized model variants. Anthropic’s Constitutional AI framework uses…

  • How LLM Token Sampling Is Actually Built

    Your AI doesn’t pick the best word. Every time ChatGPT, Claude, or Gemini generates a response, it’s rolling a weighted die over tens of thousands of possible tokens — and the math behind that roll determines everything from creative flair to catastrophic hallucination. Understanding how token sampling works isn’t just academic curiosity; it’s the single…
