<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Hamza Khaled Mahmoud — AI Engineering Notes</title>
    <link>https://h19overflow.github.io/Portfolio/</link>
    <description>Essential practical notes on AI systems, agents, tooling, orchestration, security, and evals.</description>
    <language>en</language>
    <item>
      <title>Agent Harness Loop</title>
      <link>https://h19overflow.github.io/Portfolio/share/agent-harness-loop.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/agent-harness-loop.html</guid>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <description>An AI agent is not just a model call. It is a runtime that prepares context, calls the model, executes tools, streams output, handles provider quirks, and persists state.</description>
      <category>Agent Core</category>
    </item>
    <item>
      <title>State and Context Compression</title>
      <link>https://h19overflow.github.io/Portfolio/share/state-and-context-compression.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/state-and-context-compression.html</guid>
      <pubDate>Sun, 14 Jun 2026 00:00:00 GMT</pubDate>
      <description>Long-running agents need durable state and controlled context shrinkage. Otherwise they cannot resume, search prior work, audit tool calls, recover after crashes, or stay under...</description>
      <category>Agent Core</category>
    </item>
    <item>
      <title>CLI Runtime UX</title>
      <link>https://h19overflow.github.io/Portfolio/share/cli-runtime-ux.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/cli-runtime-ux.html</guid>
      <pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate>
      <description>A responsive agent CLI is an event-driven terminal application, not a linear input() / print() script. Hermes uses prompt_toolkit for input/control and Rich for output...</description>
      <category>Agent Core</category>
    </item>
    <item>
      <title>Separate Memory Layers</title>
      <link>https://h19overflow.github.io/Portfolio/share/separate-memory-layers.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/separate-memory-layers.html</guid>
      <pubDate>Fri, 12 Jun 2026 00:00:00 GMT</pubDate>
      <description>Memory is not one thing. A production agent needs at least two memory layers, starting with curated durable memory: compact, stable facts that should shape future behavior.</description>
      <category>Memory</category>
    </item>
    <item>
      <title>Cache Boundaries and Frozen Snapshots</title>
      <link>https://h19overflow.github.io/Portfolio/share/cache-boundaries-and-frozen-snapshots.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/cache-boundaries-and-frozen-snapshots.html</guid>
      <pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate>
      <description>A memory write and a prompt mutation are not the same operation. Hermes lets the agent write durable memory immediately, but it does not let that write silently rewrite the active...</description>
      <category>Memory</category>
    </item>
    <item>
      <title>Fenced Recall and Background Sync</title>
      <link>https://h19overflow.github.io/Portfolio/share/fenced-recall-and-background-sync.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/fenced-recall-and-background-sync.html</guid>
      <pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate>
      <description>Retrieved memory is not the same as the user’s message, and it is not the same as durable conversation history. Hermes injects recalled memory into the API-facing copy of the...</description>
      <category>Memory</category>
    </item>
    <item>
      <title>Tools as Contracts</title>
      <link>https://h19overflow.github.io/Portfolio/share/tools-as-contracts.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/tools-as-contracts.html</guid>
      <pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
      <description>A tool is not just a function. A tool is a contract shown to the model. Every visible tool consumes prompt space and model attention. If too many tools...</description>
      <category>Tooling</category>
    </item>
    <item>
      <title>Tool Registry, Discovery, and Dispatch</title>
      <link>https://h19overflow.github.io/Portfolio/share/tool-registry-discovery-and-dispatch.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/tool-registry-discovery-and-dispatch.html</guid>
      <pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate>
      <description>A scalable agent tool system needs decoupled registration and centralized execution. Hermes solves this with self-registering tool modules, an AST pre-scan before import...</description>
      <category>Tooling</category>
    </item>
    <item>
      <title>Progressive Disclosure and Tool Search</title>
      <link>https://h19overflow.github.io/Portfolio/share/progressive-disclosure-and-tool-search.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/progressive-disclosure-and-tool-search.html</guid>
      <pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate>
      <description>Every tool schema sent to the model costs tokens on every turn. A scalable tool ecosystem cannot expose every optional MCP/plugin tool directly all the time.</description>
      <category>Tooling</category>
    </item>
    <item>
      <title>Skills as Procedures</title>
      <link>https://h19overflow.github.io/Portfolio/share/skills-as-procedures.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/skills-as-procedures.html</guid>
      <pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate>
      <description>Many things agents need are not new actions. They are better ways to use existing actions. A terminal is a tool. Incident response is a skill.</description>
      <category>Skills</category>
    </item>
    <item>
      <title>Skill Mounting and Supply Chain</title>
      <link>https://h19overflow.github.io/Portfolio/share/skill-mounting-and-supply-chain.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/skill-mounting-and-supply-chain.html</guid>
      <pubDate>Fri, 05 Jun 2026 00:00:00 GMT</pubDate>
      <description>Skills are prompt-affecting dependencies. Once skills can be installed, synced, viewed, invoked, or loaded from external directories, they become part of the agent supply chain.</description>
      <category>Skills</category>
    </item>
    <item>
      <title>Orchestration Boundaries</title>
      <link>https://h19overflow.github.io/Portfolio/share/orchestration-boundaries.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/orchestration-boundaries.html</guid>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <description>Subagents are not magic. They are concurrent workers with context, tools, credentials, time limits, logs, cleanup needs, and failure modes. If agents can recursively spawn agents...</description>
      <category>Orchestration</category>
    </item>
    <item>
      <title>Threading in Delegation</title>
      <link>https://h19overflow.github.io/Portfolio/share/threading-in-delegation.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/threading-in-delegation.html</guid>
      <pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate>
      <description>Hermes uses threading because the core agent loop is synchronous, while delegated subagent work is often independent and I/O-bound. Threading lets Hermes fan out multiple subagents...</description>
      <category>Orchestration</category>
    </item>
    <item>
      <title>Trust Boundaries Across Agent Systems</title>
      <link>https://h19overflow.github.io/Portfolio/share/trust-boundaries-across-agent-systems.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/trust-boundaries-across-agent-systems.html</guid>
      <pubDate>Tue, 02 Jun 2026 00:00:00 GMT</pubDate>
      <description>AI agents turn text into action. That means any text that can influence future model behavior, tool use, or subagent behavior is part of the security boundary.</description>
      <category>Security</category>
    </item>
    <item>
      <title>Authentication and Credential Pools</title>
      <link>https://h19overflow.github.io/Portfolio/share/authentication-and-credential-pools.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/authentication-and-credential-pools.html</guid>
      <pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate>
      <description>Agent auth is runtime infrastructure, not just environment variables. Long-running agents need credential status tracking, rotation, exhaustion handling, OAuth refresh recovery, and...</description>
      <category>Security</category>
    </item>
    <item>
      <title>Advanced Evals for LLM Agents: Drift, Tool Use, and Task Fulfillment</title>
      <link>https://h19overflow.github.io/Portfolio/share/advanced-evals-for-llm-agents-drift-tool-use-and-task-fulfillment.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/advanced-evals-for-llm-agents-drift-tool-use-and-task-fulfillment.html</guid>
      <pubDate>Sun, 31 May 2026 00:00:00 GMT</pubDate>
      <description>Last updated: 2026-06-15. This guide synthesizes the user's timestamped notes from Phil Hetzel's Braintrust talk, plus current public guidance from Braintrust, OpenAI Evals/agent...</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Advanced Agent Evals Overview</title>
      <link>https://h19overflow.github.io/Portfolio/share/advanced-agent-evals-overview.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/advanced-agent-evals-overview.html</guid>
      <pubDate>Sat, 30 May 2026 00:00:00 GMT</pubDate>
      <description>Advanced evals turn agent behavior into measurable, replayable, and improvable evidence. This folder is the learning breakdown. Agents are hard to evaluate because they are...</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Eval Philosophy and Production Flywheel</title>
      <link>https://h19overflow.github.io/Portfolio/share/eval-philosophy-and-production-flywheel.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/eval-philosophy-and-production-flywheel.html</guid>
      <pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate>
      <description>LLM and agent evals are decision-support systems, not perfect truth machines. They help decide whether a model, prompt, tool, or workflow change is safe to ship.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Score Vectors and Hard Gates</title>
      <link>https://h19overflow.github.io/Portfolio/share/score-vectors-and-hard-gates.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/score-vectors-and-hard-gates.html</guid>
      <pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate>
      <description>Do not compress agent quality into one number. Use a score vector plus hard gates. A single average hides dangerous failures. Example: a candidate improves overall...</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Model Behavior Evals</title>
      <link>https://h19overflow.github.io/Portfolio/share/model-behavior-evals.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/model-behavior-evals.html</guid>
      <pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate>
      <description>Model behavior evals check whether the model output follows the behavioral contract: valid format, correct content, grounded claims, safe response, and appropriate style.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Tool and Trajectory Evals</title>
      <link>https://h19overflow.github.io/Portfolio/share/tool-and-trajectory-evals.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/tool-and-trajectory-evals.html</guid>
      <pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate>
      <description>For agents, the path matters. A correct-looking final answer is not enough if the agent used the wrong tool, passed bad arguments, ignored tool output, or mutated state incorrectly.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Task Fulfillment and State Evals</title>
      <link>https://h19overflow.github.io/Portfolio/share/task-fulfillment-and-state-evals.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/task-fulfillment-and-state-evals.html</guid>
      <pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate>
      <description>A task is successful only when the user's real goal is satisfied and the external world is in the correct state. Agents can produce polished final messages while failing the real task.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Drift, Regression, and Enhancement</title>
      <link>https://h19overflow.github.io/Portfolio/share/drift-regression-and-enhancement.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/drift-regression-and-enhancement.html</guid>
      <pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate>
      <description>A change is not an enhancement unless it improves target behavior without introducing hidden regressions in safety, tools, state, cost, latency, or important slices.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Datasets, Scorers, and Judges</title>
      <link>https://h19overflow.github.io/Portfolio/share/datasets-scorers-and-judges.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/datasets-scorers-and-judges.html</guid>
      <pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate>
      <description>The quality of an eval is limited by the quality of its dataset and scorer. A good dataset captures real risks; a good scorer checks the actual contract.</description>
      <category>Evaluation</category>
    </item>
    <item>
      <title>Traces, CI, and Production Monitoring</title>
      <link>https://h19overflow.github.io/Portfolio/share/traces-ci-and-production-monitoring.html</link>
      <guid isPermaLink="true">https://h19overflow.github.io/Portfolio/share/traces-ci-and-production-monitoring.html</guid>
      <pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate>
      <description>If you cannot replay or inspect a run, you cannot reliably debug it or turn it into a regression test. Capture enough trace data to reconstruct...</description>
      <category>Evaluation</category>
    </item>
  </channel>
</rss>
