<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://abhimanyuaryan.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://abhimanyuaryan.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-04-07T06:46:21+00:00</updated><id>https://abhimanyuaryan.github.io/feed.xml</id><title type="html">blank</title><subtitle>Senior FullStack Engineer &amp; Educator. Masters in Informatics Engineering. Building multi-agent systems, web frameworks, and AR/VR experiences. </subtitle><entry><title type="html">How to Build the Harness</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness/" rel="alternate" type="text/html" title="How to Build the Harness"/><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness/"><![CDATA[<p>COMING SOON</p>]]></content><author><name></name></author><category term="agents"/><category term="LLM"/><category term="AI"/><category term="testing"/><category term="evaluation"/><category term="harness"/><summary type="html"><![CDATA[A practical guide to building a robust evaluation harness that captures, replays, and measures agent behavior — the testing infrastructure your agent system actually needs.]]></summary></entry><entry><title type="html">How to Redesign the Dyad Agent</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent/" rel="alternate" type="text/html" title="How to Redesign the Dyad Agent"/><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent</id><content type="html" 
xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent/"><![CDATA[<p>I’ve been following Dyad’s work closely, particularly through people I deeply respect like Dr. Chris Rackauckas. I had the opportunity to meet them at JuliaCon Paris, where I helped organize the event.</p> <p>After watching their recent developments (<a href="https://juliahub.com/blog/agentic-ai-dyad">Agentic AI with Dyad</a>) and having conversations with Avik Sengupta (VP of Engineering at JuliaHub), I’m convinced that Dyad represents the future of AI modeling.</p> <h2 id="why-ai-modeling-is-the-future">Why AI Modeling is the Future</h2> <p>Everything around us starts with modeling:</p> <ul> <li><strong>Planes</strong> - Aerodynamic simulations and stress testing</li> <li><strong>Cars</strong> - Crash simulations and performance optimization</li> <li><strong>Buildings</strong> - Structural analysis and environmental modeling</li> </ul> <p>Even the <a href="https://youtu.be/shazte3R52k?t=93">Hindu temple built in Paris</a> began with 3D modeling before becoming a physical structure.</p> <p>AI modeling is poised to become a multi-trillion-dollar industry, and Dyad is positioned exactly where the future of agentic AI is heading.</p> <h3 id="how-are-agents-improving-over-time">How are agents improving over time</h3> <p>Agents are supposed to become increasingly autonomous on long-term tasks. But imagine having to monitor every step of the agent and intervene whenever it fails. Then you are the bottleneck, and the agent is not really autonomous.</p> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/dyad_agent_now-480.webp 480w,/assets/img/posts/dyad-agent/dyad_agent_now-800.webp 800w,/assets/img/posts/dyad-agent/dyad_agent_now-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/dyad_agent_now.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Dyad Agent Current State" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <h3 id="why-juliahub-has-what-nobody-has-openaianthropiccursor-etc">Why JuliaHub has what nobody has (openai/anthropic/cursor etc.)</h3> <p>JuliaHub has access to the Julia compiler and a team comprising some of the smartest people in the world—researchers and engineers from top universities like MIT, IIT, and others—who have spent years building a programming language that runs like C yet remains as easy to use as Python.</p> <p>How this translates to Dyad:</p> <ul> <li><strong>Compiler</strong> - The ability to understand and optimize code at a deep level</li> <li><strong>Type System</strong> - The ability to reason about data structures and types</li> <li><strong>MCP/Plugin/Skill Store</strong> - Julia has a rich user base and ecosystem in geospatial, scientific computing, bioinformatics, data science, etc. Now imagine if you could leverage this ecosystem through Dyad by building a skill store. A truly unique value proposition. Something that openai/anthropic/cursor etc. can’t do, or are at best still trying to replicate. 
Recently they have started hiring scientists and engineers from these fields: <a href="https://www.anthropic.com/research/introducing-anthropic-science">Anthropic Science</a></li> </ul> <p>How can all of this be leveraged to build a truly unique agent/harness?</p> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/dyad_agent_after-480.webp 480w,/assets/img/posts/dyad-agent/dyad_agent_after-800.webp 800w,/assets/img/posts/dyad-agent/dyad_agent_after-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/dyad_agent_after.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Dyad Agent Future State" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <h3 id="how-to-redesign-the-harness">How to redesign the harness</h3> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/harness-480.webp 480w,/assets/img/posts/dyad-agent/harness-800.webp 800w,/assets/img/posts/dyad-agent/harness-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/harness.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Harness Architecture Diagram" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <table> <thead> <tr> <th>Component</th> <th>Implementation</th> </tr> </thead> <tbody> <tr> <td><strong>Harness</strong></td> <td>Dyad (the evaluation and execution framework)</td> </tr> <tr> <td><strong>Model</strong></td> <td>Claude Opus 4.6/4.5 (frontier reasoning)</td> </tr> <tr> <td><strong>Context</strong></td> <td>Julia Compiler + Type System + MCP/Plugin/Skill Store</td> </tr> </tbody> </table> <h3 id="the-meta-harness-pattern">The Meta-Harness Pattern</h3> <figure> <picture> <source class="responsive-img-srcset" 
srcset="/assets/img/posts/dyad-agent/harness_loop-480.webp 480w,/assets/img/posts/dyad-agent/harness_loop-800.webp 800w,/assets/img/posts/dyad-agent/harness_loop-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/harness_loop.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Harness Loop Diagram" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <p>From the paper: <a href="https://yoonholee.com/meta-harness/">Meta-Harness</a></p> <p>The key insight is to continuously improve the harness itself through a feedback loop:</p> <ol> <li><strong>Run</strong> the harness on diverse tasks</li> <li><strong>Evaluate</strong> outcomes manually (success/failure patterns)</li> <li><strong>Log</strong> what worked and what didn’t</li> <li><strong>Use</strong> a coding agent to analyze the log and improve the harness code</li> </ol> <p>This creates a self-improving system, similar to how OpenCLAW maintains a <code class="language-plaintext highlighter-rouge">SOUL.md</code> that evolves over time with lessons learned.</p> <h3 id="memory-architecture">Memory Architecture</h3> <p>Context sits outside the harness, but the agent has persistent memory. Learning can be captured in the harness itself through a layered memory system.</p> <p><strong>Claude Code’s Memory Pattern:</strong></p> <p>Memory is central to any agent. It has three layers:</p> <ul> <li><strong>User Memory</strong> - Persists across future runs (user role, feedback, preferences)</li> <li><strong>Session Memory</strong> - Captures everything in the current session (state, task specs, work log)</li> <li><strong>Sync Memory</strong> - Team memory and patterns (shared knowledge, org-level patterns)</li> </ul> <p><strong>Storage Structure:</strong></p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/.agent/
  user-memory/.../*.md
  session-memory/.../*.md  
  sync-memory/.../*.md
</code></pre></div></div> <p>This mirrors how learning happens at user, organization, and team levels.</p> <h3 id="memory-improvement-patterns">Memory Improvement Patterns</h3> <p>Memory can be improved in two ways:</p> <ol> <li><strong>Continuous Learning</strong> - After each user-agent feedback loop</li> <li><strong>Dreaming Pattern</strong> - OpenCLAW’s approach of offline memory consolidation and improvement</li> </ol> <p><a href="https://docs.openclaw.ai/concepts/dreaming">OpenCLAW Dreaming Documentation</a></p> <h3 id="what-future-holds-for-dyad">What the future holds for Dyad</h3> <p>The limitations of text-based models in building complex 3D structures. Coming soon…</p> <h3 id="food-for-thought">Food for thought</h3> <blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">World models and partnership with Yann LeCun for AMI Paris</p>&mdash; Abhimanyu Aryan (@theabhimanyu) <a href="https://twitter.com/theabhimanyu/status/2033303935612559555?ref_src=twsrc%5Etfw">March 31, 2026</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>]]></content><author><name></name></author><summary type="html"><![CDATA[I’ve been following Dyad’s work closely, particularly through people I deeply respect like Dr. Chris Rackauckas. 
I had the opportunity to meet them at JuliaCon Paris, where I helped organize the event.]]></summary></entry><entry><title type="html">How to Build the Control Room for Your Agent</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent/" rel="alternate" type="text/html" title="How to Build the Control Room for Your Agent"/><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent/"><![CDATA[<p>Most agent systems fail for a simple reason: they have a model, but no control room.</p> <p>The control room is the part that receives events, routes them, keeps sessions isolated, and makes sure the agent only acts when it should.</p> <h2 id="what-the-gateway-should-do">What the gateway should do</h2> <p>The gateway is the control plane for the whole agent system. It should:</p> <ul> <li>receive every external input</li> <li>normalize events into a common format</li> <li>assign each event to the right session</li> <li>queue runs so only one turn happens per session at a time</li> <li>expose the same state to every client</li> </ul> <p>If the runtime is the worker, the gateway is the traffic controller.</p> <h2 id="design-principles">Design principles</h2> <h3 id="1-make-it-the-source-of-truth">1. Make it the source of truth</h3> <p>Do not let clients read local session files directly. The gateway should own session state and expose it through an API.</p> <h3 id="2-treat-sessions-as-first-class">2. Treat sessions as first-class</h3> <p>Separate DMs, group chats, threads, and device-specific contexts. A good gateway prevents context leakage by default.</p> <h3 id="3-use-a-typed-protocol">3. 
Use a typed protocol</h3> <p>Use a small message model with clear request, response, and event frames. Keep connect/auth/version checks at the boundary.</p> <h3 id="4-serialize-work-per-session">4. Serialize work per session</h3> <p>Use a lane-aware FIFO queue so one session cannot execute two turns at once. Parallelism is fine across sessions.</p> <h3 id="5-persist-everything-durably">5. Persist everything durably</h3> <p>Write session transcripts and metadata to disk so the system survives restarts without losing continuity.</p> <h2 id="a-simple-gateway-shape">A simple gateway shape</h2> <p>Think of the gateway as four layers:</p> <ol> <li><strong>Inputs</strong>: messages, heartbeats, cronjobs, hooks, webhooks</li> <li><strong>Router</strong>: maps each event to a session and a queue lane</li> <li><strong>State</strong>: stores transcripts, session IDs, and metadata</li> <li><strong>Clients</strong>: web UI, CLI, desktop app, mobile nodes</li> </ol> <p>That is enough to make an agent feel consistent, always-on, and responsive.</p> <h2 id="practical-defaults">Practical defaults</h2> <ul> <li>use one primary DM session and separate group sessions</li> <li>keep secure DM mode on when multiple people can contact the agent</li> <li>require a token for remote gateway access</li> <li>coalesce noisy inputs with a collect mode</li> <li>keep the runtime stateless and let the gateway own continuity</li> </ul> <h2 id="final-thought">Final thought</h2> <p>A strong agent does not come from a bigger prompt alone. 
It comes from a gateway that makes the agent legible, durable, and safe to operate.</p> <p>Build the control room first.</p>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><summary type="html"><![CDATA[A practical design guide for building an agent gateway as the control plane for inputs, sessions, routing, and client surfaces.]]></summary></entry><entry><title type="html">Agent Architectures: From Single Agent to Hybrid MAS</title><link href="https://abhimanyuaryan.github.io/blog/2026/agent_architectures/" rel="alternate" type="text/html" title="Agent Architectures: From Single Agent to Hybrid MAS"/><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/agent_architectures</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/agent_architectures/"><![CDATA[<p>This post is rendered directly from a Jupyter notebook. 
It covers five distinct agent architectural paradigms for benchmarking planning, from a baseline single-agent system to various multi-agent systems (MAS).</p> <div class="jupyter-notebook" style="position: relative; width: 100%; margin: 0 auto;"> <div class="jupyter-notebook-iframe-container"> <iframe src="/assets/jupyter/agent-architectures.ipynb.html" style="position: absolute; top: 0; left: 0; border-style: none;" width="100%" height="100%" onload="this.parentElement.style.paddingBottom = (this.contentWindow.document.documentElement.scrollHeight + 10) + 'px'"></iframe> </div> </div>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><category term="multi-agent-systems"/><summary type="html"><![CDATA[A formal treatment of five agent architectural paradigms — single agent, independent, centralized, decentralized, and hybrid multi-agent systems — with mathematical foundations and implementation walkthroughs.]]></summary></entry><entry><title type="html">Building a Single Agent System: From Formal Foundations to Working Code</title><link href="https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system/" rel="alternate" type="text/html" title="Building a Single Agent System: From Formal Foundations to Working Code"/><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system/"><![CDATA[<p>Large language models are increasingly used not just as chatbots, but as the reasoning core of <em>autonomous agents</em> — systems that observe an environment, decide on actions, and execute them in a loop until a goal is reached.</p> <p>In this post, I walk through the formal foundations of agent systems and then build a concrete single-agent 
implementation: an LLM-powered agent that navigates and completes missions in a GTA V game environment.</p> <h2 id="formalizing-agent-systems">Formalizing Agent Systems</h2> <p>Building on multi-agent formulations <a class="citation" href="#zhou2024webarena">(Zhou et al., 2024)</a> <a class="citation" href="#guo2024llmmultiagents">(Guo et al., 2024)</a>, an agent system is denoted by:</p> \[S = \{A, E, C, \Omega\}\] <p>where $A = \{a_1, \ldots, a_n\}$ (with $n \ge 1$) is a set of agents, $E$ is a shared environment, $C$ is a communication topology, and $\Omega$ is an orchestration policy.</p> <p>Each agent $a_i$ is represented by the tuple:</p> \[S_i = (\phi_i, A_i, M_i, \pi_i)\] <p>where:</p> <ul> <li>$\phi_i$ is the <strong>reasoning policy</strong> (typically an LLM)</li> <li>$A_i = \{ \mathrm{ToolCall}(t, \theta) \mid t \in T,\ \theta \in \Theta_t \}$ is the <strong>action space</strong> — the set of tool calls the agent can make, where $T$ is the set of available tools (e.g., navigation, shooting, vehicle control) and $\Theta_t$ are valid parameter configurations for tool $t$</li> <li>$M_i$ is the <strong>internal memory</strong></li> <li>$\pi_i: H \to A_i$ is the <strong>decision function</strong>, mapping observation histories to actions</li> </ul> <p>The observation history space $H$ contains sequences of action-observation pairs. 
The decision function $\pi_i$ is instantiated by the reasoning policy $\phi_i$: given a history $h_{i,t}$, the LLM generates a reasoning trace and selects the next action.</p> <p>For example, a history:</p> \[h_{i,t} = \Big[\big(\texttt{navigate\_to}(\text{waypoint}=\text{"Vinewood Hills"}),\ \text{"Arrived at Vinewood Hills"}\big), \ldots\Big]\] <p>is processed by $\phi_i$ to produce the next tool call $\alpha_{i,t+1}$.</p> <h3 id="the-agent-loop">The Agent Loop</h3> <p>At timestep $t$, agent $a_i$ selects an action $\alpha_{i,t} \in A_i$ according to:</p> \[\alpha_{i,t} = \pi_i(h_{i,t}), \quad o_{i,t} = E(\alpha_{i,t}), \quad h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t})\] <p>where $E$ denotes the environment and $h_{i,0} = \{s_0\}$ contains the initial task specification. The history update function $f_i: H \times A_i \times O \to H$ appends the new action-observation pair to the agent’s history:</p> \[h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t}) = h_{i,t} \oplus (\alpha_{i,t}, o_{i,t})\] <p>subject to context window truncation when $|h_{i,t+1}| &gt; \text{MAX\_TOKENS}$.</p> <p>This update mechanism applies uniformly to both single-agent (SAS) and multi-agent (MAS) configurations. In MAS, communication between agents happens through explicit message passing in the orchestration layer.</p> <hr/> <h2 id="from-theory-to-code">From Theory to Code</h2> <p>For a single-agent system ($n = 1$), the formalism simplifies: there is no communication topology $C$ and the orchestration policy $\Omega$ reduces to a simple loop. What remains is the core agent loop.</p> <p>Let’s build this concretely. We’ll create an agent that operates in a GTA V game environment — receiving observations about the game state (player position, nearby vehicles, NPCs, mission objectives) and issuing actions (move, drive, interact) to complete missions.</p> <h3 id="the-base-agent">The Base Agent</h3> <p>The base class captures the structure from our formal definition. 
It holds the reasoning policy (via a Copilot SDK client), maintains conversation history ($M_i$), and defines the <code class="language-plaintext highlighter-rouge">act</code> interface ($\pi_i$):</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">abc</span> <span class="kn">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span>
<span class="kn">import</span> <span class="n">re</span>


<span class="k">class</span> <span class="nc">GTABaseAgent</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Abstract base class for LLM-powered GTA V agents.
    Maps directly to the formal agent tuple S_i = (φ_i, A_i, M_i, π_i).
    </span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">model_name</span> <span class="o">=</span> <span class="n">model_name</span>       <span class="c1"># φ_i: reasoning policy
</span>        <span class="n">self</span><span class="p">.</span><span class="n">client</span> <span class="o">=</span> <span class="n">client</span>               <span class="c1"># SDK client for φ_i
</span>        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span> <span class="o">=</span> <span class="p">[]</span>             <span class="c1"># M_i: internal memory
</span>        <span class="n">self</span><span class="p">.</span><span class="n">step_count</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">self</span><span class="p">.</span><span class="n">mission_id</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">reset</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">mission_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">objective</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Reset agent state for a new mission episode.</span><span class="sh">"""</span>
        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">self</span><span class="p">.</span><span class="n">step_count</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">self</span><span class="p">.</span><span class="n">mission_id</span> <span class="o">=</span> <span class="n">mission_id</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="n">objective</span>

    <span class="nd">@abstractmethod</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">act</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">observation_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">
        π_i: H -&gt; A_i
        Receives an observation string, returns an action string.
        </span><span class="sh">"""</span>
        <span class="k">pass</span>

    <span class="k">def</span> <span class="nf">_lookup_location</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">query</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Look up known locations and waypoints in the game world.</span><span class="sh">"""</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sa">r</span><span class="sh">"</span><span class="s">lookup:\s*(.+)</span><span class="sh">"</span><span class="p">,</span> <span class="n">query</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">match</span><span class="p">:</span>
            <span class="n">location</span> <span class="o">=</span> <span class="k">match</span><span class="p">.</span><span class="nf">group</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="nf">strip</span><span class="p">().</span><span class="nf">lower</span><span class="p">()</span>
            <span class="k">return</span> <span class="nf">lookup_game_location</span><span class="p">(</span><span class="n">location</span><span class="p">)</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Location not found.</span><span class="sh">"</span>

    <span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">__class__</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">] </span><span class="si">{</span><span class="n">msg</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
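
# --- Illustrative sketch, not part of the original class: the history
# update h_{t+1} = h_t + [(action, observation)] from the formalism above,
# written with plain Python lists. max_len is a crude stand-in for
# context-window truncation (an assumption, not the post's actual logic).
def append_history(history, action, observation, max_len=100):
    """Append an (action, observation) pair, keeping at most max_len pairs."""
    history = history + [(action, observation)]
    return history[-max_len:]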
</code></pre></div></div> <p>Notice how the class mirrors our formal tuple:</p> <ul> <li><code class="language-plaintext highlighter-rouge">self.client</code> + <code class="language-plaintext highlighter-rouge">self.model_name</code> → $\phi_i$ (reasoning policy)</li> <li>Actions are defined by the system prompt → $A_i$ (action space)</li> <li><code class="language-plaintext highlighter-rouge">self.conversation</code> → $M_i$ (internal memory)</li> <li><code class="language-plaintext highlighter-rouge">act()</code> → $\pi_i$ (decision function)</li> </ul> <h3 id="the-llm-wrapper">The LLM Wrapper</h3> <p>The reasoning policy $\phi_i$ is implemented via the GitHub Copilot SDK. The key function handles retries, rate limiting, and prompt construction:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span><span class="p">,</span> <span class="n">SessionConfig</span><span class="p">,</span> <span class="n">MessageOptions</span>

<span class="n">MAX_RETRIES</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">INITIAL_RETRY_DELAY</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">INTER_REQUEST_DELAY</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="n">DEFAULT_MODEL</span> <span class="o">=</span> <span class="sh">"</span><span class="s">gpt-5-mini</span><span class="sh">"</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">get_copilot_client</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">CopilotClient</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create and start a CopilotClient.</span><span class="sh">"""</span>
    <span class="n">client</span> <span class="o">=</span> <span class="nc">CopilotClient</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">start</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">client</span>

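# --- Assumed sketch: the real _build_prompt helper is not shown in this
# post, so this version is a guess at its shape (role tags and formatting
# are assumptions) — it flattens the system prompt plus the role-tagged
# message history into a single prompt string for the SDK call below.
def _build_prompt(system_prompt, messages):
    """Join the system prompt and chat messages into one prompt string."""
    lines = [system_prompt, ""]
    for m in messages:
        lines.append(f"{m['role'].upper()}: {m['content']}")
    return "\n".join(lines)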

<span class="k">async</span> <span class="k">def</span> <span class="nf">call_copilot_with_retry</span><span class="p">(</span>
    <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">,</span>
    <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">messages</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">dict</span><span class="p">],</span>
    <span class="n">system_prompt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">temperature</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Calls φ_i (the LLM) via the Copilot SDK with rate limiting and retries.
    </span><span class="sh">"""</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="n">INTER_REQUEST_DELAY</span><span class="p">)</span>

    <span class="n">full_prompt</span> <span class="o">=</span> <span class="nf">_build_prompt</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">,</span> <span class="n">messages</span><span class="p">)</span>

    <span class="n">retries</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">delay</span> <span class="o">=</span> <span class="n">INITIAL_RETRY_DELAY</span>
    <span class="n">last_error</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">while</span> <span class="n">retries</span> <span class="o">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">:</span>
        <span class="n">session</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">session</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">create_session</span><span class="p">(</span>
                <span class="nc">SessionConfig</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model_name</span><span class="p">)</span>
            <span class="p">)</span>
            <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">send_and_wait</span><span class="p">(</span>
                <span class="nc">MessageOptions</span><span class="p">(</span><span class="n">prompt</span><span class="o">=</span><span class="n">full_prompt</span><span class="p">),</span>
                <span class="n">timeout</span><span class="o">=</span><span class="mf">60.0</span><span class="p">,</span>
            <span class="p">)</span>
            <span class="k">if</span> <span class="n">response</span> <span class="ow">and</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span> <span class="ow">and</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">content</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="nf">strip</span><span class="p">()</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">raise</span> <span class="nc">Exception</span><span class="p">(</span><span class="sh">"</span><span class="s">Empty response from Copilot SDK</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">TimeoutError</span><span class="p">:</span>
            <span class="n">last_error</span> <span class="o">=</span> <span class="nc">TimeoutError</span><span class="p">(</span><span class="sh">"</span><span class="s">Request timed out</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="n">last_error</span> <span class="o">=</span> <span class="n">e</span>
        <span class="k">finally</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">session</span><span class="p">:</span>
                <span class="k">try</span><span class="p">:</span>
                    <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">destroy</span><span class="p">()</span>
                <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
                    <span class="k">pass</span>

        <span class="n">retries</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">retries</span> <span class="o">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="n">delay</span><span class="p">)</span>
            <span class="n">delay</span> <span class="o">*=</span> <span class="mi">2</span>

    <span class="k">raise</span> <span class="nc">Exception</span><span class="p">(</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">Failed after </span><span class="si">{</span><span class="n">MAX_RETRIES</span><span class="si">}</span><span class="s"> retries. Last error: </span><span class="si">{</span><span class="n">last_error</span><span class="si">}</span><span class="sh">"</span>
    <span class="p">)</span>
</code></pre></div></div> <p>The prompt builder serializes the conversation history $h_{i,t}$ into a single string — because each Copilot SDK session takes a flat prompt rather than a structured message list:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">_build_prompt</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">messages</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">dict</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Serialize system instructions + history into a single prompt.
    This is h_{i,t} formatted for the LLM.
    </span><span class="sh">"""</span>
    <span class="n">parts</span> <span class="o">=</span> <span class="p">[</span><span class="sa">f</span><span class="sh">"</span><span class="s">[System Instructions]</span><span class="se">\n</span><span class="si">{</span><span class="n">system_prompt</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">]</span>

    <span class="k">for</span> <span class="n">msg</span> <span class="ow">in</span> <span class="n">messages</span><span class="p">:</span>
        <span class="n">role</span> <span class="o">=</span> <span class="n">msg</span><span class="p">[</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">]</span>
        <span class="n">content</span> <span class="o">=</span> <span class="n">msg</span><span class="p">[</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">role</span> <span class="o">==</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">:</span>
            <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[Observation]</span><span class="se">\n</span><span class="si">{</span><span class="n">content</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">role</span> <span class="ow">in</span> <span class="p">(</span><span class="sh">"</span><span class="s">model</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">):</span>
            <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[Your Previous Action]</span><span class="se">\n</span><span class="si">{</span><span class="n">content</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sh">"</span><span class="s">[Your Action]</span><span class="se">\n</span><span class="s">Respond with exactly one action:</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="sh">"</span><span class="se">\n</span><span class="sh">"</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="n">parts</span><span class="p">)</span>
</code></pre></div></div> <h3 id="the-single-agent">The Single Agent</h3> <p>With the base class and LLM wrapper in place, the single agent is straightforward. It implements the agent loop: observe → reason → act → update history → repeat:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">.gta_base</span> <span class="kn">import</span> <span class="n">GTABaseAgent</span>
<span class="kn">from</span> <span class="n">.copilot_llm</span> <span class="kn">import</span> <span class="n">call_copilot_with_retry</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span>

<span class="n">SYSTEM_PROMPT</span> <span class="o">=</span> <span class="sh">"""</span><span class="se">\
</span><span class="s">You are an autonomous agent operating inside GTA V. Your goal is to complete </span><span class="se">\
</span><span class="s">missions by navigating the open world, interacting with NPCs, driving vehicles, </span><span class="se">\
</span><span class="s">and executing objectives.

## Actions
Respond with EXACTLY ONE action per turn (no extra text):

1. **move** – walk/run to a location
   `move: to &lt;location_or_coordinates&gt;`

2. **drive** – enter and drive a vehicle
   `drive: to &lt;destination&gt; via &lt;route_preference&gt;`

3. **interact** – interact with an NPC or object
   `interact: &lt;target&gt; with action &lt;action_type&gt;`

4. **shoot** – engage a target
   `shoot: &lt;target&gt; with &lt;weapon&gt;`

5. **wait** – wait for a condition
   `wait: until &lt;condition&gt;`

6. **lookup** – look up a location or mission intel
   `lookup: &lt;query&gt;`

7. **impossible** – declare the mission cannot be completed
   `impossible: &lt;reason&gt;`

## Environment
- You receive observations about: player position, health, nearby entities </span><span class="se">\
</span><span class="s">(NPCs, vehicles, objects), current objective, and minimap waypoints.
- The world is persistent — NPCs remember interactions, police respond to crimes, </span><span class="se">\
</span><span class="s">and time passes.

## Strategy
- First, assess your current position relative to the objective.
- If you don</span><span class="sh">'</span><span class="s">t know where to go, use `lookup: &lt;destination&gt;`.
- Use vehicles for long distances.
- Avoid unnecessary combat — it attracts police attention.
- Complete objectives in order. Multi-step missions require sequential actions.

## Important
- Respond with ONLY the action, nothing else.
- One action per turn. No explanations.
</span><span class="sh">"""</span>


<span class="k">class</span> <span class="nc">GTASingleAgent</span><span class="p">(</span><span class="n">GTABaseAgent</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Single Agent for GTA V missions.
    Implements the agent loop: π_i(h_{i,t}) -&gt; α_{i,t}
    </span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">):</span>
        <span class="nf">super</span><span class="p">().</span><span class="nf">__init__</span><span class="p">(</span><span class="n">model_name</span><span class="p">,</span> <span class="n">client</span><span class="p">)</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">act</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">observation_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="c1"># 1. Update history: h_{i,t} = h_{i,t-1} ⊕ (o_{i,t})
</span>        <span class="k">if</span> <span class="n">observation_text</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
                <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">observation_text</span>
            <span class="p">})</span>

        <span class="c1"># 2. Query φ_i: α_{i,t} = π_i(h_{i,t})
</span>        <span class="n">action_text</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">call_copilot_with_retry</span><span class="p">(</span>
            <span class="n">self</span><span class="p">.</span><span class="n">client</span><span class="p">,</span>
            <span class="n">self</span><span class="p">.</span><span class="n">model_name</span><span class="p">,</span>
            <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">,</span>
            <span class="n">SYSTEM_PROMPT</span><span class="p">,</span>
        <span class="p">)</span>

        <span class="c1"># 3. Append action to memory: h_{i,t+1}
</span>        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
            <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">model</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">action_text</span>
        <span class="p">})</span>

        <span class="c1"># 4. Handle lookup action (oracle tool call)
</span>        <span class="k">if</span> <span class="sh">"</span><span class="s">lookup:</span><span class="sh">"</span> <span class="ow">in</span> <span class="n">action_text</span><span class="p">.</span><span class="nf">lower</span><span class="p">():</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">_lookup_location</span><span class="p">(</span><span class="n">action_text</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">result</span><span class="p">:</span>
                <span class="n">self</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">📍 </span><span class="si">{</span><span class="n">action_text</span><span class="si">}</span><span class="s"> -&gt; </span><span class="si">{</span><span class="n">result</span><span class="p">[</span><span class="si">:</span><span class="mi">60</span><span class="p">]</span><span class="si">}</span><span class="s">...</span><span class="sh">"</span><span class="p">)</span>
                <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
                    <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span>
                    <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">result</span>
                <span class="p">})</span>
                <span class="k">return</span> <span class="k">await</span> <span class="n">self</span><span class="p">.</span><span class="nf">act</span><span class="p">(</span><span class="bp">None</span><span class="p">)</span>

        <span class="k">return</span> <span class="n">action_text</span>
</code></pre></div></div> <h3 id="running-the-agent">Running the Agent</h3> <p>Putting it all together — here’s how you’d run a mission episode:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">from</span> <span class="n">gta_agent</span> <span class="kn">import</span> <span class="n">GTASingleAgent</span>
<span class="kn">from</span> <span class="n">copilot_llm</span> <span class="kn">import</span> <span class="n">get_copilot_client</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">run_mission</span><span class="p">():</span>
    <span class="n">client</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">get_copilot_client</span><span class="p">()</span>
    <span class="n">agent</span> <span class="o">=</span> <span class="nc">GTASingleAgent</span><span class="p">(</span><span class="n">model_name</span><span class="o">=</span><span class="sh">"</span><span class="s">gpt-5-mini</span><span class="sh">"</span><span class="p">,</span> <span class="n">client</span><span class="o">=</span><span class="n">client</span><span class="p">)</span>
    <span class="n">agent</span><span class="p">.</span><span class="nf">reset</span><span class="p">(</span>
        <span class="n">mission_id</span><span class="o">=</span><span class="sh">"</span><span class="s">heist_01</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">objective</span><span class="o">=</span><span class="sh">"</span><span class="s">Drive to the Vanilla Unicorn, meet Trevor, </span><span class="sh">"</span>
                  <span class="sh">"</span><span class="s">then escape the police in a getaway vehicle.</span><span class="sh">"</span>
    <span class="p">)</span>

    <span class="c1"># Initial observation from the environment
</span>    <span class="n">obs</span> <span class="o">=</span> <span class="p">(</span>
        <span class="sh">"</span><span class="s">Position: Downtown Vinewood (x=248, y=1024). </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Health: 100%. Armor: 50%. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Nearby: 1 parked Kuruma (unlocked), 3 pedestrians. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Objective: Go to the Vanilla Unicorn. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Distance to objective: 2.4 km NW.</span><span class="sh">"</span>
    <span class="p">)</span>

    <span class="n">max_steps</span> <span class="o">=</span> <span class="mi">50</span>
    <span class="k">for</span> <span class="n">step</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_steps</span><span class="p">):</span>
        <span class="n">action</span> <span class="o">=</span> <span class="k">await</span> <span class="n">agent</span><span class="p">.</span><span class="nf">act</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Step </span><span class="si">{</span><span class="n">step</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">action</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

        <span class="k">if</span> <span class="sh">"</span><span class="s">impossible:</span><span class="sh">"</span> <span class="ow">in</span> <span class="n">action</span><span class="p">.</span><span class="nf">lower</span><span class="p">():</span>
            <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Agent declared mission impossible.</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">break</span>

        <span class="c1"># In a real setup, you'd send the action to the
</span>        <span class="c1"># GTA V environment and get back the next observation.
</span>        <span class="c1"># obs = env.step(action)
</span>        <span class="k">break</span>  <span class="c1"># demo: single step
</span>

<span class="n">asyncio</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="nf">run_mission</span><span class="p">())</span>
</code></pre></div></div> <h2 id="key-design-decisions">Key Design Decisions</h2> <p>A few things worth noting about this architecture:</p> <p><strong>Flat prompt construction.</strong> The Copilot SDK uses a single-prompt-per-session model. We serialize the entire conversation history $h_{i,t}$ into one string. This means the full context is visible to the LLM on every call, but we pay the token cost of replaying history. In practice, you’d truncate when approaching the context window limit — exactly the $|h_{i,t+1}| &gt; \text{MAX_TOKENS}$ constraint from the formalism.</p> <p><strong>Exponential backoff.</strong> LLM APIs are rate-limited. The retry wrapper doubles the delay on each failure (5s → 10s → 20s → 40s → 80s). This is important for any production agent that runs for dozens or hundreds of steps.</p> <p><strong>Tool calls as actions.</strong> The <code class="language-plaintext highlighter-rouge">lookup</code> action demonstrates how $A_i$ includes tool calls. When the agent outputs <code class="language-plaintext highlighter-rouge">lookup: Vanilla Unicorn</code>, we intercept it, query an oracle, inject the result as a new observation, and re-enter the agent loop. The agent doesn’t see this as a special case — it’s just another action-observation pair in the history.</p> <p><strong>Stateless sessions, stateful history.</strong> Each LLM call creates a fresh Copilot SDK session (stateless), but the agent maintains its own conversation history (stateful). This separation means session failures don’t corrupt the agent’s memory.</p> <h2 id="from-single-to-multi-agent">From Single to Multi-Agent</h2> <p>The formal framework makes it clear how to extend this to multi-agent systems. 
You’d add:</p> <ul> <li><strong>More agents</strong> in $A$ with different specializations (a driver agent, a combat agent, a negotiation agent)</li> <li><strong>Communication topology</strong> $C$ defining which agents can message each other</li> <li><strong>Orchestration policy</strong> $\Omega$ deciding which agent acts at each timestep</li> </ul> <p>But the single-agent case is where you get the fundamentals right. Get the agent loop, memory management, and tool integration working reliably for one agent before scaling to many.</p> <hr/> <h2 id="references">References</h2> <h2 class="bibliography">2024</h2> <ol class="bibliography"><li><div class="row"> <div class="col col-sm-2 abbr"> <abbr class="badge rounded w-100">ICLR</abbr> </div> <div id="zhou2024webarena" class="col-sm-8"> <div class="title">WebArena: A Realistic Web Environment for Building Autonomous Agents</div> <div class="author"> Shuyan Zhou, Frank F. Xu, Hao Zhu, and <span class="more-authors" title="click to view 8 more authors" onclick=" var element=$(this); element.attr('title', ''); var more_authors_text=element.text() == '8 more authors' ? 'Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig' : '8 more authors'; var cursorPosition=0; var textAdder=setInterval(function(){ element.html(more_authors_text.substring(0, cursorPosition + 1)); if (++cursorPosition == more_authors_text.length){ clearInterval(textAdder); } }, '10'); ">8 more authors</span> </div> <div class="periodical"> <em>In International Conference on Learning Representations (ICLR)</em>, 2024 </div> <div class="periodical"> </div> <div class="links"> <a class="bibtex btn btn-sm z-depth-0" role="button">Bib</a> </div> <div class="bibtex hidden"> <figure class="highlight"><pre><code class="language-bibtex" data-lang="bibtex"><span class="nc">@inproceedings</span><span class="p">{</span><span class="nl">zhou2024webarena</span><span class="p">,</span>
  <span class="na">title</span> <span class="p">=</span> <span class="s">{{WebArena}: A Realistic Web Environment for Building Autonomous Agents}</span><span class="p">,</span>
  <span class="na">author</span> <span class="p">=</span> <span class="s">{Zhou, Shuyan and Xu, Frank F. and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham}</span><span class="p">,</span>
  <span class="na">booktitle</span> <span class="p">=</span> <span class="s">{International Conference on Learning Representations (ICLR)}</span><span class="p">,</span>
  <span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span> <span class="p">=</span> <span class="s">{https://arxiv.org/abs/2307.13854}</span>
<span class="p">}</span></code></pre></figure> </div> </div> </div> </li> <li><div class="row"> <div class="col col-sm-2 abbr"> <abbr class="badge rounded w-100">arXiv</abbr> </div> <div id="guo2024llmmultiagents" class="col-sm-8"> <div class="title">Large Language Model based Multi-Agents: A Survey of Progress and Challenges</div> <div class="author"> Taicheng Guo, Xiuying Chen, Yaqi Wang, and <span class="more-authors" title="click to view 5 more authors" onclick=" var element=$(this); element.attr('title', ''); var more_authors_text=element.text() == '5 more authors' ? 'Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang' : '5 more authors'; var cursorPosition=0; var textAdder=setInterval(function(){ element.html(more_authors_text.substring(0, cursorPosition + 1)); if (++cursorPosition == more_authors_text.length){ clearInterval(textAdder); } }, '10'); ">5 more authors</span> </div> <div class="periodical"> <em>arXiv preprint arXiv:2402.01680</em>, 2024 </div> <div class="periodical"> </div> <div class="links"> <a class="bibtex btn btn-sm z-depth-0" role="button">Bib</a> </div> <div class="bibtex hidden"> <figure class="highlight"><pre><code class="language-bibtex" data-lang="bibtex"><span class="nc">@article</span><span class="p">{</span><span class="nl">guo2024llmmultiagents</span><span class="p">,</span>
  <span class="na">title</span> <span class="p">=</span> <span class="s">{Large Language Model based Multi-Agents: A Survey of Progress and Challenges}</span><span class="p">,</span>
  <span class="na">author</span> <span class="p">=</span> <span class="s">{Guo, Taicheng and Chen, Xiuying and Wang, Yaqi and Chang, Ruidi and Pei, Shichao and Chawla, Nitesh V. and Wiest, Olaf and Zhang, Xiangliang}</span><span class="p">,</span>
  <span class="na">journal</span> <span class="p">=</span> <span class="s">{arXiv preprint arXiv:2402.01680}</span><span class="p">,</span>
  <span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span> <span class="p">=</span> <span class="s">{https://arxiv.org/abs/2402.01680}</span>
<span class="p">}</span></code></pre></figure> </div> </div> </div> </li></ol>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><summary type="html"><![CDATA[A walkthrough of formalizing single-agent systems for LLM-powered autonomous task execution, with a complete Python implementation using the GitHub Copilot SDK and a GTA V environment.]]></summary></entry><entry><title type="html">Context Graphs Are the Future of AI Infrastructure</title><link href="https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai/" rel="alternate" type="text/html" title="Context Graphs Are the Future of AI Infrastructure"/><published>2026-01-15T10:00:00+00:00</published><updated>2026-01-15T10:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai/"><![CDATA[<p>When Jaya Gupta published <a href="https://www.linkedin.com/pulse/how-do-you-build-context-graph-jaya-gupta-xicwe/">How Do You Build a Context Graph?</a> in late December 2025, I felt something click into place. Not because the ideas were new to me — but because she articulated, under a single term, a vision I’d been building toward for over two years across seemingly different projects.</p> <p>Knowledge graphs. Ontologies. Database migration. LLM hallucination reduction. Multi-agent systems. Relational databases vs. graph databases. These weren’t separate interests — they were all pieces of the same puzzle. <strong>Context graphs</strong> is the name for what emerges when you put them together.</p> <p>Let me walk through how I got here.</p> <h2 id="it-started-with-hallucinations-november-2023">It Started with Hallucinations (November 2023)</h2> <p>I started working seriously with knowledge graphs in late 2023. 
The problem that pulled me in was deceptively simple: <strong>LLMs hallucinate</strong>, and no amount of prompt engineering fully fixes it.</p> <p>The insight was that if you ground LLM outputs in structured, verifiable knowledge — a graph of entities and their real relationships — the model has something to reason <em>from</em> rather than just generating plausible-sounding text. Knowledge graphs become a source of truth that constrains generation.</p> <p>This became the foundation of my <a href="/news/announcement_10/">talk at the Voxel51 AI Meetup in September 2024</a>, where I demonstrated using <strong>LangChain and Neo4j</strong> to reduce hallucinations in ChatGPT-style systems. The <a href="https://github.com/AbhimanyuAryan/voxel51">code is on GitHub</a>. The core approach: instead of retrieving flat text chunks, retrieve <em>graph-structured knowledge</em> — entities, relationships, and their context — so the LLM has structured facts to anchor its response.</p> <p>But this raised a deeper question: where does the graph come from? How do you build and maintain it? And how do you make sure the <em>structure itself</em> is right?</p> <h2 id="understanding-database-paradigms-the-migration-project">Understanding Database Paradigms (The Migration Project)</h2> <p>Around the same time, I was working on <a href="/projects/8_nosql_database_evaluation/">migrating a hospital management database</a> across three fundamentally different paradigms: <strong>Oracle SQL → MongoDB → Neo4j</strong>.</p> <p>This project taught me something that textbooks don’t emphasize enough: <strong>the way you store data shapes the way you can reason about it.</strong></p> <p>A relational database (Oracle) captures state beautifully — normalized tables, foreign keys, constraints. Clean and precise. But relationships are implicit, buried in JOIN operations.</p> <p>A document store (MongoDB) captures context — all the information about an entity lives together in a rich, nested document. 
Great for retrieval. But relationships between documents are second-class citizens.</p> <p>A graph database (Neo4j) makes relationships first-class. Suddenly you can ask questions like “what’s the shortest path between Patient A and Patient B?” — traversals that would require recursive JOINs in SQL become single Cypher queries.</p> <p>The migration forced me to confront hard design decisions: consolidating Doctor, Nurse, and Technician into unified Staff nodes, removing unnecessary connector entities (Episode), rethinking how triggers and views translate across paradigms. Each database enforced a different worldview on the same underlying reality.</p> <p><strong>This is the ontology problem in miniature.</strong> Every database schema is an implicit ontology — a claim about what entities exist, how they relate, and what matters. Migrating between schemas is really migrating between ontologies.</p> <h2 id="ontologies-as-the-foundation-september-2024">Ontologies as the Foundation (September 2024)</h2> <p>Two things happened in September 2024 that sharpened this thinking.</p> <p>First, my <a href="/news/announcement_10/">Voxel51 talk</a> — demonstrating that knowledge graphs concretely reduce LLM hallucinations when used as retrieval infrastructure.</p> <p>Then, days later, I had <a href="/news/announcement_8/">a conversation with Jérémy Ravenel</a> about using <strong>ontologies as the base of next-generation AI systems</strong>. 
This was the conversation where it all started coming together.</p> <p>Jérémy and I discussed how ontologies could:</p> <ul> <li><strong>Enhance knowledge representation</strong> in LLMs — giving models structured priors about what kinds of things exist and how they can relate</li> <li><strong>Improve reasoning</strong> — moving from pattern matching to structured inference over formally defined relationships</li> <li><strong>Enable accurate retrieval</strong> — using ontological relationships to return contextually grounded results</li> <li><strong>Map entity relations across domains</strong> — the same entity can play different roles in different contexts</li> </ul> <p>We spent a lot of time on <strong>framework selection</strong> — which tools and standards to use for building ontologies. OWL? SKOS? Custom schemas? The choice shapes everything downstream. As we agreed: “a good beginning is half done.”</p> <p>This conversation planted a seed: what if you combined the <em>grounding power</em> of knowledge graphs (reducing hallucinations), the <em>structural flexibility</em> of different database paradigms (relational, document, graph), and the <em>formal precision</em> of ontologies (defining what entities and relationships are possible)?</p> <h2 id="the-world-model-connection-september-2025">The World Model Connection (September 2025)</h2> <p>Then Meta released <a href="https://huggingface.co/facebook/cwm">Code World Models (CWM)</a> in September 2025, and the transfer learning potential was immediately obvious to me.</p> <p>CWM learns compressed representations of how environments work by observing trajectories through them. Not static snapshots — <strong>dynamics</strong>. How does state change? What happens when you take an action? 
What are the causal relationships?</p> <p>The connection to knowledge graphs:</p> <ul> <li><strong>Knowledge graphs</strong> capture static structure — what exists and how it’s connected</li> <li><strong>World models</strong> capture dynamics — how the system <em>behaves</em></li> <li><strong>Ontologies</strong> provide the schema — what <em>kinds</em> of structures and dynamics are possible</li> </ul> <p>Mix them and you get something that doesn’t just store what’s true — it models how things work and can predict what happens next. That’s not a database anymore. That’s <strong>infrastructure for intelligence</strong>.</p> <h2 id="jaya-guptas-context-graph-framework-december-2025">Jaya Gupta’s Context Graph Framework (December 2025)</h2> <p>When Jaya Gupta’s <a href="https://www.linkedin.com/pulse/how-do-you-build-context-graph-jaya-gupta-xicwe/">context graph article</a> landed, I read it as someone who’d been living every dimension of the problem she described. Her framework brought together ideas I’d been working on separately under one coherent vision.</p> <h3 id="the-two-clocks-problem">The Two Clocks Problem</h3> <p>Gupta identifies that we’ve built trillion-dollar infrastructure for the <strong>state clock</strong> (what’s true now) and almost nothing for the <strong>event clock</strong> (what happened, in what order, with what reasoning).</p> <p>I’ve seen this firsthand. The Oracle database in my migration project captures state perfectly — current patients, current staff, current bills. The MongoDB version captures richer context per entity. The Neo4j version captures relationships. <strong>But none of them capture <em>why</em> the data looks the way it does</strong> — the decisions, the reasoning, the traces that produced the current state.</p> <p>That’s exactly the gap. 
The reasoning connecting observations to actions was never treated as data.</p> <h3 id="agents-as-informed-walkers">Agents as Informed Walkers</h3> <p>Gupta draws on <strong>node2vec</strong> and graph representation learning: you don’t need to predefine the ontology. Agent trajectories through problem space discover structure through use. The schema isn’t the starting point — it’s the output.</p> <p>This resonates with everything I’ve built. When I migrated the hospital database to Neo4j, I had to <em>manually</em> discover which entities mattered and how they related. I consolidated Doctor, Nurse, and Technician into Staff. I debated whether to keep Episode as a node or remove it. These were ontology design decisions that required deep understanding of how the system is actually used.</p> <p>Agents could do this automatically. An agent traversing a system — investigating issues, completing tasks, making decisions — implicitly discovers the ontology through its trajectory. Accumulate enough trajectories and the structure emerges.</p> <p>This is also why <strong>reducing hallucinations matters</strong> at this level. If agents are the walkers discovering ontology, they need to be grounded in reality — not hallucinating entities and relationships that don’t exist. The knowledge graph grounding I demonstrated at Voxel51 is prerequisite infrastructure for reliable agent-driven ontology discovery.</p> <h3 id="context-graphs-as-world-models">Context Graphs as World Models</h3> <p>The most powerful idea: a context graph with enough accumulated structure becomes a <strong>world model</strong>. It encodes not just what exists, but how the system behaves. It enables simulation — “what if?” rather than “what happened?”</p> <p>This is where CWM’s approach and Gupta’s vision converge. Facebook showed world models can be learned from code trajectories. Gupta argues they can be learned from organizational agent trajectories. 
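</p>

<p>A toy sketch of that convergence in plain Python (state names and trajectories are invented for illustration; this is not CWM or any real pipeline): count the transitions agents actually take, and a crude predictive model of the dynamics falls out.</p>

```python
from collections import Counter, defaultdict

# Each "trajectory" is a sequence of states an agent passed through
# while completing a task. All names here are made up.
trajectories = [
    ["ticket_opened", "customer_contacted", "refund_issued"],
    ["ticket_opened", "customer_contacted", "invoice_corrected"],
    ["ticket_opened", "escalated", "refund_issued"],
    ["ticket_opened", "customer_contacted", "refund_issued"],
]

# Accumulate observed state-to-state transitions.
transitions = defaultdict(Counter)
for path in trajectories:
    for a, b in zip(path, path[1:]):
        transitions[a][b] += 1

def predict_next(state):
    # The "world model": the most frequently observed successor state.
    return transitions[state].most_common(1)[0][0]

print(predict_next("ticket_opened"))       # most common successor, 3 of 4 paths
print(predict_next("customer_contacted"))  # most common successor, 2 of 3 paths
```

<p>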
The principle is identical: observe enough dynamics and a predictive model emerges.</p> <p>And the world model needs all the layers I’ve been building:</p> <ul> <li><strong>Relational databases</strong> for clean, normalized state</li> <li><strong>Graph databases</strong> for rich, traversable relationships</li> <li><strong>Ontologies</strong> for structural priors about what’s possible</li> <li><strong>Knowledge graphs</strong> for grounding agents in verified facts</li> <li><strong>LLM hallucination reduction</strong> for trustworthy agent behavior</li> <li><strong>Agent trajectories</strong> for discovering dynamics and building the event clock</li> </ul> <h2 id="where-this-is-going">Where This Is Going</h2> <p>I fully agree with Gupta’s framing: context graphs are not just a better retrieval system — they’re <strong>organizational intelligence that compounds</strong>.</p> <p>The convergence I see across my own work:</p> <ol> <li><strong>Knowledge graphs</strong> provide the grounding layer — concrete entities, relationships, verified facts. 
They keep agents honest and reduce hallucinations.</li> <li><strong>Database paradigm fluency</strong> is essential — you need to understand how relational, document, and graph models each capture different aspects of reality, because context graphs need all three perspectives.</li> <li><strong>Ontologies</strong> provide the structural layer — formal definitions that constrain what the graph can represent, learned and refined through use.</li> <li><strong>World models</strong> (à la CWM) provide the dynamics layer — how the system behaves, learned from agent trajectories.</li> <li><strong>Context graphs</strong> are the synthesis — capturing not just state but reasoning, not just data but decision traces, not just structure but dynamics.</li> </ol> <p>Three problems need solving:</p> <ul> <li><strong>The two clocks problem</strong> — building event clock infrastructure alongside state infrastructure, across relational and graph paradigms</li> <li><strong>Schema as output</strong> — letting grounded, non-hallucinating agents discover ontology through informed traversal</li> <li><strong>World models, not retrieval</strong> — context graphs that simulate futures, not just retrieve pasts</li> </ul> <p>Every project I’ve touched in the last two years — from <a href="https://github.com/AbhimanyuAryan/voxel51">reducing LLM hallucinations with Neo4j</a>, to <a href="/projects/8_nosql_database_evaluation/">migrating databases across paradigms</a>, to <a href="/news/announcement_8/">exploring ontologies with Jérémy Ravenel</a> — was building toward this. Not because I planned it that way, but because these problems are deeply connected.</p> <p>Context graphs are where knowledge graphs, ontologies, database design, agent systems, and world models converge. 
I believe this is where AI infrastructure is heading — and I intend to help build it.</p>]]></content><author><name></name></author><category term="AI"/><category term="knowledge-graphs"/><category term="ontologies"/><category term="AI"/><summary type="html"><![CDATA[From reducing LLM hallucinations with knowledge graphs, to migrating between database paradigms, to ontology-driven AI — how two years of work led me to context graphs as the next frontier.]]></summary></entry><entry><title type="html">Getting Started with Deep Learning in Swift and TensorFlow</title><link href="https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow/" rel="alternate" type="text/html" title="Getting Started with Deep Learning in Swift and TensorFlow"/><published>2019-11-22T09:56:23+00:00</published><updated>2019-11-22T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow/"><![CDATA[<p>There are 3 ways to get started coding with Swift &amp; TensorFlow:</p> <ul> <li><strong>Google Colab</strong> <em>(Basic: Windows/Mac/Linux)</em></li> <li><strong>Command Line</strong> <em>(Advanced: Mac/Linux)</em></li> <li><strong>REPL Playground XCode</strong> <em>(Basic: Mac — Coming Soon)</em></li> </ul> <blockquote> <p>Note: I’ll cover the first two approaches today — Google Colab &amp; command line. The 3rd approach (XCode Playground) will be a separate post.</p> </blockquote> <hr/> <h2 id="1-google-colab">1. Google Colab</h2> <p>First, create an empty <code class="language-plaintext highlighter-rouge">swift.ipynb</code> notebook:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">touch </span>swift.ipynb
code swift.ipynb
</code></pre></div></div> <p>Open it in VSCode and paste this JSON to make it a Swift kernel notebook:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"nbformat"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w">
  </span><span class="nl">"nbformat_minor"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
  </span><span class="nl">"metadata"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"colab"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift_notebook.ipynb"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.3.2"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"provenance"</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span><span class="w">
      </span><span class="nl">"collapsed_sections"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"kernelspec"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"display_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Swift"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"cells"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"metadata"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"icDfXRlHRYvE"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"colab_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"code"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"cell_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"code"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"let x = 2</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span><span class="w"> </span><span class="s2">"let y = 2</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span><span class="w"> </span><span class="s2">"print(</span><span class="se">\"</span><span class="s2">Hello world, this is Swift! </span><span class="se">\\</span><span class="s2">(x + y)</span><span class="se">\"</span><span class="s2">)"</span><span class="p">],</span><span class="w">
      </span><span class="nl">"execution_count"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
      </span><span class="nl">"outputs"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div> <p>Then go to <a href="https://colab.research.google.com/notebooks/welcome.ipynb">colab.research.google.com</a> → <strong>File &gt; Upload Notebook</strong> → upload your <code class="language-plaintext highlighter-rouge">Swift.ipynb</code>. You can now write Swift &amp; TensorFlow in Colab!</p> <hr/> <h2 id="2-command-line">2. Command Line</h2> <p>Download Swift-TensorFlow for Mac or Ubuntu from <a href="https://github.com/tensorflow/swift/blob/master/Installation.md">the official installation guide</a>.</p> <p>Once set up, create <code class="language-plaintext highlighter-rouge">basics.swift</code>:</p> <div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="s">"Tensorflow Basics Tutorial"</span><span class="p">)</span>

<span class="kd">import</span> <span class="kt">TensorFlow</span>

<span class="k">let</span> <span class="nv">x</span> <span class="o">=</span> <span class="kt">Tensor</span><span class="o">&lt;</span><span class="kt">Float</span><span class="o">&gt;</span><span class="p">([[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">]])</span>
<span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div> <p>Compile and run:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>swift basics.swift
</code></pre></div></div> <p>Output:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tensorflow Basics Tutorial
[[2.0, 2.0], [2.0, 2.0]]
</code></pre></div></div> <p>Swift also has a Python-like <strong>REPL</strong> since it’s built on the LLVM infrastructure.</p> <hr/> <h3 id="whats-next">What’s Next?</h3> <ul> <li>Why TensorFlow &amp; Swift?</li> <li>Swift Compiler Technology — how it compares to the competition</li> <li>Using Python libraries with Swift-TensorFlow</li> </ul>]]></content><author><name></name></author><category term="machine-learning"/><category term="deep-learning"/><category term="swift"/><category term="tensorflow"/><summary type="html"><![CDATA[TensorFlow is now available in Swift for Deep Learning. This post helps you get started with Google Colab and the command line.]]></summary></entry><entry><title type="html">Introduction to Active Learning</title><link href="https://abhimanyuaryan.github.io/blog/2019/active-learning/" rel="alternate" type="text/html" title="Introduction to Active Learning"/><published>2019-08-26T09:56:23+00:00</published><updated>2019-08-26T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/active-learning</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/active-learning/"><![CDATA[<p>Active Learning was introduced by <a href="http://burrsettles.com/">Burr Settles</a> at the University of Wisconsin.</p> <p>According to Wikipedia, <strong>Active Learning</strong> is a sub-field of Semi-Supervised Learning. 
Let’s understand Semi-Supervised Learning in simple terms:</p> <blockquote> <p><em>“The ability to get a large number of images makes this a great candidate for semi-supervised learning.”</em></p> </blockquote> <p>A very simple approach to semi-supervised learning:</p> <ol> <li>Capture <strong>11,000 images</strong></li> <li>Label <strong>100 images</strong> and train <code class="language-plaintext highlighter-rouge">model_1</code></li> <li>Use <code class="language-plaintext highlighter-rouge">model_1</code> to label the other <strong>10,900 images</strong></li> <li>Train <code class="language-plaintext highlighter-rouge">model_2</code> with the “labeled” 10,900 images</li> </ol> <p>…results in a <code class="language-plaintext highlighter-rouge">model_2</code> that does <strong>better</strong> than <code class="language-plaintext highlighter-rouge">model_1</code>.</p> <p>This is the core idea — you use a model’s own predictions to generate pseudo-labels for unlabeled data, then retrain on that larger labeled set.
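</p>

<p>The four numbered steps can be sketched end to end with a deliberately tiny stand-in for a model: a learned threshold on 1-D points (every number below is made up for illustration).</p>

```python
# "Training" = midpoint between the two class means; "predicting" = thresholding.
def train(points, labels):
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

labeled = [(-3.0, 0), (1.0, 1)]      # stands in for the 100 hand-labeled images
unlabeled = [-2.0, -0.5, 0.5, 2.0]   # stands in for the 10,900 unlabeled ones

# model_1: trained on the tiny labeled set
t1 = train([x for x, _ in labeled], [y for _, y in labeled])

# Pseudo-label the unlabeled pool with model_1, then retrain as model_2.
pseudo = [(x, predict(t1, x)) for x in unlabeled]
everything = labeled + pseudo
t2 = train([x for x, _ in everything], [y for _, y in everything])

print(t1, t2)  # model_2's threshold shifts as the pseudo-labels pull it around
```

<p>Some pseudo-labels may of course be wrong; that risk is inherent to naive self-training.</p>

<p>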
Active learning takes this a step further by <strong>choosing which samples to label</strong> intelligently (e.g., the ones the model is most uncertain about), making each human annotation count more.</p>]]></content><author><name></name></author><category term="machine-learning"/><category term="deep-learning"/><category term="machine-learning"/><category term="active-learning"/><summary type="html"><![CDATA[What is active learning, how does it relate to semi-supervised learning, and why is it useful?]]></summary></entry><entry><title type="html">Getting Setup with Fast.ai for Machine Learning (No GPU Required)</title><link href="https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course/" rel="alternate" type="text/html" title="Getting Setup with Fast.ai for Machine Learning (No GPU Required)"/><published>2019-03-11T09:56:23+00:00</published><updated>2019-03-11T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course/"><![CDATA[<p>Howdy! This post is for people who own laptops without good GPU specs, have a poor internet connection, and still want to learn ML from fast.ai.</p> <p>Setting up a dev environment can feel like a waste of time. If you’re one of those people, this post should help.</p> <h2 id="free-options">Free Options</h2> <ul> <li><strong>Kaggle</strong></li> <li><strong>Google Colab + GitHub</strong></li> </ul> <h2 id="paid-options">Paid Options</h2> <ul> <li><strong>AWS</strong></li> </ul> <hr/> <h3 id="kaggle">Kaggle</h3> <p>Kaggle is amazing if you want to start quickly — no downloading datasets. 
Datasets range from GBs to TBs, so not having to download them locally is a huge win.</p> <p>To use the fast.ai library, run in a Kaggle notebook:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="n">fastai</span><span class="o">==</span><span class="mf">0.7</span><span class="p">.</span><span class="mi">0</span>
</code></pre></div></div> <p>📓 <a href="https://www.kaggle.com/abhimanyuaryan/new-york-city-taxi-fare-prediction/">Sample Kaggle Notebook - NYC Taxi Fare Prediction</a></p> <p>By default your dataset gets added to the input directory:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PATH</span> <span class="o">=</span> <span class="sh">"</span><span class="s">../input/</span><span class="sh">"</span>
<span class="n">df_raw</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="si">{</span><span class="n">PATH</span><span class="si">}</span><span class="s">train.csv</span><span class="sh">'</span><span class="p">,</span> <span class="n">nrows</span><span class="o">=</span><span class="mi">50_000_000</span><span class="p">)</span>
</code></pre></div></div> <hr/> <h3 id="google-colab">Google Colab</h3> <p>Google Colab provides a <strong>free GPU</strong>. Here’s how to use it with GitHub:</p> <ol> <li>Create a <code class="language-plaintext highlighter-rouge">.ipynb</code> notebook locally</li> <li>Push it to GitHub</li> <li>Go to <a href="https://colab.research.google.com">colab.research.google.com</a> and load your repo</li> </ol> <h4 id="ways-to-download-datasets-in-colab">Ways to download datasets in Colab:</h4> <p><strong>Curl:</strong></p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl &lt;link_to_dataset&gt;
</code></pre></div></div> <p><em>(as shown in <a href="https://youtu.be/CzdWqFTmn0Y?t=969">Jeremy’s video</a>)</em></p> <p><strong>Kaggle API:</strong></p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Step 1: Upload Kaggle API key
</span><span class="kn">from</span> <span class="n">google.colab</span> <span class="kn">import</span> <span class="n">files</span>
<span class="n">files</span><span class="p">.</span><span class="nf">upload</span><span class="p">()</span>

<span class="c1"># Step 2: Install Kaggle API client
</span><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">kaggle</span>

# Step 3: put the key where the CLI expects it, then download (example competition)
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c new-york-city-taxi-fare-prediction
</code></pre></div></div> <hr/> <h3 id="aws-last-resort">AWS (Last Resort)</h3> <p>Jeremy has an <a href="https://course.fast.ai/lessons/aws.html">AWS starter video</a>. AWS p2 instances cost around <strong>$0.9/hr</strong> — decide for yourself!</p>]]></content><author><name></name></author><category term="machine-learning"/><category term="machine-learning"/><category term="fastai"/><category term="python"/><summary type="html"><![CDATA[How to run the Fast.ai ML course on Kaggle, Google Colab, and AWS — even without a good GPU or fast internet]]></summary></entry><entry><title type="html">Object-Oriented Programming in Julia</title><link href="https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia/" rel="alternate" type="text/html" title="Object-Oriented Programming in Julia"/><published>2019-01-15T00:00:00+00:00</published><updated>2019-01-15T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia/"><![CDATA[<p>Originally published on <a href="https://medium.com/@abhimanyuaryan/object-oriented-programming-in-julia-4dbde2661fde">Medium</a>.</p>]]></content><author><name></name></author><category term="julia"/><category term="julia"/><category term="programming"/><category term="oop"/><summary type="html"><![CDATA[Julia doesn't have classes, but it supports powerful OOP-like patterns through multiple dispatch, abstract types, and structs. Learn how to write clean, composable Julia code.]]></summary></entry></feed>