<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://abhimanyuaryan.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://abhimanyuaryan.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-04-07T06:46:21+00:00</updated><id>https://abhimanyuaryan.github.io/feed.xml</id><title type="html">blank</title><subtitle>Senior FullStack Engineer &amp; Educator. Masters in Informatics Engineering. Building multi-agent systems, web frameworks, and AR/VR experiences. </subtitle><entry><title type="html">How to Build the Harness</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness/" rel="alternate" type="text/html" title="How to Build the Harness"/><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-harness/"><![CDATA[<p>COMING SOON</p>]]></content><author><name></name></author><category term="agents"/><category term="LLM"/><category term="AI"/><category term="testing"/><category term="evaluation"/><category term="harness"/><summary type="html"><![CDATA[A practical guide to building a robust evaluation harness that captures, replays, and measures agent behavior — the testing infrastructure your agent system actually needs.]]></summary></entry><entry><title type="html">How to Redesign the Dyad Agent</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent/" rel="alternate" type="text/html" title="How to Redesign the Dyad Agent"/><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent</id><content type="html" 
xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-redesign-the-dyad-agent/"><![CDATA[<p>I’ve been following Dyad’s work closely, particularly through people I deeply respect like Dr. Chris Rackauckas. I had the opportunity to meet them at JuliaCon Paris, where I helped organize the event.</p> <p>After watching their recent developments (<a href="https://juliahub.com/blog/agentic-ai-dyad">Agentic AI with Dyad</a>) and having conversations with Avik Sengupta (VP of Engineering at JuliaHub), I’m convinced that Dyad represents the future of AI modeling.</p> <h2 id="why-ai-modeling-is-the-future">Why AI Modeling is the Future</h2> <p>Everything around us starts with modeling:</p> <ul> <li><strong>Planes</strong> - Aerodynamic simulations and stress testing</li> <li><strong>Cars</strong> - Crash simulations and performance optimization</li> <li><strong>Buildings</strong> - Structural analysis and environmental modeling</li> </ul> <p>Even the <a href="https://youtu.be/shazte3R52k?t=93">Hindu temple built in Paris</a> began with 3D modeling before becoming a physical structure.</p> <p>AI modeling is poised to become a multi-trillion-dollar industry, and Dyad is positioned exactly where the future of agentic AI is heading.</p> <h3 id="how-are-agents-improving-over-time">How are agents improving over time</h3> <p>Agents are supposed to become increasingly autonomous on long-term tasks. But imagine having to monitor every step of the agent and intervene whenever it fails. Then you are the bottleneck, and the agent is not really autonomous.</p> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/dyad_agent_now-480.webp 480w,/assets/img/posts/dyad-agent/dyad_agent_now-800.webp 800w,/assets/img/posts/dyad-agent/dyad_agent_now-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/dyad_agent_now.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Dyad Agent Current State" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <h3 id="why-juliahub-has-what-nobody-has-openaianthropiccursor-etc">Why JuliaHub has what nobody has (openai/anthropic/cursor etc.)</h3> <p>JuliaHub has access to the Julia compiler and a team comprising some of the smartest people in the world—researchers and engineers from top universities like MIT, IIT, and others—who have spent years building a programming language that runs like C yet remains as easy to use as Python.</p> <p>How this translates to Dyad:</p> <ul> <li><strong>Compiler</strong> - The ability to understand and optimize code at a deep level</li> <li><strong>Type System</strong> - The ability to reason about data structures and types</li> <li><strong>MCP/Plugin/Skill Store</strong> - Julia has a rich user base and ecosystem in geospatial, scientific computing, bioinformatics, data science, etc. Now imagine if you could leverage this ecosystem through Dyad by building a skill store. A truly unique value proposition. Something that openai/anthropic/cursor etc. can’t do, or are at best still trying to replicate. 
Recently they have started hiring scientists and engineers from these fields: <a href="https://www.anthropic.com/research/introducing-anthropic-science">Anthropic Science</a></li> </ul> <p>How can all of this be leveraged to build a truly unique agent/harness?</p> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/dyad_agent_after-480.webp 480w,/assets/img/posts/dyad-agent/dyad_agent_after-800.webp 800w,/assets/img/posts/dyad-agent/dyad_agent_after-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/dyad_agent_after.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Dyad Agent Future State" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <h3 id="how-to-redesign-the-harness">How to redesign the harness</h3> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/dyad-agent/harness-480.webp 480w,/assets/img/posts/dyad-agent/harness-800.webp 800w,/assets/img/posts/dyad-agent/harness-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/harness.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Harness Architecture Diagram" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <table> <thead> <tr> <th>Component</th> <th>Implementation</th> </tr> </thead> <tbody> <tr> <td><strong>Harness</strong></td> <td>Dyad (the evaluation and execution framework)</td> </tr> <tr> <td><strong>Model</strong></td> <td>Claude Opus 4.6/4.5 (frontier reasoning)</td> </tr> <tr> <td><strong>Context</strong></td> <td>Julia Compiler + Type System + MCP/Plugin/Skill Store</td> </tr> </tbody> </table> <h3 id="the-meta-harness-pattern">The Meta-Harness Pattern</h3> <figure> <picture> <source class="responsive-img-srcset" 
srcset="/assets/img/posts/dyad-agent/harness_loop-480.webp 480w,/assets/img/posts/dyad-agent/harness_loop-800.webp 800w,/assets/img/posts/dyad-agent/harness_loop-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/dyad-agent/harness_loop.png" class="img-fluid rounded z-depth-1 mx-auto d-block" width="100%" height="auto" title="Harness Loop Diagram" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <p>From the paper: <a href="https://yoonholee.com/meta-harness/">Meta-Harness</a></p> <p>The key insight is to continuously improve the harness itself through a feedback loop:</p> <ol> <li><strong>Run</strong> the harness on diverse tasks</li> <li><strong>Evaluate</strong> outcomes manually (success/failure patterns)</li> <li><strong>Log</strong> what worked and what didn’t</li> <li><strong>Use</strong> a coding agent to analyze the log and improve the harness code</li> </ol> <p>This creates a self-improving system, similar to how OpenCLAW maintains a <code class="language-plaintext highlighter-rouge">SOUL.md</code> that evolves over time with lessons learned.</p> <h3 id="memory-architecture">Memory Architecture</h3> <p>Context sits outside the harness, but the agent has persistent memory. Learning can be captured in the harness itself through a layered memory system.</p> <p><strong>Claude Code’s Memory Pattern:</strong></p> <p>Memory is central to any agent. It has three layers:</p> <ul> <li><strong>User Memory</strong> - Persists across future runs (user role, feedback, preferences)</li> <li><strong>Session Memory</strong> - Captures everything in the current session (state, task specs, work log)</li> <li><strong>Sync Memory</strong> - Team memory and patterns (shared knowledge, org-level patterns)</li> </ul> <p><strong>Storage Structure:</strong></p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/.agent/
  user-memory/.../*.md
  session-memory/.../*.md  
  sync-memory/.../*.md
</code></pre></div></div> <p>This mirrors how learning happens at user, organization, and team levels.</p> <h3 id="memory-improvement-patterns">Memory Improvement Patterns</h3> <p>Memory can be improved in two ways:</p> <ol> <li><strong>Continuous Learning</strong> - After each user-agent feedback loop</li> <li><strong>Dreaming Pattern</strong> - OpenCLAW’s approach of offline memory consolidation and improvement</li> </ol> <p><a href="https://docs.openclaw.ai/concepts/dreaming">OpenCLAW Dreaming Documentation</a></p> <h3 id="what-future-holds-for-dyad">What the future holds for Dyad</h3> <p>The limitations of text-based models in building complex 3D structures. Coming soon…</p> <h3 id="food-for-thought">Food for thought</h3> <blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">World models and partnership with Yann LeCun for AMI Paris</p>&mdash; Abhimanyu Aryan (@theabhimanyu) <a href="https://twitter.com/theabhimanyu/status/2033303935612559555?ref_src=twsrc%5Etfw">March 31, 2026</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>]]></content><author><name></name></author><summary type="html"><![CDATA[I’ve been following Dyad’s work closely, particularly through people I deeply respect like Dr. Chris Rackauckas. 
I had the opportunity to meet them at JuliaCon Paris, where I helped organize the event.]]></summary></entry><entry><title type="html">How to Build the Control Room for Your Agent</title><link href="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent/" rel="alternate" type="text/html" title="How to Build the Control Room for Your Agent"/><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/how-to-build-the-control-room-for-your-agent/"><![CDATA[<p>Most agent systems fail for a simple reason: they have a model, but no control room.</p> <p>The control room is the part that receives events, routes them, keeps sessions isolated, and makes sure the agent only acts when it should.</p> <h2 id="what-the-gateway-should-do">What the gateway should do</h2> <p>The gateway is the control plane for the whole agent system. It should:</p> <ul> <li>receive every external input</li> <li>normalize events into a common format</li> <li>assign each event to the right session</li> <li>queue runs so only one turn happens per session at a time</li> <li>expose the same state to every client</li> </ul> <p>If the runtime is the worker, the gateway is the traffic controller.</p> <h2 id="design-principles">Design principles</h2> <h3 id="1-make-it-the-source-of-truth">1. Make it the source of truth</h3> <p>Do not let clients read local session files directly. The gateway should own session state and expose it through an API.</p> <h3 id="2-treat-sessions-as-first-class">2. Treat sessions as first-class</h3> <p>Separate DMs, group chats, threads, and device-specific contexts. A good gateway prevents context leakage by default.</p> <h3 id="3-use-a-typed-protocol">3. 
Use a typed protocol</h3> <p>Use a small message model with clear request, response, and event frames. Keep connect/auth/version checks at the boundary.</p> <h3 id="4-serialize-work-per-session">4. Serialize work per session</h3> <p>Use a lane-aware FIFO queue so one session cannot execute two turns at once. Parallelism is fine across sessions.</p> <h3 id="5-persist-everything-durably">5. Persist everything durably</h3> <p>Write session transcripts and metadata to disk so the system survives restarts without losing continuity.</p> <h2 id="a-simple-gateway-shape">A simple gateway shape</h2> <p>Think of the gateway as four layers:</p> <ol> <li><strong>Inputs</strong>: messages, heartbeats, cronjobs, hooks, webhooks</li> <li><strong>Router</strong>: maps each event to a session and a queue lane</li> <li><strong>State</strong>: stores transcripts, session IDs, and metadata</li> <li><strong>Clients</strong>: web UI, CLI, desktop app, mobile nodes</li> </ol> <p>That is enough to make an agent feel consistent, always-on, and responsive.</p> <h2 id="practical-defaults">Practical defaults</h2> <ul> <li>use one primary DM session and separate group sessions</li> <li>keep secure DM mode on when multiple people can contact the agent</li> <li>require a token for remote gateway access</li> <li>coalesce noisy inputs with a collect mode</li> <li>keep the runtime stateless and let the gateway own continuity</li> </ul> <h2 id="final-thought">Final thought</h2> <p>A strong agent does not come from a bigger prompt alone. 
It comes from a gateway that makes the agent legible, durable, and safe to operate.</p> <p>Build the control room first.</p>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><summary type="html"><![CDATA[A practical design guide for building an agent gateway as the control plane for inputs, sessions, routing, and client surfaces.]]></summary></entry><entry><title type="html">Agent Architectures: From Single Agent to Hybrid MAS</title><link href="https://abhimanyuaryan.github.io/blog/2026/agent_architectures/" rel="alternate" type="text/html" title="Agent Architectures: From Single Agent to Hybrid MAS"/><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/agent_architectures</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/agent_architectures/"><![CDATA[<p>This post is rendered directly from a Jupyter notebook. 
It covers five distinct agent architectural paradigms for benchmarking planning, from a baseline single-agent system to various multi-agent systems (MAS).</p> <div class="jupyter-notebook" style="position: relative; width: 100%; margin: 0 auto;"> <div class="jupyter-notebook-iframe-container"> <iframe src="/assets/jupyter/agent-architectures.ipynb.html" style="position: absolute; top: 0; left: 0; border-style: none;" width="100%" height="100%" onload="this.parentElement.style.paddingBottom = (this.contentWindow.document.documentElement.scrollHeight + 10) + 'px'"></iframe> </div> </div>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><category term="multi-agent-systems"/><summary type="html"><![CDATA[A formal treatment of five agent architectural paradigms — single agent, independent, centralized, decentralized, and hybrid multi-agent systems — with mathematical foundations and implementation walkthroughs.]]></summary></entry><entry><title type="html">Building a Single Agent System: From Formal Foundations to Working Code</title><link href="https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system/" rel="alternate" type="text/html" title="Building a Single Agent System: From Formal Foundations to Working Code"/><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/building-a-single-agent-system/"><![CDATA[<p>Large language models are increasingly used not just as chatbots, but as the reasoning core of <em>autonomous agents</em> — systems that observe an environment, decide on actions, and execute them in a loop until a goal is reached.</p> <p>In this post, I walk through the formal foundations of agent systems and then build a concrete single-agent 
implementation: an LLM-powered agent that navigates and completes missions in a GTA V game environment.</p> <h2 id="formalizing-agent-systems">Formalizing Agent Systems</h2> <p>Building on multi-agent formulations <a class="citation" href="#zhou2024webarena">(Zhou et al., 2024)</a> <a class="citation" href="#guo2024llmmultiagents">(Guo et al., 2024)</a>, an agent system is denoted by:</p> \[S = \{A, E, C, \Omega\}\] <p>where $A = \{a_1, \ldots, a_n\}$ (with $n \ge 1$) is a set of agents, $E$ is a shared environment, $C$ is a communication topology, and $\Omega$ is an orchestration policy.</p> <p>Each agent $a_i$ is represented by the tuple:</p> \[S_i = (\phi_i, A_i, M_i, \pi_i)\] <p>where:</p> <ul> <li>$\phi_i$ is the <strong>reasoning policy</strong> (typically an LLM)</li> <li>$A_i = \{ \mathrm{ToolCall}(t, \theta) \mid t \in T,\ \theta \in \Theta_t \}$ is the <strong>action space</strong> — the set of tool calls the agent can make, where $T$ is the set of available tools (e.g., navigation, shooting, vehicle control) and $\Theta_t$ are valid parameter configurations for tool $t$</li> <li>$M_i$ is the <strong>internal memory</strong></li> <li>$\pi_i: H \to A_i$ is the <strong>decision function</strong>, mapping observation histories to actions</li> </ul> <p>The observation history space $H$ contains sequences of action-observation pairs. 
The decision function $\pi_i$ is instantiated by the reasoning policy $\phi_i$: given a history $h_{i,t}$, the LLM generates a reasoning trace and selects the next action.</p> <p>For example, a history:</p> \[h_{i,t} = \Big[\big(\texttt{navigate\_to}(\text{waypoint}=\text{"Vinewood Hills"}),\ \text{"Arrived at Vinewood Hills"}\big), \ldots\Big]\] <p>is processed by $\phi_i$ to produce the next tool call $\alpha_{i,t+1}$.</p> <h3 id="the-agent-loop">The Agent Loop</h3> <p>At timestep $t$, agent $a_i$ selects an action $\alpha_{i,t} \in A_i$ according to:</p> \[\alpha_{i,t} = \pi_i(h_{i,t}), \quad o_{i,t} = E(\alpha_{i,t}), \quad h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t})\] <p>where $E$ denotes the environment and $h_{i,0} = \{s_0\}$ contains the initial task specification. The history update function $f_i: H \times A_i \times O \to H$ appends the new action-observation pair to the agent’s history:</p> \[h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t}) = h_{i,t} \oplus (\alpha_{i,t}, o_{i,t})\] <p>subject to context window truncation when $|h_{i,t+1}| &gt; \text{MAX\_TOKENS}$.</p> <p>This update mechanism applies uniformly to both single-agent (SAS) and multi-agent (MAS) configurations. In MAS, communication between agents happens through explicit message passing in the orchestration layer.</p> <hr/> <h2 id="from-theory-to-code">From Theory to Code</h2> <p>For a single-agent system ($n = 1$), the formalism simplifies: there is no communication topology $C$ and the orchestration policy $\Omega$ reduces to a simple loop. What remains is the core agent loop.</p> <p>Let’s build this concretely. We’ll create an agent that operates in a GTA V game environment — receiving observations about the game state (player position, nearby vehicles, NPCs, mission objectives) and issuing actions (move, drive, interact) to complete missions.</p> <h3 id="the-base-agent">The Base Agent</h3> <p>The base class captures the structure from our formal definition. 
It holds the reasoning policy (via a Copilot SDK client), maintains conversation history ($M_i$), and defines the <code class="language-plaintext highlighter-rouge">act</code> interface ($\pi_i$):</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">abc</span> <span class="kn">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span>
<span class="kn">import</span> <span class="n">re</span>


<span class="k">class</span> <span class="nc">GTABaseAgent</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Abstract base class for LLM-powered GTA V agents.
    Maps directly to the formal agent tuple S_i = (φ_i, A_i, M_i, π_i).
    </span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">model_name</span> <span class="o">=</span> <span class="n">model_name</span>       <span class="c1"># φ_i: reasoning policy
</span>        <span class="n">self</span><span class="p">.</span><span class="n">client</span> <span class="o">=</span> <span class="n">client</span>               <span class="c1"># SDK client for φ_i
</span>        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span> <span class="o">=</span> <span class="p">[]</span>             <span class="c1"># M_i: internal memory
</span>        <span class="n">self</span><span class="p">.</span><span class="n">step_count</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">self</span><span class="p">.</span><span class="n">mission_id</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">reset</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">mission_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">objective</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Reset agent state for a new mission episode.</span><span class="sh">"""</span>
        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">self</span><span class="p">.</span><span class="n">step_count</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">self</span><span class="p">.</span><span class="n">mission_id</span> <span class="o">=</span> <span class="n">mission_id</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="n">objective</span>

    <span class="nd">@abstractmethod</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">act</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">observation_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">
        π_i: H -&gt; A_i
        Receives an observation string, returns an action string.
        </span><span class="sh">"""</span>
        <span class="k">pass</span>

    <span class="k">def</span> <span class="nf">_lookup_location</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">query</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Look up known locations and waypoints in the game world.</span><span class="sh">"""</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sa">r</span><span class="sh">"</span><span class="s">lookup:\s*(.+)</span><span class="sh">"</span><span class="p">,</span> <span class="n">query</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">match</span><span class="p">:</span>
            <span class="n">location</span> <span class="o">=</span> <span class="k">match</span><span class="p">.</span><span class="nf">group</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="nf">strip</span><span class="p">().</span><span class="nf">lower</span><span class="p">()</span>
            <span class="k">return</span> <span class="nf">lookup_game_location</span><span class="p">(</span><span class="n">location</span><span class="p">)</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Location not found.</span><span class="sh">"</span>

    <span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">__class__</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">] </span><span class="si">{</span><span class="n">msg</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
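
# --- Illustrative sketch, not part of the original class: the history
# update h_{t+1} = h_t + [(action, observation)] from the formalism above,
# written with plain Python lists. max_len is a crude stand-in for
# context-window truncation (an assumption, not the post's actual logic).
def append_history(history, action, observation, max_len=100):
    """Append an (action, observation) pair, keeping at most max_len pairs."""
    history = history + [(action, observation)]
    return history[-max_len:]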
</code></pre></div></div> <p>Notice how the class mirrors our formal tuple:</p> <ul> <li><code class="language-plaintext highlighter-rouge">self.client</code> + <code class="language-plaintext highlighter-rouge">self.model_name</code> → $\phi_i$ (reasoning policy)</li> <li>Actions are defined by the system prompt → $A_i$ (action space)</li> <li><code class="language-plaintext highlighter-rouge">self.conversation</code> → $M_i$ (internal memory)</li> <li><code class="language-plaintext highlighter-rouge">act()</code> → $\pi_i$ (decision function)</li> </ul> <h3 id="the-llm-wrapper">The LLM Wrapper</h3> <p>The reasoning policy $\phi_i$ is implemented via the GitHub Copilot SDK. The key function handles retries, rate limiting, and prompt construction:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span><span class="p">,</span> <span class="n">SessionConfig</span><span class="p">,</span> <span class="n">MessageOptions</span>

<span class="n">MAX_RETRIES</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">INITIAL_RETRY_DELAY</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">INTER_REQUEST_DELAY</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="n">DEFAULT_MODEL</span> <span class="o">=</span> <span class="sh">"</span><span class="s">gpt-5-mini</span><span class="sh">"</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">get_copilot_client</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">CopilotClient</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create and start a CopilotClient.</span><span class="sh">"""</span>
    <span class="n">client</span> <span class="o">=</span> <span class="nc">CopilotClient</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">start</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">client</span>

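# --- Assumed sketch: the real _build_prompt helper is not shown in this
# post, so this version is a guess at its shape (role tags and formatting
# are assumptions) — it flattens the system prompt plus the role-tagged
# message history into a single prompt string for the SDK call below.
def _build_prompt(system_prompt, messages):
    """Join the system prompt and chat messages into one prompt string."""
    lines = [system_prompt, ""]
    for m in messages:
        lines.append(f"{m['role'].upper()}: {m['content']}")
    return "\n".join(lines)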

<span class="k">async</span> <span class="k">def</span> <span class="nf">call_copilot_with_retry</span><span class="p">(</span>
    <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">,</span>
    <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">messages</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">dict</span><span class="p">],</span>
    <span class="n">system_prompt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">temperature</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Calls φ_i (the LLM) via the Copilot SDK with rate limiting and retries.
    </span><span class="sh">"""</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="n">INTER_REQUEST_DELAY</span><span class="p">)</span>

    <span class="n">full_prompt</span> <span class="o">=</span> <span class="nf">_build_prompt</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">,</span> <span class="n">messages</span><span class="p">)</span>

    <span class="n">retries</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">delay</span> <span class="o">=</span> <span class="n">INITIAL_RETRY_DELAY</span>
    <span class="n">last_error</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">while</span> <span class="n">retries</span> <span class="o">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">:</span>
        <span class="n">session</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">session</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">create_session</span><span class="p">(</span>
                <span class="nc">SessionConfig</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model_name</span><span class="p">)</span>
            <span class="p">)</span>
            <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">send_and_wait</span><span class="p">(</span>
                <span class="nc">MessageOptions</span><span class="p">(</span><span class="n">prompt</span><span class="o">=</span><span class="n">full_prompt</span><span class="p">),</span>
                <span class="n">timeout</span><span class="o">=</span><span class="mf">60.0</span><span class="p">,</span>
            <span class="p">)</span>
            <span class="k">if</span> <span class="n">response</span> <span class="ow">and</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span> <span class="ow">and</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">content</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="nf">strip</span><span class="p">()</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">raise</span> <span class="nc">Exception</span><span class="p">(</span><span class="sh">"</span><span class="s">Empty response from Copilot SDK</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">TimeoutError</span><span class="p">:</span>
            <span class="n">last_error</span> <span class="o">=</span> <span class="nc">TimeoutError</span><span class="p">(</span><span class="sh">"</span><span class="s">Request timed out</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="n">last_error</span> <span class="o">=</span> <span class="n">e</span>
        <span class="k">finally</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">session</span><span class="p">:</span>
                <span class="k">try</span><span class="p">:</span>
                    <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">destroy</span><span class="p">()</span>
                <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
                    <span class="k">pass</span>

        <span class="n">retries</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">retries</span> <span class="o">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="n">delay</span><span class="p">)</span>
            <span class="n">delay</span> <span class="o">*=</span> <span class="mi">2</span>

    <span class="k">raise</span> <span class="nc">Exception</span><span class="p">(</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">Failed after </span><span class="si">{</span><span class="n">MAX_RETRIES</span><span class="si">}</span><span class="s"> retries. Last error: </span><span class="si">{</span><span class="n">last_error</span><span class="si">}</span><span class="sh">"</span>
    <span class="p">)</span>
</code></pre></div></div> <p>The prompt builder serializes the conversation history $h_{i,t}$ into a single string — because each Copilot SDK session takes a flat prompt rather than a structured message list:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">_build_prompt</span><span class="p">(</span><span class="n">system_prompt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">messages</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">dict</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Serialize system instructions + history into a single prompt.
    This is h_{i,t} formatted for the LLM.
    </span><span class="sh">"""</span>
    <span class="n">parts</span> <span class="o">=</span> <span class="p">[</span><span class="sa">f</span><span class="sh">"</span><span class="s">[System Instructions]</span><span class="se">\n</span><span class="si">{</span><span class="n">system_prompt</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">]</span>

    <span class="k">for</span> <span class="n">msg</span> <span class="ow">in</span> <span class="n">messages</span><span class="p">:</span>
        <span class="n">role</span> <span class="o">=</span> <span class="n">msg</span><span class="p">[</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">]</span>
        <span class="n">content</span> <span class="o">=</span> <span class="n">msg</span><span class="p">[</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">role</span> <span class="o">==</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">:</span>
            <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[Observation]</span><span class="se">\n</span><span class="si">{</span><span class="n">content</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">role</span> <span class="ow">in</span> <span class="p">(</span><span class="sh">"</span><span class="s">model</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">):</span>
            <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[Your Previous Action]</span><span class="se">\n</span><span class="si">{</span><span class="n">content</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">parts</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sh">"</span><span class="s">[Your Action]</span><span class="se">\n</span><span class="s">Respond with exactly one action:</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="sh">"</span><span class="se">\n</span><span class="sh">"</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="n">parts</span><span class="p">)</span>
</code></pre></div></div> <h3 id="the-single-agent">The Single Agent</h3> <p>With the base class and LLM wrapper in place, the single agent is straightforward. It implements the agent loop: observe → reason → act → update history → repeat:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">.gta_base</span> <span class="kn">import</span> <span class="n">GTABaseAgent</span>
<span class="kn">from</span> <span class="n">.copilot_llm</span> <span class="kn">import</span> <span class="n">call_copilot_with_retry</span>
<span class="kn">from</span> <span class="n">copilot</span> <span class="kn">import</span> <span class="n">CopilotClient</span>

<span class="n">SYSTEM_PROMPT</span> <span class="o">=</span> <span class="sh">"""</span><span class="se">\
</span><span class="s">You are an autonomous agent operating inside GTA V. Your goal is to complete </span><span class="se">\
</span><span class="s">missions by navigating the open world, interacting with NPCs, driving vehicles, </span><span class="se">\
</span><span class="s">and executing objectives.

## Actions
Respond with EXACTLY ONE action per turn (no extra text):

1. **move** – walk/run to a location
   `move: to &lt;location_or_coordinates&gt;`

2. **drive** – enter and drive a vehicle
   `drive: to &lt;destination&gt; via &lt;route_preference&gt;`

3. **interact** – interact with an NPC or object
   `interact: &lt;target&gt; with action &lt;action_type&gt;`

4. **shoot** – engage a target
   `shoot: &lt;target&gt; with &lt;weapon&gt;`

5. **wait** – wait for a condition
   `wait: until &lt;condition&gt;`

6. **lookup** – look up a location or mission intel
   `lookup: &lt;query&gt;`

7. **impossible** – declare the mission cannot be completed
   `impossible: &lt;reason&gt;`

## Environment
- You receive observations about: player position, health, nearby entities </span><span class="se">\
</span><span class="s">(NPCs, vehicles, objects), current objective, and minimap waypoints.
- The world is persistent — NPCs remember interactions, police respond to crimes, </span><span class="se">\
</span><span class="s">and time passes.

## Strategy
- First, assess your current position relative to the objective.
- If you don</span><span class="sh">'</span><span class="s">t know where to go, use `lookup: &lt;destination&gt;`.
- Use vehicles for long distances.
- Avoid unnecessary combat — it attracts police attention.
- Complete objectives in order. Multi-step missions require sequential actions.

## Important
- Respond with ONLY the action, nothing else.
- One action per turn. No explanations.
</span><span class="sh">"""</span>


<span class="k">class</span> <span class="nc">GTASingleAgent</span><span class="p">(</span><span class="n">GTABaseAgent</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Single Agent for GTA V missions.
    Implements the agent loop: π_i(h_{i,t}) -&gt; α_{i,t}
    </span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">model_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">client</span><span class="p">:</span> <span class="n">CopilotClient</span><span class="p">):</span>
        <span class="nf">super</span><span class="p">().</span><span class="nf">__init__</span><span class="p">(</span><span class="n">model_name</span><span class="p">,</span> <span class="n">client</span><span class="p">)</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">act</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">observation_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="c1"># 1. Update history: h_{i,t} = h_{i,t-1} ⊕ (o_{i,t})
</span>        <span class="k">if</span> <span class="n">observation_text</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
                <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">observation_text</span>
            <span class="p">})</span>

        <span class="c1"># 2. Query φ_i: α_{i,t} = π_i(h_{i,t})
</span>        <span class="n">action_text</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">call_copilot_with_retry</span><span class="p">(</span>
            <span class="n">self</span><span class="p">.</span><span class="n">client</span><span class="p">,</span>
            <span class="n">self</span><span class="p">.</span><span class="n">model_name</span><span class="p">,</span>
            <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">,</span>
            <span class="n">SYSTEM_PROMPT</span><span class="p">,</span>
        <span class="p">)</span>

        <span class="c1"># 3. Append action to memory: h_{i,t+1}
</span>        <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
            <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">model</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">action_text</span>
        <span class="p">})</span>

        <span class="c1"># 4. Handle lookup action (oracle tool call)
</span>        <span class="k">if</span> <span class="sh">"</span><span class="s">lookup:</span><span class="sh">"</span> <span class="ow">in</span> <span class="n">action_text</span><span class="p">.</span><span class="nf">lower</span><span class="p">():</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">_lookup_location</span><span class="p">(</span><span class="n">action_text</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">result</span><span class="p">:</span>
                <span class="n">self</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">📍 </span><span class="si">{</span><span class="n">action_text</span><span class="si">}</span><span class="s"> -&gt; </span><span class="si">{</span><span class="n">result</span><span class="p">[</span><span class="si">:</span><span class="mi">60</span><span class="p">]</span><span class="si">}</span><span class="s">...</span><span class="sh">"</span><span class="p">)</span>
                <span class="n">self</span><span class="p">.</span><span class="n">conversation</span><span class="p">.</span><span class="nf">append</span><span class="p">({</span>
                    <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span>
                    <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">result</span>
                <span class="p">})</span>
                <span class="k">return</span> <span class="k">await</span> <span class="n">self</span><span class="p">.</span><span class="nf">act</span><span class="p">(</span><span class="bp">None</span><span class="p">)</span>

        <span class="k">return</span> <span class="n">action_text</span>
</code></pre></div></div> <h3 id="running-the-agent">Running the Agent</h3> <p>Putting it all together — here’s how you’d run a mission episode:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">from</span> <span class="n">gta_agent</span> <span class="kn">import</span> <span class="n">GTASingleAgent</span>
<span class="kn">from</span> <span class="n">copilot_llm</span> <span class="kn">import</span> <span class="n">get_copilot_client</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">run_mission</span><span class="p">():</span>
    <span class="n">client</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">get_copilot_client</span><span class="p">()</span>
    <span class="n">agent</span> <span class="o">=</span> <span class="nc">GTASingleAgent</span><span class="p">(</span><span class="n">model_name</span><span class="o">=</span><span class="sh">"</span><span class="s">gpt-5-mini</span><span class="sh">"</span><span class="p">,</span> <span class="n">client</span><span class="o">=</span><span class="n">client</span><span class="p">)</span>
    <span class="n">agent</span><span class="p">.</span><span class="nf">reset</span><span class="p">(</span>
        <span class="n">mission_id</span><span class="o">=</span><span class="sh">"</span><span class="s">heist_01</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">objective</span><span class="o">=</span><span class="sh">"</span><span class="s">Drive to the Vanilla Unicorn, meet Trevor, </span><span class="sh">"</span>
                  <span class="sh">"</span><span class="s">then escape the police in a getaway vehicle.</span><span class="sh">"</span>
    <span class="p">)</span>

    <span class="c1"># Initial observation from the environment
</span>    <span class="n">obs</span> <span class="o">=</span> <span class="p">(</span>
        <span class="sh">"</span><span class="s">Position: Downtown Vinewood (x=248, y=1024). </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Health: 100%. Armor: 50%. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Nearby: 1 parked Kuruma (unlocked), 3 pedestrians. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Objective: Go to the Vanilla Unicorn. </span><span class="sh">"</span>
        <span class="sh">"</span><span class="s">Distance to objective: 2.4 km NW.</span><span class="sh">"</span>
    <span class="p">)</span>

    <span class="n">max_steps</span> <span class="o">=</span> <span class="mi">50</span>
    <span class="k">for</span> <span class="n">step</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_steps</span><span class="p">):</span>
        <span class="n">action</span> <span class="o">=</span> <span class="k">await</span> <span class="n">agent</span><span class="p">.</span><span class="nf">act</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Step </span><span class="si">{</span><span class="n">step</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">action</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

        <span class="k">if</span> <span class="sh">"</span><span class="s">impossible:</span><span class="sh">"</span> <span class="ow">in</span> <span class="n">action</span><span class="p">.</span><span class="nf">lower</span><span class="p">():</span>
            <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Agent declared mission impossible.</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">break</span>

        <span class="c1"># In a real setup, you'd send the action to the
</span>        <span class="c1"># GTA V environment and get back the next observation.
</span>        <span class="c1"># obs = env.step(action)
</span>        <span class="k">break</span>  <span class="c1"># demo: single step
</span>

<span class="n">asyncio</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="nf">run_mission</span><span class="p">())</span>
</code></pre></div></div> <h2 id="key-design-decisions">Key Design Decisions</h2> <p>A few things worth noting about this architecture:</p> <p><strong>Flat prompt construction.</strong> The Copilot SDK uses a single-prompt-per-session model. We serialize the entire conversation history $h_{i,t}$ into one string. This means the full context is visible to the LLM on every call, but we pay the token cost of replaying history. In practice, you’d truncate when approaching the context window limit — exactly the $|h_{i,t+1}| &gt; \text{MAX_TOKENS}$ constraint from the formalism.</p> <p><strong>Exponential backoff.</strong> LLM APIs are rate-limited. The retry wrapper doubles the delay on each failure (5s → 10s → 20s → 40s → 80s). This is important for any production agent that runs for dozens or hundreds of steps.</p> <p><strong>Tool calls as actions.</strong> The <code class="language-plaintext highlighter-rouge">lookup</code> action demonstrates how $A_i$ includes tool calls. When the agent outputs <code class="language-plaintext highlighter-rouge">lookup: Vanilla Unicorn</code>, we intercept it, query an oracle, inject the result as a new observation, and re-enter the agent loop. The agent doesn’t see this as a special case — it’s just another action-observation pair in the history.</p> <p><strong>Stateless sessions, stateful history.</strong> Each LLM call creates a fresh Copilot SDK session (stateless), but the agent maintains its own conversation history (stateful). This separation means session failures don’t corrupt the agent’s memory.</p> <h2 id="from-single-to-multi-agent">From Single to Multi-Agent</h2> <p>The formal framework makes it clear how to extend this to multi-agent systems. 
You’d add:</p> <ul> <li><strong>More agents</strong> in $A$ with different specializations (a driver agent, a combat agent, a negotiation agent)</li> <li><strong>Communication topology</strong> $C$ defining which agents can message each other</li> <li><strong>Orchestration policy</strong> $\Omega$ deciding which agent acts at each timestep</li> </ul> <p>But the single-agent case is where you get the fundamentals right. Get the agent loop, memory management, and tool integration working reliably for one agent before scaling to many.</p> <hr/> <h2 id="references">References</h2> <h2 class="bibliography">2024</h2> <ol class="bibliography"><li><div class="row"> <div class="col col-sm-2 abbr"> <abbr class="badge rounded w-100">ICLR</abbr> </div> <div id="zhou2024webarena" class="col-sm-8"> <div class="title">WebArena: A Realistic Web Environment for Building Autonomous Agents</div> <div class="author"> Shuyan Zhou, Frank F. Xu, Hao Zhu, and <span class="more-authors" title="click to view 8 more authors" onclick=" var element=$(this); element.attr('title', ''); var more_authors_text=element.text() == '8 more authors' ? 'Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig' : '8 more authors'; var cursorPosition=0; var textAdder=setInterval(function(){ element.html(more_authors_text.substring(0, cursorPosition + 1)); if (++cursorPosition == more_authors_text.length){ clearInterval(textAdder); } }, '10'); ">8 more authors</span> </div> <div class="periodical"> <em>In International Conference on Learning Representations (ICLR)</em>, 2024 </div> <div class="periodical"> </div> <div class="links"> <a class="bibtex btn btn-sm z-depth-0" role="button">Bib</a> </div> <div class="bibtex hidden"> <figure class="highlight"><pre><code class="language-bibtex" data-lang="bibtex"><span class="nc">@inproceedings</span><span class="p">{</span><span class="nl">zhou2024webarena</span><span class="p">,</span>
  <span class="na">title</span> <span class="p">=</span> <span class="s">{{WebArena}: A Realistic Web Environment for Building Autonomous Agents}</span><span class="p">,</span>
  <span class="na">author</span> <span class="p">=</span> <span class="s">{Zhou, Shuyan and Xu, Frank F. and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham}</span><span class="p">,</span>
  <span class="na">booktitle</span> <span class="p">=</span> <span class="s">{International Conference on Learning Representations (ICLR)}</span><span class="p">,</span>
  <span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span> <span class="p">=</span> <span class="s">{https://arxiv.org/abs/2307.13854}</span>
<span class="p">}</span></code></pre></figure> </div> </div> </div> </li> <li><div class="row"> <div class="col col-sm-2 abbr"> <abbr class="badge rounded w-100">arXiv</abbr> </div> <div id="guo2024llmmultiagents" class="col-sm-8"> <div class="title">Large Language Model based Multi-Agents: A Survey of Progress and Challenges</div> <div class="author"> Taicheng Guo, Xiuying Chen, Yaqi Wang, and <span class="more-authors" title="click to view 5 more authors" onclick=" var element=$(this); element.attr('title', ''); var more_authors_text=element.text() == '5 more authors' ? 'Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang' : '5 more authors'; var cursorPosition=0; var textAdder=setInterval(function(){ element.html(more_authors_text.substring(0, cursorPosition + 1)); if (++cursorPosition == more_authors_text.length){ clearInterval(textAdder); } }, '10'); ">5 more authors</span> </div> <div class="periodical"> <em>arXiv preprint arXiv:2402.01680</em>, 2024 </div> <div class="periodical"> </div> <div class="links"> <a class="bibtex btn btn-sm z-depth-0" role="button">Bib</a> </div> <div class="bibtex hidden"> <figure class="highlight"><pre><code class="language-bibtex" data-lang="bibtex"><span class="nc">@article</span><span class="p">{</span><span class="nl">guo2024llmmultiagents</span><span class="p">,</span>
  <span class="na">title</span> <span class="p">=</span> <span class="s">{Large Language Model based Multi-Agents: A Survey of Progress and Challenges}</span><span class="p">,</span>
  <span class="na">author</span> <span class="p">=</span> <span class="s">{Guo, Taicheng and Chen, Xiuying and Wang, Yaqi and Chang, Ruidi and Pei, Shichao and Chawla, Nitesh V. and Wiest, Olaf and Zhang, Xiangliang}</span><span class="p">,</span>
  <span class="na">journal</span> <span class="p">=</span> <span class="s">{arXiv preprint arXiv:2402.01680}</span><span class="p">,</span>
  <span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span> <span class="p">=</span> <span class="s">{https://arxiv.org/abs/2402.01680}</span>
<span class="p">}</span></code></pre></figure> </div> </div> </div> </li></ol>]]></content><author><name></name></author><category term="AI"/><category term="agents"/><category term="LLM"/><category term="AI"/><category term="architecture"/><summary type="html"><![CDATA[A walkthrough of formalizing single-agent systems for LLM-powered autonomous task execution, with a complete Python implementation using the GitHub Copilot SDK and a GTA V environment.]]></summary></entry><entry><title type="html">Context Graphs Are the Future of AI Infrastructure</title><link href="https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai/" rel="alternate" type="text/html" title="Context Graphs Are the Future of AI Infrastructure"/><published>2026-01-15T10:00:00+00:00</published><updated>2026-01-15T10:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2026/context-graphs-future-of-ai/"><![CDATA[<p>When Jaya Gupta published <a href="https://www.linkedin.com/pulse/how-do-you-build-context-graph-jaya-gupta-xicwe/">How Do You Build a Context Graph?</a> in late December 2025, I felt something click into place. Not because the ideas were new to me — but because she articulated, under a single term, a vision I’d been building toward for over two years across seemingly different projects.</p> <p>Knowledge graphs. Ontologies. Database migration. LLM hallucination reduction. Multi-agent systems. Relational databases vs. graph databases. These weren’t separate interests — they were all pieces of the same puzzle. <strong>Context graphs</strong> is the name for what emerges when you put them together.</p> <p>Let me walk through how I got here.</p> <h2 id="it-started-with-hallucinations-november-2023">It Started with Hallucinations (November 2023)</h2> <p>I started working seriously with knowledge graphs in late 2023. 
The problem that pulled me in was deceptively simple: <strong>LLMs hallucinate</strong>, and no amount of prompt engineering fully fixes it.</p> <p>The insight was that if you ground LLM outputs in structured, verifiable knowledge — a graph of entities and their real relationships — the model has something to reason <em>from</em> rather than just generating plausible-sounding text. Knowledge graphs become a source of truth that constrains generation.</p> <p>This became the foundation of my <a href="/news/announcement_10/">talk at the Voxel51 AI Meetup in September 2024</a>, where I demonstrated using <strong>LangChain and Neo4j</strong> to reduce hallucinations in ChatGPT-style systems. The <a href="https://github.com/AbhimanyuAryan/voxel51">code is on GitHub</a>. The core approach: instead of retrieving flat text chunks, retrieve <em>graph-structured knowledge</em> — entities, relationships, and their context — so the LLM has structured facts to anchor its response.</p> <p>But this raised a deeper question: where does the graph come from? How do you build and maintain it? And how do you make sure the <em>structure itself</em> is right?</p> <h2 id="understanding-database-paradigms-the-migration-project">Understanding Database Paradigms (The Migration Project)</h2> <p>Around the same time, I was working on <a href="/projects/8_nosql_database_evaluation/">migrating a hospital management database</a> across three fundamentally different paradigms: <strong>Oracle SQL → MongoDB → Neo4j</strong>.</p> <p>This project taught me something that textbooks don’t emphasize enough: <strong>the way you store data shapes the way you can reason about it.</strong></p> <p>A relational database (Oracle) captures state beautifully — normalized tables, foreign keys, constraints. Clean and precise. But relationships are implicit, buried in JOIN operations.</p> <p>A document store (MongoDB) captures context — all the information about an entity lives together in a rich, nested document. 
Great for retrieval. But relationships between documents are second-class citizens.</p> <p>A graph database (Neo4j) makes relationships first-class. Suddenly you can ask questions like “what’s the shortest path between Patient A and Patient B?” — traversals that would require recursive JOINs in SQL become single Cypher queries.</p> <p>The migration forced me to confront hard design decisions: consolidating Doctor, Nurse, and Technician into unified Staff nodes, removing unnecessary connector entities (Episode), rethinking how triggers and views translate across paradigms. Each database enforced a different worldview on the same underlying reality.</p> <p><strong>This is the ontology problem in miniature.</strong> Every database schema is an implicit ontology — a claim about what entities exist, how they relate, and what matters. Migrating between schemas is really migrating between ontologies.</p> <h2 id="ontologies-as-the-foundation-september-2024">Ontologies as the Foundation (September 2024)</h2> <p>Two things happened in September 2024 that sharpened this thinking.</p> <p>First, my <a href="/news/announcement_10/">Voxel51 talk</a> — demonstrating that knowledge graphs concretely reduce LLM hallucinations when used as retrieval infrastructure.</p> <p>Then, days later, I had <a href="/news/announcement_8/">a conversation with Jérémy Ravenel</a> about using <strong>ontologies as the base of next-generation AI systems</strong>. 
This was the conversation where it all started coming together.</p> <p>Jérémy and I discussed how ontologies could:</p> <ul> <li><strong>Enhance knowledge representation</strong> in LLMs — giving models structured priors about what kinds of things exist and how they can relate</li> <li><strong>Improve reasoning</strong> — moving from pattern matching to structured inference over formally defined relationships</li> <li><strong>Enable accurate retrieval</strong> — using ontological relationships to return contextually grounded results</li> <li><strong>Map entity relations across domains</strong> — the same entity can play different roles in different contexts</li> </ul> <p>We spent a lot of time on <strong>framework selection</strong> — which tools and standards to use for building ontologies. OWL? SKOS? Custom schemas? The choice shapes everything downstream. As we agreed: “a good beginning is half done.”</p> <p>This conversation planted a seed: what if you combined the <em>grounding power</em> of knowledge graphs (reducing hallucinations), the <em>structural flexibility</em> of different database paradigms (relational, document, graph), and the <em>formal precision</em> of ontologies (defining what entities and relationships are possible)?</p> <h2 id="the-world-model-connection-september-2025">The World Model Connection (September 2025)</h2> <p>Then Meta released <a href="https://huggingface.co/facebook/cwm">Code World Models (CWM)</a> in September 2025, and the transfer learning potential was immediately obvious to me.</p> <p>CWM learns compressed representations of how environments work by observing trajectories through them. Not static snapshots — <strong>dynamics</strong>. How does state change? What happens when you take an action? 
What are the causal relationships?</p> <p>The connection to knowledge graphs:</p> <ul> <li><strong>Knowledge graphs</strong> capture static structure — what exists and how it’s connected</li> <li><strong>World models</strong> capture dynamics — how the system <em>behaves</em></li> <li><strong>Ontologies</strong> provide the schema — what <em>kinds</em> of structures and dynamics are possible</li> </ul> <p>Mix them and you get something that doesn’t just store what’s true — it models how things work and can predict what happens next. That’s not a database anymore. That’s <strong>infrastructure for intelligence</strong>.</p> <h2 id="jaya-guptas-context-graph-framework-december-2025">Jaya Gupta’s Context Graph Framework (December 2025)</h2> <p>When Jaya Gupta’s <a href="https://www.linkedin.com/pulse/how-do-you-build-context-graph-jaya-gupta-xicwe/">context graph article</a> landed, I read it as someone who’d been living every dimension of the problem she described. Her framework brought together ideas I’d been working on separately under one coherent vision.</p> <h3 id="the-two-clocks-problem">The Two Clocks Problem</h3> <p>Gupta identifies that we’ve built trillion-dollar infrastructure for the <strong>state clock</strong> (what’s true now) and almost nothing for the <strong>event clock</strong> (what happened, in what order, with what reasoning).</p> <p>I’ve seen this firsthand. The Oracle database in my migration project captures state perfectly — current patients, current staff, current bills. The MongoDB version captures richer context per entity. The Neo4j version captures relationships. <strong>But none of them capture <em>why</em> the data looks the way it does</strong> — the decisions, the reasoning, the traces that produced the current state.</p> <p>That’s exactly the gap. 
The reasoning connecting observations to actions was never treated as data.</p> <h3 id="agents-as-informed-walkers">Agents as Informed Walkers</h3> <p>Gupta draws on <strong>node2vec</strong> and graph representation learning: you don’t need to predefine the ontology. Agent trajectories through problem space discover structure through use. The schema isn’t the starting point — it’s the output.</p> <p>This resonates with everything I’ve built. When I migrated the hospital database to Neo4j, I had to <em>manually</em> discover which entities mattered and how they related. I consolidated Doctor, Nurse, and Technician into Staff. I debated whether to keep Episode as a node or remove it. These were ontology design decisions that required deep understanding of how the system is actually used.</p> <p>Agents could do this automatically. An agent traversing a system — investigating issues, completing tasks, making decisions — implicitly discovers the ontology through its trajectory. Accumulate enough trajectories and the structure emerges.</p> <p>This is also why <strong>reducing hallucinations matters</strong> at this level. If agents are the walkers discovering ontology, they need to be grounded in reality — not hallucinating entities and relationships that don’t exist. The knowledge graph grounding I demonstrated at Voxel51 is prerequisite infrastructure for reliable agent-driven ontology discovery.</p> <h3 id="context-graphs-as-world-models">Context Graphs as World Models</h3> <p>The most powerful idea: a context graph with enough accumulated structure becomes a <strong>world model</strong>. It encodes not just what exists, but how the system behaves. It enables simulation — “what if?” rather than “what happened?”</p> <p>This is where CWM’s approach and Gupta’s vision converge. Facebook showed world models can be learned from code trajectories. Gupta argues they can be learned from organizational agent trajectories. 
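</p>

<p>A toy sketch of that convergence in plain Python (state names and trajectories are invented for illustration; this is not CWM or any real pipeline): count the transitions agents actually take, and a crude predictive model of the dynamics falls out.</p>

```python
from collections import Counter, defaultdict

# Each "trajectory" is a sequence of states an agent passed through
# while completing a task. All names here are made up.
trajectories = [
    ["ticket_opened", "customer_contacted", "refund_issued"],
    ["ticket_opened", "customer_contacted", "invoice_corrected"],
    ["ticket_opened", "escalated", "refund_issued"],
    ["ticket_opened", "customer_contacted", "refund_issued"],
]

# Accumulate observed state-to-state transitions.
transitions = defaultdict(Counter)
for path in trajectories:
    for a, b in zip(path, path[1:]):
        transitions[a][b] += 1

def predict_next(state):
    # The "world model": the most frequently observed successor state.
    return transitions[state].most_common(1)[0][0]

print(predict_next("ticket_opened"))       # most common successor, 3 of 4 paths
print(predict_next("customer_contacted"))  # most common successor, 2 of 3 paths
```

<p>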
The principle is identical: observe enough dynamics and a predictive model emerges.</p> <p>And the world model needs all the layers I’ve been building:</p> <ul> <li><strong>Relational databases</strong> for clean, normalized state</li> <li><strong>Graph databases</strong> for rich, traversable relationships</li> <li><strong>Ontologies</strong> for structural priors about what’s possible</li> <li><strong>Knowledge graphs</strong> for grounding agents in verified facts</li> <li><strong>LLM hallucination reduction</strong> for trustworthy agent behavior</li> <li><strong>Agent trajectories</strong> for discovering dynamics and building the event clock</li> </ul> <h2 id="where-this-is-going">Where This Is Going</h2> <p>I fully agree with Gupta’s framing: context graphs are not just a better retrieval system — they’re <strong>organizational intelligence that compounds</strong>.</p> <p>The convergence I see across my own work:</p> <ol> <li><strong>Knowledge graphs</strong> provide the grounding layer — concrete entities, relationships, verified facts. 
They keep agents honest and reduce hallucinations.</li> <li><strong>Database paradigm fluency</strong> is essential — you need to understand how relational, document, and graph models each capture different aspects of reality, because context graphs need all three perspectives.</li> <li><strong>Ontologies</strong> provide the structural layer — formal definitions that constrain what the graph can represent, learned and refined through use.</li> <li><strong>World models</strong> (à la CWM) provide the dynamics layer — how the system behaves, learned from agent trajectories.</li> <li><strong>Context graphs</strong> are the synthesis — capturing not just state but reasoning, not just data but decision traces, not just structure but dynamics.</li> </ol> <p>Three problems need solving:</p> <ul> <li><strong>The two clocks problem</strong> — building event clock infrastructure alongside state infrastructure, across relational and graph paradigms</li> <li><strong>Schema as output</strong> — letting grounded, non-hallucinating agents discover ontology through informed traversal</li> <li><strong>World models, not retrieval</strong> — context graphs that simulate futures, not just retrieve pasts</li> </ul> <p>Every project I’ve touched in the last two years — from <a href="https://github.com/AbhimanyuAryan/voxel51">reducing LLM hallucinations with Neo4j</a>, to <a href="/projects/8_nosql_database_evaluation/">migrating databases across paradigms</a>, to <a href="/news/announcement_8/">exploring ontologies with Jérémy Ravenel</a> — was building toward this. Not because I planned it that way, but because these problems are deeply connected.</p> <p>Context graphs are where knowledge graphs, ontologies, database design, agent systems, and world models converge. 
I believe this is where AI infrastructure is heading — and I intend to help build it.</p>]]></content><author><name></name></author><category term="AI"/><category term="knowledge-graphs"/><category term="ontologies"/><category term="AI"/><summary type="html"><![CDATA[From reducing LLM hallucinations with knowledge graphs, to migrating between database paradigms, to ontology-driven AI — how two years of work led me to context graphs as the next frontier.]]></summary></entry><entry><title type="html">Getting Started with Deep Learning in Swift and TensorFlow</title><link href="https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow/" rel="alternate" type="text/html" title="Getting Started with Deep Learning in Swift and TensorFlow"/><published>2019-11-22T09:56:23+00:00</published><updated>2019-11-22T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/deeplearning-with-swift-and-tensorflow/"><![CDATA[<p>There are 3 ways to get started coding with Swift &amp; TensorFlow:</p> <ul> <li><strong>Google Colab</strong> <em>(Basic: Windows/Mac/Linux)</em></li> <li><strong>Command Line</strong> <em>(Advanced: Mac/Linux)</em></li> <li><strong>REPL Playground XCode</strong> <em>(Basic: Mac — Coming Soon)</em></li> </ul> <blockquote> <p>Note: I’ll cover the first two approaches today — Google Colab &amp; command line. The 3rd approach (XCode Playground) will be a separate post.</p> </blockquote> <hr/> <h2 id="1-google-colab">1. Google Colab</h2> <p>First, create an empty <code class="language-plaintext highlighter-rouge">swift.ipynb</code> notebook:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">touch </span>swift.ipynb
code swift.ipynb
</code></pre></div></div> <p>Open it in VSCode and paste this JSON to make it a Swift kernel notebook:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"nbformat"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w">
  </span><span class="nl">"nbformat_minor"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
  </span><span class="nl">"metadata"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"colab"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift_notebook.ipynb"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.3.2"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"provenance"</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span><span class="w">
      </span><span class="nl">"collapsed_sections"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"kernelspec"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"display_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Swift"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"cells"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"metadata"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"icDfXRlHRYvE"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"colab_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"code"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"cell_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"code"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"let x = 2</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span><span class="w"> </span><span class="s2">"let y = 2</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span><span class="w"> </span><span class="s2">"print(</span><span class="se">\"</span><span class="s2">Hello world, this is Swift! </span><span class="se">\\</span><span class="s2">(x + y)</span><span class="se">\"</span><span class="s2">)"</span><span class="p">],</span><span class="w">
      </span><span class="nl">"execution_count"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
      </span><span class="nl">"outputs"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div> <p>Then go to <a href="https://colab.research.google.com/notebooks/welcome.ipynb">colab.research.google.com</a> → <strong>File &gt; Upload Notebook</strong> → upload your <code class="language-plaintext highlighter-rouge">Swift.ipynb</code>. You can now write Swift &amp; TensorFlow in Colab!</p> <hr/> <h2 id="2-command-line">2. Command Line</h2> <p>Download Swift-TensorFlow for Mac or Ubuntu from <a href="https://github.com/tensorflow/swift/blob/master/Installation.md">the official installation guide</a>.</p> <p>Once set up, create <code class="language-plaintext highlighter-rouge">basics.swift</code>:</p> <div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="s">"Tensorflow Basics Tutorial"</span><span class="p">)</span>

<span class="kd">import</span> <span class="kt">TensorFlow</span>

<span class="k">let</span> <span class="nv">x</span> <span class="o">=</span> <span class="kt">Tensor</span><span class="o">&lt;</span><span class="kt">Float</span><span class="o">&gt;</span><span class="p">([[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">]])</span>
<span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div> <p>Compile and run:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>swift basics.swift
</code></pre></div></div> <p>Output:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tensorflow Basics Tutorial
[[2.0, 2.0], [2.0, 2.0]]
</code></pre></div></div> <p>Swift also has a Python-like <strong>REPL</strong> since it’s built on the LLVM infrastructure.</p> <hr/> <h3 id="whats-next">What’s Next?</h3> <ul> <li>Why TensorFlow &amp; Swift?</li> <li>Swift Compiler Technology — how it compares to the competition</li> <li>Using Python libraries with Swift-TensorFlow</li> </ul>]]></content><author><name></name></author><category term="machine-learning"/><category term="deep-learning"/><category term="swift"/><category term="tensorflow"/><summary type="html"><![CDATA[TensorFlow is now available in Swift for Deep Learning. This post helps you get started with Google Colab and the command line.]]></summary></entry><entry><title type="html">Introduction to Active Learning</title><link href="https://abhimanyuaryan.github.io/blog/2019/active-learning/" rel="alternate" type="text/html" title="Introduction to Active Learning"/><published>2019-08-26T09:56:23+00:00</published><updated>2019-08-26T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/active-learning</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/active-learning/"><![CDATA[<p>Active Learning was introduced by <a href="http://burrsettles.com/">Burr Settles</a> at the University of Wisconsin.</p> <p>According to Wikipedia, <strong>Active Learning</strong> is a sub-field of Semi-Supervised Learning. 
Let’s understand Semi-Supervised Learning in simple terms:</p> <blockquote> <p><em>“The ability to get a large number of images makes this a great candidate for semi-supervised learning.”</em></p> </blockquote> <p>A very simple approach to semi-supervised learning:</p> <ol> <li>Capture <strong>11,000 images</strong></li> <li>Label <strong>100 images</strong> and train <code class="language-plaintext highlighter-rouge">model_1</code></li> <li>Use <code class="language-plaintext highlighter-rouge">model_1</code> to label the other <strong>10,900 images</strong></li> <li>Train <code class="language-plaintext highlighter-rouge">model_2</code> with the “labeled” 10,900 images</li> </ol> <p>…results in a <code class="language-plaintext highlighter-rouge">model_2</code> that does <strong>better</strong> than <code class="language-plaintext highlighter-rouge">model_1</code>.</p> <p>This is the core idea — you use a model’s own predictions to generate pseudo-labels for unlabeled data, then retrain on that larger labeled set.
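</p>

<p>The four numbered steps can be sketched end to end with a deliberately tiny stand-in for a model: a learned threshold on 1-D points (every number below is made up for illustration).</p>

```python
# "Training" = midpoint between the two class means; "predicting" = thresholding.
def train(points, labels):
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

labeled = [(-3.0, 0), (1.0, 1)]      # stands in for the 100 hand-labeled images
unlabeled = [-2.0, -0.5, 0.5, 2.0]   # stands in for the 10,900 unlabeled ones

# model_1: trained on the tiny labeled set
t1 = train([x for x, _ in labeled], [y for _, y in labeled])

# Pseudo-label the unlabeled pool with model_1, then retrain as model_2.
pseudo = [(x, predict(t1, x)) for x in unlabeled]
everything = labeled + pseudo
t2 = train([x for x, _ in everything], [y for _, y in everything])

print(t1, t2)  # model_2's threshold shifts as the pseudo-labels pull it around
```

<p>Some pseudo-labels may of course be wrong; that risk is inherent to naive self-training.</p>

<p>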
Active learning takes this a step further by <strong>choosing which samples to label</strong> intelligently (e.g., the ones the model is most uncertain about), making each human annotation count more.</p>]]></content><author><name></name></author><category term="machine-learning"/><category term="deep-learning"/><category term="machine-learning"/><category term="active-learning"/><summary type="html"><![CDATA[What is active learning, how does it relate to semi-supervised learning, and why is it useful?]]></summary></entry><entry><title type="html">Getting Setup with Fast.ai for Machine Learning (No GPU Required)</title><link href="https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course/" rel="alternate" type="text/html" title="Getting Setup with Fast.ai for Machine Learning (No GPU Required)"/><published>2019-03-11T09:56:23+00:00</published><updated>2019-03-11T09:56:23+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/gpu-less-fastai-ml-course/"><![CDATA[<p>Howdy! This post is for people who own laptops without good GPU specs, have a poor internet connection, and still want to learn ML from fast.ai.</p> <p>Setting up a dev environment can feel like a waste of time. If you’re one of those people, this post should help.</p> <h2 id="free-options">Free Options</h2> <ul> <li><strong>Kaggle</strong></li> <li><strong>Google Colab + GitHub</strong></li> </ul> <h2 id="paid-options">Paid Options</h2> <ul> <li><strong>AWS</strong></li> </ul> <hr/> <h3 id="kaggle">Kaggle</h3> <p>Kaggle is amazing if you want to start quickly — no downloading datasets. 
Datasets range from GBs to TBs, so not having to download them locally is a huge win.</p> <p>To use the fast.ai library, run in a Kaggle notebook:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="n">fastai</span><span class="o">==</span><span class="mf">0.7</span><span class="p">.</span><span class="mi">0</span>
</code></pre></div></div> <p>📓 <a href="https://www.kaggle.com/abhimanyuaryan/new-york-city-taxi-fare-prediction/">Sample Kaggle Notebook - NYC Taxi Fare Prediction</a></p> <p>By default your dataset gets added to the input directory:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PATH</span> <span class="o">=</span> <span class="sh">"</span><span class="s">../input/</span><span class="sh">"</span>
<span class="n">df_raw</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="si">{</span><span class="n">PATH</span><span class="si">}</span><span class="s">train.csv</span><span class="sh">'</span><span class="p">,</span> <span class="n">nrows</span><span class="o">=</span><span class="mi">50_000_000</span><span class="p">)</span>
</code></pre></div></div> <hr/> <h3 id="google-colab">Google Colab</h3> <p>Google Colab provides a <strong>free GPU</strong>. Here’s how to use it with GitHub:</p> <ol> <li>Create a <code class="language-plaintext highlighter-rouge">.ipynb</code> notebook locally</li> <li>Push it to GitHub</li> <li>Go to <a href="https://colab.research.google.com">colab.research.google.com</a> and load your repo</li> </ol> <h4 id="ways-to-download-datasets-in-colab">Ways to download datasets in Colab:</h4> <p><strong>Curl:</strong></p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl &lt;link_to_dataset&gt;
</code></pre></div></div> <p><em>(as shown in <a href="https://youtu.be/CzdWqFTmn0Y?t=969">Jeremy’s video</a>)</em></p> <p><strong>Kaggle API:</strong></p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Step 1: Upload Kaggle API key
</span><span class="kn">from</span> <span class="n">google.colab</span> <span class="kn">import</span> <span class="n">files</span>
<span class="n">files</span><span class="p">.</span><span class="nf">upload</span><span class="p">()</span>

<span class="c1"># Step 2: Install Kaggle API client
</span><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">kaggle</span>

# Step 3: put the key where the CLI expects it, then download (example competition)
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c new-york-city-taxi-fare-prediction
</code></pre></div></div> <hr/> <h3 id="aws-last-resort">AWS (Last Resort)</h3> <p>Jeremy has an <a href="https://course.fast.ai/lessons/aws.html">AWS starter video</a>. AWS p2 instances cost around <strong>$0.9/hr</strong> — decide for yourself!</p>]]></content><author><name></name></author><category term="machine-learning"/><category term="machine-learning"/><category term="fastai"/><category term="python"/><summary type="html"><![CDATA[How to run the Fast.ai ML course on Kaggle, Google Colab, and AWS — even without a good GPU or fast internet]]></summary></entry><entry><title type="html">Object-Oriented Programming in Julia</title><link href="https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia/" rel="alternate" type="text/html" title="Object-Oriented Programming in Julia"/><published>2019-01-15T00:00:00+00:00</published><updated>2019-01-15T00:00:00+00:00</updated><id>https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia</id><content type="html" xml:base="https://abhimanyuaryan.github.io/blog/2019/object-oriented-programming-in-julia/"><![CDATA[<p>Originally published on <a href="https://medium.com/@abhimanyuaryan/object-oriented-programming-in-julia-4dbde2661fde">Medium</a>.</p>]]></content><author><name></name></author><category term="julia"/><category term="julia"/><category term="programming"/><category term="oop"/><summary type="html"><![CDATA[Julia doesn't have classes, but it supports powerful OOP-like patterns through multiple dispatch, abstract types, and structs. Learn how to write clean, composable Julia code.]]></summary></entry></feed>