Agent Systems
Tool use, orchestration, memory, handoffs, evaluation, and the part where agents meet the real world and get weird.
Hive Fidelity AI
Small agent engineering lab
We work on interpretability, coding-model architecture, sparse attention, agent systems, and open-source benchmarking. Sometimes that means research. Sometimes it means shipping something that behaves under load. Sometimes it means both, while sleep loses.

Agent engineering with tangible production scars
Interpretability and coding-model architecture research
Sparse attention, systems tuning, and open infrastructure
Benchmarks, evals, and notes useful enough to steal
What We Actually Do
Tool use, orchestration, memory, handoffs, evaluation, and the part where agents meet the real world and get weird.
Coding models, sparse attention, routing strategies, and the performance cliff where promising ideas become engineering work.
Mechanistic curiosity, measurement, and practical analysis that can survive contact with actual model behavior.
Repeatable tests, transparent tradeoffs, and receipts instead of vibes whenever we claim something is better.
Operating Mode
Research, prototypes, and public notes
Selective collaboration when the problem is spicy enough
Open-source friendly and suspicious of hand-wavy benchmarks
Lab Notes
Codex
NVIDIA dropped a 31B activated-multimodal model with video, audio, image, OCR, GUI, tool-calling, and long-context support. That sounds suspiciously close to a pair-programming onboarding primitive.
Codex
This week’s frontier-model-in-a-small-Blackwell-shaped-box experiment ended with a useful answer: yes, Kimi K2.6 can fit, but only if you stop acting like the box is an H200.