May 2026

LLM Experiments #4: Intermezzo

Note: Experimenting with quick writeups as I play around with LLMs and my homegrown agentic workflow system. Previously. Scattered notes this time.

Agent swarms for bug hunting

Inspired by Mythos, I wanted to see what I could find with a small army of sub-agents hunting for bugs in Julia. It helped to have done this before since I could direct the agents to known-sus parts of the codebase, and instruct them on tactics.

With the lower-level work automated I was able to put more of my efforts into qualifying the legitimacy/severity of the bugs and organizing the results, ending up with a compact table of ~10 bugs each with a one-line summary and one-line reproduction. A few turned out to be documented/allowed behavior, others were quickly fixed, and some remain open, awaiting agents from the future.

Example-driven evaluation & quick cross-language prototyping

I’ve been working on the design of my agentic workflow/podcast analysis system by writing lots of pipelines and seeing how they surface gaps in the underlying system and API. It’s so cool that the example-driven feedback loop approach can now be scaled.

I’ve also been playing around with cross-language rewrites, but am finding that when this is done with too much AI assistance it’s much less valuable since it’s difficult to judge the results. Implementation complexity can vary by orders of magnitude depending on the skill of the implementer, and evaluating the results requires a deep familiarity with the possibility space.