Status: Seedling
May 2026

LLM Experiments #4: Intermezzo

Note: Experimenting with quick writeups as I play around with LLMs and my homegrown agentic workflow system. Previously. Scattered notes this time.

Agent swarms for bug hunting

Inspired by Mythos I led an army of sub-agents to hunt for bugs in the Julia language & stdlib. It helped to have done this before since I could direct the agents to sus areas of the codebase, and instruct them on tactics I’d used to find bugs in the past.

This time I put most of my effort into qualifying the bugs for legitimacy/severity and organizing the results for easy consumption, winnowing complicated examples down to one-liners. A few were quickly fixed, others remain open, awaiting agents from the future, and two were determined not to be bugs after all.

Having automated the most tedious part of the work, I was able to spend most of my time on the higher-level parts of the process, though I think it only really worked because I had some practical experience to ground my ideas. Even so, I was struck by how quickly I was able to do this from a cold start.

Example-driven evaluation & quick cross-language prototyping

I’m tweaking the API of my agentic workflow system/podcast analysis project by writing lots of pipelines and seeing how they surface gaps in the underlying system and API.

This style of development brings to mind Mike Bostock’s obsession with examples, and Stephen Wolfram’s design livestreams where he evaluates Mathematica APIs by focusing on how they express specific practical scenarios. It’s so cool that this process can now be scaled.

It’s also been fun to periodically rewrite the whole system in a new language, though I’m finding that when this is done with too much AI assistance it’s difficult to judge the results: implementation complexity can vary by orders of magnitude depending on the skill of the implementer, and evaluation requires in-depth review and a deep awareness of the possibility space.
