March 2025 Tech Notes on Podcast Vibes
I’m winding down my podcast project and wanted to write a note about some of the things it taught me. In short, all of the major tech choices in this project worked out really well, and I’ll probably use this toolkit to prototype future LLM-based processing pipelines.
Go
Go is a great choice for rapid prototyping. I’ve found it particularly useful for side projects since its simplicity makes it easy to context-switch from the other languages I use regularly. Go was an especially good fit for this project, since processing audio through APIs mostly meant making lots of network requests and talking to the database, with very little compute of my own.
Why not Rust? I’ve done side projects in Rust before, and while it is a very powerful tool, it carries a lot of mental overhead compared to Go, and its type system inhibits certain forms of exploratory programming that I find very valuable.
Postgres, PGX, and Postico
I used Postgres for the database with PGX as the driver. PGX Top to Bottom is a talk by the creator of PGX that explains the layered structure of the library and taught me about a few useful API functions like CollectRows, BeginFunc, and CopyFrom.
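A minimal sketch of what those helpers look like in pgx v5 (the episodes table, Episode struct, and connection string are illustrative stand-ins, not my actual schema):

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// Episode is a hypothetical row type; columns map to fields by name.
type Episode struct {
	ID    int64
	Title string
}

func main() {
	ctx := context.Background()
	pool, err := pgxpool.New(ctx, "postgres://localhost/podcasts")
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// CollectRows turns a query result directly into a slice of structs.
	rows, err := pool.Query(ctx, `SELECT id, title FROM episodes`)
	if err != nil {
		log.Fatal(err)
	}
	episodes, err := pgx.CollectRows(rows, pgx.RowToStructByName[Episode])
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("loaded %d episodes", len(episodes))

	// BeginFunc runs a function inside a transaction, committing on success
	// and rolling back if the function returns an error.
	err = pgx.BeginFunc(ctx, pool, func(tx pgx.Tx) error {
		_, err := tx.Exec(ctx, `DELETE FROM episodes WHERE title = ''`)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}

	// CopyFrom bulk-loads rows via the Postgres COPY protocol.
	_, err = pool.CopyFrom(ctx,
		pgx.Identifier{"episodes"},
		[]string{"id", "title"},
		pgx.CopyFromRows([][]any{{int64(1), "pilot"}, {int64(2), "episode two"}}),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```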
This was also my first time trying out Postico, a user interface to Postgres, which I found to be very well-designed. It made it easy to visually inspect the data in my tables and interactively edit schema definitions.
River
One lesson from my years at startups is to never build your own job execution engine if you can avoid it.
After looking at various options I decided to try River, an open-source job queueing library for Go and Postgres, and am very happy I did. It is made by experts with good taste and comes with solid documentation and a well-designed API that exposes convenient abstractions for defining and scheduling jobs. River also ships a web-based user interface that let me monitor execution and inspect job-specific error logs. That made tracking down processing bugs much easier, since the UI presents each error in the context of the rest of the job’s information, such as its input arguments, total runtime, and number of retries.
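As a rough illustration of the shape of River’s API (the TranscribeArgs job, worker body, and queue settings below are simplified stand-ins for my actual pipeline, and assume River’s schema migrations have already been run):

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
	"github.com/riverqueue/river/riverdriver/riverpgxv5"
)

// TranscribeArgs are the job's arguments; Kind() names the job type.
type TranscribeArgs struct {
	EpisodeID int64  `json:"episode_id"`
	AudioURL  string `json:"audio_url"`
}

func (TranscribeArgs) Kind() string { return "transcribe_episode" }

// TranscribeWorker does the actual work for each job of this kind.
type TranscribeWorker struct {
	river.WorkerDefaults[TranscribeArgs]
}

func (w *TranscribeWorker) Work(ctx context.Context, job *river.Job[TranscribeArgs]) error {
	log.Printf("transcribing episode %d from %s", job.Args.EpisodeID, job.Args.AudioURL)
	// ... call the transcription API, store the results, etc.
	return nil
}

func main() {
	ctx := context.Background()
	pool, err := pgxpool.New(ctx, "postgres://localhost/podcasts")
	if err != nil {
		log.Fatal(err)
	}

	workers := river.NewWorkers()
	river.AddWorker(workers, &TranscribeWorker{})

	client, err := river.NewClient(riverpgxv5.New(pool), &river.Config{
		Queues:  map[string]river.QueueConfig{river.QueueDefault: {MaxWorkers: 10}},
		Workers: workers,
	})
	if err != nil {
		log.Fatal(err)
	}
	if err := client.Start(ctx); err != nil {
		log.Fatal(err)
	}

	// Enqueue a job; River stores it in Postgres and a worker picks it up.
	_, err = client.Insert(ctx, TranscribeArgs{EpisodeID: 42, AudioURL: "https://example.com/ep42.mp3"}, nil)
	if err != nil {
		log.Fatal(err)
	}
}
```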
Deepgram
I used Deepgram to transcribe podcasts. They give you $200 of free credits and the transcriptions are surprisingly good. There are cheaper options, but Deepgram is very convenient.
Rather than uploading the audio to Deepgram myself, I sent them a link to the podcast audio, which avoids the extra round-trip. The API endpoint I used for this was /listen:
https://api.deepgram.com/v1/listen?punctuate=true&paragraphs=true&utterances=true&diarize=true&language=en&summarize=v2&topics=true&model=nova-2
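Here is roughly what that request looks like from Go; the audio URL is a placeholder and the query string is the one shown above:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Deepgram accepts a JSON body containing a URL to fetch instead of raw audio.
	body, _ := json.Marshal(map[string]string{
		"url": "https://example.com/podcast-episode.mp3", // placeholder audio URL
	})

	endpoint := "https://api.deepgram.com/v1/listen?punctuate=true&paragraphs=true&utterances=true&diarize=true&language=en&summarize=v2&topics=true&model=nova-2"
	req, err := http.NewRequest("POST", endpoint, bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Token "+os.Getenv("DEEPGRAM_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The response JSON contains the transcript plus the diarization,
	// utterances, summary, and topics requested via the query parameters.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```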
By default, Deepgram will train on any audio you upload in return for a pricing discount. This was not a problem for me since the podcasts I was transcribing are already public, but it might be a poor tradeoff for other use cases like personal voice notes. If you want, you can opt out by adding a specific query parameter to your request.
Claude
Claude did an excellent job at extracting emotionally-laden subjects from the podcast transcripts. The fact that this kind of AI processing is now possible and affordable opens up a lot of opportunities for fun analytics projects.
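The extraction itself was just prompting against the Messages API over HTTP. Here is a rough sketch of the shape of that call; the model name, prompt wording, and output format are illustrative, not my exact setup:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	prompt := `From the transcript snippet below, list subject/emotion pairs as JSON,
e.g. [{"subject": "the move to Denver", "emotion": "anxiety"}].

Transcript:
...snippet goes here...`

	body, _ := json.Marshal(map[string]any{
		"model":      "claude-3-5-sonnet-latest", // illustrative model name
		"max_tokens": 1024,
		"messages": []map[string]any{
			{"role": "user", "content": prompt},
		},
	})

	req, err := http.NewRequest("POST", "https://api.anthropic.com/v1/messages", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("content-type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The response's content array holds the model's text output, which here
	// would be the JSON list of subject/emotion pairs to parse and store.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```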
A few caveats:
While Claude was able to accurately identify subject-emotion pairs, I have a much weaker sense of how many subjects were omitted from the extraction: I only did a limited number of spot checks and never built a manually-constructed dataset for comparison, relying instead on my intuitive sense of whether the extracted data was reasonable.
I also noticed that Claude would produce a similar number of results regardless of the length of the provided transcript snippet. I worked around this by feeding it the transcript in small overlapping chunks and de-duplicating the results with a vector similarity heuristic.
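A simplified sketch of that chunk-and-dedup step; the window sizes, the similarity threshold, and the assumption that each extracted subject comes with an embedding vector are all illustrative:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// chunkWords splits a transcript into overlapping windows of words so that
// subjects near a chunk boundary still appear whole in at least one chunk.
func chunkWords(transcript string, size, overlap int) []string {
	words := strings.Fields(transcript)
	var chunks []string
	for start := 0; start < len(words); start += size - overlap {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

// Subject is an extracted subject-emotion pair plus an embedding vector
// obtained elsewhere (e.g. from an embedding API).
type Subject struct {
	Text      string
	Emotion   string
	Embedding []float64
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-12)
}

// dedupe keeps a subject only if it isn't too similar to one already kept.
func dedupe(subjects []Subject, threshold float64) []Subject {
	var kept []Subject
	for _, s := range subjects {
		dup := false
		for _, k := range kept {
			if cosine(s.Embedding, k.Embedding) > threshold {
				dup = true
				break
			}
		}
		if !dup {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	chunks := chunkWords("… full transcript text …", 800, 100)
	fmt.Printf("%d chunks\n", len(chunks))
	// Each chunk is sent to Claude separately; the combined extractions are
	// then collapsed with something like dedupe(allSubjects, 0.9).
}
```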
Claude’s citation feature launched while I was working on this project and turned out to be a great fit for connecting the emotional inferences back to the supporting text.
Previous posts in this series: