More thoughts about AI Governance
And the continued quest for the perfect AI transcription model
My last two guest podcast appearances on Responsible AI in financial services (here and here) seem to have generated enough interest that I was asked back by Boston Quantara for a three-peat. This edition focuses on AI safety in financial services, but most of what we discuss applies across a wide variety of industries and is also relevant for the consumer market.
Here is the marketing copy for this podcast, straight from Spotify:
When power grids flicker and trading algorithms hum, the stakes couldn’t be higher. Host Damian Speirs sits down with Lydatum CTO Yiannis Antoniou to untangle the toughest question in financial-services AI: how do you sprint toward innovation without tripping over governance?
From the recent Iberian-Peninsula blackout scare to embedding fiduciary duty directly into autonomous agents, they cover risk tiers, real-time monitoring, human-in-the-loop design, and why global standards can’t come soon enough.
Find out more about Lydatum at: www.lydatum.com
The podcast is embedded below. Just like the last two editions, I find a breezy 30-minute-or-so chat is better for everyone: hosts, speakers, and audience. It is short enough to keep things moving and to fit into a lunch break, long enough to carry real substance and generate follow-up questions, and not so long that it drags and becomes a chore to listen to. Let me know if you don’t agree though!
For those who would rather read than listen, I have, as always, provided a transcription of the podcast for quick reading. I have been using Google Gemini 2.5 Pro for this and continue to be impressed by its multi-modal capabilities. This time I used the newest version, released on June 5, and the results are, as always, exemplary. Apart from some very minor spelling edits - I have long given up hope that an AI system will ever spell my name properly - this is an almost verbatim transcription and is embedded below.
This time, having learned how to tame Gemini and avoid hallucinations (see my previous article on my first failed attempts), I also wanted to try out the previous transcription king, OpenAI’s Whisper model. Results were, shall we say, suboptimal. I reused Python code I had written in the past for automatic transcription and diarization with Whisper, with fair but unspectacular results compared to Gemini. I then asked Claude Code, my favorite new agent, to help create a better version by eliminating filler words (the ‘uh’, ‘uhmm’, etc. utterances that are part of speech but should not really be part of the written record) and improving the spelling. Despite that effort being marginally better, Gemini was still far more accurate and orders of magnitude faster, especially compared to running OpenAI’s largest Whisper model locally.
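For the curious, here is a minimal sketch of the kind of pipeline involved, assuming the open-source openai-whisper package and a simple regex-based filler-word filter. The file name and filler list are illustrative, and the actual cleanup that Claude Code helped with was considerably more involved (this sketch also skips diarization entirely):

```python
# Minimal sketch: transcribe locally with Whisper, then strip common filler words.
# Assumes the open-source `openai-whisper` package (pip install openai-whisper)
# and ffmpeg on the PATH. The audio file name and filler list are illustrative.
import re
import whisper

# Illustrative filler-word pattern; a real cleanup pass needs a richer list.
FILLERS = re.compile(r"\b(uh+|um+|uhm+|erm+)\b[,.]?\s*", re.IGNORECASE)

def transcribe_and_clean(audio_path: str, model_size: str = "large") -> str:
    model = whisper.load_model(model_size)      # downloads model weights on first run
    result = model.transcribe(audio_path)       # returns a dict with "text" and "segments"
    text = result["text"]
    cleaned = FILLERS.sub("", text)             # drop filler utterances
    return re.sub(r"\s{2,}", " ", cleaned).strip()  # collapse leftover double spaces

if __name__ == "__main__":
    print(transcribe_and_clean("podcast_episode.mp3"))
```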
The results are embedded below for comparison. A significant amount of editing was done to clean up misspellings and similar errors, but the filler words would have taken too long to remove manually, so they have been left intact to show the difference from the cleaner version above.
Overall, Gemini simply does a better job and, perhaps even more importantly, requires no code at all to produce the transcription. I am a big believer in democratizing access to this kind of capability, and the fact that it is currently free in Google Gemini and AI Studio, is significantly faster, and produces a much easier and cleaner read makes me very happy. It also clearly shows the progress Google has made from a slow start in AI and points to the significant battles OpenAI faces to bring itself back to the top.
There will be more podcasts in the near future, by the way, so stay tuned.