For this use case, why not use Whisper to transcribe the audio, and then an LLM ... | Hacker News


rahimnathwani 28 days ago | on: Trying out Gemini 3 Pro with audio transcription a...

For this use case, why not use Whisper to transcribe the audio, and then an LLM to do a second step (summarization or answering questions or whatever)? If you need diarization, you can use something like https://github.com/m-bain/whisperX
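
A minimal sketch of the two-step idea in this comment: transcribe first, then hand the text to an LLM. The function name `transcribe_then_ask` and both callables are hypothetical placeholders, not part of any library; you would plug in your own transcriber and LLM client.

```python
from typing import Callable


def transcribe_then_ask(
    audio_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    task: str = "Summarize this transcript.",
) -> str:
    """Run speech-to-text, then pass the transcript to an LLM with a task.

    `transcribe` and `ask_llm` are injected so the sketch does not assume
    any particular model or vendor API.
    """
    # Step 1: audio -> plain text.
    transcript = transcribe(audio_path)
    # Step 2: build a prompt that puts the task first, then the transcript.
    prompt = f"{task}\n\nTranscript:\n{transcript}"
    return ask_llm(prompt)


# Assumption: with the openai-whisper package installed, a transcriber
# could look like this (not executed here):
#
#   import whisper
#   model = whisper.load_model("base")
#   transcribe = lambda path: model.transcribe(path)["text"]
```

Keeping the two steps as separate callables makes it easy to swap Whisper for another transcriber, or to add a diarization pass in between, without touching the LLM step.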

pants2 28 days ago

Whisper simply isn't very good compared to LLM-based audio transcription such as gpt-4o-transcribe. If Gemini 3 is even better, it's a game-changer.

crazysim 28 days ago

Since Gemini seems to be sucking at timestamps, perhaps Whisper can be used to help ground that as an additional input alongside the audio.
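
One way this grounding could be sketched is as a post-processing step rather than an extra model input: take the LLM's transcript segments (better text, unreliable times) and borrow timestamps from Whisper's word-level output by aligning the two word sequences. The function `ground_segments` and the `(word, start, end)` tuple format are assumptions made for illustration, not the output format of either tool.

```python
import difflib
import re


def _norm(word: str) -> str:
    # Lowercase and drop punctuation so "Hello," matches "hello".
    return re.sub(r"[^\w']", "", word.lower())


def ground_segments(llm_segments, whisper_words):
    """Attach start/end times to LLM segments using Whisper word timings.

    llm_segments:  list of strings (one per transcript segment).
    whisper_words: list of (word, start_sec, end_sec) tuples (assumed format).
    Returns a list of (segment_text, start_or_None, end_or_None).
    """
    ref = [_norm(w) for w, _, _ in whisper_words]

    # Flatten LLM segments into tokens, remembering which segment owns each.
    hyp, owner = [], []
    for i, seg in enumerate(llm_segments):
        for tok in seg.split():
            hyp.append(_norm(tok))
            owner.append(i)

    matcher = difflib.SequenceMatcher(a=hyp, b=ref, autojunk=False)
    spans = [[None, None] for _ in llm_segments]

    # Matching blocks come back in increasing order, so the first matched
    # word of a segment gives its start and the last gives its end.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            seg = owner[block.a + k]
            _, start, end = whisper_words[block.b + k]
            if spans[seg][0] is None:
                spans[seg][0] = start
            spans[seg][1] = end

    return [(text, s, e) for text, (s, e) in zip(llm_segments, spans)]


words = [("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("this", 1.2, 1.4),
         ("is", 1.4, 1.5), ("a", 1.5, 1.6), ("test", 1.6, 2.0)]
grounded = ground_segments(["Hello, world!", "This is a test."], words)
# grounded == [("Hello, world!", 0.0, 0.9), ("This is a test.", 1.2, 2.0)]
```

Segments with no words in common with the Whisper output keep `None` for both times, which flags them for manual review instead of guessing.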
