rahimnathwani, 28 days ago, on: Trying out Gemini 3 Pro with audio transcription a...
For this use case, why not use Whisper to transcribe the audio, and then an LLM to do a second step (summarization or answering questions or whatever)?
If you need diarization, you can use something like https://github.com/m-bain/whisperX
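A minimal sketch of that two-step pipeline in Python, assuming the open-source openai-whisper package (`whisper.load_model` and `model.transcribe`, whose result dict has a `"text"` key). The `llm` argument is a hypothetical stand-in for whatever model client you use, and diarization via whisperX is not shown.

```python
from typing import Callable


def whisper_transcribe(audio_path: str, model_name: str = "base") -> str:
    """Stage 1: speech-to-text with the openai-whisper package."""
    import whisper  # imported lazily so the rest of the sketch runs without it installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]


def transcribe_then_ask(
    audio_path: str,
    task: str,
    llm: Callable[[str], str],
    transcribe: Callable[[str], str] = whisper_transcribe,
) -> str:
    """Stage 2: hand the plain-text transcript to an LLM (summarize, answer questions, etc.)."""
    transcript = transcribe(audio_path)
    prompt = (
        "Here is a transcript of an audio recording.\n\n"
        f"{transcript.strip()}\n\n"
        f"Task: {task}"
    )
    return llm(prompt)
```

Usage would look like `transcribe_then_ask("meeting.mp3", "Summarize the decisions made.", llm=my_llm)`, where `my_llm` is a hypothetical function that sends a prompt to your model and returns its text reply. Keeping the transcriber as a parameter makes it easy to swap in whisperX or a hosted transcription API for stage 1.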
pants2, 28 days ago
Whisper simply isn't very good compared to LLM-based audio transcription like gpt-4o-transcribe. If Gemini 3 is even better, it's a game-changer.
crazysim, 28 days ago
Since Gemini seems to be bad at timestamps, perhaps Whisper's timestamped output could be passed in as an additional input alongside the audio to help ground them.
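One way to try this, sketched in Python: take the segment list that openai-whisper's `transcribe()` returns under `"segments"` (each with `"start"` and `"end"` in seconds and `"text"`), render it as timestamped lines, and send that text to Gemini together with the audio. The helper names and prompt wording are hypothetical, the Gemini call itself is not shown, and whether this actually improves its timestamps is an untested assumption.

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def segments_to_grounding_text(segments: list[dict]) -> str:
    """Render Whisper-style segments as one '[start --> end] text' line each."""
    return "\n".join(
        f"[{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )


def build_grounded_prompt(segments: list[dict], task: str) -> str:
    """Hypothetical text part to send alongside the audio part of the request."""
    return (
        "Reference transcript with Whisper timestamps (use these when citing times):\n"
        f"{segments_to_grounding_text(segments)}\n\n"
        f"Task: {task}"
    )
```

The idea is to let Whisper supply the timing, which it is designed to produce, while the LLM handles the parts it does well, such as cleanup, speaker attribution, and answering questions.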