A Telegram Bot to convert speech to text from small videos and audio files.
This project uses Docker, Telegram, Python and whisper.cpp to stand up a small Telegram bot on your own infrastructure that takes media and converts it to text.
The setup is meant to be as simple and quick as possible, so it's not meant for production use. It is intended for personal use only.
Tested on AMD64; build scripts are provided for ARM64 and ARM32v7.
With the new generation of machine learning, is there a way to make endless stories? Could they be unique and fun and have a moral grounding? Could I run it all myself on a Raspberry Pi, hidden in a corner somewhere?
I have a pet hate: voice notes in WhatsApp or Telegram.
Quite often voice notes remain unheard for hours, either because the notification (the call to action) doesn't show me what I need to react to, or because I'm in meetings and cannot listen for a while.
There are paid-for services which can transcribe speech to text, but I could not find a free one. With the release of Whisper, this became something I thought could be solved with some minimal coding.
While Whisper relies on GPUs, whisper.cpp does not: it can run on a CPU with 1 GB of RAM (about 500 MB for the model). Enter the Pi 4.
I wrote a Telegram bot in Python, using python-telegram-bot, which calls whisper.cpp to transcribe speech to text. My own bot, linked below, is open to all, but you could start your own: with a Pi 4 and an always-on connection, you can leave it running for when you need it.
Due to the constraints of the Pi 4, it runs only the English model and may produce errors for other languages.
https://t.me/LevelEightyNews
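The core step described above, handing a received media file to whisper.cpp from Python, could look roughly like the sketch below. It is not this bot's actual code. It assumes ffmpeg is installed, that whisper.cpp was built with its example binary at the hypothetical path `whisper.cpp/main`, and that an English model sits at the hypothetical path `whisper.cpp/models/ggml-base.en.bin`. The media is first converted to 16 kHz mono 16-bit WAV, which is the input the whisper.cpp example binary expects.

```python
import subprocess
from pathlib import Path

# Hypothetical locations (assumptions): point these at your own whisper.cpp build.
WHISPER_BIN = Path("whisper.cpp/main")
WHISPER_MODEL = Path("whisper.cpp/models/ggml-base.en.bin")


def ffmpeg_command(src: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that converts a media file to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-i", str(src),
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(wav),
    ]


def whisper_command(wav: Path) -> list[str]:
    """Build the whisper.cpp call; -nt omits timestamps from the printed text."""
    return [str(WHISPER_BIN), "-m", str(WHISPER_MODEL), "-f", str(wav), "-nt"]


def transcribe(src: Path) -> str:
    """Convert src to WAV, run whisper.cpp on it, and return the transcript."""
    # Write next to the source under a different name so a .wav input is not overwritten.
    wav = src.with_name(src.stem + "_16k.wav")
    subprocess.run(ffmpeg_command(src, wav), check=True, capture_output=True)
    result = subprocess.run(
        whisper_command(wav), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

A python-telegram-bot message handler would download the voice note or video to disk, call `transcribe(Path(...))` on it, and reply with the returned text. Passing the commands as argument lists rather than shell strings keeps file names from being interpreted by a shell.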