Buzz is a desktop application that uses OpenAI's Whisper model to transcribe audio into text automatically. It offers several Whisper models to choose from and runs on Windows, macOS, and Linux. The underlying model is capable enough that it can even transcribe songs and videos if you route the system's audio output to the microphone input. Buzz is open source and completely free to use.
In addition to transcription, Buzz has a translation mode. Currently, it supports only English as an input language for both transcription and translation. However, I expect more languages to be supported in future updates, since the tool is built on Whisper, the open-source neural network that OpenAI recently released and describes as approaching human-level accuracy on English speech recognition.
Buzz is available on GitHub and is written in Python. You can either run it from source or use the self-contained binary releases provided by the developer. If you run it from source, you'll need Python and the Poetry dependency manager installed. You can then install the required dependencies into a virtual environment with the "poetry install" command:
poetry install
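If you prefer to script the source setup, the step above can be wrapped in a few lines of Python. This is only a minimal sketch: the function name install_dependencies and the project directory "buzz" are hypothetical, and the only external command it runs is the "poetry install" shown above.

```python
import shutil
import subprocess


def install_dependencies(project_dir, which=shutil.which, run=subprocess.run):
    """Run `poetry install` inside project_dir, failing early if Poetry is missing.

    `which` and `run` are injectable so the function can be exercised
    without touching the real system.
    """
    poetry = which("poetry")
    if poetry is None:
        # Poetry is not on PATH, so `poetry install` could not work.
        raise RuntimeError("Poetry was not found on PATH; install it first.")
    # check=True raises CalledProcessError if the install step fails.
    return run([poetry, "install"], cwd=project_dir, check=True)


# Example (hypothetical checkout directory):
# install_dependencies("buzz")
```

Checking for Poetry first gives a clearer error message than letting the command fail on its own.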
To make things easier, you can download a binary release and run it without any complicated setup. Releases are available for macOS, Windows, and Linux; I'm currently using the Windows version. Keep in mind that the software is quite resource-intensive, so I suggest running it on a machine with capable hardware.
Once you've downloaded and installed the software, the first step is to select your microphone and specify the mode. By default, it runs in transcription mode, but you can switch to translation mode if you prefer.
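For readers who want to try the same two modes without the GUI, the open-source whisper Python package exposes them through the task option of its transcribe method. The sketch below assumes that package (and ffmpeg, which it uses to read audio) is installed; the helper names decode_options and transcribe_file, the "base" model choice, and the file name are illustrative assumptions, and Buzz's own internals may work differently.

```python
from typing import Any, Dict

# The two modes described above, as Whisper task names.
MODES = ("transcribe", "translate")


def decode_options(mode: str) -> Dict[str, Any]:
    """Map a mode name to the keyword options passed to Whisper."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return {"task": mode}


def transcribe_file(audio_path: str, mode: str = "transcribe",
                    model_name: str = "base") -> str:
    """Return the recognized text for an audio file.

    With mode="translate", Whisper outputs English text.
    """
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **decode_options(mode))
    return result["text"]


# Example (hypothetical file name):
# print(transcribe_file("recording.mp3", mode="translate"))
```

Larger models are generally more accurate but slower, which is worth keeping in mind on modest hardware.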
You can use the software for speech-to-text today, and its accuracy is decent. However, the experience is not as smooth as the voice typing feature in Windows 10 and 11 or the Speechnotes website. It's still functional and is a good way to test the accuracy and usability of OpenAI's Whisper.
Conclusion
Whisper is an impressive neural network for speech-to-text conversion. Developers can use it to build software and apps that need speech-to-text functionality. While the accuracy is good, Buzz's GUI currently lacks speed and smoothness. This is a limitation of the user interface rather than of the model itself, and hopefully future updates will improve it.