I want to build an app that does speech transcription - doesn't have to be real-time. I've seen Amazon's speech API and that looks better than the Google and Microsoft stuff - are there any other options I should be aware of from smaller companies? What are the advantages and disadvantages of the various options? Thanks in advance.
First, a disclaimer. Transcribing speech in Java is difficult, since Java's access to audio hardware is limited by its portability requirements. So I don't really know of any Java speech products, and web-based ones would be an especial problem, since a webserver cannot reach the audio hardware on the user's machine in any language. I'm excluding things like ActiveX controls, since they were just a nightmare anyway.
One of the most popular speech-to-text systems I know of has been the Dragon NaturallySpeaking app (from Dragon Systems, later Nuance). I don't know its current status, though, and its OS options were limited.
For the Linux and Unix world, one of the most prominent speech-to-text systems has been Carnegie Mellon University's Sphinx system, which is available in both full-server and mini-system (Raspberry Pi, for example) flavors.
These are true transcription apps. Of course, if you mainly want simple speech control (a few key words) or "wake-up" apps to front something like Alexa, you have options using things like TensorFlow, which is available on a wide variety of platforms, including some very inexpensive peripheral device boards.
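Since the question is about non-real-time transcription, the audio will usually arrive as a recorded file rather than from a microphone, and reading a file needs no audio hardware at all. As a minimal sketch using only the standard `javax.sound.sampled` package, the following checks whether a WAV recording is 16 kHz, 16-bit, mono PCM. That is the input format I'd assume for Sphinx's default models; check the documentation of whichever engine you pick. The names `WavInspector`, `Info`, `inspect`, and `silentWav` are made up for this illustration.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class WavInspector {

    /** Format summary of a recording plus its duration in seconds. */
    static final class Info {
        final float sampleRate;
        final int channels;
        final int bitsPerSample;
        final double seconds;

        Info(float sampleRate, int channels, int bitsPerSample, double seconds) {
            this.sampleRate = sampleRate;
            this.channels = channels;
            this.bitsPerSample = bitsPerSample;
            this.seconds = seconds;
        }

        /** Assumed target format: 16 kHz, 16-bit, single channel. */
        boolean isMono16k16bit() {
            return sampleRate == 16000f && channels == 1 && bitsPerSample == 16;
        }
    }

    /** Reads the audio header from a stream; no sound card is involved. */
    static Info inspect(InputStream in) throws IOException, UnsupportedAudioFileException {
        // AudioSystem needs mark/reset support, so wrap the stream in a buffer.
        try (AudioInputStream audio = AudioSystem.getAudioInputStream(new BufferedInputStream(in))) {
            AudioFormat format = audio.getFormat();
            double seconds = audio.getFrameLength() / (double) format.getFrameRate();
            return new Info(format.getSampleRate(), format.getChannels(),
                    format.getSampleSizeInBits(), seconds);
        }
    }

    /** Builds a silent in-memory WAV (signed 16-bit little-endian PCM), for demonstration only. */
    static byte[] silentWav(float sampleRate, int channels, int frames) throws IOException {
        AudioFormat format = new AudioFormat(sampleRate, 16, channels, true, false);
        byte[] pcm = new byte[frames * format.getFrameSize()];
        AudioInputStream source = new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(source, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, inspect that file; otherwise use generated silence.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(silentWav(16000f, 1, 16000));
        Info info = inspect(in);
        System.out.println(info.sampleRate + " Hz, " + info.channels + " channel(s), "
                + info.bitsPerSample + " bits, " + info.seconds + " s");
        System.out.println("Matches assumed 16 kHz mono 16-bit target: " + info.isMono16k16bit());
    }
}
```

A recording that fails the check would need resampling or downmixing before going to the recognizer. Cloud services have their own accepted formats, so the target in `isMono16k16bit` is only a placeholder for whatever your chosen engine wants.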