David Champion Posted December 1, 2022 Is there a VCL library available to convert human voice audio into text? It has to work without internet access, preferably for English.
programmerdelphi2k Posted December 1, 2022
https://blogs.embarcadero.com/this-google-api-easily-adds-powerful-speech-recognition-to-apps/
https://github.com/halilhanbadem/delphi-google-speech-to-text
Now, if you only want to "read the text" aloud, you can use the "ISpeechVoice" interface from Microsoft. Just import it in your IDE, nothing more: Component -> Import component... -> type library -> sapi
David Champion Posted December 1, 2022 @programmerdelphi2k This doesn't achieve what I'm trying to do, which is to transcribe speech without any cloud services. The client's environment prohibits the use of the internet; there is only a local network.
programmerdelphi2k Posted December 1, 2022 OK! If I come across one, I'll let you know!
David Champion Posted December 1, 2022 I have found the Windows.Media.SpeechRecognition namespace in Microsoft WinRT. That may be a way forward without the speech recognition needing to connect to Azure.
PeteG Posted May 2 Hi David, It's a long time later, but did you ever get anywhere with this? I've found https://github.com/ggerganov/whisper.cpp, which is a C++ library; with a fair bit of work it could probably be made to do the job. Pete
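As a rough illustration of the whisper.cpp idea, the sketch below drives the whisper.cpp command-line example as an external process and captures the transcript from standard output, so nothing leaves the machine. It is written in Python only to keep it short. The binary path, model file, and WAV file names are assumptions for illustration, and the helper names `build_whisper_command` and `transcribe` are hypothetical. The flags used (`-m`, `-f`, `-l`, `-nt`) are those of the whisper.cpp command-line example as I understand them, so check them against the `--help` output of your build. The example tool is generally documented as expecting 16-bit WAV input.

```python
import subprocess


def build_whisper_command(binary, model, wav, language="en"):
    """Build the argument list for one offline whisper.cpp run.

    binary, model and wav are hypothetical local paths supplied by the caller.
    """
    return [
        str(binary),
        "-m", str(model),   # local model file, no download at run time
        "-f", str(wav),     # input audio file
        "-l", language,     # spoken language, English by default
        "-nt",              # no timestamps, plain text only
    ]


def transcribe(binary, model, wav, language="en"):
    """Run the tool and return whatever text it printed to stdout."""
    cmd = build_whisper_command(binary, model, wav, language)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

A Delphi application could launch the same command line through its usual process-spawning code and read the output, which avoids binding to the C++ library directly.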
David Champion Posted May 2 Thanks for the recommendation. The feature that I was suggesting as part of an ongoing project was not thought to be worthwhile. So, no, it was canned.
Rollo62 Posted May 2 You can have a look here, from Grijjy. It's quite old, but it worked well for me under iOS and Android, so I assume Windows is OK too.
David Champion Posted May 2 @Rollo62 It was more the other way round: limited speech recognition. The idea was to log to text, at various intervals, what people are saying, so that positions in the audio log can be sparsely described. Also, the application cannot connect to the internet.
Rollo62 Posted May 2 Oh yes, of course. I skimmed the title too quickly; normally terms like TTS, TextToSpeech, SAPI, SpeechToText point me in the right direction. Maybe this will be helpful: https://learn.microsoft.com/de-de/windows/apps/develop/speech and an older article with Rx1.4.2 sources by Brian Long: http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm