Al T 12 Posted October 28, 2022 (edited) Hi, I'm trying to think of the best structure for an offline rhyming dictionary database. I'm willing to make it an open-source offline dictionary, but I need some ideas on how best to build it without bogging it down or making it bloated. Thank you in advance! Edited October 28, 2022 by Al T
Rollo62 536 Posted October 29, 2022 (edited) I'm not an expert on this, but I guess you have to structure the database around a phonetic representation of the words. Have a look at https://www.rhymedb.com/ for an example. As for handling other languages, I assume a phonetic representation might cover several languages at once. Edited October 29, 2022 by Rollo62
Fr0sT.Brutal 900 Posted October 31, 2022 On 10/29/2022 at 11:17 AM, Rollo62 said: https://www.rhymedb.com/ They seem to have a serious issue with their database. In what world does "ensure" rhyme with "fire"? Or did they just dumbly take all words ending in "re"?
Steven Kamradt 20 Posted November 2, 2022 (edited) Maybe the Soundex algorithm would be a way to start? I have used it before to check for misspellings of names, since two words that sound the same will generate the same value. Unfortunately there are false positives, and you might want to drop the first character and use just the numeric value, but it might get fairly close (see https://en.wikipedia.org/wiki/Soundex). I wouldn't truncate at 3 digits; let it go all the way to maxint if necessary. As for the database layout, you could just store each word with its Soundex value (the integer portion). When you need sound-alike words, just return all records with the same Soundex integer from your database. Edited November 2, 2022 by Steven Kamradt
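A minimal Python sketch of the layout described above, under some loudly stated assumptions: the names `soundex_digits`, `build_db` and `sounds_like` are made up for illustration, the first letter is encoded as a digit instead of being kept, the code is never truncated, and the key is stored as TEXT rather than an integer so that long words cannot overflow. SQLite is used only because it ships with Python.

```python
import sqlite3

# Standard Soundex consonant classes; vowels, h, w and y carry no digit.
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex_digits(word: str) -> str:
    """Digits-only, untruncated Soundex variant (hypothetical helper)."""
    out = []
    prev = ""
    for ch in word.lower():
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        # As in classic Soundex, h and w do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    return "".join(out)


def build_db(words):
    """Store each word next to its key, with an index on the key."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words (word TEXT PRIMARY KEY, sx TEXT NOT NULL)")
    con.execute("CREATE INDEX idx_words_sx ON words (sx)")
    con.executemany(
        "INSERT INTO words (word, sx) VALUES (?, ?)",
        [(w, soundex_digits(w)) for w in words],
    )
    return con


def sounds_like(con, word):
    """Return all other stored words that share the key of `word`."""
    rows = con.execute(
        "SELECT word FROM words WHERE sx = ? AND word <> ? ORDER BY word",
        (soundex_digits(word), word),
    )
    return [r[0] for r in rows]
```

With the words robert, rupert, table and cable loaded, `sounds_like(con, "robert")` returns `["rupert"]`, because both names map to `6163`. Note that the key covers the whole word, so "table" (`314`) and "cable" (`214`) do not match each other in this variant.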
Serge_G 87 Posted November 3, 2022 I don't think Soundex is a good algorithm for this; I would certainly look for a phonetic approach.
Rollo62 536 Posted November 3, 2022 (edited) Maybe these resources are helpful too, with real pronunciation examples and theoretical background. https://englishexplorations.check.uni-hamburg.de/basic-concepts-of-english-phonetics-and-pronunciation/ https://en.wikipedia.org/wiki/International_Phonetic_Alphabet https://www.internationalphoneticassociation.org/IPAcharts/IPA_chart_orig/pdfs/IPA_Kiel_2020_full.pdf If you store the IPA alphabet in the DB, then maybe it's possible to find and run a specific Soundex-like algorithm on that. I doubt that the original Soundex for the usual alphabet would help here, but it's worth a try. Edited November 3, 2022 by Rollo62
Bounceback 1 Posted November 7, 2022 (edited) Forget Soundex. Soundex is actually LOSSY on vowels. It would be counter-productive (at least for English). A modified Metaphone algorithm would probably serve better. First off, what language? In Finnish you could probably just reverse the strings and do a simple SQL LIKE. Hungarian too, perhaps. English? Not so simple. You need a serious ORTHOGRAPHY-to-PHONETICS (SAMPA or another representation) /database/. English never had a spelling reform, so the orthography-to-phonetics mapping is illogical and impossible to "calculate". Retrieve and receive? ie, ei... And loads of other stuff. Such a database might be possible to "scrape" from sources. The Oxford dictionary of modern English, perhaps? You would probably also need the orthographic "lemma" forms, all converted to a phonetic representation. Your biggest roadblock would definitely NOT be the RDBMS structure. I would start by buying my NLP-doctor friend at the royal university of tech a couple of beers. Actually, thinking about this, maybe this is one of the avenues where it is actually prudent to research the use of ML. Edited November 7, 2022 by Bounceback
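The "reverse the strings and use SQL LIKE" idea for a language with near-phonemic spelling could look roughly like this in Python. This is only a sketch: the table layout, the helper names `rev`, `build` and `ends_like`, and the default of matching on the last three letters are all assumptions chosen for illustration, and letters are used as a stand-in for sounds.

```python
import sqlite3
import unicodedata


def rev(word: str) -> str:
    """Lower-case, NFC-normalise and reverse a word (hypothetical helper)."""
    return unicodedata.normalize("NFC", word.lower())[::-1]


def build(words):
    """Store every word together with its reversed spelling."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words (word TEXT PRIMARY KEY, rev TEXT NOT NULL)")
    con.execute("CREATE INDEX idx_words_rev ON words (rev)")
    con.executemany(
        "INSERT INTO words (word, rev) VALUES (?, ?)",
        [(w, rev(w)) for w in words],
    )
    return con


def ends_like(con, word, n=3):
    """Words sharing the last n letters with `word`.

    A shared ending becomes a shared prefix of the reversed column,
    so a plain prefix LIKE is enough.
    """
    key = rev(word)[:n]
    rows = con.execute(
        "SELECT word FROM words WHERE rev LIKE ? AND word <> ? ORDER BY word",
        (key + "%", word),
    )
    return [r[0] for r in rows]
```

For example, with the Finnish words talo, valo, kala, pala and pallo loaded, `ends_like(con, "talo")` returns `["valo"]` and `ends_like(con, "kala")` returns `["pala"]`. Matching on spelling like this only makes sense where spelling tracks pronunciation closely, which is exactly why it does not carry over to English.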
Pat Foley 51 Posted November 7, 2022 It's in the source 🙂 https://docwiki.embarcadero.com/Libraries/Sydney/en/System.StrUtils.SoundexWord Brian Long's site with low-level calls: http://blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Animation Some context may be needed: is Robert "rubert" or "rowbear"? Gotham or "Goothem"?
Rollo62 536 Posted November 8, 2022 (edited) Since it's about rhymes, the phonetic abstraction of the trailing parts of the words is probably the interesting bit, and maybe that can more or less be looked up in a DB. Playing around with this tool, it turns "a cable laying on this table" into "ə ˈkeɪbl ˈleɪɪŋ ɒn ðɪs ˈteɪbl". But you're right, it's hard to predict the general outcome without deeper knowledge of semantics, orthography, etc. On the other hand, isn't phonetic transcription all about the phonetic representation of spoken words? So it would be my first candidate for the job. Edited November 8, 2022 by Rollo62
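One way to turn "the trailing parts of the words" into a lookup key is to cut each IPA transcription at its last stressed vowel and group words by what remains. The Python sketch below assumes the transcriptions are already available (here taken from the example above) and that a fixed set of IPA vowel symbols is good enough; `IPA_VOWELS`, `rhyme_key` and `group_rhymes` are hypothetical names, not part of any library.

```python
from collections import defaultdict

# Assumed, non-exhaustive set of IPA vowel symbols.
IPA_VOWELS = set("aeiouyɑɒæɐəɛɜɪɔʊʌøœɨʉɯɤɵɘɞɶ")


def rhyme_key(ipa: str) -> str:
    """Everything from the last primary-stressed vowel to the end of the word.

    Words without a stress mark are keyed from their first vowel.
    """
    tail = ipa.rsplit("ˈ", 1)[-1].replace("ˌ", "")
    for i, ch in enumerate(tail):
        if ch in IPA_VOWELS:
            return tail[i:]
    return tail  # no vowel found: fall back to the whole tail


def group_rhymes(lexicon):
    """Map each rhyme key to the sorted words sharing it (groups of 2+ only)."""
    groups = defaultdict(list)
    for word, ipa in lexicon.items():
        groups[rhyme_key(ipa)].append(word)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


lexicon = {
    "cable": "ˈkeɪbl",
    "laying": "ˈleɪɪŋ",
    "this": "ðɪs",
    "table": "ˈteɪbl",
}
print(group_rhymes(lexicon))  # cable and table share the key "eɪbl"
```

In a database, the key would simply be one more indexed column next to the word and its transcription, so a rhyme lookup is an equality match on that column.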