#ifdef 12 Posted July 23, 2023 I have a multi-line, multi-language (DE, EN, FR) document, and I need to determine the language of each individual line. No web services; it has to run locally. Preferably using VCL + ISpellChecker (WinAPI) or even Microsoft Word (OLE). What do you think, is it possible?
FPiette 383 Posted July 23, 2023 You need a dictionary for each language in the document. Then, for each line, you count the known words in each language. You can assume, with some margin of error, that the language with the most recognized words is the language of the line. The dictionaries should contain all variations of each word (singular, plural, masculine, feminine, neuter, all conjugated forms, etc.).
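The counting approach described above can be sketched in a few lines. This is only an illustration, written in Python rather than Delphi for brevity; the `DICTIONARIES` table and the function names are hypothetical, and the word sets are deliberately tiny stand-ins for real dictionaries containing every inflected form.

```python
import re

# Hypothetical, deliberately tiny word lists. A real dictionary would hold
# every inflected form of every word, as described in the post above.
DICTIONARIES = {
    "EN": {"the", "house", "houses", "is", "are", "small", "big"},
    "FR": {"la", "maison", "maisons", "est", "sont", "petite", "grande"},
    "DE": {"das", "haus", "häuser", "ist", "sind", "klein", "groß"},
}

# Runs of Unicode letters (no digits, no underscore, no punctuation).
WORD_RE = re.compile(r"[^\W\d_]+")


def count_known_words(line, dictionaries=DICTIONARIES):
    """Return, per language, how many words of the line are in its dictionary."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    return {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in dictionaries.items()
    }


def guess_language(line, dictionaries=DICTIONARIES):
    """Return the language with the most recognized words, or None if no word is known."""
    counts = count_known_words(line, dictionaries)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    for text in ("The house is small", "La maison est petite", "Das Haus ist klein"):
        print(text, "->", guess_language(text))
```

A tie between two languages is resolved here simply by dictionary order, which is one reason the result should be treated as a guess with an error margin.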
#ifdef 12 Posted July 23, 2023 (edited) Thanks. Here is a spell checker; is it possible to use it for language detection (that is, to do it without dictionaries)?
FPiette 383 Posted July 23, 2023 3 hours ago, #ifdef said: Here is a spell checker; is it possible to use it for language detection (that is, to do it without dictionaries)?
A spell checker IS basically a dictionary as I described it. It may add grammar checking as well.
Brian Evans 105 Posted July 23, 2023 (edited) Scoring each line against the most common words of each language works surprisingly well. Even the 50-100 most common words are enough most of the time, except for very short lines or odd fragments of longer sentences. If the scores for different languages are close, a more complex approach can be used for that line. It would work better if you can group lines into paragraphs or sentences, so that very short lines are less of an issue. Even just the 10 most common words work for longer text and a lot of short text:
English: the be to of and a in that have I
French: être avoir je de ne pas le la tu vous
German: wie ich seine dass er war für auf sind mit
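As a rough sketch of this scoring idea (in Python for brevity, not Delphi), the snippet below uses exactly the three ten-word lists from the post. The names `COMMON_WORDS`, `score_line`, `classify_line` and the `margin` parameter are made up for illustration. A line is reported as undecided (`None`) when no common word is found or when the top two scores are closer than `margin`, which is where grouping lines or a more complex approach would come in.

```python
import re

# The ten-word lists quoted in the post above, lower-cased.
COMMON_WORDS = {
    "EN": {"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"},
    "FR": {"être", "avoir", "je", "de", "ne", "pas", "le", "la", "tu", "vous"},
    "DE": {"wie", "ich", "seine", "dass", "er", "war", "für", "auf", "sind", "mit"},
}

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def score_line(line):
    """Count, per language, how many words of the line are in its common-word list."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    return {
        lang: sum(1 for w in words if w in common)
        for lang, common in COMMON_WORDS.items()
    }


def classify_line(line, margin=1):
    """Return the best-scoring language, or None if the line is too ambiguous.

    `margin` (an assumed tuning knob) is the minimum lead the winner must
    have over the runner-up.
    """
    ranked = sorted(score_line(line).items(), key=lambda kv: kv[1], reverse=True)
    (best_lang, best), (_, second) = ranked[0], ranked[1]
    if best == 0 or best - second < margin:
        return None  # undecided: group with neighbouring lines or use a heavier method
    return best_lang


if __name__ == "__main__":
    for text in ("I have to be in the house", "Je ne sais pas", "Ich war mit dir auf dem Berg"):
        print(text, "->", classify_line(text))
```

With lists this short, many real lines will come back as `None`; longer lists (the 50-100 most common words mentioned above) reduce that.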
#ifdef 12 Posted July 23, 2023 2 hours ago, FPiette said: A spell checker IS basically a dictionary as I described it. It may add grammar checking as well.
Yes, and that's why I would rather not invent another dictionary but use a ready-made one 🙂