#ifdef

Determine the language


I have a multi-line, multilingual (DE, EN, FR) document, and I need to determine the language of each individual line.

Nothing web-based; it must run locally. Preferably using VCL + ISpellChecker (WinAPI), or even Microsoft Word (OLE).

What do you think, is it possible?


You need a dictionary for each language in the document. Then, for each line, count the known words in each language. You can assume, within some margin of error, that the language with the most recognized words is the language of the line.

Dictionaries should contain all variations of each word (singular, plural, masculine, feminine, neuter, all conjugated forms, etc).
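
The counting idea above can be sketched as follows. This is only a minimal illustration in Python, not Delphi: the tiny word sets and the names `DICTIONARIES`, `count_known_words` and `detect_line_language` are made up for the example, and a real dictionary would have to hold every inflected form as described.

```python
import re

# Hypothetical, deliberately tiny word lists (assumption for illustration only).
# Real dictionaries would contain all inflected forms of every word.
DICTIONARIES = {
    "DE": {"das", "ist", "ein", "haus", "häuser", "und", "der", "die", "nicht"},
    "EN": {"the", "is", "a", "house", "houses", "and", "this", "not"},
    "FR": {"la", "est", "une", "maison", "maisons", "et", "le", "pas", "ce"},
}

# Runs of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def count_known_words(line, dictionaries=DICTIONARIES):
    """Return, per language, how many words of the line are in its dictionary."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    return {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in dictionaries.items()
    }


def detect_line_language(line, dictionaries=DICTIONARIES):
    """Return the language with the most known words, or None if undecided."""
    counts = count_known_words(line, dictionaries)
    best = max(counts.values())
    winners = [lang for lang, n in counts.items() if n == best]
    # No known word at all, or a tie between languages: do not guess.
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    for text in ["Das ist ein Haus", "This is a house", "Ce n'est pas une maison"]:
        print(text, "->", detect_line_language(text))
```

Returning `None` on a tie is one way to express the error margin; such lines could be handed to a more careful check.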


Thanks.

Here is a spell checker. Is it possible to use it for language detection (that is, without separate dictionaries)?

Edited by #ifdef

3 hours ago, #ifdef said:

Here is a spell checker. Is it possible to use it for language detection (that is, without separate dictionaries)?

A spell checker IS basically a dictionary as I described it. It may add grammar check as well.


Scoring with the most common words of each language works surprisingly well. Even scoring each line against only the 50-100 most common words works most of the time, except for very short lines or odd fragments of longer sentences. If the scores for different languages are close, a more complex approach can be used for that line. It would work better if you can group lines into paragraphs or sentences, so that very short lines are less of an issue.

Even just the 10 most common words work for longer text and a lot of short text:

English: the be to of and a in that have I

French: être avoir je de ne pas le la tu vous

German: wie ich seine dass er war für auf sind mit
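
A rough sketch of this scoring, using exactly the three ten-word lists above. It is written in Python for brevity rather than Delphi, and the names `COMMON_WORDS`, `score_line`, `classify_line` and the `min_margin` parameter are assumptions made for the example, not part of any library.

```python
import re

# The ten-word lists quoted above, lower-cased for matching.
COMMON_WORDS = {
    "EN": set("the be to of and a in that have i".split()),
    "FR": set("être avoir je de ne pas le la tu vous".split()),
    "DE": set("wie ich seine dass er war für auf sind mit".split()),
}

# Runs of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def score_line(line):
    """Count, per language, how many words of the line are common words."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    return {
        lang: sum(1 for w in words if w in common)
        for lang, common in COMMON_WORDS.items()
    }


def classify_line(line, min_margin=1):
    """Return the best-scoring language, or None when the scores are too close.

    min_margin is a hypothetical threshold: the winner must beat the
    runner-up by at least this many hits, otherwise the line is left
    for a more complex check.
    """
    ranked = sorted(score_line(line).items(), key=lambda kv: kv[1], reverse=True)
    (best_lang, best), (_, runner_up) = ranked[0], ranked[1]
    if best == 0 or best - runner_up < min_margin:
        return None
    return best_lang


if __name__ == "__main__":
    for text in ["Ich war mit ihm auf der Reise", "I have to be in the house", "Je ne sais pas"]:
        print(text, "->", classify_line(text))
```

Lines that come back as `None` are the short or ambiguous ones mentioned above; joining neighbouring lines into one string before calling `classify_line` is one way to give them more words to score.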

 

Edited by Brian Evans

2 hours ago, FPiette said:

A spell checker IS basically a dictionary as I described it. It may add grammar check as well.

Yes, and that's why I would rather not invent another dictionary but use a ready-made one 🙂

