#ifdef

Determine the language

I have a multi-line, multilingual (DE, EN, FR) document, and I need to determine the language of each individual line.

No web services; it has to work locally. Preferably using VCL + ISpellChecker (WinAPI), or even Microsoft Word (OLE).

What do you think, is it possible?

You need a dictionary for each language in the document. Then, for each line, you count the known words in each language. You can assume, with some margin of error, that the language with the most recognized words is the language of the line.

The dictionaries should contain all variations of each word (singular, plural, masculine, feminine, neuter, all conjugated forms, etc.).
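
The counting step described above can be sketched roughly as follows. This is only an illustration in Python, not the VCL/ISpellChecker code asked about: `TOY_DICTIONARIES`, `toy_is_known`, `count_known_words` and `guess_language` are hypothetical names, and the tiny word sets merely stand in for real dictionaries. The `is_known` callback is where a real dictionary or spell checker lookup would be plugged in.

```python
import re

# Hypothetical toy word sets standing in for full dictionaries (illustration only).
TOY_DICTIONARIES = {
    "EN": {"the", "house", "is", "small", "houses", "are"},
    "DE": {"das", "haus", "ist", "klein", "häuser", "sind"},
    "FR": {"la", "maison", "est", "petite", "maisons", "sont"},
}

# Runs of letters only (accented letters included, digits and underscores excluded).
WORD_RE = re.compile(r"[^\W\d_]+")


def toy_is_known(word, lang):
    """Hypothetical lookup; a real one would query a dictionary or spell checker."""
    return word in TOY_DICTIONARIES[lang]


def count_known_words(line, is_known, languages):
    """Return {language: number of words in the line that the language recognizes}."""
    words = [w.lower() for w in WORD_RE.findall(line)]
    return {lang: sum(1 for w in words if is_known(w, lang)) for lang in languages}


def guess_language(line, is_known, languages):
    """Pick the language with the most recognized words, or None if nothing matched."""
    counts = count_known_words(line, is_known, languages)
    best = max(counts, key=counts.get)  # on a tie, the first language listed wins
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    for text in ["Das Haus ist klein", "La maison est petite", "The houses are small"]:
        print(text, "->", guess_language(text, toy_is_known, ["EN", "DE", "FR"]))
```

Passing the lookup as a function keeps the counting logic independent of where the word lists come from.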

Thanks.

Here is a spell checker. Is it possible to use it for language detection (so that I can do it without dictionaries)?

Edited by #ifdef

3 hours ago, #ifdef said:

Here is a spell checker. Is it possible to use it for language detection (so that I can do it without dictionaries)?

A spell checker IS basically a dictionary as I described it. It may add grammar check as well.

Scoring with the most common words of each language works surprisingly well. Even scoring each line against only the 50-100 most common words works most of the time, except for very short lines or odd fragments of longer sentences. If the scores for different languages are close, a more complex approach can be used for that line. It would work better if you can group lines together into paragraphs or sentences, so that very short lines are less of an issue.

Even just the 10 most common words work for longer text and for a lot of short text:

English: the be to of and a in that have I

French: être avoir je de ne pas le la tu vous

German: wie ich seine dass er war für auf sind mit
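
A rough sketch of this scoring, using exactly the three ten-word lists above, could look like the following. It is Python for illustration only; the names `COMMON_WORDS`, `score_line` and `classify_line` and the `min_margin` parameter are assumptions of the sketch, not an existing API. A line is left undecided (`None`) when nothing matches or when the top two scores are too close, which is where a more complex approach would take over.

```python
import re

# The ten-word lists quoted in the post above, lowercased.
COMMON_WORDS = {
    "EN": {"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"},
    "FR": {"être", "avoir", "je", "de", "ne", "pas", "le", "la", "tu", "vous"},
    "DE": {"wie", "ich", "seine", "dass", "er", "war", "für", "auf", "sind", "mit"},
}


def score_line(line):
    """Count, per language, how many words of the line are in its common-word list."""
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", line)]
    return {lang: sum(w in common for w in words) for lang, common in COMMON_WORDS.items()}


def classify_line(line, min_margin=1):
    """Return the best-scoring language, or None if the line is undecided.

    min_margin (an assumed tuning knob) is how far the winner must be ahead
    of the runner-up before the result is trusted.
    """
    ranked = sorted(score_line(line).items(), key=lambda item: item[1], reverse=True)
    (best_lang, best), (_, second) = ranked[0], ranked[1]
    if best == 0 or best - second < min_margin:
        return None  # too short or too close: needs a more complex approach
    return best_lang


if __name__ == "__main__":
    for text in ["I have to be in the house", "Ich war mit ihm auf dem Weg", "Hello"]:
        print(repr(text), "->", classify_line(text))
```

With lists this short, many short lines will come back undecided; larger lists (the 50-100 words mentioned above) or grouping lines into paragraphs before scoring should reduce that.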

Edited by Brian Evans

2 hours ago, FPiette said:

A spell checker IS basically a dictionary as I described it. It may add grammar check as well.

Yes, and that's why I would rather not invent another dictionary, but use a ready-made one 🙂
