Announce: $150 discount on Rubicon Pro ( optimized full text search component suite for Delphi )

Ann Lynnworth · December 3, 2021

03 December 2021

You are hereby welcome to a $150 discount off the list price of $199 for Rubicon Pro full-text-search components for Delphi. ($199 - 150 ... your price will be $49 usd.)

We are looking for some new-user feedback, especially from anyone using FireDAC, ADO or IBExpress in a Vcl project that targets Windows desktop. I am hoping that everyone reading this forum is willing to give useful feedback.

Here's a 15 minute what-does-it-do video: Rubicon showcase from Desktop UX Summit 2021

To qualify for the discount, be one of the next nine people to answer 6 questions on this survey.

The license you receive will be for one developer and there are no runtime fees. Rubicon Pro includes full source. Rubicon currently supports XE8 through Delphi 11 Alexandria (D22 to D28).

Ann Lynnworth
HREF Tools Corp.

Rinzwind · December 3, 2021

Links to "The flexible management system | monday.com Work OS"?

Lars Fosdal · December 3, 2021

@Ann Lynnworth - that video link looks like a miss.

Including a link to your site would be nice: https://www.href.com/rubicon

~~Or is this just a clever spam bot?~~ Fortunately, she's legit 🙂

Uwe Raabe · December 3, 2021

29 minutes ago, Lars Fosdal said:

Or is this just a clever spam bot?

I am quite confident that isn't the case here. Otherwise they had found a way to send sophisticated personal messages to users of this forum that actually make sense.

Ann Lynnworth · December 4, 2021

Apologies from an all-too-human.

Let's try a youtube link again :

https://www.youtube.com/watch?v=vRTneMRC4yQ&t=1s

I got that "monday" advertisement too when I tested, and then the [Skip Ad] button showed up and the real content started.

THANK YOU to those who responded. Beth will be sending you info momentarily.

Ann

Edited December 4, 2021 by Ann Lynnworth
realized that people have already filled in the survey!

Edwin Yip · December 4, 2021

A few questions if you don't mind.

How does it compare with SQLite's FTS? ( would appreciate a genuine comparison table).
Does the db backend support SQLite? Has built-in support (asking this because I don't use FireDAC, ADO nor IBExpress )?
No support for earlier versions of Delphi prior to XE8?

Edited December 4, 2021 by Edwin Yip

Ann Lynnworth · December 4, 2021

Thanks for the questions Edwin. I'll answer what I can :

3. Earlier versions of Rubicon supported earlier versions of Delphi back to Delphi 2.0 from around year 1996. If necessary, I can pull installers and/or source from the archive. If you buy a license, you get a free option to have direct access to the svn repo. Sometimes that is the easiest way to use the older codebase.

2. There is no direct built-in support for SQLite at present. ( Which query object do you use? ) Rubicon sits on top of a range of database access components and we have been focusing on FireDAC as a way of being as open as possible to the widest range of developers. Speed with FireDAC and SQLite still seems very fast to us. If you buy the full-source license, nothing stops you from cloning one of the existing bridges, and we can talk that through with you if you wish.

1. This is a guess. I have not used SQLite's FTS myself. Rubicon lets you index across multiple fields, even multiple tables, so that if there are related records that should all bring the user to "one place" based on an integer primary key, you can do that. The knowledge of where a word is located is not tied to a single field, it's tied to a location identifier which is usually a PK of a record in an important table. Rubicon also supports a range of searching syntax. I tried to demonstrate it in the showcase. There are different modes. As just one example, you can use a NEAR keyword and find one word within __ words of another, e.g. Delphi near SQLite. There's support for AND, OR, ( ) grouping and phrases. There are events relating to building the index, so that you can filter out noise in a custom way, and related to the searching. Rubicon also supports ranking of the match results, such that records that match the keyword more often are considered more important and can be presented to the user at the top of their list.

You might know the site www.codenewsfast.com which we have sponsored for many years. It's not perfect, it needs to start importing from discourse databases not just nntp and yahoo groups. Anyway, the database there uses FirebirdSQL v2.5 and it contains about 15gb of text. You can try it to get a sense of the syntax yourself. The index on that system is done with NexusDB. Rubicon indexes can be done with a different DBMS than the text itself.

-Ann

Edwin Yip · December 4, 2021

Thanks for the answers Ann. I use mORMot for db connection.

Just one note - if you take full-text search seriously (I know you do), you cannot ignore SQLite's FTS3/4/5 (with code sponsor from the Google search team) because it's the currently the number one embedded/desktop FTS engine in the world. The features you described seems to be all supported by SQLite FTS as far as I know.

December 4, 2021

Is the search done in an SQL context? In other words, can i add other SQL where clauses to do an FTS on a subset of the text fields (by rows)?

Ann Lynnworth · December 4, 2021

I haven't seen how to reply to individual posts so here are 2 replies in 1.

Edwin- thank you. We will look closer at the SQLite FTS details. Do you personally use SQLite FTS exclusively with text stored in SQLite? Can one use the SQLite FTS against, say, data in Oracle or Postgresql ?

I have put mORMot on the shortlist to look at closely next year.

Dany - I'm not sure I am catching your question. With Rubicon, the indexing can be done on a SQL query with a WHERE clause or on a SQL View if the DBMS supports that. Using a view makes it easy to put in a complex WHERE clause, and importantly lets subsequent selections occur against the view. Let's say you indexed 10 fields with Rubicon but then wanted to only return results if the matches were in 2 of those fields, you could do some filtering before returning results but performance will suffer. What you get raw out of Rubicon is a list of the matching location identifiers. You can put that list into a WHERE clause.

Example: select * from customer where (CustomerNo in ( 3, 5, 12 ) )

Using the in .... clause gives the full set of matches, after which you'd have to filter. By the time you do a LIKE clause to filter, it's a waste of the index against the 10 fields. But it would work.

select * from customer where (CustomerNo in ( 3, 5, 12 ) ) and ( City LIKE "%bedf%" ) and ( State = "MA" )

This doen't sound to me like an ideal use of Rubicon - but I might not have understood you.

Ann Lynnworth · December 4, 2021

Edwin - would you point me at a sample project that compiles in Delphi and uses the SQLite FTS5 extension? It seems one has to compile the support in, somewhere, somehow. ( I have been reading on https://www.sqlite.org/fts5.html ) Is that support included in mORMot?

I have a sample open-source recipe database in SQLite with 173,000 records. I would be interested to do a speed and ease-of-use comparison. My recipe .db3 is included with the install of Rubicon Lite. If for any reason you want to run install locally and compare on your own hardware, you are more than welcome to use it. Link to download Rubicon Lite

Edwin Yip · December 4, 2021

2 hours ago, Ann Lynnworth said:

Edwin- thank you. We will look closer at the SQLite FTS details. Do you personally use SQLite FTS exclusively with text stored in SQLite?

Yes.

Quote

Can one use the SQLite FTS against, say, data in Oracle or Postgresql ?

No.

Re. FTS5 sample compile-able with Delphi, I know at least mORMot (which I use) and DISQLite have that.

Hope it helps.

Ann Lynnworth · December 5, 2021

Thanks Edwin. I worked with DISQLite a little some years ago. Lajos and I will do some comparisons over the next couple of weeks.

I do see one feature difference and that is: Rubicon does not currently support RegEx MATCH within the search expression.

I don't suppose anyone has accurate usage statistics on ORMs for Delphi ? Is there any way to quantify how popular mORMot is, or how it is trending?

corneliusdavid · December 6, 2021

On 12/2/2021 at 10:47 PM, Ann Lynnworth said:

To qualify for the discount, be one of the next nine people to answer 6 questions on this survey.

I have a question about one of the survey questions: "Would it suffice to re-index on a schedule such as quarterly, monthly or weekly?"

What is the purpose of re-indexing? Is that re-indexing my database with the data? Or something in the Rubicon library/database itself?

Ann Lynnworth · December 7, 2021

7 hours ago, corneliusdavid said:

I have a question about one of the survey questions: "Would it suffice to re-index on a schedule such as quarterly, monthly or weekly?"

What is the purpose of re-indexing? Is that re-indexing my database with the data? Or something in the Rubicon library/database itself?

Thank you for the question, David. It's probably not obvious at all... the reason Rubicon search results come back so quickly is because the answer is known in advance, and that's because the word-to-location index is created before searches are done. That's called indexing. Re-indexing happens when the text changes. I'll give you some scenarios to clarify.

1. Natural delay between text changes. Let's say the text data comes from a government publication which comes out monthly. No text data changes except at the time of publication. Therefore it is perfectly natural to add information about the new text data on a monthly basis.
In terms of workflow, at some point after the new text data is available, an EXE compiled with Delphi + perhaps FireDAC components + Rubicon would "Make" the index such that it includes the new text data. Something needs to run that EXE, either a human being or a windows task or something else in the ecosystem. The "index" which we call the WORDS table lives in some database. The WORDS table might live the same database as your text data or it might be separate. You can have a hybrid of an Oracle database with a SQLite or NexusDB words table.
The "Make" process generates a table which essentially tracks the location of every word within the text -- and "location" means LongInt identifier. In this scenario, you would probably naturally know the last cut-off of your indexing, and be able to use Append mode to merge in the information for the newly released text. ("Make" is a complete re-index, "Append" picks up where the last round of indexing left off.)

2. Daily updates on a smallish text database. Let's say the text data is something on a personal or small business LAN system, such that complete re-indexing would take less than 5 minutes. That could happen nightly, similar to a nightly backup workflow. You would just use Make to reindex the whole thing because it's fast and there's no need to build anything more complicated.

Ideally this is when a bartending system is integrated with the personal computer, so running the index triggers creation of the user's favorite beverage. 🙂

The 5 minute guideline is for the system architect to contemplate. Maybe the business actually closes at 6pm and you could re-index for 6 hours without bothering anyone. Then you could do this on a fairly large database.

3. Multi-user add/edit/delete maintenance to tables which include text relating to a single ID, perhaps a Job or an Invoice or a manufacturing part. This is challenging, and can be in fact impossible if the rate of text editing exceeds the ability to update the Words table. It boils down to whether there will be collisions on individual popular words. The problem is not multi-user collision on the record being edited but on the location-data associated with the keyword being indexed. If you look at the Rubicon Showcase project and stare at the Words table for a few minutes, you'll probably see what I mean. There is a BLOB storing location data for each word, so if two users modify text memos and both of them use the word 'surrogate' then there will be contention on updating the Blob relating to 'surrogate.' In the recipe sample system, there would be a contention when two users modified a recipe using an 'egg'.

Collision can be likely if there are enough users or the text entries are frequently similar.

Therefore this is not an ideal use-case for Rubicon unless there are multiple seconds between edits, perhaps because the majority of users are in readonly reporting mode.

You can still do it by cycling between two indices, one which is used for searches but is a little delayed in knowing everything, and one which is regenerated on a schedule and swapped into active use. Whether this is practical comes down to the database size, hardware spec, DBMS speed, and how necessary it is to have data available a microsecond after it has been typed.

Ok, that was a long answer! Cheers.

--

A huge THANK YOU to everyone who has filled in our survey and obtained their discounts already.

Ann Lynnworth
HREF Tools Corp.
https://www.href.com

corneliusdavid · December 7, 2021

@Ann Lynnworth, thank you very much for the detailed explanation. That's exactly the information I needed!

Edwin Yip · December 7, 2021

@Ann Lynnworth, Does it support CJK words? SQLITE FTS does not, and that's my only complain about it.

Ann Lynnworth · December 7, 2021

@Edwin YipCJK Chinese Japanese Korean?

Rubicon has full Unicode support. There is a place where you help it to know the alphabet for the text you care about, and you can customize the word delimiters. The existing showcase includes 100 recipes in each of Arabic, German, Spanish, Hungarian, Portuguese and Russian in addition to the much larger set in english. If you fast forward though the showcase video, you will see that there is a section where we do lingvo-specific searches. I imagine that you would also want to search in one lingvo at a time, yes?

I'd say you would be able to verify CJK support for yourself with the free Rubicon Lite edition and the FireDAC bridge to SQLite. You would need to make the index and do the searching with FireDAC until we offer a mORMot bridge.

Do you have a reason to think that CJK in particular would fail while arabic and russian would work? If you do, I can go through the steps to get those translations added into the recipe sample database.

Edwin Yip · December 7, 2021

1 hour ago, Ann Lynnworth said:

@Edwin YipCJK Chinese Japanese Korean?

Do you have a reason to think that CJK in particular would fail while arabic and russian would work? If you do, I can go through the steps to get those translations added into the recipe sample database.

Sure, because Chinese, Japanese and Korean have no such a thing as word delimiter :D

Word segmentation, is a fundamental and professional term in your area, I guess.

Edwin Yip · December 7, 2021

Quote

I'd say you would be able to verify CJK support for yourself with the free Rubicon Lite edition and the FireDAC bridge to SQLite.

Well, I must say, from your description I already know it most likely doesn't support it, because you didn't know CJK words have no word delimiters ;)

Ann Lynnworth · December 7, 2021

Live and learn, wow, okay!

What about https://chinese-words-separator.github.io I guess algorithms exist. I thought one of the higher unicode chars would be used these days as a separator in some documents.

Edwin Yip · December 7, 2021

Hah, yes, languages in the world are extremely diverse.

Algorithms exist, and open source projects too, some in c and some in golang on github.

AFAIK, there is only one in Pascal: https://github.com/PassByYou888/zChinese, but I don't know how good it's and the size of the files.

Ann Lynnworth · December 7, 2021

On 12/3/2021 at 6:53 AM, Rinzwind said:

Links to "The flexible management system | monday.com Work OS"?

FWIW the video is now available directly to avoid the youtube adverts

Rubicon_Showcase_2021_October

Ann Lynnworth · December 8, 2021

@Edwin Yip Re CJK, I asked a colleague in Singapore about word delims for Chinese. He reads and writes English and Chinese. He told me that Chinese does use whitespace and !,.:; characters like English. However, he said the modern hanyu pinyin "may" take away the whitespace. I guess then a dictionary lookup is used to assist in word matchingI think that's what the PassByYou888/zChinese code is doing -- thanks for that link. The example he gave was that at a bank, a person's name is stored without any spaces.

I just looked up Headline Daily and you are right, they are breaking at phrases or sentences with a fullwidth comma. I put the string into Unipad editor to verify. There were no invisible spacer chars.

Interesting (to me) was putting the text into google translate, which first changes it into a phonetic script, which does have spaces between words, and translated fairly successfully into english. I did have to laugh at the english word 'cola' in google english. One wonders at what point that mistake was introduced.

This was the random new item that I worked from:

西九龍衝鋒隊第4隊警員今日凌晨1時許，在旺角登打士街近廣東道交界設置路障截查可疑車輛，期間截停一輛私家車，警員發現24歲姓張的男司機身上藏22包共重13克俗稱「可樂」的可卡因毒品，另於車頭位置發現1包約重5克的

I'm thinking it should be possible (not fast, but possible) to use google translate API to parse into the intermediate translation, index that as words, and then FTS5 or Rubicon would succeed. I do not see a function in the REST API to get the intermediate parsing. Hmmm. Needs more research.

Edwin Yip · December 8, 2021

@Ann Lynnworth, I'm not encourage nor pushing you to work on CJK words support in your FTS library, because I honestly think SQLite FTS is excellent already, even without Chinese word support built in, it allows me to supply call back procedures to overcome that, if I want to, thanks to the excellent works by authors of both SQLite and mORMot.

But since you are interested, here are some points that might be of helpful:

No, spaces used in Chinese writings are not for delimiting words, it's impossible to just use spaces and punctuation to segment Chinese words.
Translation shown by services like Google Translate is a result of hard works by CS researchers all around the world in the past 4 or 5 decades. Yes, decades.
It's not always feasible for a desktop app to use Google API, even if it provides the word segmentation service (which I don't think so as far as I know). Further more, in case of China, we have the GFW, which makes it impossible to use Google API.
The 'cola' translation is not a mistake, but rather a local 'alias' (maybe only limited to that place) to cocaine :D

Sign In

Announce: $150 discount on Rubicon Pro ( optimized full text search component suite for Delphi )

Recommended Posts

Ann Lynnworth 11

Share this post

Link to post

Rinzwind 27

Share this post

Link to post

Lars Fosdal 1877

Share this post

Link to post

Uwe Raabe 2169

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Guest

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

corneliusdavid 257

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

corneliusdavid 257

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Ann Lynnworth 11

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Create an account or sign in to comment

Create an account