Ann Lynnworth

Community Reputation

10 Good

Technical Information

  • Delphi-Version
    Delphi 11 Alexandria

  1. Ann Lynnworth

    docwiki.embarcadero.com is not working

    Maybe this is obvious, but I'll just say it: the Google cache feature is quite useful on those days when the docwiki goes offline. Google moved that cache feature (in Google Chrome on desktop, at least). Now it's hidden under the triple-dot menu at the right edge of each normal search result line, and then you have to click the Cache button at the bottom right. Definitely better than waiting for docwiki to come back online.
  2. Ann Lynnworth

    Does anyone know if Nils Haeck is OK ? SimLib and NativeXml

    I use & maintain a fork of TNativeXml, with permission from Nils, in the ZaphodsMap project, which is open source on SourceForge (https://sourceforge.net/p/zaphodsmap). I have not corresponded with Nils in more than 10 years. Good to know he is still posting on LinkedIn. FWIW, I still prefer to use TNativeXml when I need to iterate through nodes of XML.
  3. Rubicon v4.072 for Delphi is available now. This is a free upgrade for existing customers. There is a Free Lite version available for Delphi 10 and Delphi 11; download from https://www.href.com/rbcdnload. I am still looking for a couple more people to give new-user feedback. Just fill in the fairly short survey on this page, https://www.href.com/rubicon, and you will receive a significant discount to the pro version with full source. Since February, Rubicon has its own Discourse home located at https://forum.href.com/c/rubicon-full-text-search-for-delphi/5 If you would like to talk directly with other Delphi developers who have used Rubicon, please feel free to stop by. Ann Lynnworth Co-Founder and Bottle Washer HREF Tools Corp.
  4. From my colleague in Singapore, I learned about this dictionary for Chinese: https://www.pleco.com/about/ That would be one way to identify undelimited words. I guess Pleco would have to be licensed not only by us but by any developer who needed that functionality. I have done another round of improvements on the Rubicon installer. I am still very interested to hear from anyone wanting to do optimized full text search with FireDAC and non-CJK database content and willing to give us new-user feedback. The comments thus far have been extremely helpful. Release notes for Rubicon v4.070 are here: https://www.href.com/rbrelnotes To get the discount, just fill out the tiny survey and we will get back to you. You will get a real Rubicon Pro license with full source and free upgrades until the next major release. Best wishes to all for the "year 2022" ride around the sun.
  5. @Edwin Yip Re CJK, I asked a colleague in Singapore about word delimiters for Chinese. He reads and writes English and Chinese. He told me that Chinese does use whitespace and !,.:; characters like English. However, he said the modern hanyu pinyin "may" take away the whitespace. I guess a dictionary lookup is then used to assist in word matching. I think that's what the PassByYou888/zChinese code is doing -- thanks for that link. The example he gave was that at a bank, a person's name is stored without any spaces. I just looked up Headline Daily and you are right, they are breaking at phrases or sentences with a fullwidth comma. I put the string into the Unipad editor to verify. There were no invisible spacer chars. Interesting (to me) was putting the text into Google Translate, which first changes it into a phonetic script, which does have spaces between words, and then translated it fairly successfully into English. I did have to laugh at the English word 'cola' in the Google English. One wonders at what point that mistake was introduced. This was the random news item that I worked from: 西九龍衝鋒隊第4隊警員今日凌晨1時許,在旺角登打士街近廣東道交界設置路障截查可疑車輛,期間截停一輛私家車,警員發現24歲姓張的男司機身上藏22包共重13克俗稱「可樂」的可卡因毒品,另於車頭位置發現1包約重5克的 I'm thinking it should be possible (not fast, but possible) to use the Google Translate API to parse into the intermediate translation, index that as words, and then FTS5 or Rubicon would succeed. I do not see a function in the REST API to get the intermediate parsing. Hmmm. Needs more research.
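To make the "dictionary lookup" idea concrete, here is a minimal sketch of one common approach, forward maximum matching: at each position, take the longest dictionary word that fits, otherwise emit a single character. This is only an illustration. The tiny dictionary is hand-made for the example, and nothing here is taken from Pleco, zChinese, or Rubicon, which may work quite differently.

```python
# Toy forward maximum matching segmenter for undelimited text.
# TOY_DICTIONARY is a hypothetical hand-made word list, not real
# dictionary data from Pleco or zChinese.

def segment(text, dictionary, max_len=4):
    """Split text into words by greedily taking the longest dictionary hit."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


TOY_DICTIONARY = {"俗稱", "可樂", "可卡因", "毒品"}

# A fragment of the news item quoted above.
tokens = segment("俗稱「可樂」的可卡因毒品", TOY_DICTIONARY)
print(tokens)
```

Once the text is split this way, the resulting tokens could be fed to any word-based indexer. The quality depends entirely on the dictionary, which is why licensing a good one matters.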
  6. FWIW the video is now available directly to avoid the youtube adverts Rubicon_Showcase_2021_October
  7. Live and learn, wow, okay! What about https://chinese-words-separator.github.io ? I guess algorithms exist. I thought one of the higher Unicode chars would be used these days as a separator in some documents.
  8. @Edwin Yip CJK: Chinese, Japanese, Korean? Rubicon has full Unicode support. There is a place where you help it to know the alphabet for the text you care about, and you can customize the word delimiters. The existing showcase includes 100 recipes in each of Arabic, German, Spanish, Hungarian, Portuguese and Russian in addition to the much larger set in English. If you fast forward through the showcase video, you will see that there is a section where we do lingvo-specific searches. I imagine that you would also want to search in one lingvo at a time, yes? I'd say you would be able to verify CJK support for yourself with the free Rubicon Lite edition and the FireDAC bridge to SQLite. You would need to make the index and do the searching with FireDAC until we offer a mORMot bridge. Do you have a reason to think that CJK in particular would fail while Arabic and Russian would work? If you do, I can go through the steps to get those translations added into the recipe sample database.
  9. Thank you for the question, David. It's probably not obvious at all... the reason Rubicon search results come back so quickly is that the answer is known in advance, because the word-to-location index is created before any searches are done. That's called indexing. Re-indexing happens when the text changes. I'll give you some scenarios to clarify.

    1. Natural delay between text changes. Let's say the text data comes from a government publication which comes out monthly. No text data changes except at the time of publication. Therefore it is perfectly natural to add information about the new text data on a monthly basis. In terms of workflow, at some point after the new text data is available, an EXE compiled with Delphi + perhaps FireDAC components + Rubicon would "Make" the index such that it includes the new text data. Something needs to run that EXE, either a human being or a Windows task or something else in the ecosystem.

    The "index", which we call the WORDS table, lives in some database. The WORDS table might live in the same database as your text data or it might be separate. You can have a hybrid of an Oracle database with a SQLite or NexusDB words table. The "Make" process generates a table which essentially tracks the location of every word within the text -- and "location" means a LongInt identifier. In this scenario, you would probably naturally know the last cut-off of your indexing, and be able to use Append mode to merge in the information for the newly released text. ("Make" is a complete re-index, "Append" picks up where the last round of indexing left off.)

    2. Daily updates on a smallish text database. Let's say the text data is something on a personal or small business LAN system, such that complete re-indexing would take less than 5 minutes. That could happen nightly, similar to a nightly backup workflow. You would just use Make to reindex the whole thing because it's fast and there's no need to build anything more complicated. Ideally this is when a bartending system is integrated with the personal computer, so running the index triggers creation of the user's favorite beverage. 🙂 The 5 minute guideline is for the system architect to contemplate. Maybe the business actually closes at 6pm and you could re-index for 6 hours without bothering anyone. Then you could do this on a fairly large database.

    3. Multi-user add/edit/delete maintenance to tables which include text relating to a single ID, perhaps a Job or an Invoice or a manufacturing part. This is challenging, and can in fact be impossible if the rate of text editing exceeds the ability to update the Words table. It boils down to whether there will be collisions on individual popular words. The problem is not multi-user collision on the record being edited but on the location data associated with the keyword being indexed. If you look at the Rubicon Showcase project and stare at the Words table for a few minutes, you'll probably see what I mean. There is a BLOB storing location data for each word, so if two users modify text memos and both of them use the word 'surrogate' then there will be contention on updating the BLOB relating to 'surrogate.' In the recipe sample system, there would be contention when two users modified a recipe using an 'egg'. Collisions become likely if there are enough users or the text entries are frequently similar. Therefore this is not an ideal use-case for Rubicon unless there are multiple seconds between edits, perhaps because the majority of users are in readonly reporting mode.

    You can still do it by cycling between two indices: one which is used for searches but is a little delayed in knowing everything, and one which is regenerated on a schedule and swapped into active use. Whether this is practical comes down to the database size, hardware spec, DBMS speed, and how necessary it is to have data available a microsecond after it has been typed.

    Ok, that was a long answer! Cheers.

    -- A huge THANK YOU to everyone who has filled in our survey and obtained their discounts already. Ann Lynnworth HREF Tools Corp. https://www.href.com
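For readers who prefer code to prose, the Make versus Append distinction described above can be sketched with a toy in-memory word-to-location index. This is not Rubicon's implementation or API. The class name `ToyIndex`, its methods, and the sample recipes are all made up for illustration, and a Python set stands in for the per-word BLOB.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical word -> location-id index (a stand-in for a WORDS table)."""

    def __init__(self):
        self.words = defaultdict(set)  # word -> set of integer record ids
        self.last_id = 0               # highest record id indexed so far

    def make(self, records):
        """Complete re-index: throw everything away and rebuild."""
        self.words.clear()
        self.last_id = 0
        self.append(records)

    def append(self, records):
        """Index only the records added since the last round of indexing."""
        for rec_id, text in sorted(records.items()):
            if rec_id <= self.last_id:
                continue  # already indexed
            for word in tokenize(text):
                self.words[word].add(rec_id)
            self.last_id = rec_id

    def search(self, word):
        """Return the sorted record ids whose text contains the word."""
        return sorted(self.words.get(word.lower(), ()))


recipes = {1: "Scrambled egg on toast", 2: "Tomato soup"}
index = ToyIndex()
index.make(recipes)                 # scenario 2: full rebuild
recipes[3] = "Egg fried rice"
index.append(recipes)               # scenario 1: pick up only the new text
print(index.search("egg"))
```

Note that both record 1 and record 3 land in the same entry for "egg". In a shared database, that single per-word entry is where two simultaneous editors would collide, which is the contention described in scenario 3.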
  10. Thanks Edwin. I worked with DISQLite a little some years ago. Lajos and I will do some comparisons over the next couple of weeks. I do see one feature difference, and that is: Rubicon does not currently support RegEx MATCH within the search expression. I don't suppose anyone has accurate usage statistics on ORMs for Delphi? Is there any way to quantify how popular mORMot is, or how it is trending?
  11. Ann Lynnworth

    My new project : WebView4Delphi

    You are right! It was just a matter of passing a more recent useragent string.

    // TCefApplication with CEF4
    GlobalCEFApp.UserAgent :=
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0';
  12. Edwin - would you point me at a sample project that compiles in Delphi and uses the SQLite FTS5 extension? It seems one has to compile the support in, somewhere, somehow. (I have been reading https://www.sqlite.org/fts5.html) Is that support included in mORMot? I have a sample open-source recipe database in SQLite with 173,000 records. I would be interested to do a speed and ease-of-use comparison. My recipe .db3 is included with the install of Rubicon Lite. If for any reason you want to run the install locally and compare on your own hardware, you are more than welcome to use it. Link to download Rubicon Lite
  13. I haven't seen how to reply to individual posts so here are 2 replies in 1.

    Edwin - thank you. We will look closer at the SQLite FTS details. Do you personally use SQLite FTS exclusively with text stored in SQLite? Can one use the SQLite FTS against, say, data in Oracle or PostgreSQL? I have put mORMot on the shortlist to look at closely next year.

    Dany - I'm not sure I am catching your question. With Rubicon, the indexing can be done on a SQL query with a WHERE clause or on a SQL View if the DBMS supports that. Using a view makes it easy to put in a complex WHERE clause, and importantly lets subsequent selections occur against the view. Let's say you indexed 10 fields with Rubicon but then wanted to only return results if the matches were in 2 of those fields. You could do some filtering before returning results, but performance will suffer. What you get raw out of Rubicon is a list of the matching location identifiers. You can put that list into a WHERE clause. Example:

    select * from customer where (CustomerNo in ( 3, 5, 12 ) )

    Using the in .... clause gives the full set of matches, after which you'd have to filter. By the time you do a LIKE clause to filter, it's a waste of the index against the 10 fields. But it would work.

    select * from customer where (CustomerNo in ( 3, 5, 12 ) ) and ( City LIKE '%bedf%' ) and ( State = 'MA' )

    This doesn't sound to me like an ideal use of Rubicon - but I might not have understood you.
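As a rough illustration of the "put the list of identifiers into a WHERE clause" pattern, here is a small sketch using Python's built-in sqlite3 module with an in-memory table. The `customer` rows, the `fetch_matches` helper, and the hard-coded id list are all invented for the example. In practice the ids would come from the full text search, and the query would run through whatever data access layer you already use.

```python
import sqlite3

# Hypothetical sample data, only to make the queries runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (CustomerNo INTEGER PRIMARY KEY, City TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(3, "Bedford", "MA"), (5, "New Bedford", "MA"),
     (12, "Bedford", "NH"), (20, "Boston", "MA")],
)


def fetch_matches(conn, ids, city_fragment=None, state=None):
    """Fetch rows for the given ids, optionally filtered further."""
    if not ids:
        return []  # an empty IN () list is not valid SQL
    # One bound parameter per id, instead of pasting values into the SQL text.
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT CustomerNo, City, State FROM customer "
           f"WHERE CustomerNo IN ({placeholders})")
    params = list(ids)
    if city_fragment is not None:
        sql += " AND City LIKE ?"
        params.append(f"%{city_fragment}%")
    if state is not None:
        sql += " AND State = ?"
        params.append(state)
    sql += " ORDER BY CustomerNo"
    return conn.execute(sql, params).fetchall()


matching_ids = [3, 5, 12]  # stand-in for ids returned by a full text search
all_rows = fetch_matches(conn, matching_ids)
filtered = fetch_matches(conn, matching_ids, city_fragment="bedf", state="MA")
print(all_rows)
print(filtered)
```

The second call shows the extra LIKE and equality filters applied after the id list has already narrowed the rows, which is the "it would work, but it wastes the index" case described above.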
  14. Thanks for the questions, Edwin. I'll answer what I can:

    3. Earlier versions of Rubicon supported earlier versions of Delphi, back to Delphi 2.0 from around the year 1996. If necessary, I can pull installers and/or source from the archive. If you buy a license, you get a free option to have direct access to the SVN repo. Sometimes that is the easiest way to use the older codebase.

    2. There is no direct built-in support for SQLite at present. (Which query object do you use?) Rubicon sits on top of a range of database access components and we have been focusing on FireDAC as a way of being as open as possible to the widest range of developers. Speed with FireDAC and SQLite still seems very fast to us. If you buy the full-source license, nothing stops you from cloning one of the existing bridges, and we can talk that through with you if you wish.

    1. This is a guess. I have not used SQLite's FTS myself. Rubicon lets you index across multiple fields, even multiple tables, so that if there are related records that should all bring the user to "one place" based on an integer primary key, you can do that. The knowledge of where a word is located is not tied to a single field; it's tied to a location identifier, which is usually a PK of a record in an important table.

    Rubicon also supports a range of searching syntax. I tried to demonstrate it in the showcase. There are different modes. As just one example, you can use a NEAR keyword and find one word within __ words of another, e.g. Delphi near SQLite. There's support for AND, OR, ( ) grouping and phrases. There are events relating to building the index, so that you can filter out noise in a custom way, and events related to the searching. Rubicon also supports ranking of the match results, such that records that match the keyword more often are considered more important and can be presented to the user at the top of their list.

    You might know the site www.codenewsfast.com which we have sponsored for many years. It's not perfect; it needs to start importing from Discourse databases, not just NNTP and Yahoo groups. Anyway, the database there uses FirebirdSQL v2.5 and it contains about 15 GB of text. You can try it to get a sense of the syntax yourself. The index on that system is done with NexusDB. Rubicon indexes can be done with a different DBMS than the text itself.

    -Ann
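To give a feel for what NEAR matching and frequency ranking mean, here is a toy sketch over plain strings. It does not use Rubicon's search syntax or engine. The functions `near` and `rank`, the word distance rule, and the sample posts are assumptions made up for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, first, second, distance):
    """True if `first` and `second` occur within `distance` words of each other."""
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(a - b) <= distance for a in first_pos for b in second_pos)


def rank(documents, keyword):
    """Return ids of matching documents, most occurrences of keyword first."""
    hits = []
    for doc_id, text in documents.items():
        count = Counter(tokenize(text))[keyword.lower()]
        if count > 0:
            hits.append((count, doc_id))
    # Sort by descending count, then by id so ties come out in a stable order.
    return [doc_id for count, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


posts = {
    1: "Delphi talks to SQLite through FireDAC",
    2: "SQLite tips: SQLite indexes and SQLite pragmas",
    3: "Delphi generics, with a long digression before any mention of SQLite",
}

near_ids = [doc_id for doc_id, text in posts.items()
            if near(text, "Delphi", "SQLite", 3)]
ranked_ids = rank(posts, "sqlite")
print(near_ids, ranked_ids)
```

A real index would answer both questions from stored word positions rather than rescanning the text, which is where the speed comes from.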
  15. Ann Lynnworth

    My new project : WebView4Delphi

    Great. Thank you. Have you tested going to drive.google.com by any chance? Google is no longer allowing the custom browser that I last compiled with CEF4. I have been meaning to recompile to get around the problem. Google lets me into gmail and they let me download items from drive when the notices come into gmail, but direct use of google drive is blocked. They detect my CEF4 browser as an old version of "Firefox" now.