RaelB 4 Posted July 18, 2023 Hi, I am needing to store a large number of small images, e.g. 10,000+ images (so could be 50,000 or more..), where each image is around 30kb. Generally I will not be iterating through all of them. Main operations are check if exists, load, and save. I'm considering to use the filesystem for this purpose, as apposed to using a database e.g. sqlite. Is there any downside to using the filesystem in this scenario? Thanks Share this post Link to post
angusj 126 Posted July 18, 2023 2 hours ago, RaelB said: Is there any downside to using the filesystem in this scenario? If you're only searching by filename or file type (*.jpg;*.png; etc) then the filesystem would suffice. Share this post Link to post
Anders Melander 1815 Posted July 18, 2023 2 hours ago, angusj said: If you're only searching by filename or file type (*.jpg;*.png; etc) then the filesystem would suffice. Why would that make a difference? The whole directory/tree would need to be scanned anyway, even if he's only interested in the image files. [edit] Oh, I guess you mean select files based on the filename as opposed to the file content. I would probably go for an in-memory database (e.g. SQLite or even just a TFDMemTable) for the index but store the files on disk (but don't put all files in a single folder - that will suck). That said, if you have 50K files averaging 30Kb each, that's only 1,5 Gb total (+ overhead) so you could actually fit the whole thing in memory. Share this post Link to post
weirdo12 21 Posted July 19, 2023 Have a look at SQLite as Anders suggested: https://sqlite.org/appfileformat.html 1 Share this post Link to post
Der schöne Günther 316 Posted July 19, 2023 If you have no trouble identifying every item by filename alone, I'd personally go with the raw filesystem. Reason is that you can easily purge older files, add or remove content without going through your application which eventually have to be updated. Not sure about the type of app you're building, but sticking with raw files has often helped me so that a technician on site was easily able to "remove all garbage data from scanner 2 between 13:00 and 13:15" without me adding something like this to the application or editing a big database file. Also you can easily corrupt your sqlite database file with sudden power losses and stuff like this. If you set it up correctly then it doesn't happen, but I think the default settings of FireDAC have the journalling behaviour set up so that it is not robust against power losses. It's a mistake I have unfortunately made more than once. I think NTFS (or at least, Windows Explorer) starts having trouble after around 10.000 files in the same directory. You should definitely have some subfolders that also allow you to group/sort things easier. Like a folder for every new day or something like this. 1 Share this post Link to post
Markus Kinzler 174 Posted July 19, 2023 Do you need to access to the image files besides your application? When not I would suggest to store the files in an database. In aspect of security and movebility. Share this post Link to post
Fr0sT.Brutal 900 Posted July 19, 2023 Files. The simpler - the better. Probably DB for index but you'll have to keep it in sync with filesystem Share this post Link to post
Zoran Bonuš 12 Posted July 19, 2023 If you already use database of other data, I would go with the images in DB. Another form.of storage means also another thing to take care of backups, accessibility, security, etc. Share this post Link to post
Angus Robertson 577 Posted July 19, 2023 I would store them in files, but not thousands of small files, a few larger files, with a database indexing their location in those large files. At least assuming they are not randomly being deleted. Easier and faster to back up. Angus Share this post Link to post
Die Holländer 49 Posted July 19, 2023 MS-SQL database has a special implementation "FileTables" to handle the file-system in combination with the database. Share this post Link to post
Brian Evans 109 Posted July 19, 2023 Myself I would use the file system over SQLite. Store extra information with the images as either image metadata or in a file beside the image file. This lets you regenerate the database from the files as needed. Less involved to do incremental backups. A small SQLite database file + changed image files vs a large SQLite database file with images in it. Share this post Link to post
mytbo 5 Posted July 19, 2023 20 hours ago, RaelB said: I'm considering to use the filesystem for this purpose, as apposed to using a database e.g. sqlite. With your requirements I would prefer a SQLite database. There is an article with sample source code on this topic in the DP forum (German part). Here is the translation into English with Google Translator. The result is not perfect, also some formatting is destroyed, but it is readable. With best regards Thomas Share this post Link to post
RaelB 4 Posted July 19, 2023 Thanks for all the replies. It has helped me consider all the various factors to keep in mind. @Der schöne Günther: Which setting/setup are you referring to with regards to sqlite db corruption and power losses? Share this post Link to post
David Schwartz 430 Posted July 19, 2023 (edited) I'd run a test by creating a folder with 50,000 files in it with random names and a dozen different images just to take up space. have the names in a list and just make random requests and see what your performance is like. Personally, I'd hash the names and then set up subfolders based on the first letter and split them up that way. They'd be much small collections and I suspect accessing them would be faster than dumping them all into one folder. Keep in mind that the file system is itself a form of database. It typically organizes things in the order they're added. It has index files that can accelerate lookups, but if the names are all similar, like abcd##### where the ##### part is a 5-digit code, they may all be in alphabetical order anyway (assuming the number is incremented each time one is added.) If they names are hashed, then they're going to be fairly random. Linux is generally faster then Windows for large folders. But I'd hate to have to open a WIndows file explorer for the folder with all of these files in it! Edited July 19, 2023 by David Schwartz Share this post Link to post
Der schöne Günther 316 Posted July 20, 2023 10 hours ago, RaelB said: Which setting/setup are you referring to with regards to sqlite db corruption and power losses? I don't have at hand how you had to set up the TFDConnection (and possibly the Sqlite bridge) with the FireDAC components, but here's a starting point: How To Corrupt An SQLite Database File from Sqlite.org itself. Also, see https://docwiki.embarcadero.com/RADStudio/en/Using_SQLite_with_FireDAC, it's surprisingly helpful: Quote By default, all SQLite driver settings are set for the high-performance single connection access to a database in a stable environment (...) Set Synchronous to Full to protect the DB from the committed data losses. Share this post Link to post
Blavatsky 0 Posted August 6, 2023 (edited) Somebody from Embarcadero kindly wrote me an app to do that. I put about 900 images (jpeg ) Edited August 6, 2023 by Blavatsky Share this post Link to post