Jump to content
RaelB

Store a large number of images in the filesystem or in a DB?

Recommended Posts

Hi,

I am needing to store a large number of small images, e.g. 10,000+ images (so could be 50,000 or more..), where each image is around 30kb. Generally I will not be iterating through all of them. Main operations are check if exists, load, and save.

I'm considering to use the filesystem for this purpose, as apposed to using a database e.g. sqlite.

Is there any downside to using the filesystem in this scenario?

Thanks

Share this post


Link to post
2 hours ago, RaelB said:

Is there any downside to using the filesystem in this scenario?

If you're only searching by filename or file type (*.jpg;*.png; etc) then the filesystem would suffice.

Share this post


Link to post
2 hours ago, angusj said:

If you're only searching by filename or file type (*.jpg;*.png; etc) then the filesystem would suffice.

Why would that make a difference? The whole directory/tree would need to be scanned anyway, even if he's only interested in the image files.

[edit] Oh, I guess you mean select files based on the filename as opposed to the file content.

 

I would probably go for an in-memory database (e.g. SQLite or even just a TFDMemTable) for the index but store the files on disk (but don't put all files in a single folder - that will suck). That said, if you have 50K files averaging 30Kb each, that's only 1,5 Gb total (+ overhead) so you could actually fit the whole thing in memory.

Share this post


Link to post

If you have no trouble identifying every item by filename alone, I'd personally go with the raw filesystem. Reason is that you can easily purge older files, add or remove content without going through your application which eventually have to be updated. Not sure about the type of app you're building, but sticking with raw files has often helped me so that a technician on site was easily able to "remove all garbage data from scanner 2 between 13:00 and 13:15" without me adding something like this to the application or editing a big database file.

 

Also you can easily corrupt your sqlite database file with sudden power losses and stuff like this. If you set it up correctly then it doesn't happen, but I think the default settings of FireDAC have the journalling behaviour set up so that it is not robust against power losses. It's a mistake I have unfortunately made more than once.

 

I think NTFS (or at least, Windows Explorer) starts having trouble after around 10.000 files in the same directory. You should definitely have some subfolders that also allow you to group/sort things easier. Like a folder for every new day or something like this.

  • Thanks 1

Share this post


Link to post

Do you need to access to the image files besides your application? When not I would suggest to store the files in an database. In aspect of security and movebility.

Share this post


Link to post

Files. The simpler - the better. Probably DB for index but you'll have to keep it in sync with filesystem

Share this post


Link to post

If you already use database of other data, I would go with the images in DB. Another form.of storage means also another thing to take care of backups, accessibility, security, etc.

Share this post


Link to post

I would store them in files, but not thousands of small files, a few larger files, with a database indexing their location in those large files.  At least assuming they are not randomly being deleted.  Easier and faster to back up. 

 

Angus

 

 

 

Share this post


Link to post

Myself I would use the file system over SQLite. Store extra information with the images as either image metadata or in a file beside the image file. This lets you regenerate the database from the files as needed. 

 

Less involved to do incremental backups. A small SQLite database file + changed image files vs a large SQLite database file with images in it. 

Share this post


Link to post
20 hours ago, RaelB said:

I'm considering to use the filesystem for this purpose, as apposed to using a database e.g. sqlite.

With your requirements I would prefer a SQLite database. There is an article with sample source code on this topic in the DP forum (German part). Here is the translation into English with Google Translator. The result is not perfect, also some formatting is destroyed, but it is readable.


With best regards
Thomas

Share this post


Link to post

Thanks for all the replies. It has helped me consider all the various factors to keep in mind.

 

@Der schöne Günther: Which setting/setup are you referring to with regards to sqlite db corruption and power losses?

Share this post


Link to post

I'd run a test by creating a folder with 50,000 files in it with random names and a dozen different images just to take up space. have the names in a list and just make random requests and see what your performance is like.

 

Personally, I'd hash the names and then set up subfolders based on the first letter and split them up that way. They'd be much small collections and I suspect accessing them would be faster than dumping them all into one folder. 

 

Keep in mind that the file system is itself a form of database. It typically organizes things in the order they're added. It has index files that can accelerate lookups, but if the names are all similar, like abcd##### where the ##### part is a 5-digit code, they may all be in alphabetical order anyway (assuming the number is incremented each time one is added.) If they names are hashed, then they're going to be fairly random.

 

Linux is generally faster then Windows for large folders. But I'd hate to have to open a WIndows file explorer for the folder with all of these files in it!

 

 

Edited by David Schwartz

Share this post


Link to post
10 hours ago, RaelB said:

Which setting/setup are you referring to with regards to sqlite db corruption and power losses?

I don't have at hand how you had to set up the TFDConnection (and possibly the Sqlite bridge) with the FireDAC components, but here's a starting point:

How To Corrupt An SQLite Database File from Sqlite.org itself.

 

Also, see https://docwiki.embarcadero.com/RADStudio/en/Using_SQLite_with_FireDAC, it's surprisingly helpful:

Quote

By default, all SQLite driver settings are set for the high-performance single connection access to a database in a stable environment (...) Set Synchronous to Full to protect the DB from the committed data losses.

 

Share this post


Link to post

Somebody from Embarcadero kindly wrote me an app to do that.
I put about 900 images (jpeg )

printscreen.jpg

Edited by Blavatsky

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×