Alberto Fornés Posted September 6, 2019 Hello, I am in the process of organizing how I save information from an estimates/orders system. Normally one thinks of a master table (customer, total quantities, discount, taxes, etc.) and one or two detail tables with the items that make up the order (code, quantity, price, discounts, etc.). In this case the information that makes up the detail is quite complex, depends on many variables, and is likely to be revised frequently. That's why, instead of defining a table with a multitude of fields, I was thinking of saving all the information in a JSON document. My question is what experiences or drawbacks you can share and recommend about this. In my case the database is Firebird and I would save the information in a BLOB field; I know there are other databases better prepared to handle JSON fields, but that is a change I prefer to avoid if I can. I imagine that querying will be a weak point of working with JSON, but maybe I can solve it with support tables containing certain indexed fields.
Rollo62 Posted September 6, 2019 (edited) I use a JSON field for a similar approach, where I need to save the current state of a table document as a safety backup. Since table entries may change over time, and building a history-safe table structure is somewhat brittle, I decided to store a copy of the current document (with its main table entries) as a JSON backup. No matter how the database structure and the data change over time, I have valid backups whenever I release a document, so I can always find the state that belongs to a specific ID and date. This works fine, but of course there is the lack of search. Since the current tables reflect the momentary situation, I do not really need to search inside the JSON; only when I need to restore old settings do I read out the JSON. There may be different versions of the parameters stored in the JSON over time, which I can adjust manually if needed. I am not sure it is a good idea to use such a field as the main operating field; I guess that has too many drawbacks (speed, no search, hard to isolate concurrent edits by multiple users, etc.), especially since you wrote that these fields need to change frequently. But if the field is just an add-on to the normal table structure for a special use, as in my case, it can work fine. Maybe a clever mix of DB table fields and JSON fields can keep the best of both worlds, but this depends heavily on your project. Edited September 6, 2019 by Rollo62
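The snapshot-on-release idea above can be sketched in a few lines. This is an illustrative Python/SQLite sketch, not Rollo62's actual code; the table names (`orders`, `order_snapshots`) and the `release_order` helper are hypothetical, and SQLite stands in for whatever RDBMS holds the live tables:

```python
import json
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
con.execute("CREATE TABLE order_snapshots (order_id INTEGER, released_on TEXT, snapshot TEXT)")
con.execute("INSERT INTO orders VALUES (1, 'ACME', 1200.0)")

def release_order(con, order_id):
    # On release, serialize the current row state to JSON and keep it as a
    # backup keyed by ID and date, independent of future schema changes.
    cur = con.execute("SELECT * FROM orders WHERE id = ?", (order_id,))
    cols = [d[0] for d in cur.description]
    snapshot = json.dumps(dict(zip(cols, cur.fetchone())))
    con.execute("INSERT INTO order_snapshots VALUES (?, ?, ?)",
                (order_id, date.today().isoformat(), snapshot))
    return snapshot

release_order(con, 1)
```

Because the column names are captured together with the values, the snapshot stays readable even after the live schema changes, which is exactly the point of the backup.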
Arnaud Bouchez Posted September 6, 2019 For a regular SQL DB, I would use a BLOB SUB_TYPE TEXT field to store the JSON content, then duplicate some JSON fields in dedicated stand-alone SQL fields (e.g. an "id" or "name" JSON field into regular table.id or table.name fields in addition to the table.json main field) for easy querying, with an index if speed is mandatory. So, start with JSON only (plus a key field), then add some dedicated SQL fields for easier querying if necessary. Side note: I would rather use SQLite3 or PostgreSQL for efficiently storing JSON content; both have efficient built-in support. Or a JSON-native database like MongoDB.
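The duplicated-field pattern can be shown concretely. This is a minimal Python/SQLite sketch (SQLite's `TEXT` standing in for Firebird's `BLOB SUB_TYPE TEXT`); the table layout and the `customer` field chosen for duplication are illustrative assumptions:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# "json" holds the full document; "customer" duplicates one JSON field as a
# plain, indexed column so queries never need to open the document itself.
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT,
    json TEXT)""")
con.execute("CREATE INDEX idx_orders_customer ON orders(customer)")

doc = {"customer": "ACME", "items": [{"code": "A1", "qty": 3, "price": 9.5}]}
con.execute("INSERT INTO orders (customer, json) VALUES (?, ?)",
            (doc["customer"], json.dumps(doc)))

# Filter by the indexed column, then parse the JSON only for matched rows.
row = con.execute("SELECT json FROM orders WHERE customer = ?", ("ACME",)).fetchone()
first_code = json.loads(row[0])["items"][0]["code"]  # "A1"
```

The write path must keep the duplicated column in sync with the document; doing both in one INSERT (or one transaction) is the simplest way to guarantee that.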
aehimself Posted September 8, 2019 Based on experience I am against this type of data storage, for several reasons:
- Some database engines cannot search in BLOB fields. It might not be an issue in the beginning, but it makes investigating data corruption and debugging processing issues a living nightmare.
- Downloading BLOB fields is slow, no matter what, especially over the network. In our production databases I can select hundreds of thousands of rows with 150+ columns, but only a fraction of that when selecting just 2 fields: an ID and a BLOB storing an XML.
- Database engines are robust; they were built for having columns. While it can be hard and intimidating to see the relations at first, they have no issues handling a larger number of columns.
- Unless the DB engine has built-in JSON support, you will not be able to query and compare a specific value on the database side; you will always have to download and compare in code, increasing the memory, network and CPU usage of your program. Given a well-fed metal (our slang for a server with enough resources plus overhead) running the database, this will always be significantly slower than letting the DB engine take care of it.
The only reasonable case I can see for storing JSON / XML in a database is when you receive it from an external source (like a REST server or a 3rd party sending your software some information) and you want to keep those chunks for auditing purposes. Sooner or later you'll face one of the issues I mentioned, though 🙂 We did the same about 8 years ago (only with XMLs) and now we are looking into opportunities to undo it...
Guest Posted September 8, 2019 This is a question of needs and content. I would think of it like this: I have chunks of BLOBs. There are a lot of things that can be done with a VARCHAR(32768) column that cannot be done with a BLOB column, and vice versa. The trade-offs and benefits of BLOBs vs. VARCHARs, or BLOBs vs. stored links, are nothing new; this is old knowledge, google "RDBMS normalisation". Your needs should dictate whether you normalise the JSON chunk and store it using the RDBMS idiom, or simply put it away for the future in a BLOB. IMHO this has nothing to do with the content of the BLOB: nothing to do with JSON or XML or DOCX or any other b(inary) format. You could write a Windows service / Linux cron job that massages all new JSON BLOBs according to your specific application needs and "explodes" the content into support table structures. That way neither the RDBMS nor any middle layer you write gets bogged down by the processing. Think "Full Text Search" and best-practice transaction handling. Using this approach, if something really basic changes in the general structure of all your stored JSON BLOBs, you could clear all the "indexed" flags and restart the process, something like a reset or clean-up. The question of WHERE to store the actual B(inary)LOBs (on disk, or in a table BLOB) has more to do with your strategic needs (how your system will be hosted, and so on) than with the actual format. JSON is text, but using FB's parsing functions to do things server-side may prove a bit tricky in traditional PSQL. FB 3, if I am not mistaken, can execute external functions instead of vanilla PSQL, and if that is manageable in your environment you could perhaps do some magic (i.e. link Delphi's JSON libs into externally compiled functions, similar to UDFs, in your schema). But personally I have not had the need myself, and I am a bit reluctant to have my RDBMS call into the OS.
I know this is common in MSSQL and PostgreSQL, which have had this capability for longer than FB. HTH, /D
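The "explode the BLOBs in a background job" idea above can be sketched as one pass of such a job. This is a hypothetical Python/SQLite illustration (the thread's context is a Firebird service, not Python); the `raw_docs` / `doc_items` tables and the `indexed` flag are invented names for the pattern:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_docs (id INTEGER PRIMARY KEY, json TEXT, indexed INTEGER DEFAULT 0)")
con.execute("CREATE TABLE doc_items (doc_id INTEGER, code TEXT, qty INTEGER)")
con.execute("CREATE INDEX idx_items_code ON doc_items(code)")

con.execute("INSERT INTO raw_docs (json) VALUES (?)",
            (json.dumps({"items": [{"code": "A1", "qty": 3}, {"code": "B2", "qty": 1}]}),))

def explode_new_docs(con):
    # One pass of the background job: take unprocessed JSON blobs, flatten
    # their items into an indexed support table, then flag them as done.
    for doc_id, blob in con.execute("SELECT id, json FROM raw_docs WHERE indexed = 0"):
        for item in json.loads(blob)["items"]:
            con.execute("INSERT INTO doc_items VALUES (?, ?, ?)",
                        (doc_id, item["code"], item["qty"]))
        con.execute("UPDATE raw_docs SET indexed = 1 WHERE id = ?", (doc_id,))
    con.commit()

explode_new_docs(con)
```

The reset the poster mentions is then just `UPDATE raw_docs SET indexed = 0` plus clearing the support table, after which the job rebuilds everything from the stored JSON.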
Thijs van Dien Posted September 25, 2019 (edited) I would suggest at least not going the JSON route. If you don't want to go fully relational, and don't have too much nesting, you could use something like the Entity-Attribute-Value (EAV) pattern. Done correctly, that is still quite workable with an RDBMS like Firebird while already offering greater flexibility (with which also comes greater responsibility). It could strike a nice balance. Edited September 25, 2019 by Thijs van Dien
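For readers unfamiliar with EAV: it stores one row per (entity, attribute) pair instead of one column per attribute, so new attributes need no schema change. A minimal Python/SQLite sketch of the idea (table and attribute names are illustrative, not from the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Entity-Attribute-Value: each order attribute becomes its own row,
# so adding a new attribute is an INSERT, not an ALTER TABLE.
con.execute("""CREATE TABLE order_attrs (
    order_id INTEGER,
    attr TEXT,
    value TEXT,
    PRIMARY KEY (order_id, attr))""")

attrs = {"customer": "ACME", "discount": "0.05", "carrier": "DHL"}
con.executemany("INSERT INTO order_attrs VALUES (1, ?, ?)", attrs.items())

# Plain SQL can still filter on any attribute...
row = con.execute("""SELECT order_id FROM order_attrs
                     WHERE attr = 'discount' AND value = '0.05'""").fetchone()
# ...but every value is stored as text, so type checking falls on the caller;
# that is the "greater responsibility" that comes with the flexibility.
```

Queries that need several attributes of one entity require self-joins or pivoting, which is where EAV gets expensive; that is the trade-off against both fixed columns and a JSON blob.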
Arnaud Bouchez Posted September 30, 2019 (edited) On 9/8/2019 at 9:11 AM, aehimself said: (quote of the post above) @aehimself I do not have the same negative experience in practice. In fact, your last point sheds light on the previous ones: "DB engine has built-in JSON support" is what is needed. I have stored JSON in DBs with great success in several projects.
Especially in a DDD project, we use our ORM to store the "Aggregate" object, with all its associated data objects and arrays serialized as JSON, plus some indexed fields for quick queries. If the architecture starts from the DB, which is so 80s, using JSON doesn't make sense. But if the architecture starts from the code, which is what is recommended in this century, with new options like NoSQL and ORMs running on high-end hardware, then you may have to revisit your design choices. Normalizing data and starting from the DB is perfectly fine, for instance if you do RAD or have a well-known data layout (e.g. accounting), but for a more exploratory project it will become more difficult to maintain and evolve.
"Some database engines can not search in BLOB fields. Now it might not be an issue in the beginning but it makes investigating data corruption / debugging processing issues a living nightmare" -> This is why I propose to duplicate the searched values in their own indexed fields, and only search the JSON once the rows have been narrowed down; and to use TEXT (BLOB SUB_TYPE TEXT) fields, not binary BLOB fields.
"Downloading blob fields are slow, no matter what" -> True if you download it, but if the JSON processing is done on the server itself, using the JSON functions offered e.g. by PostgreSQL or SQLite3, it is efficient (once you pre-query your rows using some dedicated indexed fields).
"they have no issues handling a larger number of columns" -> The main idea of using JSON is not to have a fixed set of columns, but to be able to store anything, with no pre-known schema, and with several levels of nested objects or arrays. Edited September 30, 2019 by Arnaud Bouchez
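The "pre-query on an indexed field, then let the engine's JSON functions do the rest server-side" combination can be shown with SQLite's built-in JSON support, which this post cites. A Python sketch, assuming a SQLite build with the JSON1 functions available (the default in modern builds); the schema is the same hypothetical `orders` layout used earlier in the thread's examples:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, json TEXT)")
con.execute("CREATE INDEX idx_customer ON orders(customer)")

doc = {"customer": "ACME", "total": 1200.0, "items": [{"code": "A1"}]}
con.execute("INSERT INTO orders (customer, json) VALUES (?, ?)",
            (doc["customer"], json.dumps(doc)))

# Pre-filter on the indexed column, then let the engine's JSON function
# (SQLite's json_extract here) drill into the document server-side,
# so the full blob never has to travel to the client.
total = con.execute("""SELECT json_extract(json, '$.total')
                       FROM orders WHERE customer = ?""", ("ACME",)).fetchone()[0]
```

PostgreSQL offers the same shape of query with its `->`/`->>` operators on `jsonb` columns; on Firebird, which lacks built-in JSON functions, this is the part that has to move into client code or support tables.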
aehimself Posted September 30, 2019 @Arnaud Bouchez I guess it all comes down to personal preference and possibility. There are cases when you are not allowed to choose the DB engine for your application, and cases when your application has to support some or most of them. My situation was exactly like this: someone decided to go with ONE DB engine only, and the whole software was built to use the engine-specific features. You don't necessarily have to agree with the design choices, but you can only cook from what you have. As for preference, I like strict typing more, though I can see the pros and opportunities of weak typing (JSON is still the best for REST-like channels and a local application settings store). I'd rather add a column to a database than verify before each data access whether the value exists, whether it is a known (and accepted) type, and whether it holds the information I need. Especially if the user has the freedom or possibility to access and change these objects (in my view, if the software runs on their systems, they do). If you have the freedom and do not expect your database to grow too large, go for it. All I'm saying is that I already burned myself with this, so it quickly became a no-go in my future designs.