aehimself

Dynamic creation of SQL queries

Recommended Posts

Hi all,

 

I have an application where I would like the user to be able to "build" what data he wants to see. As I cannot expect the user to be an expert in SQL, I need an interface where he can just check (or enter) the fields; my program will then parse that, generate the necessary SQL query (with proper joins and such), execute it and return the data.

I wrote a small program to "design" the database of the application - it generates the SQL scripts and the code for my custom table-as-an-object implementation - so I have access to all the field and relational information and can output it in any format needed.

 

I already have a prototype which can judge which field in which table to select, and a crawler (a path finder, basically) which can determine the shortest route between two tables. They seem to be working fine, but there's still a lot of work left (optimization, beautifying - the usual) which I am soooooo unwilling to do at the moment.

 

So my question is - do I really have to reinvent the wheel? Is there a framework available which can do this for me?

 

And yes, I really hate turning working prototypes into production-ready implementations 🙂


Hi...:classic_cool:

Quote

I have an application where I would like the user to be able to "build" what data he wants to see. As I cannot expect the user to be an expert in SQL

Imho it is not a good idea to allow the end user to generate statements - especially with input coming straight from a TDEDIT. :classic_ninja:

SQL injection: https://en.wikipedia.org/wiki/SQL_injection
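
As a made-up illustration (the Users table here is hypothetical): if the text of an edit box is concatenated straight into the statement, entering  '; DROP TABLE Users; --  turns

SELECT * FROM Users WHERE Name = '<user input>'

into

SELECT * FROM Users WHERE Name = ''; DROP TABLE Users; --'

which executes the DROP as a second statement and comments out the leftover quote.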

 

On 1/15/2020 at 8:23 AM, haentschman said:

Hi...:classic_cool:

Imho it is not a good idea to allow the end user to generate statements - especially with input coming straight from a TDEDIT. :classic_ninja:

SQL injection: https://en.wikipedia.org/wiki/SQL_injection

 

Good point, but the user is not allowed to write SQL queries at all 🙂 The mechanism I'm working on / looking for lets the user select (enter) field names, maps those to real fields in the database (an attempted SQL injection will fail at this level, as '; DROP TABLE Users; will not map to any field in the database) and, based on the mapped fields, constructs a SELECT query with proper joins.
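
To make that concrete, here is a minimal sketch of what a generated statement could look like - the table and field names are made up, and user-entered values are assumed to be passed as bound parameters rather than concatenated into the SQL text:

SELECT Customers.Name, Orders.Total
  FROM Customers
  INNER JOIN Orders ON Orders.CustomerID = Customers.ID
 WHERE Customers.Country = :Country

Anything the user types that does not map to a known field is rejected before a statement is ever built.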

 

Maybe I'm not viewing this scenario from all aspects, but I don't see the possibility of a vulnerability here.


@aehimself Just a thought. You may need to support "as" naming to handle duplicate field names in table joins or IsNull code. 

select aStrField, IsNull(aIntField, 0) as aIntField from aTable

Without the as - the aIntField will not be named in the result set - at least not for MS SQL.
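
For example (made-up tables, following the naming style above), when both sides of a join have a Name column, aliases keep the result set unambiguous:

select a.Name as aTableName, b.Name as bTableName
  from aTable a
  join bTable b on b.aTableID = a.ID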

 

Then there is the question of count, max, min, and so forth...

Once you open for custom queries - you basically open a can of feature requests 😉

1 hour ago, Lars Fosdal said:

@aehimself Just a thought. You may need to support "as" naming to handle duplicate field names in table joins or IsNull code. 


select aStrField, IsNull(aIntField, 0) as aIntField from aTable

Without the as - the aIntField will not be named in the result set - at least not for MS SQL.

 

Then there is the question of count, max, min, and so forth...

Once you open for custom queries - you basically open a can of feature requests 😉

I'm not too worried about duplicate field names, as I'm planning to generate the query with TABLENAME.FIELDNAME every time (hence the mapping), and if the output shows MyField and MyField_1 in the column header... then so be it.

As for IsNull, you might be right - though not during data display (my database component handles null values as 0, which is perfect in my case), but in WHERE clauses... I recall having an issue where MSSQL did not return the correct rows because it handled NULL and 0 (or '', I don't remember exactly) differently. We will see how exactly it turns out, especially with the "can" you are talking about 🙂
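
A quick sketch of that WHERE issue, with made-up table and column names - in SQL a comparison against NULL yields UNKNOWN, so a plain equality silently drops the NULL rows:

-- does NOT return rows where Discount is NULL
select * from Orders where Discount = 0

-- both of these do
select * from Orders where IsNull(Discount, 0) = 0
select * from Orders where Discount = 0 or Discount is null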

 

As for external components, I think I will have to gather my strength and finalize my own solution. After all, it's less than 400 lines (only the field mapping and the crawler so far, though) and does not increase the budget. The good news is that I'm either doing this or implementing the requested XML documentation comments. And I'd rather code than write documentation 😄


I have designed and written several systems like that. Some were for use by the end user (those who don't understand SQL or data relationships) and some were for the internal use of an application (to avoid hard-coding queries). My experience is that it never worked like it was supposed to. It might be that my designs were flawed, but it's not like I haven't tried everything I could think of.

 

It is a much more complex problem than what your description suggests. I actually know of a company that was run into the ground trying to solve it (well, there were other factors but this was the one that broke their back).

 

The last system I implemented consisted of four major components:

  • A data dictionary. Contains the metadata: tables, columns, relationships, etc.
  • A graph. Basically a weighted undirected graph, using an adjacency list, that represents all tables and their relationships.
  • A resolver. Given a start and an end node, finds the cheapest path through the graph using Dijkstra's algorithm.
  • A query builder. Uses the input criteria (where), the output list (select) and the path through the graph (joins) to generate SQL; a sketch of the kind of statement it emits follows below.
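
A sketch of that output, with made-up table names, assuming the resolver produced the path Customers -> Orders -> OrderLines:

SELECT Customers.Name, OrderLines.Quantity
  FROM Customers
  JOIN Orders     ON Orders.CustomerID  = Customers.ID
  JOIN OrderLines ON OrderLines.OrderID = Orders.ID
 WHERE Customers.Country = :Country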

This works great for simple queries, small datasets or simple databases, but once you throw a real-world, large and complex database at it, it falls apart for several reasons. Here are just a few of them:

  • It is not enough to find the shortest/cheapest path through the graph.
    Given multiple possible paths, the solution should optimally pass through as many of the "where"-tables as possible.
    It's not obvious which nodes to choose as the start and end node, and the choice will affect the result.
  • The criteria often need to be qualified/scoped or the result will not be what the user expected.
    For example, if the user asks the car database for "car.name" where "country.name=China", expecting a list of Chinese cars, they will not understand why they receive a list of European cars assembled in China, because the resolver took an unexpected path through the graph (compare the two queries sketched after this list).
  • There are many other cases where a choice that affects the result has to be made, but the user hasn't supplied the information required to make it.
    In many situations the desired type of join cannot be inferred from the cardinality of a relation.
    The users often don't understand cardinality and most often don't even know when they will need to resolve an ambiguity with additional criteria.
    The query builder doesn't know how it should resolve a many-to-many relation, or whether it should resolve it at all.
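
To illustrate the car example above, here are two equally valid resolutions of "car.name" where "country.name=China" over a made-up schema - they answer different questions:

-- path via the manufacturer: cars of Chinese brands
SELECT car.name
  FROM car
  JOIN manufacturer ON manufacturer.id = car.manufacturer_id
  JOIN country      ON country.id = manufacturer.country_id
 WHERE country.name = 'China'

-- path via the assembly plant: cars assembled in China
SELECT car.name
  FROM car
  JOIN plant   ON plant.id = car.assembly_plant_id
  JOIN country ON country.id = plant.country_id
 WHERE country.name = 'China'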

 

My recommendation is that you use a traditional query builder instead and concentrate your efforts on the documentation you mentioned.

Edited by Anders Melander


@Anders Melander The good thing is that my prototype consists of exactly the same four parts, which makes me believe I'm not drifting too far away from the original goal 🙂

My database structure is not large - it consists of 10 or 11 tables, with only one circular reference so far. It wouldn't even be impossible to hard-code (or generate to source, that is) the shortest possible paths and all possible queries, but I am really against that kind of solution. I was planning to test the idea on a large database (~1500 tables, multiple circular references, standalone tables, etc.), but I need to extract the fields and connections first.

As for the concerns - yes, I have them in mind. Especially the crawler: while it works well in my scenario, it's unoptimized and I worry about its performance on more complex structures. That's something which will only become apparent in the long run.

Scoping is already fully supported. I don't have many fields with the same name, but yes, I already identified it as a possible source of problems. I also have a theory for a mechanism that analyzes and learns the more common requests so future ones can come closer to what the user wanted - but this is strictly a theory for now, and I won't even start working on it until I decide whether to go with my own solution or to purchase a component that already does this.

 

And thank you very much for your post, it was tremendously informative!

