I have a dream that one day all applications being developed will be database-agnostic. I have a dream that one day I will build a system that will be able to use every database engine I will want. And other developer will do that too. I have a dream that one day if a company decides to reduce database licensing cost, it will just switch their ERP, CRM and other software to a cheaper or even free database engine.
I have a dream today!1
Is it just a dream or is it actually possible? Maybe there are such database-agnostic applications already created?
What does database-agnostic software mean?
Agnosticism (from Ancient Greek – without knowledge) is the view that some claims about the supernatural existence are unknown or even are impossible to verify whether they are true or not.
Using the agnosticism term in the software engineering is extending the meaning of the term but it makes some sense to me. A term database-agnostic application is used when it is talked about an application which uses a database but it does not know and has no way (or is not interested in knowing) what database engine it is. For example, such system could be an online shop written in Java and JavaScript. Obviously, it would need a database to store products, orders, transactions etc., but it could use any database engine for that purpose. By any, I mean ANY, whatever database engine you could think of: SQL Server, Oracle, MySQL, PostgreSQL, NoSQL database engines. Ok, let’s make this topic a little bit easier so I write off “NoSQL”. Let’s focus on relational database engines.
The below diagram presents one from the most common architectures of a web application (like an online shop) from the high level view.
The front-end is written in JavaScript which is executed by a user’s browser on the user’s laptop. It sends requests to the backend written in Java executed on the server. Java backend does not persist data by itself so it queries a MySQL database for it. The MySQL database responds with data, the Java backend processes it and sends it to the JavaScript frontend. The client’s browser presents a web page with the data to the user.
Now, imagine that the same online shop software is distributed to multiple companies and installed around the world. One company uses MySQL as a database engine, another one because of high traffic wants to replace MySQL with Oracle. Another one has a great deal with Microsoft and wants to use SQL Server instead. If the vendor of this software says that is fine, you can use whatever database engine you want, it is a database-agnostic application.
Why is it so catchy?
Last week I published my thoughts about the modular architecture concept in the software engineering. The database-agnostic concept is an extension to the modular architecture with replaceable components. I must say, it is even more encouraging. If you have ever developed software that was installed by your clients on their hardware in their datacenters, I am sure you noticed that they have different preferences about database engines. Some of them are not enthusiastic about paying additional licensing fees for SQL Server. Some do not have MySQL administrators but have a great licensing deal with Oracle. Some of them have only servers with Windows and prefer SQL Server. Choosing the right vendor at the beginning of the project is a huge decision. Of course, the system could be built and tested for a few different database engines but that would increase development costs, right? What if you could build software once as a database-agnostic and let your customers install it with any database engine they want?
Can you now feel the power of the concept?
How software can be database-agnostic?
I will focus on technology that I have at least some knowledge about – Java.
Let’s think for a second how an application can be written to not be tied to any specific database engine. Some ideas are:
Move as much logic as possible to the application code
Implementing business logic in the database usually requires using a procedural SQL language which is very specific to a database engine. SQL Server has T-SQL, Oracle has PL/SQL. T-SQL allows nested transactions, returning result sets directly to the client, using table-valued functions. PL/SQL has autonomous transactions, parametrized views, packages. They are so different that there is no chance to write procedures, functions, triggers code that will compile on those two database engines not even mentioning all other. Giving up and implementing all business logic in the application code is a must.
Avoid vendor-specific features
Even if the logic is implemented in the application, the database will still have to process SQL queries because it is the only widely used method of managing data in relational world. The queries will be send by the application so they should not contain any SQL Server-specific, Oracle-specific or any other vendor-specific features. SQL standard should be used instead. The latest ISO standard of SQL is SQL:2011. Unfortunately, there is no single database engine that fully supports even SQL:2011 Core. For example the latest Oracle 12c has multiple gaps. For example it does not distinguish a zero-length VARCHAR string from NULL, it provides MINUS table operator instead of EXCEPT DISTINCT. I have not found a documentation SQL Server being compliant to ISO SQL standard but a couple quick examples: || as string concatenation is not supported,ORDER BY NULLS LAST
does not work.
Any standard makes sense when all or almost all vendors are compliant. In relational databases world, there is no such standard for me. I cannot write a SQL query and be sure that it will not crash on any RDBMS.
Use ORM instead of writing SQL queries
As I cannot write queries that will fit all database engines, I need to come to terms with it. I can use additional layer that will make my code compliant. Fortunately, in Java world, there is already ORM technique. It allows me to operate on objects in Java and let some other library (like Hibernate) deal with the database engines differences. In practice, I define how objects in Java are mapped to tables in a database and I can simply use the objects. The library takes care of building SQL queries for a specific database engine.
The idea is brilliant, but … Is there a but? Oh, there is. Imagine you created a table called HISTORICAL_REJECTED_TRANSACTIONS on SQL Server, you wrote some code around it, used Hibernate for SQL queries generation and tested it successfully. Later the SQL Server is replaced with Oracle. Then, you realize Oracle does not allow you to create that table because its name exceeds 30 characters length. I know, it is not that difficult to not use long names, but I need to know it and remember about it. Is it the only thing I need to remember when writing portable code? No, for sure not. But I do not know what I do not know so I cannot create a complete list of such always-keep-in-mind items.
So can I build database-agnostic software or not?
I deeply believe that currently it is impossible to write a big application that can be easily switched between all relational database engines. Of course, it is possible to choose just a few engines. Develop software using constructions common to the chosen engines, avoiding vendor-specific features, using ORM technique and testing it with all those engines. Whenever an incompatibility is found, a workaround is developed for the specific case. Obviously, it is not free. There is a cost difference between testing software with one database or with two, but it still may sum up from the selling point of view. That would work, but the list of supported database engines will be closed, unlike saying “this is database-agnostic software”.
1 This is a paraphrase of the famous Martin Luther King Jr. speech.