XML has become a very popular document format. It has many advantages: it is flexible, self describing and easily portable. It seems suitable for exchanging data between systems written in different programming languages. In this article, I will present how to generate XML documents from relational data retrieved from SQL queries. SQL Server provides a great feature to achieve that - FOR XML. Of course, as the same data can be presented in many ways, there are various options of FOR XML. This article is the first part of a guide through those possibilities offered by SQL Server 2014.
Environment
For purpose of this demonstration, I use a database with tables presented on the diagram:
There are four accounts grouped into three categories (gold, silver and regular members). The accounts contain some transactions which are different types (payment, credit, void).
Here is a query that joins this data together:
SELECT *
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
Now, lets turn this tabular result set into an XML document/fragment.
FOR XML RAW
Attribute-centric XML
An easy way to build an XML fragment is to use FOR XML RAW option. Below is a sample query that has only two changes from the above SQL query: reduced number of columns to only those that I need and the FOR XML RAW clause at the end.
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
FOR XML RAW
<row agr_name="gold members" acc_id="1" acc_name="Account 1" trans_id="1" ttype_name="credit" trans_value="10000.0000" />
<row agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="2" ttype_name="credit" trans_value="3.9500" />
<row agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="3" ttype_name="payment" trans_value="-1.1200" />
<row agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="4" ttype_name="credit" trans_value="21.0000" />
<row agr_name="regular members" acc_id="2" acc_name="Account 2" trans_id="5" ttype_name="payment" trans_value="-103.0000" />
As you can see above, FOR XML RAW in the basic version generates an XML fragment with a single element called row for each row from SQL result set.
Naming an element row does not make much sense. Fortunately, you can give it a more meaningful name by adding a parameter to the RAW option - RAW('transaction')
:
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
FOR XML RAW('transaction')
<transaction agr_name="gold members" acc_id="1" acc_name="Account 1" trans_id="1" ttype_name="credit" trans_value="10000.0000" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="2" ttype_name="credit" trans_value="3.9500" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="3" ttype_name="payment" trans_value="-1.1200" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="4" ttype_name="credit" trans_value="21.0000" />
<transaction agr_name="regular members" acc_id="2" acc_name="Account 2" trans_id="5" ttype_name="payment" trans_value="-103.0000" />
Both above examples are XML fragments not XML documents. Why? The difference between those two is simple - an XML document has to have a single root element, an XML fragment does not. The previous examples contains 5 root elements each so they are XML fragments. If you want to produce an XML document, the ROOT directive can be used as in the following query:
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
FOR XML RAW('transaction'), ROOT('transactions')
<transactions>
<transaction agr_name="gold members" acc_id="1" acc_name="Account 1" trans_id="1" ttype_name="credit" trans_value="10000.0000" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="2" ttype_name="credit" trans_value="3.9500" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="3" ttype_name="payment" trans_value="-1.1200" />
<transaction agr_name="regular members" acc_id="3" acc_name="Account 3" trans_id="4" ttype_name="credit" trans_value="21.0000" />
<transaction agr_name="regular members" acc_id="2" acc_name="Account 2" trans_id="5" ttype_name="payment" trans_value="-103.0000" />
</transactions>
The five elements are now wrapped with one root element named transactions.
All examples presented so far are attribute-centric XML fragments/documents. It means that data is stored in XML attributes. Because of that, it was possible to have each row that consisted of six columns be presented as a single, flat element in an XML fragment/document.
Element-centric XML
Generating attribute-centric XMLs is not the only option. FOR XML RAW can also generate an element-centric XML document when the ELEMENTS directive is added:
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
FOR XML RAW('transaction'), ROOT('transactions'), ELEMENTS
<transactions>
<transaction>
<agr_name>gold members</agr_name>
<acc_id>1</acc_id>
<acc_name>Account 1</acc_name>
<trans_id>1</trans_id>
<ttype_name>credit</ttype_name>
<trans_value>10000.0000</trans_value>
</transaction>
<transaction>
<agr_name>regular members</agr_name>
<acc_id>3</acc_id>
<acc_name>Account 3</acc_name>
<trans_id>2</trans_id>
<ttype_name>credit</ttype_name>
<trans_value>3.9500</trans_value>
</transaction>
<transaction>
<agr_name>regular members</agr_name>
<acc_id>3</acc_id>
<acc_name>Account 3</acc_name>
<trans_id>3</trans_id>
<ttype_name>payment</ttype_name>
<trans_value>-1.1200</trans_value>
</transaction>
<transaction>
<agr_name>regular members</agr_name>
<acc_id>3</acc_id>
<acc_name>Account 3</acc_name>
<trans_id>4</trans_id>
<ttype_name>credit</ttype_name>
<trans_value>21.0000</trans_value>
</transaction>
<transaction>
<agr_name>regular members</agr_name>
<acc_id>2</acc_id>
<acc_name>Account 2</acc_name>
<trans_id>5</trans_id>
<ttype_name>payment</ttype_name>
<trans_value>-103.0000</trans_value>
</transaction>
</transactions>
Now, data is presented not as attributes but as XML elements. It is still an XML document because it contains a root element - transactions. The first level of nesting has transaction elements. There are five of them as there are five rows returned by the SQL query. Each column data has its own XML element in the second level of nesting.
In some sense the result still looks like a table - a result set of the SQL query. There are still: a table (transactions root element), rows (transaction elements) and columns (leave elements). It might not be easy to tell how many accounts a particular account group has just by a quick looking at the document. It is not easy as in case of the SQL query result. The document can build much better than that.
FOR XML AUTO
What is different from FOR XML RAW?
The FOR XML AUTO option moves generating XML documents to a new level. See the basic example below. Notice that I have added the ORDER BY clause. It is important in case of the FOR XML AUTO option as the order of rows can significantly change nesting of the elements in the XML document - rows that should be grouped together on some level, should be close each other in the SQL query result. In my case, the highest level is an account group so I sort by agr_name in the first place. Then there are accounts in those groups so I use acc_id as the second sorting column. Transactions are deepest elements of the nesting hierarchy so I do not need to sort them, but I want to - to keep them chronological, so I also use trans_id for sorting.
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML AUTO
<account_groups agr_name="gold members">
<accounts acc_id="1" acc_name="Account 1">
<transactions trans_id="1" trans_value="10000.0000">
<transaction_types ttype_name="credit" />
</transactions>
</accounts>
</account_groups>
<account_groups agr_name="regular members">
<accounts acc_id="2" acc_name="Account 2">
<transactions trans_id="5" trans_value="-103.0000">
<transaction_types ttype_name="payment" />
</transactions>
</accounts>
<accounts acc_id="3" acc_name="Account 3">
<transactions trans_id="2" trans_value="3.9500">
<transaction_types ttype_name="credit" />
</transactions>
<transactions trans_id="3" trans_value="-1.1200">
<transaction_types ttype_name="payment" />
</transactions>
<transactions trans_id="4" trans_value="21.0000">
<transaction_types ttype_name="credit" />
</transactions>
</accounts>
</account_groups>
There are two account_groups elements. Each group has child elements which are accounts. The accounts have transactions. The transactions have transaction_types. The biggest difference that distinguish FOR XML AUTO from FOR XML RAW is that the first one presents data in a more smart way while the second one just converts rows into XML elements. The result of the FOR XML AUTO option looks more human-friendly. There are more levels, I have all transactions on one account grouped together under the same node.
Even the new result looks better, not all I like about it. It is an XML fragment again because it has two account_groups instead of a single root element. But I want to have an XML document. It can be easily changed in the same way as in the case of FOR XML RAW - by adding ROOT('transactions')
directive. I may not like using attributes to store data so can switch to the element-centric XML in the already-known way - the ELEMENTS directive.
SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML AUTO, ROOT('transactions'), ELEMENTS
<transactions>
<account_groups>
<agr_name>gold members</agr_name>
<accounts>
<acc_id>1</acc_id>
<acc_name>Account 1</acc_name>
<transactions>
<trans_id>1</trans_id>
<trans_value>10000.0000</trans_value>
<transaction_types>
<ttype_name>credit</ttype_name>
</transaction_types>
</transactions>
</accounts>
</account_groups>
<account_groups>
<agr_name>regular members</agr_name>
<accounts>
<acc_id>2</acc_id>
<acc_name>Account 2</acc_name>
<transactions>
<trans_id>5</trans_id>
<trans_value>-103.0000</trans_value>
<transaction_types>
<ttype_name>payment</ttype_name>
</transaction_types>
</transactions>
</accounts>
<accounts>
<acc_id>3</acc_id>
<acc_name>Account 3</acc_name>
<transactions>
<trans_id>2</trans_id>
<trans_value>3.9500</trans_value>
<transaction_types>
<ttype_name>credit</ttype_name>
</transaction_types>
</transactions>
<transactions>
<trans_id>3</trans_id>
<trans_value>-1.1200</trans_value>
<transaction_types>
<ttype_name>payment</ttype_name>
</transaction_types>
</transactions>
<transactions>
<trans_id>4</trans_id>
<trans_value>21.0000</trans_value>
<transaction_types>
<ttype_name>credit</ttype_name>
</transaction_types>
</transactions>
</accounts>
</account_groups>
</transactions>
As expected, all attributes are converted into elements. It is already an XML document and it starts looking pretty cool already but I am not going to stop enhancing it.
Renaming element names
A thing that still disturbs me is the naming of the elements. A node that describes a transaction is named transactions which is plural in spite of a singular form. The same about accounts and account_groups. Moreover, the names that looked good in the SQL query result, now seem too technical. I mean the elements that come directly from table columns - trans_id, trans_value, acc_id etc. Fortunately, fixing this is, again, simple. The names directly come from the names of tables and columns from the SQL query. To change them, I can just give them aliases in the query. See the result below:
SELECT agr_name AS name, acc_id AS id, acc_name AS name, trans_id AS id, ttype_name AS type, trans_value AS value
FROM transactions [transaction]
JOIN transaction_types [type] ON trans_ttype_id = ttype_id
JOIN accounts [account] ON acc_id = trans_acc_id
JOIN account_groups [group] ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML AUTO, ROOT('transactions'), ELEMENTS
<transactions>
<group>
<name>gold members</name>
<account>
<id>1</id>
<name>Account 1</name>
<transaction>
<id>1</id>
<value>10000.0000</value>
<type>
<type>credit</type>
</type>
</transaction>
</account>
</group>
<group>
<name>regular members</name>
<account>
<id>2</id>
<name>Account 2</name>
<transaction>
<id>5</id>
<value>-103.0000</value>
<type>
<type>payment</type>
</type>
</transaction>
</account>
<account>
<id>3</id>
<name>Account 3</name>
<transaction>
<id>2</id>
<value>3.9500</value>
<type>
<type>credit</type>
</type>
</transaction>
<transaction>
<id>3</id>
<value>-1.1200</value>
<type>
<type>payment</type>
</type>
</transaction>
<transaction>
<id>4</id>
<value>21.0000</value>
<type>
<type>credit</type>
</type>
</transaction>
</account>
</group>
</transactions>
The element names are not changed and make more sense to me.
I hope you now believe me that SQL Server is powerful when it comes to generating XML documents and ... this is not all ... Our journey through XML world of SQL Server is not finished yet.
Although, the final XML document looks pretty good, I am still not absolutely happy with the result. There are some elements that I would like to have in attributes while keeping the rest of them as elements. There are quite easy ways to achieve exactly what I need and even more.
You can read about further customization of XML fragments/documents and generating an XSD schema in my next article.