Generating XML from SQL - part 2

xmlThis time, I present the most advanced way of generating XML documents from relational data in SQL Server - the FOR XML PATH option. It is substantially more flexible than the options described last week (FOR XML RAW, FOR XML AUTO). It allows explicitly indicating which columns should be converted to XML attributes and which should be XML elements. Additionally, I will show you how to create an XML schema that defines a structure for documents generated using any of the FOR XML options.

The previous article about generating XML documents in SQL Server described two basic options: FOR XML RAW and FOR XML AUTO. Their advantage over the method that I am going to use today is easiness of using them. Unfortunately, not all combinations of attributes and elements are possible to achieve - you can choose that all data from columns goes to XML elements or all data goes to XML attributes. FOR XML PATH lets you combine them.

In this article I continue extending the example from the previous article so if you have not read it yet, I encourage you to do so.

 

Full flexibility with FOR XML PATH

This is how a basic query looks like:

SELECT agr_name AS [name], acc_id AS [id], acc_name AS [name], trans_id AS [id], ttype_name AS [type], trans_value AS [value]
FROM transactions [transaction]
JOIN transaction_types [type] ON trans_ttype_id = ttype_id
JOIN accounts [account] ON acc_id = trans_acc_id
JOIN account_groups [group] ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML PATH('group'), ROOT('transactions')

<transactions>
    <group>
        <name>gold members</name>
        <id>1</id>
        <name>Account 1</name>
        <id>1</id>
        <type>credit</type>
        <value>10000.0000</value>
    </group>
    <group>
        <name>regular members</name>
        <id>2</id>
        <name>Account 2</name>
        <id>5</id>
        <type>payment</type>
        <value>-103.0000</value>
    </group>
    <group>
        <name>regular members</name>
        <id>3</id>
        <name>Account 3</name>
        <id>2</id>
        <type>credit</type>
        <value>3.9500</value>
    </group>
    ...
</transactions>

As you can see, there is no nesting representing transactions and accounts. There is only the top level transactions element and transactions nodes representing rows. It is far from what I need.

FOR XML PATH allows driving a process of building XML documents using a naming convention. To indicate nesting, the slash character (/) is used. In this case, all data that belongs to accounts should go under account element. To achieve that, it is enough to prefix each account column in the SQL query with account/ text. It means, that two columns: acc_id and acc_name get new aliases: account/id and account/name. Going forward, I need to do something with the other columns. I want the transaction columns to be wrapped with a transaction node and to go inside the account elements. Prefixing trans_id and trans_value columns with account/transaction/ will do the job. So far, account and transaction columns have been taken care of but there are two more - agr_name and ttype_name. What about them? The first one belongs to the account group and I want it to stay on top so it is left without a prefix. Although the second one comes from the transaction_types table so it could be automatic to create the third level of nesting with an account/transaction/type prefix, for me it is only a dictionary to resolve enigmatic trans_ttype_id and I think that more natural is to put it on the account/transaction level.

The complete query with all the above changes is presented below:

SELECT agr_name AS [name], acc_id AS [account/id], acc_name AS [account/name], trans_id AS [account/transaction/id], ttype_name AS [account/transaction/type], trans_value AS [account/transaction/value]
FROM transactions [transaction]
JOIN transaction_types [type] ON trans_ttype_id = ttype_id
JOIN accounts [account] ON acc_id = trans_acc_id
JOIN account_groups [group] ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML PATH('group'), ROOT('transactions')

<transactions>
    <group>
        <name>gold members</name>
        <account>
            <id>1</id>
            <name>Account 1</name>
            <transaction>
                <id>1</id>
                <type>credit</type>
                <value>10000.0000</value>
            </transaction>
        </account>
    </group>
    <group>
        <name>regular members</name>
        <account>
            <id>2</id>
            <name>Account 2</name>
            <transaction>
                <id>5</id>
                <type>payment</type>
                <value>-103.0000</value>
            </transaction>
        </account>
    </group>
    <group>
        <name>regular members</name>
        <account>
            <id>3</id>
            <name>Account 3</name>
            <transaction>
                <id>2</id>
                <type>credit</type>
                <value>3.9500</value>
            </transaction>
        </account>
    </group>
  ...
</transactions>

Yippee! The nesting is proper now.

As I asserted at the end of the previous article, I was not happy about having all data as XML elements. For example, I prefer IDs of accounts and transactions to become attributes while keeping all other as elements.

FOR XML PATH provides an easy way to do this. FOR XML uses columns naming convention in a SQL query to allow you to indicate your choice for each column. If a column should be an attribute, its alias should start with the @ character. If a column should be an element, its alias should not start with the @ character. Isn't it simple? I change two aliases: account/id to account/@id and account/transaction/id to account/transaction/@id. Please take a look at the below example:

SELECT agr_name AS [name], acc_id AS [account/@id], acc_name AS [account/name], trans_id AS [account/transaction/@id], ttype_name AS [account/transaction/type], trans_value AS [account/transaction/value]
FROM transactions [transaction]
JOIN transaction_types [type] ON trans_ttype_id = ttype_id
JOIN accounts [account] ON acc_id = trans_acc_id
JOIN account_groups [group] ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML PATH('group'), ROOT('transactions')

<transactions>
    <group>
        <name>gold members</name>
        <account id="1">
            <name>Account 1</name>
            <transaction id="1">
                <type>credit</type>
                <value>10000.0000</value>
            </transaction>
        </account>
    </group>
    <group>
        <name>regular members</name>
        <account id="2">
            <name>Account 2</name>
            <transaction id="5">
                <type>payment</type>
                <value>-103.0000</value>
            </transaction>
        </account>
    </group>
    <group>
        <name>regular members</name>
        <account id="3">
            <name>Account 3</name>
            <transaction id="2">
                <type>credit</type>
                <value>3.9500</value>
            </transaction>
        </account>
    </group>
    ..
</transactions>

The nesting is correct, IDs are XML attributes, other data is presented in XML elements. I am pretty much done with polishing the XML document.

 

Producing XML schema

An XML document can stay flexible and semi-structured, giving a freedom to whoever creates it to modify its structure or you can enforce a specific structure (nesting, naming, data constraints) that you and your system can understand. Such rigor can be formalized by an XML schema. An XML schema can be created manually, but as SQL Server can produce XML documents by itself, what could stop it from producing XML schema for those documents? The answer is - nothing.

An XML schema is produced by using the XMLSCHEMA directive. If it is added to the SQL query, both: an XML schema and an XML document are returned. To avoid this, I add WHERE 1 = 2 to the query. It prevents returning data so only a schema is generated.

SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
WHERE 1 = 2
ORDER BY agr_name, acc_id, trans_id
FOR XML AUTO, ELEMENTS, XMLSCHEMA('transactions')

<xsd:schema targetNamespace="transactionsxmlns:schema="transactionsxmlns:xsd="http://www.w3.org/2001/XMLSchemaxmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypeselementFormDefault="qualified">
  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypesschemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="account_groups">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="agr_nameminOccurs="0">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:varcharsqltypes:localeId="1033sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
              <xsd:maxLength value="255" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element ref="schema:accountsminOccurs="0maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="accounts">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="acc_idtype="sqltypes:int" />
        <xsd:element name="acc_name"minOccurs="0">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:varcharsqltypes:localeId="1033sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
              <xsd:maxLength value="255" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element ref="schema:transactionsminOccurs="0maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="transactions">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="trans_idtype="sqltypes:int" />
        <xsd:element name="trans_valuetype="sqltypes:moneyminOccurs="0" />
        <xsd:element ref="schema:transaction_typesminOccurs="0maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="transaction_types">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ttype_nameminOccurs="0">
          <xsd:simpleType>
            <xsd:restriction base="sqltypes:varcharsqltypes:localeId="1033sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
              <xsd:maxLength value="255" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The above XML schema describes a document produced by the following query:

SELECT agr_name, acc_id, acc_name, trans_id, ttype_name, trans_value
FROM transactions
JOIN transaction_types ON trans_ttype_id = ttype_id
JOIN accounts ON acc_id = trans_acc_id
JOIN account_groups ON agr_id = acc_agr_id
ORDER BY agr_name, acc_id, trans_id
FOR XML AUTO, ELEMENTS

It is great to create an XML schema just by using a proper directive. Then, having one SQL query, both: XML document and its schema can be created without additional effort.

Unfortunately, the XMLSCHEMA directive does not work with FOR XML PATH so you can use it only with FOR XML RAW and FOR XML AUTO.

 

Summary

I hope you took advantage from my two articles about generating XML documents in SQL Server. In my opinion, this feature works great and allows customizing the produced documents so it can fulfill requirements of various systems. FOR XML PATH gives the greatest flexibility but as XMLSCHEMA does not cooperate with it, the first choice should be FOR XML RAW or FOR XML AUTO. FOR XML PATH is the last resort choice but a very powerful option.