Deconstructing XML Context from SQL Server


Recently I have had a requirement to deconstruct XML in SQL Server and present fragments of that XML to users as field values.  This can be achieved relatively easy using a combination of XQuery/XPath against XML data that stored in fields within an SQL Server table.  We’ll start by looking at some simple cases and then look at more complex XML considerations.

Simple Examples

In my simplest example, I’ll be using a snapshot of XML from books (you can find the full version here) and I have included what I am using below for clarity.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
 </catalog>

In our first example, we assume that the XML is included as a chunk in a field and therefore, we can define an XML column with the following SQL.

 
;with mydataset as 
(
	select 1 as row_id, @my_xml  as xml_content
	union all select 2 as row_id, @my_xml  as xml_content
)

Select *
from mydataset 

You’ll notice that the query results look like the following.  Clicking on (a cell) from the xml_content field will open up the XML in a new window.

SNAGHTML485b942

There is no big deal here but, suppose I now want to add to my projection (ie the select) based on the data in the XML fragment, for example, the first books author.  I can use an XPath pattern to navigate to the node (tag) that I am interested in and display the nodes text.  My SQL (selecting from mydataset) would now become;

Select *
, mydataset.xml_content.query('/catalog/book[1]/author/text()') as first_book_author 
from mydataset 

There are really two parts to the field I added.  Firstly we use the .query method to parse an XQuery against the XML column.  You can read more about the query method here and XPath here (actually the XPath link includes some good examples for XPath).

If your not too familiar with XQuery or XPath, there are several things to note in the query that I’ve used (ie the ‘/catalog/book[1]/author/text()’.

First, it follows the document structure and you can select nodes by using the delimited structure starting at the root node.  The first / refers to the root and then we define the nodes of the document that we want to see which are delimited by name (for example a collection of book nodes is defined by /catalog/book/).

Secondly, I can refer to a particular node by specifying its index. This is done with the square brackets after the book node (ie book[1] refers to the first book node – the book with the ID=”bk101”).

Finally, I can extract the nodes value (text) by using the text() function. That’s the text between the the nodes opening and closing tag.

Of course, the return type is XML so if I wanted to convert it to a text type I would have to cast.

cast(mydataset.xml_content.query('/catalog/book[1]/author/text()') as nvarchar(255)) as first_book_author 

Adding Namespaces

For something a little more interesting (and perhaps more practical), I recently needed to extract the some dataset information from an SSRS Server (actually, it was the query for datasets).  Datasets and other information are included in the report server database Catalog table.  Its an interesting table (perhaps one may argue that its not entirely relational) because it includes different types of objects however content data (the XML relating to an object) is stored in an Image format so the first step is to convert it to XML.

Since the table contains different types of data (for different SSRS objects) we can restrict the table by the Type field but for now lets not worry about it. I’ve converted the Content field to XML the the query below. You can see the XML (based on the root nodes value) in the image below

image

<SharedDataSet xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" xmlns="http://schemas.microsoft.com/sqlserver/reporting/2010/01/shareddatasetdefinition">
  <DataSet Name="">
    <Query>
      <DataSourceReference>my_connection_name</DataSourceReference>
      <CommandText>my_sqlcommand</CommandText>
    </Query>
    <Fields>
      <Field Name="field_1">
        <DataField>fiedl_1</DataField>
        <rd:TypeName>System.Int32</rd:TypeName>
      </Field>
      <Field Name="field_2">
        <DataField>field_2</DataField>
        <rd:TypeName>System.String</rd:TypeName>
      </Field>
    </Fields>
  </DataSet>
</SharedDataSet>

You may think that the use of an XPATH expression ‘/SharedDataSet/DataSet/Query/CommandText/text() would define the nodes text (for the dataset).  Unfortunately, the query (the field dataset_query in the output) is blank – Try it for yourself.

select 
*
, ssrs_data.my_xml.query('/SharedDataSet/DataSet/Query/CommandText/text()') as dataset_query  

from 
(
	select
		Content  
		, Convert(xml, (CONVERT(varbinary(max), Content))) as my_xml 
	from [dbo].[Catalog]

) as ssrs_data
[/Code]&lt;/p&gt; &lt;p&gt;If we revisit the sample for the XML above, we can wee that it includes namespaces (note the use of xmlns in the root node). Namespaces avoid conflicts in node names so that the same tag (node) could have several meanings within the XML snippet. There are 2 namespaces in our example above, one for rd (the first) and a default value (the second one).&lt;/p&gt; &lt;p&gt;You can also see how the rd namespace is applied in the XML through the tag &lt;rd:TypeName&gt;. If there is no namespace used in XML, we can assume that its the default.&lt;/p&gt; &lt;p&gt;The problem for us is that we have to define the namespaces in our query so our XPath expression can recognize the namespace in the document and we use &lt;a href="https://msdn.microsoft.com/en-AU/library/ms177607.aspx"&gt;NAMESPACES&lt;/a&gt; syntax for this. This CTE type table defines namespaces and there preference in the document. I could be cheeky and define the default namespace (since I am only working with tags in that space) and that’s what i have done below.&lt;/p&gt; &lt;p&gt;
 
with 
XMLNAMESPACES (	
	DEFAULT N'http://schemas.microsoft.com/sqlserver/reporting/2010/01/shareddatasetdefinition'
			)

select 
*
, ssrs_data.my_xml.query('/SharedDataSet/DataSet/Query/CommandText/text()') as dataset_query  

from 
(
	select
		Content  
		, Convert(xml, (CONVERT(varbinary(max), Content))) as my_xml 
	from [dbo].[Catalog]

) as ssrs_data

However, suppose I want to include the namespace in my query (say for example the first field type), my query now becomes.

with 
XMLNAMESPACES (	
	DEFAULT N'http://schemas.microsoft.com/sqlserver/reporting/2010/01/shareddatasetdefinition', 
	N'http://schemas.microsoft.com/SQLServer/reporting/reportdesigner' as rd
			)

select 
*
, ssrs_data.my_xml.query('/SharedDataSet/DataSet/Query/CommandText/text()') as dataset_query  
, ssrs_data.my_xml.query('/SharedDataSet/DataSet/Fields/Field[1]/rd:TypeName/text()') as first_field_type 
, ssrs_data.my_xml.query('count(/SharedDataSet/DataSet/Fields/Field)') as field_count 
from 
(
	select
		Content  
		, Convert(xml, (CONVERT(varbinary(max), Content))) as my_xml 
	from [dbo].[Catalog]

) as ssrs_data


Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s