Sunday, February 26, 2012

Using Page and CAML Query to Extract SharePoint List in SSIS

 

There are several ways to read SharePoint List as data source in SSIS.  The easiest way is using the SharePoint List Adapter.  You could download it from the Codeplex site.  However, you may want to have more control during extracting the list to avoid failure, to manipulate data before sent to the output rows, or you are experiencing timeout issue with lots of item in the list. 

Here is the solution I used:

Add For Loop Container in Task Component with a Data Flow

  • Since the SharePoint list could be large, use the for loop container to loop through the paging.

image

  • Set InitExpression to the Page number @PageCount = 1
  • Set the EvalExpression to a Variable: @LoopNext  which will be determine in the data flow.
  • Set the AssignExpression to be @PageCount = @PageCount + 1

image 

Create a Script Component  with Source Type to Read SharePoint List

  • In the data flow, add a Script Component with Source type.

image

  • Pass in @LoopNext  and @SharePointPageInfo variables as Read Write Variables.
  • Pass in @SharePointSite, @SharepointListname, @SharePoinbtView, @SharePointRowLimit variables as Read Variables.
  • Create two Outputs. One for the default SharePoint list data set, the other will hold the error data.
  • Click Edit Script… button to add .net code
  • Right click the project to add Server Reference that point to the SharePoint Site

image image

  • In CreateNewOutputRows(), instantiate the SharePoint Web Service List object
  • Create several XMLNode objects as the parameters to call System.Xml.XmlNode GetListItems()
//Retrieve the list of available fields from the GetList call to Sharepoint.
Lists SPS = new Lists();
SPS.UseDefaultCredentials = true;
SPS.Url = strSharepointURL;
 
//Create XML Node
XmlDocument xmlDoc = new System.Xml.XmlDocument();
XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
XmlNode ndQueryOptions = xmlDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "");
XmlNode ndQuery = xmlDoc.CreateNode(XmlNodeType.Element, "Query","");
 
 
XmlNode ndListItems = null;
ndListItems = SPS.GetListItems(strListName, strViewName, ndQuery, ndViewFields, strRowLimit, ndQueryOptions, null);



  • Once the result gets back, iterate through the XML nodes to populate the default output data row.
  • If there is an exception thrown

    • Set the @LoopNext to false which will stop the For Loop container
    • Add a row that contains the exception message to error output.


catch (System.Web.Services.Protocols.SoapException ex)
     {
         string strError = "Message:" + ex.Message + "\nDetail:" + ex.Detail.InnerText + "\nStackTrace:" + ex.StackTrace;
         
         ErrorBuffer.AddRow();
         ErrorBuffer.ErrorData = strError.ToString().Substring(0, strError.ToString().Length > 4000 ? 4000 : strError.ToString().Length);
         bLoopNext = false;
     }



  • In the data flow, sent the SharePoint list items to the destination table and the error data to error table. This way the SSIS package will not fail to execute with any SharePoint Site issue. Other process could utilize the data in the error table to recover it softly.
  • In the Add Metadata component, add the @PageCount as an output column. It could be used for recovery or debug purpose.

image 


Narrow Down the Needed Fields



Don’t select all the fields from the list. Sometime the list may contain binary data, meta data or image that may take long time to load. To narrow down the fields, just specify the fields in the ViewFields XML Node as:



XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
//Get List of Field needed
ndViewFields.InnerXml = "<FieldRef Name='ID' />";
ndViewFields.InnerXml += "<FieldRef Name='Created' />";
ndViewFields.InnerXml += "<FieldRef Name='Author' />";
ndViewFields.InnerXml += "<FieldRef Name='Editor' />";
ndViewFields.InnerXml += "<FieldRef Name='Modified' />";

Add CAML Query to Extract Only the Delta



To pull only the delta records in, just add filters in the query. The SharePoint list has a system field called “Modified” which will be time stamped when the record get changed. You could utilize the CAML query to get the items that have Modified Date greater than a Date value or within an arrange of date. If you don’t know the syntax of the CAML Query, you could use some SharePoint Utility such as U2U CAML Builder.


Here is a sample CAML Query that will filter the result by range of date:



<Where><And>
<Geq><FieldRef Name="Modified" IncludeTimeValue="TRUE" /><Value Type="DateTime">2012-01-01 00:00:00</Value></Geq>
<Leq><FieldRef Name="Modified" IncludeTimeValue="TRUE" /><Value Type="DateTime">2012-01-31 23:59:00</Value></Leq>
</And>
</Where>


Page though Large Number of List Items in the Script



Depend on the size of the item, you may set the row limit to 1000 while calling the GetListItems().  The function supports service-side paging. When the XML result returned, it includes a ListItemCollectionPositionNext attribute that contains the information to support the paging.  You need to save it for the next call.  Make sure you do not modify the string. 


In the example below, I set the ListItemCollectionPositionNext attribute to a local variable first and the bLoopNext to true. I then save the string to a SSIS Read/Write variable to be use for next loop.



//Get page related meta data
XDocument meta = XDocument.Parse(ndListItems.InnerXml);
if (meta.Root.Attribute("ListItemCollectionPositionNext") != null)
    {
       strPage = meta.Root.Attribute("ListItemCollectionPositionNext").Value;
       bLoopNext = true;
    }
else
    {
       strPage = string.Empty;
       bLoopNext = false;
    }

When next loop starts, I set the saved attribute string to the QueryOptions parameter to be used to fetch the next set data back.



//Set page Query option
if (strPage.Length > 0)
    {
       ndQueryOptions.InnerXml = @"<Paging ListItemCollectionPositionNext='' />";
       ndQueryOptions.ChildNodes[0].Attributes["ListItemCollectionPositionNext"].InnerText = strPage;
    }
else
    {
        ndQueryOptions.InnerXml = "";
    }
    

 Result



After executing the SSIS package, you can see the SharePoint List been extracted page by page as below:


image 


The PageCount column stores the SharePoint List page number.


Reference:



http://msmvps.com/blogs/ivansanders/archive/2011/07/24/ssis-sharepoint-list-adapters.aspx


http://msdn.microsoft.com/en-us/library/lists.lists.getlistitems(v=office.12).aspx


http://sqlsrvintegrationsrv.codeplex.com/releases/view/17652

3 comments:

  1. Hello Dude,

    You can use either sharePoint designer or the browser to create and customize SharePoint Web sites. These authored artifacts are stored in the content database. Thanks for sharing it.....

    Data Extract

    ReplyDelete
  2. As they are giving them several proposes to download this from the internet.They wanted to make that sure to be prestigious all along before them.




    Workstation

    ReplyDelete
  3. I am really happy for published this post appear in this site.This site is more valuable to me. I visit this site and read this post.
    Graphics cards

    ReplyDelete