Sunday, February 26, 2012

Using Page and CAML Query to Extract SharePoint List in SSIS

 

There are several ways to read SharePoint List as data source in SSIS.  The easiest way is using the SharePoint List Adapter.  You could download it from the Codeplex site.  However, you may want to have more control during extracting the list to avoid failure, to manipulate data before sent to the output rows, or you are experiencing timeout issue with lots of item in the list. 

Here is the solution I used:

Add For Loop Container in Task Component with a Data Flow

  • Since the SharePoint list could be large, use the for loop container to loop through the paging.

image

  • Set InitExpression to the Page number @PageCount = 1
  • Set the EvalExpression to a Variable: @LoopNext  which will be determine in the data flow.
  • Set the AssignExpression to be @PageCount = @PageCount + 1

image 

Create a Script Component  with Source Type to Read SharePoint List

  • In the data flow, add a Script Component with Source type.

image

  • Pass in @LoopNext  and @SharePointPageInfo variables as Read Write Variables.
  • Pass in @SharePointSite, @SharepointListname, @SharePoinbtView, @SharePointRowLimit variables as Read Variables.
  • Create two Outputs. One for the default SharePoint list data set, the other will hold the error data.
  • Click Edit Script… button to add .net code
  • Right click the project to add Server Reference that point to the SharePoint Site

image image

  • In CreateNewOutputRows(), instantiate the SharePoint Web Service List object
  • Create several XMLNode objects as the parameters to call System.Xml.XmlNode GetListItems()
//Retrieve the list of available fields from the GetList call to Sharepoint.
Lists SPS = new Lists();
SPS.UseDefaultCredentials = true;
SPS.Url = strSharepointURL;
 
//Create XML Node
XmlDocument xmlDoc = new System.Xml.XmlDocument();
XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
XmlNode ndQueryOptions = xmlDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "");
XmlNode ndQuery = xmlDoc.CreateNode(XmlNodeType.Element, "Query","");
 
 
XmlNode ndListItems = null;
ndListItems = SPS.GetListItems(strListName, strViewName, ndQuery, ndViewFields, strRowLimit, ndQueryOptions, null);



  • Once the result gets back, iterate through the XML nodes to populate the default output data row.
  • If there is an exception thrown

    • Set the @LoopNext to false which will stop the For Loop container
    • Add a row that contains the exception message to error output.


catch (System.Web.Services.Protocols.SoapException ex)
     {
         string strError = "Message:" + ex.Message + "\nDetail:" + ex.Detail.InnerText + "\nStackTrace:" + ex.StackTrace;
         
         ErrorBuffer.AddRow();
         ErrorBuffer.ErrorData = strError.ToString().Substring(0, strError.ToString().Length > 4000 ? 4000 : strError.ToString().Length);
         bLoopNext = false;
     }



  • In the data flow, sent the SharePoint list items to the destination table and the error data to error table. This way the SSIS package will not fail to execute with any SharePoint Site issue. Other process could utilize the data in the error table to recover it softly.
  • In the Add Metadata component, add the @PageCount as an output column. It could be used for recovery or debug purpose.

image 


Narrow Down the Needed Fields



Don’t select all the fields from the list. Sometime the list may contain binary data, meta data or image that may take long time to load. To narrow down the fields, just specify the fields in the ViewFields XML Node as:



XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
//Get List of Field needed
ndViewFields.InnerXml = "<FieldRef Name='ID' />";
ndViewFields.InnerXml += "<FieldRef Name='Created' />";
ndViewFields.InnerXml += "<FieldRef Name='Author' />";
ndViewFields.InnerXml += "<FieldRef Name='Editor' />";
ndViewFields.InnerXml += "<FieldRef Name='Modified' />";

Add CAML Query to Extract Only the Delta



To pull only the delta records in, just add filters in the query. The SharePoint list has a system field called “Modified” which will be time stamped when the record get changed. You could utilize the CAML query to get the items that have Modified Date greater than a Date value or within an arrange of date. If you don’t know the syntax of the CAML Query, you could use some SharePoint Utility such as U2U CAML Builder.


Here is a sample CAML Query that will filter the result by range of date:



<Where><And>
<Geq><FieldRef Name="Modified" IncludeTimeValue="TRUE" /><Value Type="DateTime">2012-01-01 00:00:00</Value></Geq>
<Leq><FieldRef Name="Modified" IncludeTimeValue="TRUE" /><Value Type="DateTime">2012-01-31 23:59:00</Value></Leq>
</And>
</Where>


Page though Large Number of List Items in the Script



Depend on the size of the item, you may set the row limit to 1000 while calling the GetListItems().  The function supports service-side paging. When the XML result returned, it includes a ListItemCollectionPositionNext attribute that contains the information to support the paging.  You need to save it for the next call.  Make sure you do not modify the string. 


In the example below, I set the ListItemCollectionPositionNext attribute to a local variable first and the bLoopNext to true. I then save the string to a SSIS Read/Write variable to be use for next loop.



//Get page related meta data
XDocument meta = XDocument.Parse(ndListItems.InnerXml);
if (meta.Root.Attribute("ListItemCollectionPositionNext") != null)
    {
       strPage = meta.Root.Attribute("ListItemCollectionPositionNext").Value;
       bLoopNext = true;
    }
else
    {
       strPage = string.Empty;
       bLoopNext = false;
    }

When next loop starts, I set the saved attribute string to the QueryOptions parameter to be used to fetch the next set data back.



//Set page Query option
if (strPage.Length > 0)
    {
       ndQueryOptions.InnerXml = @"<Paging ListItemCollectionPositionNext='' />";
       ndQueryOptions.ChildNodes[0].Attributes["ListItemCollectionPositionNext"].InnerText = strPage;
    }
else
    {
        ndQueryOptions.InnerXml = "";
    }
    

 Result



After executing the SSIS package, you can see the SharePoint List been extracted page by page as below:


image 


The PageCount column stores the SharePoint List page number.


Reference:



http://msmvps.com/blogs/ivansanders/archive/2011/07/24/ssis-sharepoint-list-adapters.aspx


http://msdn.microsoft.com/en-us/library/lists.lists.getlistitems(v=office.12).aspx


http://sqlsrvintegrationsrv.codeplex.com/releases/view/17652

Thursday, February 16, 2012

Tip to Change SSIS Variable Scope

Recently I work on a SSIS project that has a Sequence Container with lot of variables that I need for a data flow. Later on I want to change the sequence container to be a For Loop Container.   It is easy to copy and paste the the data flow in the container to the other. But, how to move the SSIS variables in the visual studio?

If you are familiar with the SSIS XML code, you may be able to copy or move the variables from one XML Node section to the other without corrupt it.  For others, recreating the variable with correct value or expression could be cumbersome.

Here is a easy way to handle it:

  • Install BIDS Helper from CodePlex site.
  • Select the variable you want to copy or move

image

  • Click on the Move/Copy Variables from SSIS Variables tool bar to open a new window

image

  • Click on the DTS executable that you want the variable scope to change to , then click OK

image

  • The variable shows up in the SSIS Variables Window with new Scope

Wednesday, February 15, 2012

SQL Dump After Either Upgrade or Downgrade the PowerPivot

Recently I upgraded the PowerPivot Excel add on to the version 2 to work with new DAX. I experienced excel crush when I opened the workbook.  The error message showed as:

SQLDUMPER.EXE
Unable to open file \\?\Program Files\Microsoft Analysis Services\AS OLEDB\10\FlightRecorderCurrent.trac error 2

I tried to uninstall and re-install the program, but it didn’t work.  I have to manually delete the directory C:\Program Files\Microsoft Analysis Services\AS OLEDB\ and repair the PowerPivot, then it works.

Same situation happened again when I downgrade my PowerPivot back to version 1 to be compatible with my client’s environment.  Once I manually remove the C:\Program Files\Microsoft Analysis Services\AS OLEDB\ directory and then repair the program, it works again.