Project Online built on top of SharePoint, so we can assume that all data entities will be accessible through Odata services, like a normal SharePoint lists. However, there are also some specific requirements around project data which needs to be treated differently in order to achieve the best performance.
There are number of tech articles from Microsoft around how to bring your data on premises: link , however I will try to improve it and provide a real scenario. I am going to create a new Production ready solution to bring all Project Online data on premises. By the end of this article, you will get complete SSIS solution, which you can deploy and use. I will be explaining almost every step, but in case if you miss the line, just grab the source code from the second part and get better understanding. I am going to use a brand new PWA environment to minimize risks of schema differences, unless you have deleted some of the out of the box fields)))
Before we start, get the source code from the article I mentioned above and try to read it to understand the relations between different tasks in SSIS package.
What is Odata Feed?
From the specification, OData is a Web protocol for querying and updating data that builds upon existing Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON. There are couple best practices around using Odata feeds, such as:
- Use json as format (minimize the payload)
- Limit the batch size to 100
- Try to avoid bringing large data sets. Use filtering and select. Bring only fields you need now and only for records you need.
Shall we begin?
Before we start building a new SSIS solution, we need to understand what we are going to query. What is the size of the data, which will be pulled and how often. Project Online provides out of the box around 35 feeds. We can categorize those feeds in 3 categories:
- Project Related Feeds (Feeds for entities which will be updated together with project, once you save and publish project)
- Feeds with own “Modified On” field (Feeds for entities which will have its own field to track changes- once new item is created or existing is updated, we can observe it)
- Feeds without any “Modified On” date (Feeds for entities without any field, which can be used to track item’s change. Example could be Association feeds).
- Feeds for Lists under Project Sites (We will handle them separately)
Now, let’t talk about categories in detail. We need to build an effective solution, right? In order to do so, we have to bring only deltas. For the first category of feeds, we need to find which projects have been changed and bring all data from project-related feeds. For the second category of feeds, we need to build querys in such way that they only brings changed or new records. The last case is most straight forward. As we are not able to identify which records have been changed, we just need to delete all records and bring data again. Not really efficient, however usually number of records in such feeds are not large, so we are fine!
First, get your environment ready, by following steps in this article: link
Now, let’s create a new SSIS project
Once you create a new project it’s time to create a new Odata Connection Manager:
Don’t forget to select “Online Service Authentication” and test connection.
Now its time to create a local Data Base and all tables, which will keep project Data. I have prepared a SQL script to make it easy. Just create a new DB called PWA and run the script:
Go back to SSIS designer and create a new variable to keep the Last Sync Date.
Now we need to crate a new Connection Manager to access our SQL DB where the data will be stored.
Follow similar steps as we did before for Odata Connection Manager, except select ODE DB Connection.
Once new connection is added, drag and drop to the SSIS designer new item “Execute SQL Task”.
In the “Result Set” select where result will be stored.
Now we have Last Sync date in our variable. We are going to use this variable for almost each query, to filter only records which have been changed. Also don’t forget to add manually new records to [Synced] table, otherwise it will return error.
In the next task we will bring all projects and will store them in Record Set for future use.
Drag and Drop on the screen new “Data Flow Task”, name it “Get All Projects” and connect previously created SQL task with it.
Go inside and drag and drop new Odata Source task. Let’s configure it:
Pay attention to Query options field. Remember, bring only fields you need and use “format=json” to minimize size of response from the server.
Create a new variable of type “object” to keep Project collection – AllProjects
Add new Recordset Destination and configure it as shown below:
Also, we need to find if there are any deleted projects. To find it, we have to query local projects table and discover if there are any records which exists only locally.
We will store deleted project ids in another recordset, so after we can delete all entities related to those deleted projects.
Now, as we get latest changes from Projects, we have to update the last Synced date. We need to do now as executing SSIS package might take time, and we can just simply miss some records.
Go back to the main designer screen and drop new SQL task to update Sync table.
The next, we will start with simple task of deleting related entities for project which no longer exists on the cloud.
Create a new Foreach Loop task and drag and drop a new Data Flow Task inside, name it “Delete All Entities”. In this task we will delete everything, even items from lists in project sites, unless you want to keep them for future use. However, I recommend to do a snapshot of data, so you PWA DB will be a mirror of your cloud environment.
To delete all entities related to removed project, you have couple options:
- Create a new stored procedure and pass project ID as parameter. (This is the most efficient way, as SSIS package will executed only one query for one project)
- Create a OLE DB command for each entity.
We will go with the first option, as it also will help us to minimize the maintenance. In case if we will add a new List to Project Site, all we need to do is just update a stored procedure, without the need to redeploy the package. So, let’s create Expression Builder task and after Execute it:
Build Schema creates a new query to first check, if stored procedure exists, and if its not, it will create it. We need to delete records from all tables which have ProjectId.
Now let’s create a new string variable to keep Project ID when we walk through deleted projects in for-each loop:
Drag and Drop new SQL task inside for-each loop and name it “Delete All Project Related Records”. The configuration is following:
That’s it. Now you can run the project and see the result. I know, until now you don’t have any records in any of project tables, but at least if your package runs successfully, it means that you have everything setup correctly and we can move to the second part of this article.