
Office 365 Migration API: how to migrate the taxonomy metadata

If you wonder whether the Office 365 Migration API supports migration of taxonomy metadata, the answer is YES. The API supports it, but there isn't good documentation on how to modify your xml files after running the ConvertTo-SPOMigrationTargetedPackage command.

In this post, I'm going to show you the approach. I assume that you have basic knowledge of how to use the Migration API.

The scenario I'm going to use is simple: migrate a single document library with a single document.
The document has one taxonomy column (MyTaxonomy) and three versions:

Here are the steps:

Step 1: Export the document library

 Export-SPWeb "https://portal.cosingens.com/" `
  -ItemUrl "/Shared Documents" `
  -Path "\\cos-dev-03\export1\Export" `
  -NoFileCompr -IncludeVersions 4

This command will export the document library and create the initial package.
The command is not part of the Migration API; it has been well known since SharePoint 2010 and is part of the server-side object model.

Step 2: Convert the package using the Migration API

The initial package needs to be converted to a new type of package so the Migration API can work with it.

Step 2.1 Create a new target site in the SharePoint Online tenant or use an existing site.
For this demo, I'll create a new site collection using the default Team site template. The site url is https://mod755812.sharepoint.com/sites/demo7.
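
If you prefer to script this step, here is a minimal sketch using the SharePoint Online Management Shell (the admin center URL, title and storage quota below are assumptions for this demo):

 # A sketch, assuming the SharePoint Online Management Shell is installed
 # and you are a tenant administrator.
 $creds = Get-Credential "admin@mod755812.onmicrosoft.com"
 Connect-SPOService -Url "https://mod755812-admin.sharepoint.com" -Credential $creds

 # STS#0 is the default Team site template
 New-SPOSite -Url "https://mod755812.sharepoint.com/sites/demo7" `
     -Owner "admin@mod755812.onmicrosoft.com" `
     -Title "Demo 7" `
     -Template "STS#0" `
     -StorageQuota 1024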

Step 2.2 Execute the ConvertTo-SPOMigrationTargetedPackage command:

 $userName = "admin@mod755812.onmicrosoft.com"
 $creds = Get-Credential $userName

 ConvertTo-SPOMigrationTargetedPackage `
    -SourceFilesPath "\\cos-dev-03\export1\Export" `
    -SourcePackagePath "\\cos-dev-03\export1\Export" `
    -OutputPackagePath "C:\Export1\Package_target" `
    -TargetWebUrl "https://mod755812.sharepoint.com/sites/demo7" `
    -TargetDocumentLibraryPath "/Shared Documents" `
    -Credentials $creds

This command (part of the Migration API) will create a package in "C:\Export1\Package_target" that is ready to be uploaded to the cloud.

The goal of this post is to show how we can further modify this package so that the taxonomy values are preserved when it is deployed to SharePoint Online. If you don't have taxonomy columns, you can use the package in "C:\Export1\Package_target" directly.

Let's now prepare our SharePoint Online site and collect some information that we will need in step 7.

Step 3: Create a new taxonomy or use an existing one in the target site
For this demo, I'll copy (migrate) my on-premises taxonomy following Approach 1 described below.
Keep in mind that there are several possible approaches for creating the taxonomy in the cloud:

Approach 1: Copy the on-premises term set (mapped to the column MyTaxonomy) into the cloud site.
I'll use the SharePoint Managed Metadata and Taxonomy Tools Online and perform export/import of the terms. The good thing is that this tool can preserve the term IDs!
Here are screenshots of the process:

Load the on-premises term set.

Export the term set. Select the "Export Term IDs" option.

Create a new term set in the cloud taxonomy.

Open the SharePoint Online site and select the newly created term set.

Import the terms.

As a result, the term set is migrated to the cloud and the term IDs are preserved.

Approach 2: Configure hybrid SharePoint taxonomy
The feature is currently in preview, but it works smoothly. I have configured AD sync with my demo tenant, and now the only thing I need to do is run the following command:

 $userName = "admin@mod755812.onmicrosoft.com"
 $credential = Get-Credential $userName

 Copy-SPTaxonomyGroups `
   -LocalTermStoreName "Managed Metadata Service" `
   -LocalSiteUrl "https://portal.cosingens.com/" `
   -RemoteSiteUrl "https://mod755812.sharepoint.com/sites/demo7/" `
   -GroupNames "Demo 8502" `
   -Credential $credential

This approach also preserves the on-premises IDs of the terms, which will facilitate the next steps.

Approach 3: Create a new taxonomy in the cloud site.
You can also create a brand new taxonomy. However, the mapping in the xml files will be a bit more difficult.

Step 4: Generate the WssIds for the newly created terms

Before proceeding with this action, let's review the "big picture" of the taxonomy implementation:

Here is a description of the above screenshot:
1. A taxonomy term has a label and an ID.
2. There is a hidden list (~sitecollection/Lists/TaxonomyHiddenList) that keeps the term data (label, ID, parent term set ID, ...) as a list item. There is only one list item per term. This item is created when the term is added for the first time as a value in some taxonomy column. The ID of the list item inside this hidden list is actually the WssId.
3. The file "text file.txt" has a taxonomy value "Value 1". Internally, the list item that represents "text file.txt" keeps the taxonomy value in three columns.
4. Sections 4 and 5 show these three columns and the format of the data.

Simply speaking, the taxonomy value in a list item is represented as a lookup value into the hidden taxonomy list.
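
To make this concrete: when you read such a value through the object model, it typically comes as a string in the shape "WssId;#Label|TermGuid". Here is a small PowerShell sketch that splits a sample value into its parts (the value below is illustrative, built from the demo data):

 # Parse a single-value taxonomy field value of the shape "WssId;#Label|TermGuid"
 $rawValue = "3;#Value 1|2c244e8e-7c4a-4cc1-9eff-437b823fcd91"

 $wssId, $rest = $rawValue -split ';#', 2     # "3" and "Value 1|<term GUID>"
 $label, $termGuid = $rest -split '\|', 2     # "Value 1" and the term GUID

 Write-Host "WssId:    $wssId"
 Write-Host "Label:    $label"
 Write-Host "TermGuid: $termGuid"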

Now, having this "big picture" in mind, let's open the hidden taxonomy list inside the newly created site collection. In my scenario, this is the list https://mod755812.sharepoint.com/sites/demo7/Lists/TaxonomyHiddenList/

You will find that the list is empty.
Remember: a list item is created in the hidden taxonomy list when we add a taxonomy value somewhere in the site collection.
In order to force SharePoint to generate items for my newly imported (or migrated, or created) taxonomy, I'll create a dummy list, add a new list column "Dummy Taxonomy Column", map it to the "Imported Term Set" and add an item for each term value:

Now we have the WssId of the "Value 1" term: it is 3 (the ID of the list item inside the hidden taxonomy list). You can verify this with the PnP PowerShell sketch below.
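
Instead of checking the hidden list in the browser, you can also read the WssIds with PnP PowerShell. This is only a sketch: the cmdlets are from the SharePointPnPPowerShellOnline module, and the field names (Title, IdForTerm) are what the hidden list uses on-premises, so verify them in your tenant.

 Connect-PnPOnline -Url "https://mod755812.sharepoint.com/sites/demo7" -Credentials $creds
 $items = Get-PnPListItem -List "TaxonomyHiddenList"
 foreach ($item in $items) {
     # The item ID is the WssId; Title holds the term label, IdForTerm its GUID
     Write-Host ("WssId: {0} | Label: {1} | TermGuid: {2}" -f `
         $item.Id, $item["Title"], $item["IdForTerm"])
 }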


Step 5: Create the taxonomy column in the Target site and add it to the Target List

For this demo, I will create a new list column.
But if you use the "SharePoint / Office 365 Dev Patterns & Practices (PnP)" samples, you can move your site columns and site content types to the target site while preserving their IDs.

My new column is named "NewTaxonomy" and it is mapped to the "Imported Term Set" created in step 3. You can create it through the UI or script it, as shown below.
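
A PnP PowerShell sketch of the column creation could look like this (the term group name "Imported Group" and the list title "Documents" are assumptions; use the group that holds your term set):

 Connect-PnPOnline -Url "https://mod755812.sharepoint.com/sites/demo7" -Credentials $creds
 # Creates a taxonomy column on the list and binds it to the term set
 Add-PnPTaxonomyField -List "Documents" `
     -DisplayName "NewTaxonomy" `
     -InternalName "NewTaxonomy" `
     -TermSetPath "Imported Group|Imported Term Set"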

Step 6: Get the taxonomy column settings

I usually use JavaScript for such tasks.
Here is demo code that retrieves all columns and their IDs using the JavaScript object model:


function ReadAllFields(listUrl) {
    var context = SP.ClientContext.get_current();
    var list = context.get_web().getList(listUrl);
    var fields = list.get_fields();
    context.load(fields);
    context.executeQueryAsync(
        function () {
            // Success: print the internal name and ID of every field
            var fieldEnumerator = fields.getEnumerator();
            while (fieldEnumerator.moveNext()) {
                var field = fieldEnumerator.get_current();
                console.log(field.get_internalName() + " | " + field.get_id());
            }
        },
        function (sender, args) {
            console.log("Error: " + args.get_message());
        });
}
ReadAllFields("/sites/demo7/Shared%20Documents");

And here is the result (on the right side is the console of the debugger tools):


Each taxonomy column uses a hidden Note column whose name is an auto-generated string. In this demo I have only one taxonomy column, so it is easy to identify the hidden column.
If you have more taxonomy columns (in real scenarios you will), you will need to retrieve the SchemaXml of each taxonomy column. Here is sample code:

var globalVariable;
var context = SP.ClientContext.get_current();
var list = context.get_web().getList("/sites/demo7/Shared Documents");
var fields = list.get_fields();
var myTaxColumn = fields.getByInternalNameOrTitle("NewTaxonomy");
// Explicitly request the SchemaXml property of the field
context.load(myTaxColumn, 'SchemaXml');
context.executeQueryAsync(
    function (sender, args) {
        globalVariable = myTaxColumn.get_schemaXml();
        console.log(globalVariable);
    },
    function (sender, args) {
        console.log("Error: " + args.get_message());
    });

Exploring the SchemaXml of the column, you will find the ID of the hidden taxonomy column. Then you can get the internal name of the column based on this ID.

The string from the browser console is opened in Visual Studio.
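
If you prefer PowerShell over JSOM for this step, a PnP PowerShell sketch like the following should return the same schema (the list title "Documents" is an assumption):

 Connect-PnPOnline -Url "https://mod755812.sharepoint.com/sites/demo7" -Credentials $creds
 $field = Get-PnPField -List "Documents" -Identity "NewTaxonomy"
 # Look for the TextField property inside the schema - it holds the ID
 # of the hidden Note column
 $field.SchemaXml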

Now we are ready to modify the xml files that were generated in step 2.

Step 7: Manually replace the on-premise settings with SharePoint online settings

Let's first explore the package created in Step 2.2:

The xml files are located in "C:\Export1\Package_target". The following three files are very important - Manifest.xml, LookupListMap.xml, UserGroup.xml.

Manifest.xml
This file contains the definitions of the content types, list items and list item versions.


The important part is that the column values are represented in Field tags. Here is what the taxonomy value looks like (the taxonomy column carries a lookup-style value such as "2;#Value 1", while its hidden Note column carries the "Label|TermGuid" pair):

Versions are represented in the same way. Here is how version 0.2 looks:


LookupListMap.xml
This file describes the lists that are used as lookups in the exported data.
Because we have only one lookup (taxonomy) column, we see information for only one list here.
The Id attribute in the LookupItem tag is the WssId: the ID of the item inside the hidden taxonomy list.

UserGroup.xml
This file describes the users referenced in the exported data.
This is the place where we can "replace" users. A real-world case: a user left the company years ago, the account is no longer in Active Directory and is not synced as a cloud user, but it is still inside the metadata of the exported documents.
By modifying UserGroup.xml, we can replace this user with some default or system user.

In our demo case, the file is very simple:



Now that we have explored these xml files, we can create the following mapping table:


On-premises → Cloud

- Taxonomy Hidden List Id: c7a3daca-fb1a-43fc-a047-7309b02b44c1 → 1d645dc0-5112-4680-bbad-c972d30dd202
- Taxonomy Column Internal Name: MyTaxonomy → NewTaxonomy
- Taxonomy Column Id: 33c59020-aab9-4979-ae17-985ce6843e9e → 159bec7b-2579-4703-8bb4-8052df40aee4
- Hidden Taxonomy Column Internal Name: j3c59020aab94979ae17985ce6843e9e → h59bec7b257947038bb48052df40aee4
- Hidden Taxonomy Column Id: 02582a64-a7f1-4b59-a5ca-9bd22a64f91c → eb0495ca-1e26-4655-85f3-e377149b6b16
- Term 1 Label: Value 1 → Value 1
- Term 1 Guid: 2c244e8e-7c4a-4cc1-9eff-437b823fcd91 → 2c244e8e-7c4a-4cc1-9eff-437b823fcd91
- Term 1 WssId: 2 → 3
- Term 2 Label: Value 2 → Value 2
- Term 2 Guid: 2c244e8e-7c4a-4cc1-9eff-437b823fcd91 → 2c244e8e-7c4a-4cc1-9eff-437b823fcd91
- Term 2 WssId: 3 → 1

Remember, the mapping between terms depends on the values in the hidden taxonomy lists:

The modification we need to do now is:
- in all xml files in the package, replace every occurrence of an on-premises value with its cloud counterpart from the mapping above, as the sketch below shows.
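
Here is a minimal sketch that applies the mapping table to the package. The hashtable values come straight from the table above; the WssId replacements are handled separately, because a blind replace of "2" with "3" would corrupt the files.

 # Apply the mapping table to every xml file in the target package
 $mapping = @{
     "c7a3daca-fb1a-43fc-a047-7309b02b44c1" = "1d645dc0-5112-4680-bbad-c972d30dd202" # hidden list id
     "MyTaxonomy"                           = "NewTaxonomy"                          # column internal name
     "33c59020-aab9-4979-ae17-985ce6843e9e" = "159bec7b-2579-4703-8bb4-8052df40aee4" # column id
     "j3c59020aab94979ae17985ce6843e9e"     = "h59bec7b257947038bb48052df40aee4"     # hidden column name
     "02582a64-a7f1-4b59-a5ca-9bd22a64f91c" = "eb0495ca-1e26-4655-85f3-e377149b6b16" # hidden column id
 }

 Get-ChildItem "C:\Export1\Package_target" -Filter *.xml | ForEach-Object {
     $content = Get-Content $_.FullName -Raw
     foreach ($key in $mapping.Keys) {
         $content = $content.Replace($key, $mapping[$key])
     }
     # WssId replacements: anchor them to the labels so other "2"s and "3"s
     # in the files are not touched (check LookupListMap.xml by hand as well)
     $content = $content.Replace("2;#Value 1", "3;#Value 1") # Term 1: WssId 2 -> 3
     $content = $content.Replace("3;#Value 2", "1;#Value 2") # Term 2: WssId 3 -> 1
     Set-Content -Path $_.FullName -Value $content
 }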


Step 8: Upload the packages to Azure Storage

Step 8.1 Create a storage account in Azure.
I create a new storage account named "UploadToSPO":


Step 8.2 Open the storage account with the Azure Storage Explorer
Download this tool from http://storageexplorer.com/. After you provide your credentials, you will be able to open the storage account and verify that it is empty:

Step 8.3 Upload the package to the Azure Storage
Run the script:

$storageAccount = "uploadtospo"
$key1 = "value of key 1"
$guid = [guid]::NewGuid().ToString()
# Azure container names must be lowercase letters, numbers and hyphens
$PackageContainerName = "package-" + $guid
$FilesContainerName = "files-" + $guid

$azurelocations = Set-SPOMigrationPackageAzureSource `
                    -SourceFilesPath "\\cos-dev-03\export1\Export" `
                    -SourcePackagePath "C:\Export1\Package_target" `
                    -AccountName $storageAccount `
                    -AccountKey $key1 `
                    -PackageContainerName $PackageContainerName `
                    -FileContainerName $FilesContainerName

Refresh and explore the Azure Storage. You will see that the xml files are uploaded to the package blob container, while the actual files are inside the files blob container:


The created queue is now empty:

Step 9: Create the execution job

$job = Submit-SPOMigrationJob `
            -TargetWebUrl "https://mod755812.sharepoint.com/sites/demo7" `
            -MigrationPackageAzureLocations $azurelocations `
            -Credentials $creds

Now refresh the Queue in your Azure Storage. You will see the messages that were generated during the process.
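
Depending on the version of the SPO Management Shell, you may also be able to poll the job from PowerShell instead of watching the queue. Treat the following as a sketch and check Get-Help for the exact parameter set in your version:

 # A sketch, assuming your shell version ships this cmdlet
 Get-SPOMigrationJobStatus `
     -TargetWebUrl "https://mod755812.sharepoint.com/sites/demo7" `
     -Credentials $creds `
     -JobId $job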

And the result is:

The taxonomy values and versions are preserved.
What's missing are the values of the "Custom Column" column, but that is because I didn't create such a column in my new document library. The Migration API can't do magic: it creates data based on the mappings in the xml files.



The scenario:  Create custom forms for Add/Edit/Delete items to a list which is not accessible by the regular users. The forms should support working with attachments. The problem: The default SharePoint controls don’t work if the current user doesn’t have access to the list items. The solution: Use custom code for generating the SharePoint default HTML so the out-of-the-box javascript works correct. Explanations:  The scenario occurs when some list need to be hidden form the users. The list has broken security inheritance and users can't navigate directly to it. Working with its fields requires crating web warts or application pages with appropriate controls on them and server-side code running under elevated privileges. The issue here is that the OOTB (out-of-the-box) controls for Attachments don't work if the user doesn’t have access to the related list item. The OOTB controls are: AttachmentsField , AttachmentUpload , AttachmentButton . There is a lot of code