Creating a trainable classifier - Microsoft SC-400 Certification

First things first, in order to create our custom trainable classifiers, we need to opt in to this feature in the Microsoft 365 compliance center, as shown in the following screenshot:

Figure 4.5 – Opting in to trainable classifiers in the Microsoft 365 compliance center

Bear in mind that opting in starts an initial scan of the data in your tenant, which might take as long as 14 days to complete. Until that scan has finished, we will not be able to create any new classifiers in our environment.

Before we start creating our first custom classifier, note that there is a timeline to plan around, since it takes time to train the classifier well, as shown in the following figure:

Figure 4.6 – A timeline of deployment for a classifier

The timeline in Figure 4.6 may not hold exactly for every classifier, but as you can see, it takes at least 26 days before we can publish a classifier to our environment. The initial step in the timeline will, of course, not apply to the next classifier we create, since opting in is a one-time action.
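To make these waiting periods concrete, the following Python sketch projects rough milestone dates from an opt-in date. It relies only on the two figures quoted in this section (up to 14 days for the initial scan and at least 26 days before publishing, here assumed to be counted from the opt-in date). The function name and the example date are hypothetical, and actual durations will vary from tenant to tenant.

```python
from datetime import date, timedelta

# Figures quoted in the text; real durations vary per tenant.
INITIAL_SCAN_MAX_DAYS = 14  # upper bound for the one-time opt-in scan
MIN_DAYS_TO_PUBLISH = 26    # minimum total time, assumed to start at opt-in


def project_timeline(opt_in: date) -> dict:
    """Return rough milestone dates for a first custom classifier."""
    return {
        "latest_scan_completion": opt_in + timedelta(days=INITIAL_SCAN_MAX_DAYS),
        "earliest_publish": opt_in + timedelta(days=MIN_DAYS_TO_PUBLISH),
    }


if __name__ == "__main__":
    # Hypothetical opt-in date, used only for illustration.
    for milestone, when in project_timeline(date(2022, 3, 1)).items():
        print(f"{milestone}: {when.isoformat()}")
```

For later classifiers the opt-in scan no longer applies, so the first milestone can be ignored.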

So, let’s start creating a trainable classifier. The seed content (used for training our classifier) must be stored in SharePoint Online. You can train your classifier using several different file types. Word (.docx), Excel (.xlsx), PowerPoint (.pptx), Visio (.vsdx), and plain text (.txt) files are fully supported and most commonly used. The full list of supported file extensions can be found at the following link: https://docs.microsoft.com/en-us/sharepoint/technical-reference/default-crawled-file-name-extensions-and-parsed-file-types.

There need to be at least 50 files present in the SharePoint storage space, and the latest 500 files are the ones that will be scanned. The content must not be encrypted, and the language must be English.
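Before uploading, it can help to sanity-check the seed files in a local staging folder against these requirements. The sketch below is one way to do that in Python, under some assumptions: the extension set contains only the types named above (the linked Microsoft page lists more), "latest" is approximated by file modification time, and the function name `check_seed_folder` is hypothetical. Encryption and language are not checked and still need to be verified manually.

```python
import tempfile
from pathlib import Path

# Extensions named in the text; the linked Microsoft page lists more.
SUPPORTED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".vsdx", ".txt"}
MIN_SEED_FILES = 50      # minimum number of seed files required
MAX_SCANNED_FILES = 500  # only the latest 500 files are scanned


def check_seed_folder(folder: Path) -> dict:
    """Summarize whether a local staging folder meets the seed requirements.

    Encryption and language are NOT checked here; verify those manually.
    """
    files = [p for p in folder.iterdir() if p.is_file()]
    supported = [p for p in files if p.suffix.lower() in SUPPORTED_EXTENSIONS]
    unsupported = [p for p in files if p.suffix.lower() not in SUPPORTED_EXTENSIONS]
    # Newest first, approximating "the latest 500 files" by modification time.
    supported.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return {
        "supported_count": len(supported),
        "unsupported": sorted(p.name for p in unsupported),
        "enough_files": len(supported) >= MIN_SEED_FILES,
        "will_be_scanned": [p.name for p in supported[:MAX_SCANNED_FILES]],
    }


if __name__ == "__main__":
    # Demo with a throwaway folder and hypothetical file names.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("rms1.txt", "rms2.txt", "notes.docx", "diagram.png"):
            (root / name).write_text("sample")
        report = check_seed_folder(root)
        print(report["supported_count"], report["enough_files"], report["unsupported"])
```

A folder that reports `enough_files` as `False` needs more seed files before it is worth creating the classifier.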

In my example, I have uploaded some Rights Management Services (RMS) logs to SharePoint Online as .txt files so that they can be crawled:

1. Sign in to the Microsoft 365 compliance center.
2. Click on Create trainable classifier.

Figure 4.7 – Create trainable classifier

3. Specify a name for the classifier and add a description to help you identify what it is used for.

Figure 4.8 – Adding a name and description for the classifier

4. Specify the SharePoint site where you have added the seed content and the folder the content resides in.

Figure 4.9 – Specifying the SharePoint site and folder where the seed content is located

5. Now we have created our first classifier, but as shown in the following screenshot, it can take up to 24 hours to analyze the content you have provided, so patience is a virtue:

Figure 4.10 – The processing of data can range from 1 to 24 hours to complete

6. You can follow the status of your classifier in the Trainable classifiers section of the Microsoft 365 compliance center.

Figure 4.11 – Showing the location of the newly created classifier

7. Once the classifier is presented under In progress, we can start testing it. In the following example, I used two Word documents. One document contained RMS log data, while the other one contained log data from another source and thus should not be identified by the classifier:

Figure 4.12 – Add items to test the classifier

8. Here, we will have to give our take on what the classifier has identified. Does it behave as we configured it to?

Figure 4.13 – Does the classifier identify relevant data?

9. Since it is relevant, we will click on the Yes button to let the classifier know that it has found relevant data.

Figure 4.14 – Clicking on Yes if the data is relevant, No if not

10. This brings us to the following screen, stating Auto-retrain performed. This means that we have successfully told the classifier it identified relevant data and it will hone its performance further:

Figure 4.15 – Auto-retrain performed

11. We then see the following screen before us, stating what we have just performed and what the recommended steps are moving forward. The portal states Classifier accuracy is not available yet since we have only tested this classifier on two data points; the recommended number of items to train your classifier on is at least 200. We can also see that the portal does not recommend this classifier to be published:

Figure 4.16 – Information about your classifier after performing the first test

12. Now we need to upload even more data to test the classifier on. In my example, continuing with logs, I uploaded more than 200 files containing a mix of RMS logs and logs from other sources. I then repeated steps 8-11 to see whether my classifier was good enough to publish in my Microsoft 365 tenant.

Figure 4.17 – Information about the classifier after retraining it with over 200 items

13. We can now, finally, publish our classifier.

Figure 4.18 – Click Yes to publish the classifier

14. And we are done with our first classifier, well done!

Figure 4.19 – The published classifier in the Microsoft 365 compliance center
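While working through the review rounds in steps 8-12, it can be useful to keep your own tally of how often the classifier's prediction matched your Yes/No judgment, and whether you have reached the recommended 200 items. The Python sketch below is a personal bookkeeping aid only: the `Review` record and its fields are hypothetical, it does not read anything from the portal, and the agreement rate it computes is not the accuracy figure the portal reports.

```python
from dataclasses import dataclass

RECOMMENDED_MIN_ITEMS = 200  # recommended minimum from step 11


@dataclass
class Review:
    """One reviewed test item (hypothetical record, not a portal export)."""
    file_name: str
    predicted_match: bool      # did the classifier flag the item as a match?
    reviewer_says_match: bool  # your own Yes/No judgment for the item


def summarize(reviews: list) -> dict:
    """Tally reviewed items and how often prediction and judgment agree."""
    total = len(reviews)
    agreed = sum(r.predicted_match == r.reviewer_says_match for r in reviews)
    return {
        "reviewed": total,
        "agreement_rate": agreed / total if total else None,
        "enough_items": total >= RECOMMENDED_MIN_ITEMS,
    }


if __name__ == "__main__":
    # Mirrors the two-document test in step 7 (file names are made up).
    first_round = [
        Review("rms_log.docx", predicted_match=True, reviewer_says_match=True),
        Review("other_log.docx", predicted_match=False, reviewer_says_match=False),
    ]
    print(summarize(first_round))
```

With only two reviewed items, `enough_items` is `False`, which matches the portal's advice not to publish after the first test.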

In this section, we have covered how to create a custom trainable classifier in Microsoft 365, touching on some of the remaining topics. Up next, we will look at how we can make sure that a trainable classifier is performing properly.
