Implementing document fingerprinting - Microsoft SC-400 Certification

Employees who handle information as part of their role work with many kinds of sensitive information during their regular daily tasks. Document fingerprinting in Microsoft 365 makes it simpler for you, as the admin, to protect that information by identifying the standard forms that are used across the business.

To prevent information created from official company templates from being shared unintentionally, you can configure and implement document fingerprinting as a custom sensitive information type. A good example is documents managed by HR workers, which can contain personal information. Document fingerprinting can identify these documents even if their content does not meet the conditions of any other sensitive information type.

Documents do not have actual fingerprints, but much like a person’s fingerprint, each document has a unique word pattern. When you upload a file, DLP identifies the unique word pattern in the document, creates a fingerprint based on that pattern, and uses the fingerprint to detect outbound documents that contain the same pattern.

The following diagram and flow demonstrate how the basic functionality of document fingerprinting works:

Figure 3.13 – Example of document fingerprinting

Here, the Personal Info Template document contains the blank fields Title, Personal Info, and Description, along with a description for each of those fields. Together, this text is the word pattern.

The word pattern is then converted into a document fingerprint: a small Unicode XML file containing a unique hash value that represents the original text. The fingerprint is stored in Active Directory. As a security measure, the original document itself is not stored on the service; only the hash value is kept.
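The conversion of a word pattern into a hash can be illustrated with a short Python sketch. This is a conceptual illustration only, not the algorithm the service uses: the function names `word_pattern` and `fingerprint`, the tokenization rule, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import re


def word_pattern(text: str) -> list[str]:
    """Return the lowercase words of the text, in document order."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fingerprint(text: str) -> str:
    """Hash the word pattern so that the original text need not be stored."""
    joined = " ".join(word_pattern(text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical template text: field labels with blank values.
template = "Title: \nPersonal Info: \nDescription: "

print(word_pattern(template))  # ['title', 'personal', 'info', 'description']
print(fingerprint(template))   # a 64-character hexadecimal digest
```

Only the digest would need to be kept in this sketch, which mirrors the point that the original document is not stored.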

The personal information fingerprint then becomes a sensitive information type that you can associate with a DLP policy. Once the two are associated, DLP detects any outbound email that contains a document created from the same template.

There are some cases, however, where document fingerprinting cannot detect sensitive information:

  • A password-protected file
  • A document that only contains images
  • A document that does not contain all the text that was in the original template that was used to create the document fingerprint
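The cases listed above can be reasoned about with a small Python sketch. It is a simplified model rather than the service's matching logic: the helper names `words` and `contains_template` are hypothetical, and a password-protected or image-only file is modeled as a document from which no text can be extracted.

```python
import re
from collections import Counter


def words(text: str) -> Counter:
    """Count the lowercase words in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def contains_template(document_text: str, template_text: str) -> bool:
    """True only if every template word occurs in the document at least as often."""
    template_words = words(template_text)
    if not template_words:
        return False
    document_words = words(document_text)
    return all(document_words[w] >= n for w, n in template_words.items())


template = "Title Personal Info Description"
filled_in = "Title: Offer letter. Personal Info: Jane Doe. Description: New hire."
partial = "Title: Offer letter. Description: New hire."
no_text = ""  # stands in for a password-protected or image-only file

print(contains_template(filled_in, template))  # True
print(contains_template(partial, template))    # False
print(contains_template(no_text, template))    # False
```

In this model, a document that drops any part of the template text no longer matches, which is why a partially copied form can go undetected.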

In this section, we learned about the theory behind document fingerprinting and how it relates to sensitive information types. You should now understand how the functionality works and the cases where it cannot detect sensitive information. In the final section of this chapter, we will discuss keyword dictionaries and how to implement them from the Microsoft 365 compliance center and PowerShell.

Creating a keyword dictionary

A keyword dictionary is an effective way of managing a large list of words that changes regularly. You can also create keyword lists directly in a sensitive information type; however, these lists have size limitations, and you must edit an XML file to make any changes to them.
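The benefit of keeping the word list separate from the detection logic can be shown with a short Python sketch. It only illustrates the concept and does not reflect how the service matches keywords: the function names `load_keywords` and `find_keywords` and the sample keywords are assumptions made for the example.

```python
import re


def load_keywords(lines):
    """Build a set of lowercase keywords from an iterable of lines, skipping blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def find_keywords(text, keywords):
    """Return the sorted keywords that appear in the text as whole words or phrases."""
    lowered = text.lower()
    return sorted(
        kw for kw in keywords
        if re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", lowered)
    )


# Hypothetical list contents, one keyword or phrase per line.
keyword_file = ["salary", "disciplinary action", "", "Termination"]
keywords = load_keywords(keyword_file)

print(find_keywords("Notice of termination and final salary payment.", keywords))
# ['salary', 'termination']
```

Because the list is plain data, revising it means editing the list alone; the matching code stays unchanged.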

You can configure keyword dictionaries from the Microsoft 365 compliance center or via the Security & Compliance PowerShell module. There are some Microsoft best practice recommendations you should be aware of when implementing keyword dictionaries:

  • Create an employee audit and create the list from the outcome.
  • Collect typical words from some departments using Microsoft Forms.
  • Collaborate with some employees, such as those from HR or legal, to create a list of typical words.
  • Remember that you can edit the list, so you can improve your results by revising them regularly.

We will now go through the actions you need to complete to build a keyword dictionary, both from the Microsoft 365 compliance center and from a file using PowerShell.
