Auto-tagging in Exchange 2010


Exchange 2010 includes the ability for administrators to define retention policies to control how long items are allowed to remain inside user mailboxes before Exchange automatically takes action to move or remove them. A retention policy is composed of a set of retention tags that can apply to every item in a mailbox or just to items in a specific folder. Each retention tag specifies a retention period in days and an action that Exchange takes when the retention period expires. For example, you could create a retention tag for Inbox items that causes Exchange to move items into the Deleted Items folder after 30 days. Once an administrator assigns a retention policy to a mailbox, the retention tags in the policy are placed on items by the Managed Folder Assistant, which runs nightly on Exchange mailbox servers. On subsequent nightly runs, the Managed Folder Assistant examines items to discover those whose retention period has expired and then takes the defined action. All of this is a big change from the Messaging Records Management (MRM) approach taken by Microsoft in Exchange 2007, which requires users to move items into managed folders before Exchange will manage them. That’s why Microsoft refers to the MRM built around retention policies and delivered in Exchange 2010 as MRM 2.0.
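To make the "retention period plus action" idea concrete, here is a minimal sketch of the expiry check. It is not how the Managed Folder Assistant is implemented; the function name, the tag object and its property names are all hypothetical, and it assumes the retention clock simply starts on the date the item was received.

```powershell
# Hypothetical helper: has an item outlived the retention period defined by its tag?
# Assumption: the retention clock starts on the date the item was received.
function Test-RetentionExpired {
    param(
        [Parameter(Mandatory=$true)] [datetime] $ItemDate,      # when the clock starts
        [Parameter(Mandatory=$true)] [int]      $AgeLimitDays,  # retention period from the tag
        [datetime] $AsOf = (Get-Date)                           # date of the nightly check
    )
    return ($AsOf -ge $ItemDate.AddDays($AgeLimitDays))
}

# A retention tag modelled as a plain object; the property names are illustrative only.
$inboxTag = New-Object PSObject -Property @{
    Name         = 'Inbox 30 days'
    AgeLimitDays = 30
    Action       = 'Move to Deleted Items'
}

$received = [datetime]'2010-01-01'
if (Test-RetentionExpired -ItemDate $received -AgeLimitDays $inboxTag.AgeLimitDays -AsOf ([datetime]'2010-02-15')) {
    "Retention period expired: $($inboxTag.Action)"
}
```

The point of the sketch is only that a tag carries two pieces of information, a number of days and an action, and that the nightly pass compares the first against the age of each item before applying the second.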

The ideas behind retention policies and tags are sound; the implementation in Exchange 2010 lacks completeness because of the need to define retention policies and tags through the Exchange Management Shell (EMS). This will be a concern for some administrators who prefer to work with objects through a GUI, especially when they’re dealing with concepts that they haven’t become accustomed to yet. Microsoft addresses this issue in Exchange 2010 SP1 where the Exchange Management Console (EMC) comes fully equipped with the necessary GUI to define, manage, and apply retention policies and tags.

This article focuses on auto-tagging, one of the more interesting and advanced techniques that Microsoft has incorporated into Exchange 2010. Auto-tagging is a concept introduced to help administrators implement retention tags by using machine learning to automate the application of tags. Unfortunately, auto-tagging is a feature that Microsoft has dropped from Exchange 2010 SP1. The reasons why are varied but include poor user and administrator awareness, lack of a GUI to guide people through the intricacies of setting auto-tagging up for a mailbox, and maybe a lack of completeness in terms of the feature itself, or possibly just some bugs. I think auto-tagging is a really good idea and hope that Microsoft will reintroduce it in a future service pack.

Learning how users tag items

The idea behind auto-tagging is simple. The user begins the process by manually applying retention tags to at least 500 items to form a collection of items called the training set. A smaller number is not sufficient to allow Exchange to build a good automatic tagging scheme that might match normal user behaviour. To tag an item, the user can either move the item into a folder that has a default tag or they can apply a specific tag to a specific item. For example, if the user receives regular progress reports for “Project Euro” and they apply the “Retain for five years” tag to these reports, Exchange will learn that any time a new item comes into the user’s Inbox for “Project Euro”, it’s a safe guess that “Retain for five years” is a good tag to apply.

Once they have tagged enough items to provide Exchange with a sufficiently diverse set to learn about their tagging behaviour, the user informs the administrator, who then trains Exchange by pointing it to the set of tagged items. It would be nice if Outlook or Outlook Web App offered the user an option to train Exchange without administrator intervention, but it’s the nature of new technology that some rough edges are inevitable and this is a rough edge for auto-tagging.

When the administrator launches the learning process, Exchange reviews the training set of items and learns from the content of the items and the tags that the user has applied to the items to construct a tagging scheme that it can apply automatically as new items arrive in the user’s inbox. The more tagged items Exchange can learn from, the better the automatic tagging scheme will be. The user can replace an auto-applied tag with another if a mistake is made or an administrator can clear all of the auto-applied tags in a mailbox and restart the learning process if the tags prove to be inappropriate, possibly because the set supplied to Exchange for it to learn about the user’s retention tagging pattern was incomplete or unrepresentative of the usual material that flows into their mailbox.
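Microsoft has not published how Exchange builds its tagging scheme, so the following is only a toy stand-in to illustrate the general idea of learning from a training set. It assumes a much cruder rule than whatever Exchange really does: remember which tag the user applies most often to mail from each sender, and suggest that tag for new mail from the same sender. The function name and the Sender and Tag property names are hypothetical.

```powershell
# Toy illustration only: suggest the tag most often applied to a sender's mail.
# This is NOT Exchange's algorithm, just the simplest possible "learn from examples" rule.
function Get-MajorityTagBySender {
    param(
        [Parameter(Mandatory=$true)] [object[]] $TrainingSet,  # objects with Sender and Tag properties
        [Parameter(Mandatory=$true)] [string]   $Sender
    )
    $candidates = @($TrainingSet | Where-Object { $_.Sender -eq $Sender })
    if ($candidates.Count -eq 0) { return $null }   # nothing learned for this sender: leave untagged

    $top = $candidates |
        Group-Object -Property Tag |
        Sort-Object -Property Count -Descending |
        Select-Object -First 1
    return $top.Name
}

# A tiny hand-made training set (a real one needs at least 500 items).
$trainingSet = @(
    (New-Object PSObject -Property @{ Sender = 'pm@contoso.com';   Tag = 'Retain for five years' }),
    (New-Object PSObject -Property @{ Sender = 'pm@contoso.com';   Tag = 'Retain for five years' }),
    (New-Object PSObject -Property @{ Sender = 'pm@contoso.com';   Tag = 'Delete after 30 days'  }),
    (New-Object PSObject -Property @{ Sender = 'news@contoso.com'; Tag = 'Delete after 30 days'  })
)

Get-MajorityTagBySender -TrainingSet $trainingSet -Sender 'pm@contoso.com'
```

Even this crude rule shows why an unrepresentative training set produces poor tags: the suggestion can only be as good as the examples the user has supplied.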

Steps to implement auto-tagging for a mailbox

To start the process of training Exchange, the administrator first enables a retention policy for the mailbox and then turns on auto-tagging for the mailbox. These commands first assign the “IT Department Retention Policy” retention policy to a mailbox and then enable auto-tagging:

Set-Mailbox -Identity JSmith -RetentionPolicy "IT Department Retention Policy"

Set-MailboxComplianceConfiguration -Identity JSmith -RetentionAutoTaggingEnabled $True

Once auto-tagging is enabled, Exchange accumulates details of the items to which the user manually applies retention tags until it has enough (at least 500) to begin the training process. The details of the items in the training set that Exchange uses to learn about user tagging behaviour include information such as the message subject, the sender, and some key words from the text.

An administrator can check whether sufficient items have been tagged with the Get-MailboxComplianceConfiguration cmdlet. If enough items haven’t been added to the training set, Exchange will tell you how many more are needed before auto-tagging is possible. In this example, the required training count figure is 493 (the number reduces from 500), so the user still has a considerable way to go before auto-tagging is possible. Applying an archive tag does not reduce the required training count.

Get-MailboxComplianceConfiguration -Identity JSmith

Identity                    : ajr.com/Exchange Users/JSmith
RetentionAutoTaggingEnabled : True
RequiredTrainingCount       : 493
CorrectedMailCount          : 0
PredictedMailCount          : 0
ErrorRate                   : 0
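If you check progress for several mailboxes, it can help to turn the raw count into something more readable. The sketch below is a hypothetical helper (Get-TrainingProgress is not an Exchange cmdlet) that works on any object exposing the RetentionAutoTaggingEnabled and RequiredTrainingCount properties shown above; the stand-in object lets it run outside EMS, and the 500-item threshold is taken from the figure quoted in this article.

```powershell
# Hypothetical helper: summarise how far a mailbox is from being trainable.
function Get-TrainingProgress {
    param(
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)] $Config,
        [int] $TrainingSetSize = 500   # threshold quoted in the article
    )
    process {
        New-Object PSObject -Property @{
            TaggedSoFar  = $TrainingSetSize - $Config.RequiredTrainingCount
            StillNeeded  = $Config.RequiredTrainingCount
            ReadyToTrain = ($Config.RetentionAutoTaggingEnabled -and ($Config.RequiredTrainingCount -eq 0))
        }
    }
}

# Stand-in for the output of Get-MailboxComplianceConfiguration shown above.
$sample = New-Object PSObject -Property @{
    RetentionAutoTaggingEnabled = $true
    RequiredTrainingCount       = 493
}
$progress = $sample | Get-TrainingProgress
"{0} tagged, {1} still needed" -f $progress.TaggedSoFar, $progress.StillNeeded
```

In EMS you would pipe the real object instead, for example Get-MailboxComplianceConfiguration -Identity JSmith | Get-TrainingProgress.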

After the training set is built, you can begin the training process with the Start-RetentionAutoTagLearning cmdlet. Exchange flags an error if insufficient tagged items are present in the mailbox. Be aware that Exchange also resets the required training count to 500 if you run the cmdlet before sufficient items have been tagged, so you’ll have to begin to accumulate a new set of manually tagged items. However, let’s assume that we have done the right thing and tagged 500 items before we start the training process. Checking the mailbox compliance configuration now reveals:

Identity                    : ajr.com/Exchange Users/JSmith
RetentionAutoTaggingEnabled : True
RequiredTrainingCount       : 0
CorrectedMailCount          : 0
PredictedMailCount          : 0
ErrorRate                   : 0

As the required training count has now reached zero, we can start the training process:

Start-RetentionAutoTagLearning -Identity JSmith -Train
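Because training too early resets the required training count to 500, it is worth guarding the training command with a check. The following is a minimal sketch of such a guard; Invoke-TrainingIfReady is a hypothetical helper, and the training command is passed in as a script block so that the example can run outside EMS with a stand-in object.

```powershell
# Hypothetical guard: only run the training action when no more tagged items are required.
function Invoke-TrainingIfReady {
    param(
        [Parameter(Mandatory=$true)] $Config,                    # object with RequiredTrainingCount
        [Parameter(Mandatory=$true)] [scriptblock] $TrainAction  # what to run when ready
    )
    if ($Config.RequiredTrainingCount -gt 0) {
        Write-Warning ("{0} more tagged items needed; training skipped." -f $Config.RequiredTrainingCount)
        return $false
    }
    & $TrainAction | Out-Null
    return $true
}

# Stand-in object; training is skipped because 493 items are still required.
$notReady = New-Object PSObject -Property @{ RequiredTrainingCount = 493 }
Invoke-TrainingIfReady -Config $notReady -TrainAction { 'training...' }

# In EMS the same guard might be used like this:
# $cfg = Get-MailboxComplianceConfiguration -Identity JSmith
# Invoke-TrainingIfReady -Config $cfg -TrainAction { Start-RetentionAutoTagLearning -Identity JSmith -Train }
```

Passing the action as a script block keeps the check separate from the cmdlet call, so the same guard can be dry-run against a test object before it is pointed at a real mailbox.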

Once Exchange has been trained to auto-tag new items in a mailbox, the user can keep track of the effect of auto-tagging by checking the tags placed on new items in the Inbox. It’s a good indication that things are progressing well if you select an item in the Inbox, find that a retention policy tag is present, and that it’s a tag that you would reasonably expect to place on the item. You can always adjust matters for the few items that don’t get a good automatic tag by applying a manual tag. On the other hand, if checking reveals that the training set of items used by Exchange to create its tagging scheme results in it applying inappropriate tags to new items that you simply can’t use, you will have to clear the set of automatic tags and begin the process again. Use this command to clear the set of automatic tags:

Start-RetentionAutoTagLearning -Identity JSmith -Clear

You then have to disable and re-enable auto-tagging for the mailbox to restart the process of accumulating a new training set of manually tagged items before you can retrain Exchange.
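The disable and re-enable step uses the same cmdlet that turned auto-tagging on in the first place. As a minimal sketch, the two calls can be wrapped in a small function; Restart-AutoTagCollection is a hypothetical name, and the function only works in a session where Set-MailboxComplianceConfiguration is available (Exchange 2010 RTM EMS).

```powershell
# Hypothetical wrapper: switch auto-tagging off and back on for one mailbox so that
# Exchange starts accumulating a fresh training set of manually tagged items.
function Restart-AutoTagCollection {
    param([Parameter(Mandatory=$true)] [string] $Identity)

    Set-MailboxComplianceConfiguration -Identity $Identity -RetentionAutoTaggingEnabled $false
    Set-MailboxComplianceConfiguration -Identity $Identity -RetentionAutoTaggingEnabled $true
}

# Usage in EMS, after the automatic tags have been cleared:
# Restart-AutoTagCollection -Identity JSmith
```

Afterwards, Get-MailboxComplianceConfiguration should presumably report a required training count of 500 again, though I have not confirmed that the counter resets at this exact point.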

If you don’t want to check the tags on individual items, you can use the Start-RetentionAutoTagLearning cmdlet to validate that Exchange is successfully applying auto-tagging for a mailbox. In this example we see that the “SuccessRate” is reported as 1, indicating that auto-tagging is proceeding normally. The other outputs indicate whether items that should be tagged are blank or have otherwise failed to be tagged for some reason.

Start-RetentionAutoTagLearning -Identity JSmith -CrossValidate -Segments 16

SuccessRate                             BlankRate                               FailureRate
-----------                             ---------                               -----------
1                                       0                                       0
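The three rates are easier to reason about with a small worked example. The sketch below is my own interpretation, not documented behaviour: it assumes that a "success" is an item whose predicted tag matches the tag the user applied, a "blank" is an item for which no tag was predicted, and a "failure" is everything else. Measure-TagPrediction and the Expected and Predicted property names are hypothetical.

```powershell
# Assumed interpretation of the three rates; Exchange's exact definitions are not documented here.
function Measure-TagPrediction {
    param([Parameter(Mandatory=$true)] [object[]] $Results)   # objects with Expected and Predicted

    $total   = $Results.Count
    $blank   = @($Results | Where-Object { -not $_.Predicted }).Count
    $success = @($Results | Where-Object { $_.Predicted -and ($_.Predicted -eq $_.Expected) }).Count

    New-Object PSObject -Property @{
        SuccessRate = $success / $total
        BlankRate   = $blank / $total
        FailureRate = ($total - $success - $blank) / $total
    }
}

# Four validation items: two tagged correctly, one left blank, one tagged wrongly.
$results = @(
    (New-Object PSObject -Property @{ Expected = 'Five years'; Predicted = 'Five years' }),
    (New-Object PSObject -Property @{ Expected = '30 days';    Predicted = '30 days'    }),
    (New-Object PSObject -Property @{ Expected = '30 days';    Predicted = $null        }),
    (New-Object PSObject -Property @{ Expected = 'Five years'; Predicted = '30 days'    })
)
$rates = Measure-TagPrediction -Results $results
"Success {0}, blank {1}, failure {2}" -f $rates.SuccessRate, $rates.BlankRate, $rates.FailureRate
```

Under this reading the three rates always sum to 1, which is consistent with the output above, where a success rate of 1 leaves nothing for the other two columns.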

The Segments parameter divides up a mailbox into different sections to learn from the tagged contents of each section. By default, Exchange divides a mailbox into ten segments. The minimum value is two while the maximum is 64. If the contents of a mailbox vary extensively, you may gain more accurate learning results by dividing a mailbox into additional segments. On the other hand, you can use a smaller number of segments if a mailbox contains items that are mostly similar.
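The parameter name CrossValidate suggests conventional k-fold cross-validation, where the tagged items are split into segments and each segment in turn is held back to check what was learned from the others. That is an assumption on my part rather than something Microsoft has confirmed. The sketch below shows only the splitting step, using a hypothetical Split-IntoSegments function that enforces the same 2 to 64 range and default of ten.

```powershell
# Hypothetical illustration of dividing items into segments for k-fold style validation.
function Split-IntoSegments {
    param(
        [Parameter(Mandatory=$true)] [object[]] $Items,
        [ValidateRange(2, 64)] [int] $Segments = 10   # same limits and default as described above
    )
    $buckets = @{}
    for ($i = 0; $i -lt $Items.Count; $i++) {
        $key = $i % $Segments                         # deal items out round-robin
        if (-not $buckets.ContainsKey($key)) { $buckets[$key] = @() }
        $buckets[$key] += $Items[$i]
    }
    # Emit one object per non-empty segment, in segment order.
    foreach ($key in ($buckets.Keys | Sort-Object)) {
        New-Object PSObject -Property @{ Segment = $key; Items = $buckets[$key] }
    }
}

# Ten stand-in items split three ways: each segment is validated against the rest.
$folds = @(Split-IntoSegments -Items (1..10) -Segments 3)
foreach ($fold in $folds) {
    $learnFrom = 10 - $fold.Items.Count
    "Segment {0}: validate on {1} items, learn from {2}" -f $fold.Segment, $fold.Items.Count, $learnFrom
}
```

With more segments, each validation pass holds back fewer items and learns from more of them, which may be why a larger value helps when mailbox contents vary widely.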

No UI anywhere!

Unfortunately, there is no user interface for either administrators or users available in EMC or the Exchange Control Panel to enable auto-tagging, view details of the items used to determine how auto-tagging will happen, start or validate the collection process, conduct training, or review and tweak the auto-tagging scheme afterwards. Everything has to be done through EMS. This is a pity as it will inevitably limit the impact of the functionality within the user community. After all, if they don’t know about a feature, they are hardly likely to use it unless they plumb the depths of TechNet or some other literature to discover that auto-tagging is possible. Hopefully, Microsoft will provide the necessary UI in the future.

In conclusion

Few users enjoy filing items into appropriate folders in their mailbox. Approximately the same number will enjoy applying retention tags. Auto-tagging therefore contributes to the effective use of retention tags by relieving users of a mundane task. Over time, and probably following several training exercises to develop an effective view of how a user would manually tag new items, auto-tagging has the potential to add some real value at low cost. Auto-tagging is interesting technology that may ease the introduction of retention policies in some quarters. Others will find its lack of GUI and documentation to be too high a barrier to warrant the investment of valuable administrator time. These cases are likely to be in the majority, which means that auto-tagging will remain an exercise in computer science that needs to be revamped and made more user- and administrator-friendly. As we all know, Microsoft’s product development history demonstrates that they are very persistent when it comes to making features work right (eventually). It will be interesting to see whether auto-tagging version 2.0 or even 3.0 is the one that succeeds, or whether this will turn out to be an example of a bright idea that was ahead of its time and ends up on the great byte scrapheap.

About Tony Redmond

Lead author for the Office 365 for IT Pros eBook and writer about all aspects of the Office 365 ecosystem.

4 Responses to Auto-tagging in Exchange 2010

  1. mdrooij says:

    So, you had to scrap it from the SP1 book and thought shame, let’s post it on my blog 🙂

    Good stuff!

  2. Mark Brady says:

    Wow. This is such a great idea. There needs to be some way to apply retention by sender, subject, content. We get a daily email with downtown traffic conditions. That email has the shelf-life of about 10 minutes… why do we apply a default retention policy of 2 years?

    Is there any hopes this comes back?

    Is there a third party product that will do this?
