Exchange 2013 simplifies DAG management


As I work through the process of understanding Exchange 2013 so that I can write about it for “Microsoft Exchange 2013 Inside Out”, various odd thoughts come into my mind. One that arrived recently was that Microsoft has dumbed down the new Exchange Administration Center (EAC) when it comes to Database Availability Group (DAG) management. On the surface, it seems that the Exchange Management Console (EMC) in Exchange 2010 gives administrators more control over the DAG, member servers, and databases, but when you work things through, the situation is not quite as clear-cut.

How EAC displays DAG properties

The DAG was brand-new in Exchange 2010. Accordingly, although the developers did their very best to make the DAG easy to work with, some flaws exist. For example, it must have seemed like a very good idea to display the copy queue length and replay queue length for a database copy to flag potential replication problems to administrators. It’s absolutely true that knowing that logs are accumulating on these queues is an indication that all might not be right in the DAG, but the problem is that EMC only ever shows a snapshot of replication activity that’s accurate when EMC checks queue lengths. To be totally accurate, you’d need to have EMC refresh its data at a frequent interval, something that would impose a load on Exchange.

The processing overhead required to query servers about replication activity might be acceptable for a small DAG where Exchange only needs to check ten or so database copies spread over two or three servers. I can imagine big problems if you asked EMC to check the status of a hundred databases spread over ten servers. Apart from the processing load, it would probably take EMC a few minutes to collect all the data from the servers and display it, and by that point the data would be stale and need to be refreshed again, so we get into a continuous loop of fetch and display. Not good…

Speaking of stale data, you might even get into a situation where EMC displays the famous copy queue length of 9,223,372,036,854,775,766 (see below), which seems like quite a lot of replication to get through! The reason, as explained in Tim McMichael’s excellent blog, is that although the database copy in question is reported as “Healthy”, a divergence has opened up (potentially because the Replication Service on the server hosting the copy is stopped) between two values: the timestamp of the last available log generated by the active copy, which is made available to DAG members through the cluster registry, and the system time on the server hosting the problematic copy. If the divergence is more than 12 minutes, it could cause a problem if Active Manager attempted to activate this database copy, because some logs might exist for the previously active copy that would be ignored if this copy were brought online. Cue hole in database syndrome…

That's a large copy queue length!

Exchange detects these conditions and considers that replication is “stale”. To stop automatic activation, Exchange sets the copy queue length to 9223372036854775766 on the very sensible basis that such a number is going to exceed the AutoDatabaseMountDial setting for the server and so prevent Active Manager activating the copy automatically.
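
To make the logic concrete, here is a minimal Python sketch of the behaviour described above. It is an illustration only, not Exchange's actual code: the function names and the way the comparison is made are my assumptions. The 12-minute threshold and the sentinel value come from the text, and the mount dial limits (0, 6, and 12 logs) are the documented values for the Lossless, GoodAvailability, and BestAvailability settings.

```python
from datetime import datetime, timedelta

# Sentinel copy queue length reported for a stale copy (value from the article).
STALE_COPY_QUEUE_LENGTH = 9223372036854775766

# Divergence beyond which replication is considered stale (from the article).
STALE_THRESHOLD = timedelta(minutes=12)

# Maximum number of missing logs tolerated by each AutoDatabaseMountDial setting.
MOUNT_DIAL_LIMITS = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}


def reported_copy_queue_length(actual_cql, last_log_time, local_time):
    """Hypothetical helper: return the CQL that would be reported for a copy.

    last_log_time stands in for the timestamp of the last log generated by
    the active copy; local_time is the system time on the server hosting
    the passive copy.
    """
    if local_time - last_log_time > STALE_THRESHOLD:
        # Replication is stale, so report the sentinel instead of the real value.
        return STALE_COPY_QUEUE_LENGTH
    return actual_cql


def can_auto_mount(cql, dial="GoodAvailability"):
    """Hypothetical helper: simplified check of a CQL against the mount dial."""
    return cql <= MOUNT_DIAL_LIMITS[dial]


if __name__ == "__main__":
    now = datetime(2013, 1, 1, 12, 0)
    fresh = reported_copy_queue_length(3, now - timedelta(minutes=5), now)
    stale = reported_copy_queue_length(3, now - timedelta(minutes=13), now)
    print(fresh, can_auto_mount(fresh))
    print(stale, can_auto_mount(stale, "BestAvailability"))
```

The point of the sketch is that the sentinel is far larger than any mount dial limit, so a stale copy fails the check even under the most permissive setting.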

Getting back to EAC, the only way that you now see details of the copy queue length and replay queue length for a database copy is to select the relevant copy and then click the View Details link. This exposes all the relevant information, meaning that this isn’t another case where EAC is less functional than EMC – it’s just different and arguably a better implementation. If you prefer not to go through the somewhat tiresome select and click routine to check multiple database copies, you can simply run the Get-MailboxDatabaseCopyStatus command to review the replication status for all databases, or those belonging to a specific server or DAG.

I don’t mind that Microsoft has simplified matters by not displaying replication queue information for the DAG. It is in line with other efforts to simplify DAG management, such as removing the need to collapse DAG networks when DAGs extend across multiple subnets. In fact, Exchange 2013 prefers that you leave DAG network management to it.

Simplification and automation are good, so I approve of what’s been done to make DAG management easier in Exchange 2013. Once Microsoft fixes the fit-and-finish problems exhibited by the current version of EAC, it seems like some real progress will have been made over EMC.

About Tony Redmond

Lead author for the Office 365 for IT Pros eBook and writer about all aspects of the Office 365 ecosystem.

7 Responses to Exchange 2013 simplifies DAG management

  1. Will Martin says:

    Tony, I have to disagree on your discussion of DB copy status. If I’m looking at a database, I want to see how ALL copies of it are doing, and I am perfectly happy refreshing my view of the individual database’s copies – or of having Exchange do it for me automatically, since it is ONLY THE SINGLE DATABASE, or SERVER COPIES that I am worried about at that moment. We have four copies of each database, so if I have to open four individual property pages to see the status of a single database, I’m going back to the Shell and running Get-MailboxDatabase UserDB1 | Get-MailboxDatabaseCopyStatus – and the EAC IS less functional for my requirements than the EMC is.

  2. Hi Tony,

    You make the statement that “EMC only ever shows a snapshot of replication activity that’s accurate when EMC checks queue lengths. To be totally accurate, you’d need to have EMC refresh its data at a frequent interval, something that would impose a load on Exchange.”

    I’m not sure I understand how this is different from EAC or EMS behavior. When you look at the information in EAC, you are seeing the status as of the last fetch. When you run Get-MDCS in the shell, you are seeing the current status, which also does not refresh. The fact that you need to query status from our tools to see the latest and greatest values has remained largely unchanged over several versions. It would not be efficient or prudent to build in automatic refresh behavior.

    You also state that “the only way that you now see details of the copy queue length and replay queue length for a database copy is to select the relevant copy and then click the View Details link.” Yet, your own screen shot clearly shows the CQL visible for each database copy. So you don’t need to click View Details to see CQL. Yes, you do need to click View Details to see RQL, but is that really that big of a deal? And to be honest, not displaying RQL at a glance may have actually been an oversight on the part of the EAC team. I’ll chat with them and see if we can take a change that shows RQL at a glance with CQL.

    My $.02.

    • Hi Scott,

      Knowing that you monitor any statement made about DAGs, it’s not surprising that you’d come to debate these points! So much the better in the pursuit of truth…

      When I say that EMC only ever shows a snapshot of replication activity, I’m referring to the fact that many administrators believe that the data shown by EMC is refreshed to reflect almost real-time information. Of course it is not, for the reasons that I outline: it would be too much of an overhead to cycle and fetch replication data at frequent intervals for anything but a very small DAG.

      My belief is that EAC does a better job because it doesn’t create the impression of real-time data. Instead, when you look at EAC, I don’t think you could assume that you’re looking at anything but a static representation, including the information in the details pane. This is an improvement in my view because it removes doubt. If people want to get more frequent updates, they can do so through EMS. I agree that it would not be efficient or prudent to build in an automatic refresh interval. In fact, I’d go further and say that it would be silly.

      I think my statement about the CQL and RQL is correct. EAC shows static information, so the data that you view could be very stale. It is therefore logical that you’d have to force EAC to fetch up to date information by viewing the properties of a database copy if you wanted to be sure that you’re looking at the latest replication information. Even so, that data is becoming staler as you look at it.

      In a nutshell, I still think EAC is better than EMC is when displaying DAG information.

      TR

  3. I hadn’t heard of any customers with the misperception that EMC doesn’t need to be refreshed. My point is this…the underlying behavior has not changed. Whether you use EMC or EAC, you are still executing Get-MDCS under the covers when you open the view or refresh the view. There is no difference in behavior here, nor is there a difference in the information shown (except that in EMC, RQL could be seen at a glance, whereas EAC has it under database copy properties).

    As there is no business logic in any of our management GUIs, the information is always coming from PowerShell at the time the task is invoked. And unless and until you re-run the task, that information, as you point out, will be stale the moment it is printed.

    What I recommend for admins is that it is better to keep tabs on things like CQL and RQL by using perfmon and the appropriate perf counters. These do update automatically, and thresholds and alerts can be configured, as well.

    All that having been said, I do much prefer EAC over EMC (and I like EMC a lot, too).

    • Hi Scott,

      I totally accept that you’re running Get-MailboxDatabaseCopyStatus under the covers when EMC or EAC retrieves information about a database copy. The point is that EMC displays information in such a way that it’s easy for admins to make assumptions (and many have, as has been reported at conferences or in social fora). I think EAC makes it much harder to come to conclusions.

      TR
