Monitoring last backup on SQL availability group database

Problem

I wanted to monitor database backup age on SQL Availability Group (AG) databases, but only on the primary replica, not secondary replicas as they weren't being backed up.

Here's a summary of my experience for future reference. I think I got there in the end - still testing - but this was harder than I thought it would be.

First, here's the monitor that does the job, we don't need to build one!

  • Name: Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus
  • MP: Microsoft.SQLServer.Windows.Monitoring
It's disabled by default and doesn't generate an alert (which will cause us headaches later). There's also an interesting override called Track 'Availability Group Backup Preferences' that looks like it might do what you want. I tested it and it still generated duplicate alerts.

Solution

Write a PowerShell discovery that runs a SQL query to return all dbs hosted by the primary replica and add them to a group then enable the monitor for the group - easy!

Here's the SQL query that gets the data we need:

SELECT distinct ad.MachineName_EA844CA4_B2CD_69E3_D8A9_C2E4489C53C7 as Server
, ad.InstanceName_2CC762EA_9008_8DED_CEEA_C0A731372EA8 as Instance
, ad.DisplayName as DisplayName
, ad.AvailabilityReplicaId_6271191B_CCD9_7A8B_ECA3_17F409F1C332
, ad.AvailabilityDatabaseName_4400632F_8F79_864A_A1FD_5ACF68FB111A as DbName
FROM [OperationsManager].[dbo].[MTV_Microsoft$SQLServer$Windows$AvailabilityReplica] ar
inner join [OperationsManager].[dbo].[MTV_MicrosoftSQLServerWindowsDatabaseReplica] ad
on ar.MachineName_EA844CA4_B2CD_69E3_D8A9_C2E4489C53C7 = ad.MachineName_EA844CA4_B2CD_69E3_D8A9_C2E4489C53C7
AND ar.AvailabilityGroupName_2B65F5C7_DE13_BC25_C1EB_AA53047BB141=ad.AvailabilityGroupName_BE1A08AB_595A_3684_51F5_BDA703BC96C9
where ar.Role_52418D34_8CEA_2508_880C_DE9B8EE7DAB2 = 'Primary'

We need all those columns returned in our discovery as it uniquely identifies the object - that's my rationale anyway. If you don't have them, SCOM whinges at mp import.

Notice the bit in red, it's a work around for this problem.

Now I build my discovery script being sure to escape the $ sign in the query with `$ to keep PowerShell happy.

I import my mp then enable these overrides. Ignore the group name, it was for testing.

After a few minutes we get some alerts which is great but they're ugly. The alert name is the dot name of the monitor and there's no alert description:

At this point things are working reasonably well. Our group is getting populated and we're getting alerts but we need to fix them. To do that follow this article. In brief, you need to add the code below to an unsealed mp:

<Overrides>
<MonitorPropertyOverride ID="Aliase76ce3522d9f462da4ebvae908294433OverrideForMonitorMicrosoftSQLServerWindowsMonitorAvailabilityDatabaseHealthBackupStatusForContextSQLGroupAgPrimaryDb.AlertParameter1" Context="MicrosoftSQLServerWindowsDiscovery!Microsoft.SQLServer.Windows.AvailabilityDatabaseHealth" Enforced="false" Monitor="MicrosoftSQLServerWindowsMonitoring!Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus" Property="AlertParameter1">
<Value>$Data/Context/Property[@Name='DatabaseName']$</Value>
</MonitorPropertyOverride>
<MonitorPropertyOverride ID="Aliase76ce3522d9f462da4ebvae918294433OverrideForMonitorMicrosoftSQLServerWindowsMonitorAvailabilityDatabaseHealthBackupStatusForContextSQLGroupAgPrimaryDb.AlertParameter2" Context="MicrosoftSQLServerWindowsDiscovery!Microsoft.SQLServer.Windows.AvailabilityDatabaseHealth" Enforced="false" Monitor="MicrosoftSQLServerWindowsMonitoring!Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus" Property="AlertParameter2">
<Value>$Data/Context/Property[@Name='DaysSinceBackup']$</Value>
</MonitorPropertyOverride>
<MonitorPropertyOverride ID="Aliase76ce3522d9f462da4ebbae908294433OverrideForMonitorMicrosoftSQLServerWindowsMonitorAvailabilityDatabaseHealthBackupStatusForContextSQLGroupAgPrimaryDb" Context="MicrosoftSQLServerWindowsDiscovery!Microsoft.SQLServer.Windows.AvailabilityDatabaseHealth" Enforced="false" Monitor="MicrosoftSQLServerWindowsMonitoring!Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus" Property="AlertMessage">
<Value>ED499E7E-D275-9E53-FDA6-B5A0EF526CB6</Value>
</MonitorPropertyOverride>
</Overrides>

<Presentation>
<StringResources>
<StringResource ID="Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus.AlertMessage"/>
</StringResources>
</Presentation>

<DisplayString ElementID="Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus.AlertMessage">
<Name>MSSQL on Windows: Database backup should be performed</Name>
<Description>Database '{0}' has not been backed up for {1} day(s).</Description>
</DisplayString>

I added them to my SQL overrides mp Microsoft.SQLServer.Windows.Monitoring.Overrides.

Now alerts look like this:


I messed around with adding more info because I wanted server names and SQL instance names in the description but it didn't work. I tried adding this (below) but it broke the monitoring (it was no longer monitored) which is bizarre:

<MonitorPropertyOverride ID="Aliase76ce3522d9f462da4ebvae918294433OverrideForMonitorMicrosoftSQLServerWindowsMonitorAvailabilityDatabaseHealthBackupStatusForContextSQLGroupAgPrimaryDb.AlertParameter3" Context="MicrosoftSQLServerWindowsDiscovery!Microsoft.SQLServer.Windows.AvailabilityDatabaseHealth" Enforced="false" Monitor="MicrosoftSQLServerWindowsMonitoring!Microsoft.SQLServer.Windows.Monitor.AvailabilityDatabaseHealth.BackupStatus" Property="AlertParameter3">
<Value>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/MachineName$</Value>
</MonitorPropertyOverride>

I think the problem is because that information isn't captured in <AlertSettings> in the monitor itself.

Final notes...

In the script discovery I had to return these properties or the mp wouldn't import:

  • $ObjectInstance.AddProperty("$MPElement[Name='System!System.Entity']/DisplayName$", $DisplayName)
  • $ObjectInstance.AddProperty("$MPElement[Name='SqlLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$", $Server)
  • $ObjectInstance.AddProperty("$MPElement[Name='SqlLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$", $Instance)
  • $ObjectInstance.AddProperty("$MPElement[Name='SqlLib!Microsoft.SQLServer.Core.AvailabilityGroupHealth']/UniqueId$", $Ida)
  • $ObjectInstance.AddProperty("$MPElement[Name='SqlLib!Microsoft.SQLServer.Core.AvailabilityDatabaseHealth']/DatabaseName$", $DbName)

That took me a while to figure out but this article helped the penny drop.

I also added a custom state view that sits under the SQL views from Microsoft's sealed mps:


Now we watch and wait and see if this thing works 👀

Links

  • Group discovery mp source
  • Override mp to enable alert source (warning contains all my other overrides)
  • SQL view mp source

Comments