Grey agents

Anything that runs a health service in SCOM can go grey (or gray 🙄). That includes agents, gateways and management servers. This article seeks to demystify identifying the agents that have gone grey, because between using PowerShell and SQL, it's not as easy as it should be.

Here's the current SQL query I use. The logic is:

  1. It targets the health service rather than Windows computer, so each result is unique.
  2. It only returns the unavailable ones: IsAvailable=0
  3. It excludes anything in maintenance mode: IsInMaintenanceMode=0

Here's the current PowerShell script I'm using. In theory it should return the same info as the SQL query.

Open up both articles so you can reference them. Note that I use Microsoft.SystemCenter.HealthService to get anything that runs a health service; I don't just want agents. The interesting line in PowerShell is this:

$Instance = Get-SCOMClass -Name "Microsoft.SystemCenter.HealthService" | Get-SCOMClassInstance | Where-Object {$_.IsAvailable -eq $false -and $_.InMaintenanceMode -eq $false} | Sort-Object DisplayName

If I then run this command, I get 9 agents returned:

$Instance | ft DisplayName, HealthState, InMaintenanceMode, IsAvailable

The SQL query only returns 3 agents! Why the difference? Both are filtering on the same things, i.e., IsAvailable=0 and IsInMaintenanceMode=0 (InMaintenanceMode in PowerShell), so on paper they should match up.
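One way to see exactly which health services account for the gap is to diff the two name lists. This is only a rough sketch and not part of the original script: the server names are made-up placeholders, and $sqlNames is assumed to hold the DisplayName values copied or exported from the SQL results.

```powershell
# Placeholder data: substitute the DisplayName values from your own results.
# In a live session the PowerShell side could come from the class instance
# objects via: ... | Select-Object -ExpandProperty DisplayName
$psNames  = @('srv01.contoso.com', 'srv02.contoso.com', 'srv03.contoso.com', 'srv04.contoso.com')
$sqlNames = @('srv01.contoso.com', 'srv03.contoso.com')

# Compare-Object is case-insensitive by default, which suits host names.
# A SideIndicator of '<=' marks names found only in the reference (PowerShell) list.
$onlyInPowerShell = Compare-Object -ReferenceObject $psNames -DifferenceObject $sqlNames |
    Where-Object { $_.SideIndicator -eq '<=' } |
    Select-Object -ExpandProperty InputObject

$onlyInPowerShell
```

Whatever is left in $onlyInPowerShell is the set worth looking at more closely in the console.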

What I noticed in the PowerShell output is that the HealthState column shows a mix of Uninitialized and Success. If you search for the computer in Administration > Agent Managed, the Uninitialized ones have a white circle whereas the Success ones are actually grey.

The Uninitialized ones often don't have any (or many) properties in Monitoring > Windows Computers. What's even more interesting is the Success ones match up with the SQL query.
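To make that split visible, the results can be grouped by HealthState and then narrowed to the Success rows. The sketch below uses mock objects with placeholder names so it runs anywhere; only the property names (DisplayName, HealthState, IsAvailable, InMaintenanceMode) mirror the real class instances. On real objects HealthState is an enum rather than a string, but as far as I know the -eq comparison against 'Success' behaves the same way.

```powershell
# Mock rows standing in for the unavailable, not-in-maintenance health services.
# Names are placeholders, not real results.
$grey = @(
    [pscustomobject]@{ DisplayName = 'srv01.contoso.com'; HealthState = 'Success';       IsAvailable = $false; InMaintenanceMode = $false }
    [pscustomobject]@{ DisplayName = 'srv02.contoso.com'; HealthState = 'Uninitialized'; IsAvailable = $false; InMaintenanceMode = $false }
    [pscustomobject]@{ DisplayName = 'srv03.contoso.com'; HealthState = 'Success';       IsAvailable = $false; InMaintenanceMode = $false }
    [pscustomobject]@{ DisplayName = 'srv04.contoso.com'; HealthState = 'Uninitialized'; IsAvailable = $false; InMaintenanceMode = $false }
)

# How many rows fall under each HealthState
$grey | Group-Object HealthState | Select-Object Name, Count

# Keep only the Success rows, i.e. the ones that lined up with the SQL query here
$matchingSql = $grey | Where-Object { $_.HealthState -eq 'Success' }
$matchingSql | Format-Table DisplayName, HealthState, InMaintenanceMode, IsAvailable
```

Filtering on Success is a judgment call based on what I saw in this environment, so treat it as a starting point rather than a rule.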

This isn't always accurate, however. I've seen agents showing as Uninitialized that did have properties, where the cause was something more bizarre, like SCCM suspending the agent. Hopefully these are the exception.

So where does this leave us? For now, I'm sticking with the current SQL query and PowerShell script as the most accurate way to identify grey health services. It's great that the Success ones match up, and I'll keep testing and update this with new findings as needed.
