Logical Chain of Components (Alert Storm Prevention)
We need an option to associate components with each other, to prevent an alert storm caused by the failure of a central component.
For example, if a central router at a remote location fails, we get alerts for every component monitored at that location, even though each one is still available locally.
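To illustrate the kind of association meant here, a minimal Python sketch follows. It is not SCOM code and uses no SCOM API; the `Component` class, the `parent` field, and the `root_causes` function are hypothetical names made up for this example. It assumes each component declares at most one upstream component and that every parent named is itself in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    # Hypothetical field: the upstream component this one depends on.
    parent: Optional[str] = None


def root_causes(components, failed):
    """Return the failed components that have no failed upstream component."""
    by_name = {c.name: c for c in components}
    failed = set(failed)
    result = []
    for name in sorted(failed):
        parent = by_name[name].parent
        seen = set()  # guards against accidental cycles in the chain
        suppressed = False
        while parent is not None and parent not in seen:
            if parent in failed:
                # Something upstream is already down: suppress this alert.
                suppressed = True
                break
            seen.add(parent)
            parent = by_name[parent].parent
        if not suppressed:
            result.append(name)
    return result


components = [
    Component("branch-router"),
    Component("branch-switch", parent="branch-router"),
    Component("file-server", parent="branch-switch"),
    Component("hq-server"),
]

# The router fails and everything behind it becomes unreachable.
print(root_causes(components, ["branch-router", "branch-switch", "file-server"]))
```

With the chain declared, only the router would raise an alert; a failure of the file server alone would still alert as usual.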
Saiyad Rahim commented
SCOM's lack of this feature is becoming a pain as we get more of our regional offices onto the WAN, connected via switches and routers.
The flood of alerts spamming the Service Desk has angered the SD Manager as well, which is totally understandable.
There should be a feature to stop further alerts from connected items if a "Core" item/device, such as a switch, router, OpsMgr server, or SCOM GW, goes down.
During discovery, SCOM should be able to ask whether the item is a Core device or not. That would be a starting point from which the user/SCOM admin can control whether this device's failure should send out further alerts for its connected or underlying devices/objects.
Jon Sykes commented
I agree, alert correlation is the holy grail. You won't find this feature in any product and probably never will. Christian hits the nail on the head though! There is no reason to generate health service heartbeat alerts etc. when a SCOM GW goes offline (network down etc.).
What's being asked for is a correlation engine to determine the root cause. That will likely never be done, because SCOM can monitor Microsoft and third-party technology. What they could do is suppress a flood of alerts that come in within a matter of seconds or a minute, and possibly surface that as a single "event" or "outage".
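The time-window idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not anything SCOM provides: the `group_into_outages` function, the `(timestamp, source)` tuple format, and the 60-second default window are all made up for the example.

```python
def group_into_outages(alerts, window_seconds=60):
    """Group alerts into bursts.

    alerts: list of (timestamp_in_seconds, source_name) tuples (assumed format).
    An alert joins the current burst if it arrives within window_seconds
    of the previous alert in that burst; otherwise it starts a new one.
    """
    outages = []
    for ts, source in sorted(alerts):
        if outages and ts - outages[-1]["last"] <= window_seconds:
            outages[-1]["sources"].append(source)
            outages[-1]["last"] = ts
        else:
            outages.append({"first": ts, "last": ts, "sources": [source]})
    return outages


alerts = [(0, "scom-gw"), (5, "agent-1"), (20, "agent-2"), (300, "sql-db")]
for outage in group_into_outages(alerts):
    # One notification per burst instead of one per alert.
    print(outage["first"], len(outage["sources"]), outage["sources"])
```

This does no root-cause analysis at all. It only collapses a burst into one notification, so an unrelated alert that happens to arrive during the burst would be folded in with it.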
Christian Stückrath commented
Same issue when a SCOM GW fails (one alert should be enough, not 200+ SMS).
This needs alert correlation; we get multiple alerts for the same or similar issue.
Eiba Haddad commented
Agreed, adding intelligence and alert correlation would be very useful.