Logical Chain of Components (Alert Storm Prevention)

We need an option to associate components with each other to prevent the alert storm caused by the failure of a central component.
For example, if a central router at a remote location fails, we get alerts from every component monitored at that location, even though each one is still available locally.
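
The kind of association described above can be sketched as a simple parent/child chain. This is only an illustration of the idea, not a SCOM feature or API: the `Component` class and the `root_cause` and `alerts_to_send` functions are hypothetical names, and the sketch assumes every component has at most one upstream parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    parent: Optional["Component"] = None  # hypothetical: upstream component this one depends on
    is_up: bool = True


def root_cause(component):
    # Walk up the dependency chain; the topmost failed ancestor explains the failure.
    cause = component
    node = component.parent
    while node is not None:
        if not node.is_up:
            cause = node
        node = node.parent
    return cause


def alerts_to_send(components):
    # One alert per distinct root cause instead of one per failed component.
    return sorted({root_cause(c).name for c in components if not c.is_up})


# A remote site: the router fails, so the servers behind it look down as well.
router = Component("branch-router", is_up=False)
servers = [Component(f"server-{i}", parent=router, is_up=False) for i in range(3)]
print(alerts_to_send([router, *servers]))
```

With the router marked as the parent, the four failures collapse into a single alert for the router. A server that fails while the router is still up would still raise its own alert.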

203 votes
Heiko shared this idea

5 comments

  • Saiyad Rahim commented

    Agree.
    SCOM's lack of this feature is becoming a pain as more of our regional offices are connected over the WAN via switches and routers.

    The flood of alerts spamming the Service Desk has angered the SD Manager as well, which is totally understandable.
    There should be a feature to stop further alerts from connected items if a "core" item/device such as a switch, router, OpsMgr server, or SCOM GW goes down.

    During discovery, SCOM should be able to ask whether an item is a core device. That would be a starting point from which the user/SCOM admin can control whether the device's failure should send out further alerts for its connected/underlying devices/objects.

  • Jon Sykes commented

    I agree, alert correlation is the holy grail. You won't find this feature in any product and probably never will. Christian hits the nail on the head though! There is no reason to generate health service heartbeat alerts etc. when a SCOM GW goes offline (network/down etc.).

    What's being asked for is a correlation engine to determine the root cause. That will likely never be done, because SCOM can monitor MSFT and 3rd-party technology. What they could do is suppress a flood of alerts that come in within a matter of seconds or a minute, and possibly surface that as a single "event" or "outage".
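
The time-window suppression suggested above could look roughly like the following. This is a minimal sketch and not something SCOM provides: `group_into_outages` and its `window_seconds` parameter are hypothetical, alerts are assumed to be `(timestamp_in_seconds, message)` pairs, and an alert joins the current outage when it arrives within `window_seconds` of the previous alert.

```python
def group_into_outages(alerts, window_seconds=60):
    # Sort by timestamp, then start a new group whenever the gap
    # to the previous alert exceeds the window.
    outages = []
    for ts, message in sorted(alerts):
        if outages and ts - outages[-1][-1][0] <= window_seconds:
            outages[-1].append((ts, message))
        else:
            outages.append([(ts, message)])
    return outages


# Hypothetical alert stream: three alerts in quick succession, one much later.
alerts = [
    (0, "SCOM GW offline"),
    (5, "Health service heartbeat failure: agent A"),
    (20, "Health service heartbeat failure: agent B"),
    (600, "Disk space low: file server"),
]
outages = group_into_outages(alerts)
for outage in outages:
    print(f"{len(outage)} alert(s), first: {outage[0][1]}")
```

Each group can then be surfaced to the Service Desk as one "outage" rather than as individual alerts. This does not identify a root cause; it only limits the volume.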
