A way to reset monitors automatically when an alert is closed
A lot of customers close alerts generated by monitors. Although they got very good instructions, they don't reset the monitors. PowerShell and Greenmaschine don't do a very good job of resetting monitors. There should be a way to automatically reset monitors when an alert is closed and/or an option to reset all monitors to a green state.
This problem has been solved and the fix will ship in SCOM 2019 (which will be released in Q1 2019).
Details on the solution for this problem:
- An alert generated by a monitor cannot be closed if the corresponding monitor is still unhealthy. This ensures that the health of the monitor and the corresponding alert stay in sync.
- A new dashboard (“Alert Closure Failure”) has been created in the Web console. It lists the active alerts that someone tried to close but that could not be closed because the corresponding monitors were still unhealthy. This dashboard also allows you to forcefully close any such alert by resetting the health of the corresponding monitor.
- A closure request for a monitor-generated alert that is triggered from a ticketing/incident management system will receive an exception with the alert details if the closure fails because the corresponding monitor is still unhealthy.
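The behaviour described in the points above can be sketched as a small toy model in Python. This is not SCOM's API: `Monitor`, `Alert`, `AlertClosureError`, `close_alert`, and `force_close` are hypothetical names that only illustrate the rule as described (closing is refused while the monitor is unhealthy, and a forced close resets the monitor first).

```python
from dataclasses import dataclass
from typing import Optional


class AlertClosureError(Exception):
    """Hypothetical error returned to the caller when a close is refused."""


@dataclass
class Monitor:
    name: str
    healthy: bool = False


@dataclass
class Alert:
    name: str
    monitor: Optional[Monitor] = None  # None models an alert raised by a rule
    closed: bool = False


def close_alert(alert: Alert) -> None:
    """Close the alert unless its monitor is still unhealthy."""
    if alert.monitor is not None and not alert.monitor.healthy:
        raise AlertClosureError(
            f"cannot close '{alert.name}': "
            f"monitor '{alert.monitor.name}' is still unhealthy"
        )
    alert.closed = True


def force_close(alert: Alert) -> None:
    """Reset the monitor's health first, then close the alert."""
    if alert.monitor is not None:
        alert.monitor.healthy = True
    close_alert(alert)
```

In this model a rule alert (no monitor) always closes, while a monitor alert closes only after the monitor is healthy or after `force_close` has reset it.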
Ronnie Viklund commented
I left a comment on the blog as well but it's been awaiting moderation for over a week.
I'm so glad I'm not a SCOM consultant anymore and I don't have to defend a crappy product anymore.
As the 2 ppl below me said, MS missed the point totally, and again the normal user gets a new procedure to work around the real issue. Alerts and monitors should be considered as 1 entity, not 2!!!
Ronnie Viklund commented
Oh no... the whole point was that we wanted monitors to reset to a healthy state when the corresponding alert was closed. We do not want a restriction that does not allow us to close the alert; this will cause even more headaches. Please consider revising this.
What a great example of how you did not understand the problem we and our (your!) customers have.
Please read Stefan's first post describing the problem and compare it to your "solution".
Even if the new feature can (!) be useful (as long as the customer can choose how to handle it) it does not solve the problem…
J-P. Grohn commented
We need more automatic alert resets for Alert Rules when the tick has gone back to green.
To the recent poster: I don't think this will ever be solved (like most of the feedback here).
However, that might help: https://gallery.technet.microsoft.com/PoSh-Reset-Monitor-On-c288374a?redir=0
Will this flaw ever get resolved?
I notice the last post on the Operations Manager Team Blog page talks about a 'powershell script to close old alerts' .... well, what if any of those old alerts are raised by monitors!!??
Other management packs, notoriously ones like Active Directory and DNS, base a lot of alert monitors on items in the event logs, and they never seem to find the counter-event that, according to the monitor(s), would represent the healthy state. That means their design is flawed.
A lot of alert monitors are manual reset, which is a very poor design choice imo. The feature should be removed completely.
I'm assuming someone somewhere was thinking: 'I want alerts in my (web) console that give an alert and also contribute to a certain health state model, which Alert Rules don't.'
To compensate for this, I'd think that Alert Rules should be represented in the health model too.
To add flexibility, it would work best if alert rules could be included in or excluded from the health model with an override.
This would also allow for 'when closing an alert from an alert rule, also reset its health state automatically'.
Lars Villaume commented
The suggested global setting is good. Maybe also a setting to prevent the user from closing the alert, but otherwise a built-in reset-monitor setting.
Martin Ehrnst commented
Over a year later and SCOM 2016 has been released. This is still an issue. Any news on if or when we will have a solution for this?
Automatic alert resolution should at least reset the corresponding monitor, but having some kind of options would be nice as well.
Stelian, thanks for the link. I do have my own solution on that ( https://gallery.technet.microsoft.com/PoSh-Reset-Monitor-On-c288374a ). However, I think this should come right out of the product.
Please check: https://gallery.technet.microsoft.com/Alerts-Watchdog-Management-d5b3ea77 for a solution
Agree with Darren. Normal operators have no clue about the difference and close alerts that they think they have solved. If that's a monitor alert and they didn't actually solve it, the monitor stays red and never generates an alert about this again. This has led to outages that could have been prevented by resetting the monitor on alert close (simply bc they would get a new alert).
Examples of manual reset monitors can be found in many MPs. Most MS MPs don't have them anymore, but the DHCP MP still has loads of them, I think (I've actually created my own and changed them to timed reset monitors).
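The failure mode described above (alert closed, monitor still red, no new alert) can be shown with a toy model. This is only a sketch under a stated assumption: the monitor raises an alert on a healthy-to-unhealthy change and on nothing else. `ToyMonitor` and `alerts_after_premature_close` are hypothetical names, not anything from SCOM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMonitor:
    """Hypothetical monitor that alerts only on a healthy -> unhealthy change."""
    healthy: bool = True
    open_alerts: List[str] = field(default_factory=list)

    def observe(self, problem_detected: bool) -> None:
        # Only a state change raises an alert; a monitor that is already
        # unhealthy stays silent, and it never turns healthy by itself here.
        if problem_detected and self.healthy:
            self.healthy = False
            self.open_alerts.append("alert")

    def close_alert(self, reset_monitor: bool) -> None:
        self.open_alerts.clear()
        if reset_monitor:
            self.healthy = True


def alerts_after_premature_close(reset_monitor: bool) -> int:
    monitor = ToyMonitor()
    monitor.observe(problem_detected=True)  # first failure raises an alert
    monitor.close_alert(reset_monitor)      # operator closes it, cause not fixed
    monitor.observe(problem_detected=True)  # the problem is seen again
    return len(monitor.open_alerts)
```

Under that assumption, closing without a reset leaves zero open alerts after the second failure, while closing with a reset produces a fresh one.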
Darren Joyce commented
With us, it's more a usability issue. It's hard to train a large number of engineers on the difference between monitors/rules, and how to action each one. It's inevitable that monitors will get closed accidentally. That's why I utilised the Resolved state - one method for everything. Then my scripts determine what to do with the monitor/rule.
Saying that, today I reworked it so if a Monitor gets Closed, a command channel will do a health reset, so I can do away with the Resolved state.
Having two distinct actions depending on monitor or rule is a pain really.
Stefan Roth commented
When I tried to find a solution for a customer with ~1000 servers, I had Active Directory, SQL Server, Windows OS, DNS, SharePoint 2013 and maybe 1-2 others implemented. Just the official MS MPs. Sorry, I cannot recall exactly, but no special third-party MPs.
I think it would make sense to have two settings.
1) A general reset button, which resets all monitors to green. This helps when tuning SCOM at the beginning of a project.
2) Resetting the monitors, when an alert gets closed.
I hope this helps...
Since operators have no clue whether the alert (raised by a monitor) will be closed automatically or not, they very likely close every alert manually as soon as they know they have solved the problem.
Please see my other feedback on that as well (it would help in the operators' alert handling processes):
For the others: I uploaded my ps1 to the gallery, maybe it helps one or the other of you:
Darren, Patrick, Stefan, Thank you for your response. It is very helpful.
Can you share some examples of the MPs in which you have encountered such monitors with manual reset events?
Darren Joyce commented
Currently we tell our engineers to mark alerts as "resolved" (not "closed"), and I have a ps1 script that runs hourly, goes through the alerts in this state, and closes rule alerts and resets monitors. It's a pain, but the best workaround we've found, apart from trying to train everyone on the difference between rules and monitors.
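For anyone wanting to picture that workaround, here is a rough sketch of the sweep logic in Python. It is not the ps1 mentioned above and it does not talk to SCOM: `AlertRecord`, `sweep_resolved`, and the `reset_monitor` callback are hypothetical stand-ins, and the state names are plain strings chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AlertRecord:
    name: str
    is_monitor_alert: bool  # True if raised by a monitor, False if by a rule
    state: str = "New"


def sweep_resolved(
    alerts: Iterable[AlertRecord],
    reset_monitor: Callable[[AlertRecord], None],
) -> List[str]:
    """Handle every alert an engineer marked as Resolved.

    Monitor alerts get their monitor reset via the supplied callback;
    in this toy model every handled alert then ends up Closed.
    """
    handled = []
    for alert in alerts:
        if alert.state != "Resolved":
            continue  # engineers have not finished with this one yet
        if alert.is_monitor_alert:
            reset_monitor(alert)
        alert.state = "Closed"
        handled.append(alert.name)
    return handled
```

The point of the design is that engineers only ever use one action ("resolved"), and the scheduled sweep decides whether a reset is needed.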
Same here... I use a ps1 in my customers' env, which I trigger from notification channels as soon as an alert gets closed by a user (not the system). Works more or less, depending on the AsyncProcessLimit setting.