Upgrade your SCOM Notifications with PowerShell
At a client recently for a proof of concept job, we implemented OpsManager to replace an existing monitoring product they were using in their environment.
Out of the gates, they loved it! SCOM had out of the box management functionality for most the equipment in their environment, and with installing just a few quick management packs, they were able to monitor everything they wanted. It was great, it was easy and everyone had that warm, fuzzy feeling of IT Project Satisfaction.
One of the major concerns we began to hear was that the out of the box alerts from SCOM weren’t very informative. For instance, an e-mail would tell you that an alert was triggered, and when and on which computer, but other than that, you were kind of on your own.
I was quickly volunteered eager to jump into the fray, employing two of my favorite tools to fix the issue, Orchestrator and PowerShell!
To start, here is the default notification:
\-->Alert: ConfigMgr 2007 Component Health:
SMS_PXE_SERVICE_POINT state
Source: sccmpr01
Path: sccmpr01.woodlawn.net
Last modified by: USA\\OPsmgr
Last modified time: 2/11/2014 10:41:32 PM Alert description: sccmpr01
- ConfigMgr 2007 Component Health: SMS\_PXE\_SERVICE\_POINT state.
The availability state for SMS component 'SMS\_PXE\_SERVICE\_POINT' in site WD1 changed from 'Online' to 'Failed'. Its installation state is 'Installed'. Its execution state is 'Hung'. This component last provided a heartbeat at '02/11/2014 22:39:23'. The next heartbeat is expected in '30' seconds from that time.
Alert view link: "[http://scom.woodlawn.net/OperationsManager?DisplayMode=Pivot&AlertID=%7b1\[...\]-aa489%7d]()"
Notification subscription ID generating this message: {6E14B614-838C-77E1-0176-3A369BC231C2}
Yeah, pretty uninspiring. There is a web link, which is nice, but we can’t get to the meat of the issue. They asked for something which I thought was quite reasonable: “For a disk space alert, why can’t I see which disk and what threshold triggered the alert”, or “For CPU Usage monitors, how come I can’t see a listing of which application are pegging the CPU?”. Seemed pretty reasonable to me.
So, here is what I did. Using Orchestrator, I created a runbook that listens for a new Alert or Monitor being created. For the next step of the runbook, a PowerShell script is run that reaches out using the Operations Manager module and gathers information about the event using various methods and properties. This information is used to build an HTML e-mail, making liberal use of the Convert-ToHTML -Fragment and -As Table and -As List parameters.
We then run a snippet of code, based on the alert title to gather additional information. For instance, if the alert is a ‘disk space too low’ monitor that is exceeded, we may run a WMI query and gather information about the hard drive space free based on the drive letter mentioned in the alert.
The key thing to realize here is that this example just uses a bit of PowerShell to pull out some interesting information already there in Operations Manager, and stores it in a variable which is then string-expanded into an HTML message body. There are some typos in the text below, all of which stems from the Knowledge base and article info present in OpsMgr.
And here is our final result:
Alert - NA-SCOM-01 - Logical Disk Free Space is low
Information
This alert was triggered because the following monitor was exceeded:
Logical Disk Free Space - Monitor the percentage free space and number of free MBytes remaining on a logical disk. Only when both the low percentage free space threshold and low number of free MBytes threshold is the disk flagged as having low disk free space.
System Name | Drive Type | Volume Name | Name | Size (GB) | Free Space (GB) | Percent Free |
NA-SCOM-01 | 3a | C: | 99.90 | 1.62 | 1.67 |
Thresholds
The following threshold criteria were evaluated during this alert:
System Drive Warning MBytes Threshold: | 500 |
System Drive Warning Percent Threshold: | 10 |
System Drive Error Mbytes Threshold: | 300 |
System Drive Error Percent Threshold: | 5 |
Non System Drive Warning Mbytes Threshold: | 2000 |
Non System Drive Warning Percent Threshold: | 10 |
Non System Drive Error Mbytes Threshold: | 1000 |
Non System Drive Error Percent Threshold: | 5 |
Click here to view the Alert: “http://scom.ops.customer.net/OperationsManager?[..]”
Notification subscription ID generating this message: Tier II Support - 8 hour Response SLA
Knowledgebase
The following information has been provided to assist in addressing this matter:
Summary
The amount of free disk space on the logical disk volume has exceeded the threshold. System performance may be adversely affected and the ability to add or modify existing files on the logical disk volume may not be possible until additional free space is made available.
Configuration
The Logical Disk Free Space monitoring routine is a high configurable solution that enables Operators to set varying threshold values for system and non-system logical disk volumes. In addition separate threshold values can be set for Warning and Error states.
Since logical disk volumes may vary in size from a few gigabytes to many terabytes or more the Logical Disk Free Space monitoring routine requires that an Operator indicate both the Megabyte and Percentage based threshold values that must be passed before the Warning and Error thresholds reached. This means that in order for the threshold to be reached both the Megabyte and Percentage based threshold values for the System or Non-System Drive must be breached.
The default threshold values for the Logical Disk Free Space monitoring routine include:
System Drive Free Space Thresholds (Defaults)
Parameter
Default Value
System Drive Error Mbytes Threshold | 100 |
System Drive Error Percent Threshold | 5 |
System Drive Warning Mbytes Threshold | 200 |
System Drive Warning Percent Threshold | 10 |
Non-System Drive Free Space Thresholds (Defaults)
Parameter
Default Value
Non-System Drive Error Mbytes Threshold | 1000 |
Non-System Drive Error Percent Threshold | 5 |
Non-System Drive Warning Mbytes Threshold | 2000 |
Non-System Drive Warning Percent Threshold | 10 |
Please note that Overrides can be used to change any of the threshold values that are defined above. In addition these thresholds can be applied to all logical disk volume instances in the management group or if needed separate threshold values can be defined for specific logical disk volume instances.
Causes
When existing files grow in size and the new files are added, the free space is taken up on a logical disk. When the amount of free space on the logical disk falls below the threshold, the state for the logical disk will change.
Resolutions
To increase the amount of available disk space, do one or more of the following:
· Run Disk Cleanup to gain more free space on the disk.
· Back up and remove files, or delete unnecessary files from the disk.
· Move files to another disk or to offline storage.
· Purchase additional storage or switch to a larger disk.
To view recent disk space history you can use the following view:
Start Disk Capacity View
This approach uses a runbook to gather the information needed to create this report, however the same could be done using a notification channel in SCOM for the clever.
Big thanks to Sean Duffey for his great blog post Building a Daily Systems report email with Powershell for getting me started down this path.