Upgrade your SCOM Notifications with PowerShell

Published February 17, 2014 by FoxDeploy

At a client recently for a proof-of-concept job, we implemented OpsManager to replace the monitoring product they had been using in their environment.

Out of the gates, they loved it! SCOM had out-of-the-box management functionality for most of the equipment in their environment, and by installing just a few quick management packs, they were able to monitor everything they wanted. It was great, it was easy, and everyone had that warm, fuzzy feeling of IT Project Satisfaction.

Before long, though, one of the major concerns we began to hear was that the out-of-the-box alerts from SCOM weren’t very informative. For instance, an e-mail would tell you that an alert was triggered, when, and on which computer, but other than that, you were kind of on your own.

I was quickly volunteered, and I was eager to jump into the fray, employing two of my favorite tools to fix the issue: Orchestrator and PowerShell!

To start, here is the default notification:

Alert: ConfigMgr 2007 Component Health: SMS_PXE_SERVICE_POINT state

Source: sccmpr01

Path: sccmpr01.woodlawn.net

Last modified by: USA\OPsmgr

Last modified time: 2/11/2014 10:41:32 PM

Alert description: sccmpr01 - ConfigMgr 2007 Component Health: SMS_PXE_SERVICE_POINT state.

The availability state for SMS component 'SMS_PXE_SERVICE_POINT' in site WD1 changed from 'Online' to 'Failed'. Its installation state is 'Installed'. Its execution state is 'Hung'. This component last provided a heartbeat at '02/11/2014 22:39:23'. The next heartbeat is expected in '30' seconds from that time.

Alert view link: "http://scom.woodlawn.net/OperationsManager?DisplayMode=Pivot&AlertID=%7b1[...]-aa489%7d"

Notification subscription ID generating this message: {6E14B614-838C-77E1-0176-3A369BC231C2}

Yeah, pretty uninspiring. There is a web link, which is nice, but the e-mail itself doesn’t get to the meat of the issue. The client asked for something which I thought was quite reasonable: “For a disk space alert, why can’t I see which disk and what threshold triggered the alert?” or “For CPU Usage monitors, how come I can’t see a listing of which applications are pegging the CPU?”

So, here is what I did. Using Orchestrator, I created a runbook that listens for a new Alert or Monitor being created. In the next step of the runbook, a PowerShell script uses the Operations Manager module to gather information about the event through various methods and properties. This information is used to build an HTML e-mail, making liberal use of ConvertTo-Html with the -Fragment, -As Table, and -As List parameters.
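As a rough illustration of the fragment-building step described above, here is a minimal sketch. So that it runs without a management group, the alert is a hand-built stand-in object; in the runbook it would presumably come from Get-SCOMAlert in the OperationsManager module. The variable names and sample values are assumptions for illustration, not the script used at the client.

```powershell
# Stand-in for a real alert so the sketch runs anywhere (sample values only).
# In the runbook this would presumably be something like:
#   Import-Module OperationsManager; $alert = Get-SCOMAlert -Id $alertId
$alert = [pscustomobject]@{
    Name                 = 'Logical Disk Free Space is low'
    MonitoringObjectPath = 'NA-SCOM-01'
    Severity             = 'Error'
    TimeRaised           = '2/11/2014 10:41:32 PM'
}

# -As List renders one "Property: value" row per property,
# which suits the details of a single alert.
$alertHtml = $alert |
    Select-Object Name, MonitoringObjectPath, Severity, TimeRaised |
    ConvertTo-Html -Fragment -As List |
    Out-String

# -As Table renders one row per object, which suits a collection such as disks.
$disks = @(
    [pscustomobject]@{ Name = 'C:'; 'Size (GB)' = 99.90; 'Free Space (GB)' = 1.62 }
)
$diskHtml = $disks | ConvertTo-Html -Fragment -As Table | Out-String
```

Because -Fragment emits only the table markup rather than a full HTML document, several fragments like these can be dropped into one message body.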

We then run a snippet of code, chosen based on the alert title, to gather additional information. For instance, if the alert comes from a ‘disk space too low’ monitor that is exceeded, we may run a WMI query to gather the free space on the hard drive, based on the drive letter mentioned in the alert.
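The enrichment step could be sketched roughly as follows. The three function names are hypothetical helpers invented for this example, and the sketch assumes the drive letter appears in the alert text in the form "C:". Get-WmiObject and the Win32_LogicalDisk class are real, though Get-WmiObject is only available in Windows PowerShell.

```powershell
# Hypothetical helper: pull the first drive letter (e.g. "C:") out of alert text.
function Get-DriveLetterFromText {
    param([string]$Text)
    if ($Text -match '\b([A-Za-z]:)') { return $Matches[1].ToUpper() }
    return $null
}

# Hypothetical helper: shape a Win32_LogicalDisk-like object into a report row.
function ConvertTo-DiskReportRow {
    param($Disk)
    [pscustomobject]@{
        'System Name'     = $Disk.SystemName
        'Name'            = $Disk.DeviceID
        'Size (GB)'       = [math]::Round([double]$Disk.Size / 1GB, 2)
        'Free Space (GB)' = [math]::Round([double]$Disk.FreeSpace / 1GB, 2)
        'Percent Free'    = [math]::Round(100 * [double]$Disk.FreeSpace / [double]$Disk.Size, 2)
    }
}

# Hypothetical helper: run the extra WMI query only when the alert title
# suggests a disk-space problem and a drive letter can be found.
function Get-AlertDiskDetail {
    param([string]$AlertName, [string]$AlertText, [string]$ComputerName)
    if ($AlertName -notlike '*Disk Free Space*') { return }
    $drive = Get-DriveLetterFromText -Text $AlertText
    if (-not $drive) { return }
    Get-WmiObject -Class Win32_LogicalDisk -ComputerName $ComputerName -Filter "DeviceID='$drive'" |
        ForEach-Object { ConvertTo-DiskReportRow -Disk $_ }
}
```

The rows returned by Get-AlertDiskDetail could then be piped to ConvertTo-Html -Fragment -As Table. Other alert types, such as CPU usage, would get their own branch and their own query.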

The key thing to realize here is that this example just uses a bit of PowerShell to pull out some interesting information that is already there in Operations Manager and store it in variables, which are then string-expanded into an HTML message body. There are some typos in the text below, all of which stem from the knowledge base and article info present in OpsMgr.
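A minimal sketch of that string expansion, using a double-quoted here-string. The variable names, the HTML layout, and the sample values are assumptions for illustration. The Send-MailMessage call is left commented out, and its addresses and SMTP host are placeholders.

```powershell
# Sample values standing in for data gathered earlier in the script.
$alertName  = 'Logical Disk Free Space is low'
$serverName = 'NA-SCOM-01'
$diskHtml   = '<table><tr><th>Name</th><th>Percent Free</th></tr><tr><td>C:</td><td>1.67</td></tr></table>'

# Variables inside a double-quoted here-string are expanded when it is evaluated,
# so the gathered values and HTML fragments land directly in the message body.
$body = @"
<html>
<body>
<h2>Alert - $serverName - $alertName</h2>
<h3>Information</h3>
$diskHtml
</body>
</html>
"@

# Sending is commented out; the addresses and SMTP host below are placeholders.
# Send-MailMessage -To 'ops@example.com' -From 'scom@example.com' `
#     -Subject "Alert - $serverName - $alertName" -Body $body -BodyAsHtml `
#     -SmtpServer 'smtp.example.com'
```

A single-quoted here-string (@' ... '@) would leave the variable names unexpanded, so the double-quoted form is the one that matters here.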

And here is our final result:

Alert - NA-SCOM-01 - Logical Disk Free Space is low

Information

This alert was triggered because the following monitor was exceeded:

Logical Disk Free Space - Monitor the percentage free space and number of free MBytes remaining on a logical disk. Only when both the low percentage free space threshold and low number of free MBytes threshold is the disk flagged as having low disk free space.

System Name	Drive Type	Volume Name	Name	Size (GB)	Free Space (GB)	Percent Free
NA-SCOM-01	3a		C:	99.90	1.62	1.67

Thresholds

The following threshold criteria were evaluated during this alert:

System Drive Warning MBytes Threshold:	500
System Drive Warning Percent Threshold:	10
System Drive Error Mbytes Threshold:	300
System Drive Error Percent Threshold:	5
Non System Drive Warning Mbytes Threshold:	2000
Non System Drive Warning Percent Threshold:	10
Non System Drive Error Mbytes Threshold:	1000
Non System Drive Error Percent Threshold:	5

Click here to view the Alert: “http://scom.ops.customer.net/OperationsManager?[..]”

Notification subscription ID generating this message: Tier II Support - 8 hour Response SLA

Knowledgebase

The following information has been provided to assist in addressing this matter:

Summary

The amount of free disk space on the logical disk volume has exceeded the threshold. System performance may be adversely affected and the ability to add or modify existing files on the logical disk volume may not be possible until additional free space is made available.

Configuration

The Logical Disk Free Space monitoring routine is a high configurable solution that enables Operators to set varying threshold values for system and non-system logical disk volumes. In addition separate threshold values can be set for Warning and Error states.

Since logical disk volumes may vary in size from a few gigabytes to many terabytes or more the Logical Disk Free Space monitoring routine requires that an Operator indicate both the Megabyte and Percentage based threshold values that must be passed before the Warning and Error thresholds reached. This means that in order for the threshold to be reached both the Megabyte and Percentage based threshold values for the System or Non-System Drive must be breached.

The default threshold values for the Logical Disk Free Space monitoring routine include:

System Drive Free Space Thresholds (Defaults)

Parameter	Default Value

System Drive Error Mbytes Threshold	100
System Drive Error Percent Threshold	5
System Drive Warning Mbytes Threshold	200
System Drive Warning Percent Threshold	10

Non-System Drive Free Space Thresholds (Defaults)

Parameter	Default Value

Non-System Drive Error Mbytes Threshold	1000
Non-System Drive Error Percent Threshold	5
Non-System Drive Warning Mbytes Threshold	2000
Non-System Drive Warning Percent Threshold	10

Please note that Overrides can be used to change any of the threshold values that are defined above. In addition these thresholds can be applied to all logical disk volume instances in the management group or if needed separate threshold values can be defined for specific logical disk volume instances.

Causes

When existing files grow in size and the new files are added, the free space is taken up on a logical disk. When the amount of free space on the logical disk falls below the threshold, the state for the logical disk will change.

Resolutions

To increase the amount of available disk space, do one or more of the following:

· Run Disk Cleanup to gain more free space on the disk.

· Back up and remove files, or delete unnecessary files from the disk.

· Move files to another disk or to offline storage.

· Purchase additional storage or switch to a larger disk.

To view recent disk space history you can use the following view:

Start Disk Capacity View

This approach uses a runbook to gather the information needed to create this report; however, the clever could do the same using a notification channel in SCOM.

Big thanks to Sean Duffey for his great blog post, “Building a Daily Systems report email with Powershell”, for getting me started down this path.
