
restarting services and terminating processes with mom 2005

this particular example is for softgrid.  i thought it might be useful to generalize it for any purpose, though.  you probably already have services that may require a restart every now and then.  that’s pretty easy in mom.  you can do it by issuing a simple net stop && net start command as illustrated in this post.

the general perception is that admins are lazy.  to help perpetuate this obvious lie, i tried to use the simple method above but failed.  it turns out that some services don’t terminate the processes upon stopping, as you would expect.  short of trying some ridiculously long for loop statements inside of the batch response, you have to go with a script.
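if you want to see the leftover-process problem for yourself, something like the sketch below can help. it is not part of the mom script. it assumes you stop the service by hand first and then run this with cscript, and the process name is a made-up placeholder you would swap for your own.

```vbscript
' Sketch only: lists processes that are still running after a service stop.
' Assumption: run from the command line with cscript, not inside MOM.
sProcess = "myservice"   ' hypothetical process name prefix, replace with yours

Set oWMI = GetObject("winmgmts:\\.\root\cimv2")
Set cLeftovers = oWMI.ExecQuery( _
    "Select Name, ProcessId From Win32_Process Where Name Like '" & sProcess & "%'")

If cLeftovers.Count = 0 Then
    WScript.Echo "No matching processes remain."
Else
    For Each oProc In cLeftovers
        WScript.Echo oProc.Name & " is still running (PID " & oProc.ProcessId & ")"
    Next
End If
```

if anything shows up in that list after the service reports stopped, a plain net stop && net start is not going to clear it.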

i really did consider going with batch script but ended up needing a bit more flexibility.  for instance, instead of blindly going through the cycle, i wanted to make sure we were still in the given condition before we went ahead with it.  to do that, we have to check the process utilization state.  anyway, the script does the following:

  • examines process(es) and the processor utilization rate
  • stops the service
  • terminates the running process(es)
  • starts the service
  • creates a log output
  • stamps the alert description with informative data

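of course, we need to give it the process and service name we want it to attack.  for that, you’ll need the following parameters when you set up this script in mom.  the names have to match what the script asks for in ScriptContext.Parameters.Get; the variable each one lands in is shown in parentheses.

  • Process (sProcess) – process name, matched as a prefix
  • Threshold (iThreshold) – threshold that the average process utilization must be above
  • Service (sService) – service name to restart
  • LogName (sLogName) – name of log file to generate in %windir%\temp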

a bit more minutiae – for each matching process, the script checks the utilization 10 times in a row, adds up the samples, then divides by 10 for the average.  if the average is above the threshold, it goes through the cycle to reset the thing.  you can change all that crap around in the script, but it is not exposed by parameter.
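the averaging decision on its own looks roughly like the sketch below. this is a standalone illustration to run with cscript, not a piece lifted from the mom script: ExceedsThreshold is a hypothetical helper name, and the sample values are made up rather than real readings.

```vbscript
' Sketch only: the "sum the samples, divide, compare" decision in isolation.
' ExceedsThreshold is a hypothetical helper, not part of the MOM script.
Function ExceedsThreshold(aSamples, iThreshold)
    Dim iTotal, vSample
    iTotal = 0
    For Each vSample In aSamples
        iTotal = iTotal + CInt(vSample)
    Next
    ' Divide by the number of samples taken (10 in the MOM script).
    ExceedsThreshold = (iTotal / (UBound(aSamples) + 1)) > iThreshold
End Function

' Ten made-up utilization samples for one process.
aSamples = Array(90, 95, 100, 85, 92, 97, 99, 88, 91, 94)

If ExceedsThreshold(aSamples, 80) Then
    WScript.Echo "average is above the threshold, cycle the service"
Else
    WScript.Echo "average is at or below the threshold, leave it alone"
End If
```

one difference worth knowing: the mom script always divides by 10, so if the process disappears partway through sampling, the missing samples count as zero and pull the average down.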

we’re in testing with opsmgr, so whenever we go live, i’ll have to convert these scripts.  i’ll post them in opsmgr format as i get them prepared.  for now, here’s the mom 2005 version:

'==========================================================================
' NAME: Service/Process Restart
'
' AUTHOR: Marcus C. Oh
' DATE  : 9/15/2008
'
' COMMENT: Recycles runaway processes and services based on a threshold
'          Logs to %windir%\temp directory
'
' VERSION: 1.0
'==========================================================================

' Standard event constants
Const EVENT_TYPE_SUCCESS = 0
Const EVENT_TYPE_ERROR   = 1
Const EVENT_TYPE_WARNING = 2
Const EVENT_TYPE_INFORMATION = 4

' Parameters for MOM
sProcess = ScriptContext.Parameters.Get("Process")
iThreshold = CInt(ScriptContext.Parameters.Get("Threshold"))
sService = ScriptContext.Parameters.Get("Service")
sLogName = ScriptContext.Parameters.Get("LogName")

sComputer = "."
bCycle = False

Set oAlert = ScriptContext.Alert


' Spin up the File System provider and create the log file
Set oShell = CreateObject("Wscript.Shell")
sWinDir = oShell.ExpandEnvironmentStrings("%WinDir%")
Set oFS = CreateObject("Scripting.FileSystemObject")
Set myLogFile = oFS.CreateTextFile(sWinDir & "\temp\" & sLogName,True)


' Spin up WMI
Set oWMIService = GetObject("winmgmts:\\" & sComputer & "\root\cimv2")


' Check the process from the parameter to see if the utilization 
' is currently above the indicated threshold.

myLog "[Starting process cycling...]"

'Set oPerfData = ScriptContext.Perfdata
myLog VbCrLf & vbTab & "Checking process(es) for: " & sProcess

Set cProcessNames = oWMIService.ExecQuery("Select handle from Win32_Process Where Name like '" & sProcess & "%'")
For Each oProcName In cProcessNames
    iLoop = 0
    iProcTime = 0
    myLog vbTab & "Examining process handle " & oProcName.handle
    While iLoop < 10
        Set cProcesses = oWMIService.ExecQuery("Select * From Win32_PerfFormattedData_PerfProc_Process Where IDProcess = '" & oProcName.handle & "'")
        For Each oProcess in cProcesses
            iProcTime = iProcTime + CInt(oProcess.PercentProcessorTime)
            myLog vbTab & oProcess.Name & " utilization aggregate - " & iProcTime & " (sample value - " & CInt(oProcess.PercentProcessorTime) & ")"
        Next
        iLoop = iLoop + 1
        mySleep(1000)
    Wend
    
    myLog vbTab & "Aggregate utilization for process handle " & oProcName.handle & " - " & iProcTime
    
    If iProcTime/10 > iThreshold Then
        myLog vbTab & "Process utilization matches criteria."
        myLog vbTab & "Divided by 10 - " & iProcTime/10
        bCycle = True
        Exit For
    Else
        myLog vbTab & "Process utilization at " & iProcTime/10 & " does not exceed threshold of " & iThreshold & VbCrLf
    End If
Next

If bCycle = True Then
    ' Stop the service.
    Call CommandService(sService,"Stop")
    mySleep(5000)
    
    
    ' Terminate all running processes.
    If VerifyService(sService,"Stopped") Then
        myLog VbCrLf & vbTab & sService & " has stopped successfully."
        myLog VbCrLf & vbTab & "Terminating process(es): " & sProcess
        Call TerminateProcess(sProcess)
    End If
    mySleep(5000)


    ' Start the service.
    Call CommandService(sService,"Start")
    mySleep(10000)
    
    
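    ' Verify the service started. Win32_Service.State reports "Running"
    ' for a started service, so that is the value to compare against.
    If VerifyService(sService,"Running") Then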
        myLog vbTab & sService & " has started successfully."
    Else
        myLog vbTab & sService & " has failed to start."
    End If

    
    ' Rewrite the original description with additional data.
    oAlert.Description = oAlert.Description & VbCrLf & VbCrLf &_
        "Remediation script for runaway processes has been executed. " &_
        "Please review the following log for details: " & sWinDir & "\temp\" & sLogName
Else
    myLog vbTab & "Process utilization does not exceed threshold."
    
    ' Rewrite the original description with additional data.
    oAlert.Description = oAlert.Description & VbCrLf & VbCrLf &_
        "No remediation attempt required."
End If

myLog VbCrLf & "[Stopping process cycling...]"

' Close out the file
myLogFile.Close


' Subs and Functions ------------------------------------------------------

' Start/stop the service
Sub CommandService(sService,sAction)
    Set cServices = oWMIService.ExecQuery("Select * from Win32_Service where Name='" & sService & "'")
    For Each oService in cServices
        myLog VbCrLf & vbTab & sAction & " -- " & sService
        If sAction = "Stop" Then
            oService.StopService()
        ElseIf sAction = "Start" Then
            oService.StartService()
        End If
    Next
End Sub

' Verify the service state; returns True only if the service is in sState
Function VerifyService(sService,sState)
    VerifyService = False
    Set cServices = oWMIService.ExecQuery("Select * From Win32_Service Where Name ='" & sService & "'")
    For Each oService in cServices
        If oService.State = sState Then
            VerifyService = True
        End If
    Next
End Function

' Terminate the processes
Sub TerminateProcess(sSGProcess)
    Set cRunningProcesses = oWMIService.ExecQuery("Select * from Win32_Process Where Name like '" & sSGProcess & "%'")
    For Each oRunningProcess in cRunningProcesses
        oRunningProcess.Terminate()
    Next
End Sub

' General sleep sub to switch between MOM and cmd line
Sub mySleep(iSleep)
    ScriptContext.Sleep(iSleep)
End Sub

Sub myLog(sData)
    myLogFile.WriteLine(sData)
End Sub

' Standard Event creation subroutine
Sub CreateEvent(iEventNumber,iEventType,sEventSource,sEventMessage)
    Set oEvent = ScriptContext.CreateEvent()
    oEvent.EventNumber = iEventNumber
    oEvent.EventType = iEventType 
    oEvent.EventSource = sEventSource
    oEvent.Message = sEventMessage
    ScriptContext.Submit oEvent
End Sub
