title: State and Count of Processes
agents: linux, windows, aix, solaris, openvms
catalog: os/ps
license: GPL
distribution: check_mk
description:
This check looks into the list of currently running processes for
those matching a certain name or regular expression and optionally
being owned by a certain user. The number of
matching processes is compared against warning and critical levels.
item:
A user definable service description for Nagios.
That description must be unique within each host.
Changing the description will make Nagios think that
it is another service.
inventory:
As of version 1.1.1, {ps} supports inventory. Since Check_MK cannot
know which processes are of relevance to you, some configuration is
needed. The configuration is done via {inventory_processes}. The structure
of that variable is a list of seven-tuples. It is similar but not
completely the same as the configuration of manual checks. The seven
components of each entry are: {(1)} a service description, {(2)} a pattern
(just as the first parameter of the check), {(3)} a
user specification and {(4)} - {(7)} the warning and critical levels
for the number of processes.
During inventory Check_MK tries to match all entries against each
process found on the target host. If an entry matches, a new check will
be created according to the entry (unless such a check already exists).
The {service description} may contain one or more occurrences of {%s}.
If you do this, then the pattern must be a regular expression and
be prefixed with {~}. For each {%s} in the description, the expression has to contain
one "group". A group is a subexpression enclosed in brackets, for example
{(.*)} or {([a-zA-Z]+)} or {(...)}. When the inventory finds a process
matching the pattern, it will substitute each {%s} with the
{actual value} matched by the corresponding group when creating the check.
That way one rule can create several checks on a host.
If the pattern contains more groups than occurrences of {%s} in
the service description, then only the first matching subexpressions
are used for the service description. The matched substrings corresponding
to the remaining groups are nevertheless copied into the regular expression
of the created check.
New in version 1.2.2: As an alternative to {%s} you may also use {%1}, {%2},
etc. These will be replaced by the first, second, ... matching group. This
allows you to reorder things.
Wildcards not enclosed by brackets are simply copied verbatim to the created
checks. Please refer to the examples for more details.
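As a purely illustrative sketch (the service name, the path {/usr/bin/worker_...}
and the pattern are invented here and not part of any shipped configuration),
the following rule creates one service per matching process and shows both
placeholder styles:
 inventory_processes = [
     # a process with the command line "/usr/bin/worker_mail -d" would
     # create the service "Worker mail"; "mail" is taken from the group
     ( "Worker %s", "~/usr/bin/worker_([a-z]+)", ANY_USER, 1, 1, 1, 1 ),
     # the same match using a numbered placeholder (%1 = first group),
     # which also allows reordering when several groups are present
     ( "Worker %1", "~/usr/bin/worker_([a-z]+)", ANY_USER, 1, 1, 1, 1 ),
 ]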
The {user specification} can be a user name (string). The inventory
will then trigger only if that user matches the user the process is running as
and the resulting check will require that user. Alternatively you can specify
{ANY_USER} or {GRAB_USER}. If you specify {ANY_USER} then the user field
will simply be ignored. The created check will not check for a specific user.
Specifying {GRAB_USER} makes the created check expect the process to run
as the same user as during inventory: the user name will be hardcoded into
the check. In that case, if you put {%u} into the service description,
it will be replaced by the actual user name during inventory. You need
this if your rule might match for more than one user - otherwise you would
create duplicate services with the same description.
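A minimal sketch of this behaviour (the process path {/usr/bin/mydb} and the
service name are invented for illustration): if the same binary runs as the
users {db1} and {db2}, the rule below creates the two distinct services
{Database of db1} and {Database of db2}, each with its user hardcoded:
 inventory_processes = [
     # GRAB_USER stores the user found during inventory in the check;
     # %u inserts that user name into the service description
     ( "Database of %u", "/usr/bin/mydb", GRAB_USER, 1, 1, 1, 1 ),
 ]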
The {warning and critical levels} are simply copied to the created
checks.
As of version {1.1.7i1} ps inventory allows an optional host specification.
You can prepend a list of host names, or a list of tag names plus {ALL_HOSTS},
to some or all rules of the inventory specification. That way you can
make the inventory apply some rules only to certain hosts.
examples:
 # Examples for manually configured checks
 checks += [
     # make sure exactly one ntpd is running
     ( "somehost", "ps", "NTP", ( "/usr/sbin/ntpd", 1, 1, 1, 1 ) ),
     # or this can be done with the new, dictionary based
     # syntax; in this case additional parameters cpulevels and
     # cpu_average can be passed
     ( 'somehost', 'ps', 'NTP', {
         'process': '/usr/sbin/ntpd',
         'user': 'ntp',
         'warnmin': 1,
         'warnmax': 1,
         'okmin': 1,
         'okmax': 1,
         'cpulevels': (90.0, 98.0),
         'cpu_average': 15,
     }),
     # the same, but for all hosts with the tag "lnx"
     ( ["lnx"], ALL_HOSTS, "ps", "NTPD", ( "/usr/sbin/ntpd", 1, 1, 1, 1 ) ),
     # command lines containing "prefork" should be in range 20 ... 30
     ( "somehost", "ps", "prefork", ( "~.*prefork", 15, 20, 30, 35 ) ),
     # no process containing "sshd" should exist
     ( "somehost", "ps", "SSHD", ( "~.*sshd", 0, 0, 0, 0 ) ),
     # check number of processes owned by user "www-data"
     ( "somehost", "ps", "count www", ( None, "www-data", 10, 15, 25, 30 ) ),
     # make sure no "sshd" processes owned by user "root" exist
     ( "somehost", "ps", "SSHD[root]", ( "~.*sshd", "root", 0, 0, 0, 0 ) ),
 ]
 # Examples for automatic inventory of processes
 inventory_processes = [
     # if we find a /usr/sbin/ntpd on any host => monitor it
     ( "NTPD", "/usr/sbin/ntpd", ANY_USER, 1, 1, 1, 1 ),
     # alternative (1): do the same, but only on hosts tagged with "ntp":
     ( ['ntp'], ALL_HOSTS, "NTPD", "/usr/sbin/ntpd", ANY_USER, 1, 1, 1, 1 ),
     # alternative (2): do the same, but only on two specific hosts
     ( ['xsrv1', 'xsrv2'], "NTPD", "/usr/sbin/ntpd", ANY_USER, 1, 1, 1, 1 ),
     # alternative (3): do the same, but make sure ntpd is always
     # running as the user under which we found it first:
     ( "NTPD2 %u", "/usr/sbin/ntpd", GRAB_USER, 1, 1, 1, 1 ),
     # if we find Apache2, monitor it and make sure it's running
     # as the user 'www-data'! Also make sure that the number
     # of processes is between 5 and 10 (critical if not running
     # or if more than 30 processes):
     ( "Apache2", "~.*bin/apache2", "www-data", 1, 5, 10, 30 ),
     # now to something more complex: monitor dhclient processes.
     # For each NIC there might be one dhclient. We monitor the
     # processes separately for each NIC at which we find a dhclient
     # running.
     ( "DHCP-Client NIC %s", "~dhclient.* ([a-z]+[0-9]-*)$", ANY_USER, 1, 1, 1, 1 ),
     # SAP: Search for SAP processes, extract three components, add the
     # user name the process was running as to the service description
     ( "SCS %s/%s/%s %u", "~(..)\.sap(...)_SCS([0-9]+) pf=", GRAB_USER, 1, 1, 1, 1 ),
 ]
[configuration]
inventory_processes (list): A list of 7-tuples. See {INVENTORY} for details.
[parameters]
procname (string): Process specification. This can either be the name of a process as output by the agent,
in which case it must exactly match the first column of the agent's output, or - if the string
begins with a tilde - it is interpreted as a regular expression that must match
the beginning of the process line as output by the agent. For the Linux/UNIX agents
this allows you to match specific command line arguments of the process. A third possibility is to
set the process name to {None}. In that case {all} processes will match. This is probably only useful
if you specify a user name.
user (string): Name of user that owns the process(es). This parameter is optional and can be left out completely.
warnmin (int): Minimum number of matched processes for WARNING state
okmin (int): Minimum number for OK state
okmax (int): Maximum number for OK state
warnmax (int): Maximum number for WARNING state. Counts less than warnmin or greater than
warnmax are CRITICAL
parameters (dict): Alternatively, the parameters can be given in the new, dictionary based syntax
with the elements below.
In this case the additional parameters cpulevels and cpu_average can be passed.
{"process"}: procname, see above
{"user"}: see above
{"warnmin"}: see above
{"warnmax"}: see above
{"okmin"}: see above
{"okmax"}: see above
{"cpulevels"}: (float, float): levels of CPU usage for {WARN} and {CRIT} in percent
{"cpu_average"}: int: build average over CPU usage for the given time in minutes
{"virtual_levels"}: (int, int): levels for the virtual memory usage in bytes
{"resident_levels"}: (int, int): levels for the resident memory usage in bytes
{"handle_count"}: (int, int): levels for the process handle count (only applies to windows)