Description :
Checks by SNMP (v1, v2c or v3) whether a process is running
and how many instances are running (minimum & maximum).
It is also possible to check the memory and CPU used by one process
or a group of processes.
Works on Windows, Linux/Unix, AS400.
Standard checks
When no warning or critical levels are set, the plugin checks
that at least one process matches the filter (-n option).
The filter is treated as a regular expression by default, but you
can deactivate this with the -r option.
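The difference between the default regexp matching and the -r option can be sketched as follows. This is only an illustration in Python, not the plugin's Perl code: the function name `count_matching` and the sample process list are hypothetical, and it assumes the regexp is searched anywhere in the process name while -r requires an exact name.

```python
import re

def count_matching(process_names, name_filter, use_regexp=True):
    """Count the processes selected by the filter.

    use_regexp=True  : default behaviour, the filter is a regular
                       expression searched anywhere in the name (assumption).
    use_regexp=False : like -r, the name must be exactly the filter.
    """
    if use_regexp:
        pattern = re.compile(name_filter)
        return sum(1 for name in process_names if pattern.search(name))
    return sum(1 for name in process_names if name == name_filter)

# Hypothetical process table
names = ["httpd", "httpd", "httpd", "sshd", "init"]
print(count_matching(names, "http"))          # 3 : regexp matches every httpd
print(count_matching(names, "http", False))   # 0 : no process is named exactly "http"
print(count_matching(names, "httpd", False))  # 3 : exact name
```

Python and Perl regular expressions differ in some details, so a complex filter should be tried against the real plugin with the -v option.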
You can use the -w and -c options to set the warning
and critical levels :
-w <minW>,<maxW> : minW and maxW are the minimum
and maximum number of processes for the warning level.
-c <minC>,<maxC> : same thing for the critical level.
The values must satisfy : minC <= minW < maxW <= maxC
You can omit <maxW> and <maxC>.
With N the current number of processes :
- N <= minC : critical
- minC < N <= minW : warning
- minW < N <= maxW : OK
- maxW < N <= maxC : warning
- maxC < N : critical
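The level rules can be written as a small function. This is a sketch in Python, not the plugin's own code; the name `count_status` is hypothetical. It treats N equal to minC as critical, which matches the "-w 2 -c 0" example where 0 processes is critical, and it represents an omitted maximum as None.

```python
def count_status(n, warn, crit):
    """Return 'OK', 'WARNING' or 'CRITICAL' for n running processes.

    warn and crit are (min, max) pairs; max is None when it is omitted.
    """
    min_w, max_w = warn
    min_c, max_c = crit
    # Critical : at or below the critical minimum, or above the critical maximum
    if n <= min_c or (max_c is not None and n > max_c):
        return "CRITICAL"
    # Warning : at or below the warning minimum, or above the warning maximum
    if n <= min_w or (max_w is not None and n > max_w):
        return "WARNING"
    return "OK"

# -w 2 -c 0 : at least 3 processes expected
print(count_status(3, (2, None), (0, None)))   # OK
print(count_status(2, (2, None), (0, None)))   # WARNING
print(count_status(0, (2, None), (0, None)))   # CRITICAL
# -w 3,8 -c 0,15
print(count_status(9, (3, 8), (0, 15)))        # WARNING
print(count_status(16, (3, 8), (0, 15)))       # CRITICAL
```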
Memory checks
The -m option checks the memory used
by the selected processes.
By default, the check uses the process which uses the most memory.
The -a switch uses the average over the selected processes instead.
Ex : -m 7,20 returns a warning if a process uses more than
7 Mb, and a critical if it uses more than 20 Mb.
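The choice between the maximum (default) and the average (-a) can be sketched as follows. This is an illustrative Python function, not the plugin's code: the name `memory_status` and the sample values are hypothetical, and it assumes a non-empty list of per-process memory sizes in Mb.

```python
def memory_status(mem_mb, warn_mb, crit_mb, average=False):
    """Return (value, status) for the selected processes.

    mem_mb  : memory used by each selected process, in Mb (non-empty list).
    average : False uses the largest process (default), True mimics -a.
    """
    value = sum(mem_mb) / len(mem_mb) if average else max(mem_mb)
    if value > crit_mb:
        return value, "CRITICAL"
    if value > warn_mb:
        return value, "WARNING"
    return value, "OK"

# -m 7,20 on three hypothetical processes using 4, 8 and 30 Mb
print(memory_status([4, 8, 30], 7, 20))                # (30, 'CRITICAL')
print(memory_status([4, 8, 30], 7, 20, average=True))  # (14.0, 'WARNING')
```

With the same thresholds, one large process triggers a critical on the maximum but only a warning on the average, which is why -a changes the result.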
CPU checks
When you use the -u option, a temporary
file is created in "/tmp" by default : this can be
changed at the beginning of the script.
The file name will be : tmp_Nagios_proc.<host IP>.<process filter>.
The -u option adds up the CPU used by
all selected processes and then makes the check.
-u 91,95 : returns a warning if more
than 91% of CPU is used, and a critical if more than 95% is used.
On multiprocessor hosts, the % of CPU use can
be > 100% : on a 4 CPU host, CPU usage can go up to 400% (the
script doesn't check whether a host is multiprocessor or not).
The script currently requires a minimum of 5 minutes
between values taken from the host (this can be changed at the beginning
of the script). You can check more than once every 5 minutes, but
don't set the check interval to more than 15 minutes.
When the script doesn't have enough data to compute the CPU use
(for example, the first time it is run), it returns an UNKNOWN
status.
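The CPU computation can be illustrated with a short Python sketch. It is not the plugin's code and makes several assumptions : the summed CPU counter of the selected processes is in centiseconds, earlier samples are available as (timestamp, counter) pairs such as those kept in the temporary file, and the most recent sample that is at least 5 minutes old is used. The names `cpu_percent` and `cpu_status` are hypothetical, and counter resets (for example after a process restart) are not handled.

```python
def cpu_percent(samples, now_time, now_cpu_cs, min_delta=300):
    """Return CPU use in % of one CPU, or None when data is insufficient.

    samples    : earlier (timestamp_seconds, cpu_centiseconds) pairs.
    now_cpu_cs : current summed CPU counter, assumed to be in centiseconds.
    min_delta  : minimum age in seconds of the sample to compare with.
    """
    old_enough = [s for s in samples if now_time - s[0] >= min_delta]
    if not old_enough:
        return None  # e.g. first run : no usable sample yet
    prev_time, prev_cpu_cs = max(old_enough)  # most recent usable sample
    used_seconds = (now_cpu_cs - prev_cpu_cs) / 100.0
    return 100.0 * used_seconds / (now_time - prev_time)

def cpu_status(percent, warn, crit):
    """Map a CPU percentage to a status, UNKNOWN when there is no data."""
    if percent is None:
        return "UNKNOWN"
    if percent > crit:
        return "CRITICAL"
    if percent > warn:
        return "WARNING"
    return "OK"

print(cpu_status(cpu_percent([], 1000, 500), 91, 95))           # UNKNOWN
print(cpu_percent([(0, 0)], 300, 30000))                        # 100.0 : one full CPU
print(cpu_percent([(0, 0)], 300, 120000))                       # 400.0 : four full CPUs
print(cpu_status(cpu_percent([(0, 0)], 300, 30000), 91, 95))    # CRITICAL
```

The 400.0 result shows why thresholds on multiprocessor hosts may need to be set above 100.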
Msg size option (-o option)
If you get the error "ERROR: running table
: Message size exceeded maxMsgSize", you may need to
raise maxMsgSize, the maximum size of an SNMP message, with
the -o option. Try a value with both the -o and the -v options : the script
will output the actual value, so you can add some octets to it with
the -o option.
SNMP Login
See snmp info page
Requirements :
- Perl in /usr/bin/perl (or just run 'perl script')
- Net::SNMP
- file 'utils.pm' in the plugin directory
Download
latest version : 1.4
Configuration examples
Changelog : in the CVS repository on SourceForge : http://nagios-snmp.cvs.sourceforge.net/nagios-snmp/plugins/.
Examples :
All examples below assume the script is in the current directory.
Host to be checked is 127.0.0.1 with snmp community "public".
Get help :
  ./check_snmp_process.pl -h
snmpv3 login :
  ./check_snmp_process.pl -H 127.0.0.1 -l login -x passwd
Check if at least one process matching http is running :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n http
Result example :
  3 process matching http : > 0 : OK
Check if at least 3 processes matching http are running :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n http -w 2 -c 0
Result example (<=2 will return warning, 0 critical) :
  3 process matching httpd : > 2 : OK
Check if at least one process named "httpd" exists (no regexp) :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n httpd -r
Result example :
  3 process named httpd : > 0 : OK
Check processes by their full path : check processes of /opt/soft/bin/ (at least one) :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n /opt/soft/bin/ -f
Check that more than 3 but not more than 8 processes are running :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n http -w 3,8 -c 0,15
Same checks + check the maximum memory used by a process (in Mb) against warning and critical levels :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n http -w 3,8 -c 0,15 -m 9,25
Same check but also sum the CPU used by all selected processes :
  ./check_snmp_process.pl -H 127.0.0.1 -C public -n http -w 3,8 -c 0,15 -m 9,25 -u 70,99
Output of check_snmp_process.pl -h
SNMP Process Monitor for Nagios version 1.4
GPL licence, (c)2004-2006 Patrick Proy
Usage: ./check_snmp_process.pl [-v] -H <host> -C <snmp_community>
[-2] | (-l login -x passwd) [-p <port>] -n <name>
[-w <min_proc>[,<max_proc>] -c <min_proc>[,max_proc]
] [-m<warn Mb>,<crit Mb> -a -u<warn %>,<crit%>
] [-t <timeout>] [-o <octet_length>] [-f ] [-r]
[-V] [-g]
-v, --verbose
print extra debugging information (and lists all storages)
-h, --help
print this help message
-H, --hostname=HOST
name or IP address of host to check
-C, --community=COMMUNITY NAME
community name for the host's SNMP agent (implies SNMP v1 or v2c with option)
-l, --login=LOGIN ; -x, --passwd=PASSWD, -2, --v2c
Login and auth password for snmpv3 authentication
If no priv password exists, implies AuthNoPriv
-2 : use snmp v2c
-X, --privpass=PASSWD
Priv password for snmpv3 (AuthPriv protocol)
-L, --protocols=<authproto>,<privproto>
<authproto> : Authentication protocol (md5|sha : default md5)
<privproto> : Priv protocole (des|aes : default des)
-p, --port=PORT
SNMP port (Default 161)
-n, --name=NAME
Name of the process (regexp)
No trailing slash !
-r, --noregexp
Do not use regexp to match NAME in description OID
-f, --fullpath
Use full path name instead of process name
(Windows doesn't provide full path name)
-w, --warn=MIN[,MAX]
Number of process that will cause a warning
-1 for no warning, MAX must be >0. Ex : -w-1,50
-c, --critical=MIN[,MAX]
number of process that will cause an error (
-1 for no critical, MAX must be >0. Ex : -c-1,50
Notes on warning and critical :
with the following options : -w m1,x1 -c m2,x2
you must have : m2 <= m1 < x1 <= x2
you can omit x1 or x2 or both
-m, --memory=WARN,CRIT
checks memory usage (default max of all process)
values are warning and critical values in Mb
-a, --average
makes an average of memory used by process instead of max
-u, --cpu=WARN,CRIT
checks cpu usage of all process
values are warning and critical values in % of CPU usage
if more than one CPU, value can be > 100% : 100%=1 CPU
-g, --getall
In some cases, it is necessary to get all data at once because
process die very frequently.
This option eats bandwidth an cpu (for remote host) at breakfast.
-o, --octetlength=INTEGER
max-size of the SNMP message, usefull in case of Too Long responses.
Be carefull with network filters. Range 484 - 65535, default are
usually 1472,1452,1460 or 1440.
-t, --timeout=INTEGER
timeout for SNMP in seconds (Default: 5)
-V, --version
prints version number
Note :
CPU usage is in % of one cpu, so maximum can be 100% * number of CPU
example :
Browse process list : <script> -C <community> -H <host> -n <anything> -v
the -n option allows regexp in perl format :
All process of /opt/soft/bin : -n /opt/soft/bin/ -f
All 'named' process : -n named