764 lines
		
	
	
	
		
			24 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			764 lines
		
	
	
	
		
			24 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
<?xml version="1.0" ?>
 | 
						|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 | 
						|
<html xmlns="http://www.w3.org/1999/xhtml">
 | 
						|
<head>
 | 
						|
<title>ps-watcher - monitors various processes based on ps-like information.</title>
 | 
						|
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
 | 
						|
<link rev="made" href="mailto:root@localhost" />
 | 
						|
</head>
 | 
						|
 | 
						|
<body style="background-color: white">
 | 
						|
 | 
						|
<p><a name="__index__"></a></p>
 | 
						|
<!-- INDEX BEGIN -->
 | 
						|
 | 
						|
<ul>
 | 
						|
 | 
						|
	<li><a href="#name">NAME</a></li>
 | 
						|
	<li><a href="#synopsis">SYNOPSIS</a></li>
 | 
						|
	<li><a href="#description">DESCRIPTION</a></li>
 | 
						|
	<ul>
 | 
						|
 | 
						|
		<li><a href="#options">OPTIONS</a></li>
 | 
						|
		<li><a href="#configuration_file_modification_and_signal_handling">CONFIGURATION FILE MODIFICATION AND SIGNAL HANDLING</a></li>
 | 
						|
	</ul>
 | 
						|
 | 
						|
	<li><a href="#configuration_file_format">CONFIGURATION FILE FORMAT</a></li>
 | 
						|
	<ul>
 | 
						|
 | 
						|
		<li><a href="#expanded_variables_in_trigger_action_clauses">EXPANDED VARIABLES IN TRIGGER/ACTION CLAUSES</a></li>
 | 
						|
		<li><a href="#other_things_in_trigger_clauses">OTHER THINGS IN TRIGGER CLAUSES</a></li>
 | 
						|
	</ul>
 | 
						|
 | 
						|
	<li><a href="#example_configuration">EXAMPLE CONFIGURATION</a></li>
 | 
						|
	<li><a href="#using__prolog_for_getting_nonps_information">Using $PROLOG for getting non-ps information</a></li>
 | 
						|
	<li><a href="#security_considerations">SECURITY CONSIDERATIONS</a></li>
 | 
						|
	<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
 | 
						|
	<li><a href="#bugs">BUGS</a></li>
 | 
						|
	<li><a href="#see_also">SEE ALSO</a></li>
 | 
						|
	<li><a href="#author">AUTHOR</a></li>
 | 
						|
	<li><a href="#copyright">COPYRIGHT</a></li>
 | 
						|
</ul>
 | 
						|
<!-- INDEX END -->
 | 
						|
 | 
						|
<hr />
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<h1><a name="name">NAME</a></h1>
 | 
						|
<p>ps-watcher - monitors various processes based on ps-like information.</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="synopsis">SYNOPSIS</a></h1>
 | 
						|
<p><strong>ps-watcher</strong> [<em>options</em>...]
 | 
						|
            [<code>--config</code>] <em>config-file</em></p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="description">DESCRIPTION</a></h1>
 | 
						|
<p>Periodically a list of processes obtained via <code>ps</code>. More precisely
 | 
						|
each item in the list contains the process name (just what's listed in
 | 
						|
the ``cmd'' field, not the full command and arguments) and its process
 | 
						|
id (pid). A configuration file specifies a list of Perl
 | 
						|
regular-expression patterns to match the processes against. For each
 | 
						|
match, a Perl expression specified for that pattern is evaluated. The
 | 
						|
evaluated expression can refer to variables which are set by ps and
 | 
						|
pertain to the matched process(es), for example the amount memory
 | 
						|
consumed by the process, or the total elapsed time. Some other
 | 
						|
variables are set by the program, such as the number of times the
 | 
						|
process is running. If the Perl expression for a matched pattern
 | 
						|
evaluates true, then an action can be run such as killing the program,
 | 
						|
restarting it, or mailing an alert, or running some arbitrary Perl
 | 
						|
code.</p>
 | 
						|
<p>Some things you might want to watch a daemon or process for:</p>
 | 
						|
<ul>
 | 
						|
<li>
 | 
						|
<p>check that it is running (hasn't died)</p>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
<p>ensure it is not running too many times</p>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
<p>isn't consuming too much memory (perhaps a memory leak), or I/O</p>
 | 
						|
</li>
 | 
						|
</ul>
 | 
						|
<p>Some actions you might want to take:</p>
 | 
						|
<ul>
 | 
						|
<li>
 | 
						|
<p>restart a process</p>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
<p>kill off rampant processes</p>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
<p>send an alert about any of the conditions listed above</p>
 | 
						|
</li>
 | 
						|
</ul>
 | 
						|
<p>Depending on options specfied, this program can be run as a daemon,
 | 
						|
run once (which is suitable as a <code>cron</code> job), or run not as a daemon
 | 
						|
but still continuously (which may be handy in testing the program or
 | 
						|
your configuration).</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<h2><a name="options">OPTIONS</a></h2>
 | 
						|
<dl>
 | 
						|
<dt><strong><a name="item__2d_2dhelp">--help</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Print a usage message on standard error and exit with a return code
 | 
						|
of 100.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2ddoc">--doc</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Extact the full documentation that you are reading now, print it and
 | 
						|
exit with a return code of 101.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dversion">--version</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Print the version release on standard output and exit with a return
 | 
						|
code of 10.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2ddebug_number">--debug <em>number</em></a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Give debugging output. The higher the number, the more the output. The
 | 
						|
default is 0 = none. 2 is the most debugging output.</p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__5b_2d_2dconfig_5d_configuration_file">[--config] <em>configuration file</em></a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Specify configuration file. .</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>See <a href="#configuration_file_format">CONFIGURATION FILE FORMAT</a> below for information on the format
 | 
						|
of the configuration file and <a href="#example_configuration">EXAMPLE CONFIGURATION</a> for a complete
 | 
						|
example of a configuration file.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dlog__5blog_file_5d">--log [<em>log file</em>]</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Send or don't send error and debugging output to a log file. If option
 | 
						|
is given but no logfile is specified, then use STDERR. The default is
 | 
						|
no error log file.  See also --syslog below.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dsyslog__7c__2d_2dnosyslog">--syslog | --nosyslog</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Send or don't send error and debugging output to syslog. The default
 | 
						|
is to syslog error and debug output.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2ddaemon__7c__2d_2dnodaemon">--daemon | --nodaemon</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Run or don't as a daemon.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dpath_search_2dpath">--path <em>search-path</em></a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Specify the executable search path used in running commands.</p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dps_2dprog_program">--ps-prog <em>program</em></a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>One can specify the command that gives ps information. By default, the
 | 
						|
command is <em>/bin/ps</em>.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2drun__7c__2d_2dnorun">--run | --norun</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>do/don't run actions go through the motions as though we were going
 | 
						|
to. This may be useful in debugging.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__2d_2dsleep_interval_in_seconds">--sleep <em>interval in seconds</em></a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>It is expected that one might want to run ps-watcher over and over
 | 
						|
again. In such instances one can specify the amount of time between
 | 
						|
iterations with this option.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>If a negative number is specified the program is run only once.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
</dl>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<h2><a name="configuration_file_modification_and_signal_handling">CONFIGURATION FILE MODIFICATION AND SIGNAL HANDLING</a></h2>
 | 
						|
<p>Periodically ps-watcher checks to see if the configuration file
 | 
						|
that it was run against has changed. If so, the program rereads the
 | 
						|
configuration file.</p>
 | 
						|
<p>More precisely, the checks are done after waking up from a slumber.
 | 
						|
If the sleep interval is long (or if you are impatient), you can
 | 
						|
probably force the program to wake up using a HUP signal.</p>
 | 
						|
<p>At any time you can increase the level of debug output by sending a
 | 
						|
USR1 signal to the ps-watcher process. Similarly you can decrease the
 | 
						|
level of debug output by sending the process a USR2 signal.</p>
 | 
						|
<p>It is recommended that you terminate ps-watcher via an INT, TERM, or QUIT
 | 
						|
signal.</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="configuration_file_format">CONFIGURATION FILE FORMAT</a></h1>
 | 
						|
<p>The format of a configuration file is a series of fully qualified
 | 
						|
filenames enclosed in square brackets followed by a number of
 | 
						|
parameter lines. Each parameter line has a parameter names followed by
 | 
						|
an ``equal'' sign and finally value. That is:</p>
 | 
						|
<pre>
 | 
						|
 # This is a comment line
 | 
						|
 ; So is this.
 | 
						|
 [process-pattern1]
 | 
						|
  parameter1 = value1
 | 
						|
  parameter2 = value2</pre>
 | 
						|
<pre>
 | 
						|
 [process-pattern2]
 | 
						|
  parameter1 = value3
 | 
						|
  parameter2 = value4</pre>
 | 
						|
<p>Comments start with # or ; and take effect to the end of the line.</p>
 | 
						|
<p>This should be familiar to those who have worked with text-readible
 | 
						|
Microsoft <code>.INI</code> files.</p>
 | 
						|
<p>Note process patterns, (<em>process-pattern1</em> and <em>process-pattern2</em>
 | 
						|
above) must be unique. If there are times when you may want to
 | 
						|
refer to the same process, one can be creative to make these unique.
 | 
						|
e.g. <em>cron</em> and <em>[c]ron</em> which refer to the same process even
 | 
						|
though they <em>appear</em> to be different.</p>
 | 
						|
<p>As quoted directly from the Config::IniFiles documentation:</p>
 | 
						|
<p>Multiline or multivalued fields may also be defined ala UNIX
 | 
						|
``here document'' syntax:</p>
 | 
						|
<pre>
 | 
						|
  Parameter=<<EOT
 | 
						|
  value/line 1
 | 
						|
  value/line 2
 | 
						|
  EOT</pre>
 | 
						|
<p>You may use any string you want in place of ``EOT''.  Note
 | 
						|
that what follows the ``<<'' and what appears at the end of
 | 
						|
the text <em>must</em> match exactly, including any trailing
 | 
						|
whitespace.</p>
 | 
						|
<p>There are two special ``process patterns'': $PROLOG and $EPILOG, the
 | 
						|
former should appear first and the latter last.</p>
 | 
						|
<p>You can put perl code to initialize variables here and do cleanup
 | 
						|
actions in these sections using ``perl-action.''</p>
 | 
						|
<p>A description of parameters names, their meanings and potential values
 | 
						|
follows.</p>
 | 
						|
<dl>
 | 
						|
<dt><strong><a name="item_trigger">trigger</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>This parameter specifies the condition on which a process action is
 | 
						|
fired.  The condition is evaluated with Perl <code>eval()</code> and should
 | 
						|
therefore return something which is equivalent to ``true'' in a Perl
 | 
						|
expression.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>If no trigger is given in a section, true or 1 is assumed and
 | 
						|
the action is unconditionally triggered.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Example:</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # Match if httpd has not spawned enough (<4) times. NFS and databases
 | 
						|
  # daemons typically spawn child processes.  Since the program
 | 
						|
  # matches against the command names, not commands and arguments,
 | 
						|
  # something like: ps -ef | grep httpd won't match the below.
 | 
						|
  # If you want to match against the command with arguments, see
 | 
						|
  # the example with $args below.
 | 
						|
  [httpd$]
 | 
						|
  trigger = $count <= 4</pre>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item_occurs">occurs</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>This parameter specifies how many times an action should be performed
 | 
						|
on processes matching the section trigger. Acceptable values are
 | 
						|
``every'', ``first'', ``first-trigger'', and ``none''.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Setting the occurs value to ``none'' causes the the trigger to be
 | 
						|
evaluated when there are no matching processes.  Although one might
 | 
						|
think ``$count == 0'' in the action expression would do the same thing,
 | 
						|
currently as coded this does not work.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Setting the occurs value to ``first'' causes the process-pattern rule to
 | 
						|
be finished after handling the first rule that matches, whether or not the
 | 
						|
trigger evaluated to true.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Setting the occurs value to ``first-trigger'' causes the process-pattern
 | 
						|
rule to be finished after handling the first rule that matches <em>and</em>
 | 
						|
the trigger evaluates to true.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>If the item parameter is not specified, ``first'' is assumed.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Examples:</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  [.]
 | 
						|
  occurs = first
 | 
						|
  action = echo "You have $count processes running"</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # Note in the above since there is no trigger specified,
 | 
						|
  #   occurs = first
 | 
						|
  # is the same thing as 
 | 
						|
  #   occurs = first-trigger</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  [.?]
 | 
						|
  trigger = $vsz > 1000
 | 
						|
  occurs  = every
 | 
						|
  action  = echo "Large program $command matches $ps_pat: $vsz KB"</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # Fire if /usr/sbin/syslogd is not running.
 | 
						|
  # Since the program matches against the command names, not commands and
 | 
						|
  # arguments, something like: 
 | 
						|
  #   ps -ef | grep /usr/sbin/syslogd
 | 
						|
  # won't match the below.
 | 
						|
  [(/usr/sbin/)?syslogd]
 | 
						|
  occurs = none
 | 
						|
  action = /etc/init.d/syslogd start</pre>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item_action">action</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>This specifies the action, a command that gets run by the system
 | 
						|
shell, when the trigger condition is evaluated to be true.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Example:</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
 action = /etc/init.d/market_loader.init restart</pre>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item_perl_2daction">perl-action</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>This specifies Perl statements to be eval'd. This can be especially
 | 
						|
useful in conjunction with $PROLOG and $EPILOG sections to make tests
 | 
						|
across collections of process and do things which ps-watcher
 | 
						|
would otherwise not be able to do.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Example:</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # A Perl variable initialization.
 | 
						|
  # Since ps-watcher runs as a daemon it's a good idea
 | 
						|
  # to (re)initialize variables before each run.
 | 
						|
  [$PROLOG]
 | 
						|
    perl-action = $root_procs=0;</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # Keep track of how many root processes we are running
 | 
						|
  [.*]
 | 
						|
    perl-action = $root_procs++ if $uid == 0
 | 
						|
    occurs  = every</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # Show this count.
 | 
						|
  [$EPILOG]
 | 
						|
    action  = echo "I counted $root_procs root processes"</pre>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
</dl>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<h2><a name="expanded_variables_in_trigger_action_clauses">EXPANDED VARIABLES IN TRIGGER/ACTION CLAUSES</a></h2>
 | 
						|
<p>Any variables defined in the program can be used in pattern or
 | 
						|
action parameters. For example, <code>$program</code> can be used to refer to 
 | 
						|
the name of this program ps-watcher.</p>
 | 
						|
<p>The following variables can be used in either the pattern or action
 | 
						|
fields.</p>
 | 
						|
<dl>
 | 
						|
<dt><strong><a name="item__action">$action</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>A string containing the text of the action to run.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__perl_action">$perl_action</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>A string containing the text of the perl_action to run.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__ps_pat">$ps_pat</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The Perl regular expression specified in the beginning of the section.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__command">$command</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The command that matched $ps_pat.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>The Perl regular expression specified in the beginning of the section.
 | 
						|
Normally processes will not have funny characters in them. Just in
 | 
						|
case, backticks in $command are escaped.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p>Example:</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  # List processes other than emacs (which is a known pig) that use lots
 | 
						|
  # of virtual memory</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<pre>
 | 
						|
  [.*]
 | 
						|
  trigger = $command !~ /emacs$/ && $vsz > 10
 | 
						|
  action  = echo \"Looks like you have a big \$command program: \$vsz KB\"</pre>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__count">$count</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The number of times the pattern matched. Presumably the number of
 | 
						|
processes of this class running.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__trigger">$trigger</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>A string containing the text of the trigger.</p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
</dl>
 | 
						|
<p>A list of variables specific to this program or fields commonly found in
 | 
						|
<code>ps</code> output is listed below followed by a description of the more
 | 
						|
common ones. See also <code>ps</code> for a more complete
 | 
						|
description of the meaning of the field.</p>
 | 
						|
<pre>
 | 
						|
 uid euid ruid gid egid rgid alarm blocked bsdtime c caught 
 | 
						|
cputime drs dsiz egroup eip esp etime euser f fgid 
 | 
						|
fgroup flag flags fname fsgid fsgroup fsuid fsuser fuid fuser  
 | 
						|
group ignored intpri lim longtname m_drs m_trs maj_flt majflt 
 | 
						|
min_flt  minflt ni nice nwchan opri pagein pcpu pending pgid pgrp 
 | 
						|
pmem ppid pri rgroup rss rssize rsz ruser s sess session 
 | 
						|
sgi_p sgi_rss sgid sgroup sid sig sig_block sig_catch sig_ignore 
 | 
						|
sig_pend sigcatch sigignore sigmask stackp start start_stack start_time 
 | 
						|
stat state stime suid suser svgid svgroup svuid svuser sz time timeout 
 | 
						|
tmout tname tpgid trs trss tsiz tt tty tty4 tty8 uid_hack uname 
 | 
						|
user vsize vsz wchan</pre>
 | 
						|
<p>Beware though, in some situations ps can return multiple lines for a
 | 
						|
single process and we will use just one of these in the trigger. In
 | 
						|
particular, Solaris's <code>ps</code> will return a line for each LWP (light-weight
 | 
						|
process). So on Solaris, if a trigger uses variable lwp, it may or may
 | 
						|
not match depending on which single line of the multiple <code>ps</code> lines is
 | 
						|
used.</p>
 | 
						|
<p></p>
 | 
						|
<dl>
 | 
						|
<dt><strong><a name="item__args">$args</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The command along with its command arguments. It is possible that this
 | 
						|
is might get truncated at certain length (if ps does likewise as is
 | 
						|
the case on Solaris).</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__ppid">$ppid</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The parent process id.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__stime">$stime</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The start time of the process.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__etime">$etime</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The end time of the process.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__pmem">$pmem</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The process memory.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__pcpu">$pcpu</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The percent CPU utilization.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__tty">$tty</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>The controlling tty.</p>
 | 
						|
</dd>
 | 
						|
<dd>
 | 
						|
<p></p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
<dt><strong><a name="item__szv">$szv</a></strong>
 | 
						|
 | 
						|
<dd>
 | 
						|
<p>Virtual memory size of the process</p>
 | 
						|
</dd>
 | 
						|
</li>
 | 
						|
</dl>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<h2><a name="other_things_in_trigger_clauses">OTHER THINGS IN TRIGGER CLAUSES</a></h2>
 | 
						|
<p>To make testing against elapsed time easier, a function <code>elapse2sec()</code>
 | 
						|
has been written to parse and convert elapsed time strings in the 
 | 
						|
format <code>dd-hh:mm:ss</code> and a number of seconds.</p>
 | 
						|
<p>Some constants for the number of seconds in a minute, hour, or day
 | 
						|
have also been defined. These are referred to as <code>MINS</code>, <code>HOURS</code>,
 | 
						|
and <code>DAYS</code> respectively and they have the expected definitions:</p>
 | 
						|
<pre>
 | 
						|
  use constant MINS   => 60;
 | 
						|
  use constant HOURS  => 60*60;
 | 
						|
  use constant DAYS   => HOURS * 24;</pre>
 | 
						|
<p>Here is an example of the use of <code>elapsed2sec()</code>:</p>
 | 
						|
<pre>
 | 
						|
  # Which processes have been running for more than 3 hours?
 | 
						|
  # Also note use of builtin-function elapsed2secs, variable $etime
 | 
						|
  # and builtin-function HOURS
 | 
						|
  [.]
 | 
						|
    trigger = elapsed2secs('$etime') > 1*DAYS
 | 
						|
    action  = echo "$command has been running more than 1 day ($etime)"
 | 
						|
    occurs  = every</pre>
 | 
						|
<p>Please note the quotes around '$etime'.</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="example_configuration">EXAMPLE CONFIGURATION</a></h1>
 | 
						|
<pre>
 | 
						|
  # Comments start with # or ; and go to the end of the line.</pre>
 | 
						|
<pre>
 | 
						|
  # The format for each entry is in Microsoft .INI form:
 | 
						|
  # [process-pattern]
 | 
						|
  # trigger = perl-expression
 | 
						|
  # action  = program-and-arguments-to-run</pre>
 | 
						|
<pre>
 | 
						|
  [httpd$]
 | 
						|
    trigger = $count < 4
 | 
						|
    action  = echo "$trigger fired -- You have $count httpd sessions."</pre>
 | 
						|
<pre>
 | 
						|
  [.]
 | 
						|
  trigger = $vsz > 10
 | 
						|
  action  = echo "Looks like you have a big $command program: $vsz KB"</pre>
 | 
						|
<pre>
 | 
						|
  # Unfortunately we have use a different pattern below. (Here we use
 | 
						|
  # ".?" instead of ".".) In effect the the two patterns mean
 | 
						|
  # test every process.
 | 
						|
  [.?]
 | 
						|
    trigger = elapsed2secs('$etime') > 2*MINS && $pcpu > 40
 | 
						|
    occurs  = every
 | 
						|
    action  = <<EOT
 | 
						|
     echo "$command used $pcpu% CPU for the last $etime seconds" | /bin/mail root
 | 
						|
     kill -TERM $pid
 | 
						|
  EOT</pre>
 | 
						|
<pre>
 | 
						|
  # Scripts don't show as the script name as the command name on some
 | 
						|
  # operating systems.  Rather the name of the interpreter is listed
 | 
						|
  # (e.g. bash or perl) Here's how you can match against a script.
 | 
						|
  # BSD/OS is an exception: it does give the script name rather than
 | 
						|
  # the interpreter name.
 | 
						|
  [/usr/bin/perl]
 | 
						|
    trigger = \$args !~ /ps-watcher/
 | 
						|
    occurs  = every
 | 
						|
    action  = echo "***found perl program ${pid}:\n $args"</pre>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="using__prolog_for_getting_nonps_information">Using $PROLOG for getting non-ps information</a></h1>
 | 
						|
<p>Here is an example to show how to use ps-watcher to do something not
 | 
						|
really possible from ps: check to see if a <em>port</em> is active.  We make
 | 
						|
use of lsof to check port 3333 and the $PROLOG make sure it runs.</p>
 | 
						|
<pre>
 | 
						|
  [$PROLOG]
 | 
						|
    occurs  = first
 | 
						|
    trigger = { \$x=`lsof -i :3333 >/dev/null 2>&1`; \$? >> 8 }
 | 
						|
    action  = <<EOT
 | 
						|
    put-your-favorite-command-here arg1 arg2 ...
 | 
						|
  EOT</pre>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="security_considerations">SECURITY CONSIDERATIONS</a></h1>
 | 
						|
<p>Any daemon such as this one which is sufficiently flexible is a
 | 
						|
security risk. The configuration file allows arbitrary commands to be
 | 
						|
run. In particular if this daemon is run as root and the configuration
 | 
						|
file is not protected so that it can't be modified, a bad person could
 | 
						|
have their programs run as root.</p>
 | 
						|
<p>There's nothing in the ps command or ps-watcher, that requires one to
 | 
						|
run this daemon as root.</p>
 | 
						|
<p>So as with all daemons, one needs to take usual security precautions
 | 
						|
that a careful sysadmin/maintainer of a computer would. If you can run
 | 
						|
any daemon as an unprivileged user (or with no privileges), do it!  If
 | 
						|
not, set the permissions on the configuration file and the directory
 | 
						|
it lives in.</p>
 | 
						|
<p>This program can also run chrooted and there is a --path option that
 | 
						|
is available which can be used to set the executable search path.  All
 | 
						|
commands used by ps-watcher are fully qualified, and I generally give a
 | 
						|
full execution path in my configuration file, so consider using the
 | 
						|
option --path=''.</p>
 | 
						|
<p>Commands that need to be run as root you can run via sudo.  I often
 | 
						|
run process accounting which tracks all commands run. Tripwire may be
 | 
						|
useful to track changed configuration files.</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
 | 
						|
<p>To debug a configuration file the following options are useful:</p>
 | 
						|
<p>ps-watcher --log --nodaemon --sleep -1 --debug 2 <em>configuration-file</em></p>
 | 
						|
<p>For even more information and control try running the above under the
 | 
						|
perl debugger, e.g.</p>
 | 
						|
<p>perl -d ps-watcher --log --nodaemon --sleep -1 --debug 2 <em>configuration-file</em></p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="bugs">BUGS</a></h1>
 | 
						|
<p>Well, some of these are not so much a bug in ps-watcher so much as a
 | 
						|
challenge to getting ps-watcher to do what you want it to do.</p>
 | 
						|
<p>One common problem people run in into is understanding exactly what
 | 
						|
the process variables mean. The manual page <em>ps(1)</em> should be of
 | 
						|
help, but I've found some of the descriptions either a bit vague or
 | 
						|
just plain lacking.</p>
 | 
						|
<p>Sometimes one will see this error message when debug tracing is turned on:</p>
 | 
						|
<pre>
 | 
						|
  ** debug ** Something wrong getting ps variables</pre>
 | 
						|
<p>This just means that the process died betwee the time ps-watcher first
 | 
						|
saw the existence of the process and the time that it queried
 | 
						|
variables.</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="see_also">SEE ALSO</a></h1>
 | 
						|
<p>See also <em>ps(1)</em> and <em>syslogd(8)</em>.</p>
 | 
						|
<p>Another cool program doing ps-like things is <code>xps</code>. Well okay, it's
 | 
						|
another program I distributed. It shows the process tree dynamically
 | 
						|
updated using X Motif and tries to display the output ``attractively''
 | 
						|
but fast. You can the find the homepage at
 | 
						|
<a href="http://motif-pstree.sourceforge.net">http://motif-pstree.sourceforge.net</a> and it download via
 | 
						|
<a href="http://prdownloads.sourceforge.net/motif-pstree?sort_by=date&sort=desc">http://prdownloads.sourceforge.net/motif-pstree</a></p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="author">AUTHOR</a></h1>
 | 
						|
<p>Rocky Bernstein (<a href="mailto:rocky@cpan.org">rocky@cpan.org</a>)</p>
 | 
						|
<p>
 | 
						|
</p>
 | 
						|
<hr />
 | 
						|
<h1><a name="copyright">COPYRIGHT</a></h1>
 | 
						|
<pre>
 | 
						|
  Copyright (C) 2000, 2002, 2003, 2004, 2005, 2006
 | 
						|
  Rocky Bernstein, email: rocky@cpan.org.
 | 
						|
  This program is free software; you can redistribute it and/or modify
 | 
						|
  it under the terms of the GNU General Public License as published by
 | 
						|
  the Free Software Foundation; either version 2 of the License, or
 | 
						|
  (at your option) any later version.</pre>
 | 
						|
<pre>
 | 
						|
  This program is distributed in the hope that it will be useful,
 | 
						|
  but WITHOUT ANY WARRANTY; without even the implied warranty of
 | 
						|
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 | 
						|
  GNU General Public License for more details.</pre>
 | 
						|
<pre>
 | 
						|
  You should have received a copy of the GNU General Public License
 | 
						|
  along with this program; if not, write to the Free Software
 | 
						|
  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.</pre>
 | 
						|
 | 
						|
</body>
 | 
						|
 | 
						|
</html>
 |