Writing great Nagios plugins

Posted on 18/03/

So you want to write a Nagios plugin, and you want it to be a great one! A great plugin, aside from having great functionality, is one that provides good documentation and fits nicely into the Nagios ecosystem, so that Nagios users will be comfortable with it.

Right now you are thinking: "How can I do that? I have to look at other plugins, read guidelines, learn a lot about the Nagios way of doing things and what the community expects from a plugin, etc. It's quite a big task, and I just wanted to write a quick and dirty plugin!"

If you write your plugins in Perl you are in luck, because smart people have already done that work for you! Nagios::Plugin helps you fit into that ecosystem and gives you a lot of functionality at the best price: FREE. Your plugins get done in less time, with more features and fewer bugs.

First step: Instantiate a Nagios::Plugin object

use strict;
use warnings;
use Nagios::Plugin;

my $np = Nagios::Plugin->new(
        usage => "Usage: %s [-v|--verbose] [-t|--timeout=seconds] -c|--critical=<threshold>",
        version => '1.0',    # quoted, so it is not printed as "1"
        blurb => qq{Count the xxx's in yyy},
        extra => qq{
 -c 10
   returns CRITICAL if xxx's are greater than 10
 -c 20 -t 60
   returns CRITICAL if xxx's are greater than 20. Timeout in 60 seconds if it takes too long.},
        url => 'http://example.com'
);

You get:

  • standard parameters
    -V version info
    -h autogenerated help
    -v verbose output flag
    -t timeout
    nice features that you don't have to worry about, and that Nagios users will be very happy to have. Programs like Opsview will show the help in their web interface (again... for free).

  • plugin versioning
    version and url are printed for free (too) in the help and with -V
  • help text
    the help text consists of the version info, license (GPL if not overridden), blurb (text describing what the plugin does), parameter help list (autogenerated from the add_arg() info), and extra info. The extra info is the ideal place to give the user a couple of usage examples, each with a short description of what invoking the plugin with those parameters does.

That's a lot for one statement!

Second step: add your parameters

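$np->add_arg(
    spec => 'warning|w=s',
    help => "-w, --warning=RANGE\n     Range for returning WARNING"
);
# -c is announced in the usage string and read later via $np->opts->critical
$np->add_arg(
    spec => 'critical|c=s',
    help => "-c, --critical=RANGE\n     Range for returning CRITICAL",
    required => 1
);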
$np->add_arg(
    spec => 'number|n=i',
    help => "-n, --number=INTEGER\n     Number of yyy's to xxx",
    required => 1
);
$np->add_arg(
    spec => 'filter|f=s',
    help => "-f, --filter=aaa\n    Filter by aaa",
    default => 'aaa'
);

# Parse @ARGV and process standard arguments (e.g. usage, help, version)
$np->getopts;

You get free parameter type validation: if you declare a parameter as an integer and the user passes something else, the plugin will not go past the $np->getopts statement. You also specify a string for each parameter that will be displayed when the user calls the plugin with --help. If you are going to have a critical and a warning threshold, tell the user that they are RANGE items (you'll see why below). Some standard parameter names are:
-c critical range
-w warning range
-C for parameters that start with "c" other than critical
-H hostname: for names of machines
-p port: for port numbers
-4 for using IPv4
-6 for using IPv6
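
The spec strings passed to add_arg() follow Getopt::Long syntax, so you can get a feel for the type validation with plain Getopt::Long. This is only a rough sketch: parse_args is a hypothetical helper written for this post, and Nagios::Plugin reports errors in its own way rather than returning undef.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of Nagios::Plugin): parse a fake command
# line with the same spec strings used in the add_arg() calls above.
sub parse_args {
    my @args = @_;
    my %opt  = ( filter => 'aaa' );     # mimics add_arg( default => 'aaa' )
    local $SIG{__WARN__} = sub { };     # silence Getopt::Long's error message
    my $ok = GetOptionsFromArray(
        \@args, \%opt,
        'warning|w=s',    # =s : any string (a RANGE is interpreted later)
        'number|n=i',     # =i : must be an integer
        'filter|f=s',
        'verbose|v+',     # +  : counts how many times the flag was given
    );
    return $ok ? \%opt : undef;         # undef when validation failed
}

my $good = parse_args( '-n', '5', '-v', '-v' );
print "number=$good->{number} verbose=$good->{verbose} filter=$good->{filter}\n";

my $bad = parse_args( '--number', 'abc' );
print "rejected\n" unless defined $bad;
```

The point of the sketch is that 'number|n=i' rejects "abc" before any of your own code runs, which is exactly the check you would otherwise have to write by hand.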

Third step: do what your plugin does

Now you have to work (hey! you haven't broken a sweat yet!). To get the values of the parameters passed to your script, use the handy $np->opts->paramname accessors (for example, $np->opts->number for the --number parameter).

Fourth step: return performance data (it's free)

You have almost surely collected a measurable quantity to compare against a threshold. Output the collected value as performance data. I'm sure you will want to see how it evolves over time with a nice graphing tool. Is it going up? Down? Is it high during work hours? Is it low on weekends?

$np->add_perfdata(
    label => "size",
    value => $value,
    uom => "kB",
    warning => $np->opts->warning,
    critical => $np->opts->critical
);

The UOM (unit of measurement) can be:

  • no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)
  • s - seconds (also us, ms)
  • % - percentage
  • B - bytes (also KB, MB, TB)
  • c - a continuous counter (such as bytes transmitted on an interface)
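
To see where those fields end up, here is a rough sketch of a single performance data item in the label=value[UOM];warn;crit layout described in the Nagios plugin guidelines. format_perfdata is a hypothetical helper written for this post; Nagios::Plugin builds the string for you, and its exact output may differ in the details (it can also carry min and max fields).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Nagios::Plugin): build one perfdata item.
sub format_perfdata {
    my (%p) = @_;
    my $uom = defined $p{uom} ? $p{uom} : '';    # no unit means "a number of things"

    # warning and critical are optional; a missing one is left blank
    my @fields = map { defined $_ ? $_ : '' } @p{qw(warning critical)};

    my $item = "$p{label}=$p{value}$uom;" . join( ';', @fields );
    $item =~ s/;+$//;                            # drop trailing empty separators
    return $item;
}

print format_perfdata(
    label    => 'size',
    value    => 42,
    uom      => 'kB',
    warning  => '10:20',
    critical => '30',
), "\n";    # prints size=42kB;10:20;30
```

Graphing tools parse this part of the plugin output, which is why it pays to fill in the unit and the thresholds consistently.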

Fifth step: return the status

Now you decide if the plugin has to return CRITICAL, WARNING or OK. This code quickly springs to mind:

if ($value > $critical) {
    ...    # return CRITICAL
}
elsif ($value > $warning) {
    ...    # return WARNING
}
else {
    ...    # return OK
}

What if somebody wants OK between critical and warning? Again you can work less and get more: $np->check_threshold to the rescue! Nagios has a RANGE specification that check_threshold understands, so you can just pass the collected value, the critical parameter and the warning parameter. The status to return is calculated for you, for free!

my $status = $np->check_threshold(
    check => $value,
    warning => $np->opts->warning,
    critical => $np->opts->critical
);
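
If the RANGE notation is new to you, this pure Perl sketch illustrates its semantics as described in the Nagios plugin guidelines: "10" alerts below 0 or above 10, "10:" alerts below 10, "~:10" alerts above 10, "10:20" alerts outside 10..20, and "@10:20" alerts inside 10..20. Both range_alerts and threshold_status are hypothetical helpers written for this post. They are not how Nagios::Plugin implements check_threshold, just a way to see why the naive if/elsif falls short.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $value should alert for $range, else 0.
sub range_alerts {
    my ( $range, $value ) = @_;

    # A leading "@" inverts the range: alert when the value is INSIDE it.
    my $inside_alerts = ( $range =~ s/^\@// ) ? 1 : 0;

    # "10" is shorthand for "0:10"; an empty end means "up to infinity".
    my ( $start, $end ) = $range =~ /:/ ? split( /:/, $range, 2 ) : ( 0, $range );
    $start = 0 if $start eq '';

    my $below   = $start eq '~' ? 0 : ( $value < $start ? 1 : 0 );    # "~" is -infinity
    my $above   = $end eq ''    ? 0 : ( $value > $end   ? 1 : 0 );
    my $outside = ( $below || $above ) ? 1 : 0;

    return $inside_alerts ? 1 - $outside : $outside;
}

# Hypothetical helper: critical is checked first, then warning.
# Returns the usual Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
sub threshold_status {
    my ( $value, $warning, $critical ) = @_;
    return 2 if defined $critical && range_alerts( $critical, $value );
    return 1 if defined $warning  && range_alerts( $warning,  $value );
    return 0;
}

print threshold_status( 15, '10',    '20' ),  "\n";    # prints 1 (WARNING)
print threshold_status( 15, '10:20', '5:25' ), "\n";   # prints 0 (OK inside both ranges)
print threshold_status( 3,  '10:20', '5:25' ), "\n";   # prints 2 (CRITICAL, below 5)
```

With ranges, the user decides on the command line whether "too high", "too low" or "in between" is the problem, and your plugin code stays the same.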

Now just return the calculated status and a short, single-line message with the exit method. Don't be too verbose, though, because the output gets cut!

$np->nagios_exit( $status, "$value xxx's were found" );

More neat (and free) details

  • verbosity
    $np->opts->verbose returns the number of -v flags in the parameters. Use it if you want to give the users a little more info (-v), a bit more than that (-vv), or a lot more (-vvv) :p.
  • Read the docs
    The docs will reveal all sorts of extra info. Read the documentation of the helper classes (Nagios::Plugin::Xxx) too, because not everything is exposed in the Nagios::Plugin documentation ;)

Summing up

Nagios::Plugin will save you time and make your plugins better, with less effort.