[cricket-users] Spike suppression

From: Bert Driehuis (driehuis@playbeing.org)
Date: Fri Jan 14 2000 - 05:40:11 PST


When an SNMP agent restarts, its counters reset, and that shows up as a
spike on the Cricket graphs because RRDtool treats the drop in the
counter value as a wrap. Of course, spikes can be suppressed by defining
a maximum value for the datasource, but doing that sort of sucks because
you need a very precise understanding of the ranges that your data can
take (which, particularly for things like monitoring a Squid cache, is
often information you don't have at the outset).
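
As a rough illustration of where the spike comes from, here is a small
sketch. It assumes a 32-bit counter and RRDtool's COUNTER behaviour of
adding 2**32 when a reading is lower than the previous one; counter_rate
and the sample readings are hypothetical.

```perl
use strict;
use warnings;

# Assumed model of how a 32-bit COUNTER delta becomes a rate:
# a decrease is treated as a wrap, so 2**32 is added to the difference.
sub counter_rate {
    my ($prev, $cur, $seconds) = @_;
    my $delta = $cur - $prev;
    $delta += 2**32 if $delta < 0;    # a restart is mistaken for a wrap
    return $delta / $seconds;
}

# Hypothetical readings: the counter stood near 4e9, then the agent
# restarted and counted up to 1500 before the next 300-second poll.
printf "%.0f per second\n", counter_rate(4_000_000_000, 1_500, 300);
# prints "983229 per second", although only 1500 events happened
```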

The attached diff collects the agent uptime as part of the collection
process, and inserts a dummy record into RRD to clear the counter
history.

I have also attached an updated version of the CDEF fix I sent before
Y2K, this time using Perl's numeric comparison operators rather than
the string ones.

If either of these patches makes it into Cricket, I'm willing to do the
documentation update for it as well.

There is more info in the diff files themselves.

Oh, and if you're stuck with existing RRD archives with spikes, I've
quickly hacked together a script that can read an "rrdtool dump", ditch
the worst records, and write an output .xml file suitable for reading
back with "rrdtool restore".
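
A minimal sketch of that kind of filter follows; it is not the script
mentioned above. It assumes the dump stores each archived value as
<v> ... </v>, uses a hypothetical helper clip_spikes with a ceiling you
pick yourself, and overwrites offending values with NaN instead of
dropping whole rows.

```perl
use strict;
use warnings;

# Replace every archived <v> value above $max with NaN.
# Other elements of the dump are left untouched.
sub clip_spikes {
    my ($xml, $max) = @_;
    $xml =~ s{<v>\s*([^\s<]+)\s*</v>}{
        my $v = $1;
        # only compare things that look numeric; NaN passes through
        ($v =~ /^[-+]?\d/ && $v > $max) ? '<v> NaN </v>' : "<v> $v </v>"
    }ge;
    return $xml;
}

# Hypothetical row with one sane value and one spike.
my $row = '<row><v> 1.2000000000e+01 </v><v> 9.8322932000e+05 </v></row>';
print clip_spikes($row, 1e5), "\n";
```

To use it on a real file, one could slurp the output of "rrdtool dump"
into a string, pass it through clip_spikes, and write the result to the
.xml file given to "rrdtool restore".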

Cheers,

                                        -- Bert

-- 
Bert Driehuis -- driehuis@playbeing.org -- +31-20-3116119
The grand leap of the whale up the Fall of Niagara is esteemed, by all
who have seen it, as one of the finest spectacles in nature.
                -- Benjamin Franklin.

Spike suppression.

This patch addresses the problem of SNMP agent restarts. It works by requesting the agent uptime and comparing it to the poll interval. If the uptime is less than the poll interval, it injects a record of all "U" (unknown value) into the RRD, timestamped one second before the current sample. This makes sure that RRD will not store a bogus value for counters (gauges are not affected, as the record with the current timestamp will overwrite the unknown data).
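
The logic described above can be sketched roughly as follows. This is an illustration rather than the patch itself: agent_restarted and update_records are hypothetical helpers, the sample numbers are made up, and the code only builds the update strings instead of calling RRDs::update. It assumes the uptime arrives as raw TimeTicks (hundredths of a second).

```perl
use strict;
use warnings;

# Uptime is in TimeTicks (1/100 s), so the poll interval in seconds
# is scaled by 100 before comparing.
sub agent_restarted {
    my ($uptime_ticks, $poll_seconds) = @_;
    return 0 if !defined $uptime_ticks || $uptime_ticks eq 'U';
    return $uptime_ticks < $poll_seconds * 100 ? 1 : 0;
}

# Build the update strings for one sample. After a restart, an all-"U"
# record stamped one second earlier goes in first.
sub update_records {
    my ($now, $restarted, @values) = @_;
    my @records;
    push @records, join(':', $now - 1, ('U') x @values) if $restarted;
    push @records, join(':', $now, @values);
    return @records;
}

# Hypothetical sample: the agent has been up 42 s, polled every 300 s.
my @records = update_records(947857211, agent_restarted(4200, 300), 1500, 87);
print "$_\n" for @records;
# prints "947857210:U:U" followed by "947857211:1500:87"
```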

When (and if) RRD gets a flag to clear the old counters, that should be used instead of a dummy record, and Cricket should be modified to use it. For now, this sure beats having to define a maximum in RRD, because reality will eventually catch up with you if you define the range inappropriately.

It is imperative that the BER pretty timeticks misfeature is disabled, since the uptime has to arrive as a raw tick count for the comparison to work; that is why that patch is also in here. Using pretty timeticks will bite us in the ass anyway, so we might as well bite the bullet and change it now.
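
A small sketch of what goes wrong otherwise. The exact text a "pretty" uptime takes is an assumption here (something like "3 days, 4:06:27"), and looks_restarted is a hypothetical stand-in for the comparison in the patch. The point is only that Perl reads such a string as the number 3, which is always below the threshold.

```perl
use strict;
use warnings;

# Same kind of comparison the patch relies on: ticks vs. poll interval.
sub looks_restarted {
    my ($uptime, $poll_seconds) = @_;
    no warnings 'numeric';    # the formatted string is not a clean number
    return $uptime < $poll_seconds * 100 ? 1 : 0;
}

my $raw    = 27_398_706;          # hypothetical raw TimeTicks, about 3 days
my $pretty = '3 days, 4:06:27';   # assumed shape of a pretty-printed uptime

print looks_restarted($raw, 300),    "\n";   # prints 0: no restart
print looks_restarted($pretty, 300), "\n";   # prints 1: string numifies to 3
```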

To activate, add the appropriate OID to your Defaults files as follows:

target --default--
    [...]
    snmp-uptime = 1.3.6.1.2.1.1.3.0

The reason it is not hardcoded is twofold. Some SNMP agents (e.g., the Squid agent) do not implement the default system.sysUpTime OID. And on a host with multiple agents with possibly different uptimes, you want to specify which one to use anyway (e.g., on a host with both a Squid SNMP agent and a UCD SNMP agent, with Squid forwarding unknown OIDs to UCD SNMP, the UCD agent uptime is completely irrelevant to Squid's counters).

Bert Driehuis, <driehuis@playbeing.org>

diff -rc2 cricket-0.71/collector ./collector
*** cricket-0.71/collector Mon Jan 10 06:37:37 2000
--- ./collector Thu Jan 13 17:15:13 2000
***************
*** 158,162 ****
      }
  
!     my(@data) = retrieveData($name, $target);
      if ($#data+1 == 0) {
          Warn("No data retrieved. Skipping RRD update.");
--- 158,163 ----
      }
  
!     my $agent_restart = 0;
!     my(@data) = retrieveData($name, $target, \$agent_restart);
      if ($#data+1 == 0) {
          Warn("No data retrieved. Skipping RRD update.");
***************
*** 185,188 ****
--- 186,202 ----
      }
  
+     # If an SNMP agent restart occurred, insert a dummy record consisting
+     # of all "U" one second before the current sample. This causes RRD
+     # to set the previous value of counters to undefined, avoiding weird
+     # results when a counter goes negative because of a restart.
+
+     if ($agent_restart && $when eq "N") {
+         my @dummyresults = @data2;
+         grep { $_ = 'U'} @dummyresults;
+         Info("Inserting dummy record because of agent restart");
+         my $now = time();
+         RRDs::update($datafile, join(":", $now - 1, @dummyresults));
+     }
+
      RRDs::update($datafile, join(":", $when, @data2));
      if (my $error = RRDs::error()) {
***************
*** 220,224 ****
  
  sub retrieveData {
!     my($name, $target) = @_;
      my($tname) = $target->{'auto-target-name'};
  
--- 234,238 ----
  
  sub retrieveData {
!     my($name, $target, $restart_ref) = @_;
      my($tname) = $target->{'auto-target-name'};
  
***************
*** 282,285 ****
--- 296,303 ----
          }
      }
+     if (defined($target->{'snmp-uptime'})) {
+         my $ds = "--snmp://$target->{'snmp'}/$target->{'snmp-uptime'}";
+         push @targetDSs, $ds;
+     }
  
      # this will hold a hash of ds-method names. the values will
***************
*** 366,369 ****
--- 384,400 ----
      }
  
+     # Check the agent uptime with the poll interval. If the uptime is
+     # less than one poll interval, notify our caller of the restart.
+
+     if (defined($target->{'snmp-uptime'})) {
+         my($agent_uptime) = pop @results;
+         my($poll) = $target->{'rrd-poll-interval'};
+         $poll = 300 unless (defined($poll));
+         if ($agent_uptime ne "U" && $agent_uptime < $poll * 100) {
+             Info("Agent uptime is less than poll interval");
+             $$restart_ref = 1 if (defined($restart_ref))
+         }
+     }
+
      # if we are verifying, check the
      # fetched mapping key to make certain it's right
***************
*** 391,395 ****
              # (this time there is no need to verify)
              delete($target->{'--verify-mapkey--'});
!             @results = retrieveData($name, $target);
          } else {
              # fill in all unknown, since the mapping key seems
--- 422,426 ----
              # (this time there is no need to verify)
              delete($target->{'--verify-mapkey--'});
!             @results = retrieveData($name, $target, undef);
          } else {
              # fill in all unknown, since the mapping key seems
diff -rc2 cricket-0.71/lib/snmpUtils.pm ./lib/snmpUtils.pm
*** cricket-0.71/lib/snmpUtils.pm Wed Jun 16 04:28:00 1999
--- ./lib/snmpUtils.pm Mon Dec 27 18:12:47 1999
***************
*** 16,19 ****
--- 16,20 ----
      $main::DEBUG = 0;
  }
+ $BER::pretty_print_timeticks = 0;
  
  my($err) = '';

An updated diff for handling self-referential RPN code for scaling, allowing statements like

graph --default--
    scale = 1000000,GT,UNKN,ds#,IF

in the config file (with ds# replaced automagically with the proper datasource).
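
To make the evaluation order concrete, here is a small self-contained sketch. It is not the RPN.pm code: rpn_eval and apply_scale are hypothetical helpers that implement only GT, LT and IF, and they assume the value is both substituted for "ds#" and prepended to the expression, as happens for a single number in grapher.cgi.

```perl
use strict;
use warnings;

# Tiny RPN evaluator with just GT, LT and IF (enough for this example).
sub rpn_eval {
    my ($expr) = @_;
    my @stack;
    for my $item (split /,/, $expr) {
        if ($item =~ /^(?:GT|LT)$/) {
            my $rhs = pop @stack;
            my $lhs = pop @stack;
            return unless defined $lhs && defined $rhs;
            my $true = $item eq 'GT' ? $lhs > $rhs : $lhs < $rhs;
            push @stack, $true ? 1 : 0;
        } elsif ($item eq 'IF') {
            my $else = pop @stack;
            my $then = pop @stack;
            my $cond = pop @stack;
            return unless defined $cond && defined $then && defined $else;
            push @stack, $cond ? $then : $else;
        } else {
            push @stack, $item;    # numbers and UNKN are plain operands
        }
    }
    return pop @stack;
}

# Substitute the value for "ds#", then prepend it to the expression.
sub apply_scale {
    my ($value, $scale) = @_;
    (my $expr = $scale) =~ s/ds#/$value/g;
    return rpn_eval("$value,$expr");
}

my $scale = '1000000,GT,UNKN,ds#,IF';
print apply_scale(500, $scale), "\n";          # prints 500
print apply_scale(2_000_000, $scale), "\n";    # prints UNKN
```

In other words, values above 1000000 turn into UNKN and everything else passes through unchanged.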

The previous diff got its math wrong (I used the string comparison "gt" rather than the numeric ">", sigh).

To discuss: is "ds#" the proper syntax for "The current datasource"? Alternate suggestions: "this", "the_ds", "self".

Bert Driehuis <driehuis@Playbeing.org>

diff -rc2 cricket-0.71/grapher.cgi ./grapher.cgi
*** cricket-0.71/grapher.cgi Fri Jan 14 10:25:20 2000
--- ./grapher.cgi Fri Jan 14 10:26:22 2000
***************
*** 835,839 ****
      if (defined($value) && !isnan($value) && defined($scale)) {
          my($rpn) = new RPN;
!         my($res) = $rpn->run("$value,$scale");
  
          if (! defined($res)) {
--- 835,843 ----
      if (defined($value) && !isnan($value) && defined($scale)) {
          my($rpn) = new RPN;
!         my $cdefv = $scale;
!         $cdefv =~ s/ds#/$value/g;
!         my($res) = $rpn->run("$value,$cdefv");
!         Debug("RPN: $value $cdefv->$res\n");
!         $res = "NaN" if $res eq "UNKN";
  
          if (! defined($res)) {
***************
*** 874,877 ****
--- 878,885 ----
          $scale = "1,*";
      }
+     my $cdefv1 = $scale;
+     $cdefv1 =~ s/#/0/g;
+     my $cdefv2 = $scale;
+     $cdefv2 =~ s/#/1/g;
  
      my(@args) = (
***************
*** 879,886 ****
          "DEF:ds0=$rrdfile:ds$dsnum:AVERAGE",
          "DEF:ds1=$rrdfile:ds$dsnum:MAX",
!         "CDEF:sds0=ds0,$scale",
!         "CDEF:sds1=ds1,$scale",
!         "PRINT:sds0:AVERAGE:\%lf",
!         "PRINT:sds1:MAX:\%lf"
      );
      ($mmax, undef, undef) = RRDs::graph @args;
--- 887,894 ----
          "DEF:ds0=$rrdfile:ds$dsnum:AVERAGE",
          "DEF:ds1=$rrdfile:ds$dsnum:MAX",
!         "CDEF:sds0=ds0,$cdefv1",
!         "CDEF:sds1=ds1,$cdefv2",
!         "PRINT:sds0:AVERAGE:\%f",
!         "PRINT:sds1:MAX:\%f"
      );
      ($mmax, undef, undef) = RRDs::graph @args;
***************
*** 1363,1368 ****
          my($mod) = $ct % $numDSs;
          if (defined($scale)) {
!             push @cdefs, "CDEF:smx$ct=mx$ct,$scale" if ($mx);
!             push @cdefs, "CDEF:sds$ct=ds$ct,$scale";
              if ($isMTargetsOps) {
                  if (!$linePushed[$mod]) {
--- 1371,1378 ----
          my($mod) = $ct % $numDSs;
          if (defined($scale)) {
!             my $cdefv = $scale;
!             $cdefv =~ s/#/$ct/;
!             push @cdefs, "CDEF:smx$ct=mx$ct,$cdefv" if ($mx);
!             push @cdefs, "CDEF:sds$ct=ds$ct,$cdefv";
              if ($isMTargetsOps) {
                  if (!$linePushed[$mod]) {
***************
*** 1410,1414 ****
          $i++;
          my($nameme);
!         if ($scaled{$dslist[$i %numDSs]}) {
              $nameme = "sds";
          } else {
--- 1420,1424 ----
          $i++;
          my($nameme);
!         if ($scaled{$dslist[$i % numDSs]}) {
              $nameme = "sds";
          } else {
diff -rc2 cricket-0.71/lib/RPN.pm ./lib/RPN.pm
*** cricket-0.71/lib/RPN.pm Wed Jun 16 04:28:00 1999
--- ./lib/RPN.pm Thu Jan 13 21:09:39 2000
***************
*** 71,74 ****
--- 71,97 ----
              $self->push(undef);
          }
+     } elsif ($op eq 'IF') {
+         my($b) = $self->pop();
+         my($a) = $self->pop();
+         my($res) = $self->pop();
+         return unless (defined($res) && defined($a) && defined($b));
+
+         if ($res) {
+             $self->push($a);
+         } else {
+             $self->push($b);
+         }
+     } elsif ($op eq 'LT') {
+         my($a) = $self->pop();
+         my($b) = $self->pop();
+         return unless (defined($a) && defined($b));
+
+         $self->push($b < $a ? 1 : 0);
+     } elsif ($op eq 'GT') {
+         my($a) = $self->pop();
+         my($b) = $self->pop();
+         return unless (defined($a) && defined($b));
+
+         $self->push($b > $a ? 1 : 0);
      }
  }
***************
*** 91,95 ****
      my($item);
      foreach $item (split(/,/, $string)) {
!         if ($item =~ /^[\+\*\/\-]/ || $item =~ /^log$/i) {
              $self->op($item);
          } else {
--- 114,118 ----
      my($item);
      foreach $item (split(/,/, $string)) {
!         if ($item =~ /^[\+\*\/\-]/ || $item =~ /^(log|lt|gt|if)$/i) {
              $self->op($item);
          } else {



