Sawtooth looking graphs from Cisco SNMP queries
I have been annoyed for quite some time by very regular patterns of “spikiness” that I run into on certain network graphs, and I wanted to share my findings to see if anyone else can explain this better than I can.
Below is a graph of what should be extremely consistent traffic. This graph is created by polling an SNMP OID once every thirty seconds. It covers a one-hour period.
My best theory is that internally to the Cisco device I am polling (in this case a Cisco ASR) the SNMP OID counters are only updated once every “x” amount of time, where “x” is maybe 10 seconds. Since I poll every 30 seconds, I most often get three “updates” between polls, but once every five minutes, for some reason, I only get two updates’ worth of data (20 seconds). Then in the next interval I get forty seconds’ worth of data.
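To make that theory concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the 10-second refresh period, the 1000 bytes-per-second traffic rate, the poll timestamps, and the helper names `cached_counter` and `polled_rates` are all hypothetical, and nothing here is confirmed Cisco behavior. It only shows that one slightly early poll against a periodically refreshed counter produces a dip followed by a spike.

```python
def cached_counter(t, rate, refresh):
    """Byte count visible at time t if the counter only refreshes every `refresh` seconds."""
    return rate * (t // refresh) * refresh


def polled_rates(poll_times, rate, refresh):
    """Rate the grapher would compute between each pair of consecutive polls."""
    samples = [cached_counter(t, rate, refresh) for t in poll_times]
    return [
        (samples[i + 1] - samples[i]) / (poll_times[i + 1] - poll_times[i])
        for i in range(len(poll_times) - 1)
    ]


# Perfectly steady traffic: 1000 bytes/s (assumed), counter refresh every 10 s (assumed).
# Polls are nominally every 30 s, but the fifth poll lands 2 s early (119 instead of 121),
# just before a refresh, so it sees only two refreshes' worth of new data.
poll_times = [1, 31, 61, 91, 119, 151, 181]
rates = polled_rates(poll_times, rate=1000, refresh=10)

for t, r in zip(poll_times[1:], rates):
    print(f"poll at {t:>3}s -> {r:7.1f} bytes/s")
```

With these made-up numbers, the interval ending at 119 s reports about 714 bytes/s and the next one reports 1250 bytes/s, while every other interval reports exactly 1000 bytes/s.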
If you have a better explanation than I do, please post a comment or email me and I will update this post with the answer!
-Eric

Hah. Cacti.
I seem to recall in the cactid docs they recommend NOT polling more often than every couple of minutes, and this might be why. Another possible explanation is that there is in fact a periodic burst transfer which happens, say, every 10 seconds, and the poller is catching (mean-1) bursts in one window, and (mean+1) in the next. That would explain the weird up/down patterns. The last peak, though, makes me think otherwise, as does your “constant traffic” assertion.
Best guess:
I presume you’re running this from cactid, in which case I should also note that there is no guarantee in the C or PHP poller that your queries will be run at exactly the same time, or maybe even in the same order. Notice that the peaks happen every five minutes on even times. That’s the default polling interval for most Cacti data sources! I bet the poller is taking extra long to finish the extra load on those five-minute intervals, and isn’t getting to this particular query until later in the sampling period. If it’s 5 seconds late, more traffic would accrue in that sample and less in the next, and you’d see periodic variance with the same -/+ spike pattern.
Under those conditions, your best bet would be either to relax and stop trying to poll so often (the relative margin of error is way smaller at normal polling intervals) or to move this particular job to a standalone poller where it won’t be interrupted.
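The late-poll argument can be checked with a bit of arithmetic. The Python sketch below assumes a live counter (no caching on the device), a steady 1000 bytes-per-second flow, and a grapher that divides each delta by the nominal interval rather than the actual elapsed time. Those are all assumptions, as is the hypothetical helper name `apparent_rates`; real pollers may use actual timestamps.

```python
def apparent_rates(actual_poll_times, nominal_interval, true_rate):
    """Rates computed when each counter delta is divided by the nominal interval."""
    counters = [true_rate * t for t in actual_poll_times]
    return [(b - a) / nominal_interval for a, b in zip(counters, counters[1:])]


TRUE_RATE = 1000.0  # bytes/s, assumed steady

# 30 s polling where one poll runs 5 s late (65 instead of 60).
rates_30 = apparent_rates([0, 30, 65, 90, 120], 30, TRUE_RATE)

# 300 s polling with the same 5 s delay (605 instead of 600).
rates_300 = apparent_rates([0, 300, 605, 900], 300, TRUE_RATE)

for label, rates in (("30 s", rates_30), ("300 s", rates_300)):
    worst = max(abs(r - TRUE_RATE) / TRUE_RATE for r in rates)
    print(f"{label:>5} polling: worst error {worst:.1%}")
```

Under these assumptions, the same 5-second delay distorts a 30-second sample by about 17% in each direction, but a 300-second sample by under 2%, which is the sense in which the margin of error shrinks at longer intervals.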
I actually was generating that graph using PRTG (Paessler Router Traffic Grapher). I have seen this behavior on other graphing implementations including the one built into the Cisco ASDM application.
I have seen this on lots of traffic flows that should be very steady (including T-1 interfaces that physically cannot burst above 1.544 megabits per second).
I think you are right about the mean-1 vs. mean+1 comment. I was thinking that the actual value that is polled is only updated every “x” amount of time even though the traffic flow is consistent.
-Eric
Replicate the same monitor, but change the polling interval to 60 seconds. See if the spikes change or continue to match the 5 minute PRTG averaging.
What is the exact OID you are collecting from?
Is the Cisco interface interval set to 30 seconds as well? By default, Cisco interfaces also use a 5-minute averaging interval.
@Aphyr
Hello, did you ever get this resolved? I have the same problem, and changing the load interval doesn’t make any difference.
Regards