netdev.vger.kernel.org archive mirror
* update frequency for stats in /proc/net/dev
@ 2007-12-18 13:37 Mark Seger
From: Mark Seger @ 2007-12-18 13:37 UTC
  To: netdev

A number of years ago I wrote a comprehensive system monitoring tool
called collectl which, among other things, allows me to monitor network
traffic in a real-time display as well as log the data to a file.
Furthermore, that file can be generated in a format suitable for
plotting with gnuplot.  As it turned out, I would very frequently see
spikes of 200MB/sec on my 1Gb link, which is impossible since a 1Gb/sec
link tops out at about 125MB/sec.  A colleague noticed the cause: the
network counters were being updated every 0.9765 seconds rather than
once a second, so a once-a-second sample would occasionally span two
counter updates and report roughly double the real rate.  I don't know
how long this problem has existed, but it was certainly there in 2.4
kernels.  My tool is capable of monitoring at fractional intervals, and
by sampling every 0.9765 seconds I have been able to get good data in
spite of this problem.  However, I've since noticed that the stats are
now updated once a second, which means that when I sample at 0.9765
seconds I get the wrong numbers again.  Clearly one answer is to just
update the counters more frequently, but I suspect that is not being
done for performance reasons.
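To illustrate the arithmetic behind those spikes, here is a minimal Python simulation. It assumes traffic flows at a constant rate and that the counter a reader sees only advances at each refresh; the function names, the 100MB/sec traffic figure and the exact-fraction time model are illustrative assumptions, not collectl's code or the kernel's implementation.

```python
from fractions import Fraction
import math

def visible_counter(t, rate, update_period):
    """Bytes a reader sees at time t when the counter is only refreshed every
    update_period seconds (assumption: constant traffic, refreshes at 0, p, 2p, ...)."""
    refreshes = math.floor(t / update_period)   # refreshes completed by time t
    return rate * refreshes * update_period      # bytes recorded at the last refresh

def sampled_rates(rate, update_period, interval, samples):
    """Rates a monitor would compute by sampling the counter every `interval` seconds."""
    rates = []
    for k in range(1, samples + 1):
        before = visible_counter((k - 1) * interval, rate, update_period)
        after = visible_counter(k * interval, rate, update_period)
        rates.append((after - before) / interval)
    return rates

# Older kernels (as described above): refresh every 0.9765 s, sample every 1 s.
# Exact fractions keep float rounding from blurring interval boundaries.
rates = sampled_rates(100, Fraction(9765, 10000), Fraction(1), 100)
print(sorted({float(r) for r in rates}))  # → [97.65, 195.3]
```

Most samples cover one refresh and report 97.65MB/sec for 100MB/sec of real traffic, but about once every 42 samples a sample spans two refreshes and reports nearly double. Swapping the two timing arguments, `sampled_rates(100, Fraction(1), Fraction(9765, 10000), 100)`, models the newer-kernel case: sampling every 0.9765 s against a once-a-second refresh now produces occasional intervals with a rate of 0.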

Anyhow, I just wanted to let people know that ALL tools that monitor
once a second will get the wrong numbers on older kernels, and tools
that correct for this by using fractional intervals (I suspect mine
is the only one that does) will also get the wrong numbers on newer
kernels.  In any event, if anyone is interested in trying out collectl
(it monitors a LOT more than just networks), you can snag a copy from
http://collectl.sourceforge.net/ if you'd like to take it for a drive.
The website has a lot of output examples to give you a better idea of
what it can do.  I even included a writeup about the odd network
performance observations at
http://collectl.sourceforge.net/NetworkStats.html

-mark




* Re: update frequency for stats in /proc/net/dev
@ 2007-12-18 14:57 Thomas Graf
From: Thomas Graf @ 2007-12-18 14:57 UTC
  To: Mark Seger; +Cc: netdev

* Mark Seger <Mark.Seger@hp.com> 2007-12-18 08:37
> Anyhow, I just wanted to let people know that ALL tools that monitor
> once a second will get the wrong numbers on older kernels, and tools
> that correct for this by using fractional intervals (I suspect mine
> is the only one that does) will also get the wrong numbers on newer
> kernels.  In any event, if anyone is interested in trying out collectl
> (it monitors a LOT more than just networks), you can snag a copy from
> http://collectl.sourceforge.net/ if you'd like to take it for a drive.
> The website has a lot of output examples to give you a better idea of
> what it can do.  I even included a writeup about the odd network
> performance observations at
> http://collectl.sourceforge.net/NetworkStats.html

I've solved this problem by using netlink to read the interface counters
ten times per second and maintaining my own counter, from which I
calculate the rate exactly once per second/minute/hour. The per-second
rate may still be inaccurate to some degree, so I keep a history of 2-5
rates and take them into account to smooth the result. This works
fairly well with _all_ operating systems.
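A rough Python sketch of the approach described above, under stated assumptions: the netlink read is replaced by a caller that passes in (timestamp, raw counter) pairs, the optional wraparound handling for fixed-width counters is an assumption, and the smoothing is a plain average of the last few per-period rates because the exact smoothing method isn't specified. The class name `RateTracker` and its parameters are hypothetical.

```python
from collections import deque

class RateTracker:
    """Hypothetical sketch: poll a raw interface counter often (e.g. ten times
    per second), accumulate it into a private running total, compute a rate
    once per `period`, and smooth over the last `history` rates."""

    def __init__(self, period=1.0, history=3, counter_bits=None):
        self.period = period
        self.counter_bits = counter_bits    # assumption: e.g. 32 for wrapping 32-bit counters
        self.rates = deque(maxlen=history)  # most recent per-period rates
        self.total = 0                      # private counter, never wraps
        self.last_raw = None
        self.window_start = None            # (time, total) at start of current period

    def poll(self, now, raw):
        """Feed one reading; return the smoothed rate when a period completes, else None."""
        if self.last_raw is None:
            self.last_raw = raw
            self.window_start = (now, self.total)
            return None
        delta = raw - self.last_raw
        if self.counter_bits is not None:
            delta %= 1 << self.counter_bits  # undo at most one wraparound between polls
        self.total += delta
        self.last_raw = raw
        start_time, start_total = self.window_start
        elapsed = now - start_time
        if elapsed < self.period:
            return None
        self.rates.append((self.total - start_total) / elapsed)
        self.window_start = (now, self.total)
        return sum(self.rates) / len(self.rates)

# Example: a doubled reading in the second period is damped by the history.
tracker = RateTracker(period=1, history=3)
for t, raw in [(0, 0), (1, 100), (2, 300), (3, 400)]:
    print(t, tracker.poll(t, raw))
```

Here the raw per-period rates are 100, 200 and 100, but the smoothed values are 100, 150 and about 133. Separate per-second, per-minute and per-hour figures would simply use one tracker per `period`.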


* Re: update frequency for stats in /proc/net/dev
@ 2007-12-18 15:10 Mark Seger
From: Mark Seger @ 2007-12-18 15:10 UTC
  To: Thomas Graf; +Cc: netdev



Thomas Graf wrote:
> I've solved this problem by using netlink to read the interface counters
> ten times per second and maintaining my own counter, from which I
> calculate the rate exactly once per second/minute/hour. The per-second
> rate may still be inaccurate to some degree, so I keep a history of 2-5
> rates and take them into account to smooth the result. This works
> fairly well with _all_ operating systems.
>   
I guess I'm not entirely sure what you mean by 10 times/sec. Is that
once every .1 seconds, or 10 reads in rapid fire?  From a
general-purpose monitoring perspective, since I read hundreds of
counters every second, doing it 10 times/sec is way too much overhead,
and special processing for network counters would also be pretty
painful.  The general problem is that since the counters only change
once a second, you'll never do that well when you monitor close to
that interval, and you can never get accurate numbers at shorter
intervals.  In fact, if you try to treat the network counters like any
other and monitor, say, every .2 seconds, you see a rate of 0 for 4 of
the 5 intervals and 500MB/sec for the 5th.
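A quick Python check of the 0.2-second case, assuming the counter is refreshed exactly once a second, traffic is a steady 100MB/sec, and a refresh landing exactly on a sample boundary is visible to that sample. The function name and traffic figure are hypothetical; exact fractions keep floating-point rounding from blurring the interval boundaries.

```python
from fractions import Fraction
import math

def rates_at_interval(traffic, refresh, interval, samples):
    """Per-interval rates computed from a counter that only advances every `refresh` seconds."""
    def seen(t):
        # Bytes visible at time t: everything up to the most recent refresh.
        return traffic * refresh * math.floor(t / refresh)
    return [(seen(k * interval) - seen((k - 1) * interval)) / interval
            for k in range(1, samples + 1)]

# Sample every 0.2 s for two seconds against a once-a-second refresh.
rates = rates_at_interval(100, Fraction(1), Fraction(1, 5), 10)
print([float(r) for r in rates])
# → [0.0, 0.0, 0.0, 0.0, 500.0, 0.0, 0.0, 0.0, 0.0, 500.0]
```

The five samples in each second still add up to the correct 100MB of traffic; only the individual per-interval rates are wrong.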
-mark



