* update frequency for stats in /proc/net/dev
@ 2007-12-18 13:37 Mark Seger
2007-12-18 14:57 ` Thomas Graf
0 siblings, 1 reply; 3+ messages in thread
From: Mark Seger @ 2007-12-18 13:37 UTC
To: netdev
A number of years ago I wrote a comprehensive system monitoring
tool called collectl which, among other things, lets me monitor
network traffic in a real-time display as well as log the data to a file.
That file can also be generated in a format suitable for
plotting with gnuplot. As it turned out, I would very frequently see
spikes of 200MB/sec on my 1Gb link. A colleague noticed that the cause
was the network counters being updated every 0.9765 seconds rather than
every second, so a 1-second sample occasionally picks up two updates.
I don't know how long this problem has existed, but it was certainly
there in 2.4 kernels. My tool can monitor at a fractional interval, so
I have been able to get good data in spite of this problem. However,
I've since noticed that the stats are now updated once a second, which
means that when I sample every 0.9765 seconds I get the wrong numbers
again. Clearly one answer is to update the counters more frequently, but
I suspect that is not being done for performance reasons.
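
To make the arithmetic concrete, here is a minimal Python sketch (not part
of collectl) that simulates a counter refreshed at one fixed period and a
tool sampling it at another. The function names, the constant traffic rate
of 100 (think MB/sec), and the use of exact fractions are all assumptions
chosen for illustration.

```python
from fractions import Fraction


def visible_counter(t, true_rate, update_period):
    """Counter value a reader sees at time t.

    The traffic itself flows at a constant true_rate, but the exported
    counter is only refreshed once every update_period seconds.
    """
    last_update = (t // update_period) * update_period
    return true_rate * last_update


def sampled_rates(true_rate, update_period, sample_period, n_samples):
    """Rates a tool would report when sampling every sample_period seconds."""
    rates = []
    prev = visible_counter(Fraction(0), true_rate, update_period)
    for k in range(1, n_samples + 1):
        cur = visible_counter(k * sample_period, true_rate, update_period)
        rates.append((cur - prev) / sample_period)
        prev = cur
    return rates
```

With sampled_rates(100, Fraction(9765, 10000), Fraction(1), 100), most
samples come out at 97.65 and two of the hundred at 195.3, which is about
the size of the spikes described above. Sampling at the refresh period
itself gives a steady 100, while swapping the two periods (refresh every
second, sample every 0.9765 seconds) produces occasional samples of 0
instead.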
Anyhow, I just wanted to let people know that ALL tools that monitor
once a second on older kernels will get the wrong numbers, and tools
that correct for this by using fractional intervals (and I
suspect mine is the only one that does) but run on newer kernels will
also get the wrong numbers. In any event, if anyone is interested in
trying out collectl - it monitors a LOT more than just networks - you
can snag a copy from http://collectl.sourceforge.net/ if you'd like
to take it for a drive. The website has a lot of output examples to
give you a better idea of what it can do. I even included a writeup about
the odd network performance observations at
http://collectl.sourceforge.net/NetworkStats.html
-mark
* Re: update frequency for stats in /proc/net/dev
From: Thomas Graf @ 2007-12-18 14:57 UTC
To: Mark Seger; +Cc: netdev
* Mark Seger <Mark.Seger@hp.com> 2007-12-18 08:37
> Anyhow, I just wanted to let people know that ALL tools that monitor
> once a second on older counters will get the wrong numbers and tools
> that correct for the wrong number by using fractional intervals (and I
> suspect mine is the only one that does) but run on newer kernels will
> also get the wrong numbers. In any event, if anyone is interested in
> trying out collectl - it monitors a LOT more than just networks - you
> can snag a copy from http://collectl.sourceforge.net/ if you'd like
> to take it for a drive. The website has a lot of output examples to
> give you a better idea what it can do. I even included a writeup about
> the odd network performance observations at
> http://collectl.sourceforge.net/NetworkStats.html
I've solved this problem by using netlink to read the interface counters
ten times per second and maintaining my own counter, from which I calculate
the rate exactly once per second/minute/hour. The rate per second may
still be inaccurate to some degree, so I keep a history of 2-5
rates and take them into account to smooth the result. This works
fairly well with _all_ operating systems.
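
A rough Python sketch of that scheme, for readers who want to see the moving
parts. It is not the code referred to above: the class name, its parameters,
and the wrap handling are assumptions, and it takes raw counter values from
whatever source the caller polls (netlink, /proc/net/dev, ...) rather than
talking to netlink itself.

```python
from collections import deque


class SmoothedRate:
    """Poll a raw counter often, report a smoothed rate less often."""

    def __init__(self, report_every=10, history=3, counter_bits=64):
        self.report_every = report_every      # fast polls per reported rate
        self.history = deque(maxlen=history)  # last few per-period rates
        self.modulus = 1 << counter_bits      # assumed counter width, for wraps
        self.total = 0                        # our own running byte counter
        self.last_raw = None
        self.last_total = 0
        self.last_time = None
        self.polls = 0

    def poll(self, raw_counter, now):
        """Feed one fast poll.

        Returns the smoothed rate once per reporting period, else None.
        """
        if self.last_raw is None:
            # First poll only establishes the baseline.
            self.last_raw = raw_counter
            self.last_time = now
            return None
        # Accumulate into our own counter, tolerating a wrapped raw counter.
        self.total += (raw_counter - self.last_raw) % self.modulus
        self.last_raw = raw_counter
        self.polls += 1
        if self.polls < self.report_every:
            return None
        elapsed = now - self.last_time
        self.history.append((self.total - self.last_total) / elapsed)
        self.last_total = self.total
        self.last_time = now
        self.polls = 0
        # Average over the kept history to smooth out single-period jitter.
        return sum(self.history) / len(self.history)
```

With the defaults, calling poll() every 0.1 seconds yields one smoothed
rate per second, averaged over the last three one-second rates.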
* Re: update frequency for stats in /proc/net/dev
From: Mark Seger @ 2007-12-18 15:10 UTC
To: Thomas Graf; +Cc: netdev
Thomas Graf wrote:
> * Mark Seger <Mark.Seger@hp.com> 2007-12-18 08:37
>
>> Anyhow, I just wanted to let people know that ALL tools that monitor
>> once a second on older counters will get the wrong numbers and tools
>> that correct for the wrong number by using fractional intervals (and I
>> suspect mine is the only one that does) but run on newer kernels will
>> also get the wrong numbers. In any event, if anyone is interested in
>> trying out collectl - it monitors a LOT more than just networks - you
>> can snag a copy from http://collectl.sourceforge.net/ if you'd like
>> to take it for a drive. The website has a lot of output examples to
>> give you a better idea what it can do. I even included a writeup about
>> the odd network performance observations at
>> http://collectl.sourceforge.net/NetworkStats.html
>>
>
> I've solved this problem by using netlink to read the interface counters
> ten times per second and maintaining my own counter, from which I calculate
> the rate exactly once per second/minute/hour. The rate per second may
> still be inaccurate to some degree, so I keep a history of 2-5
> rates and take them into account to smooth the result. This works
> fairly well with _all_ operating systems.
>
I guess I'm not entirely sure what you mean by 10
times/sec. Is this once every .1 secs or 10 times in rapid fire? From a
general-purpose monitoring perspective, since I read hundreds of
counters every second, doing it 10 times/sec is way too much overhead, and
special processing for network counters would also be pretty painful.
The general problem is that because the counters only change once a
second, you'll never do that well when you monitor close to that interval,
and you can't ever get accurate numbers at shorter intervals. In fact, if
you treat the network counters like any other and monitor, say,
every .2 seconds, you see a rate of 0 for 4 of the 5 intervals and
500MB/sec for the 5th.
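
The 0/0/0/0/500 pattern can be reproduced with a few lines of Python. This
is only an illustrative sketch: the function names are made up, the counter
is assumed to refresh exactly once per second, and the underlying traffic
is assumed to be a steady 100 (MB/sec). Times are kept in integer
milliseconds to avoid rounding surprises.

```python
def visible_bytes(t_ms, rate_per_sec, update_ms=1000):
    """Counter value visible at t_ms if it is refreshed every update_ms."""
    last_update_ms = (t_ms // update_ms) * update_ms
    return rate_per_sec * last_update_ms // 1000


def observed_rates(rate_per_sec, sample_ms, n_samples, update_ms=1000):
    """Per-second rates computed from consecutive samples of the counter."""
    rates = []
    prev = visible_bytes(0, rate_per_sec, update_ms)
    for k in range(1, n_samples + 1):
        cur = visible_bytes(k * sample_ms, rate_per_sec, update_ms)
        # Scale the delta over one sample interval up to a per-second rate.
        rates.append((cur - prev) * 1000 / sample_ms)
        prev = cur
    return rates
```

Here observed_rates(100, 200, 5) returns [0.0, 0.0, 0.0, 0.0, 500.0]: the
whole second's worth of traffic lands in the one interval that happens to
contain the counter refresh.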
-mark