netdev.vger.kernel.org archive mirror
* Network latency regressions from 2.6.22 to 2.6.29
@ 2009-04-16 16:10 Christoph Lameter
  2009-04-16 17:21 ` Rick Jones
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 16:10 UTC (permalink / raw)
  To: netdev

The following are results of latency measurements using udpping
(available from http://gentwo.org/ll). It shows that significant latencies
were added since 2.6.27. I surely wish we could get back to times below 90
microseconds.
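
(For reference, udpping is just a user-space request/reply loop; conceptually the
client side looks something like the minimal sketch below. This is not the actual
udpping source -- the peer address, port, message size and round count are made-up
placeholders, and error handling is omitted.)

/* Minimal UDP ping-pong latency sketch (client side).  The peer simply
 * echoes every datagram back; the average round trip is reported in
 * microseconds, as in the tables below. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

#define PEER_IP  "192.168.1.2"   /* placeholder */
#define PORT     9000            /* placeholder */
#define MSG_SIZE 40
#define ROUNDS   10000

int main(void)
{
    char buf[MSG_SIZE] = { 0 };
    struct sockaddr_in peer;
    struct timeval t0, t1;
    double total_us = 0.0;
    int i, fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(PORT);
    inet_pton(AF_INET, PEER_IP, &peer.sin_addr);

    for (i = 0; i < ROUNDS; i++) {
        gettimeofday(&t0, NULL);
        sendto(fd, buf, sizeof(buf), 0,
               (struct sockaddr *)&peer, sizeof(peer));
        recv(fd, buf, sizeof(buf), 0);          /* wait for the echoed reply */
        gettimeofday(&t1, NULL);
        total_us += (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_usec - t0.tv_usec);
    }
    printf("avg round trip: %.2f usec\n", total_us / ROUNDS);
    return 0;
}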

The tests were done over 1G ethernet using
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754
Gigabit Ethernet PCI Express (rev 02)

Results:

           2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
40 Bytes    89.50  90.75  89.61  91.51  91.89  99.17  99.80  99.34
400 Bytes   98.58 101.44  97.85  99.61 100.36 117.96 118.10 126.79
1400 Bytes 152.76 153.75 153.85 156.22 156.66 163.92 165.54 166.04

Compared to 2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
40 Bytes            -1.4%  -0.1%  -2.2%  -2.6%  -9.8% -10.3%  -9.9%
400 Bytes           -2.8%   0.7%  -1.0%  -1.8% -16.4% -16.5% -22.2%
1400 Bytes          -0.6%  -0.7%  -2.2%  -2.5%  -6.8%  -7.7%  -8.0%

I presented these numbers with some nice graphs at the Linux Collab Summit
last week.

See
http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Collab-summit-2009-sf.pdf


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 16:10 Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
@ 2009-04-16 17:21 ` Rick Jones
  2009-04-16 19:06   ` Christoph Lameter
  2009-04-16 18:07 ` Ben Hutchings
  2009-04-16 19:43 ` Eric Dumazet
  2 siblings, 1 reply; 36+ messages in thread
From: Rick Jones @ 2009-04-16 17:21 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

Christoph Lameter wrote:
> The following are results of latency measurements using udpping
> (available from http://gentwo.org/ll). It shows that significant latencies
> were added since 2.6.27. I surely wish we could get back to times below 90
> microseconds.
> 
> The tests were done over 1G ethernet using
> 09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754
> Gigabit Ethernet PCI Express (rev 02)
> 
> Results:
> 
>            2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
> 40 Bytes    89.50  90.75  89.61  91.51  91.89  99.17  99.80  99.34
> 400 Bytes   98.58 101.44  97.85  99.61 100.36 117.96 118.10 126.79
> 1400 Bytes 152.76 153.75 153.85 156.22 156.66 163.92 165.54 166.04
> 
> Compared to 2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
> 40 Bytes            -1.4%  -0.1%  -2.2%  -2.6%  -9.8% -10.3%  -9.9%
> 400 Bytes           -2.8%   0.7%  -1.0%  -1.8% -16.4% -16.5% -22.2%
> 1400 Bytes          -0.6%  -0.7%  -2.2%  -2.5%  -6.8%  -7.7%  -8.0%
> 
> I presented these numbers with some nice graphs at the Linux Collab Summit
> last week.
> 
> See
> http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Collab-summit-2009-sf.pdf

Does udpping have a concept of service demand a la netperf?  That could help show 
how much was code bloat vs say some tweak to interrupt coalescing parameters in 
the NIC/driver.

[root@bl870c1 netperf2_trunk]# netperf -T 0 -c -C -t UDP_RR -H bl870c2.west -v 2
UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to bl870c2.west 
(10.208.0.210) port 0 AF_INET : histogram : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

126976 126976 1       1      10.00   7550.46   2.33   2.41   24.721  25.551
126976 126976

Histogram of request/response times.
UNIT_USEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_USEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    1
HUNDRED_USEC  :    0: 75508:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_MSEC     :    0:    2:    0:    0:    0:    0:    0:    0:    0:    0
TEN_MSEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
HUNDRED_MSEC  :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_SEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_SEC       :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
 >100_SECS: 0
HIST_TOTAL:      75511

I guess one of these days I have to tweak netperf to be able to get latencies 
when doing bursts on the connection...

[root@bl870c1 netperf2_trunk]# ethtool -C eth0 rx-frames 1
[do the same on the other end]
[root@bl870c1 netperf2_trunk]# netperf -T 0 -c -C -t UDP_RR -H bl870c2.west -v 2
UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to bl870c2.west 
(10.208.0.210) port 0 AF_INET : histogram : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

126976 126976 1       1      10.00   11126.15   2.60   3.43   18.711  24.633
126976 126976

Histogram of request/response times.
UNIT_USEC     :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_USEC      :    0:    0:    0:    0:    0:    0:    0:   16: 71799: 38583
HUNDRED_USEC  :    0:  856:    8:    0:    0:    0:    0:    0:    0:    0
UNIT_MSEC     :    0:    1:    0:    0:    0:    0:    1:    0:    0:    0
TEN_MSEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
HUNDRED_MSEC  :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
UNIT_SEC      :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
TEN_SEC       :    0:    0:    0:    0:    0:    0:    0:    0:    0:    0
 >100_SECS: 0
HIST_TOTAL:      111264

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 16:10 Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
  2009-04-16 17:21 ` Rick Jones
@ 2009-04-16 18:07 ` Ben Hutchings
  2009-04-16 19:02   ` Christoph Lameter
  2009-04-16 19:43 ` Eric Dumazet
  2 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2009-04-16 18:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

On Thu, 2009-04-16 at 12:10 -0400, Christoph Lameter wrote:
> The following are results of latency measurements using udpping
> (available from http://gentwo.org/ll). It shows that significant latencies
> were added since 2.6.27. I surely wish we could get back to times below 90
> microseconds.
[...]

This "90 microseconds" figure is specific to a particular driver and
hardware.  Have you verified that it applies to others?  The variation
you've reported is tiny compared to the improvements that can be made or
lost by hardware tuning.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 18:07 ` Ben Hutchings
@ 2009-04-16 19:02   ` Christoph Lameter
  2009-04-16 21:19     ` Ben Hutchings
  2009-04-16 22:47     ` David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 19:02 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev

On Thu, 16 Apr 2009, Ben Hutchings wrote:

> On Thu, 2009-04-16 at 12:10 -0400, Christoph Lameter wrote:
> > The following are results of latency measurements using udpping
> > (available from http://gentwo.org/ll). It shows that significant latencies
> > were added since 2.6.27. I surely wish we could get back to times below 90
> > microseconds.
> [...]
>
> This "90 microseconds" figure is specific to a particular driver and
> hardware.  Have you verified that it applies to others?  The variation
> you've reported is tiny compared to the improvements that can be made or
> lost by hardware tuning.

The RX delay can influence this of course and so can the driver. But the
measurements are with the same RX delay and the same driver. Variations
are up to 20%, which is not tiny, and it's only due to different kernel
versions.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 17:21 ` Rick Jones
@ 2009-04-16 19:06   ` Christoph Lameter
  2009-04-16 19:29     ` Eric Dumazet
  2009-04-16 20:05     ` Rick Jones
  0 siblings, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 19:06 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

On Thu, 16 Apr 2009, Rick Jones wrote:

> Does udpping have a concept of service demand a la netperf?  That could help
> show how much was code bloat vs say some tweak to interrupt coalescing
> parameters in the NIC/driver.

No. What does service demand mean? The ping pong tests are very simple
back and forths without any streaming or overlay.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:06   ` Christoph Lameter
@ 2009-04-16 19:29     ` Eric Dumazet
  2009-04-16 19:33       ` Christoph Lameter
  2009-04-16 20:05     ` Rick Jones
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-16 19:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Rick Jones, netdev

Christoph Lameter wrote:
> On Thu, 16 Apr 2009, Rick Jones wrote:
> 
>> Does udpping have a concept of service demand a la netperf?  That could help
>> show how much was code bloat vs say some tweak to interrupt coalescing
>> parameters in the NIC/driver.
> 
> No. What does service demand mean? The ping pong tests are very simple
> back and forths without any streaming or overlay.
> --

It would be good to measure a ping flood, because this one doesn't
need to wake up a user process on the remote side.


$ ping -f 192.168.20.112 -c 10000
PING 192.168.20.112 (192.168.20.112) 56(84) bytes of data.

--- 192.168.20.112 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 3337ms
rtt min/avg/max/mdev = 0.072/0.184/20.011/1.389 ms, pipe 2, ipg/ewma 0.333/0.078 ms

and check if various kernel versions already give you different values (for avg)

When you perform your test, do you upgrade the kernels of both machines?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:29     ` Eric Dumazet
@ 2009-04-16 19:33       ` Christoph Lameter
  2009-04-16 22:47         ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 19:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev

On Thu, 16 Apr 2009, Eric Dumazet wrote:

> When you perform your test, do you upgrade the kernels of both machines?

Yes. Both machines always run the same kernel version.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 16:10 Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
  2009-04-16 17:21 ` Rick Jones
  2009-04-16 18:07 ` Ben Hutchings
@ 2009-04-16 19:43 ` Eric Dumazet
  2009-04-16 19:50   ` Eric Dumazet
  2009-04-16 19:55   ` Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
  2 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2009-04-16 19:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

Christoph Lameter wrote:
> The following are results of latency measurements using udpping
> (available from http://gentwo.org/ll). It shows that significant latencies
> were added since 2.6.27. I surely wish we could get back to times below 90
> microseconds.
> 
> The tests were done over 1G ethernet using
> 09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754
> Gigabit Ethernet PCI Express (rev 02)
> 
> Results:
> 
>            2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
> 40 Bytes    89.50  90.75  89.61  91.51  91.89  99.17  99.80  99.34
> 400 Bytes   98.58 101.44  97.85  99.61 100.36 117.96 118.10 126.79
> 1400 Bytes 152.76 153.75 153.85 156.22 156.66 163.92 165.54 166.04
> 
> Compared to 2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
> 40 Bytes            -1.4%  -0.1%  -2.2%  -2.6%  -9.8% -10.3%  -9.9%
> 400 Bytes           -2.8%   0.7%  -1.0%  -1.8% -16.4% -16.5% -22.2%
> 1400 Bytes          -0.6%  -0.7%  -2.2%  -2.5%  -6.8%  -7.7%  -8.0%
> 
> I presented these numbers with some nice graphs at the Linux Collab Summit
> last week.
> 
> See
> http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Collab-summit-2009-sf.pdf
> 

Is it a bnx2 or tg3 driver ?

copybreak feature could explain a regression for small packets, since
an extra copy is done.
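
(To illustrate the idea: copybreak roughly means that, below some size threshold,
the driver copies the received data into a small freshly allocated buffer and
recycles the big RX buffer, instead of handing the big buffer up the stack.
A user-space analogy follows, with a made-up threshold -- this is not the actual
bnx2/tg3 code, just the shape of the heuristic.)

#include <stdlib.h>
#include <string.h>

#define COPYBREAK 256   /* bytes -- made-up value, real drivers pick their own */

/* Returns the buffer to hand up the stack; *recycle tells the caller whether
 * the original RX buffer can be reposted to the RX ring. */
static unsigned char *rx_copybreak(unsigned char *rx_buf, size_t len, int *recycle)
{
    unsigned char *small;

    if (len > COPYBREAK) {
        *recycle = 0;               /* the stack now owns the big buffer */
        return rx_buf;
    }

    small = malloc(len);
    if (!small) {
        *recycle = 0;               /* allocation failed: fall back to big buffer */
        return rx_buf;
    }

    memcpy(small, rx_buf, len);     /* <-- the extra copy done for small packets */
    *recycle = 1;                   /* big buffer goes straight back to the ring */
    return small;
}

int main(void)
{
    unsigned char big_rx_buf[1518] = "small payload";
    int recycle;
    unsigned char *up = rx_copybreak(big_rx_buf, 14, &recycle);

    /* 14 <= COPYBREAK, so 'up' is a fresh copy and recycle == 1 */
    if (up != big_rx_buf)
        free(up);
    return recycle == 1 ? 0 : 1;
}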

It would be sane to always set on both machines :
ethtool -C eth0 rx-usecs 0 rx-frames 1
prior to your tests...



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:43 ` Eric Dumazet
@ 2009-04-16 19:50   ` Eric Dumazet
  2009-04-16 20:01     ` Christoph Lameter
  2009-04-16 19:55   ` Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-16 19:50 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

Eric Dumazet wrote:
> Christoph Lameter wrote:
>> The following are results of latency measurements using udpping
>> (available from http://gentwo.org/ll). It shows that significant latencies
>> were added since 2.6.27. I surely wish we could get back to times below 90
>> microseconds.
>>
>> The tests were done over 1G ethernet using
>> 09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754
>> Gigabit Ethernet PCI Express (rev 02)
>>
>> Results:
>>
>>            2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
>> 40 Bytes    89.50  90.75  89.61  91.51  91.89  99.17  99.80  99.34
>> 400 Bytes   98.58 101.44  97.85  99.61 100.36 117.96 118.10 126.79
>> 1400 Bytes 152.76 153.75 153.85 156.22 156.66 163.92 165.54 166.04
>>
>> Compared to 2.6.22 2.6.23 2.6.24 2.6.25 2.6.26 2.6.27 2.6.28 2.6.29
>> 40 Bytes            -1.4%  -0.1%  -2.2%  -2.6%  -9.8% -10.3%  -9.9%
>> 400 Bytes           -2.8%   0.7%  -1.0%  -1.8% -16.4% -16.5% -22.2%
>> 1400 Bytes          -0.6%  -0.7%  -2.2%  -2.5%  -6.8%  -7.7%  -8.0%
>>
>> I presented these numbers with some nice graphs at the Linux Collab Summit
>> last week.
>>
>> See
>> http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Collab-summit-2009-sf.pdf
>>
> 
> Is it a bnx2 or tg3 driver ?
> 
> copybreak feature could explain a regression for small packets, since
> an extra copy is done.
> 
> It would be sane to always set on both machines :
> ethtool -C eth0 rx-usecs 0 rx-frames 1
> prior to your tests...
> 

Also, you might check /proc/interrupts, because recent Linux versions are
distributing interrupts across all CPUs instead of just one as in previous versions.

Old kernel :
# grep eth1 /proc/interrupts
 58:   25754498          0          0          0          0          0          0          0         PCI-MSI  eth1


New kernel :
# grep eth1 /proc/interrupts
221:   16476622   16478581   16478007   16478793   16478237   16472191   16474083   16476374   PCI-MSI-edge      eth1

(and /proc/irq/221/smp_affinity is not taken into account on some NICs)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:43 ` Eric Dumazet
  2009-04-16 19:50   ` Eric Dumazet
@ 2009-04-16 19:55   ` Christoph Lameter
  2009-04-16 21:57     ` Michael Chan
  2009-04-16 22:59     ` David Miller
  1 sibling, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 19:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Thu, 16 Apr 2009, Eric Dumazet wrote:

> Is it a bnx2 or tg3 driver ?

bnx2.

> It would be sane to always set on both machines :
> ethtool -C eth0 rx-usecs 0 rx-frames 1
> prior to your tests...

rx-usecs is set to 18 by default by the driver and rx-frames is set to 6.
We saw some funky behavior with rx-usecs 0 so we left the defaults.

But what does it matter? Just changing the kernel should not cause
increases in latency.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:50   ` Eric Dumazet
@ 2009-04-16 20:01     ` Christoph Lameter
  2009-04-16 23:00       ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-16 20:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Thu, 16 Apr 2009, Eric Dumazet wrote:

> Also, you might check /proc/interrupts, because recent Linux versions are
> distributing interrupts across all CPUs instead of just one as in previous versions.

Argh. I just checked it and the systems used for the test had no irq
pinning. We are rerunning the series.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:06   ` Christoph Lameter
  2009-04-16 19:29     ` Eric Dumazet
@ 2009-04-16 20:05     ` Rick Jones
  1 sibling, 0 replies; 36+ messages in thread
From: Rick Jones @ 2009-04-16 20:05 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

Christoph Lameter wrote:
> On Thu, 16 Apr 2009, Rick Jones wrote:
> 
> 
>>Does udpping have a concept of service demand a la netperf?  That could help
>>show how much was code bloat vs say some tweak to interrupt coalescing
>>parameters in the NIC/driver.
> 
> 
> No. What does service demand mean? The ping pong tests are very simple
> back and forths without any streaming or overlay.

It is a measure of efficiency - quantity of CPU consumed per unit of work.  For 
example, from my previous email:

UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to bl870c2.west 
(10.208.0.210) port 0 AF_INET : histogram : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

126976 126976 1       1      10.00   7550.46   2.33   2.41   24.721  25.551
126976 126976

The transaction rate was 7550 (invert for the latency) and service demand was 
24.721 (give or take :) microseconds of CPU time consumed per transaction on the 
one side and 25.551 on the other (identical systems and kernels).  If we make the 
handwaving assumption that virtually all the CPU consumption on either side is in 
the latency path, and calculate the overall latency we have:

overall: 132.44 microseconds per transaction
CPU time: 50.27
other:    82.17

With "other" being such a large component, it is a tip-off (not a slam dunk, but
a big clue) that there was a sub-standard interrupt avoidance mechanism at work.
  Even if we calculate the transmission time on the wire for the request and
response - 1 byte payload, 8 bytes UDP header, 20 bytes IPv4 header, 14 bytes
Ethernet header - 344 bits, or 688 for both request and response (does full-duplex
GbE still enforce the 60 byte minimum?  I forget) we have:

wiretime: 0.69

and even if DMA time were twice that, there is still 75+ microseconds 
unaccounted.  Smells like a timer running in a NIC.  And/or some painfully slow 
firmware on the NIC.
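
(Spelling that arithmetic out as a throwaway program, with the numbers taken from
the netperf run quoted earlier in this mail:)

#include <stdio.h>

int main(void)
{
    double trans_per_sec = 7550.46;   /* netperf transaction rate */
    double sdem_local    = 24.721;    /* us of CPU per transaction, local */
    double sdem_remote   = 25.551;    /* us of CPU per transaction, remote */
    double wire_bits     = 2 * 344.0; /* request + response on the wire */

    double overall_us = 1e6 / trans_per_sec;       /* ~132.44 us */
    double cpu_us     = sdem_local + sdem_remote;  /* ~50.27 us  */
    double other_us   = overall_us - cpu_us;       /* ~82.17 us  */
    double wire_us    = wire_bits / 1000.0;        /* ~0.69 us: 1 Gbit/s = 1000 bits/us */

    printf("overall %.2f  cpu %.2f  other %.2f  wire %.2f (us)\n",
           overall_us, cpu_us, other_us, wire_us);
    return 0;
}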

If the latency were constrained almost entirely by the CPU consumption in the 
case above, the transaction rate should have been more like 19000 transactions 
per second.  And with those two systems, with a different, 10G NIC installed (not 
that 10G speed is required for single-byte _RR), I've seen 20,000 
transactions/second.

That is with netperf/netserver running on the same CPU as taking the NIC 
interrupt(s).  When running on a different core from the interrupt(s) the 
cache-to-cache traffic dropped the transaction rate by 30% (a 2.6.18esque kernel, 
but I've seen similar stuff elsewhere).

So, when looking for latency regressions, there can be a lot of variables.

rick jones

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:02   ` Christoph Lameter
@ 2009-04-16 21:19     ` Ben Hutchings
  2009-04-16 22:47     ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: Ben Hutchings @ 2009-04-16 21:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

On Thu, 2009-04-16 at 15:02 -0400, Christoph Lameter wrote:
> On Thu, 16 Apr 2009, Ben Hutchings wrote:
> 
> > On Thu, 2009-04-16 at 12:10 -0400, Christoph Lameter wrote:
> > > The following are results of latency measurements using udpping
> > > (available from http://gentwo.org/ll). It shows that significant latencies
> > > were added since 2.6.27. I surely wish we could get back to times below 90
> > > microseconds.
> > [...]
> >
> > This "90 microseconds" figure is specific to a particular driver and
> > hardware.  Have you verified that it applies to others?  The variation
> > you've reported is tiny compared to the improvements that can be made or
> > lost by hardware tuning.
> 
> The RX delay can influence this of course and so can the driver. But the
> measurements are with the same RX delay and the same driver. Variations
> are up to 20%, which is not tiny, and it's only due to different kernel
> versions.

I just ran netperf UDP_RR against sfc (the out-of-tree version, so that
driver changes should not be a factor) on "my" test rig (a couple of
servers from 2005; don't quote these figures).  The transaction rates
were:

2.6.22:   38584.58
2.6.27:   35312.01
2.6.29.1: 38006.50

So for this hardware and driver, 2.6.27 has slightly _higher_ latency.
This is why I'm asking whether you tested with multiple drivers, so you
are not measuring changes in tg3 (or whichever driver it was).

Note also that 1G Ethernet is hardly representative of HPC hardware.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:55   ` Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
@ 2009-04-16 21:57     ` Michael Chan
  2009-04-17 13:47       ` Christoph Lameter
  2009-04-16 22:59     ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Michael Chan @ 2009-04-16 21:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, netdev@vger.kernel.org


On Thu, 2009-04-16 at 12:55 -0700, Christoph Lameter wrote:
> On Thu, 16 Apr 2009, Eric Dumazet wrote:
> 
> > Is it a bnx2 or tg3 driver ?
> 
> bnx2.
> 
> > It would be sane to always set on both machines :
> > ethtool -C eth0 rx-usecs 0 rx-frames 1
> > prior to your tests...
> 
> rx-usecs is set to 18 by default by the driver and rx-frames is set to 6.
> We saw some funky behavior with rx-usecs 0 so we left the defaults.

We haven't changed these defaults since the beginning, but firmware
updates between 2.6.22 and 2.6.29 can affect latency.

Yes, Eric is right, using rx-frames 1 will give the best latency number.
(rx-usecs won't matter if rx-frames is set to 1).

> 
> But what does it matter? Just changing the kernel should not cause
> increases in latency.
> 

It will not matter if you use the same out-of-tree driver when switching
kernels.  If you use in-tree driver, the firmware may have changed
between kernel versions.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:33       ` Christoph Lameter
@ 2009-04-16 22:47         ` David Miller
  2009-04-17 13:46           ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2009-04-16 22:47 UTC (permalink / raw)
  To: cl; +Cc: dada1, rick.jones2, netdev

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 15:33:06 -0400 (EDT)

> On Thu, 16 Apr 2009, Eric Dumazet wrote:
> 
>> When you perform your test, do you upgrade the kernels of both machines?
> 
> Yes. Both machines always run the same kernel version.

It's good to change multiple variables at a time between
tests...

More seriously, it's going to be hard to isolate anything
unless you concern yourself with changes to one thing
on one system at a time.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:02   ` Christoph Lameter
  2009-04-16 21:19     ` Ben Hutchings
@ 2009-04-16 22:47     ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2009-04-16 22:47 UTC (permalink / raw)
  To: cl; +Cc: bhutchings, netdev

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 15:02:36 -0400 (EDT)

> On Thu, 16 Apr 2009, Ben Hutchings wrote:
> 
>> On Thu, 2009-04-16 at 12:10 -0400, Christoph Lameter wrote:
>> > The following are results of latency measurements using udpping
>> > (available from http://gentwo.org/ll). It shows that significant latencies
>> > were added since 2.6.27. I surely wish we could get back to times below 90
>> > microseconds.
>> [...]
>>
>> This "90 microseconds" figure is specific to a particular driver and
>> hardware.  Have you verified that it applies to others?  The variation
>> you've reported is tiny compared to the improvements that can be made or
>> lost by hardware tuning.
> 
> The RX delay can influence this of course and so can the driver. But the
> measurements are with the same RX delay and the same driver. Variations
> are up to 20%, which is not tiny, and it's only due to different kernel
> versions.

The driver is changing, and that can also be enough.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 19:55   ` Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
  2009-04-16 21:57     ` Michael Chan
@ 2009-04-16 22:59     ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2009-04-16 22:59 UTC (permalink / raw)
  To: cl; +Cc: dada1, netdev

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 15:55:02 -0400 (EDT)

> But what does it matter? Just changing the kernel should not cause
> increases in latency.

There's firmware running on those cards doing all kinds of
fun things, the firmware gets changed in driver updates,
therefore...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 20:01     ` Christoph Lameter
@ 2009-04-16 23:00       ` David Miller
  2009-04-17 16:42         ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2009-04-16 23:00 UTC (permalink / raw)
  To: cl; +Cc: dada1, netdev

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 16:01:16 -0400 (EDT)

> On Thu, 16 Apr 2009, Eric Dumazet wrote:
> 
>> Also, you might check /proc/interrupts, because recent Linux versions are
>> distributing interrupts across all CPUs instead of just one as in previous versions.
> 
> Argh. I just checked it and the systems used for the test had no irq
> pinning. We are rerunning the series.

Good catch Eric :-)

We really need to find a good way to fix that IRQ affinity issue.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 22:47         ` David Miller
@ 2009-04-17 13:46           ` Christoph Lameter
  2009-04-17 21:43             ` Ilpo Järvinen
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-17 13:46 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, rick.jones2, netdev

On Thu, 16 Apr 2009, David Miller wrote:

> More seriously, it's going to be hard to isolate anything
> unless you concern yourself with changes to one thing
> on one system at a time.

Only the kernel version was changed on both systems. Changing only a
single system would have meant considering complex interactions between
the network layers of varying versions which would make
the problem even more difficult to track down.

I changed one thing: The kernel version.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-16 21:57     ` Michael Chan
@ 2009-04-17 13:47       ` Christoph Lameter
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-17 13:47 UTC (permalink / raw)
  To: Michael Chan; +Cc: Eric Dumazet, netdev@vger.kernel.org

On Thu, 16 Apr 2009, Michael Chan wrote:

> It will not matter if you use the same out-of-tree driver when switching
> kernels.  If you use in-tree driver, the firmware may have changed
> between kernel versions.

The in-tree driver was used.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-16 23:00       ` David Miller
@ 2009-04-17 16:42         ` Christoph Lameter
  2009-04-18  8:18           ` Eric Dumazet
  2009-04-18 19:43           ` Eric Dumazet
  0 siblings, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-17 16:42 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, Michael Chan, Ben Hutchings, netdev

On Thu, 16 Apr 2009, David Miller wrote:

> We really need to find a good way to fix that IRQ affinity issue.

Here are the results with interrupt affinity set. We are still losing
5 usecs from .22 to .30-rc2. We will investigate more in depth time
permitting.

	UDP ping pong 40 bytes

Kernel		Test 1	Test 2	Test 3	Test 4	Avg
2.6.22		83.1125	83.4	83.03	82.95	83.07
2.6.23		83.33	83.51	83.18	83.42	83.21
2.6.24		82.7775	82.8	82.77	82.77	82.77
2.6.25		85.875	85.94	85.75	86.06	85.75
2.6.26		87.4025	87.53	87.25	87.3	87.53
2.6.27		87.81	87.81	87.77	87.82	87.84
2.6.28		87.4275	87.77	87.24	87.28	87.42
2.6.29		88.4325	88.48	88.42	88.44	88.39
2.6.30-rc2	88.4925	88.49	88.44	88.56	88.48

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29
  2009-04-17 13:46           ` Christoph Lameter
@ 2009-04-17 21:43             ` Ilpo Järvinen
  0 siblings, 0 replies; 36+ messages in thread
From: Ilpo Järvinen @ 2009-04-17 21:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, dada1, rick.jones2, Netdev

On Fri, 17 Apr 2009, Christoph Lameter wrote:

> On Thu, 16 Apr 2009, David Miller wrote:
> 
> > More seriously, it's going to be hard to isolate anything
> > unless you concern yourself with changes to one thing
> > on one system at a time.
> 
> Only the kernel version was changed on both systems. Changing only a
> single system would have meant considering complex interactions between
> the network layers of varying versions which would make
> the problem even more difficult to track down.
> 
> I changed one thing: The kernel version.

No, we're not talking about n^2 results here, but claiming that changing
only a single network endpoint makes things _more complex_ is quite bogus?
Like it or not, those network layers you mention are still separated by a
pretty standardized medium whose encoding and timing is certainly quite
independent of the kernel version :-).

I'd suggest you take the latest and v2.6.22 and run the remaining two
mixed setups (I hope your scripts are not too tied to identical
versions... :-)).

-- 
 i.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-17 16:42         ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Christoph Lameter
@ 2009-04-18  8:18           ` Eric Dumazet
  2009-04-18  8:20             ` Eric Dumazet
  2009-04-18 19:43           ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-18  8:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Thu, 16 Apr 2009, David Miller wrote:
> 
>> We really need to find a good way to fix that IRQ affinity issue.
> 
> Here are the results with interrupt affinity set. We are still losing
> 5 usecs from .22 to .30-rc2. We will investigate more in depth time
> permitting.
> 
> 	UDP ping pong 40 bytes
> 
> Kernel		Test 1	Test 2	Test 3	Test 4	Avg
> 2.6.22		83.1125	83.4	83.03	82.95	83.07
> 2.6.23		83.33	83.51	83.18	83.42	83.21
> 2.6.24		82.7775	82.8	82.77	82.77	82.77
> 2.6.25		85.875	85.94	85.75	86.06	85.75
> 2.6.26		87.4025	87.53	87.25	87.3	87.53
> 2.6.27		87.81	87.81	87.77	87.82	87.84
> 2.6.28		87.4275	87.77	87.24	87.28	87.42
> 2.6.29		88.4325	88.48	88.42	88.44	88.39
> 2.6.30-rc2	88.4925	88.49	88.44	88.56	88.48


You are using UDP, and UDP got memory accounting. This 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-18  8:18           ` Eric Dumazet
@ 2009-04-18  8:20             ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2009-04-18  8:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, David Miller, Michael Chan, Ben Hutchings,
	netdev

Eric Dumazet wrote:
> Christoph Lameter wrote:
>> On Thu, 16 Apr 2009, David Miller wrote:
>>
>>> We really need to find a good way to fix that IRQ affinity issue.
>> Here are the results with interrupt affinity set. We are still losing
>> 5 usecs from .22 to .30-rc2. We will investigate more in depth time
>> permitting.
>>
>> 	UDP ping pong 40 bytes
>>
>> Kernel		Test 1	Test 2	Test 3	Test 4	Avg
>> 2.6.22		83.1125	83.4	83.03	82.95	83.07
>> 2.6.23		83.33	83.51	83.18	83.42	83.21
>> 2.6.24		82.7775	82.8	82.77	82.77	82.77
>> 2.6.25		85.875	85.94	85.75	86.06	85.75
>> 2.6.26		87.4025	87.53	87.25	87.3	87.53
>> 2.6.27		87.81	87.81	87.77	87.82	87.84
>> 2.6.28		87.4275	87.77	87.24	87.28	87.42
>> 2.6.29		88.4325	88.48	88.42	88.44	88.39
>> 2.6.30-rc2	88.4925	88.49	88.44	88.56	88.48
> 
> 
> You are using UDP, and UDP got memory accounting. This 

This could explain a slowdown, but not 2.5 us per packet
(I take half of your 5 us, since you are measuring the two-way latency)



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-17 16:42         ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Christoph Lameter
  2009-04-18  8:18           ` Eric Dumazet
@ 2009-04-18 19:43           ` Eric Dumazet
  2009-04-20 17:29             ` Christoph Lameter
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-18 19:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Thu, 16 Apr 2009, David Miller wrote:
> 
>> We really need to find a good way to fix that IRQ affinity issue.

Yep... this is annoying, but disable_msi=1 can help

> 
> Here are the results with interrupt affinity set. We are still losing
> 5 usecs from .22 to .30-rc2. We will investigate more in depth time
> permitting.
> 
> 	UDP ping pong 40 bytes
> 
> Kernel		Test 1	Test 2	Test 3	Test 4	Avg
> 2.6.22		83.1125	83.4	83.03	82.95	83.07
> 2.6.23		83.33	83.51	83.18	83.42	83.21
> 2.6.24		82.7775	82.8	82.77	82.77	82.77
> 2.6.25		85.875	85.94	85.75	86.06	85.75
> 2.6.26		87.4025	87.53	87.25	87.3	87.53
> 2.6.27		87.81	87.81	87.77	87.82	87.84
> 2.6.28		87.4275	87.77	87.24	87.28	87.42
> 2.6.29		88.4325	88.48	88.42	88.44	88.39
> 2.6.30-rc2	88.4925	88.49	88.44	88.56	88.48

I did my tests with 3 machines, trying to reproduce your results and failed.

So it's good news for us (we can go back to watching TV :) )

192.168.20.110 : linux-2.6 latest git tree, receiver "newkernel"
192.168.20.120 : linux-2.6.22, receiver "oldkernel"
192.168.20.112 : 2.6.28.7 as the sender machine (we don't really care)

I only took traces on receivers to rule out different coalescing delays
of NIC that could depend on firmware or version

bnx2 driver statically included in vmlinux.
(on newkernel, "disable_msi" is set to 1, because this is the only way my BCM5708S can
 have admin-set interrupt affinities actually used by the hardware on 2.6.28+)

Result for both the new and the old kernel:

3 us of latency to answer an ICMP ping request, if IRQ affinity is correct.

This 3 us includes the overhead of timestamping and the pcap copy of packets,
since I have tcpdump running.

In my case I want NIC IRQs on CPU0 : "echo 1 >/proc/irq/16/smp_affinity"


(trace taken on the receiver side)
ICMP Ping for 2.6.30-rc2 : 3us, 3us, 3us, 3us
15:11:52.856072 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3330, length 64
15:11:52.856075 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3330, length 64

15:11:52.856168 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3331, length 64
15:11:52.856171 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3331, length 64

15:11:52.856264 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3332, length 64
15:11:52.856267 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3332, length 64

15:11:52.856359 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3333, length 64
15:11:52.856362 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3333, length 64


ICMP Ping for 2.6.22: 3us, 3us, 3us, 3us : exactly the same latencies as with the current kernel

20:35:06.375011 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 110
20:35:06.375014 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 110

20:35:06.375106 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 111
20:35:06.375109 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 111

20:35:06.375201 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 112
20:35:06.375204 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 112

20:35:06.375296 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 113
20:35:06.375299 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 113



Answering a ping involves the regular network stack, with softirqs and all...
But we know for sure that the reply comes from the CPU that handled the RX, so
there is no false sharing between CPUs, and no user-land process was involved, only softirqs.


Now with your udpping I get very variable times: 21 us, 14 us, 27 us, 11 us on the following random samples

(trace taken on receiver side again)
15:18:50.256238 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256259 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

15:18:50.256371 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256385 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

15:18:50.256498 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256525 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

15:18:50.256638 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256649 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

Then if I force the udpping receiver to run on CPU2 (same physical CPU as the one
receiving NIC interrupts, but a different core):

I get : 13us, 11us, 10us, 10us, which is not bad.

21:07:48.823741 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823754 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:07:48.823852 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823863 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:07:48.823962 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823972 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:07:48.824087 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.824097 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

If run on another physical cpu : I get 15us, 12us, 12us, 12us

21:09:10.667120 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667135 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667234 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667246 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667345 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667357 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667471 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667483 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300


If I force udpping onto the same CPU as the IRQs (CPU0) I get: 7us, 8us, 7us, 7us
And yes, this is very good and expected IMHO :)

21:04:23.780421 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780428 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:04:23.780539 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780547 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:04:23.780663 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780670 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300

21:04:23.780783 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780790 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300


Now running the receiver on 2.6.22

udpping: 6us, 15us, 14us, 13us

16:34:34.266069 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266075 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

16:34:34.266093 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266108 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

16:34:34.266209 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266223 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

16:34:34.266323 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266335 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

Same cpu for user process and IRQS :

udpping : 7us, 9us, 8us, 10us : Very good, as we could expect !

20:39:07.106698 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106705 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

20:39:07.106808 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106819 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

20:39:07.106918 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106926 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

20:39:07.107046 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.107056 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300

But overall, I get slightly better results with 2.6.30-rc2: 7 us


I did an oprofile session on 2.6.30-rc2 for the 'best case' (7us of latency)
tracing only kernel events.

(and unfortunately IRQ 16 is shared by eth0, eth1 and uhci_hcd:usb1)


CPU: Core 2, speed 3000.19 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
samples  cum. samples  %        cum. %     symbol name
252423   252423        19.4717  19.4717    mwait_idle
99195    351618         7.6518  27.1235    apic_timer_interrupt
64108    415726         4.9452  32.0687    bnx2_interrupt
62195    477921         4.7977  36.8664    find_busiest_group
60022    537943         4.6300  41.4964    scheduler_tick
40772    578715         3.1451  44.6415    read_tsc
38550    617265         2.9737  47.6152    uhci_irq
36074    653339         2.7827  50.3980    sched_clock_cpu
25319    678658         1.9531  52.3510    rb_get_reader_page
25114    703772         1.9373  54.2883    native_sched_clock
23753    727525         1.8323  56.1206    getnstimeofday
22103    749628         1.7050  57.8256    ktime_get
19924    769552         1.5369  59.3625    hrtimer_interrupt
18999    788551         1.4656  60.8281    ktime_get_ts
17910    806461         1.3816  62.2096    tick_check_oneshot_broadcast
17664    824125         1.3626  63.5722    ring_buffer_consume
13354    837479         1.0301  64.6023    irq_entries_start
12885    850364         0.9939  65.5963    sched_clock_tick
12270    862634         0.9465  66.5428    tick_sched_timer
11232    873866         0.8664  67.4092    __schedule
10607    884473         0.8182  68.2274    softlockup_tick
9446     893919         0.7287  68.9561    raise_softirq
8483     902402         0.6544  69.6104    run_timer_softirq
8295     910697         0.6399  70.2503    __do_softirq
8208     918905         0.6332  70.8835    copy_to_user
7929     926834         0.6116  71.4951    run_rebalance_domains
7478     934312         0.5768  72.0720    bnx2_poll_work
6872     941184         0.5301  72.6021    cpu_idle
6267     947451         0.4834  73.0855    rcu_pending
5953     953404         0.4592  73.5447    rb_advance_reader
5781     959185         0.4459  73.9906    handle_fasteoi_irq
5762     964947         0.4445  74.4351    cpu_clock
5615     970562         0.4331  74.8682    udp_sendmsg
5544     976106         0.4277  75.2959    __skb_recv_datagram
5493     981599         0.4237  75.7196    set_normalized_timespec
5437     987036         0.4194  76.1390    update_wall_time
5282     992318         0.4074  76.5465    smp_apic_timer_interrupt
5097     997415         0.3932  76.9397    ip_append_data
5034     1002449        0.3883  77.3280    find_next_bit


my vmstat 1 output on the 2.6.30-rc2 receiver during the test:

 0  0      0 4047312  12944  40156    0    0     0     0 64231 26101  0  1 99  0
 0  1      0 4047312  12948  40156    0    0     0    32 64297 25762  0  1 99  0
 0  0      0 4047304  12952  40156    0    0     0     8 64650 25510  0  1 99  0
 0  0      0 4047312  12960  40152    0    0     0   344 64533 25434  0  1 98  1
 0  0      0 4047312  12960  40156    0    0     0     0 64279 25628  0  1 99  0
 0  0      0 4047312  12960  40156    0    0     0     4 64606 25356  0  1 99  0
 0  0      0 4047312  12960  40156    0    0     0     0 64253 25732  0  1 99  0
 0  1      0 4047304  12968  40152    0    0     0    40 64735 25211  0  1 99  0
 0  0      0 4047312  12968  40156    0    0     0   356 64638 25443  0  1 99  0
 0  0      0 4047312  12968  40156    0    0     0     0 64269 25608  0  1 99  0
 0  0      0 4047312  12968  40156    0    0     0     0 64562 25120  0  1 99  0
 1  0      0 4047312  12968  40156    0    0     0     4 64668 24811  0  1 99  0


My three machines have the same hardware:
HP ProLiant BL460c G1
2 quad core Intel(R) Xeon(R) CPU E5450  @ 3.00GHz  FSB 1333MHz
4GB ram (667 MHz)
Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
32 bit kernel


In conclusion, something might be wrong in your methodology, or your hardware, or...

Please Christoph, would you mind posting more information on your config ?

Any chance you have CONFIG_PREEMPT enabled or any expensive kernel debug thing?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-18 19:43           ` Eric Dumazet
@ 2009-04-20 17:29             ` Christoph Lameter
  2009-04-20 17:57               ` Eric Dumazet
  2009-04-20 19:44               ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Evgeniy Polyakov
  0 siblings, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-20 17:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

On Sat, 18 Apr 2009, Eric Dumazet wrote:

> I did my tests with 3 machines, trying to reproduce your results and failed.

Ow.... Where are the results? Seems that you only included the results of
traces? The numbers would be much higher with udpping since it measures
the full round trip time from user space of one machine to the other and
back. The code segment that you are tracing may not be relevant.

> I only took traces on receivers to rule out different coalescing delays
> of NIC that could depend on firmware or version

Hmmm... Okay there could be a firmware influence....

> bnx2 driver statically included in vmlinux.
> (on newkernel, "disable_msi" is set to 1, because this is the only way my BCM5708S can
>  have admin-set interrupt affinities actually used by the hardware on 2.6.28+)

Here bnx2 is a module and MSI is enabled after it became available
(2.6.27+). Affinities seem to be effective after resetting the driver.

> If I force udpping onto the same CPU as the IRQs (CPU0) I get: 7us, 8us, 7us, 7us
> And yes, this is very good and expected IMHO :)

Sounds very good. If I just knew what you are measuring.

> In conclusion, something might be wrong in your methodology, or your hardware, or...

Cannot really say since the results are not published? We can retry this
by making sure that udpping runs on the same processor as the IRQ.

> Please Christoph, would you mind posting more information on your config ?

Sure.

Dell PowerEdge 1950
http://www1.ap.dell.com/content/products/productdetails.aspx/pedge_1950?c=au&cs=aubsd1&l=en&s=bsd

> Any chance you have CONFIG_PREEMPT enabled or any expensive kernel debug thing?

Nope. Even if so: Would that not be offset by running the same kernel
version?

CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_NR_CPUS=32
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
CONFIG_X86_LOCAL_APIC=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
# CONFIG_DEBUG_FS is not set
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_FORCED_INLINING is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_4KSTACKS is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_DOUBLEFAULT=y



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 17:29             ` Christoph Lameter
@ 2009-04-20 17:57               ` Eric Dumazet
  2009-04-20 18:13                 ` Christoph Lameter
  2009-04-20 19:44               ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Evgeniy Polyakov
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-20 17:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Sat, 18 Apr 2009, Eric Dumazet wrote:
> 
>> I did my tests with 3 machines, trying to reproduce your results and failed.
> 
> Ow.... Where are the results? Seems that you only included the results of
> traces? The numbers would be much higher with udpping since it measures
> the full round trip time from user space of one machine to the other and
> back. The code segment that you are tracing may not be relevant.
> 
>> I only took traces on receivers to rule out different coalescing delays
>> of NIC that could depend on firmware or version
> 
> Hmmm... Okay there could be a firmware influence....
> 
>> bnx2 driver statically included in vmlinux.
>> (on newkernel, "disable_msi" is set to 1, because this is the only way my BCM5708S can
>>  have admin-set interrupt affinities actually used by the hardware on 2.6.28+)
> 
> Here bnx2 is a module and MSI is enabled after it became available
> (2.6.27+). Affinities seem to be effective after resetting the driver.

Hmm, I am not sure I can do that, I would lose my ssh connection...

> 
>> If I force udpping onto the same CPU as the IRQs (CPU0) I get: 7us, 8us, 7us, 7us
>> And yes, this is very good and expected IMHO :)
> 
> Sounds very good. If I just knew what you are measuring.

Rephrasing my email, I was measuring latencies on the receiving machine,
by using tcpdump and doing subtraction of 'answer_time' and 'request_time'.
Thing is that timestamps don't care about hardware delays.
(we note time when RX interrupt delivers the packet, and time right before giving frame to hardware)


21:04:23.780421 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300   (request)
21:04:23.780428 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300   (answer)

Here, [21:04:23.780428 - 21:04:23.780421] = 7 us

So my results are extensively published :)
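
(If anyone wants to redo that subtraction over a whole trace, the arithmetic is just
this -- a throwaway helper, nothing tcpdump-specific beyond the HH:MM:SS.usec
timestamp format; the two timestamps are the pair quoted above:)

#include <stdio.h>

/* Convert a tcpdump timestamp (HH:MM:SS.usec) to microseconds since midnight. */
static long long ts_to_usec(const char *ts)
{
    int h, m, s, us;

    if (sscanf(ts, "%d:%d:%d.%d", &h, &m, &s, &us) != 4)
        return -1;
    return ((long long)h * 3600 + m * 60 + s) * 1000000LL + us;
}

int main(void)
{
    const char *request = "21:04:23.780421";
    const char *reply   = "21:04:23.780428";

    printf("latency: %lld us\n", ts_to_usec(reply) - ts_to_usec(request));
    return 0;
}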


> 
>> In conclusion, something might be wrong in your methodology, or your hardware, or...
> 
> Cannot really say since the results are not published? We can retry this
> by making sure that udpping runs on the same processor as the IRQ.
> 
>> Please Christoph, would you mind posting more information on your config ?
> 
> Sure.
> 
> Dell PowerEdge 1950
> http://www1.ap.dell.com/content/products/productdetails.aspx/pedge_1950?c=au&cs=aubsd1&l=en&s=bsd
> 
>> Any chance you have CONFIG_PREEMPT enabled or any expensive kernel debug thing?
> 
> Nope. Even if so: Would that not be offset by running the same kernel
> version?

I don't know, it was a random thought of mine, since in the past I had bad results
when this one was enabled. This might have changed.

> 
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> CONFIG_NR_CPUS=32
> CONFIG_SCHED_SMT=y

OK, I had "# CONFIG_SCHED_SMT is not set"
I'll try with this option set

> CONFIG_SCHED_MC=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> # CONFIG_PREEMPT_BKL is not set
> CONFIG_X86_LOCAL_APIC=y
> 
> #
> # Kernel hacking
> #
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> # CONFIG_PRINTK_TIME is not set
> CONFIG_ENABLE_MUST_CHECK=y
> CONFIG_MAGIC_SYSRQ=y
> CONFIG_UNUSED_SYMBOLS=y
> # CONFIG_DEBUG_FS is not set
> # CONFIG_HEADERS_CHECK is not set
> CONFIG_DEBUG_KERNEL=y
> # CONFIG_DEBUG_SHIRQ is not set
> CONFIG_DETECT_SOFTLOCKUP=y
> # CONFIG_SCHEDSTATS is not set
> # CONFIG_TIMER_STATS is not set
> # CONFIG_DEBUG_SLAB is not set
> # CONFIG_DEBUG_RT_MUTEXES is not set
> # CONFIG_RT_MUTEX_TESTER is not set
> # CONFIG_DEBUG_SPINLOCK is not set
> # CONFIG_DEBUG_MUTEXES is not set
> # CONFIG_DEBUG_LOCK_ALLOC is not set
> # CONFIG_PROVE_LOCKING is not set
> # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
> # CONFIG_DEBUG_KOBJECT is not set
> # CONFIG_DEBUG_HIGHMEM is not set
> CONFIG_DEBUG_BUGVERBOSE=y
> # CONFIG_DEBUG_INFO is not set
> # CONFIG_DEBUG_VM is not set
> # CONFIG_DEBUG_LIST is not set
> # CONFIG_FRAME_POINTER is not set
> # CONFIG_FORCED_INLINING is not set
> # CONFIG_RCU_TORTURE_TEST is not set
> # CONFIG_LKDTM is not set
> # CONFIG_FAULT_INJECTION is not set
> CONFIG_EARLY_PRINTK=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> # CONFIG_DEBUG_STACK_USAGE is not set
> # CONFIG_DEBUG_RODATA is not set
> # CONFIG_4KSTACKS is not set
> CONFIG_X86_FIND_SMP_CONFIG=y
> CONFIG_X86_MPPARSE=y
> CONFIG_DOUBLEFAULT=y
> 

Are you running a 32-bit or a 64-bit kernel?

Thanks



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 17:57               ` Eric Dumazet
@ 2009-04-20 18:13                 ` Christoph Lameter
  2009-04-20 18:46                   ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-20 18:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

On Mon, 20 Apr 2009, Eric Dumazet wrote:

> > Sounds very good. If I just knew what you are measuring.
>
> Rephrasing my email: I was measuring latencies on the receiving machine,
> by using tcpdump and subtracting 'request_time' from 'answer_time'.
> The point is that these timestamps don't include hardware delays.
> (we record the time when the RX interrupt delivers the packet, and the time right before the frame is handed to the hardware)
>
>
> 21:04:23.780421 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300   (request)
> 21:04:23.780428 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300   (answer)
>
> Here, [21:04:23.780428 - 21:04:23.780421] = 7 us
>
> So my results are extensively published :)

But they are not comparable with my results. There could be other effects,
in the system call API etc., that have caused this regression. Plus tcpdump
causes additional copies of the packet to be delivered to user space.

> > CONFIG_HPET_TIMER=y
> > CONFIG_HPET_EMULATE_RTC=y
> > CONFIG_NR_CPUS=32
> > CONFIG_SCHED_SMT=y
>
> OK, I had "# CONFIG_SCHED_SMT is not set"
> I'll try with this option set

Should not be relevant since the processor has no hyperthreading.

> Are you running a 32-bit or a 64-bit kernel?

The test was done using a 64-bit kernel.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 18:13                 ` Christoph Lameter
@ 2009-04-20 18:46                   ` Eric Dumazet
  2009-04-20 19:16                     ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-20 18:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Mon, 20 Apr 2009, Eric Dumazet wrote:
> 
>>> Sounds very good. If I just knew what you are measuring.
>> Rephrasing my email: I was measuring latencies on the receiving machine,
>> by using tcpdump and subtracting 'request_time' from 'answer_time'.
>> The point is that these timestamps don't include hardware delays.
>> (we record the time when the RX interrupt delivers the packet, and the time right before the frame is handed to the hardware)
>>
>>
>> 21:04:23.780421 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300   (request)
>> 21:04:23.780428 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300   (answer)
>>
>> Here, [21:04:23.780428 - 21:04:23.780421] = 7 us
>>
>> So my results are extensively published :)
> 
> But they are not comparable with my results. There could be other effects,
> in the system call API etc., that have caused this regression. Plus tcpdump
> causes additional copies of the packet to be delivered to user space.

Yep, this was all mentioned in my mail.
I wanted to compare latencies on the receiver only, ruling out the hardware and ruling out the sender (no need to reboot it).
These latencies are higher than the ones without tcpdump, since more copies are involved with tcpdump.

As for system call API effects, they are included in my tests, since:

t0: time we receive the packet from the NIC
    -> wake up the user process, scheduler...
       the user process returns from recvfrom()  (copy from kernel to user space)
       the user process does the sendto()        (copy from user to kernel space)

t1: -> dev_start_xmit() is called
       the packet is given to the NIC driver (idle during the tests, so it should send the packet asap)

       the user process calls recvfrom() again and blocks (this part is not accounted in the latency, as in your test)
t2: the NIC driver acknowledges the TX

delta = t1 - t0

One thing that could hurt is the TX-done interrupt, but this happens a few us after "t1", so it
doesn't hurt your workload, since the next frame is received at least 100us after the last answer...

(the CPU is 99% idle)
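
To make the t0..t1 window concrete, here is a minimal sketch of the user-space part of such a responder (socket setup, error handling and the real udpping code are omitted; it only illustrates where the two copies and the sendto() fall):

#include <sys/types.h>
#include <sys/socket.h>

/* fd is assumed to be a UDP socket already bound to the service port
 * (9000 in the trace above) */
void echo_loop(int fd)
{
	char buf[1500];
	struct sockaddr_storage peer;
	socklen_t len;
	ssize_t n;

	for (;;) {
		len = sizeof(peer);
		/* t0: the RX irq delivered the request; the scheduler wakes us
		 * up and recvfrom() copies the data from kernel to user space */
		n = recvfrom(fd, buf, sizeof(buf), 0,
			     (struct sockaddr *)&peer, &len);
		if (n < 0)
			break;
		/* copy from user to kernel space; dev_start_xmit() runs
		 * inside this call -> t1 */
		sendto(fd, buf, n, 0, (struct sockaddr *)&peer, len);
		/* the next recvfrom() blocks again; the TX-done interrupt
		 * (t2) arrives while we are idle */
	}
}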

Point is that even with tcpdump running, latencies are very good on 2.6.30-rc2, and were very good
with 2.6.22. I see no significant increase/decrease...


> 
>>> CONFIG_HPET_TIMER=y
>>> CONFIG_HPET_EMULATE_RTC=y
>>> CONFIG_NR_CPUS=32
>>> CONFIG_SCHED_SMT=y
>> OK, I had "# CONFIG_SCHED_SMT is not set"
>> I'll try with this option set
> 
> Should not be relevant since the processor has no hyperthreading.
> 
>> Are you running a 32-bit or a 64-bit kernel?
> 
> The test was done using a 64-bit kernel.

OK, I'll try 64-bit too :)

1 us is the time to access about 10 falsely shared cache lines.
64-bit arches store fewer pointers/longs per cache line.
So a 64-bit kernel could be slower on this kind of workload in the general case (if several CPUs play the game).
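
Just to put a number on that, assuming the usual 64-byte cache line:

#include <stdio.h>

int main(void)
{
	/* 16 pointers per 64-byte line on 32-bit, 8 on 64-bit, so
	 * pointer-heavy structures span more cache lines on 64-bit */
	printf("%zu pointers per 64-byte cache line\n", 64 / sizeof(void *));
	return 0;
}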



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 18:46                   ` Eric Dumazet
@ 2009-04-20 19:16                     ` Christoph Lameter
  2009-04-20 20:07                       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-20 19:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

On Mon, 20 Apr 2009, Eric Dumazet wrote:

> Point is that even with tcpdump running, latencies are very good on 2.6.30-rc2, and were very good
> with 2.6.22. I see no significant increase/decrease...

Well, okay, that applies to your testing methodology, but your claim to have
shown that the regression I reported does not exist is not proven, since you
ran a different test.

> 1 us is the time to access about 10 falsely shared cache lines.

That depends on the size of the system and the number of processors
contending for the cache line.

> 64-bit arches store fewer pointers/longs per cache line.
> So a 64-bit kernel could be slower on this kind of workload in the general case (if several CPUs play the game).

Right. But in practice I have also seen slight performance increases due
to the increased availability of memory and the avoidance of various 32-bit
hacks (like highmem). Plus several recent subsystems seem to be
optimized for 64-bit, e.g. Infiniband.

I'd still like to see udpping results on your testing rig to get another
data point. If the udpping results do not show regressions on your
tests, then there is likely a config issue at the core of the regression
that I am seeing here.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 17:29             ` Christoph Lameter
  2009-04-20 17:57               ` Eric Dumazet
@ 2009-04-20 19:44               ` Evgeniy Polyakov
  1 sibling, 0 replies; 36+ messages in thread
From: Evgeniy Polyakov @ 2009-04-20 19:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, David Miller, Michael Chan, Ben Hutchings, netdev

Hi.

On Mon, Apr 20, 2009 at 01:29:54PM -0400, Christoph Lameter (cl@linux.com) wrote:
> > I only took traces on receivers to rule out different coalescing delays
> > of NIC that could depend on firmware or version
> 
> Hmmm... Okay there could be a firmware influence....

Can you swap the driver between one kernel and the other?
Could these be scheduler interactions (I know I am not using the right words)? Maybe
test with sched_rt priority?
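
Something along these lines could be prepended to the udpping processes to test that hypothesis (a rough sketch only; priority 50 and CPU 0 are arbitrary choices, and it needs root):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 50 };
	cpu_set_t mask;

	/* real-time priority, so the wakeup is not delayed by other tasks */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		perror("sched_setscheduler");

	/* optionally pin to the CPU that handles the NIC interrupt (CPU 0 here) */
	CPU_ZERO(&mask);
	CPU_SET(0, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
		perror("sched_setaffinity");

	/* ... run the usual udpping send/receive loop from here ... */
	return 0;
}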

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 19:16                     ` Christoph Lameter
@ 2009-04-20 20:07                       ` Eric Dumazet
  2009-04-20 21:14                         ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-04-20 20:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Mon, 20 Apr 2009, Eric Dumazet wrote:
> 
>> Point is that even with tcpdump running, latencies are very good on 2.6.30-rc2, and were very good
>> with 2.6.22. I see no significant increase/decrease...
> 
> Well, okay, that applies to your testing methodology, but your claim to have
> shown that the regression I reported does not exist is not proven, since you
> ran a different test.

I ran half of the test. The receiver is OK, and this is the latency we all expect
from it as the service provider.

Now you can focus on the sender side.

For example, your program uses a kernel service to gather time with nanosecond precision.

Maybe there is a problem with it, I don't know...

Your test has so many variables that it is hard to guess which part has a problem.

Maybe this is what you wanted to show after all, and you are not really interested
in discovering what is actually happening. Oh well, just kidding.

I am not trying to say you are right or wrong, Christoph, just trying to
check whether Linux really got a regression in various past releases. So far,
I have not found any strange results on the UDP path, once IRQ affinities
are fixed, of course.

> 
>> 1 us is the time to access about 10 falsely shared cache lines.
> 
> That depends on the size of the system and the number of processors
> contending for the cache line.
> 
>> 64-bit arches store fewer pointers/longs per cache line.
>> So a 64-bit kernel could be slower on this kind of workload in the general case (if several CPUs play the game).
> 
> Right. But in practice I have also seen slight performance increases due
> to the increased availability of memory and the avoidance of various 32-bit
> hacks (like highmem). Plus several recent subsystems seem to be
> optimized for 64-bit, e.g. Infiniband.

Maybe, but with udpping's 40-byte messages, I am not sure it can make a difference.

> 
> I'd still like to see udpping results on your testing rig to get another
> data point. If the udpping results do not show regressions on your
> tests, then there is likely a config issue at the core of the regression
> that I am seeing here.

No changes in udpping, just noise.

Also, my machines use bonding and vlans, so I probably have a little bit of overhead
(bonding uses an rwlock, which is not very SMP friendly...).



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 20:07                       ` Eric Dumazet
@ 2009-04-20 21:14                         ` Christoph Lameter
  2009-04-20 21:52                           ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2009-04-20 21:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

On Mon, 20 Apr 2009, Eric Dumazet wrote:

> No changes in udpping, just noise.

No changes in the results of udpping that you would not consider noise?

The time syscalls are virtual syscalls which are quite cheap.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 21:14                         ` Christoph Lameter
@ 2009-04-20 21:52                           ` Eric Dumazet
  2009-04-21 14:00                             ` Christoph Lameter
  2009-04-21 19:36                             ` Network latency regressions from 2.6.22 to 2.6.29 (MSI off) Christoph Lameter
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2009-04-20 21:52 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

Christoph Lameter wrote:
> On Mon, 20 Apr 2009, Eric Dumazet wrote:
> 
>> No changes in udpping, just noise.
> 
> No changes in the results of udpping that you would not consider noise?

The only changes were when all the CPUs played, so it really is not meaningful.

> 
> The time syscalls are virtual syscalls which are quite cheap.

Not on 32-bit AFAIK, at least with the glibc I use; strace told me.

clock_gettime(CLOCK_REALTIME, {1240262824, 400770067}) = 0
sendto(3, "\314\335\356\377\300\250\24p#)\0\0\23\324\207\264\377\377\377\377\0\1\2\3\4\5\6\7\10\t\0\0\3"..., 300, 0, {sa_family=AF_INET, sin_port=htons(9000), sin_addr=inet_addr("192.168.20.112")}, 16) = 300
recvfrom(3, "\314\335\356\377\300\250\24p#)\0\0\23\324\207\264\377\377\377\377\0\1\2\3\4\5\6\7\10\t\0\0\3"..., 300, 0, NULL, NULL) = 300
clock_gettime(CLOCK_REALTIME, {1240262824, 401108324}) = 0
clock_gettime(CLOCK_REALTIME, {1240262824, 401177464}) = 0

The cost is about 384 ns on this 32-bit 3 GHz machine (without strace), while it is 51 ns on 64-bit.
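
For what it is worth, a per-call figure like that can be reproduced with a trivial loop along these lines (the numbers will of course vary with glibc, kernel, vsyscall availability and 32 vs 64 bit; older glibc may need -lrt):

#include <stdio.h>
#include <time.h>

#define LOOPS 1000000L

int main(void)
{
	struct timespec t0, t1, tmp;
	long i;
	long long ns;

	clock_gettime(CLOCK_REALTIME, &t0);
	for (i = 0; i < LOOPS; i++)
		clock_gettime(CLOCK_REALTIME, &tmp);	/* the call being measured */
	clock_gettime(CLOCK_REALTIME, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (t1.tv_nsec - t0.tv_nsec);
	printf("%lld ns per clock_gettime() call\n", ns / LOOPS);
	return 0;
}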

Anyway, I'll try to boot 64-bit kernels on those 32-bit distro machines in the following days...



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity)
  2009-04-20 21:52                           ` Eric Dumazet
@ 2009-04-21 14:00                             ` Christoph Lameter
  2009-04-21 19:36                             ` Network latency regressions from 2.6.22 to 2.6.29 (MSI off) Christoph Lameter
  1 sibling, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-21 14:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev

On Mon, 20 Apr 2009, Eric Dumazet wrote:

> Not on 32-bit AFAIK, at least with the glibc I use; strace told me.

Right, the vsyscall optimizations are not available for 32-bit.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Network latency regressions from 2.6.22 to 2.6.29 (MSI off)
  2009-04-20 21:52                           ` Eric Dumazet
  2009-04-21 14:00                             ` Christoph Lameter
@ 2009-04-21 19:36                             ` Christoph Lameter
  1 sibling, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-04-21 19:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Michael Chan, Ben Hutchings, netdev


Switching off MSI increases the latency by another step:


MSI off
UDP ping pong 40 bytes	Average	Test 1	Test 2	Test 3	Test 4
2.6.27		91.1725	91.31	91.17	91.12	91.09
2.6.28		91.5625	91.71	91.45	91.67	91.42
2.6.29		91.7	91.71	91.67	91.72	91.7
2.6.30-rc2	91.835	91.83	91.78	91.82	91.91

MSI on

UDP ping pong 40 bytes	Average	Test 1	Test 2	Test 3	Test 4
2.6.22		83.1125	83.4	83.03	82.95	83.07
2.6.23		83.33	83.51	83.18	83.42	83.21
2.6.24		82.7775	82.8	82.77	82.77	82.77
2.6.25		85.875	85.94	85.75	86.06	85.75
2.6.26		87.4025	87.53	87.25	87.3	87.53
2.6.27		87.81	87.81	87.77	87.82	87.84
2.6.28		87.4275	87.77	87.24	87.28	87.42
2.6.29		88.4325	88.48	88.42	88.44	88.39
2.6.30-rc2	88.4925	88.49	88.44	88.56	88.48

The next test series will always have the same kernel on one side, etc.



^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread

Thread overview: 36+ messages
2009-04-16 16:10 Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
2009-04-16 17:21 ` Rick Jones
2009-04-16 19:06   ` Christoph Lameter
2009-04-16 19:29     ` Eric Dumazet
2009-04-16 19:33       ` Christoph Lameter
2009-04-16 22:47         ` David Miller
2009-04-17 13:46           ` Christoph Lameter
2009-04-17 21:43             ` Ilpo Järvinen
2009-04-16 20:05     ` Rick Jones
2009-04-16 18:07 ` Ben Hutchings
2009-04-16 19:02   ` Christoph Lameter
2009-04-16 21:19     ` Ben Hutchings
2009-04-16 22:47     ` David Miller
2009-04-16 19:43 ` Eric Dumazet
2009-04-16 19:50   ` Eric Dumazet
2009-04-16 20:01     ` Christoph Lameter
2009-04-16 23:00       ` David Miller
2009-04-17 16:42         ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Christoph Lameter
2009-04-18  8:18           ` Eric Dumazet
2009-04-18  8:20             ` Eric Dumazet
2009-04-18 19:43           ` Eric Dumazet
2009-04-20 17:29             ` Christoph Lameter
2009-04-20 17:57               ` Eric Dumazet
2009-04-20 18:13                 ` Christoph Lameter
2009-04-20 18:46                   ` Eric Dumazet
2009-04-20 19:16                     ` Christoph Lameter
2009-04-20 20:07                       ` Eric Dumazet
2009-04-20 21:14                         ` Christoph Lameter
2009-04-20 21:52                           ` Eric Dumazet
2009-04-21 14:00                             ` Christoph Lameter
2009-04-21 19:36                             ` Network latency regressions from 2.6.22 to 2.6.29 (MSI off) Christoph Lameter
2009-04-20 19:44               ` Network latency regressions from 2.6.22 to 2.6.29 (results with IRQ affinity) Evgeniy Polyakov
2009-04-16 19:55   ` Network latency regressions from 2.6.22 to 2.6.29 Christoph Lameter
2009-04-16 21:57     ` Michael Chan
2009-04-17 13:47       ` Christoph Lameter
2009-04-16 22:59     ` David Miller
