netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
@ 2007-04-26 10:20 Ilpo Järvinen
  2007-04-26 16:16 ` Chuck Ebbert
  2007-04-26 16:38 ` Rick Jones
  0 siblings, 2 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2007-04-26 10:20 UTC (permalink / raw)
  To: Netdev

Hi,


I've been trying to figure out unexcepted delay spikes that seem to occur 
for TCP flows with 2.6 (tested 2.6.18.8 + a punch of own modifications and 
also with 2.6.20.7 with a single additional changeset only that enables me 
to TCP's control initial window at will). I'll first describe the problem 
and then give a more complete description of the workload on the host.

Less than year ago, my test network kernels where upgraded from 2.4.22 to 
2.6.18. With 2.4.22 I was able to achieve very good timings in transfers 
between components. Some time ago I noticed that with 2.6.18 I 
occassionally get latency spikes as long as 100-200ms in the TCP 
transfers between components (I describe later how TCP was tuned during 
these tests to avoid problems that occur with small segments). I started 
to investigate the spikes, and here are the findings so far:

 - Problem is very well reproducable with the workload but I haven't been
   able to come up with simple client + server to cause the same with it.
 - Downgrading the emulator host to 2.4.22 removes the problem
 - Moving the emulator to another host in the network does not remove 
   the problem. It had different ethernet hw (and not the same driver).
 - I placed a hub to get exact timings on the wire without potential 
   interference from tcpdump on the emulator host (test done with 2.6.18) 
   but to my great surprise, the problem vanished completely
 - Due to the hub test result, I tested 10/100/duplex settings and found 
   out that if the emulator host has 10fd, the problem does not occur with
   2.6 either?!? This could be due to luck but I cannot say for sure, yet
   the couple of tests I've run with 10fd, did not show this...
 - Tried to change cable & switch that connect hosts together, no effect


To prove this with 100Mbps, I setup routing so that with a host with 10/FD 
configuration (known, based on earlier, to be unlikely to cause errors) I 
collected all traffic between the emulator host and one of the packet 
capture hosts. Here is one example point where a long delay occurs 
(EMU is the emulator host, in the real log each packet is shown twice, I 
only leave the later one here):

1177577267.364596 IP CAP.35305 > EMU.52246: . 17231434:17232894(1460) ack 383357 win 16293
1177577267.364688 IP CAP.35305 > EMU.52246: P 17232894:17232946(52) ack 383357 win 16293
1177577267.366093 IP EMU.52246 > CAP.35305: . ack 17232894 win 32718
1177577267.493815 IP EMU.52246 > CAP.35305: P 383357:383379(22) ack 17232894 win 32718
1177577267.534252 IP CAP.35305 > EMU.52246: . ack 383379 win 16293
1177577267.534517 IP EMU.59050 > CAP.58452: P 624496:624528(32) ack 328 win 365
1177577267.534730 IP CAP.58452 > EMU.59050: . ack 624528 win 16293
1177577267.536267 IP CAP.35305 > EMU.52246: . 17232946:17234406(1460) ack 383379 win 16293
1177577267.536360 IP CAP.35305 > EMU.52246: P 17234406:17234458(52) ack 383379 win 16293
1177577267.537764 IP EMU.52246 > CAP.35305: . ack 17234406 win 32718

In the receiving emulator process I have, at the same time, this debug 
output (select calls while in busywait are not printed to avoid 
performance and huge log problems):

To select 1177577267.362959 timeout: 0.011148 trackfd: 4 (in set)
Out select 1177577267.365923 trackfd: 4 (in set)
Read at 1177577267.365936 fromfd: 4 (had 0, tried 2048, got more 1460), while in busywait: 0
To select 1177577267.365949 timeout: 0.008158 trackfd: 4 (in set)
Out select 1177577267.377752 trackfd: 4 (not in set)
  567.177960: DL: delayed: [...emulation event...]
To select 1177577267.394121 timeout: 0.079489 trackfd: 4 (in set)
Out select 1177577267.473751 trackfd: 4 (not in set)
  567.277465: UL: delayed: [...emulation event...]
  567.277494: UL: sent   : [...emulation event...]
  567.297236: DL: delayed: [...emulation event...]
To select 1177577267.513401 timeout: 0.014452 trackfd: 4 (in set)
Out select 1177577267.513406 trackfd: 4 (not in set)
  567.297268: DL: sent   : [...emulation event...]
To select 1177577267.513431 timeout: 0.014422 trackfd: 4 (in set)
Out select 1177577267.529751 trackfd: 4 (not in set)
Read at 1177577267.534294 fromfd: 4 (had 0, tried 2048, got more 52), while in busywait: 1
  567.318147: DL: pktinfo: [...full 1460+52 bytes finally received...]


The first part of the 1460+52 bytes is received by the emulator process at 
1177577267.365936 while the 52 bytes is enourmously delayed and arrives at 
1177577267.534294 (=168ms later!?!). During this time the emulator process 
is able to sleep in select 11.8 + 79.6 + 16.3 ms + does tons of calls to 
select during busy-wait with timeout being set to zero (not shown) and the 
packet just does not arrive?!? Also the tcpdump above shows the delay, the 
ack field at 1177577267.493815 does not yet show the latest segment though 
it already had more than 120 ms to reach TCP, that should be plenty of 
time, shouldn't it? :-)

I've tried once to run tcpdump on the emulator host too and the delay was 
also in there (tcpdump shows the received packet with long delay, it was
something above 100ms that time too).


...I'm unsure how to continue the investigation from this point onward 
and asking for ideas/suggestions or how to rule out more possibilities... 
Or is there some knob which I don't know of that should be toggled or 
something, is 2.6 network stack expected to behave this way?



About the workload & TCP configuration:

The host in question runs an wireless emulator process, other services are
carefully shutdown to avoid any interference. During tests s/utimes of the 
other, remaining, unrelated processes do not increase (I'm not qualified 
to say that this proves that they did not run at all, anyone can verify 
this? Nor that nothing else was running). Also sshd was started only from 
inetd because it would periodically recreate its keys causing latency 
spikes. The emulator uses select + busy-wait combination to achieve less 
than millisecond accuracy in timing (since wireless links I don't have 
very large data rates, the small inaccuracies that obviously will occur 
are known to exist and are under monitoring). The host is p4 (ht enabled). 
Other emulator components are connected to the emulator using a 
bidirectional TCP flow (basically the packet capture + forward to the 
emulator host process). The packet flow to (but not from) the emulator has 
a flow control that sends credits using another TCP flow (32 bytes / flow 
control ack). All things use TCP_NODELAY. The network is isolated so that 
no other traffic can cause unexcepted effects. The emulator does collect 
log only to a mem buffer that is flushed through TCP only between tests 
(and thus does not cause timing problems).

Achieving a good timing between components required some tuning since flow 
control acks (on TCP level) and a part of the captured packets are less 
than full-sized as delayed acks would cause problems. To circumvent 
delayed ACK related issues, TCP initial windows on the emulator host and 
on the packet capture hosts are set to a large value to avoid cwnd limit. 
In addition, slow start after idle & application limited transfer things 
had to be disabled (2.6.18 changes include a sysctl similar to the commit 
15d33c070ddde99f7368e6b17b71d22c866f97d9 that is also included in the 
tested 2.6.20). This worked well with 2.4 but no longer with 2.6, I've no 
idea where the time during a latency spike is wasted with 2.6...


--
 i.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-26 10:20 Unexcepted latency (order of 100-200 ms) with TCP (packet receive) Ilpo Järvinen
@ 2007-04-26 16:16 ` Chuck Ebbert
  2007-04-26 19:30   ` Ilpo Järvinen
  2007-04-26 16:38 ` Rick Jones
  1 sibling, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2007-04-26 16:16 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev

Ilpo Järvinen wrote:
> ...I'm unsure how to continue the investigation from this point onward 
> and asking for ideas/suggestions or how to rule out more possibilities... 
> Or is there some knob which I don't know of that should be toggled or 
> something, is 2.6 network stack expected to behave this way?
> 

Try a different network adapter.

Try turning off hardware TSO offload:
  ethtool -K ethX tso off


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-26 10:20 Unexcepted latency (order of 100-200 ms) with TCP (packet receive) Ilpo Järvinen
  2007-04-26 16:16 ` Chuck Ebbert
@ 2007-04-26 16:38 ` Rick Jones
  2007-04-26 19:18   ` Ilpo Järvinen
  1 sibling, 1 reply; 9+ messages in thread
From: Rick Jones @ 2007-04-26 16:38 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev

Ilpo Järvinen wrote:
> Hi,
> 
> ...
> Some time ago I noticed that with 2.6.18 I occassionally get latency
> spikes as long as 100-200ms in the TCP transfers between components
> (I describe later how TCP was tuned during these tests to avoid
> problems that occur with small segments). I started to investigate
> the spikes, and here are the findings so far:
> ...
>  - I placed a hub to get exact timings on the wire without potential 
>    interference from tcpdump on the emulator host (test done with 2.6.18) 
>    but to my great surprise, the problem vanished completely

Sounds like tcpdump getting in the way?  How many CPUs do you have in 
the system, and have you tried some explicit binding of processes to 
different CPUs? (taskset etc...)

When running tcpdump are you simply sending raw traces to a file, or are 
you having the ASCII redirected to a file?  What about name resolution 
(ie areyou using -n)?

>  - Due to the hub test result, I tested 10/100/duplex settings and found 
>    out that if the emulator host has 10fd, the problem does not occur with
>    2.6 either?!? This could be due to luck but I cannot say for sure, yet
>    the couple of tests I've run with 10fd, did not show this...
>  - Tried to change cable & switch that connect hosts together, no effect
> 
> 
> To prove this with 100Mbps, I setup routing so that with a host with 10/FD 
> configuration (known, based on earlier, to be unlikely to cause errors) I 
> collected all traffic between the emulator host and one of the packet 
> capture hosts. Here is one example point where a long delay occurs 
> (EMU is the emulator host, in the real log each packet is shown twice, I 
> only leave the later one here):
> 
> 1177577267.364596 IP CAP.35305 > EMU.52246: . 17231434:17232894(1460) ack 383357 win 16293
> 1177577267.364688 IP CAP.35305 > EMU.52246: P 17232894:17232946(52) ack 383357 win 16293
> 1177577267.366093 IP EMU.52246 > CAP.35305: . ack 17232894 win 32718
> 1177577267.493815 IP EMU.52246 > CAP.35305: P 383357:383379(22) ack 17232894 win 32718
> 1177577267.534252 IP CAP.35305 > EMU.52246: . ack 383379 win 16293

What is the length of the standalone ACK timer these days?

> 1177577267.534517 IP EMU.59050 > CAP.58452: P 624496:624528(32) ack 328 win 365
> 1177577267.534730 IP CAP.58452 > EMU.59050: . ack 624528 win 16293
> 1177577267.536267 IP CAP.35305 > EMU.52246: . 17232946:17234406(1460) ack 383379 win 16293
> 1177577267.536360 IP CAP.35305 > EMU.52246: P 17234406:17234458(52) ack 383379 win 16293
> 1177577267.537764 IP EMU.52246 > CAP.35305: . ack 17234406 win 32718
> ...
> All things use TCP_NODELAY. The network is isolated so that 
> no other traffic can cause unexcepted effects. The emulator does collect 
> log only to a mem buffer that is flushed through TCP only between tests 
> (and thus does not cause timing problems).

Might tcp_abc have crept back-in?

rick jones

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-26 16:38 ` Rick Jones
@ 2007-04-26 19:18   ` Ilpo Järvinen
  0 siblings, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2007-04-26 19:18 UTC (permalink / raw)
  To: Rick Jones; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6196 bytes --]

On Thu, 26 Apr 2007, Rick Jones wrote:

> Ilpo Järvinen wrote:
> > 
> > ...
> > Some time ago I noticed that with 2.6.18 I occassionally get latency
> > spikes as long as 100-200ms in the TCP transfers between components
> > (I describe later how TCP was tuned during these tests to avoid
> > problems that occur with small segments). I started to investigate
> > the spikes, and here are the findings so far:
> > ...
> >  - I placed a hub to get exact timings on the wire without potential 
> >    interference from tcpdump on the emulator host (test done with 2.6.18)
> >    but to my great surprise, the problem vanished completely
> 
> Sounds like tcpdump getting in the way? 

No, tcpdump is not running on the host in question at all (that is what 
I'm trying to say but my English might betray me). I just describe there 
that I made a testing configuration when (after) I noticed that this 
100-200ms problem exists without any tcpdumps. With hub, the network was 
configured like this (I tried to avoid additional linux sender):

packet capture host ---> switch ---> hub ---> emulation host
                                      |
                                      |
                                     \ /
                              host with tcpdump

(for the record: due to the emulator nature, the sender in this case 
is called "packet capture host" even though I don't mean that I capture 
these tcpdumps at it.)

Sadle enough, the problem disappeared with it (as the hub caused eth to 
become 10mbps network...). Because of that, the tcpdump I'm showing below 
is captured with this route though another host configuration (additional 
sender is there which I tried to completely avoid by using the above hub 
setup):


                host with tcpdump, ip_forward = 1
                   (10Mbps eth on purpose)
                          / \  |
                           |   |
                           |  \ /
packet capture host ---->  switch -----> emulation host

...the reason being that tcpdump on the emulation host could get into the 
way like you're saying. I run it once there too, and it saw the packet 
delayed by the 100-130ms which probably is valid occurance of this same 
problem (too large to be due to scheduling or so, IMHO) that occurs 
without either of the tcpdump configs presented here, when traffic just 
goes directly from packet capture host through the switch to emulation 
host.

> How many CPUs do you have in the system, and have you tried some 
> explicit binding of processes to different CPUs? (taskset etc...)

One p4. What processes do you mean, I've carefully disabled most of the 
services (please read the explination in the end of the original mail), 
and the s/utimes of the remaining (from proc/<pid>/stat) processes on the 
emulation host do not increase during the test... Is it still possible 
that they could have been scheduled but did not get s/utime incremented, 
I'm not qualified to say that but perhaps you or somebody else knows if 
it's so?

Besides, I've the very same configuration (no problem => without tcpdump) 
working perfectly with 2.4 and also with 2.6 when the emulation host eth 
is set to 10Mbps instead of 100Mbps using ethtool. Could 2.6 with 100Mbps 
eth really cause 100ms latency spikes, especially, considering that my 
emulated wireless link is just 384kbps, so the amount of data that is 
going past ain't much larger than that...

> When running tcpdump are you simply sending raw traces to a file, or are you
> having the ASCII redirected to a file?
> What about name resolution (ie areyou using -n)?

I used this command:
  tcpdump -i eth0 -w tracefile host 192.168.1.3 or host 192.168.1.2

All hosts in the network should be in /etc/hosts, but I can try with -n 
tomorrow but I doubt that it will make any difference because the problem 
is there regardless of the tcpdump. The very reason of the tcpdump being 
there is I just to debug the problem between the emulation host and the 
sender to see whether the problem is in the sender end or in the receiving 
end. 

> > To prove this with 100Mbps, I setup routing so that with a host with 10/FD
> > configuration (known, based on earlier, to be unlikely to cause errors) I
> > collected all traffic between the emulator host and one of the packet
> > capture hosts. Here is one example point where a long delay occurs (EMU is
> > the emulator host, in the real log each packet is shown twice, I only leave
> > the later one here):
> > 
> > 1177577267.364596 IP CAP.35305 > EMU.52246: . 17231434:17232894(1460) ack
> > 383357 win 16293
> > 1177577267.364688 IP CAP.35305 > EMU.52246: P 17232894:17232946(52) ack
> > 383357 win 16293
> > 1177577267.366093 IP EMU.52246 > CAP.35305: . ack 17232894 win 32718
> > 1177577267.493815 IP EMU.52246 > CAP.35305: P 383357:383379(22) ack 17232894
> > win 32718

At this point the segment with seq 17232946 hasn't yet reached TCP (to 
be more precise, when this was sent) because cumulatively acks only to 
17232894! That's 129 ms later than the segment with seq 17232946 was seen 
by the tcpdump host (a quite long time)...

> > 1177577267.534252 IP CAP.35305 > EMU.52246: . ack 383379 win 16293
> 
> What is the length of the standalone ACK timer these days?

Do you mean so called delayed ack timer?

kernels/linux/include$ grep "DELACK" */*
net/tcp.h:#define TCP_DELACK_MAX        ((unsigned)(HZ/5))      /* maximal 
time to delay before sending an ACK */
net/tcp.h:#define TCP_DELACK_MIN        ((unsigned)(HZ/25))     /* minimal 
time to delay before sending an ACK */
net/tcp.h:#define TCP_DELACK_MIN        4U


I suspect that it could be so that delayed ack is related in way as it 
is the thing that wakes up the adapter first time after the delay (when 
nothing else from the network arrives before the delack timer expires).

However, I'm quite sure this problem might be below TCP level because of 
tcpdump cumulative ACK seq in segment 1177577267.493815 which does not yet 
include the latest segment thatshould have already arrived (tcpdump saw it 
forwarded 129ms earlier).

> Might tcp_abc have crept back-in?

# cat /proc/sys/net/ipv4/tcp_abc
0


Thanks anyway for the ideas.

-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-26 16:16 ` Chuck Ebbert
@ 2007-04-26 19:30   ` Ilpo Järvinen
  2007-04-27  7:17     ` Bill Fink
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-04-26 19:30 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 842 bytes --]

On Thu, 26 Apr 2007, Chuck Ebbert wrote:

> Ilpo Järvinen wrote:
> > ...I'm unsure how to continue the investigation from this point onward 
> > and asking for ideas/suggestions or how to rule out more possibilities... 
> > Or is there some knob which I don't know of that should be toggled or 
> > something, is 2.6 network stack expected to behave this way?
> > 
> 
> Try a different network adapter.

Hmm, I thought I had already done this but I just noticed that it is so 
that the adapter was still the same as the other host has two adapter (now 
that I look again). I'll give it a try tomorrow to see if using the 
another adapter makes any difference.

> Try turning off hardware TSO offload:
>   ethtool -K ethX tso off

# ethtool -K eth0 tso off
Cannot set device tcp segmentation offload settings: Operation not 
supported

-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-26 19:30   ` Ilpo Järvinen
@ 2007-04-27  7:17     ` Bill Fink
  2007-04-27 15:07       ` Ilpo Järvinen
  0 siblings, 1 reply; 9+ messages in thread
From: Bill Fink @ 2007-04-27  7:17 UTC (permalink / raw)
  To: Ilpo Järvinen ; +Cc: Chuck Ebbert, Netdev

On Thu, 26 Apr 2007, Ilpo Järvinen wrote:

> On Thu, 26 Apr 2007, Chuck Ebbert wrote:
> 
> > Ilpo Järvinen wrote:
> > > ...I'm unsure how to continue the investigation from this point onward 
> > > and asking for ideas/suggestions or how to rule out more possibilities... 
> > > Or is there some knob which I don't know of that should be toggled or 
> > > something, is 2.6 network stack expected to behave this way?
> > > 
> > 
> > Try a different network adapter.
> 
> Hmm, I thought I had already done this but I just noticed that it is so 
> that the adapter was still the same as the other host has two adapter (now 
> that I look again). I'll give it a try tomorrow to see if using the 
> another adapter makes any difference.
> 
> > Try turning off hardware TSO offload:
> >   ethtool -K ethX tso off
> 
> # ethtool -K eth0 tso off
> Cannot set device tcp segmentation offload settings: Operation not 
> supported

Could the delays be caused by Ethernet PAUSE frames (which might not
be rquired at the slower 10FD but might at 100)?  TX and RX pause
control settings (check with "ethtool -a") might be different between
the 2.4 and 2.6 kernels.

Also check things like net.core.netdev_max_backlog and ifconfig
txqueuelen settings.  And check "ethtool -S", "netstat -s", and
ifconfig error counters.

						-Bill

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-27  7:17     ` Bill Fink
@ 2007-04-27 15:07       ` Ilpo Järvinen
  2007-05-07 11:45         ` Ilpo Järvinen
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-04-27 15:07 UTC (permalink / raw)
  To: Bill Fink; +Cc: Chuck Ebbert, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2264 bytes --]

On Fri, 27 Apr 2007, Bill Fink wrote:

> On Thu, 26 Apr 2007, Ilpo Järvinen wrote:
> 
> > On Thu, 26 Apr 2007, Chuck Ebbert wrote:
> > 
> > > Ilpo Järvinen wrote:
> > > > ...I'm unsure how to continue the investigation from this point onward 
> > > > and asking for ideas/suggestions or how to rule out more possibilities... 
> > > > Or is there some knob which I don't know of that should be toggled or 
> > > > something, is 2.6 network stack expected to behave this way?
> > > > 
> > > 
> > > Try a different network adapter.
> > 
> > Hmm, I thought I had already done this but I just noticed that it is so 
> > that the adapter was still the same as the other host has two adapter (now 
> > that I look again). I'll give it a try tomorrow to see if using the 
> > another adapter makes any difference.

...Much more promising result this time. I noticed that there was another 
eth hw on mainboard, thus my previous test with different hw was not 
valid as I assumed "wrong" (didn't even notice the other) one to be eth0:

02:05.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
02:08.0 Ethernet controller: Intel Corporation 82801DB PRO/100 VE (LOM) Ethernet Controller (rev 82)

With 3c905 I have the problem, with Intel one it does not show up 
(tested today).

> > > Try turning off hardware TSO offload:
> > >   ethtool -K ethX tso off
> > 
> > # ethtool -K eth0 tso off
> > Cannot set device tcp segmentation offload settings: Operation not 
> > supported
> 
> Could the delays be caused by Ethernet PAUSE frames (which might not
> be rquired at the slower 10FD but might at 100)?  TX and RX pause
> control settings (check with "ethtool -a") might be different between
> the 2.4 and 2.6 kernels.

# ethtool -a eth0
Pause parameters for eth0:
Cannot get device pause settings: Operation not supported


> Also check things like net.core.netdev_max_backlog and ifconfig
> txqueuelen settings. 

# cat /proc/sys/net/core/netdev_max_backlog 
1000

...and...

txqueuelen:1000

> And check "ethtool -S", "netstat -s", and ifconfig error counters.

Nothigh really alarming was found, errors were all zero, only thing that 
could be even remotely interesting is this:
    5 delayed acks further delayed because of locked socket


-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-04-27 15:07       ` Ilpo Järvinen
@ 2007-05-07 11:45         ` Ilpo Järvinen
  2007-05-09  7:03           ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-05-07 11:45 UTC (permalink / raw)
  To: John W. Linville, Andrew Morton, Jeff Garzik
  Cc: Bill Fink, Chuck Ebbert, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3179 bytes --]

On Fri, 27 Apr 2007, Ilpo Järvinen wrote:
> > On Thu, 26 Apr 2007, Ilpo Järvinen wrote:
> > > On Thu, 26 Apr 2007, Chuck Ebbert wrote:
> > > > 
> > > > Try a different network adapter.
> > > 
> > > Hmm, I thought I had already done this but I just noticed that it is so 
> > > that the adapter was still the same as the other host has two adapter (now 
> > > that I look again). I'll give it a try tomorrow to see if using the 
> > > another adapter makes any difference.
> 
> ...Much more promising result this time. I noticed that there was another 
> eth hw on mainboard, thus my previous test with different hw was not 
> valid as I assumed "wrong" (didn't even notice the other) one to be eth0:
> 
> 02:05.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
> 02:08.0 Ethernet controller: Intel Corporation 82801DB PRO/100 VE (LOM) Ethernet Controller (rev 82)
> 
> With 3c905 I have the problem, with Intel one it does not show up 
> (tested today).

I found out this shows up when 900fd17dd01d2c99dfd1ec0b53a860894a2673ee is 
included, kernels before that seem to work fine. The problem description 
is in this same thread. I'm not going to repeat it here as it is valid 
except for the fact that my original claim that it happens with another 
hardware is false, please see it in here:

http://marc.info/?l=linux-netdev&m=117758295411277&w=2


Should these delays I see be considered as evindence of mmio not working 
with this card/model or could there be something else wrong somewhere? 
...I can just disable it using global_use_mmio parameter, which removes 
the problem in 2.6.20.7 (tested) but just in case somebody has more clue 
than I do about this, I'm willing to do more testing...


This output comes from 2.6.20.7:


[   15.553104] 3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
[   15.553214] 0000:02:05.0: 3Com PCI 3c905C Tornado at f8800c00.

02:05.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Flags: bus master, medium devsel, latency 32, IRQ 18
        I/O ports at dc00 [size=128]
        Memory at feaffc00 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at f5700000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: b7 10 00 92 17 01 10 02 74 00 00 02 08 20 00 00
10: 01 dc 00 00 00 fc af fe 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b7 10 00 10
30: 00 00 ac fe dc 00 00 00 00 00 00 00 09 01 0a 0a

# cat /proc/interrupts 
           CPU0       
  0:     126277   IO-APIC-edge      timer
  1:          8   IO-APIC-edge      i8042
  2:          0    XT-PIC-XT        cascade
  6:          5   IO-APIC-edge      floppy
 12:          4   IO-APIC-edge      i8042
 14:       3392   IO-APIC-edge      ide0
 15:          0   IO-APIC-edge      ide1
 18:       1269   IO-APIC-fasteoi   eth0
 20:          4   IO-APIC-fasteoi   eth1
NMI:          0 
LOC:     126238 
ERR:          0
MIS:          0

eth0 is the 3c59x, eth1 is the intel.

Network is 100/Full duplex, switched. No additional parameters are given 
to 3c59x.



-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)
  2007-05-07 11:45         ` Ilpo Järvinen
@ 2007-05-09  7:03           ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2007-05-09  7:03 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: linville, akpm, jeff, billfink, cebbert, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 7 May 2007 14:45:53 +0300 (EEST)

> I found out this shows up when 900fd17dd01d2c99dfd1ec0b53a860894a2673ee is 
> included, kernels before that seem to work fine. The problem description 
> is in this same thread. I'm not going to repeat it here as it is valid 
> except for the fact that my original claim that it happens with another 
> hardware is false, please see it in here:
> 
> http://marc.info/?l=linux-netdev&m=117758295411277&w=2
> 
> Should these delays I see be considered as evindence of mmio not working 
> with this card/model or could there be something else wrong somewhere? 
> ...I can just disable it using global_use_mmio parameter, which removes 
> the problem in 2.6.20.7 (tested) but just in case somebody has more clue 
> than I do about this, I'm willing to do more testing...

One thing that MMIO changes a lot is timing.

The port I/O instructions on x86 have fixed timings so take the same
amount of time to execute regardless of the underlying bus
capabilities.

This could have been masking some issue in the driver or the hardware.

Port I/O also does not have any write posting issues like MMIO does.

In my opinion it can only cause trouble to start using MMIO on such
old chips when we've been using port I/O to access them for 5+ years.
:-)

Wait, it's timing, I see the bug.  There was a similar problem like
this in the Linux floppy driver some 6 years ago.

Look at issue_and_wait(), how it polls:

	iowrite16(cmd, ioaddr + EL3_CMD);
	for (i = 0; i < 2000; i++) {
		if (!(ioread16(ioaddr + EL3_STATUS) & CmdInProgress))
			return;
	}

That takes longer to run with port I/O than MMIO.  So I bet it
breaks out of the loop faster with MMIO and thus can trigger
this thing:

	/* OK, that didn't work.  Do it the slow way.  One second */
	for (i = 0; i < 100000; i++) {
		if (!(ioread16(ioaddr + EL3_STATUS) & CmdInProgress)) {
			if (vortex_debug > 1)
				printk(KERN_INFO "%s: command 0x%04x took %d usecs\n",
					   dev->name, cmd, i * 10);
			return;
		}
		udelay(10);
	}

and that's where your delays are coming from.

I would suggest adding some kind of (very small) fixed delay to the
first loop.

The rest of the driver should be audited for this kind of problem.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-05-09  7:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-26 10:20 Unexcepted latency (order of 100-200 ms) with TCP (packet receive) Ilpo Järvinen
2007-04-26 16:16 ` Chuck Ebbert
2007-04-26 19:30   ` Ilpo Järvinen
2007-04-27  7:17     ` Bill Fink
2007-04-27 15:07       ` Ilpo Järvinen
2007-05-07 11:45         ` Ilpo Järvinen
2007-05-09  7:03           ` David Miller
2007-04-26 16:38 ` Rick Jones
2007-04-26 19:18   ` Ilpo Järvinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).