public inbox for linux-kernel@vger.kernel.org
* nfs problem: hp-server --- linux 2.4.13 client, ooops
@ 2001-11-15 21:29 Birger Lammering
  2001-11-15 23:58 ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: Birger Lammering @ 2001-11-15 21:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: birger.lammering

Hi there,

I've got an interesting problem for people working on nfs:

Kernel version: 2.4.13 (up and smp), Trond's seekdir patch, ext3 patch
Hardware: P4 1700 (1 and 2), 2 GB RAM.
Compiled with RedHat 7.1 gcc (2.96-85)
nfs-Server: HP-UX 10.20

Without the nfs patch I had a lot of trouble with Irix servers - now Irix,
NetApp, AIX, Linux... nfs servers are fine, but connecting to an HP server
leads to messages like this in the syslog:
NFS: short packet in readdir reply!

Even worse: accessing one particular directory (using ls, find, etc.) leads
to a segmentation fault in the accessing process. The machine stays up and
running, but some other processes (with no relationship to the nfs mount)
die a sudden death as well.

There is nothing special about the directory on the nfs server.
There is no problem with Linux 2.2.14-5.0smp. I can check with other 2.4
versions if necessary. The problem is independent of the nfs version.

The syslog says:


Unable to handle kernel paging request at virtual address fe01e000
 printing eip:
f898eaed
*pde = 00004063
*pte = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<f898eaed>]    Not tainted
EFLAGS: 00010296
eax: 01000000   ebx: fe01dff8   ecx: fe01e000   edx: 00000019
esi: fe01e000   edi: 0000006c   ebp: f537405c   esp: f4e83c60
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 11091, stackpage=f4e83000)
Stack: f4e83cec f518c200 f898ea4c f8956e07 f537405c f6a15240 f4e83d94
f4e83d40 
       f4e83d54 f4e83cac 00000002 f895a18c f4e83cec f4e83cec fffffff5
f4e83cec 
       f518c200 f4e82000 f4e82000 00000000 f4e82000 f4e83d58 f4e83d58
f895a498 
Call Trace: [<f898ea4c>] [<f8956e07>] [<f895a18c>] [<f895a498>]
[<f895646e>] 
   [<f8997520>] [<f89594e0>] [<f898e09e>] [<f8997520>] [<f898b89a>]
[<c01299e8>] 
   [<f898bc2b>] [<f898b7c0>] [<f898eb1c>] [<c0142bb4>] [<c0142f90>]
[<c01430f3>]
 
   [<c0142f90>] [<c0125838>] [<c0106f7b>] 

Code: 8b 11 0f ca 8d 42 03 24 fc 8d 4c 01 08 81 fa ff 00 00 00 77 
 <1>Unable to handle kernel NULL pointer dereference at virtual address
00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<00000000>]    Not tainted
EFLAGS: 00010286
eax: 00000000   ebx: f6ff2480   ecx: 00000000   edx: 00000000
esi: f6f49b80   edi: f4e83d24   ebp: f7308180   esp: f4e83d00
ds: 0018   es: 0018   ss: 0018
Process gcc (pid: 11097, stackpage=f4e83000)
Stack: f7308180 f6ff24bc f4e83d90 f4e83d24 f6ff2480 00000000 f4e83e34
f72fae00 
       f4e83d90 00010002 00000000 00000000 00000000 00000000 00000000
00000000 
       00000005 0000a1ff 00000001 00000000 00000000 0000001c 00000000
00001000 
Call Trace: [<c013e3de>] [<c013e850>] [<c013ee1a>] [<c013c65e>]
[<c013d13e>] 
   [<f898eb1c>] [<c0142bb4>] [<c0141f0a>] [<c013e14d>] [<c0105b8f>]
[<c0106f7b>]
 

Code:  Bad EIP value.
 <1>Unable to handle kernel NULL pointer dereference at virtual address
00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<00000000>]    Not tainted
EFLAGS: 00010286
eax: 00000000   ebx: f6ff2480   ecx: 00000000   edx: 00000000
esi: f6f49b80   edi: f4e83d24   ebp: f7308180   esp: f4e83d00
ds: 0018   es: 0018   ss: 0018
Process gcc (pid: 11139, stackpage=f4e83000)
Stack: f7308180 f6ff24bc f4e83d90 f4e83d24 f6ff2480 00000000 f4e83e34
f72fae00 
       f4e83d90 08060002 f4e82000 f6fe4100 00001812 00000000 080603ec
080600a0 
       00000005 0000a1ff 00000001 00000000 00000000 0000001c 00000000
00001000 
Call Trace: [<c013e3de>] [<c013e850>] [<c013ee1a>] [<c013c65e>]
[<c013d13e>] 
   [<c012ee22>] [<c01201e5>] [<c01202aa>] [<c0116da1>] [<c0141f0a>]
[<c013e14d>]
 
   [<c0105b8f>] [<c0106f7b>] 

Code:  Bad EIP value.

If you are interested in more details or, even better, have a solution,
please also CC email to my account, since I'm only a casual spectator on
this list.

Cheers,
Birger

--
Dr. Birger Lammering                         
science+computing ag                        Tel: 089 356386-15
Geschaeftsstelle Muenchen                   Fax: 089 356386-37
Ingolstaedter Str. 22
80807 Muenchen         email:  Birger.Lammering(at)partner.bmw.de
                              B.Lammering(at)science-computing.de


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: nfs problem: hp-server --- linux 2.4.13 client, ooops
  2001-11-15 21:29 nfs problem: hp-server --- linux 2.4.13 client, ooops Birger Lammering
@ 2001-11-15 23:58 ` Trond Myklebust
  2001-11-16 11:24   ` nfs problem: hp|aix-server --- linux 2.4.15pre5 client Birger Lammering
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-15 23:58 UTC (permalink / raw)
  To: Birger Lammering; +Cc: linux-kernel, birger.lammering

>>>>> " " == Birger Lammering <lammering@planet-interkom.de> writes:

     > Without the nfs patch I had a lot of trouble with Irix servers
     > - now Irix, NetApp, Aix, Linux... nfs- servers are fine, but
     > connecting to a HP server leads to statements like this in the
     > syslog: NFS: short packet in readdir reply!

<snip>
     > Unable to handle kernel paging request at virtual address
     > fe01e000
     >  printing eip:
     > f898eaed *pde = 00004063 *pte = 00000000 Oops: 0000 CPU: 1 EIP:
     > 0010:[<f898eaed>] Not tainted EFLAGS: 00010296 eax: 01000000
     > ebx: fe01dff8 ecx: fe01e000 edx: 00000019 esi: fe01e000 edi:
     > 0000006c ebp: f537405c esp: f4e83c60 ds: 0018 es: 0018 ss: 0018
     > Process ls (pid: 11091, stackpage=f4e83000) Stack: f4e83cec
     > f518c200 f898ea4c f8956e07 f537405c f6a15240 f4e83d94 f4e83d40
     >        f4e83d54 f4e83cac 00000002 f895a18c f4e83cec f4e83cec
     >        fffffff5
     > f4e83cec
     >        f518c200 f4e82000 f4e82000 00000000 f4e82000 f4e83d58
     >        f4e83d58
     > f895a498 Call Trace: [<f898ea4c>] [<f8956e07>] [<f895a18c>]
     > [<f895a498>] [<f895646e>]
     >    [<f8997520>] [<f89594e0>] [<f898e09e>] [<f8997520>]
     >    [<f898b89a>]
     > [<c01299e8>]
     >    [<f898bc2b>] [<f898b7c0>] [<f898eb1c>] [<c0142bb4>]
     >    [<c0142f90>]
     > [<c01430f3>]
 
That particular Oops should already be fixed in 2.4.14.

Cheers,
   Trond


* nfs problem: hp|aix-server --- linux 2.4.15pre5 client
  2001-11-15 23:58 ` Trond Myklebust
@ 2001-11-16 11:24   ` Birger Lammering
  2001-11-16 11:45     ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: Birger Lammering @ 2001-11-16 11:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

Hi Trond,

Trond Myklebust writes:
 >      > [<c01430f3>]
 >  
 > That particular Oops should already be fixed in 2.4.14.

Thanks, I've tried 2.4.15pre5 (+ seekdir patch) now and could not
reproduce the Oops with the HP nfs server. But still: the kernel
complains "NFS: short packet in readdir reply!" when I access
any directory provided by a HP-UX 10.20 nfs server (using nfs2 |
nfs3). I don't notice any strange behaviour other than that, though.

But a new problem emerged: copying from a Linux (2.4.13 | 2.4.15pre5)
nfs client onto an AIX nfs server doesn't work. About 800 kB are copied,
then the cp command just hangs. Syslog says:
Nov 16 11:51:12 capc20 kernel: nfs: server caes04 not responding, still trying
This problem only occurs with nfs3, not with nfs2.

Is there a cure for this, without being too experimental? 

Cheers,
Birger

ps: It seems to be quite difficult to get (and surely to write :-)
nfs3 clients that work with all conceivable other platforms - we not
only have nfs problems with Linux clients in our network...


 > 
 > Cheers,
 >    Trond
 > 


* nfs problem: hp|aix-server --- linux 2.4.15pre5 client
  2001-11-16 11:24   ` nfs problem: hp|aix-server --- linux 2.4.15pre5 client Birger Lammering
@ 2001-11-16 11:45     ` Trond Myklebust
  2001-11-16 12:01       ` Miquel van Smoorenburg
  2001-11-16 13:19       ` nfs problem: aix-server " Birger Lammering
  0 siblings, 2 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-16 11:45 UTC (permalink / raw)
  To: Birger Lammering; +Cc: linux-kernel

>>>>> " " == Birger Lammering <b.lammering@science-computing.de> writes:

     > not reproduce the Ooops with the HP nfs server. But still: The
     > Kernel complains about: "NFS: short packet in readdir reply!" 
     > when I access any directory provided by a HP-UX 10.20
     > nfs-server (using nfs2 | nfs3). I dont' notice any strange
     > behaviour other than that, though.

That's because the HP is returning a READDIR reply that is larger than
the buffer size we specified. When this happens, we truncate the reply
at the last valid record before the buffer overflow, and print out the
above message.

     > But a new problem emerged: Copying from a linux (2.4.13 |
     > 2.4.15pre5) nfs-client onto an AIX nfs-server doesn't
     > work. About 800kb are copied then the cp command just
     > hangs. Syslog says: Nov 16 11:51:12 capc20 kernel: nfs: server
     > caes04 not responding, still trying This problem only occures
     > with nfs3, and not with nfs2.

     > Is there a cure for this, without being too experimental?

No idea. That depends on where the problem lies...
Do you have a tcpdump for me? Preferably one from the client and one
from the server (showing the same period of time).

Cheers,
   Trond


* Re: nfs problem: hp|aix-server --- linux 2.4.15pre5 client
  2001-11-16 11:45     ` Trond Myklebust
@ 2001-11-16 12:01       ` Miquel van Smoorenburg
  2001-11-16 12:24         ` Trond Myklebust
  2001-11-16 13:19       ` nfs problem: aix-server " Birger Lammering
  1 sibling, 1 reply; 29+ messages in thread
From: Miquel van Smoorenburg @ 2001-11-16 12:01 UTC (permalink / raw)
  To: linux-kernel

In article <15348.64613.465429.628445@charged.uio.no>,
Trond Myklebust  <trond.myklebust@fys.uio.no> wrote:
>>>>>> " " == Birger Lammering <b.lammering@science-computing.de> writes:
>
>     > not reproduce the Ooops with the HP nfs server. But still: The
>     > Kernel complains about: "NFS: short packet in readdir reply!" 
>
>That's because the HP is returning a READDIR reply that is larger than
>the buffer size we specified. When this happens, we truncate the reply
>at the last valid record before the buffer overflow, and print out the
>above message.

Shouldn't the message then be "NFS: too large packet in readdir reply!" ?

Mike.
-- 
"Only two things are infinite, the universe and human stupidity,
 and I'm not sure about the former" -- Albert Einstein.



* Re: nfs problem: hp|aix-server --- linux 2.4.15pre5 client
  2001-11-16 12:01       ` Miquel van Smoorenburg
@ 2001-11-16 12:24         ` Trond Myklebust
  0 siblings, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-16 12:24 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

>>>>> " " == Miquel van Smoorenburg <miquels@cistron-office.nl> writes:

    >> That's because the HP is returning a READDIR reply that is
    >> larger than the buffer size we specified. When this happens, we
    >> truncate the reply at the last valid record before the buffer
    >> overflow, and print out the above message.

     > Shouldn't the message then be "NFS: too large packet in readdir
     > reply!" ?

8) Or 'NFS: truncated packet in readdir reply!', since that is what
the RPC layer actually returns to NFS.

When we are sure that the code is stable, the whole message can go. It
is really just reporting a server error. As long as we handle it
correctly, there should be no need to churn out all these printks.

Cheers,
  Trond


* nfs problem: aix-server --- linux 2.4.15pre5 client
  2001-11-16 11:45     ` Trond Myklebust
  2001-11-16 12:01       ` Miquel van Smoorenburg
@ 2001-11-16 13:19       ` Birger Lammering
       [not found]         ` <15349.39320.876188.274548@charged.uio.no>
  1 sibling, 1 reply; 29+ messages in thread
From: Birger Lammering @ 2001-11-16 13:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-kernel

Hi,

Trond Myklebust writes:
 >      > But a new problem emerged: Copying from a linux (2.4.13 |
 >      > 2.4.15pre5) nfs-client onto an AIX nfs-server doesn't
 >      > work. About 800kb are copied then the cp command just
 >      > hangs. Syslog says: Nov 16 11:51:12 capc20 kernel: nfs: server
 >      > caes04 not responding, still trying This problem only occures
 >      > with nfs3, and not with nfs2.
 > 
 >      > Is there a cure for this, without being too experimental?
 > 
 > No idea. That depends on where the problem lies...

I should have known that. ;-)

 > Do you have a tcpdump for me? Preferably one from the client and one
 > from the server (showing the same period of time).

The tcpdump was taken while copying from the Linux nfs3-Client onto
the Aix nfs3-Server:

tcpdump on AIX (caes04):
13:47:28.337776317 truncated-ip - 18 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.337860266 caes04.muc.shilp > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.343619224 capc25.muc.796 > caes04.muc.shilp: . ack 121 win 17520 (DF)
13:47:28.344042473 truncated-ip - 50 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.357398139 truncated-ip - 138 bytes missing!caes04.muc.shilp > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.364496982 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.364872917 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.365142046 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.482136300 caes04.muc.shilp > capc25.muc.796: . ack 4724 win 55652
13:47:28.489517784 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.490018916 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.490313868 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.490566573 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.685246999 caes04.muc.shilp > capc25.muc.796: . ack 2059190468 win 49812

and on Linux (capc25):
13:47:28.324042 > capc25.muc.796 > caes04.muc.nfs: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.330599 < caes04.muc.nfs > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.330620 > capc25.muc.796 > caes04.muc.nfs: . 156:156(0) ack 121 win 17520 (DF)
13:47:28.330857 > capc25.muc.796 > caes04.muc.nfs: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.350291 < caes04.muc.nfs > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.350556 > capc25.muc.796 > caes04.muc.nfs: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.350569 > capc25.muc.796 > caes04.muc.nfs: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.350581 > capc25.muc.796 > caes04.muc.nfs: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.475691 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 4724 win 55652
13:47:28.475724 > capc25.muc.796 > caes04.muc.nfs: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.475734 > capc25.muc.796 > caes04.muc.nfs: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.475743 > capc25.muc.796 > caes04.muc.nfs: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.475752 > capc25.muc.796 > caes04.muc.nfs: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.678570 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 10564 win 49812
13:47:28.678604 > capc25.muc.796 > caes04.muc.nfs: . 10564:12024(1460) ack 397 win 17520 (DF)
13:47:28.678614 > capc25.muc.796 > caes04.muc.nfs: . 12024:13484(1460) ack 397 win 17520 (DF)
13:47:28.678623 > capc25.muc.796 > caes04.muc.nfs: . 13484:14944(1460) ack 397 win 17520 (DF)
13:47:28.678632 > capc25.muc.796 > caes04.muc.nfs: . 14944:16404(1460) ack 397 win 17520 (DF)
13:47:28.678642 > capc25.muc.796 > caes04.muc.nfs: . 16404:17864(1460) ack 397 win 17520 (DF)
13:47:28.884628 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 17864 win 42512


I can send you the whole dump if you think it's helpful.

The Linux settings:
Linux capc20 2.4.15pre5-sc1 #1 Fri Nov 16 10:26:21 CET 2001 i686 unknown
 
Gnu C                  2.96
Gnu make               3.79.1
binutils               2.10.91.0.2
util-linux             2.11f
mount                  2.11g
modutils               2.4.6
e2fsprogs              1.23
PPP                    2.4.0
Linux C Library        2.2.4
Dynamic linker (ldd)   2.2.4
Procps                 2.0.7
Net-tools              1.57
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         firegl23 3c59x usb-uhci usbcore

Cheers,
Birger

 > Cheers,
 >    Trond
 > 


* IPV4 socket layer,  was: nfs problem: aix-server --- linux 2.4.15pre5 client
       [not found]         ` <15349.39320.876188.274548@charged.uio.no>
@ 2001-11-19 10:20           ` Birger Lammering
  2001-11-19 11:16             ` Alan Cox
  0 siblings, 1 reply; 29+ messages in thread
From: Birger Lammering @ 2001-11-19 10:20 UTC (permalink / raw)
  To: linux-kernel

Hi,

there seems to be a problem with (at least) Linux 2.4.13, 2.4.15pre3
and network connections to AIX.

Trond thinks it's not in the nfs3 bit, and I found the problem with
these driver/NIC combinations:
1. e100 driver (from Intel) on an EtherExpress Pro 100
2. kernel 3c59x on a 3Com 905C
3. recent Donald Becker 3c59x on a 3Com 905C

at 10 Mbit/half duplex and 100 Mbit/full duplex.


So it seems to be somewhere in between (IPV4 socket layer???).

Here is what happens:

When copying a file of >900 kB into an nfs3-exported directory of an AIX
nfs3 server, we get on the second attempt:

nfs: server camc1083 not responding, still trying

and after a while:

nfs: server camc1083 OK

The tcpdump during a copy looks like this:
tcpdump on AIX (caes04):
13:47:28.337776317 truncated-ip - 18 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.337860266 caes04.muc.shilp > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.343619224 capc25.muc.796 > caes04.muc.shilp: . ack 121 win 17520 (DF)
13:47:28.344042473 truncated-ip - 50 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.357398139 truncated-ip - 138 bytes missing!caes04.muc.shilp > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.364496982 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.364872917 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.365142046 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.482136300 caes04.muc.shilp > capc25.muc.796: . ack 4724 win 55652
13:47:28.489517784 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.490018916 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.490313868 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.490566573 truncated-ip - 1322 bytes missing!capc25.muc.796 > caes04.muc.shilp: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.685246999 caes04.muc.shilp > capc25.muc.796: . ack 2059190468 win 49812

and on Linux (capc25):
13:47:28.324042 > capc25.muc.796 > caes04.muc.nfs: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)
13:47:28.330599 < caes04.muc.nfs > capc25.muc.796: P 1:121(120) ack 156 win 60032
13:47:28.330620 > capc25.muc.796 > caes04.muc.nfs: . 156:156(0) ack 121 win 17520 (DF)
13:47:28.330857 > capc25.muc.796 > caes04.muc.nfs: P 156:344(188) ack 121 win 17520 (DF)
13:47:28.350291 < caes04.muc.nfs > capc25.muc.796: P 121:397(276) ack 344 win 60032
13:47:28.350556 > capc25.muc.796 > caes04.muc.nfs: . 344:1804(1460) ack 397 win 17520 (DF)
13:47:28.350569 > capc25.muc.796 > caes04.muc.nfs: . 1804:3264(1460) ack 397 win 17520 (DF)
13:47:28.350581 > capc25.muc.796 > caes04.muc.nfs: . 3264:4724(1460) ack 397 win 17520 (DF)
13:47:28.475691 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 4724 win 55652
13:47:28.475724 > capc25.muc.796 > caes04.muc.nfs: . 4724:6184(1460) ack 397 win 17520 (DF)
13:47:28.475734 > capc25.muc.796 > caes04.muc.nfs: . 6184:7644(1460) ack 397 win 17520 (DF)
13:47:28.475743 > capc25.muc.796 > caes04.muc.nfs: . 7644:9104(1460) ack 397 win 17520 (DF)
13:47:28.475752 > capc25.muc.796 > caes04.muc.nfs: . 9104:10564(1460) ack 397 win 17520 (DF)
13:47:28.678570 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 10564 win 49812
13:47:28.678604 > capc25.muc.796 > caes04.muc.nfs: . 10564:12024(1460) ack 397 win 17520 (DF)
13:47:28.678614 > capc25.muc.796 > caes04.muc.nfs: . 12024:13484(1460) ack 397 win 17520 (DF)
13:47:28.678623 > capc25.muc.796 > caes04.muc.nfs: . 13484:14944(1460) ack 397 win 17520 (DF)
13:47:28.678632 > capc25.muc.796 > caes04.muc.nfs: . 14944:16404(1460) ack 397 win 17520 (DF)
13:47:28.678642 > capc25.muc.796 > caes04.muc.nfs: . 16404:17864(1460) ack 397 win 17520 (DF)
13:47:28.884628 < caes04.muc.nfs > capc25.muc.796: . 397:397(0) ack 17864 win 42512

I have no clue on how to debug an IPV4 socket layer, so I'm grateful
for any comment on how to provide better information...

Regards,
Birger

Trond Myklebust writes:
 > Odd... That dump seems to indicate a problem with TCP in which the
 > server is seeing bad packets. That's not an NFS problem per se, as we
 > just use the standard socket functions. IOW it's either a bug in the
 > IPV4 socket layer or a bug with your network card driver.
 > 
 > Have you tried changing the networking card and/or its driver?
 > 
 > Cheers,
 >   Trond
 > 

ps: Trond: the email I sent earlier was written after too little
testing...

--
Dr. Birger Lammering                         
science+computing ag                        fon: 089 356386-15
Geschaeftsstelle Muenchen                   fax: 089 356386-37
Ingolstaedter Str. 22
D-80807 Muenchen         mailto:B.Lammering@science-computing.de



* Re: IPV4 socket layer,  was: nfs problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 10:20           ` IPV4 socket layer, was: " Birger Lammering
@ 2001-11-19 11:16             ` Alan Cox
  2001-11-19 16:37               ` more tcpdumpinfo for nfs3 " Birger Lammering
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Cox @ 2001-11-19 11:16 UTC (permalink / raw)
  To: Birger Lammering; +Cc: linux-kernel

> 13:47:28.337776317 truncated-ip - 18 bytes missing!capc25.muc.796 > caes04.muc.shilp: P 2059179904:2059180060(156) ack 4022052897 win 17520 (DF)

Right so someone is truncating frames. I'd start with the hub then work
outwards. 


* more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 11:16             ` Alan Cox
@ 2001-11-19 16:37               ` Birger Lammering
  2001-11-19 18:22                 ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: Birger Lammering @ 2001-11-19 16:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: trond.myklebust

Hi,

Alan Cox writes:
 > Right so someone is truncating frames. I'd start with the hub then work
 > outwards. 

tcpdump on the AIX was truncating the frames. Sorry, the snap length was
too small --- with -s 64000 there were no truncated frames
anymore.


Well, I guess (n.b.: GUESS) the problem is with nfs again...

This is a consistently reproducible bug. We have tested it on several
AIX and Linux machines on different switched networks.

After having learned a bit better how to use tcpdump, here we go:

nfs3 client: Linux 2.4.15pre5 (capc25), 3Com 905B / Intel EtherExpress Pro 100+
nfs3 server: AIX 4.3 (caes04), IBM SP2 / 7044-170
100 Mbit/full duplex
switched network (Cisco Catalyst)



This is the first attempt at copying a 500 kB file onto the nfs server:


16:24:19.652672 > capc25.muc.654 > caes04.muc.sunrpc: udp 40 (DF)
16:24:19.655073 < caes04.muc.sunrpc > capc25.muc.654: udp 24
16:24:19.655119 > capc25.muc.655 > caes04.muc.sunrpc: udp 56 (DF)
16:24:19.656813 < caes04.muc.sunrpc > capc25.muc.655: udp 28
16:24:19.656853 > capc25.muc.656 > caes04.muc.nfs: S 2952133600:2952133600(0) win 5840 <mss 1460,sackOK,timestamp 2001001 0,nop,wscale 0> (DF)
16:24:19.657695 < caes04.muc.nfs > capc25.muc.656: S 4177984247:4177984247(0) ack 2952133601 win 59860 <mss 1460>
16:24:19.657711 > capc25.muc.656 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:24:19.657746 > capc25.muc.656 > caes04.muc.nfs: P 1:45(44) ack 1 win 5840 (DF)
16:24:19.658653 < caes04.muc.nfs > capc25.muc.656: P 1:29(28) ack 45 win 60032
16:24:19.658666 > capc25.muc.656 > caes04.muc.nfs: . 45:45(0) ack 29 win 5840 (DF)
16:24:19.658702 > capc25.muc.656 > caes04.muc.nfs: F 45:45(0) ack 29 win 5840 (DF)
16:24:19.658749 > capc25.muc.657 > caes04.muc.sunrpc: udp 40 (DF)
16:24:19.659739 < caes04.muc.nfs > capc25.muc.656: . 29:29(0) ack 46 win 60032
16:24:19.660017 < caes04.muc.nfs > capc25.muc.656: F 29:29(0) ack 46 win 60032
16:24:19.660026 > capc25.muc.656 > caes04.muc.nfs: . 46:46(0) ack 30 win 5840 (DF)
16:24:19.660157 < caes04.muc.sunrpc > capc25.muc.657: udp 24
16:24:19.660199 > capc25.muc.658 > caes04.muc.sunrpc: udp 56 (DF)
16:24:19.662361 < caes04.muc.sunrpc > capc25.muc.658: udp 28
16:24:19.662393 > capc25.muc.50947429 > caes04.muc.nfs: 40 null (DF)
16:24:19.664439 < caes04.muc.nfs > capc25.muc.50947429: reply ok 24 null
16:24:19.665397 > capc25.muc.799 > caes04.muc.nfs: S 2955913379:2955913379(0) win 5840 <mss 1460,sackOK,timestamp 2001002 0,nop,wscale 0> (DF)
16:24:19.667274 < caes04.muc.nfs > capc25.muc.799: S 3186497216:3186497216(0) ack 2955913380 win 59860 <mss 1460>
16:24:19.667293 > capc25.muc.799 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:24:19.667328 > capc25.muc.799 > caes04.muc.nfs: P 1:109(108) ack 1 win 5840 (DF)
16:24:19.669096 < caes04.muc.nfs > capc25.muc.799: P 1:117(116) ack 109 win 60032
16:24:19.669108 > capc25.muc.799 > caes04.muc.nfs: . 109:109(0) ack 117 win 5840 (DF)
16:24:19.669155 > capc25.muc.799 > caes04.muc.nfs: P 109:217(108) ack 117 win 5840 (DF)
16:24:19.670267 < caes04.muc.nfs > capc25.muc.799: P 117:289(172) ack 217 win 60032
16:24:19.670295 > capc25.muc.799 > caes04.muc.nfs: P 217:325(108) ack 289 win 6432 (DF)
16:24:19.671674 < caes04.muc.nfs > capc25.muc.799: P 289:457(168) ack 325 win 60032
16:24:19.673364 > capc25.muc.799 > caes04.muc.nfs: P 325:461(136) ack 457 win 6432 (DF)
16:24:19.675030 < caes04.muc.nfs > capc25.muc.799: P 457:701(244) ack 461 win 60032
16:24:19.675121 > capc25.muc.799 > caes04.muc.nfs: P 461:625(164) ack 701 win 7504 (DF)
16:24:19.693914 < caes04.muc.nfs > capc25.muc.799: P 701:849(148) ack 625 win 60032
16:24:19.694170 > capc25.muc.799 > caes04.muc.nfs: . 625:2085(1460) ack 849 win 7504 (DF)
16:24:19.694186 > capc25.muc.799 > caes04.muc.nfs: . 2085:3545(1460) ack 849 win 7504 (DF)
16:24:19.774514 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 3545 win 57112
16:24:19.774545 > capc25.muc.799 > caes04.muc.nfs: . 3545:5005(1460) ack 849 win 7504 (DF)
16:24:19.774556 > capc25.muc.799 > caes04.muc.nfs: . 5005:6465(1460) ack 849 win 7504 (DF)
16:24:19.774565 > capc25.muc.799 > caes04.muc.nfs: . 6465:7925(1460) ack 849 win 7504 (DF)
16:24:19.983865 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 7925 win 52732
16:24:19.983887 > capc25.muc.799 > caes04.muc.nfs: . 7925:9385(1460) ack 849 win 7504 (DF)
16:24:19.983897 > capc25.muc.799 > caes04.muc.nfs: . 9385:10845(1460) ack 849 win 7504 (DF)
16:24:19.983906 > capc25.muc.799 > caes04.muc.nfs: . 10845:12305(1460) ack 849 win 7504 (DF)
16:24:19.983916 > capc25.muc.799 > caes04.muc.nfs: P 12305:13765(1460) ack 849 win 7504 (DF)
16:24:20.194551 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 13765 win 46892
16:24:20.194569 > capc25.muc.799 > caes04.muc.nfs: . 13765:15225(1460) ack 849 win 7504 (DF)
16:24:20.194578 > capc25.muc.799 > caes04.muc.nfs: P 15225:16685(1460) ack 849 win 7504 (DF)
16:24:20.194588 > capc25.muc.799 > caes04.muc.nfs: . 16685:18145(1460) ack 849 win 7504 (DF)
16:24:20.194598 > capc25.muc.799 > caes04.muc.nfs: . 18145:19605(1460) ack 849 win 7504 (DF)
16:24:20.194607 > capc25.muc.799 > caes04.muc.nfs: P 19605:21065(1460) ack 849 win 7504 (DF)
16:24:20.396361 < caes04.muc.nfs > capc25.muc.799: . 849:849(0) ack 21065 win 39592

...and so on...

16:24:21.904205 < caes04.muc.nfs > capc25.muc.799: . 2981:2981(0) ack 462513 win 58968
16:24:21.904231 > capc25.muc.799 > caes04.muc.nfs: P 480033:481493(1460) ack 2981 win 7504 (DF)
16:24:21.904239 > capc25.muc.799 > caes04.muc.nfs: . 481493:482953(1460) ack 2981 win 7504 (DF)
16:24:21.904247 > capc25.muc.799 > caes04.muc.nfs: . 482953:484413(1460) ack 2981 win 7504 (DF)
16:24:21.904256 > capc25.muc.799 > caes04.muc.nfs: . 484413:485873(1460) ack 2981 win 7504 (DF)
16:24:21.904265 > capc25.muc.799 > caes04.muc.nfs: . 485873:487333(1460) ack 2981 win 7504 (DF)
16:24:21.904277 > capc25.muc.799 > caes04.muc.nfs: . 487333:488793(1460) ack 2981 win 7504 (DF)
16:24:21.904289 > capc25.muc.799 > caes04.muc.nfs: . 488793:490253(1460) ack 2981 win 7504 (DF)
16:24:21.904298 > capc25.muc.799 > caes04.muc.nfs: . 490253:491713(1460) ack 2981 win 7504 (DF)
16:24:21.904309 > capc25.muc.799 > caes04.muc.nfs: . 491713:493173(1460) ack 2981 win 7504 (DF)
16:24:21.904317 > capc25.muc.799 > caes04.muc.nfs: . 493173:494633(1460) ack 2981 win 7504 (DF)
16:24:21.904327 > capc25.muc.799 > caes04.muc.nfs: . 494633:496093(1460) ack 2981 win 7504 (DF)
16:24:21.904337 > capc25.muc.799 > caes04.muc.nfs: . 496093:497553(1460) ack 2981 win 7504 (DF)
16:24:21.904349 > capc25.muc.799 > caes04.muc.nfs: . 497553:499013(1460) ack 2981 win 7504 (DF)
16:24:21.904357 > capc25.muc.799 > caes04.muc.nfs: . 499013:500473(1460) ack 2981 win 7504 (DF)
16:24:21.904364 > capc25.muc.799 > caes04.muc.nfs: . 500473:501933(1460) ack 2981 win 7504 (DF)
16:24:21.904372 > capc25.muc.799 > caes04.muc.nfs: . 501933:503393(1460) ack 2981 win 7504 (DF)
16:24:21.905180 < caes04.muc.nfs > capc25.muc.799: P 2981:3145(164) ack 465433 win 56048
16:24:21.905979 > capc25.muc.799 > caes04.muc.nfs: . 503393:504853(1460) ack 2981 win 7504 (DF)
16:24:21.905986 > capc25.muc.799 > caes04.muc.nfs: . 504853:506313(1460) ack 2981 win 7504 (DF)
16:24:21.905992 > capc25.muc.799 > caes04.muc.nfs: . 506313:507773(1460) ack 2981 win 7504 (DF)
16:24:21.905998 > capc25.muc.799 > caes04.muc.nfs: . 507773:509233(1460) ack 2981 win 7504 (DF)
16:24:21.906004 > capc25.muc.799 > caes04.muc.nfs: P 509233:510693(1460) ack 2981 win 7504 (DF)
16:24:21.906009 > capc25.muc.799 > caes04.muc.nfs: P 510693:512153(1460) ack 2981 win 7504 (DF)
16:24:21.906016 > capc25.muc.799 > caes04.muc.nfs: . 512153:513613(1460) ack 2981 win 7504 (DF)
16:24:21.906021 > capc25.muc.799 > caes04.muc.nfs: . 513613:515073(1460) ack 3145 win 7504 (DF)
16:24:21.906027 > capc25.muc.799 > caes04.muc.nfs: P 515073:515133(60) ack 3145 win 7504 (DF)
16:24:21.909806 < caes04.muc.nfs > capc25.muc.799: . 3145:3145(0) ack 494633 win 59764
16:24:21.910644 < caes04.muc.nfs > capc25.muc.799: P 3145:3309(164) ack 497553 win 56844
16:24:21.912247 < caes04.muc.nfs > capc25.muc.799: . 3309:3309(0) ack 515133 win 59892
16:24:21.912855 < caes04.muc.nfs > capc25.muc.799: P 3309:3473(164) ack 515133 win 59892
16:24:21.941386 > capc25.muc.799 > caes04.muc.nfs: . 515133:515133(0) ack 3473 win 7504 (DF)
16:24:21.965016 < caes04.muc.nfs > capc25.muc.799: P 3473:3629(156) ack 515133 win 60032
16:24:21.965042 > capc25.muc.799 > caes04.muc.nfs: . 515133:515133(0) ack 3629 win 7504 (DF)
16:24:21.965128 > capc25.muc.799 > caes04.muc.nfs: P 515133:515273(140) ack 3629 win 7504 (DF)
16:24:21.998128 < caes04.muc.nfs > capc25.muc.799: P 3629:3785(156) ack 515273 win 60032
16:24:22.031385 > capc25.muc.799 > caes04.muc.nfs: . 515273:515273(0) ack 3785 win 7504 (DF)


The file was copied all right.

qx09820@capc25 /home/qx09820 > netstat |grep caes04
tcp        0      0 capc25.muc:729          caes04.muc:nfs          TIME_WAIT   
tcp        0      0 capc25.muc:744          caes04.muc:nfs          TIME_WAIT   
tcp        0      0 capc25.muc:737          caes04.muc:nfs          TIME_WAIT   
tcp        0      0 capc25.muc:798          caes04.muc:nfs          ESTABLISHED 


Now copying the same file for a second time:

16:27:20.716640 > capc25.muc.799 > caes04.muc.nfs: P 2956428652:2956428780(128) ack 3186501001 win 7504 (DF)
16:27:20.717226 < caes04.muc.nfs > capc25.muc.799: P 1:117(116) ack 128 win 60032
16:27:20.717243 > capc25.muc.799 > caes04.muc.nfs: . 128:128(0) ack 117 win 7504 (DF)
16:27:20.717360 > capc25.muc.799 > caes04.muc.nfs: P 128:264(136) ack 117 win 7504 (DF)
16:27:20.717966 < caes04.muc.nfs > capc25.muc.799: P 117:361(244) ack 264 win 60032
16:27:20.718041 > capc25.muc.799 > caes04.muc.nfs: P 264:428(164) ack 361 win 8576 (DF)
16:27:20.735478 < caes04.muc.nfs > capc25.muc.799: P 361:509(148) ack 428 win 60032
16:27:20.735745 > capc25.muc.799 > caes04.muc.nfs: . 428:1888(1460) ack 509 win 8576 (DF)
16:27:20.735760 > capc25.muc.799 > caes04.muc.nfs: . 1888:3348(1460) ack 509 win 8576 (DF)
16:27:20.858656 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 3348 win 57112
16:27:20.858672 > capc25.muc.799 > caes04.muc.nfs: . 3348:4808(1460) ack 509 win 8576 (DF)
16:27:20.858682 > capc25.muc.799 > caes04.muc.nfs: . 4808:6268(1460) ack 509 win 8576 (DF)
16:27:20.858692 > capc25.muc.799 > caes04.muc.nfs: . 6268:7728(1460) ack 509 win 8576 (DF)
16:27:21.068297 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 7728 win 52732
16:27:21.068318 > capc25.muc.799 > caes04.muc.nfs: . 7728:9188(1460) ack 509 win 8576 (DF)
16:27:21.068328 > capc25.muc.799 > caes04.muc.nfs: . 9188:10648(1460) ack 509 win 8576 (DF)
16:27:21.068337 > capc25.muc.799 > caes04.muc.nfs: . 10648:12108(1460) ack 509 win 8576 (DF)
16:27:21.068347 > capc25.muc.799 > caes04.muc.nfs: . 12108:13568(1460) ack 509 win 8576 (DF)
16:27:21.278296 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 13568 win 46892
16:27:21.278321 > capc25.muc.799 > caes04.muc.nfs: . 13568:15028(1460) ack 509 win 8576 (DF)
16:27:21.278331 > capc25.muc.799 > caes04.muc.nfs: . 15028:16488(1460) ack 509 win 8576 (DF)
16:27:21.278341 > capc25.muc.799 > caes04.muc.nfs: . 16488:17948(1460) ack 509 win 8576 (DF)
16:27:21.278350 > capc25.muc.799 > caes04.muc.nfs: . 17948:19408(1460) ack 509 win 8576 (DF)
16:27:21.278359 > capc25.muc.799 > caes04.muc.nfs: . 19408:20868(1460) ack 509 win 8576 (DF)
16:27:21.488343 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 20868 win 39592
16:27:21.488376 > capc25.muc.799 > caes04.muc.nfs: . 20868:22328(1460) ack 509 win 8576 (DF)
16:27:21.488387 > capc25.muc.799 > caes04.muc.nfs: . 22328:23788(1460) ack 509 win 8576 (DF)
16:27:21.488397 > capc25.muc.799 > caes04.muc.nfs: . 23788:25248(1460) ack 509 win 8576 (DF)
16:27:21.488406 > capc25.muc.799 > caes04.muc.nfs: . 25248:26708(1460) ack 509 win 8576 (DF)
16:27:21.488415 > capc25.muc.799 > caes04.muc.nfs: . 26708:28168(1460) ack 509 win 8576 (DF)
16:27:21.488424 > capc25.muc.799 > caes04.muc.nfs: . 28168:29628(1460) ack 509 win 8576 (DF)
16:27:21.690712 < caes04.muc.nfs > capc25.muc.799: . 509:509(0) ack 29628 win 30832
16:27:21.690736 > capc25.muc.799 > caes04.muc.nfs: P 29628:31088(1460) ack 509 win 8576 (DF)
16:27:21.690746 > capc25.muc.799 > caes04.muc.nfs: . 31088:32548(1460) ack 509 win 8576 (DF)
16:27:21.690756 > capc25.muc.799 > caes04.muc.nfs: . 32548:34008(1460) ack 509 win 8576 (DF)
16:27:21.690765 > capc25.muc.799 > caes04.muc.nfs: . 34008:35468(1460) ack 509 win 8576 (DF)
16:27:21.690774 > capc25.muc.799 > caes04.muc.nfs: . 35468:36928(1460) ack 509 win 8576 (DF)
16:27:21.690784 > capc25.muc.799 > caes04.muc.nfs: . 36928:38388(1460) ack 509 win 8576 (DF)
16:27:21.690793 > capc25.muc.799 > caes04.muc.nfs: . 38388:39848(1460) ack 509 win 8576 (DF)

..and so on..

16:27:22.567415 < caes04.muc.nfs > capc25.muc.799: P 2477:2641(164) ack 434048 win 54320
16:27:22.568505 > capc25.muc.799 > caes04.muc.nfs: . 470548:472008(1460) ack 2477 win 8576 (DF)
16:27:22.568512 > capc25.muc.799 > caes04.muc.nfs: . 472008:473468(1460) ack 2477 win 8576 (DF)
16:27:22.568518 > capc25.muc.799 > caes04.muc.nfs: . 473468:474928(1460) ack 2477 win 8576 (DF)
16:27:22.568524 > capc25.muc.799 > caes04.muc.nfs: . 474928:476388(1460) ack 2477 win 8576 (DF)
16:27:22.568530 > capc25.muc.799 > caes04.muc.nfs: . 476388:477848(1460) ack 2477 win 8576 (DF)
16:27:22.568536 > capc25.muc.799 > caes04.muc.nfs: . 477848:479308(1460) ack 2641 win 8576 (DF)
16:27:22.568541 > capc25.muc.799 > caes04.muc.nfs: . 479308:480768(1460) ack 2641 win 8576 (DF)
16:27:22.568550 > capc25.muc.799 > caes04.muc.nfs: . 480768:482228(1460) ack 2641 win 8576 (DF)
16:27:22.568556 > capc25.muc.799 > caes04.muc.nfs: . 482228:483688(1460) ack 2641 win 8576 (DF)
16:27:22.570140 < caes04.muc.nfs > capc25.muc.799: . 2641:2641(0) ack 461788 win 59496
16:27:22.570163 > capc25.muc.799 > caes04.muc.nfs: . 483688:485148(1460) ack 2641 win 8576 (DF)
16:27:22.570172 > capc25.muc.799 > caes04.muc.nfs: . 485148:486608(1460) ack 2641 win 8576 (DF)
16:27:22.570180 > capc25.muc.799 > caes04.muc.nfs: . 486608:488068(1460) ack 2641 win 8576 (DF)
16:27:22.570190 > capc25.muc.799 > caes04.muc.nfs: . 488068:489528(1460) ack 2641 win 8576 (DF)
16:27:22.570200 > capc25.muc.799 > caes04.muc.nfs: . 489528:490988(1460) ack 2641 win 8576 (DF)
16:27:22.570214 > capc25.muc.799 > caes04.muc.nfs: . 490988:492448(1460) ack 2641 win 8576 (DF)
16:27:22.570222 > capc25.muc.799 > caes04.muc.nfs: . 492448:493908(1460) ack 2641 win 8576 (DF)
16:27:22.570230 > capc25.muc.799 > caes04.muc.nfs: . 493908:495368(1460) ack 2641 win 8576 (DF)
16:27:22.570238 > capc25.muc.799 > caes04.muc.nfs: . 495368:496828(1460) ack 2641 win 8576 (DF)
16:27:22.570247 > capc25.muc.799 > caes04.muc.nfs: . 496828:498288(1460) ack 2641 win 8576 (DF)
16:27:22.570257 > capc25.muc.799 > caes04.muc.nfs: P 498288:499748(1460) ack 2641 win 8576 (DF)
16:27:22.570268 > capc25.muc.799 > caes04.muc.nfs: . 499748:501208(1460) ack 2641 win 8576 (DF)
16:27:22.570279 > capc25.muc.799 > caes04.muc.nfs: . 501208:502668(1460) ack 2641 win 8576 (DF)
16:27:22.570287 > capc25.muc.799 > caes04.muc.nfs: P 502668:504128(1460) ack 2641 win 8576 (DF)
16:27:22.570294 > capc25.muc.799 > caes04.muc.nfs: . 504128:505588(1460) ack 2641 win 8576 (DF)
16:27:22.570307 > capc25.muc.799 > caes04.muc.nfs: . 505588:507048(1460) ack 2641 win 8576 (DF)
16:27:22.570433 < caes04.muc.nfs > capc25.muc.799: P 2641:2805(164) ack 464708 win 56576
16:27:22.571906 > capc25.muc.799 > caes04.muc.nfs: . 507048:508508(1460) ack 2641 win 8576 (DF)
16:27:22.571913 > capc25.muc.799 > caes04.muc.nfs: . 508508:509968(1460) ack 2641 win 8576 (DF)
16:27:22.571919 > capc25.muc.799 > caes04.muc.nfs: . 509968:511428(1460) ack 2641 win 8576 (DF)
16:27:22.571924 > capc25.muc.799 > caes04.muc.nfs: . 511428:512888(1460) ack 2641 win 8576 (DF)
16:27:22.571933 > capc25.muc.799 > caes04.muc.nfs: . 512888:514348(1460) ack 2805 win 8576 (DF)
16:27:22.571938 > capc25.muc.799 > caes04.muc.nfs: P 514348:514796(448) ack 2805 win 8576 (DF)
16:27:22.573924 < caes04.muc.nfs > capc25.muc.799: . 2805:2805(0) ack 495368 win 58832
16:27:22.574462 < caes04.muc.nfs > capc25.muc.799: P 2805:2969(164) ack 499748 win 54452
16:27:22.575192 < caes04.muc.nfs > capc25.muc.799: . 2969:2969(0) ack 514796 win 60032
16:27:22.575518 < caes04.muc.nfs > capc25.muc.799: P 2969:3133(164) ack 514796 win 60032
16:27:22.608665 > capc25.muc.799 > caes04.muc.nfs: . 514796:514796(0) ack 3133 win 8576 (DF)
16:27:26.248872 > capc25.muc.799 > caes04.muc.nfs: P 514796:514936(140) ack 3133 win 8576 (DF)
16:27:26.268210 < caes04.muc.nfs > capc25.muc.799: . 3133:3133(0) ack 514936 win 60032
16:27:26.282829 < caes04.muc.nfs > capc25.muc.799: P 3133:3289(156) ack 514936 win 60032
16:27:26.282843 > capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289 win 8576 (DF)

From now on we get lots of these:

16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40 null (DF)
16:27:26.489647 < caes04.muc.nfs > capc25.muc.576126976: reply ok 24 null

The cp command on the Linux nfs3-client side hangs and cannot be
killed. We get:

dmesg: nfs: server caes04 not responding, still trying

then after a while:
dmesg: nfs: server caes04 OK

qx09820@capc25 /home/qx09820 > netstat | grep caes04
tcp        0      0 capc25.muc:798          caes04.muc:nfs          ESTABLISHED 

The cp seems to be finished and the game can start again. :-(

The interesting bit seems to be at the end, where `cp` on the Linux
nfs3 client hangs. The cause of the problem might just as well be on
the AIX nfs3 server as on the client. The tcpdump taken on the server
is basically identical, now that the -s 64000 option has been used (so
packets are no longer truncated).

Files of size 500 kB, 1 MB and 2 MB have been used. Sometimes the whole
file is copied before (after/while?) cp locks up; sometimes only about
800 kB gets copied.

We also got these messages after the cp died and the "nfs: server
caes04 OK" message appeared. The line with <mss 1460,sackOK,timestamp
2101448 0,nop,wscale 0> looks weird. Selective acknowledgement? Is
this normal?


16:34:56.487168 > capc25.muc.1649999872 > caes04.muc.nfs: 40 null (DF)
16:34:56.493170 < caes04.muc.nfs > capc25.muc.1649999872: reply ok 24 null
16:35:26.488376 > capc25.muc.2186870784 > caes04.muc.nfs: 40 null (DF)
16:35:26.492354 < caes04.muc.nfs > capc25.muc.2186870784: reply ok 24 null
16:35:56.489587 > capc25.muc.2723741696 > caes04.muc.nfs: 40 null (DF)
16:35:56.491055 < caes04.muc.nfs > capc25.muc.2723741696: reply ok 24 null
16:36:26.490783 > capc25.muc.3260612608 > caes04.muc.nfs: 40 null (DF)
16:36:26.505492 < caes04.muc.nfs > capc25.muc.3260612608: reply ok 24 null
16:36:56.491996 > capc25.muc.3797483520 > caes04.muc.nfs: 40 null (DF)
16:36:56.494516 < caes04.muc.nfs > capc25.muc.3797483520: reply ok 24 null
16:37:26.493194 > capc25.muc.39452672 > caes04.muc.nfs: 40 null (DF)
16:37:26.493631 < caes04.muc.nfs > capc25.muc.39452672: reply ok 24 null
16:37:56.484418 > capc25.muc.576323584 > caes04.muc.nfs: 40 null (DF)
16:37:56.485075 < caes04.muc.nfs > capc25.muc.576323584: reply ok 24 null
16:38:26.485620 > capc25.muc.1113194496 > caes04.muc.nfs: 40 null (DF)
16:38:26.486086 < caes04.muc.nfs > capc25.muc.1113194496: reply ok 24 null
16:38:56.486833 > capc25.muc.1650065408 > caes04.muc.nfs: 40 null (DF)
16:38:56.487256 < caes04.muc.nfs > capc25.muc.1650065408: reply ok 24 null
16:39:26.488042 > capc25.muc.2186936320 > caes04.muc.nfs: 40 null (DF)
16:39:26.488555 < caes04.muc.nfs > capc25.muc.2186936320: reply ok 24 null
16:39:56.489253 > capc25.muc.2723807232 > caes04.muc.nfs: 40 null (DF)
16:39:56.489655 < caes04.muc.nfs > capc25.muc.2723807232: reply ok 24 null
16:40:26.490458 > capc25.muc.3260678144 > caes04.muc.nfs: 40 null (DF)
16:40:26.490858 < caes04.muc.nfs > capc25.muc.3260678144: reply ok 24 null
16:40:56.491656 > capc25.muc.3797549056 > caes04.muc.nfs: 40 null (DF)
16:40:56.492277 < caes04.muc.nfs > capc25.muc.3797549056: reply ok 24 null
16:41:04.171081 < caes04.muc.nfs > capc25.muc.798: F 2278783378:2278783378(0) ack 3544800811 win 60032
16:41:04.171126 > capc25.muc.798 > caes04.muc.nfs: F 1:1(0) ack 1 win 9408 (DF)
16:41:04.171307 > capc25.muc.799 > caes04.muc.nfs: S 4020491914:4020491914(0) win 5840 <mss 1460,sackOK,timestamp 2101448 0,nop,wscale 0> (DF)
16:41:04.172213 < caes04.muc.nfs > capc25.muc.798: . 1:1(0) ack 2 win 60032
16:41:04.172379 < caes04.muc.nfs > capc25.muc.799: S 3316289216:3316289216(0) ack 4020491915 win 59860 <mss 1460>
16:41:04.172394 > capc25.muc.799 > caes04.muc.nfs: . 1:1(0) ack 1 win 5840 (DF)
16:41:04.172426 > capc25.muc.799 > caes04.muc.nfs: . 1:1461(1460) ack 1 win 5840 (DF)
16:41:04.172437 > capc25.muc.799 > caes04.muc.nfs: . 1461:2921(1460) ack 1 win 5840 (DF)
16:41:04.193288 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 2921 win 57112
16:41:04.193322 > capc25.muc.799 > caes04.muc.nfs: . 2921:4381(1460) ack 1 win 5840 (DF)
16:41:04.193330 > capc25.muc.799 > caes04.muc.nfs: . 4381:5841(1460) ack 1 win 5840 (DF)
16:41:04.193339 > capc25.muc.799 > caes04.muc.nfs: . 5841:7301(1460) ack 1 win 5840 (DF)
16:41:04.403451 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 7301 win 52732
16:41:04.403467 > capc25.muc.799 > caes04.muc.nfs: . 7301:8761(1460) ack 1 win 5840 (DF)
16:41:04.403477 > capc25.muc.799 > caes04.muc.nfs: . 8761:10221(1460) ack 1 win 5840 (DF)
16:41:04.403486 > capc25.muc.799 > caes04.muc.nfs: . 10221:11681(1460) ack 1 win 5840 (DF)
16:41:04.403496 > capc25.muc.799 > caes04.muc.nfs: P 11681:13141(1460) ack 1 win 5840 (DF)
16:41:04.613256 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 13141 win 46892
16:41:04.613294 > capc25.muc.799 > caes04.muc.nfs: . 13141:14601(1460) ack 1 win 5840 (DF)
16:41:04.613305 > capc25.muc.799 > caes04.muc.nfs: P 14601:16061(1460) ack 1 win 5840 (DF)
16:41:04.613315 > capc25.muc.799 > caes04.muc.nfs: . 16061:17521(1460) ack 1 win 5840 (DF)
16:41:04.613324 > capc25.muc.799 > caes04.muc.nfs: . 17521:18981(1460) ack 1 win 5840 (DF)
16:41:04.613334 > capc25.muc.799 > caes04.muc.nfs: P 18981:20441(1460) ack 1 win 5840 (DF)
16:41:04.815623 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 20441 win 39592
16:41:04.815641 > capc25.muc.799 > caes04.muc.nfs: . 20441:21901(1460) ack 1 win 5840 (DF)
16:41:04.815651 > capc25.muc.799 > caes04.muc.nfs: . 21901:23361(1460) ack 1 win 5840 (DF)
16:41:04.815660 > capc25.muc.799 > caes04.muc.nfs: . 23361:24821(1460) ack 1 win 5840 (DF)
16:41:04.815670 > capc25.muc.799 > caes04.muc.nfs: P 24821:26281(1460) ack 1 win 5840 (DF)
16:41:04.815693 > capc25.muc.799 > caes04.muc.nfs: . 26281:27741(1460) ack 1 win 5840 (DF)
16:41:04.815704 > capc25.muc.799 > caes04.muc.nfs: . 27741:29201(1460) ack 1 win 5840 (DF)
16:41:05.023080 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 29201 win 30832
16:41:05.023117 > capc25.muc.799 > caes04.muc.nfs: . 29201:30661(1460) ack 1 win 5840 (DF)
16:41:05.023127 > capc25.muc.799 > caes04.muc.nfs: . 30661:32121(1460) ack 1 win 5840 (DF)
16:41:05.023137 > capc25.muc.799 > caes04.muc.nfs: P 32121:32917(796) ack 1 win 5840 (DF)
16:41:05.024644 < caes04.muc.nfs > capc25.muc.799: . 1:1(0) ack 32917 win 60032
16:41:05.025192 < caes04.muc.nfs > capc25.muc.799: P 1:165(164) ack 32917 win 60032
16:41:05.025203 > capc25.muc.799 > caes04.muc.nfs: . 32917:32917(0) ack 165 win 5840 (DF)
16:41:05.025272 > capc25.muc.799 > caes04.muc.nfs: P 32917:33057(140) ack 165 win 5840 (DF)
16:41:05.048922 < caes04.muc.nfs > capc25.muc.799: P 165:321(156) ack 33057 win 60032
16:41:05.081819 > capc25.muc.799 > caes04.muc.nfs: . 33057:33057(0) ack 321 win 5840 (DF)
16:41:26.492871 > capc25.muc.39518208 > caes04.muc.nfs: 40 null (DF)
16:41:26.495098 < caes04.muc.nfs > capc25.muc.39518208: reply ok 24 null
16:41:56.494082 > capc25.muc.576389120 > caes04.muc.nfs: 40 null (DF)
16:41:56.500261 < caes04.muc.nfs > capc25.muc.576389120: reply ok 24 null
16:42:26.485359 > capc25.muc.1113260032 > caes04.muc.nfs: 40 null (DF)
16:42:26.487429 < caes04.muc.nfs > capc25.muc.1113260032: reply ok 24 null
16:42:56.486530 > capc25.muc.1650130944 > caes04.muc.nfs: 40 null (DF)
16:42:56.487097 < caes04.muc.nfs > capc25.muc.1650130944: reply ok 24 null
16:43:26.487748 > capc25.muc.2187001856 > caes04.muc.nfs: 40 null (DF)
16:43:26.489881 < caes04.muc.nfs > capc25.muc.2187001856: reply ok 24 null
16:43:34.489177 > capc25.muc.799 > caes04.muc.nfs: F 33057:33057(0) ack 321 win 5840 (DF)
16:43:34.489777 < caes04.muc.nfs > capc25.muc.799: . 321:321(0) ack 33058 win 60032
16:43:34.490370 < caes04.muc.nfs > capc25.muc.799: F 321:321(0) ack 33058 win 60032
16:43:34.490391 > capc25.muc.799 > caes04.muc.nfs: . 33058:33058(0) ack 322 win 5840 (DF)
16:43:34.494041 > capc25.muc.1022 > caes04.muc.32859: udp 80 (DF)
16:43:34.494079 > capc25.muc.1022 > caes04.muc.sunrpc: udp 84 (DF)
16:43:34.494091 > capc25.muc.1022 > caes04.muc.32859: udp 80 (DF)
16:43:34.496654 < caes04.muc.sunrpc > capc25.muc.1022: udp 28
16:43:34.507767 < caes04.muc.32859 > capc25.muc.1022: udp 72
16:43:34.507939 < caes04.muc.32859 > capc25.muc.1022: udp 24
16:43:56.488965 > capc25.muc.2723872768 > caes04.muc.nfs: 40 null (DF)
16:43:56.489405 < caes04.muc.nfs > capc25.muc.2723872768: reply ok 24 null
16:44:26.490297 > capc25.muc.3260743680 > caes04.muc.nfs: 40 null (DF)
16:44:26.490754 < caes04.muc.nfs > capc25.muc.3260743680: reply ok 24 null


Slightly clueless,
Birger



^ permalink raw reply	[flat|nested] 29+ messages in thread

* more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 16:37               ` more tcpdumpinfo for nfs3 " Birger Lammering
@ 2001-11-19 18:22                 ` Trond Myklebust
  2001-11-19 18:49                   ` kuznet
  2001-11-20  9:26                   ` Birger Lammering
  0 siblings, 2 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-19 18:22 UTC (permalink / raw)
  To: Birger Lammering; +Cc: linux-kernel, kuznet

>>>>> " " == Birger Lammering <b.lammering@science-computing.de> writes:

     > 3133:3289(156) ack 514936 win 60032 16:27:26.282843 >
     > capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289
     > win 8576 (DF)

     > from now on we get lot's of these:

     > 16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40
     > null (DF) 16:27:26.489647 < caes04.muc.nfs >
     > capc25.muc.576126976: reply ok 24 null

     > The cp command on the Linux nfs3-client side hangs and cannot
     > be killed. We get:

     > dmesg: nfs: server caes04 not responding, still trying

     > then after a while: dmesg: nfs: server caes04 OK

     > qx09820@capc25 /home/qx09820 > netstat | grep caes04 tcp 0 0
     > capc25.muc:798 caes04.muc:nfs ESTABLISHED

Ho hum... It looks to me as if the problem is that the Linux NFS
client is falling asleep before a write, and then not waking
up. That sort of points at the write_space() callback.

When the socket buffer is full, and we get an EAGAIN response to our
sendmsg() request, we normally put the request to sleep, block the
socket, and rely on write_space() to wake us up when there is enough
memory to proceed.

Assuming that this is the case, there are 2 possible causes:

   1) A bug in the IPV4 TCP layer in which we don't call write_space()
      despite having liberated enough memory to proceed.

   2) I've misunderstood the IPV4 tcp api, and so the check for
      sock_writeable() in net/sunrpc/xprt.c:tcp_write_space() is
      incorrect.

Alexey: Do you have any comments? Is it correct to check for
sock_writeable() on a TCP socket?


Birger: could you try the following patch, that simply removes the
check for sock_writeable()?

Cheers,
  Trond

--- linux-2.4.15-pre6/net/sunrpc/xprt.c.orig	Mon Oct  8 21:36:07 2001
+++ linux-2.4.15-pre6/net/sunrpc/xprt.c	Mon Nov 19 19:07:09 2001
@@ -1071,10 +1071,6 @@
 	if (xprt->shutdown)
 		return;
 
-	/* Wait until we have enough socket memory */
-	if (!sock_writeable(sk))
-		return;
-
 	if (!xprt_test_and_set_wspace(xprt)) {
 		spin_lock(&xprt->sock_lock);
 		if (xprt->snd_task && xprt->snd_task->tk_rpcwait == &xprt->sending)


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 18:22                 ` Trond Myklebust
@ 2001-11-19 18:49                   ` kuznet
  2001-11-19 18:55                     ` Trond Myklebust
  2001-11-20  9:26                   ` Birger Lammering
  1 sibling, 1 reply; 29+ messages in thread
From: kuznet @ 2001-11-19 18:49 UTC (permalink / raw)
  To: trond.myklebust; +Cc: b.lammering, linux-kernel

Hello!

> Alexey: Do you have any comments? Is it correct to check for
> sock_writeable() on a TCP socket?

No. sock_writeable() is for datagram sockets; TCP never used or
satisfied this predicate, it uses a more interesting one.

BTW applications need not use this anyway; we do not wake people up
for no reason. If a write failed with EAGAIN, the wakeup will happen
only when there is some room to write. And you will not be woken again
until the next write fails. So, if you rejected the wakeup (due to a
wrong predicate), nobody will remind you again.

Alexey


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 18:49                   ` kuznet
@ 2001-11-19 18:55                     ` Trond Myklebust
  2001-11-19 19:09                       ` kuznet
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-19 18:55 UTC (permalink / raw)
  To: kuznet; +Cc: b.lammering, linux-kernel

>>>>> " " == kuznet  <kuznet@ms2.inr.ac.ru> writes:

     > No. sock_writeable() is for datagram sockets; TCP never used
     > or satisfied this predicate, it uses a more interesting one.

     > BTW applications need not use this anyway; we do not wake
     > people up for no reason. If a write failed with EAGAIN, the
     > wakeup will happen only when there is some room to write. And
     > you will not be woken again until the next write fails. So, if
     > you rejected the wakeup (due to a wrong predicate), nobody
     > will remind you again.

Thanks... Then the patch I sent Birger is very likely the correct one.

Originally I had a test for whether sock_wspace(sk) was greater than
some minimal value. We need this for UDP in order to avoid waking up
'rpciod' before the socket buffer is large enough to accommodate the
RPC datagram. As the same code worked in 2.2.x for TCP, I had assumed
it was OK...

Cheers,
   Trond


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 18:55                     ` Trond Myklebust
@ 2001-11-19 19:09                       ` kuznet
  2001-11-19 19:29                         ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: kuznet @ 2001-11-19 19:09 UTC (permalink / raw)
  To: trond.myklebust; +Cc: b.lammering, linux-kernel

Hello!

> Originally I had a test for whether sock_wspace(sk) was greater than
> some minimal value. We need this for UDP in order to avoid waking up
> 'rpciod' before the socket buffer is large enough to accommodate the
> RPC datagram. 

I do not think this was right, to be honest. write_space with udp is
too hairy a thing to use correctly. :-) Anyway, it is enough to select
the right sndbuf. Right is... well, the default value is right. :-)


>	As the same code worked in 2.2.x for TCP, I had assumed
> it was OK...

Most likely, it worked because 2.2 did not protect against
overscheduling and woke the thread up each time some space was freed,
so it was enough that the wakeup predicate used by tcp and by the
application had one common point: wmem_alloc==0, which they always
have. 2.4 does not wake up the process if it did not see a full buffer
after the previous wakeup.

Alexey


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 19:09                       ` kuznet
@ 2001-11-19 19:29                         ` Trond Myklebust
  2001-11-19 19:52                           ` kuznet
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-19 19:29 UTC (permalink / raw)
  To: kuznet; +Cc: b.lammering, linux-kernel

>>>>> " " == kuznet  <kuznet@ms2.inr.ac.ru> writes:

     > I do not think this was right, to be honest. write_space with
     > udp is too hairy a thing to use correctly. :-) Anyway, it is
     > enough to select the right sndbuf. Right is... well, the
     > default value is right. :-)

The sndbuf is by default 64k, so that should indeed suffice to fit all
the possible varieties of NFS datagram.

The problem is that when EAGAIN is returned by sendmsg, we want to put
the RPC request to sleep (and have rpciod deal with other pending
requests), and then reactivate it as soon as sock_wspace() reports
that the available free buffer space is large enough to fit the next
request.

Assuming that sock_wfree() always gets called whenever an skb is
released and that sock_wspace() does indeed reflect more or less the
maximum message size for which there is free buffer space (I allow a
couple of kilobytes for extra padding) then the current UDP code
should be correct and race-free.

Cheers,
   Trond


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 19:29                         ` Trond Myklebust
@ 2001-11-19 19:52                           ` kuznet
  2001-11-19 20:38                             ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: kuznet @ 2001-11-19 19:52 UTC (permalink / raw)
  To: trond.myklebust; +Cc: b.lammering, linux-kernel

Hello!

> The problem is that when EAGAIN is returned by sendmsg,

BTW this is already a problem. UDP should not hit the EAGAIN case if
the predicate is right.


> requests), and then reactivate it as soon as sock_wspace() reports
> that the available free buffer space is large enough to fit the next
> request.

sock_wspace() is OK. sock_writeable() is bad.


> released and that sock_wspace() does indeed reflect more or less the
> maximum message size for which there is free buffer space (I allow a
> couple of kilobytes for extra padding)

Do not economize. :-) The longer you wait, the less you schedule.
We used to wait for half of the sndbuf to be freed.


>						 then the current UDP code
> should be correct and race-free.

BTW it was recently reported to me that it deadlocks on some spinlock...

Alexey


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 19:52                           ` kuznet
@ 2001-11-19 20:38                             ` Trond Myklebust
  2001-11-19 21:17                               ` Trond Myklebust
  2001-11-20 17:41                               ` kuznet
  0 siblings, 2 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-19 20:38 UTC (permalink / raw)
  To: kuznet; +Cc: linux-kernel

>>>>> " " == kuznet  <kuznet@ms2.inr.ac.ru> writes:

    >> The problem is that when EAGAIN is returned by sendmsg,

     > BTW this is already a problem. UDP should not hit the EAGAIN
     > case if the predicate is right.

You are saying that it is impossible for sock_alloc_send_skb() to
fail when using non-blocking writes? It was certainly occurring in
2.2.x.
Don't forget that we can be trying to fire off 16 * 32k 'simultaneous'
requests down the same socket when NFS is doing asynchronous block
writes. (Note: by 'simultaneous' I mean that we don't wait for the
server to reply before firing off the next request)

    >> then the current UDP code should be correct and race-free.

     > BTW it was recently reported to me that it deadlocks on some
     > spinlock...

Ulrich Weigand reported the following problem on the S/390:

A QDIO networking driver bottom half was grabbing a private spinlock,
then calling dev_kfree_skb_any() which again calls write_space() (via
kfree_skb()) and so tries to take xprt->sock_lock.

At the same time, a QDIO hard interrupt could be trying to take the
same private spinlock on another processor. Since the RPC layer only
protects against bottom halves, and since the interrupted process
could already be holding the RPC lock that kfree_skb() tries to take,
the hard interrupt could deadlock.

I haven't done anything about this because IMHO it makes more sense to
have the QDIO driver drop their special spinlock when calling external
functions such as dev_kfree_skb_any() rather than to force the RPC
layer to use the spin_lock_irqsave().

Cheers,
  Trond


* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 20:38                             ` Trond Myklebust
@ 2001-11-19 21:17                               ` Trond Myklebust
  2001-11-20 17:42                                 ` kuznet
  2001-11-20 17:41                               ` kuznet
  1 sibling, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-19 21:17 UTC (permalink / raw)
  To: kuznet; +Cc: linux-kernel

>>>>> " " == Trond Myklebust <trond.myklebust@fys.uio.no> writes:

     > I haven't done anything about this because IMHO it makes more
     > sense to have the QDIO driver drop their special spinlock when
     > calling external functions such as dev_kfree_skb_any() rather
     > than to force the RPC layer to use the spin_lock_irqsave().

I forgot to add: The socket fasync lists use spinlocking in the same
way as RPC does, with sock_fasync() taking
write_lock_bh(&sk->callback_lock), and sock_def_write_space()
doing read_lock(&sk->callback_lock).

So that would deadlock with the QDIO driver in the exact same manner
as the RPC stuff (albeit probably a lot less frequently).

Cheers,
   Trond


* more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 18:22                 ` Trond Myklebust
  2001-11-19 18:49                   ` kuznet
@ 2001-11-20  9:26                   ` Birger Lammering
  1 sibling, 0 replies; 29+ messages in thread
From: Birger Lammering @ 2001-11-20  9:26 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-kernel, kuznet

Hi Trond,

sorry to disappoint you: the patch didn't work for me :-(

I've commented out the two lines in net/sunrpc/xprt.c of 2.4.15pre5.
(Has there been a change in pre6 that might make a difference?)

The result (copying 500k twice from capc25 to caes04):

tcpdump on capc25 (Linux 2.4.15pre5 with sock_writeable patch)
09:59:08.243965 eth0 > capc25.muc.33882112 > caes04.muc.nfs: 40 null (DF)
09:59:08.244500 eth0 < caes04.muc.nfs > capc25.muc.33882112: reply ok 24 null
09:59:08.244533 eth0 > capc25.muc.1022 > caes04.muc.sunrpc: udp 84 (DF)
09:59:08.245271 eth0 < caes04.muc.sunrpc > capc25.muc.1022: udp 28
09:59:31.841577 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1035886130:1035886266(136) ack 1814843210 win 7504 (DF)
09:59:31.843105 eth0 < caes04.muc.nfs > capc25.muc.798: P 1:245(244) ack 136 win 60032
09:59:31.843118 eth0 > capc25.muc.798 > caes04.muc.nfs: . 136:136(0) ack 245 win 8576 (DF)
09:59:31.843160 eth0 > capc25.muc.798 > caes04.muc.nfs: P 136:300(164) ack 245 win 8576 (DF)
09:59:31.850179 eth0 < caes04.muc.nfs > capc25.muc.798: P 245:393(148) ack 300 win 60032
09:59:31.850373 eth0 > capc25.muc.798 > caes04.muc.nfs: . 300:1760(1460) ack 393 win 8576 (DF)
09:59:31.850387 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1760:3220(1460) ack 393 win 8576 (DF)
09:59:31.864330 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 3220 win 57112
09:59:31.864360 eth0 > capc25.muc.798 > caes04.muc.nfs: . 3220:4680(1460) ack 393 win 8576 (DF)
09:59:31.864370 eth0 > capc25.muc.798 > caes04.muc.nfs: . 4680:6140(1460) ack 393 win 8576 (DF)
09:59:31.864379 eth0 > capc25.muc.798 > caes04.muc.nfs: . 6140:7600(1460) ack 393 win 8576 (DF)
09:59:32.064719 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 7600 win 52732
09:59:32.064732 eth0 > capc25.muc.798 > caes04.muc.nfs: . 7600:9060(1460) ack 393 win 8576 (DF)
09:59:32.064741 eth0 > capc25.muc.798 > caes04.muc.nfs: . 9060:10520(1460) ack 393 win 8576 (DF)
09:59:32.064750 eth0 > capc25.muc.798 > caes04.muc.nfs: . 10520:11980(1460) ack 393 win 8576 (DF)
09:59:32.064760 eth0 > capc25.muc.798 > caes04.muc.nfs: P 11980:13440(1460) ack 393 win 8576 (DF)
09:59:32.265178 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 13440 win 46892
09:59:32.265205 eth0 > capc25.muc.798 > caes04.muc.nfs: . 13440:14900(1460) ack 393 win 8576 (DF)
09:59:32.265215 eth0 > capc25.muc.798 > caes04.muc.nfs: P 14900:16360(1460) ack 393 win 8576 (DF)
09:59:32.265225 eth0 > capc25.muc.798 > caes04.muc.nfs: . 16360:17820(1460) ack 393 win 8576 (DF)
09:59:32.265234 eth0 > capc25.muc.798 > caes04.muc.nfs: . 17820:19280(1460) ack 393 win 8576 (DF)
09:59:32.265244 eth0 > capc25.muc.798 > caes04.muc.nfs: P 19280:20740(1460) ack 393 win 8576 (DF)
09:59:32.465675 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 20740 win 39592
09:59:32.465686 eth0 > capc25.muc.798 > caes04.muc.nfs: . 20740:22200(1460) ack 393 win 8576 (DF)
09:59:32.465693 eth0 > capc25.muc.798 > caes04.muc.nfs: . 22200:23660(1460) ack 393 win 8576 (DF)
09:59:32.465702 eth0 > capc25.muc.798 > caes04.muc.nfs: . 23660:25120(1460) ack 393 win 8576 (DF)
09:59:32.465710 eth0 > capc25.muc.798 > caes04.muc.nfs: P 25120:26580(1460) ack 393 win 8576 (DF)
09:59:32.465725 eth0 > capc25.muc.798 > caes04.muc.nfs: . 26580:28040(1460) ack 393 win 8576 (DF)
09:59:32.465736 eth0 > capc25.muc.798 > caes04.muc.nfs: . 28040:29500(1460) ack 393 win 8576 (DF)
09:59:32.667194 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 29500 win 30832
09:59:32.667209 eth0 > capc25.muc.798 > caes04.muc.nfs: . 29500:30960(1460) ack 393 win 8576 (DF)
09:59:32.667218 eth0 > capc25.muc.798 > caes04.muc.nfs: . 30960:32420(1460) ack 393 win 8576 (DF)
09:59:32.667227 eth0 > capc25.muc.798 > caes04.muc.nfs: . 32420:33880(1460) ack 393 win 8576 (DF)
09:59:32.667236 eth0 > capc25.muc.798 > caes04.muc.nfs: . 33880:35340(1460) ack 393 win 8576 (DF)
09:59:32.667248 eth0 > capc25.muc.798 > caes04.muc.nfs: . 35340:36800(1460) ack 393 win 8576 (DF)
09:59:32.667257 eth0 > capc25.muc.798 > caes04.muc.nfs: P 36800:38260(1460) ack 393 win 8576 (DF)
09:59:32.667277 eth0 > capc25.muc.798 > caes04.muc.nfs: . 38260:39720(1460) ack 393 win 8576 (DF)
09:59:32.668855 eth0 < caes04.muc.nfs > capc25.muc.798: . 393:393(0) ack 33880 win 59368
09:59:32.668864 eth0 > capc25.muc.798 > caes04.muc.nfs: . 39720:41180(1460) ack 393 win 8576 (DF)
09:59:32.668872 eth0 > capc25.muc.798 > caes04.muc.nfs: . 41180:42640(1460) ack 393 win 8576 (DF)
09:59:32.668880 eth0 > capc25.muc.798 > caes04.muc.nfs: . 42640:44100(1460) ack 393 win 8576 (DF)
09:59:32.668888 eth0 > capc25.muc.798 > caes04.muc.nfs: . 44100:45560(1460) ack 393 win 8576 (DF)
09:59:32.669337 eth0 < caes04.muc.nfs > capc25.muc.798: P 393:557(164) ack 36800 win 56448
09:59:32.669350 eth0 > capc25.muc.798 > caes04.muc.nfs: . 45560:47020(1460) ack 557 win 8576 (DF)
09:59:32.669357 eth0 > capc25.muc.798 > caes04.muc.nfs: . 47020:48480(1460) ack 557 win 8576 (DF)
09:59:32.669365 eth0 > capc25.muc.798 > caes04.muc.nfs: P 48480:49940(1460) ack 557 win 8576 (DF)
09:59:32.867633 eth0 < caes04.muc.nfs > capc25.muc.798: . 557:557(0) ack 49940 win 43308
09:59:32.867668 eth0 > capc25.muc.798 > caes04.muc.nfs: . 49940:51400(1460) ack 557 win 8576 (DF)
09:59:32.867678 eth0 > capc25.muc.798 > caes04.muc.nfs: . 51400:52860(1460) ack 557 win 8576 (DF)
09:59:32.867687 eth0 > capc25.muc.798 > caes04.muc.nfs: P 52860:54320(1460) ack 557 win 8576 (DF)
09:59:32.867696 eth0 > capc25.muc.798 > caes04.muc.nfs: . 54320:55780(1460) ack 557 win 8576 (DF)
09:59:32.867707 eth0 > capc25.muc.798 > caes04.muc.nfs: P 55780:57240(1460) ack 557 win 8576 (DF)
09:59:32.867733 eth0 > capc25.muc.798 > caes04.muc.nfs: . 57240:58700(1460) ack 557 win 8576 (DF)
09:59:32.867745 eth0 > capc25.muc.798 > caes04.muc.nfs: . 58700:60160(1460) ack 557 win 8576 (DF)
09:59:32.867758 eth0 > capc25.muc.798 > caes04.muc.nfs: . 60160:61620(1460) ack 557 win 8576 (DF)
09:59:32.867772 eth0 > capc25.muc.798 > caes04.muc.nfs: . 61620:63080(1460) ack 557 win 8576 (DF)
09:59:32.867781 eth0 > capc25.muc.798 > caes04.muc.nfs: . 63080:64540(1460) ack 557 win 8576 (DF)
.. and so on..
09:59:33.507567 eth0 < caes04.muc.nfs > capc25.muc.798: . 2525:2525(0) ack 462188 win 58968
09:59:33.507587 eth0 > capc25.muc.798 > caes04.muc.nfs: . 478248:479708(1460) ack 2525 win 8576 (DF)
09:59:33.507595 eth0 > capc25.muc.798 > caes04.muc.nfs: . 479708:481168(1460) ack 2525 win 8576 (DF)
09:59:33.507604 eth0 > capc25.muc.798 > caes04.muc.nfs: . 481168:482628(1460) ack 2525 win 8576 (DF)
09:59:33.507613 eth0 > capc25.muc.798 > caes04.muc.nfs: . 482628:484088(1460) ack 2525 win 8576 (DF)
09:59:33.507621 eth0 > capc25.muc.798 > caes04.muc.nfs: . 484088:485548(1460) ack 2525 win 8576 (DF)
09:59:33.507632 eth0 > capc25.muc.798 > caes04.muc.nfs: . 485548:487008(1460) ack 2525 win 8576 (DF)
09:59:33.507644 eth0 > capc25.muc.798 > caes04.muc.nfs: . 487008:488468(1460) ack 2525 win 8576 (DF)
09:59:33.507654 eth0 > capc25.muc.798 > caes04.muc.nfs: . 488468:489928(1460) ack 2525 win 8576 (DF)
09:59:33.507661 eth0 > capc25.muc.798 > caes04.muc.nfs: P 489928:491388(1460) ack 2525 win 8576 (DF)
09:59:33.507672 eth0 > capc25.muc.798 > caes04.muc.nfs: . 491388:492848(1460) ack 2525 win 8576 (DF)
09:59:33.507682 eth0 > capc25.muc.798 > caes04.muc.nfs: P 492848:494040(1192) ack 2525 win 8576 (DF)
09:59:33.507863 eth0 < caes04.muc.nfs > capc25.muc.798: P 2525:2689(164) ack 466568 win 54588
09:59:33.510198 eth0 < caes04.muc.nfs > capc25.muc.798: . 2689:2689(0) ack 494040 win 60032
09:59:33.510770 eth0 < caes04.muc.nfs > capc25.muc.798: P 2689:2853(164) ack 494040 win 60032
09:59:33.510827 eth0 > capc25.muc.798 > caes04.muc.nfs: . 494040:495500(1460) ack 2853 win 8576 (DF)
09:59:33.510838 eth0 > capc25.muc.798 > caes04.muc.nfs: . 495500:496960(1460) ack 2853 win 8576 (DF)
09:59:33.510850 eth0 > capc25.muc.798 > caes04.muc.nfs: . 496960:498420(1460) ack 2853 win 8576 (DF)
09:59:33.510860 eth0 > capc25.muc.798 > caes04.muc.nfs: . 498420:499880(1460) ack 2853 win 8576 (DF)
09:59:33.510875 eth0 > capc25.muc.798 > caes04.muc.nfs: . 499880:501340(1460) ack 2853 win 8576 (DF)
09:59:33.510886 eth0 > capc25.muc.798 > caes04.muc.nfs: . 501340:502800(1460) ack 2853 win 8576 (DF)
09:59:33.510895 eth0 > capc25.muc.798 > caes04.muc.nfs: . 502800:504260(1460) ack 2853 win 8576 (DF)
09:59:33.510905 eth0 > capc25.muc.798 > caes04.muc.nfs: . 504260:505720(1460) ack 2853 win 8576 (DF)
09:59:33.510916 eth0 > capc25.muc.798 > caes04.muc.nfs: . 505720:507180(1460) ack 2853 win 8576 (DF)
09:59:33.510925 eth0 > capc25.muc.798 > caes04.muc.nfs: . 507180:508640(1460) ack 2853 win 8576 (DF)
09:59:33.510935 eth0 > capc25.muc.798 > caes04.muc.nfs: . 508640:510100(1460) ack 2853 win 8576 (DF)
09:59:33.510946 eth0 > capc25.muc.798 > caes04.muc.nfs: . 510100:511560(1460) ack 2853 win 8576 (DF)
09:59:33.510955 eth0 > capc25.muc.798 > caes04.muc.nfs: . 511560:513020(1460) ack 2853 win 8576 (DF)
09:59:33.510965 eth0 > capc25.muc.798 > caes04.muc.nfs: . 513020:514480(1460) ack 2853 win 8576 (DF)
09:59:33.510977 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514480:514668(188) ack 2853 win 8576 (DF)
09:59:33.513913 eth0 < caes04.muc.nfs > capc25.muc.798: . 2853:2853(0) ack 514668 win 60032
09:59:33.514296 eth0 < caes04.muc.nfs > capc25.muc.798: P 2853:3017(164) ack 514668 win 60032
09:59:33.514361 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514668:514808(140) ack 3017 win 8576 (DF)
09:59:33.628254 eth0 < caes04.muc.nfs > capc25.muc.798: P 3017:3173(156) ack 514808 win 60032
09:59:33.665083 eth0 > capc25.muc.798 > caes04.muc.nfs: . 514808:514808(0) ack 3173 win 8576 (DF)
09:59:38.625495 eth0 > capc25.muc.839188480 > caes04.muc.nfs: 40 null (DF)
09:59:38.625997 eth0 < caes04.muc.nfs > capc25.muc.839188480: reply ok 24 null
09:59:49.583361 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514808:514936(128) ack 3173 win 8576 (DF)
09:59:49.584108 eth0 < caes04.muc.nfs > capc25.muc.798: P 3173:3289(116) ack 514936 win 60032
09:59:49.584123 eth0 > capc25.muc.798 > caes04.muc.nfs: . 514936:514936(0) ack 3289 win 8576 (DF)
09:59:49.586188 eth0 > capc25.muc.798 > caes04.muc.nfs: P 514936:515072(136) ack 3289 win 8576 (DF)
09:59:49.586917 eth0 < caes04.muc.nfs > capc25.muc.798: P 3289:3533(244) ack 515072 win 60032
09:59:49.625793 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515072:515072(0) ack 3533 win 8576 (DF)
09:59:52.567700 eth0 > capc25.muc.798 > caes04.muc.nfs: P 515072:515208(136) ack 3533 win 8576 (DF)
09:59:52.568477 eth0 < caes04.muc.nfs > capc25.muc.798: P 3533:3777(244) ack 515208 win 60032
09:59:52.568491 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515208:515208(0) ack 3777 win 8576 (DF)
09:59:52.568530 eth0 > capc25.muc.798 > caes04.muc.nfs: P 515208:515372(164) ack 3777 win 8576 (DF)
09:59:52.584532 eth0 < caes04.muc.nfs > capc25.muc.798: P 3777:3925(148) ack 515372 win 60032
09:59:52.584767 eth0 > capc25.muc.798 > caes04.muc.nfs: . 515372:516832(1460) ack 3925 win 8576 (DF)
09:59:52.584781 eth0 > capc25.muc.798 > caes04.muc.nfs: . 516832:518292(1460) ack 3925 win 8576 (DF)
09:59:52.745784 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 518292 win 57112
09:59:52.745809 eth0 > capc25.muc.798 > caes04.muc.nfs: . 518292:519752(1460) ack 3925 win 8576 (DF)
09:59:52.745820 eth0 > capc25.muc.798 > caes04.muc.nfs: . 519752:521212(1460) ack 3925 win 8576 (DF)
09:59:52.745829 eth0 > capc25.muc.798 > caes04.muc.nfs: . 521212:522672(1460) ack 3925 win 8576 (DF)
09:59:52.946195 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 522672 win 52732
09:59:52.946209 eth0 > capc25.muc.798 > caes04.muc.nfs: . 522672:524132(1460) ack 3925 win 8576 (DF)
09:59:52.946219 eth0 > capc25.muc.798 > caes04.muc.nfs: . 524132:525592(1460) ack 3925 win 8576 (DF)
09:59:52.946228 eth0 > capc25.muc.798 > caes04.muc.nfs: . 525592:527052(1460) ack 3925 win 8576 (DF)
09:59:52.946237 eth0 > capc25.muc.798 > caes04.muc.nfs: . 527052:528512(1460) ack 3925 win 8576 (DF)
09:59:53.146698 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 528512 win 46892
09:59:53.146726 eth0 > capc25.muc.798 > caes04.muc.nfs: . 528512:529972(1460) ack 3925 win 8576 (DF)
09:59:53.146736 eth0 > capc25.muc.798 > caes04.muc.nfs: . 529972:531432(1460) ack 3925 win 8576 (DF)
09:59:53.146746 eth0 > capc25.muc.798 > caes04.muc.nfs: . 531432:532892(1460) ack 3925 win 8576 (DF)
09:59:53.146755 eth0 > capc25.muc.798 > caes04.muc.nfs: . 532892:534352(1460) ack 3925 win 8576 (DF)
09:59:53.146763 eth0 > capc25.muc.798 > caes04.muc.nfs: . 534352:535812(1460) ack 3925 win 8576 (DF)
09:59:53.346907 eth0 < caes04.muc.nfs > capc25.muc.798: . 3925:3925(0) ack 535812 win 39592
.. and so on...
09:59:54.380949 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1010312:1011772(1460) ack 5893 win 8576 (DF)
09:59:54.380956 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1011772:1013232(1460) ack 5893 win 8576 (DF)
09:59:54.380964 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1013232:1014692(1460) ack 5893 win 8576 (DF)
09:59:54.380971 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1014692:1016152(1460) ack 5893 win 8576 (DF)
09:59:54.380978 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1016152:1017612(1460) ack 5893 win 8576 (DF)
09:59:54.382590 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1017612:1019072(1460) ack 5893 win 8576 (DF)
09:59:54.382598 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1019072:1020532(1460) ack 5893 win 8576 (DF)
09:59:54.382603 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1020532:1021992(1460) ack 5893 win 8576 (DF)
09:59:54.382608 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1021992:1023452(1460) ack 5893 win 8576 (DF)
09:59:54.382617 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1023452:1024912(1460) ack 5893 win 8576 (DF)
09:59:54.382623 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1024912:1026372(1460) ack 5893 win 8576 (DF)
09:59:54.382627 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1026372:1027832(1460) ack 5893 win 8576 (DF)
09:59:54.384353 eth0 < caes04.muc.nfs > capc25.muc.798: . 5893:5893(0) ack 1010312 win 58832
09:59:54.384375 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1027832:1029292(1460) ack 5893 win 8576 (DF)
09:59:54.384383 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1029292:1029740(448) ack 5893 win 8576 (DF)
09:59:54.385187 eth0 < caes04.muc.nfs > capc25.muc.798: P 5893:6057(164) ack 1014692 win 54452
09:59:54.385216 eth0 < caes04.muc.nfs > capc25.muc.798: P 6057:6221(164) ack 1017612 win 51532
09:59:54.385237 eth0 < caes04.muc.nfs > capc25.muc.798: P 6221:6385(164) ack 1021992 win 47152
09:59:54.385830 eth0 < caes04.muc.nfs > capc25.muc.798: . 6385:6385(0) ack 1029740 win 60032
09:59:54.386145 eth0 < caes04.muc.nfs > capc25.muc.798: P 6385:6549(164) ack 1029740 win 60032
09:59:54.386153 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1029740:1029740(0) ack 6549 win 8576 (DF)
09:59:58.976285 eth0 > capc25.muc.798 > caes04.muc.nfs: P 1029740:1029880(140) ack 6549 win 8576 (DF)
09:59:59.029839 eth0 < caes04.muc.nfs > capc25.muc.798: P 6549:6705(156) ack 1029880 win 60032
09:59:59.066239 eth0 > capc25.muc.798 > caes04.muc.nfs: . 1029880:1029880(0) ack 6705 win 8576 (DF)
10:00:08.566727 eth0 > capc25.muc.1644494848 > caes04.muc.nfs: 40 null (DF)
10:00:08.567211 eth0 < caes04.muc.nfs > capc25.muc.1644494848: reply ok 24 null
10:00:38.018114 eth0 > capc25.muc.2449801216 > caes04.muc.nfs: 40 null (DF)
10:00:38.018764 eth0 < caes04.muc.nfs > capc25.muc.2449801216: reply ok 24 null




tcpdump on caes04 (AIX)
09:59:31.845604763 capc25.muc.798 > caes04.muc.shilp: P 1035886130:1035886266(136) ack 1814843210 win 7504 (DF)
09:59:31.845756217 caes04.muc.shilp > capc25.muc.798: P 1:245(244) ack 136 win 60032
09:59:31.846304152 capc25.muc.798 > caes04.muc.shilp: . ack 245 win 8576 (DF)
09:59:31.846367867 capc25.muc.798 > caes04.muc.shilp: P 136:300(164) ack 245 win 8576 (DF)
09:59:31.852903330 caes04.muc.shilp > capc25.muc.798: P 245:393(148) ack 300 win 60032
09:59:31.854441517 capc25.muc.798 > caes04.muc.shilp: . 300:1760(1460) ack 393 win 8576 (DF)
09:59:31.854577748 capc25.muc.798 > caes04.muc.shilp: . 1035887890:1035889350(1460) ack 1814843602 win 8576 (DF)
09:59:31.867147719 caes04.muc.shilp > capc25.muc.798: . ack 1460 win 57112
09:59:31.868416956 capc25.muc.798 > caes04.muc.shilp: . 1460:2920(1460) ack 1 win 8576 (DF)
09:59:31.868541353 capc25.muc.798 > caes04.muc.shilp: . 2920:4380(1460) ack 1 win 8576 (DF)
09:59:31.868699607 capc25.muc.798 > caes04.muc.shilp: . 4380:5840(1460) ack 1 win 8576 (DF)
09:59:32.067542678 caes04.muc.shilp > capc25.muc.798: . ack 5840 win 52732
09:59:32.068873485 capc25.muc.798 > caes04.muc.shilp: . 5840:7300(1460) ack 1 win 8576 (DF)
09:59:32.068995737 capc25.muc.798 > caes04.muc.shilp: . 7300:8760(1460) ack 1 win 8576 (DF)
09:59:32.069121401 capc25.muc.798 > caes04.muc.shilp: . 8760:10220(1460) ack 1 win 8576 (DF)
09:59:32.069246176 capc25.muc.798 > caes04.muc.shilp: P 10220:11680(1460) ack 1 win 8576 (DF)
09:59:32.268000574 caes04.muc.shilp > capc25.muc.798: . ack 11680 win 46892
09:59:32.269330237 capc25.muc.798 > caes04.muc.shilp: . 11680:13140(1460) ack 1 win 8576 (DF)
09:59:32.269447200 capc25.muc.798 > caes04.muc.shilp: P 13140:14600(1460) ack 1 win 8576 (DF)
09:59:32.269575574 capc25.muc.798 > caes04.muc.shilp: . 14600:16060(1460) ack 1 win 8576 (DF)
09:59:32.269700305 capc25.muc.798 > caes04.muc.shilp: . 16060:17520(1460) ack 1 win 8576 (DF)
09:59:32.269824713 capc25.muc.798 > caes04.muc.shilp: P 17520:18980(1460) ack 1 win 8576 (DF)
09:59:32.468507295 caes04.muc.shilp > capc25.muc.798: . ack 18980 win 39592
09:59:32.469794655 capc25.muc.798 > caes04.muc.shilp: . 18980:20440(1460) ack 1 win 8576 (DF)
09:59:32.469949532 capc25.muc.798 > caes04.muc.shilp: . 20440:21900(1460) ack 1 win 8576 (DF)
09:59:32.470074273 capc25.muc.798 > caes04.muc.shilp: . 21900:23360(1460) ack 1 win 8576 (DF)
09:59:32.470197836 capc25.muc.798 > caes04.muc.shilp: P 23360:24820(1460) ack 1 win 8576 (DF)
09:59:32.470324178 capc25.muc.798 > caes04.muc.shilp: . 24820:26280(1460) ack 1 win 8576 (DF)
09:59:32.470450442 capc25.muc.798 > caes04.muc.shilp: . 26280:27740(1460) ack 1 win 8576 (DF)
09:59:32.669979456 caes04.muc.shilp > capc25.muc.798: . ack 27740 win 30832
09:59:32.671290251 capc25.muc.798 > caes04.muc.shilp: . 27740:29200(1460) ack 1 win 8576 (DF)
09:59:32.671411003 capc25.muc.798 > caes04.muc.shilp: . 29200:30660(1460) ack 1 win 8576 (DF)
09:59:32.671536311 capc25.muc.798 > caes04.muc.shilp: . 30660:32120(1460) ack 1 win 8576 (DF)
09:59:32.671604804 caes04.muc.shilp > capc25.muc.798: . ack 32120 win 59368
09:59:32.671661586 capc25.muc.798 > caes04.muc.shilp: . 32120:33580(1460) ack 1 win 8576 (DF)
09:59:32.671888756 capc25.muc.798 > caes04.muc.shilp: . 33580:35040(1460) ack 1 win 8576 (DF)
09:59:32.672038243 caes04.muc.shilp > capc25.muc.798: P 1:165(164) ack 35040 win 56448
09:59:32.672178508 capc25.muc.798 > caes04.muc.shilp: . 36500:37960(1460) ack 1 win 8576 (DF)
09:59:32.672918233 capc25.muc.798 > caes04.muc.shilp: . 37960:39420(1460) ack 1 win 8576 (DF)
.. and so on...
09:59:33.515030950 capc25.muc.798 > caes04.muc.shilp: . 493740:495200(1460) ack 2461 win 8576 (DF)
09:59:33.515177881 capc25.muc.798 > caes04.muc.shilp: . 495200:496660(1460) ack 2461 win 8576 (DF)
09:59:33.515304223 capc25.muc.798 > caes04.muc.shilp: . 496660:498120(1460) ack 2461 win 8576 (DF)
09:59:33.515429442 capc25.muc.798 > caes04.muc.shilp: . 498120:499580(1460) ack 2461 win 8576 (DF)
09:59:33.515561750 capc25.muc.798 > caes04.muc.shilp: . 499580:501040(1460) ack 2461 win 8576 (DF)
09:59:33.515695048 capc25.muc.798 > caes04.muc.shilp: . 501040:502500(1460) ack 2461 win 8576 (DF)
09:59:33.515817122 capc25.muc.798 > caes04.muc.shilp: . 502500:503960(1460) ack 2461 win 8576 (DF)
09:59:33.515942297 capc25.muc.798 > caes04.muc.shilp: . 503960:505420(1460) ack 2461 win 8576 (DF)
09:59:33.516066994 capc25.muc.798 > caes04.muc.shilp: . 505420:506880(1460) ack 2461 win 8576 (DF)
09:59:33.516190724 capc25.muc.798 > caes04.muc.shilp: . 506880:508340(1460) ack 2461 win 8576 (DF)
09:59:33.516316599 capc25.muc.798 > caes04.muc.shilp: . 508340:509800(1460) ack 2461 win 8576 (DF)
09:59:33.516443951 capc25.muc.798 > caes04.muc.shilp: . 509800:511260(1460) ack 2461 win 8576 (DF)
09:59:33.516575404 capc25.muc.798 > caes04.muc.shilp: . 511260:512720(1460) ack 2461 win 8576 (DF)
09:59:33.516586905 capc25.muc.798 > caes04.muc.shilp: P 512720:512908(188) ack 2461 win 8576 (DF)
09:59:33.516681467 caes04.muc.shilp > capc25.muc.798: . ack 512908 win 60032
09:59:33.516883991 caes04.muc.shilp > capc25.muc.798: P 2461:2625(164) ack 512908 win 60032
09:59:33.517569101 capc25.muc.798 > caes04.muc.shilp: P 512908:513048(140) ack 2625 win 8576 (DF)
09:59:33.630955691 caes04.muc.shilp > capc25.muc.798: P 2625:2781(156) ack 513048 win 60032
09:59:33.668220832 capc25.muc.798 > caes04.muc.shilp: . ack 2781 win 8576 (DF)
09:59:38.628694302 capc25.muc.32050000 > caes04.muc.nfs: 40 null (DF)
09:59:38.628778496 caes04.muc.nfs > capc25.muc.32050000: reply ok 24
09:59:49.586536582 capc25.muc.798 > caes04.muc.shilp: P 1036400938:1036401066(128) ack 1814846382 win 8576 (DF)
09:59:49.586776164 caes04.muc.shilp > capc25.muc.798: P 1:117(116) ack 128 win 60032
09:59:49.587157099 capc25.muc.798 > caes04.muc.shilp: . ack 117 win 8576 (DF)
09:59:49.589315282 capc25.muc.798 > caes04.muc.shilp: P 128:264(136) ack 117 win 8576 (DF)
09:59:49.589408743 caes04.muc.shilp > capc25.muc.798: P 117:361(244) ack 264 win 60032
09:59:49.629006777 capc25.muc.798 > caes04.muc.shilp: . ack 361 win 8576 (DF)
09:59:52.570844645 capc25.muc.798 > caes04.muc.shilp: P 264:400(136) ack 361 win 8576 (DF)
09:59:52.571043168 caes04.muc.shilp > capc25.muc.798: P 361:605(244) ack 400 win 60032
09:59:52.571576558 capc25.muc.798 > caes04.muc.shilp: . ack 605 win 8576 (DF)
09:59:52.571649841 capc25.muc.798 > caes04.muc.shilp: P 400:564(164) ack 605 win 8576 (DF)
09:59:52.587169789 caes04.muc.shilp > capc25.muc.798: P 605:753(148) ack 564 win 60032
09:59:52.588747424 capc25.muc.798 > caes04.muc.shilp: . 564:2024(1460) ack 753 win 8576 (DF)
09:59:52.588875654 capc25.muc.798 > caes04.muc.shilp: . 515072:516532(1460) ack 3533 win 8576 (DF)
09:59:52.748473502 caes04.muc.shilp > capc25.muc.798: . ack 516532 win 57112
09:59:52.749948552 capc25.muc.798 > caes04.muc.shilp: . 516532:517992(1460) ack 3533 win 8576 (DF)
09:59:52.750054981 capc25.muc.798 > caes04.muc.shilp: . 517992:519452(1460) ack 3533 win 8576 (DF)
09:59:52.750181422 capc25.muc.798 > caes04.muc.shilp: . 519452:520912(1460) ack 3533 win 8576 (DF)
09:59:52.948926208 caes04.muc.shilp > capc25.muc.798: . ack 520912 win 52732
09:59:52.950208057 capc25.muc.798 > caes04.muc.shilp: . 520912:522372(1460) ack 3533 win 8576 (DF)
09:59:52.950317664 capc25.muc.798 > caes04.muc.shilp: . 522372:523832(1460) ack 3533 win 8576 (DF)
09:59:52.950449373 capc25.muc.798 > caes04.muc.shilp: . 523832:525292(1460) ack 3533 win 8576 (DF)
09:59:52.950596493 capc25.muc.798 > caes04.muc.shilp: . 525292:526752(1460) ack 3533 win 8576 (DF)
09:59:53.149427407 caes04.muc.shilp > capc25.muc.798: . ack 526752 win 46892
09:59:53.150752059 capc25.muc.798 > caes04.muc.shilp: . 526752:528212(1460) ack 3533 win 8576 (DF)
09:59:53.150865744 capc25.muc.798 > caes04.muc.shilp: . 528212:529672(1460) ack 3533 win 8576 (DF)
09:59:53.150994330 capc25.muc.798 > caes04.muc.shilp: . 529672:531132(1460) ack 3533 win 8576 (DF)
09:59:53.151120238 capc25.muc.798 > caes04.muc.shilp: . 531132:532592(1460) ack 3533 win 8576 (DF)
09:59:53.151253113 capc25.muc.798 > caes04.muc.shilp: . 532592:534052(1460) ack 3533 win 8576 (DF)
09:59:53.349648355 caes04.muc.shilp > capc25.muc.798: . ack 534052 win 39592
09:59:53.350953271 capc25.muc.798 > caes04.muc.shilp: . 534052:535512(1460) ack 3533 win 8576 (DF)
.. and so on...
09:59:54.385746887 capc25.muc.798 > caes04.muc.shilp: . 999792:1001252(1460) ack 5501 win 8576 (DF)
09:59:54.385886062 capc25.muc.798 > caes04.muc.shilp: . 1001252:1002712(1460) ack 5501 win 8576 (DF)
09:59:54.386048695 capc25.muc.798 > caes04.muc.shilp: P 1002712:1004172(1460) ack 5501 win 8576 (DF)
09:59:54.386173981 capc25.muc.798 > caes04.muc.shilp: . 1004172:1005632(1460) ack 5501 win 8576 (DF)
09:59:54.386299911 capc25.muc.798 > caes04.muc.shilp: . 1005632:1007092(1460) ack 5501 win 8576 (DF)
09:59:54.386425664 capc25.muc.798 > caes04.muc.shilp: P 1007092:1008552(1460) ack 5501 win 8576 (DF)
09:59:54.386469944 caes04.muc.shilp > capc25.muc.798: . ack 1008552 win 58832
09:59:54.386555427 capc25.muc.798 > caes04.muc.shilp: . 1008552:1010012(1460) ack 5501 win 8576 (DF)
09:59:54.386684180 capc25.muc.798 > caes04.muc.shilp: . 1010012:1011472(1460) ack 5501 win 8576 (DF)
09:59:54.386918284 capc25.muc.798 > caes04.muc.shilp: . 1011472:1012932(1460) ack 5501 win 8576 (DF)
09:59:54.386971732 caes04.muc.shilp > capc25.muc.798: P 5501:5665(164) ack 1012932 win 54452
09:59:54.387297675 caes04.muc.shilp > capc25.muc.798: P 5665:5829(164) ack 1015852 win 51532
09:59:54.387677099 caes04.muc.shilp > capc25.muc.798: P 5829:5993(164) ack 1020232 win 47152
09:59:54.387728714 capc25.muc.798 > caes04.muc.shilp: . 1020232:1021692(1460) ack 5501 win 8576 (DF)
09:59:54.387853499 capc25.muc.798 > caes04.muc.shilp: . 1021692:1023152(1460) ack 5501 win 8576 (DF)
09:59:54.387999031 capc25.muc.798 > caes04.muc.shilp: . 1036911042:1036912502(1460) ack 1814849102 win 8576 (DF)
09:59:54.388126872 capc25.muc.798 > caes04.muc.shilp: . 1460:2920(1460) ack 1 win 8576 (DF)

and cp is running into the timeout again....

Cheers,
Birger


Trond Myklebust writes:
 > >>>>> " " == Birger Lammering <b.lammering@science-computing.de> writes:
 > 
 >      > 3133:3289(156) ack 514936 win 60032
 >      > 16:27:26.282843 > capc25.muc.799 > caes04.muc.nfs: . 514936:514936(0) ack 3289 win 8576 (DF)
 > 
 >      > from now on we get lots of these:
 > 
 >      > 16:27:26.489024 > capc25.muc.576126976 > caes04.muc.nfs: 40 null (DF)
 >      > 16:27:26.489647 < caes04.muc.nfs > capc25.muc.576126976: reply ok 24 null
 > 
 >      > The cp command on the Linux nfs3-client side hangs and cannot
 >      > be killed. We get:
 > 
 >      > dmesg: nfs: server caes04 not responding, still trying
 > 
 >      > then after a while: dmesg: nfs: server caes04 OK
 > 
 >      > qx09820@capc25 /home/qx09820 > netstat | grep caes04
 >      > tcp 0 0 capc25.muc:798 caes04.muc:nfs ESTABLISHED
 > 
 > Ho hum... It looks to me as if the problem is that the Linux NFS
 > client is falling asleep before a write, and then not waking
 > up. That sort of points at the write_space() callback.
 > 
 > When the socket buffer is full, and we get an EAGAIN response to our
 > sendmsg() request, we normally put the request to sleep, block the
 > socket, and rely on write_space() to wake us up when there is enough
 > memory to proceed.
 > 
 > Assuming that this is the case, there are 2 possible causes:
 > 
 >    1) A bug in the IPV4 TCP layer in which we don't call write_space()
 >       despite having liberated enough memory to proceed.
 > 
 >    2) I've misunderstood the IPV4 tcp api, and so the check for
 >       sock_writeable() in net/sunrpc/xprt.c:tcp_write_space() is
 >       incorrect.
 > 
 > Alexey: Do you have any comments? Is it correct to check for
 > sock_writeable() on a TCP socket?
 > 
 > 
 > Birger: could you try the following patch, that simply removes the
 > check for sock_writeable()?
 > 
 > Cheers,
 >   Trond
 > 
 > --- linux-2.4.15-pre6/net/sunrpc/xprt.c.orig	Mon Oct  8 21:36:07 2001
 > +++ linux-2.4.15-pre6/net/sunrpc/xprt.c	Mon Nov 19 19:07:09 2001
 > @@ -1071,10 +1071,6 @@
 >  	if (xprt->shutdown)
 >  		return;
 >  
 > -	/* Wait until we have enough socket memory */
 > -	if (!sock_writeable(sk))
 > -		return;
 > -
 >  	if (!xprt_test_and_set_wspace(xprt)) {
 >  		spin_lock(&xprt->sock_lock);
 >  		if (xprt->snd_task && xprt->snd_task->tk_rpcwait == &xprt->sending)
 > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 20:38                             ` Trond Myklebust
  2001-11-19 21:17                               ` Trond Myklebust
@ 2001-11-20 17:41                               ` kuznet
  1 sibling, 0 replies; 29+ messages in thread
From: kuznet @ 2001-11-20 17:41 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-kernel

Hello!

> You are saying that it is impossible for sock_alloc_send_skb() to
> fail when using non-blocking writes?

It is possible and normal, provided the frame is not fragmented.
And it is a bug in nfsd if this happens with its frames.


> writes. (Note: by 'simultaneous' I mean that we don't wait for the
> server to reply before firing off the next request)

I do not understand; you already said that you wait for write space. :-)


> I haven't done anything about this because IMHO it makes more sense to
> have the QDIO driver drop their special spinlock when calling external
> functions such as dev_kfree_skb_any() 

It is pretty normal, if I understand your words correctly.
kfree_skb() is called under various kinds of locks in lots of places.

> rather than to force the RPC layer to use the spin_lock_irqsave().

I see no relation at all. Make it irqsave and nothing will change;
write_space is called only from softirqs.

It is a bug at the xprt level to grab, inside write_space, a spinlock
which can cause deadlocks. Probably, I misunderstood you.

Alexey

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-19 21:17                               ` Trond Myklebust
@ 2001-11-20 17:42                                 ` kuznet
  2001-11-20 19:39                                   ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: kuznet @ 2001-11-20 17:42 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

Hello!

> I forgot to add: The socket fasync lists use spinlocking in the same
> way as RPC does, with sock_fasync() setting
> write_lock_bh(&sk->callback_lock), and sock_def_write_space()
> doing read_lock(&sk->callback_lock).
> 
> So that would deadlock with the QDIO driver in the exact same manner
> as the RPC stuff (albeit probably a lot less frequently).

Please, elaborate. I do not see any way.

Alexey


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 17:42                                 ` kuznet
@ 2001-11-20 19:39                                   ` Trond Myklebust
  2001-11-20 19:45                                     ` kuznet
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-20 19:39 UTC (permalink / raw)
  To: kuznet; +Cc: linux-kernel

>>>>> " " == kuznet  <kuznet@ms2.inr.ac.ru> writes:

     > Hello!
    >> I forgot to add: The socket fasync lists use spinlocking in the
    >> same way as RPC does, with sock_fasync() setting
    >> write_lock_bh(&sk->callback_lock), and sock_def_write_space()
    >> doing read_lock(&sk->callback_lock).
    >>
    >> So that would deadlock with the QDIO driver in the exact same
    >> manner as the RPC stuff (albeit probably a lot less
    >> frequently).

     > Please, elaborate. I do not see any way.


Processor 1                                       Processor 2


(Call QDIO bottom half code)
spin_lock(&QDIO_lock);
                                                write_lock_bh(&sk->callback_lock)
dev_kfree_skb_any()
   -> kfree_skb()
                                                 <QDIO hard interrupt>
     -> write_space()
                                                       ->spin_lock(&QDIO_lock)
                                                              (spins...)
        ->read_lock(&sk->callback_lock);
               (spins)


Deadlock - in exactly the same way as with the xprt code...

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 19:39                                   ` Trond Myklebust
@ 2001-11-20 19:45                                     ` kuznet
  2001-11-20 20:05                                       ` Trond Myklebust
                                                         ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: kuznet @ 2001-11-20 19:45 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-kernel

Hello!

> Deadlock - in exactly the same way as with the xprt code...

Soooory! Delete from the picture everything except the lines containing QDIO:


(Call QDIO bottom half code)
spin_lock(&QDIO_lock);
                                                 <QDIO hard interrupt>
                                                       ->spin_lock(&QDIO_lock)
                                                              (spins...)

with the same result. No help of nfs is required. :-)

So, you have drawn the wrong picture.


Alexey

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 19:45                                     ` kuznet
@ 2001-11-20 20:05                                       ` Trond Myklebust
  2001-11-20 20:18                                       ` David S. Miller
  2001-11-21 10:07                                       ` Trond Myklebust
  2 siblings, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-20 20:05 UTC (permalink / raw)
  To: kuznet; +Cc: linux-kernel

>>>>> " " == kuznet  <kuznet@ms2.inr.ac.ru> writes:

     > Hello!
    >> Deadlock - in exactly the same way as with the xprt code...

     > Soooory! Delete from the picture everything except the lines
     > containing QDIO:


     > (Call QDIO bottom half code)
     > spin_lock(&QDIO_lock);
     >                                                  <QDIO hard interrupt>
     >                                                        ->spin_lock(&QDIO_lock)
     >                                                               (spins...)

     > with the same result. No help of nfs is required. :-)

     > So, you have drawn wrong picture.

From the mail I received, I gathered that they were protected against
this. The hard interrupt could only occur on another processor.

In any case, my point is that the xprt stuff does *nothing* that is
not also done in the fasync code. If a spinlock deadlock scenario is
possible in one, then it is also possible in the other.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 19:45                                     ` kuznet
  2001-11-20 20:05                                       ` Trond Myklebust
@ 2001-11-20 20:18                                       ` David S. Miller
  2001-11-20 20:28                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux kuznet
  2001-11-20 20:29                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client Trond Myklebust
  2001-11-21 10:07                                       ` Trond Myklebust
  2 siblings, 2 replies; 29+ messages in thread
From: David S. Miller @ 2001-11-20 20:18 UTC (permalink / raw)
  To: kuznet; +Cc: trond.myklebust, linux-kernel

   From: kuznet@ms2.inr.ac.ru
   Date: Tue, 20 Nov 2001 22:45:21 +0300 (MSK)

   No help of nfs is required. :-)
   
   So, you have drawn wrong picture.

Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
are totally irrelevant.  This driver is buggy beyond belief.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux
  2001-11-20 20:18                                       ` David S. Miller
@ 2001-11-20 20:28                                         ` kuznet
  2001-11-20 20:29                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client Trond Myklebust
  1 sibling, 0 replies; 29+ messages in thread
From: kuznet @ 2001-11-20 20:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: trond.myklebust, linux-kernel

Hello!

> Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
> are totally irrelevant.  This driver is buggy beyond belief.

It is he who is right and said this. :-) And I blundered.

Alexey

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 20:18                                       ` David S. Miller
  2001-11-20 20:28                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux kuznet
@ 2001-11-20 20:29                                         ` Trond Myklebust
  1 sibling, 0 replies; 29+ messages in thread
From: Trond Myklebust @ 2001-11-20 20:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, linux-kernel

>>>>> " " == David S Miller <davem@redhat.com> writes:

     > Trond, Alexey is right, all the examples of sunrpc/fasync/etc.
     > are totally irrelevant.  This driver is buggy beyond belief.

Right: I wasn't trying to defend it. Rather I wanted to explain why I
believe that this particular driver deadlock is not a good enough
reason to change the current RPC code.

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-20 19:45                                     ` kuznet
  2001-11-20 20:05                                       ` Trond Myklebust
  2001-11-20 20:18                                       ` David S. Miller
@ 2001-11-21 10:07                                       ` Trond Myklebust
  2001-11-21 17:05                                         ` kuznet
  2 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2001-11-21 10:07 UTC (permalink / raw)
  To: kuznet, David S. Miller; +Cc: linux-kernel

On Tuesday 20. November 2001 20:45, kuznet@ms2.inr.ac.ru wrote:

> (Call QDIO bottom half code)
> spin_lock(&QDIO_lock);
>                                                  <QDIO hard interrupt>
>                                                       ->spin_lock(&QDIO_lock)
>                                                              (spins...)
>
> with the same result. No help of nfs is required. :-)

Now that my blood caffeine levels are above the 'sleep' watermark, I should 
probably correct the above in case Ulrich decides to slap us both with a 
slander lawsuit 8-)...

Since (as I said in the original mail) CPU 1 is in a bottom half context,
both local bh and local interrupts should be disabled on that processor!

Basically, the deadlock between the QDIO driver and RPC/fasync can be reduced 
to:

CPU 1                                       CPU 2

(In BH context)                             (In ordinary process context)
spin_lock_irq(&lock1);
                                            spin_lock_bh(&lock2);
dev_kfree_skb_any(skb);
                                            <QDIO Hard interrupt>
                                            (switch to QDIO interrupt context)
    -> sk->write_space();
                                            spin_lock(&lock1);
         -> spin_lock(&lock2);


IOW:
    Either we must demand that CPU 2 uses irq-safe spinlocks in order to 
protect against sk->write_space(), or we must demand that CPU 1 should drop 
'lock1' before being allowed to call dev_kfree_skb_any().

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client
  2001-11-21 10:07                                       ` Trond Myklebust
@ 2001-11-21 17:05                                         ` kuznet
  0 siblings, 0 replies; 29+ messages in thread
From: kuznet @ 2001-11-21 17:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: davem, linux-kernel

Hello!

>     Either we must demand that CPU 2 uses irq-safe spinlocks in order to 
> protect against sk->write_space(),

Never. :-)

>				 or we must demand that CPU 1 should drop 
> 'lock1' before being allowed to call dev_kfree_skb_any().

Yes, alas.

Though pretty evident once seen, this rule was _not_ evident to me
until yesterday. Actually, this is a very sad observation, because the core
has no way to detect this in a generic way...

Alexey

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2001-11-21 17:06 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-15 21:29 nfs problem: hp-server --- linux 2.4.13 client, ooops Birger Lammering
2001-11-15 23:58 ` Trond Myklebust
2001-11-16 11:24   ` nfs problem: hp|aix-server --- linux 2.4.15pre5 client Birger Lammering
2001-11-16 11:45     ` Trond Myklebust
2001-11-16 12:01       ` Miquel van Smoorenburg
2001-11-16 12:24         ` Trond Myklebust
2001-11-16 13:19       ` nfs problem: aix-server " Birger Lammering
     [not found]         ` <15349.39320.876188.274548@charged.uio.no>
2001-11-19 10:20           ` IPV4 socket layer, was: " Birger Lammering
2001-11-19 11:16             ` Alan Cox
2001-11-19 16:37               ` more tcpdumpinfo for nfs3 " Birger Lammering
2001-11-19 18:22                 ` Trond Myklebust
2001-11-19 18:49                   ` kuznet
2001-11-19 18:55                     ` Trond Myklebust
2001-11-19 19:09                       ` kuznet
2001-11-19 19:29                         ` Trond Myklebust
2001-11-19 19:52                           ` kuznet
2001-11-19 20:38                             ` Trond Myklebust
2001-11-19 21:17                               ` Trond Myklebust
2001-11-20 17:42                                 ` kuznet
2001-11-20 19:39                                   ` Trond Myklebust
2001-11-20 19:45                                     ` kuznet
2001-11-20 20:05                                       ` Trond Myklebust
2001-11-20 20:18                                       ` David S. Miller
2001-11-20 20:28                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux kuznet
2001-11-20 20:29                                         ` more tcpdumpinfo for nfs3 problem: aix-server --- linux 2.4.15pre5 client Trond Myklebust
2001-11-21 10:07                                       ` Trond Myklebust
2001-11-21 17:05                                         ` kuznet
2001-11-20 17:41                               ` kuznet
2001-11-20  9:26                   ` Birger Lammering

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox