public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.19 xfs crash spawns tcp reclen errors off nfs
@ 2006-12-20 20:55 Mr. Berkley Shands
  2006-12-20 23:46 ` David Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Mr. Berkley Shands @ 2006-12-20 20:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dave Lloyd

[-- Attachment #1: Type: text/plain, Size: 6106 bytes --]

We have had several really strange NFS issues, reported out of
net/sunrpc/svcsock.c in

static int
svc_tcp_recvfrom(struct svc_rqst *rqstp)

        svsk->sk_reclen &= 0x7fffffff;
        dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
        if (svsk->sk_reclen > serv->sv_bufsz) {
            printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
                   (unsigned long) svsk->sk_reclen);
            printk(KERN_NOTICE "sockaddr_in 0x%04x port 0x%04x max 
0x%08lx\n",
               rqstp->rq_addr.sin_addr.s_addr,
               rqstp->rq_addr.sin_port, (long) serv->sv_bufsz);
            goto err_delete;
        }

The size reported grows, and grows, and...

the server runs 2.6.19 when XFS has its issues (as follows)
2TB partition, NFS exported. All machines are x86_64 AMD 275's with 16GB
and running redhat AS 4.4 (Centos 4.4)

a typical crash under 2.6.19 is

Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": XFS internal error 
xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 
0xffffffff881ceaa8
Dec 14 09:13:48 sanandreas kernel:
Dec 14 09:13:48 sanandreas kernel: Call Trace:
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881c79e9>] 
:xfs:xfs_trans_cancel+0x5f/0xfa
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881ceaa8>] 
:xfs:xfs_create+0x62e/0x686
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff802f62be>] __up_read+0x10/0x8a
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881d8730>] 
:xfs:xfs_vn_mknod+0x1a2/0x3c5
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff804440a6>] 
_spin_lock_irqsave+0x9/0xe
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff802f62be>] __up_read+0x10/0x8a
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881c8413>] 
:xfs:xfs_trans_unlocked_item+0x22/0x3c
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881cd4c3>] 
:xfs:xfs_access+0x3d/0x46
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff881d8dce>] 
:xfs:xfs_vn_permission+0x11/0x15
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff8027fe9f>] 
permission+0xcd/0xd6
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff80280615>] 
__link_path_walk+0x16c/0xdb7
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff8028d8f1>] 
mntput_no_expire+0x17/0x8f
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff80281322>] 
link_path_walk+0xc2/0xd0
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff80281bde>] 
vfs_create+0xdf/0x14b
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff8028200c>] 
open_namei+0x18e/0x662
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff802789a1>] 
do_filp_open+0x1c/0x3d
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff80278b92>] 
get_unused_fd+0xea/0xf9
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff80278cba>] 
do_sys_open+0x44/0xbe
Dec 14 09:13:48 sanandreas kernel:  [<ffffffff8020979e>] 
system_call+0x7e/0x83
Dec 14 09:13:48 sanandreas kernel:
Dec 14 09:13:48 sanandreas kernel: xfs_force_shutdown(sdb1,0x8) called 
from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 
0xffffffff881c7a07
Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": Corruption of 
in-memory data detected.  Shutting down filesystem: sdb1
Dec 14 09:13:48 sanandreas kernel: Please umount the filesystem, and 
rectify the problem(s)


this then causes the heavy write clients to write garbage NFS request that
grow...

Dec 20 14:34:15 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:18 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:18 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:21 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:21 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:24 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:24 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:27 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:27 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:30 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:30 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400
Dec 20 14:34:33 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:33 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 
0x00008400

The clients run 2.6.19. the servers crash once a day when the heavy 
simulations start up.
eth0 and eth1 are bonded into BOND0, tyan 2895 or SuperMicro H8DC8 
motherboards.

Attached is a tarball, bzip'ed of the tcpdump traces and the 
/var/log/messages file
bshands@crash.eng.exegy.net home/bshands/Documents> ls -l reclen.tar.bz2
-rw-r--r--  1 bshands dssi 24379 Dec 20 14:47 reclen.tar.bz2
bshands@crash.eng.exegy.net home/bshands/Documents> md5sum reclen.tar.bz2
f449a3c83ca79cad04d8a63e6e253604  reclen.tar.bz2
bshands@crash.eng.exegy.net /bulk/tmp> ls -l messages reclen.tcp*
-rw-r--r--  1 bshands dssi 37005 Dec 20 14:35 messages
-rw-r--r--  1 root    dssi 57946 Dec 20 14:35 reclen.tcp0
-rw-r--r--  1 root    dssi 40678 Dec 20 14:35 reclen.tcp1


Neil Brown indicated that this NFS packet growth should have been fixed in
2.6.18-rc5, but it still happens, easily triggered by the XFS shutdown.
the packet sizes reported can grow to > 1GB.
Dropping back to 2.6.18.1 stops the XFS crashing, and hence stops the 
NFS mess.

berkley






-- 

//E. F. Berkley Shands, MSc//

**Exegy Inc.**

3668 S. Geyer Road, Suite 300

St. Louis, MO  63127

Direct:  (314) 450-5348

Cell:  (314) 303-2546

Office:  (314) 450-5353

Fax:  (314) 450-5354

 

This e-mail and any documents accompanying it may contain legally 
privileged and/or confidential information belonging to Exegy, Inc.  
Such information may be protected from disclosure by law.  The 
information is intended for use by only the addressee.  If you are not 
the intended recipient, you are hereby notified that any disclosure or 
use of the information is strictly prohibited.  If you have received 
this e-mail in error, please immediately contact the sender by e-mail or 
phone regarding instructions for return or destruction and do not use or 
disclose the content to others.

 


[-- Attachment #2: reclen.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 24379 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2006-12-20 23:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-20 20:55 2.6.19 xfs crash spawns tcp reclen errors off nfs Mr. Berkley Shands
2006-12-20 23:46 ` David Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox