public inbox for linux-kernel@vger.kernel.org
* strange linux kernel NFS problem(s)
From: Doug Hughes @ 2010-12-03  2:40 UTC
  To: linux-kernel


So, this is my first post, but not my first problem of this nature. It 
just so happens that this is the first one on a recent kernel to give 
useful data, useful enough to post and seek some advice on the subject:

Symptoms: the machine gets a high load, NFS mount processes hang, and things 
(particularly NFS) stop working. SSH and IP connectivity still work, as 
does ps.

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 1
Modules linked in: nfs auth_rpcgss autofs4 i2c_dev i2c_core lockd sunrpc 
cachefiles fscache ipmi_si ipmi_devintf ipmi_msghandler ip6t_REJECT 
xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video output battery 
ac parport_pc lp parport joydev button sr_mod pcspkr iTCO_wdt shpchp 
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage 
pata_acpi ata_piix ata_generic libata uhci_hcd ohci_hcd ehci_hcd
[last unloaded: microcode]

Pid: 28573, comm: python2.5 Not tainted 2.6.34 #3 X7DWT/X7DWT
RIP: 0010:[<ffffffffa0292cdb>]  [<ffffffffa0292cdb>] nfs_release+0x64/0x94 [nfs]
RSP: 0018:ffff88041ccb9d58  EFLAGS: 00010246
RAX: ffff88041c47d160 RBX: ffff88041c47d1e8 RCX: ff88041c47d16088
RDX: ffff88042c593288 RSI: ffff88042c504e40 RDI: ffff88041c47d294
RBP: ffff88041ccb9d78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000300000000 R11: 0000000000000000 R12: ffff88042c593240
R13: ffff88042c504e40 R14: ffff88041ea59ec0 R15: ffff8804273f55c0
FS:  0000000000000000(0000) GS:ffff880001840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003fd5c03350 CR3: 0000000001613000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process python2.5 (pid: 28573, threadinfo ffff88041ccb8000, task ffff8803e246adf0)
Stack:
  0000000300000000 ffff88042c504e40 ffff88041c47d1e8 ffff88041c47d1e8
<0> ffff88041ccb9d98 ffffffffa0290fc5 0000000000000010 ffff88042c504e40
<0> ffff88041ccb9dd8 ffffffff810a75b7 ffff88042caf3120 ffff88042c687768
Call Trace:
  [<ffffffffa0290fc5>] nfs_file_release+0x5c/0x61 [nfs]
  [<ffffffff810a75b7>] __fput+0xf6/0x1bf
  [<ffffffff810a78ba>] fput+0x15/0x17
  [<ffffffff8108ccff>] remove_vma+0x36/0x6c
  [<ffffffff8108ce54>] exit_mmap+0x11f/0x141
  [<ffffffff81030119>] mmput+0x2d/0xc3
  [<ffffffff81033e9f>] exit_mm+0x10b/0x118
  [<ffffffff81064b75>] ? audit_free+0x191/0x1c4
  [<ffffffff81035074>] do_exit+0x200/0x685
  [<ffffffff81035567>] do_group_exit+0x6e/0x98
  [<ffffffff810355a3>] sys_exit_group+0x12/0x16
  [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
Code: 11 e1 49 8d 54 24 48 49 8b 4c 24 48 48 8b 42 08 48 89 41 08 48 89 08 48 8d 83 78 ff ff ff 48 8b 48 08 49 89 44 24 48 48 89 50 08 <48> 89 11 48 89 4a 08 fe 83 ac 00 00 00 41 8b 75 38 4c 89 e7 81
RIP  [<ffffffffa0292cdb>] nfs_release+0x64/0x94 [nfs]
  RSP <ffff88041ccb9d58>
---[ end trace 1ac7372e162481b8 ]---
Fixing recursive fault but reboot is needed!
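A note for anyone decoding this oops: on x86-64 with 48-bit virtual addresses, a pointer is canonical only when bits 63 through 47 are all equal, and dereferencing a non-canonical pointer raises a general protection fault rather than a page fault. The faulting bytes <48> 89 11 decode to mov %rdx,(%rcx), a store through RCX, and RCX holds ff88041c47d16088. The minimal sketch below assumes 48-bit addressing and uses made-up helper names. It checks the general-purpose registers from the dump and flags RCX as the only non-canonical value. That is one plausible reading of the crash, not a confirmed diagnosis.

```python
import re

# General-purpose register lines copied from the oops above.
REGISTER_DUMP = """\
RAX: ffff88041c47d160 RBX: ffff88041c47d1e8 RCX: ff88041c47d16088
RDX: ffff88042c593288 RSI: ffff88042c504e40 RDI: ffff88041c47d294
RBP: ffff88041ccb9d78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000300000000 R11: 0000000000000000 R12: ffff88042c593240
R13: ffff88042c504e40 R14: ffff88041ea59ec0 R15: ffff8804273f55c0
"""

# Assumption: 48-bit virtual addresses (4-level paging), so bits 63..47
# of a canonical address must all be copies of bit 47.
VA_BITS = 48


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if addr is a canonical x86-64 virtual address."""
    top = addr >> (va_bits - 1)               # bits 63..47 (17 bits for 48-bit VA)
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 0x1ffff for 48-bit VA
    return top == 0 or top == all_ones


def parse_registers(dump: str) -> dict:
    """Extract NAME: hexvalue pairs from oops register lines."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(R\w+):\s*([0-9a-f]{16})", dump)}


if __name__ == "__main__":
    for name, value in parse_registers(REGISTER_DUMP).items():
        if not is_canonical(value):
            print(f"{name} = {value:016x} is non-canonical")
```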
mount: server antonrootfs.d.stor.en.desres.deshaw.com not responding, timed out
[root@antonfe0002 ~]# uptime
  20:58:04 up 12 days,  1:05,  4 users,  load average: 20.98, 20.23, 18.99
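A load average of about 21 on this eight-CPU machine is consistent with the hung NFS operations. Linux counts tasks in uninterruptible sleep (state D) toward the load average, and tasks blocked on an unresponsive NFS server typically wait in that state. The minimal sketch below assumes a Linux host with /proc mounted, and its helper names are hypothetical. It prints /proc/loadavg and lists the processes currently in state D, so they can be compared with the ps listing.

```python
import os


def parse_stat(line: str):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc: str = "/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if state == "D":
            yield pid, comm


def main() -> None:
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for pid, comm in uninterruptible_tasks():
        print(f"{pid:>7} {comm}")


# Only meaningful on Linux, where /proc/loadavg exists.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    main()
```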
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Nov20 ?        00:00:04 init [3]
root         2     0  0 Nov20 ?        00:00:00 [kthreadd]
root         3     2  0 Nov20 ?        00:00:00 [migration/0]
root         4     2  0 Nov20 ?        02:42:37 [ksoftirqd/0]
root         5     2  0 Nov20 ?        00:00:00 [migration/1]
root         6     2  3 Nov20 ?        10:04:25 [ksoftirqd/1]
root         7     2  0 Nov20 ?        00:00:00 [migration/2]
root         8     2  0 Nov20 ?        01:39:58 [ksoftirqd/2]
root         9     2  0 Nov20 ?        00:00:00 [migration/3]
root        10     2  4 Nov20 ?        13:28:17 [ksoftirqd/3]
root        11     2  0 Nov20 ?        00:00:00 [migration/4]
root        12     2  7 Nov20 ?        20:39:20 [ksoftirqd/4]
root        13     2  0 Nov20 ?        00:00:00 [migration/5]
root        14     2  0 Nov20 ?        00:06:39 [ksoftirqd/5]
root        15     2  0 Nov20 ?        00:00:00 [migration/6]
root        16     2  7 Nov20 ?        21:56:03 [ksoftirqd/6]
root        17     2  0 Nov20 ?        00:00:00 [migration/7]
root        18     2  1 Nov20 ?        03:06:59 [ksoftirqd/7]
root        19     2  0 Nov20 ?        00:00:06 [events/0]
root        20     2  0 Nov20 ?        00:00:22 [events/1]
root        21     2  0 Nov20 ?        00:00:09 [events/2]
root        22     2  0 Nov20 ?        00:00:08 [events/3]
root        23     2  0 Nov20 ?        00:00:05 [events/4]
root        24     2  0 Nov20 ?        00:00:33 [events/5]
root        25     2  0 Nov20 ?        00:00:07 [events/6]
root        26     2  0 Nov20 ?        00:00:12 [events/7]
root        27     2  0 Nov20 ?        00:00:00 [khelper]
root        32     2  0 Nov20 ?        00:00:00 [async/mgr]
root       175     2  0 Nov20 ?        00:00:00 [sync_supers]
root       177     2  0 Nov20 ?        00:00:00 [bdi-default]
root       178     2  0 Nov20 ?        00:00:00 [kintegrityd/0]
root       179     2  0 Nov20 ?        00:00:00 [kintegrityd/1]
root       180     2  0 Nov20 ?        00:00:00 [kintegrityd/2]
root       181     2  0 Nov20 ?        00:00:00 [kintegrityd/3]
root       182     2  0 Nov20 ?        00:00:00 [kintegrityd/4]
root       183     2  0 Nov20 ?        00:00:00 [kintegrityd/5]
root       184     2  0 Nov20 ?        00:00:00 [kintegrityd/6]
root       185     2  0 Nov20 ?        00:00:00 [kintegrityd/7]
root       186     2  0 Nov20 ?        00:00:00 [kblockd/0]
root       187     2  0 Nov20 ?        00:00:00 [kblockd/1]
root       188     2  0 Nov20 ?        00:00:00 [kblockd/2]
root       189     2  0 Nov20 ?        00:00:00 [kblockd/3]
root       190     2  0 Nov20 ?        00:00:00 [kblockd/4]
root       191     2  0 Nov20 ?        00:00:00 [kblockd/5]
root       192     2  0 Nov20 ?        00:00:00 [kblockd/6]
root       193     2  0 Nov20 ?        00:00:00 [kblockd/7]
root       195     2  0 Nov20 ?        00:00:00 [kacpid]
root       196     2  0 Nov20 ?        00:00:00 [kacpi_notify]
root       197     2  0 Nov20 ?        00:00:00 [kacpi_hotplug]
root       304     2  0 Nov20 ?        00:00:00 [khubd]
root       307     2  0 Nov20 ?        00:00:00 [kseriod]
root       416     2  0 Nov20 ?        00:00:00 [kswapd0]
root       417     2  0 Nov20 ?        00:00:00 [aio/0]
root       418     2  0 Nov20 ?        00:00:00 [aio/1]
root       419     2  0 Nov20 ?        00:00:00 [aio/2]
root       420     2  0 Nov20 ?        00:00:00 [aio/3]
root       421     2  0 Nov20 ?        00:00:00 [aio/4]
root       422     2  0 Nov20 ?        00:00:00 [aio/5]
root       423     2  0 Nov20 ?        00:00:00 [aio/6]
root       424     2  0 Nov20 ?        00:00:00 [aio/7]
root       426     2  0 Nov20 ?        00:00:00 [crypto/0]
root       427     2  0 Nov20 ?        00:00:00 [crypto/1]
root       428     2  0 Nov20 ?        00:00:00 [crypto/2]
root       429     2  0 Nov20 ?        00:00:00 [crypto/3]
root       430     2  0 Nov20 ?        00:00:00 [crypto/4]
root       431     2  0 Nov20 ?        00:00:00 [crypto/5]
root       432     2  0 Nov20 ?        00:00:00 [crypto/6]
root       433     2  0 Nov20 ?        00:00:00 [crypto/7]
root       635     2  0 Nov20 ?        00:00:00 [kpsmoused]
root       656     2  0 Nov20 ?        00:00:02 [edac-poller]
root       701     2  0 Nov20 ?        00:00:00 [usbhid_resumer]
root       713     2  0 Nov20 ?        00:00:00 [ata/0]
root       714     2  0 Nov20 ?        00:00:00 [ata/1]
root       715     2  0 Nov20 ?        00:00:00 [ata/2]
root       716     2  0 Nov20 ?        00:00:00 [ata/3]
root       717     2  0 Nov20 ?        00:00:00 [ata/4]
root       718     2  0 Nov20 ?        00:00:00 [ata/5]
root       719     2  0 Nov20 ?        00:00:00 [ata/6]
root       720     2  0 Nov20 ?        00:00:00 [ata/7]
root       721     2  0 Nov20 ?        00:00:00 [ata_aux]
root       724     2  0 Nov20 ?        00:00:00 [scsi_eh_0]
root       725     2  0 Nov20 ?        00:00:00 [scsi_eh_1]
root       733     2  0 Nov20 ?        00:00:00 [scsi_eh_2]
root       734     2  0 Nov20 ?        00:00:00 [usb-storage]
root       753     2  0 Nov20 ?        00:00:00 [kstriped]
root       759     2  0 Nov20 ?        00:00:00 [ksnapd]
root       763     2  0 Nov20 ?        00:33:13 [md3_raid1]
root       766     2  0 Nov20 ?        00:00:24 [md2_raid1]
root       769     2  0 Nov20 ?        00:00:46 [md1_raid1]
root       772     2  0 Nov20 ?        00:00:49 [md0_raid1]
root       777     2  0 Nov20 ?        00:00:00 [kjournald]
root       803     2  0 Nov20 ?        00:00:00 [kauditd]
root       840     1  0 Nov20 ?        00:00:03 /sbin/udevd -d
root      1450  3450  0 20:01 ?        00:00:00 crond
root      1451  1450  0 20:01 ?        00:00:00 /bin/bash /usr/bin/run-parts /et
root      1452  1451  0 20:01 ?        00:00:00 /bin/bash /etc/cron.hourly/mcelo
root      1453  1451  0 20:01 ?        00:00:00 awk -v progname=/etc/cron.hourly
root      1454  1452  0 20:01 ?        00:00:00 /usr/sbin/mcelog --ignorenodev -
0001001   2207  3393  0 20:10 ?        00:00:00 sshd: 0001001 [priv]
sshd      2208  2207  0 20:10 ?        00:00:00 sshd: 0001001 [net]
root      2210  3230  0 20:10 ?        00:00:00 /bin/mount -t nfs -s -o retry=10
root      2211  2210  0 20:10 ?        00:00:00 /sbin/mount.nfs fish1.nyc
root      2323     2  0 Nov20 ?        00:00:00 [kdmflush]
root      2358     2  0 Nov20 ?        00:00:00 [kjournald]
root      2359     2  0 Nov20 ?        00:00:01 [kjournald]
root      2585  3393  0 12:43 ?        00:00:00 sshd: 001002[priv]
001002    2590  2585  0 12:43 ?        00:00:00 sshd: 001002@pts/3
001002    2591  2590  0 12:43 pts/3    00:00:00 -bash
root      2740     2  0 17:53 ?        00:00:00 [kslowd000]
root      2933     1  0 Nov20 ?        00:00:00 auditd
root      2935  2933  0 Nov20 ?        00:00:00 /sbin/audispd
root      2962     2  0 Nov20 ?        00:26:41 [kipmi0]
root      2981     1  0 Nov20 ?        00:00:01 syslogd -m 0
root      2984     1  0 Nov20 ?        00:00:00 klogd -x
root      3019     1  0 Nov20 ?        00:00:00 cachefilesd
root      3031     1  0 Nov20 ?        00:01:50 irqbalance
rpc       3047     1  0 Nov20 ?        00:00:00 portmap
root      3073     2  0 Nov20 ?        00:00:00 [rpciod/0]
root      3074     2  0 Nov20 ?        00:00:00 [rpciod/1]
root      3075     2  0 Nov20 ?        00:00:00 [rpciod/2]
root      3076     2  0 Nov20 ?        00:00:00 [rpciod/3]
root      3077     2  0 Nov20 ?        00:00:00 [rpciod/4]
root      3078     2  0 Nov20 ?        00:00:00 [rpciod/5]
root      3079     2  0 Nov20 ?        00:00:00 [rpciod/6]
root      3080     2  0 Nov20 ?        00:00:00 [rpciod/7]
root      3086     1  0 Nov20 ?        00:00:00 rpc.statd
root      3135     1  0 Nov20 ?        00:00:02 mdadm --monitor --scan -f --pid-
root      3156     1  0 Nov20 ?        00:00:01 rpc.idmapd
root      3195     1  0 Nov20 ?        00:00:00 /usr/sbin/acpid
root      3230     1  0 Nov20 ?        00:02:33 automount
daemon    3318     1  0 Nov20 ?        00:00:35 /usr/sbin/munged
root      3333     1  0 Nov20 ?        00:02:07 /usr/sbin/snmpd -Lsd -Lf /dev/nu
distcc    3378     1  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3379  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
root      3393     1  0 Nov20 ?        00:00:00 /usr/sbin/sshd
distcc    3412  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3414  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
root      3450     1  0 Nov20 ?        00:00:01 crond
distcc    3459  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
root      3466     1  0 Nov20 ?        00:00:00 /opt/slurm/sbin/slurmd
postfix   3476     1  0 Nov20 ?        00:00:00 /usr/sbin/nullmailer-send
root      3496     1  0 Nov20 ?        00:00:00 /usr/sbin/atd
distcc    3564  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3594  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
root      3596     1  0 Nov20 ?        00:00:00 /usr/sbin/smartd -q never
root      3599     1  0 Nov20 tty1     00:00:00 /sbin/mingetty tty1
root      3600     1  0 Nov20 tty2     00:00:00 /sbin/mingetty tty2
root      3601     1  0 Nov20 tty3     00:00:00 /sbin/mingetty tty3
root      3602     1  0 Nov20 tty4     00:00:00 /sbin/mingetty tty4
root      3603     1  0 Nov20 tty5     00:00:00 /sbin/mingetty tty5
root      3604     1  0 Nov20 tty6     00:00:00 /sbin/mingetty tty6
distcc    3618  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3620  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3623  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
distcc    3626  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd --daemon --allo
root      3638     1  0 Nov20 ttyS1    00:00:00 /sbin/agetty -L ttyS1 19200 vt10
root      3639     1  0 Nov20 ttyS0    00:00:00 /sbin/agetty -L ttyS0 115200 vt1
root      3650     2  0 Nov20 ?        00:00:00 [nfsiod]
root      4782     1  0 Nov20 ?        00:00:33 /usr/bin/python /opt/rocks/bin/g
nobody    4824     1  0 Nov20 ?        00:00:35 /usr/sbin/gmond
root      5164  3393  0 20:48 ?        00:00:00 sshd: root@pts/8
001003    5211     1  0 20:48 ?        00:00:00 /usr/bin/xauth -q -
root      6264  3393  0 20:57 ?        00:00:00 sshd: root@pts/10
root      6274  6264  0 20:57 pts/10   00:00:00 -bash
root      6335  6274  0 20:58 pts/10   00:00:00 ps -ef
root      7138     2  0 Nov20 ?        00:00:00 [lockd]
001003    7607     1  0 17:55 ?        00:00:00 -bash
root      7890  3393  0 Nov20 ?        00:00:00 sshd: 001004 [priv]
001004    7898  7890  0 Nov20 ?        00:00:03 sshd: 001004@pts/0
001004    7899  7898  0 Nov20 pts/0    00:00:00 -tcsh
root     25087     2  0 16:12 ?        00:00:00 [kslowd001]
ntp      25923     1  0 05:38 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd
root     27886  3393  0 Nov22 ?        00:00:00 sshd: 001005 [priv]
001005   27893 27886  0 Nov22 ?        00:00:02 sshd: 001005@pts/1
001005   27895 27893  0 Nov22 pts/1    00:00:00 -bash
001003   28573  7607  0 19:03 ?        00:00:00 [python2.5]
001003   29197     1  0 19:10 ?        00:00:00 -bash
001003   30030 29197 99 19:11 ?        01:46:10 python2.5 /u/nyc/001003/lib/root
001003   30127     1  0 19:12 ?        00:00:00 /usr/bin/xauth -q -
001003   30149     1  0 19:12 ?        00:00:00 -bash
root     30181  3230  0 19:12 ?        00:00:00 /bin/mount -t nfs -s -o retry=10
root     30182 30181  0 19:12 ?        00:00:00 /sbin/mount.nfs host3.nyc
root     30245  3393  0 19:13 ?        00:00:00 sshd: root@pts/7
root     30353     1  0 19:14 ?        00:00:00 /sbin/umount.nfs /data/desrad-p
root     30504     1  0 19:16 ?        00:00:00 /sbin/umount.nfs /u/nyc/001008
root     31003  3230  0 19:22 ?        00:00:00 /bin/mount -t nfs -s -o retry=10
root     31004 31003  0 19:22 ?        00:00:00 /sbin/mount.nfs host3.nyc
root     31569     1  0 19:30 ?        00:00:00 /sbin/umount.nfs /proj/desrad-a
root     31632     1  0 19:31 ?        00:00:00 /sbin/umount.nfs /u/nyc/0001001
root     31653     1  0 19:31 ?        00:00:00 /sbin/umount.nfs /proj/desrad
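Most of the listing is unrelated daemons. The short sketch below filters a ps -ef capture down to the NFS mount/umount helpers plus any task that is not a kernel thread and whose command ps shows in brackets. It assumes the output has been saved with one process per line (the mail wrapping undone), and its helper names are hypothetical. ps falls back to the bracketed name when it cannot read a command line. That would fit the crashed python2.5 (PID 28573) being stuck part-way through exit_mm(), as in the call trace.

```python
# Hypothetical helper names; input is a ps -ef capture, one process per line.
NFS_HELPERS = ("mount.nfs", "umount.nfs", "/bin/mount -t nfs")
KTHREADD_PID = 2  # kernel threads are children of kthreadd on this kernel


def interesting(ps_lines):
    """Yield (pid, ppid, stime, cmd) for NFS helpers and bracketed user tasks."""
    for line in ps_lines:
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] == "UID":
            continue  # skip the header and anything malformed
        _uid, pid, ppid, _c, stime, _tty, _time, cmd = fields
        pid, ppid = int(pid), int(ppid)
        is_nfs_helper = any(h in cmd for h in NFS_HELPERS)
        # A bracketed command on a task that is not a kernel thread means
        # ps could read no command line, e.g. a process caught mid-exit.
        is_bracketed_user_task = (cmd.startswith("[")
                                  and ppid != KTHREADD_PID
                                  and pid != KTHREADD_PID)
        if is_nfs_helper or is_bracketed_user_task:
            yield pid, ppid, stime, cmd


# A few lines taken from the listing above.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root         2     0  0 Nov20 ?        00:00:00 [kthreadd]
root      3650     2  0 Nov20 ?        00:00:00 [nfsiod]
root      2211  2210  0 20:10 ?        00:00:00 /sbin/mount.nfs fish1.nyc
001003   28573  7607  0 19:03 ?        00:00:00 [python2.5]
root     31653     1  0 19:31 ?        00:00:00 /sbin/umount.nfs /proj/desrad
""".splitlines()

if __name__ == "__main__":
    for row in interesting(SAMPLE):
        print(row)
```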



* Re: strange linux kernel NFS problem(s)
From: John Stoffel @ 2010-12-03 17:36 UTC
  To: Doug Hughes; +Cc: linux-kernel

>>>>> "Doug" == Doug Hughes <doug@will.to> writes:

Doug> So, this is my first post, but not my first problem of this
Doug> nature. It just so happens that this is the first one with a
Doug> recent kernel to give useful data, useful enough to post it and
Doug> seek some advice on the subject:

Kernel 2.6.34 is still pretty old, and there have been lots of NFS
fixes since.  Can you upgrade to something newer as a test?  Also, what
distro are you using?

Is this an NFS client or the NFS server that is crapping out?  More
details please...
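One way to gather the details asked for here is sketched below. It assumes a Linux host with a readable /proc/mounts, and its helper names are hypothetical. It prints the running kernel release and lists the mounts whose filesystem type is nfs or nfs4. NFS entries show the machine acting as a client, but they do not by themselves say whether it also exports anything as a server.

```python
import os
import platform

NFS_FSTYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text: str):
    """Return (source, mountpoint, fstype) for NFS entries in /proc/mounts text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()  # /proc/mounts escapes spaces as \040
        if len(fields) >= 3 and fields[2] in NFS_FSTYPES:
            result.append((fields[0], fields[1], fields[2]))
    return result


# Only meaningful on Linux, where /proc/mounts exists.
if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    print("kernel:", platform.release())
    with open("/proc/mounts") as f:
        for source, target, fstype in nfs_mounts(f.read()):
            print(f"{fstype:5} {source} on {target}")
```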

John


Doug> symptoms: machine gets high load, nfs mount processes hang, and things 
Doug> (particularly NFS) stop working. ssh and ip connectivity still works, as 
Doug> does ps.

Doug> *general protection fault: 0000 [#1] SMP
Doug> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Doug> CPU 1
Doug> Modules linked in: nfs auth_rpcgss autofs4 i2c_dev i2c_core lockd sunrpc 
Doug> cachefiles fscache ipmi_si ipmi_devintf ipmi_msghandler ip6t_REJECT 
Doug> xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video output battery 
Doug> ac parport_pc lp parport joydev button sr_mod pcspkr iTCO_wdt shpchp 
Doug> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage 
Doug> pata_acpi ata_piix ata_generic libata uhci_hcd ohci_hcd ehci_hcd [last 
Doug> unloaded: microcode]

Doug> Pid: 28573, comm: python2.5 Not tainted 2.6.34 #3 X7DWT/X7DWT
Doug> RIP: 0010:[<ffffffffa0292cdb>]  [<ffffffffa0292cdb>] 
Doug> nfs_release+0x64/0x94 [nfs]
Doug> RSP: 0018:ffff88041ccb9d58  EFLAGS: 00010246
Doug> RAX: ffff88041c47d160 RBX: ffff88041c47d1e8 RCX: ff88041c47d16088
Doug> RDX: ffff88042c593288 RSI: ffff88042c504e40 RDI: ffff88041c47d294
Doug> RBP: ffff88041ccb9d78 R08: 0000000000000000 R09: 0000000000000000
Doug> R10: 0000000300000000 R11: 0000000000000000 R12: ffff88042c593240
Doug> R13: ffff88042c504e40 R14: ffff88041ea59ec0 R15: ffff8804273f55c0
Doug> FS:  0000000000000000(0000) GS:ffff880001840000(0000) knlGS:0000000000000000
Doug> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Doug> CR2: 0000003fd5c03350 CR3: 0000000001613000 CR4: 00000000000006e0
Doug> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Doug> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Doug> Process python2.5 (pid: 28573, threadinfo ffff88041ccb8000, task 
Doug> ffff8803e246adf0)
Doug> Stack:
Doug>   0000000300000000 ffff88042c504e40 ffff88041c47d1e8 ffff88041c47d1e8
Doug> <0> ffff88041ccb9d98 ffffffffa0290fc5 0000000000000010 ffff88042c504e40
Doug> <0> ffff88041ccb9dd8 ffffffff810a75b7 ffff88042caf3120 ffff88042c687768
Doug> Call Trace:
Doug>   [<ffffffffa0290fc5>] nfs_file_release+0x5c/0x61 [nfs]
Doug>   [<ffffffff810a75b7>] __fput+0xf6/0x1bf
Doug>   [<ffffffff810a78ba>] fput+0x15/0x17
Doug>   [<ffffffff8108ccff>] remove_vma+0x36/0x6c
Doug>   [<ffffffff8108ce54>] exit_mmap+0x11f/0x141
Doug>   [<ffffffff81030119>] mmput+0x2d/0xc3
Doug>   [<ffffffff81033e9f>] exit_mm+0x10b/0x118
Doug>   [<ffffffff81064b75>] ? audit_free+0x191/0x1c4
Doug>   [<ffffffff81035074>] do_exit+0x200/0x685
Doug>   [<ffffffff81035567>] do_group_exit+0x6e/0x98
Doug>   [<ffffffff810355a3>] sys_exit_group+0x12/0x16
Doug>   [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
Doug> Code: 11 e1 49 8d 54 24 48 49 8b 4c 24 48 48 8b 42 08 48 89 41 08 48 89 
Doug> 08 48 8d 83 78 ff ff ff 48 8b 48 08 49 89 44 24 48 48 89 50 08 <48> 89 
Doug> 11 48 89 4a 08 fe 83 ac 00 00 00 41 8b 75 38 4c 89 e7 81
Doug> RIP  [<ffffffffa0292cdb>] nfs_release+0x64/0x94 [nfs]
Doug>   RSP <ffff88041ccb9d58>
Doug> ---[ end trace 1ac7372e162481b8 ]---
Doug> Fixing recursive fault but reboot is needed!
Doug> mount: server antonrootfs.d.stor.en.desres.deshaw.com not responding, 
Doug> timed out
Doug> [root@antonfe0002 ~]# uptime
Doug>   20:58:04 up 12 days,  1:05,  4 users,  load average: 20.98, 20.23, 18.99
Doug> *
Doug> UID        PID  PPID  C STIME TTY          TIME CMD
Doug> root         1     0  0 Nov20 ?        00:00:04 init [3]
Doug> root         2     0  0 Nov20 ?        00:00:00 [kthreadd]
Doug> root         3     2  0 Nov20 ?        00:00:00 [migration/0]
Doug> root         4     2  0 Nov20 ?        02:42:37 [ksoftirqd/0]
Doug> root         5     2  0 Nov20 ?        00:00:00 [migration/1]
Doug> root         6     2  3 Nov20 ?        10:04:25 [ksoftirqd/1]
Doug> root         7     2  0 Nov20 ?        00:00:00 [migration/2]
Doug> root         8     2  0 Nov20 ?        01:39:58 [ksoftirqd/2]
Doug> root         9     2  0 Nov20 ?        00:00:00 [migration/3]
Doug> root        10     2  4 Nov20 ?        13:28:17 [ksoftirqd/3]
Doug> root        11     2  0 Nov20 ?        00:00:00 [migration/4]
Doug> root        12     2  7 Nov20 ?        20:39:20 [ksoftirqd/4]
Doug> root        13     2  0 Nov20 ?        00:00:00 [migration/5]
Doug> root        14     2  0 Nov20 ?        00:06:39 [ksoftirqd/5]
Doug> root        15     2  0 Nov20 ?        00:00:00 [migration/6]
Doug> root        16     2  7 Nov20 ?        21:56:03 [ksoftirqd/6]
Doug> root        17     2  0 Nov20 ?        00:00:00 [migration/7]
Doug> root        18     2  1 Nov20 ?        03:06:59 [ksoftirqd/7]
Doug> root        19     2  0 Nov20 ?        00:00:06 [events/0]
Doug> root        20     2  0 Nov20 ?        00:00:22 [events/1]
Doug> root        21     2  0 Nov20 ?        00:00:09 [events/2]
Doug> root        22     2  0 Nov20 ?        00:00:08 [events/3]
Doug> root        23     2  0 Nov20 ?        00:00:05 [events/4]
Doug> root        24     2  0 Nov20 ?        00:00:33 [events/5]
Doug> root        25     2  0 Nov20 ?        00:00:07 [events/6]
Doug> root        26     2  0 Nov20 ?        00:00:12 [events/7]
Doug> root        27     2  0 Nov20 ?        00:00:00 [khelper]
Doug> root        32     2  0 Nov20 ?        00:00:00 [async/mgr]
Doug> root       175     2  0 Nov20 ?        00:00:00 [sync_supers]
Doug> root       177     2  0 Nov20 ?        00:00:00 [bdi-default]
Doug> root       178     2  0 Nov20 ?        00:00:00 [kintegrityd/0]
Doug> root       179     2  0 Nov20 ?        00:00:00 [kintegrityd/1]
Doug> root       180     2  0 Nov20 ?        00:00:00 [kintegrityd/2]
Doug> root       181     2  0 Nov20 ?        00:00:00 [kintegrityd/3]
Doug> root       182     2  0 Nov20 ?        00:00:00 [kintegrityd/4]
Doug> root       183     2  0 Nov20 ?        00:00:00 [kintegrityd/5]
Doug> root       184     2  0 Nov20 ?        00:00:00 [kintegrityd/6]
Doug> root       185     2  0 Nov20 ?        00:00:00 [kintegrityd/7]
Doug> root       186     2  0 Nov20 ?        00:00:00 [kblockd/0]
Doug> root       187     2  0 Nov20 ?        00:00:00 [kblockd/1]
Doug> root       188     2  0 Nov20 ?        00:00:00 [kblockd/2]
Doug> root       189     2  0 Nov20 ?        00:00:00 [kblockd/3]
Doug> root       190     2  0 Nov20 ?        00:00:00 [kblockd/4]
Doug> root       191     2  0 Nov20 ?        00:00:00 [kblockd/5]
Doug> root       192     2  0 Nov20 ?        00:00:00 [kblockd/6]
Doug> root       193     2  0 Nov20 ?        00:00:00 [kblockd/7]
Doug> root       195     2  0 Nov20 ?        00:00:00 [kacpid]
Doug> root       196     2  0 Nov20 ?        00:00:00 [kacpi_notify]
Doug> root       197     2  0 Nov20 ?        00:00:00 [kacpi_hotplug]
Doug> root       304     2  0 Nov20 ?        00:00:00 [khubd]
Doug> root       307     2  0 Nov20 ?        00:00:00 [kseriod]
Doug> root       416     2  0 Nov20 ?        00:00:00 [kswapd0]
Doug> root       417     2  0 Nov20 ?        00:00:00 [aio/0]
Doug> root       418     2  0 Nov20 ?        00:00:00 [aio/1]
Doug> root       419     2  0 Nov20 ?        00:00:00 [aio/2]
Doug> root       420     2  0 Nov20 ?        00:00:00 [aio/3]
Doug> root       421     2  0 Nov20 ?        00:00:00 [aio/4]
Doug> root       422     2  0 Nov20 ?        00:00:00 [aio/5]
Doug> root       423     2  0 Nov20 ?        00:00:00 [aio/6]
Doug> root       424     2  0 Nov20 ?        00:00:00 [aio/7]
Doug> root       426     2  0 Nov20 ?        00:00:00 [crypto/0]
Doug> root       427     2  0 Nov20 ?        00:00:00 [crypto/1]
Doug> root       428     2  0 Nov20 ?        00:00:00 [crypto/2]
Doug> root       429     2  0 Nov20 ?        00:00:00 [crypto/3]
Doug> root       430     2  0 Nov20 ?        00:00:00 [crypto/4]
Doug> root       431     2  0 Nov20 ?        00:00:00 [crypto/5]
Doug> root       432     2  0 Nov20 ?        00:00:00 [crypto/6]
Doug> root       433     2  0 Nov20 ?        00:00:00 [crypto/7]
Doug> root       635     2  0 Nov20 ?        00:00:00 [kpsmoused]
Doug> root       656     2  0 Nov20 ?        00:00:02 [edac-poller]
Doug> root       701     2  0 Nov20 ?        00:00:00 [usbhid_resumer]
Doug> root       713     2  0 Nov20 ?        00:00:00 [ata/0]
Doug> root       714     2  0 Nov20 ?        00:00:00 [ata/1]
Doug> root       715     2  0 Nov20 ?        00:00:00 [ata/2]
Doug> root       716     2  0 Nov20 ?        00:00:00 [ata/3]
Doug> root       717     2  0 Nov20 ?        00:00:00 [ata/4]
Doug> root       718     2  0 Nov20 ?        00:00:00 [ata/5]
Doug> root       719     2  0 Nov20 ?        00:00:00 [ata/6]
Doug> root       720     2  0 Nov20 ?        00:00:00 [ata/7]
Doug> root       721     2  0 Nov20 ?        00:00:00 [ata_aux]
Doug> root       724     2  0 Nov20 ?        00:00:00 [scsi_eh_0]
Doug> root       725     2  0 Nov20 ?        00:00:00 [scsi_eh_1]
Doug> root       733     2  0 Nov20 ?        00:00:00 [scsi_eh_2]
Doug> root       734     2  0 Nov20 ?        00:00:00 [usb-storage]
Doug> root       753     2  0 Nov20 ?        00:00:00 [kstriped]
Doug> root       759     2  0 Nov20 ?        00:00:00 [ksnapd]
Doug> root       763     2  0 Nov20 ?        00:33:13 [md3_raid1]
Doug> root       766     2  0 Nov20 ?        00:00:24 [md2_raid1]
Doug> root       769     2  0 Nov20 ?        00:00:46 [md1_raid1]
Doug> root       772     2  0 Nov20 ?        00:00:49 [md0_raid1]
Doug> root       777     2  0 Nov20 ?        00:00:00 [kjournald]
Doug> root       803     2  0 Nov20 ?        00:00:00 [kauditd]
Doug> root       840     1  0 Nov20 ?        00:00:03 /sbin/udevd -d
Doug> root      1450  3450  0 20:01 ?        00:00:00 crond
Doug> root      1451  1450  0 20:01 ?        00:00:00 /bin/bash 
Doug> /usr/bin/run-parts /et
Doug> root      1452  1451  0 20:01 ?        00:00:00 /bin/bash 
Doug> /etc/cron.hourly/mcelo
Doug> root      1453  1451  0 20:01 ?        00:00:00 awk -v 
Doug> progname=/etc/cron.hourly
Doug> root      1454  1452  0 20:01 ?        00:00:00 /usr/sbin/mcelog 
Doug> --ignorenodev -
Doug> 0001001   2207  3393  0 20:10 ?        00:00:00 sshd: 0001001 [priv]
Doug> sshd      2208  2207  0 20:10 ?        00:00:00 sshd: 0001001 [net]
Doug> root      2210  3230  0 20:10 ?        00:00:00 /bin/mount -t nfs -s -o 
Doug> retry=10
Doug> root      2211  2210  0 20:10 ?        00:00:00 /sbin/mount.nfs fish1.nyc
Doug> root      2323     2  0 Nov20 ?        00:00:00 [kdmflush]
Doug> root      2358     2  0 Nov20 ?        00:00:00 [kjournald]
Doug> root      2359     2  0 Nov20 ?        00:00:01 [kjournald]
Doug> root      2585  3393  0 12:43 ?        00:00:00 sshd: 001002[priv]
Doug> 001002    2590  2585  0 12:43 ?        00:00:00 sshd: 001002@pts/3
Doug> 001002    2591  2590  0 12:43 pts/3    00:00:00 -bash
Doug> root      2740     2  0 17:53 ?        00:00:00 [kslowd000]
Doug> root      2933     1  0 Nov20 ?        00:00:00 auditd
Doug> root      2935  2933  0 Nov20 ?        00:00:00 /sbin/audispd
Doug> root      2962     2  0 Nov20 ?        00:26:41 [kipmi0]
Doug> root      2981     1  0 Nov20 ?        00:00:01 syslogd -m 0
Doug> root      2984     1  0 Nov20 ?        00:00:00 klogd -x
Doug> root      3019     1  0 Nov20 ?        00:00:00 cachefilesd
Doug> root      3031     1  0 Nov20 ?        00:01:50 irqbalance
Doug> rpc       3047     1  0 Nov20 ?        00:00:00 portmap
Doug> root      3073     2  0 Nov20 ?        00:00:00 [rpciod/0]
Doug> root      3074     2  0 Nov20 ?        00:00:00 [rpciod/1]
Doug> root      3075     2  0 Nov20 ?        00:00:00 [rpciod/2]
Doug> root      3076     2  0 Nov20 ?        00:00:00 [rpciod/3]
Doug> root      3077     2  0 Nov20 ?        00:00:00 [rpciod/4]
Doug> root      3078     2  0 Nov20 ?        00:00:00 [rpciod/5]
Doug> root      3079     2  0 Nov20 ?        00:00:00 [rpciod/6]
Doug> root      3080     2  0 Nov20 ?        00:00:00 [rpciod/7]
Doug> root      3086     1  0 Nov20 ?        00:00:00 rpc.statd
Doug> root      3135     1  0 Nov20 ?        00:00:02 mdadm --monitor --scan 
Doug> -f --pid-
Doug> root      3156     1  0 Nov20 ?        00:00:01 rpc.idmapd
Doug> root      3195     1  0 Nov20 ?        00:00:00 /usr/sbin/acpid
Doug> root      3230     1  0 Nov20 ?        00:02:33 automount
Doug> daemon    3318     1  0 Nov20 ?        00:00:35 /usr/sbin/munged
Doug> root      3333     1  0 Nov20 ?        00:02:07 /usr/sbin/snmpd -Lsd -Lf 
Doug> /dev/nu
Doug> distcc    3378     1  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3379  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> root      3393     1  0 Nov20 ?        00:00:00 /usr/sbin/sshd
Doug> distcc    3412  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3414  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> root      3450     1  0 Nov20 ?        00:00:01 crond
Doug> distcc    3459  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> root      3466     1  0 Nov20 ?        00:00:00 /opt/slurm/sbin/slurmd
Doug> postfix   3476     1  0 Nov20 ?        00:00:00 /usr/sbin/nullmailer-send
Doug> root      3496     1  0 Nov20 ?        00:00:00 /usr/sbin/atd
Doug> distcc    3564  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3594  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> root      3596     1  0 Nov20 ?        00:00:00 /usr/sbin/smartd -q never
Doug> root      3599     1  0 Nov20 tty1     00:00:00 /sbin/mingetty tty1
Doug> root      3600     1  0 Nov20 tty2     00:00:00 /sbin/mingetty tty2
Doug> root      3601     1  0 Nov20 tty3     00:00:00 /sbin/mingetty tty3
Doug> root      3602     1  0 Nov20 tty4     00:00:00 /sbin/mingetty tty4
Doug> root      3603     1  0 Nov20 tty5     00:00:00 /sbin/mingetty tty5
Doug> root      3604     1  0 Nov20 tty6     00:00:00 /sbin/mingetty tty6
Doug> distcc    3618  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3620  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3623  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> distcc    3626  3378  0 Nov20 ?        00:00:00 /usr/bin/distccd 
Doug> --daemon --allo
Doug> root      3638     1  0 Nov20 ttyS1    00:00:00 /sbin/agetty -L ttyS1 
Doug> 19200 vt10
Doug> root      3639     1  0 Nov20 ttyS0    00:00:00 /sbin/agetty -L ttyS0 
Doug> 115200 vt1
Doug> root      3650     2  0 Nov20 ?        00:00:00 [nfsiod]
Doug> root      4782     1  0 Nov20 ?        00:00:33 /usr/bin/python 
Doug> /opt/rocks/bin/g
Doug> nobody    4824     1  0 Nov20 ?        00:00:35 /usr/sbin/gmond
Doug> root      5164  3393  0 20:48 ?        00:00:00 sshd: root@pts/8
Doug> 001003    5211     1  0 20:48 ?        00:00:00 /usr/bin/xauth -q -
Doug> root      6264  3393  0 20:57 ?        00:00:00 sshd: root@pts/10
Doug> root      6274  6264  0 20:57 pts/10   00:00:00 -bash
Doug> root      6335  6274  0 20:58 pts/10   00:00:00 ps -ef
Doug> root      7138     2  0 Nov20 ?        00:00:00 [lockd]
Doug> 001003    7607     1  0 17:55 ?        00:00:00 -bash
Doug> root      7890  3393  0 Nov20 ?        00:00:00 sshd: 001004 [priv]
Doug> 001004    7898  7890  0 Nov20 ?        00:00:03 sshd: 001004@pts/0
Doug> 001004    7899  7898  0 Nov20 pts/0    00:00:00 -tcsh
Doug> root     25087     2  0 16:12 ?        00:00:00 [kslowd001]
Doug> ntp      25923     1  0 05:38 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd
Doug> root     27886  3393  0 Nov22 ?        00:00:00 sshd: 001005 [priv]
Doug> 001005   27893 27886  0 Nov22 ?        00:00:02 sshd: 001005@pts/1
Doug> 001005   27895 27893  0 Nov22 pts/1    00:00:00 -bash
Doug> 001003   28573  7607  0 19:03 ?        00:00:00 [python2.5]
Doug> 001003   29197     1  0 19:10 ?        00:00:00 -bash
Doug> 001003   30030 29197 99 19:11 ?        01:46:10 python2.5 /u/nyc/001003/lib/root
Doug> 001003   30127     1  0 19:12 ?        00:00:00 /usr/bin/xauth -q -
Doug> 001003   30149     1  0 19:12 ?        00:00:00 -bash
Doug> root     30181  3230  0 19:12 ?        00:00:00 /bin/mount -t nfs -s -o retry=10
Doug> root     30182 30181  0 19:12 ?        00:00:00 /sbin/mount.nfs host3.nyc
Doug> root     30245  3393  0 19:13 ?        00:00:00 sshd: root@pts/7
Doug> root     30353     1  0 19:14 ?        00:00:00 /sbin/umount.nfs /data/desrad-p
Doug> root     30504     1  0 19:16 ?        00:00:00 /sbin/umount.nfs /u/nyc/001008
Doug> root     31003  3230  0 19:22 ?        00:00:00 /bin/mount -t nfs -s -o retry=10
Doug> root     31004 31003  0 19:22 ?        00:00:00 /sbin/mount.nfs host3.nyc
Doug> root     31569     1  0 19:30 ?        00:00:00 /sbin/umount.nfs /proj/desrad-a
Doug> root     31632     1  0 19:31 ?        00:00:00 /sbin/umount.nfs /u/nyc/0001001
Doug> root     31653     1  0 19:31 ?        00:00:00 /sbin/umount.nfs /proj/desrad

* Re: strange linux kernel NFS problem(s)
  2010-12-03 17:36 ` John Stoffel
@ 2010-12-03 18:47   ` Doug Hughes
  0 siblings, 0 replies; 3+ messages in thread
From: Doug Hughes @ 2010-12-03 18:47 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-kernel

>>>>>> "Doug" == Doug Hughes <doug@will.to> writes:
> Doug>  So, this is my first post, but not my first problem of this
> Doug>  nature. It just so happens that this is the first one with a
> Doug>  recent kernel to give useful data, useful enough to post it and
> Doug>  seek some advice on the subject:
>
> kernel 2.6.34 is still pretty old, and there have been lots of NFS
> fixes.  Can you upgrade to something newer as a test?  Also, what
> distro are you using?
>
> Is this an NFS client or the NFS server which is crapping out?  More
> details please...
>
>
It wasn't very old when we started testing it about 6 weeks ago to
resolve other NFS problems. It takes a while to get through the
necessary regression testing and make sure things are generally OK
before we are comfortable rolling it out to more than a couple of
nodes. The problems we experience are statistical in nature across
nodes, so we don't usually see them until a critical mass of nodes has
been upgraded.

We checked through the changelogs and didn't see anything that stood
out as "ah ha, that's the problem". Most of the updates didn't seem to
mention NFS at all. Do you have a particular issue or patch in mind?

This is an NFS client mounting a server elsewhere. The ps listing shows
several stuck mount and umount commands, which is another symptom of the
general issue. Let me know what else you need. It's certainly possible
to try a newer kernel, but then I'll be posting about .36.1 in about 6-9
weeks, and chances are it will be considered old by then. :\

The distro is CentOS 5.4 with updates; the kernel is from kernel.org.

Thread overview:
2010-12-03  2:40 strange linux kernel NFS problem(s) Doug Hughes
2010-12-03 17:36 ` John Stoffel
2010-12-03 18:47   ` Doug Hughes