* nfsv3 client process stuck in rwsem_down_failed_common()
@ 2007-05-14 15:54 Frank van Maarseveen
  2007-05-14 15:59 ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 15:54 UTC
To: Linux NFS mailing list

On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
this trace:

[<c02926e5>] rwsem_down_failed_common+0x85/0x180
[<c052a36d>] rwsem_down_read_failed+0x1d/0x30
[<c052a437>] call_rwsem_down_read_failed+0x7/0x10
[<c022622e>] nlmclnt_unlock+0x2e/0xc0
[<c02258da>] nlmclnt_proc+0x29a/0x2d0
[<c01f088e>] nfs3_proc_lock+0xe/0x10
[<c01e3904>] do_unlk+0x44/0x70
[<c01e3a9d>] nfs_lock+0xbd/0x120
[<c017dfd1>] locks_remove_posix+0xb1/0xc0
[<c016dc8d>] filp_close+0x2d/0x70
[<c01248a6>] close_files+0x56/0x70
[<c012490c>] put_files_struct+0x1c/0x50
[<c012533a>] do_exit+0x13a/0x3f0
[<c0125649>] do_group_exit+0x29/0x70
[<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
[<c0103e96>] do_signal+0x56/0x160
[<c0103fde>] do_notify_resume+0x3e/0x40
[<c01041ae>] work_notifysig+0x13/0x25

Two processes had an independent shared read lock on different files
and when killing them with ^C they got stuck in state 'D' with above
stack trace. I'm not sure what brought them there other than that the
server went through a number of unusual reboots for testing purposes.

-- 
Frank

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
@ 2007-05-14 15:59 ` Trond Myklebust
  2007-05-14 16:05   ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 15:59 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> this trace:
>
> [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> [<c01f088e>] nfs3_proc_lock+0xe/0x10
> [<c01e3904>] do_unlk+0x44/0x70
> [<c01e3a9d>] nfs_lock+0xbd/0x120
> [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> [<c016dc8d>] filp_close+0x2d/0x70
> [<c01248a6>] close_files+0x56/0x70
> [<c012490c>] put_files_struct+0x1c/0x50
> [<c012533a>] do_exit+0x13a/0x3f0
> [<c0125649>] do_group_exit+0x29/0x70
> [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> [<c0103e96>] do_signal+0x56/0x160
> [<c0103fde>] do_notify_resume+0x3e/0x40
> [<c01041ae>] work_notifysig+0x13/0x25
>
> Two processes had an independent shared read lock on different files
> and when killing them with ^C they got stuck in state 'D' with above
> stack trace. I'm not sure what brought them there other than that the
> server went through a number of unusual reboots for testing purposes.

Are there any processes with a name of the form '<hostname>-reclaim'
hanging too?

Trond

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 15:59 ` Trond Myklebust
@ 2007-05-14 16:05   ` Frank van Maarseveen
  2007-05-14 16:11     ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:05 UTC
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > this trace:
> >
> > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > [<c01e3904>] do_unlk+0x44/0x70
> > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > [<c016dc8d>] filp_close+0x2d/0x70
> > [<c01248a6>] close_files+0x56/0x70
> > [<c012490c>] put_files_struct+0x1c/0x50
> > [<c012533a>] do_exit+0x13a/0x3f0
> > [<c0125649>] do_group_exit+0x29/0x70
> > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > [<c0103e96>] do_signal+0x56/0x160
> > [<c0103fde>] do_notify_resume+0x3e/0x40
> > [<c01041ae>] work_notifysig+0x13/0x25
> >
> > Two processes had an independent shared read lock on different files
> > and when killing them with ^C they got stuck in state 'D' with above
> > stack trace. I'm not sure what brought them there other than that the
> > server went through a number of unusual reboots for testing purposes.
>
> Are there any processes with a name of the form '<hostname>-reclaim'
> hanging too?

yes, two of them, each for a different NFS server (as I would expect).
The traces are identical:

[<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
[<c0529114>] __wait_on_bit+0x44/0x70
[<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
[<c05137f5>] __rpc_execute+0xa5/0x1e0
[<c0513949>] rpc_execute+0x19/0x20
[<c050da56>] rpc_call_sync+0x96/0xa0
[<c0225b17>] nlmclnt_call+0x77/0x1e0
[<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
[<c0225236>] reclaimer+0x106/0x1f0
[<c0105317>] kernel_thread_helper+0x7/0x10

-- 
Frank

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:05   ` Frank van Maarseveen
@ 2007-05-14 16:11     ` Trond Myklebust
  2007-05-14 16:15       ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:11 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > > this trace:
> > >
> > > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > > [<c01e3904>] do_unlk+0x44/0x70
> > > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > > [<c016dc8d>] filp_close+0x2d/0x70
> > > [<c01248a6>] close_files+0x56/0x70
> > > [<c012490c>] put_files_struct+0x1c/0x50
> > > [<c012533a>] do_exit+0x13a/0x3f0
> > > [<c0125649>] do_group_exit+0x29/0x70
> > > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > > [<c0103e96>] do_signal+0x56/0x160
> > > [<c0103fde>] do_notify_resume+0x3e/0x40
> > > [<c01041ae>] work_notifysig+0x13/0x25
> > >
> > > Two processes had an independent shared read lock on different files
> > > and when killing them with ^C they got stuck in state 'D' with above
> > > stack trace. I'm not sure what brought them there other than that the
> > > server went through a number of unusual reboots for testing purposes.
> >
> > Are there any processes with a name of the form '<hostname>-reclaim'
> > hanging too?
>
> yes, two of them, each for a different NFS server (as I would expect).

Are the NFS servers up and running?

> The traces are identical:
>
> [<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
> [<c0529114>] __wait_on_bit+0x44/0x70
> [<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
> [<c05137f5>] __rpc_execute+0xa5/0x1e0
> [<c0513949>] rpc_execute+0x19/0x20
> [<c050da56>] rpc_call_sync+0x96/0xa0
> [<c0225b17>] nlmclnt_call+0x77/0x1e0
> [<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
> [<c0225236>] reclaimer+0x106/0x1f0
> [<c0105317>] kernel_thread_helper+0x7/0x10

Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
find out on which rpc queue these tasks are sleeping?

Cheers
  Trond

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:11     ` Trond Myklebust
@ 2007-05-14 16:15       ` Frank van Maarseveen
  2007-05-14 16:32         ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:15 UTC
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:11:34PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > > > this trace:
> > > >
> > > > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > > > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > > > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > > > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > > > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > > > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > > > [<c01e3904>] do_unlk+0x44/0x70
> > > > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > > > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > > > [<c016dc8d>] filp_close+0x2d/0x70
> > > > [<c01248a6>] close_files+0x56/0x70
> > > > [<c012490c>] put_files_struct+0x1c/0x50
> > > > [<c012533a>] do_exit+0x13a/0x3f0
> > > > [<c0125649>] do_group_exit+0x29/0x70
> > > > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > > > [<c0103e96>] do_signal+0x56/0x160
> > > > [<c0103fde>] do_notify_resume+0x3e/0x40
> > > > [<c01041ae>] work_notifysig+0x13/0x25
> > > >
> > > > Two processes had an independent shared read lock on different files
> > > > and when killing them with ^C they got stuck in state 'D' with above
> > > > stack trace. I'm not sure what brought them there other than that the
> > > > server went through a number of unusual reboots for testing purposes.
> > >
> > > Are there any processes with a name of the form '<hostname>-reclaim'
> > > hanging too?
> >
> > yes, two of them, each for a different NFS server (as I would expect).
>
> Are the NFS servers up and running?

yes, I also ran a tcpdump for one of them but did not see any activity.

>
> > The traces are identical:
> >
> > [<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
> > [<c0529114>] __wait_on_bit+0x44/0x70
> > [<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
> > [<c05137f5>] __rpc_execute+0xa5/0x1e0
> > [<c0513949>] rpc_execute+0x19/0x20
> > [<c050da56>] rpc_call_sync+0x96/0xa0
> > [<c0225b17>] nlmclnt_call+0x77/0x1e0
> > [<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
> > [<c0225236>] reclaimer+0x106/0x1f0
> > [<c0105317>] kernel_thread_helper+0x7/0x10
>
> Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> find out on which rpc queue these tasks are sleeping?

-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4

-- 
Frank

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:15       ` Frank van Maarseveen
@ 2007-05-14 16:32         ` Trond Myklebust
  2007-05-14 16:39           ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:32 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > find out on which rpc queue these tasks are sleeping?
>
> -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
                                                  ^^^^^^^^ Ouch!

That is a pretty massive timeout. What is your value
of /proc/sys/fs/nfs/nlm_timeout ?

Cheers
  Trond
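
For readers following along, the task table quoted above can be picked apart mechanically. The sketch below is only an illustration: struct rpc_task_row and parse_task_row() are hypothetical names, the fields simply mirror the column headers in the dump, and the assumption that pointer-like columns are hex while -timeout is decimal is read off the sample rows rather than taken from any documented format. Program number 100021 is the one registered for the NLM lock manager, and NLM procedure 2 is LOCK, which fits a reclaimer thread re-requesting locks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct: one data row of the task dump quoted in the thread.
 * Field names mirror the column headers; they are not kernel types. */
struct rpc_task_row {
    int pid;
    int proc;              /* RPC procedure number */
    unsigned int flags;    /* shown in hex in the dump */
    int status;
    unsigned long client;  /* kernel pointer, shown in hex */
    int prog;              /* RPC program number */
    unsigned long rqstp;   /* kernel pointer, shown in hex */
    long timeout;          /* assumed decimal, as in the sample rows */
    char rpcwait[32];      /* name of the RPC wait queue */
    unsigned long action;
    unsigned long ops;
};

/* Parse one data row. Returns 1 if all 11 columns were read, 0 otherwise
 * (for example when handed the header line). */
int parse_task_row(const char *line, struct rpc_task_row *row)
{
    assert(line != NULL && row != NULL);
    memset(row, 0, sizeof(*row));
    int n = sscanf(line, "%d %d %x %d %lx %d %lx %ld %31s %lx %lx",
                   &row->pid, &row->proc, &row->flags, &row->status,
                   &row->client, &row->prog, &row->rqstp, &row->timeout,
                   row->rpcwait, &row->action, &row->ops);
    return n == 11;
}
```

Feeding it the first row from the dump yields prog 100021, proc 2, a timeout of 10000000 and the wait queue name xprt_pending, which are the columns this part of the thread focuses on.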

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:32         ` Trond Myklebust
@ 2007-05-14 16:39           ` Frank van Maarseveen
  2007-05-14 16:56             ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:39 UTC
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > find out on which rpc queue these tasks are sleeping?
> >
> > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
>                                                   ^^^^^^^^ Ouch!
>
> That is a pretty massive timeout. What is your value
> of /proc/sys/fs/nfs/nlm_timeout ?

Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.

-- 
Frank

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:39           ` Frank van Maarseveen
@ 2007-05-14 16:56             ` Trond Myklebust
  2007-05-14 17:02               ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:56 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > find out on which rpc queue these tasks are sleeping?
> > >
> > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> >                                                   ^^^^^^^^ Ouch!
> >
> > That is a pretty massive timeout. What is your value
> > of /proc/sys/fs/nfs/nlm_timeout ?
>
> Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.

10 seconds looks like the correct default. I assume that you hadn't
changed that value prior to the reboot...

One last question, just in case: what value are you using for CONFIG_HZ?

Trond

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:56             ` Trond Myklebust
@ 2007-05-14 17:02               ` Frank van Maarseveen
  2007-05-14 17:05                 ` Frank van Maarseveen
From: Frank van Maarseveen @ 2007-05-14 17:02 UTC
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > find out on which rpc queue these tasks are sleeping?
> > > >
> > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > >                                                   ^^^^^^^^ Ouch!
> > >
> > > That is a pretty massive timeout. What is your value
> > > of /proc/sys/fs/nfs/nlm_timeout ?
> >
> > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
>
> 10 seconds looks like the correct default. I assume that you hadn't
> changed that value prior to the reboot...

right, I didn't know it existed and I'm not aware of any command which
can change it. Did some grepping around and it didn't turn up anything.

>
> One last question, just in case: what value are you using for CONFIG_HZ?

1000

-- 
Frank

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:02               ` Frank van Maarseveen
@ 2007-05-14 17:05                 ` Frank van Maarseveen
  2007-05-14 17:15                   ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 17:05 UTC
To: Linux NFS mailing list; +Cc: Trond Myklebust

On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > > find out on which rpc queue these tasks are sleeping?
> > > > >
> > > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > > >                                                   ^^^^^^^^ Ouch!
> > > >
> > > > That is a pretty massive timeout. What is your value
> > > > of /proc/sys/fs/nfs/nlm_timeout ?
> > >
> > > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
> >
> > 10 seconds looks like the correct default. I assume that you hadn't
> > changed that value prior to the reboot...
>
> right, I didn't know it existed and I'm not aware of any command which
> can change it. Did some grepping around and it didn't turn up anything.
>
> >
> > One last question, just in case: what value are you using for CONFIG_HZ?
>
> 1000

hmm, so the timeout has become 10 * HZ * HZ?

-- 
Frank

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:05                 ` Frank van Maarseveen
@ 2007-05-14 17:15                   ` Trond Myklebust
  2007-05-14 17:17                     ` Trond Myklebust
From: Trond Myklebust @ 2007-05-14 17:15 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 19:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > > > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > > > find out on which rpc queue these tasks are sleeping?
> > > > > >
> > > > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > > > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > > > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > >                                                   ^^^^^^^^ Ouch!
> > > > >
> > > > > That is a pretty massive timeout. What is your value
> > > > > of /proc/sys/fs/nfs/nlm_timeout ?
> > > >
> > > > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
> > >
> > > 10 seconds looks like the correct default. I assume that you hadn't
> > > changed that value prior to the reboot...
> >
> > right, I didn't know it existed and I'm not aware of any command which
> > can change it. Did some grepping around and it didn't turn up anything.
> >
> > >
> > > One last question, just in case: what value are you using for CONFIG_HZ?
> >
> > 1000
>
> hmm, so the timeout has become 10 * HZ * HZ?

Yeah... nlmsvc_timeout is already in HZ, so the line in fs/lockd/host.c
is wrong...

Trond

* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:15                   ` Trond Myklebust
@ 2007-05-14 17:17                     ` Trond Myklebust
From: Trond Myklebust @ 2007-05-14 17:17 UTC
To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 13:15 -0400, Trond Myklebust wrote:
> > > >
> > > > One last question, just in case: what value are you using for CONFIG_HZ?
> > >
> > > 1000
> >
> > hmm, so the timeout has become 10 * HZ * HZ?
>
> Yeah... nlmsvc_timeout is already in HZ, so the line in fs/lockd/host.c
> is wrong...

The following patch should fix it.

Trond
---------------------------------------------
commit 471c10cf35e2743746d9a5f671d9cceeb61393c7
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon May 14 13:16:36 2007 -0400

    NLM: Fix locking client timeouts...

    nlmsvc_timeout is already in units of HZ...

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index ad21c07..96070bf 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -221,7 +221,7 @@ nlm_bind_host(struct nlm_host *host)
 					host->h_nextrebind - jiffies);
 		}
 	} else {
-		unsigned long increment = nlmsvc_timeout * HZ;
+		unsigned long increment = nlmsvc_timeout;
 		struct rpc_timeout timeparms = {
 			.to_initval	= increment,
 			.to_increment	= increment,
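
The unit mix-up discussed in the thread can be reproduced with a small userspace model. This is a sketch under stated assumptions, not kernel code: MODEL_HZ stands in for the CONFIG_HZ=1000 reported above, the helper names are hypothetical, and the only thing taken from the patch is the before/after expression for the increment that seeds to_initval and to_increment.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: tick rate as reported in the thread (CONFIG_HZ=1000). */
#define MODEL_HZ 1000UL

/* Hypothetical helper: convert the nlm_timeout value (seconds) into
 * jiffies, the unit the patch says nlmsvc_timeout is already kept in. */
unsigned long model_secs_to_jiffies(unsigned long secs)
{
    assert(secs <= ULONG_MAX / MODEL_HZ);   /* guard against overflow */
    return secs * MODEL_HZ;
}

/* Pre-patch expression: the jiffies value is scaled by HZ a second time. */
unsigned long increment_before_patch(unsigned long nlmsvc_timeout)
{
    assert(nlmsvc_timeout <= ULONG_MAX / MODEL_HZ);
    return nlmsvc_timeout * MODEL_HZ;
}

/* Post-patch expression: the jiffies value is used unchanged. */
unsigned long increment_after_patch(unsigned long nlmsvc_timeout)
{
    return nlmsvc_timeout;
}

/* Whole seconds represented by a jiffies count at MODEL_HZ. */
unsigned long model_jiffies_to_secs(unsigned long jiffies)
{
    return jiffies / MODEL_HZ;
}
```

With nlm_timeout at 10, the pre-patch expression gives 10 * 1000 * 1000 = 10000000 jiffies, the figure in the -timeout column of the task dump, which at 1000 ticks per second is 10000 seconds (about 2 hours 46 minutes). The post-patch expression gives 10000 jiffies, i.e. the intended 10 seconds.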

end of thread

Thread overview: 12+ messages
2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
2007-05-14 15:59 ` Trond Myklebust
2007-05-14 16:05   ` Frank van Maarseveen
2007-05-14 16:11     ` Trond Myklebust
2007-05-14 16:15       ` Frank van Maarseveen
2007-05-14 16:32         ` Trond Myklebust
2007-05-14 16:39           ` Frank van Maarseveen
2007-05-14 16:56             ` Trond Myklebust
2007-05-14 17:02               ` Frank van Maarseveen
2007-05-14 17:05                 ` Frank van Maarseveen
2007-05-14 17:15                   ` Trond Myklebust
2007-05-14 17:17                     ` Trond Myklebust