* nfsv3 client process stuck in rwsem_down_failed_common()
@ 2007-05-14 15:54 Frank van Maarseveen
2007-05-14 15:59 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 15:54 UTC (permalink / raw)
To: Linux NFS mailing list
On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
this trace:
[<c02926e5>] rwsem_down_failed_common+0x85/0x180
[<c052a36d>] rwsem_down_read_failed+0x1d/0x30
[<c052a437>] call_rwsem_down_read_failed+0x7/0x10
[<c022622e>] nlmclnt_unlock+0x2e/0xc0
[<c02258da>] nlmclnt_proc+0x29a/0x2d0
[<c01f088e>] nfs3_proc_lock+0xe/0x10
[<c01e3904>] do_unlk+0x44/0x70
[<c01e3a9d>] nfs_lock+0xbd/0x120
[<c017dfd1>] locks_remove_posix+0xb1/0xc0
[<c016dc8d>] filp_close+0x2d/0x70
[<c01248a6>] close_files+0x56/0x70
[<c012490c>] put_files_struct+0x1c/0x50
[<c012533a>] do_exit+0x13a/0x3f0
[<c0125649>] do_group_exit+0x29/0x70
[<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
[<c0103e96>] do_signal+0x56/0x160
[<c0103fde>] do_notify_resume+0x3e/0x40
[<c01041ae>] work_notifysig+0x13/0x25
Two processes had an independent shared read lock on different files
and when killing them with ^C they got stuck in state 'D' with above
stack trace. I'm not sure what brought them there other than that the
server went through a number of unusual reboots for testing purposes.
--
Frank
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
@ 2007-05-14 15:59 ` Trond Myklebust
2007-05-14 16:05 ` Frank van Maarseveen
0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 15:59 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> this trace:
>
> [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> [<c01f088e>] nfs3_proc_lock+0xe/0x10
> [<c01e3904>] do_unlk+0x44/0x70
> [<c01e3a9d>] nfs_lock+0xbd/0x120
> [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> [<c016dc8d>] filp_close+0x2d/0x70
> [<c01248a6>] close_files+0x56/0x70
> [<c012490c>] put_files_struct+0x1c/0x50
> [<c012533a>] do_exit+0x13a/0x3f0
> [<c0125649>] do_group_exit+0x29/0x70
> [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> [<c0103e96>] do_signal+0x56/0x160
> [<c0103fde>] do_notify_resume+0x3e/0x40
> [<c01041ae>] work_notifysig+0x13/0x25
>
> Two processes had an independent shared read lock on different files
> and when killing them with ^C they got stuck in state 'D' with above
> stack trace. I'm not sure what brought them there other than that the
> server went through a number of unusual reboots for testing purposes.
Are there any processes with a name of the form '<hostname>-reclaim'
hanging too?
Trond
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 15:59 ` Trond Myklebust
@ 2007-05-14 16:05 ` Frank van Maarseveen
2007-05-14 16:11 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 16:05 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen
On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > this trace:
> >
> > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > [<c01e3904>] do_unlk+0x44/0x70
> > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > [<c016dc8d>] filp_close+0x2d/0x70
> > [<c01248a6>] close_files+0x56/0x70
> > [<c012490c>] put_files_struct+0x1c/0x50
> > [<c012533a>] do_exit+0x13a/0x3f0
> > [<c0125649>] do_group_exit+0x29/0x70
> > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > [<c0103e96>] do_signal+0x56/0x160
> > [<c0103fde>] do_notify_resume+0x3e/0x40
> > [<c01041ae>] work_notifysig+0x13/0x25
> >
> > Two processes had an independent shared read lock on different files
> > and when killing them with ^C they got stuck in state 'D' with above
> > stack trace. I'm not sure what brought them there other than that the
> > server went through a number of unusual reboots for testing purposes.
>
> Are there any processes with a name of the form '<hostname>-reclaim'
> hanging too?
yes, two of them, each for a different NFS server (as I would expect).
The traces are identical:
[<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
[<c0529114>] __wait_on_bit+0x44/0x70
[<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
[<c05137f5>] __rpc_execute+0xa5/0x1e0
[<c0513949>] rpc_execute+0x19/0x20
[<c050da56>] rpc_call_sync+0x96/0xa0
[<c0225b17>] nlmclnt_call+0x77/0x1e0
[<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
[<c0225236>] reclaimer+0x106/0x1f0
[<c0105317>] kernel_thread_helper+0x7/0x10
--
Frank
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:05 ` Frank van Maarseveen
@ 2007-05-14 16:11 ` Trond Myklebust
2007-05-14 16:15 ` Frank van Maarseveen
0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 16:11 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > > this trace:
> > >
> > > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > > [<c01e3904>] do_unlk+0x44/0x70
> > > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > > [<c016dc8d>] filp_close+0x2d/0x70
> > > [<c01248a6>] close_files+0x56/0x70
> > > [<c012490c>] put_files_struct+0x1c/0x50
> > > [<c012533a>] do_exit+0x13a/0x3f0
> > > [<c0125649>] do_group_exit+0x29/0x70
> > > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > > [<c0103e96>] do_signal+0x56/0x160
> > > [<c0103fde>] do_notify_resume+0x3e/0x40
> > > [<c01041ae>] work_notifysig+0x13/0x25
> > >
> > > Two processes had an independent shared read lock on different files
> > > and when killing them with ^C they got stuck in state 'D' with above
> > > stack trace. I'm not sure what brought them there other than that the
> > > server went through a number of unusual reboots for testing purposes.
> >
> > Are there any processes with a name of the form '<hostname>-reclaim'
> > hanging too?
>
> yes, two of them, each for a different NFS server (as I would expect).
Are the NFS servers up and running?
> The traces are identical:
>
> [<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
> [<c0529114>] __wait_on_bit+0x44/0x70
> [<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
> [<c05137f5>] __rpc_execute+0xa5/0x1e0
> [<c0513949>] rpc_execute+0x19/0x20
> [<c050da56>] rpc_call_sync+0x96/0xa0
> [<c0225b17>] nlmclnt_call+0x77/0x1e0
> [<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
> [<c0225236>] reclaimer+0x106/0x1f0
> [<c0105317>] kernel_thread_helper+0x7/0x10
Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
find out on which rpc queue these tasks are sleeping?
Cheers
Trond
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:11 ` Trond Myklebust
@ 2007-05-14 16:15 ` Frank van Maarseveen
2007-05-14 16:32 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 16:15 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen
On Mon, May 14, 2007 at 12:11:34PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > > > this trace:
> > > >
> > > > [<c02926e5>] rwsem_down_failed_common+0x85/0x180
> > > > [<c052a36d>] rwsem_down_read_failed+0x1d/0x30
> > > > [<c052a437>] call_rwsem_down_read_failed+0x7/0x10
> > > > [<c022622e>] nlmclnt_unlock+0x2e/0xc0
> > > > [<c02258da>] nlmclnt_proc+0x29a/0x2d0
> > > > [<c01f088e>] nfs3_proc_lock+0xe/0x10
> > > > [<c01e3904>] do_unlk+0x44/0x70
> > > > [<c01e3a9d>] nfs_lock+0xbd/0x120
> > > > [<c017dfd1>] locks_remove_posix+0xb1/0xc0
> > > > [<c016dc8d>] filp_close+0x2d/0x70
> > > > [<c01248a6>] close_files+0x56/0x70
> > > > [<c012490c>] put_files_struct+0x1c/0x50
> > > > [<c012533a>] do_exit+0x13a/0x3f0
> > > > [<c0125649>] do_group_exit+0x29/0x70
> > > > [<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
> > > > [<c0103e96>] do_signal+0x56/0x160
> > > > [<c0103fde>] do_notify_resume+0x3e/0x40
> > > > [<c01041ae>] work_notifysig+0x13/0x25
> > > >
> > > > Two processes had an independent shared read lock on different files
> > > > and when killing them with ^C they got stuck in state 'D' with above
> > > > stack trace. I'm not sure what brought them there other than that the
> > > > server went through a number of unusual reboots for testing purposes.
> > >
> > > Are there any processes with a name of the form '<hostname>-reclaim'
> > > hanging too?
> >
> > yes, two of them, each for a different NFS server (as I would expect).
>
> Are the NFS servers up and running?
yes, I also ran a tcpdump for one of them but did not see any activity.
>
> > The traces are identical:
> >
> > [<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
> > [<c0529114>] __wait_on_bit+0x44/0x70
> > [<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
> > [<c05137f5>] __rpc_execute+0xa5/0x1e0
> > [<c0513949>] rpc_execute+0x19/0x20
> > [<c050da56>] rpc_call_sync+0x96/0xa0
> > [<c0225b17>] nlmclnt_call+0x77/0x1e0
> > [<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
> > [<c0225236>] reclaimer+0x106/0x1f0
> > [<c0105317>] kernel_thread_helper+0x7/0x10
>
> Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> find out on which rpc queue these tasks are sleeping?
-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
--
Frank
^ permalink raw reply [flat|nested] 12+ messages in thread
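To pull the fields under discussion out of such a task dump programmatically, a small sketch follows. It assumes the eleven whitespace-separated columns named in the header line of the dump and reads the -timeout column as a decimal number; the struct and function names are hypothetical and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three columns discussed in the thread. */
struct rpc_task_row {
    unsigned int pid;   /* -pid- column */
    long timeout;       /* -timeout column, read as a decimal number */
    char rpcwait[32];   /* -rpcwait column: name of the RPC wait queue */
};

/*
 * Parse one data row of the dump. The six columns between -pid- and
 * -timeout (proc, flgs, status, client, prog, rqstp) are skipped.
 * Returns 1 if all three fields were read, 0 otherwise.
 */
int parse_task_row(const char *line, struct rpc_task_row *row)
{
    assert(line != NULL && row != NULL);
    int n = sscanf(line, "%u %*s %*s %*s %*s %*s %*s %ld %31s",
                   &row->pid, &row->timeout, row->rpcwait);
    return n == 3;
}

/* Returns 1 if the task sleeps on the named queue, 0 otherwise. */
int row_waits_on(const struct rpc_task_row *row, const char *queue)
{
    return strcmp(row->rpcwait, queue) == 0;
}
```

Fed the first data row of the dump, parse_task_row() yields pid 30871, a timeout of 10000000 and the queue name xprt_pending; the header line itself is rejected because its first column is not a number.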
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:15 ` Frank van Maarseveen
@ 2007-05-14 16:32 ` Trond Myklebust
2007-05-14 16:39 ` Frank van Maarseveen
0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 16:32 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > find out on which rpc queue these tasks are sleeping?
>
> -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
                                             ^^^^^^^^ Ouch!
That is a pretty massive timeout. What is your value
of /proc/sys/fs/nfs/nlm_timeout ?
Cheers
Trond
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:32 ` Trond Myklebust
@ 2007-05-14 16:39 ` Frank van Maarseveen
2007-05-14 16:56 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 16:39 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen
On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > find out on which rpc queue these tasks are sleeping?
> >
> > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
>                                              ^^^^^^^^ Ouch!
>
> That is a pretty massive timeout. What is your value
> of /proc/sys/fs/nfs/nlm_timeout ?
Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
--
Frank
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:39 ` Frank van Maarseveen
@ 2007-05-14 16:56 ` Trond Myklebust
2007-05-14 17:02 ` Frank van Maarseveen
0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 16:56 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > find out on which rpc queue these tasks are sleeping?
> > >
> > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> >                                              ^^^^^^^^ Ouch!
> >
> > That is a pretty massive timeout. What is your value
> > of /proc/sys/fs/nfs/nlm_timeout ?
>
> Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
10 seconds looks like the correct default. I assume that you hadn't
changed that value prior to the reboot...
One last question, just in case: what value are you using for CONFIG_HZ?
Trond
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 16:56 ` Trond Myklebust
@ 2007-05-14 17:02 ` Frank van Maarseveen
2007-05-14 17:05 ` Frank van Maarseveen
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 17:02 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen
On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > find out on which rpc queue these tasks are sleeping?
> > > >
> > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > >                                              ^^^^^^^^ Ouch!
> > >
> > > That is a pretty massive timeout. What is your value
> > > of /proc/sys/fs/nfs/nlm_timeout ?
> >
> > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
>
> 10 seconds looks like the correct default. I assume that you hadn't
> changed that value prior to the reboot...
right, I didn't know it existed and I'm not aware of any command which
can change it. Did some grepping around and it didn't turn up anything.
>
> One last question, just in case: what value are you using for CONFIG_HZ?
1000
--
Frank
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 17:02 ` Frank van Maarseveen
@ 2007-05-14 17:05 ` Frank van Maarseveen
2007-05-14 17:15 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Frank van Maarseveen @ 2007-05-14 17:05 UTC (permalink / raw)
To: Linux NFS mailing list; +Cc: Trond Myklebust
On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > > find out on which rpc queue these tasks are sleeping?
> > > > >
> > > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > > 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > > 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > > >                                              ^^^^^^^^ Ouch!
> > > >
> > > > That is a pretty massive timeout. What is your value
> > > > of /proc/sys/fs/nfs/nlm_timeout ?
> > >
> > > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
> >
> > 10 seconds looks like the correct default. I assume that you hadn't
> > changed that value prior to the reboot...
>
> right, I didn't know it existed and I'm not aware of any command which
> can change it. Did some grepping around and it didn't turn up anything.
>
> >
> > One last question, just in case: what value are you using for CONFIG_HZ?
>
> 1000
hmm, so the timeout has become 10 * HZ * HZ?
--
Frank
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 17:05 ` Frank van Maarseveen
@ 2007-05-14 17:15 ` Trond Myklebust
2007-05-14 17:17 ` Trond Myklebust
0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 17:15 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 19:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > > > On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > > > > On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > > > > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > > > > > find out on which rpc queue these tasks are sleeping?
> > > > > >
> > > > > > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > > > > > 30871 0002 0480 0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > > > 30873 0002 0480 0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
> > > > >                                              ^^^^^^^^ Ouch!
> > > > >
> > > > > That is a pretty massive timeout. What is your value
> > > > > of /proc/sys/fs/nfs/nlm_timeout ?
> > > >
> > > > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
> > >
> > > 10 seconds looks like the correct default. I assume that you hadn't
> > > changed that value prior to the reboot...
> >
> > right, I didn't know it existed and I'm not aware of any command which
> > can change it. Did some grepping around and it didn't turn up anything.
> >
> > >
> > > One last question, just in case: what value are you using for CONFIG_HZ?
> >
> > 1000
>
> hmm, so the timeout has become 10 * HZ * HZ?
Yeah... nlmsvc_timeout is already in HZ, so the line in fs/lockd/host.c
is wrong...
Trond
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: nfsv3 client process stuck in rwsem_down_failed_common()
2007-05-14 17:15 ` Trond Myklebust
@ 2007-05-14 17:17 ` Trond Myklebust
0 siblings, 0 replies; 12+ messages in thread
From: Trond Myklebust @ 2007-05-14 17:17 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: Linux NFS mailing list
On Mon, 2007-05-14 at 13:15 -0400, Trond Myklebust wrote:
> > > >
> > > > One last question, just in case: what value are you using for CONFIG_HZ?
> > >
> > > 1000
> >
> > hmm, so the timeout has become 10 * HZ * HZ?
>
> Yeah... nlmsvc_timeout is already in HZ, so the line in fs/lockd/host.c
> is wrong...
The following patch should fix it.
Trond
---------------------------------------------
commit 471c10cf35e2743746d9a5f671d9cceeb61393c7
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon May 14 13:16:36 2007 -0400

    NLM: Fix locking client timeouts...

    nlmsvc_timeout is already in units of HZ...

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index ad21c07..96070bf 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -221,7 +221,7 @@ nlm_bind_host(struct nlm_host *host)
 					host->h_nextrebind - jiffies);
 		}
 	} else {
-		unsigned long increment = nlmsvc_timeout * HZ;
+		unsigned long increment = nlmsvc_timeout;
 		struct rpc_timeout timeparms = {
 			.to_initval = increment,
 			.to_increment = increment,
^ permalink raw reply related [flat|nested] 12+ messages in thread
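The unit mix-up discussed in this thread can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the thread concludes, that nlmsvc_timeout holds the nlm_timeout value (10 seconds) already converted to jiffies, and that the -timeout column of the task dump is expressed in jiffies.

```c
#include <assert.h>

/* Hypothetical helper: seconds -> jiffies for a given tick rate (CONFIG_HZ). */
unsigned long secs_to_jiffies(unsigned long secs, unsigned long hz)
{
    assert(hz > 0);
    return secs * hz;
}

/* Mirrors the removed line: a value already in jiffies is scaled by HZ again. */
unsigned long increment_before_patch(unsigned long nlmsvc_timeout,
                                     unsigned long hz)
{
    return nlmsvc_timeout * hz;
}

/* Mirrors the added line: the jiffies value is used as-is. */
unsigned long increment_after_patch(unsigned long nlmsvc_timeout)
{
    return nlmsvc_timeout;
}

/* Converts a jiffies count back to whole seconds. */
unsigned long jiffies_to_secs(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies / hz;
}
```

With nlm_timeout = 10 and CONFIG_HZ = 1000, nlmsvc_timeout is 10000 jiffies. Before the patch the increment works out to 10000 * 1000 = 10000000 jiffies, the figure seen in the task dump, which is 10000 seconds or roughly 2 hours 47 minutes. After the patch it is 10000 jiffies, i.e. the intended 10 seconds.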
Thread overview: 12+ messages
2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
2007-05-14 15:59 ` Trond Myklebust
2007-05-14 16:05 ` Frank van Maarseveen
2007-05-14 16:11 ` Trond Myklebust
2007-05-14 16:15 ` Frank van Maarseveen
2007-05-14 16:32 ` Trond Myklebust
2007-05-14 16:39 ` Frank van Maarseveen
2007-05-14 16:56 ` Trond Myklebust
2007-05-14 17:02 ` Frank van Maarseveen
2007-05-14 17:05 ` Frank van Maarseveen
2007-05-14 17:15 ` Trond Myklebust
2007-05-14 17:17 ` Trond Myklebust