nfsv3 client process stuck in rwsem_down_failed_common()


* nfsv3 client process stuck in rwsem_down_failed_common()
@ 2007-05-14 15:54 Frank van Maarseveen
  2007-05-14 15:59 ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 15:54 UTC
  To: Linux NFS mailing list

On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
this trace:

[<c02926e5>] rwsem_down_failed_common+0x85/0x180
[<c052a36d>] rwsem_down_read_failed+0x1d/0x30
[<c052a437>] call_rwsem_down_read_failed+0x7/0x10
[<c022622e>] nlmclnt_unlock+0x2e/0xc0
[<c02258da>] nlmclnt_proc+0x29a/0x2d0
[<c01f088e>] nfs3_proc_lock+0xe/0x10
[<c01e3904>] do_unlk+0x44/0x70
[<c01e3a9d>] nfs_lock+0xbd/0x120
[<c017dfd1>] locks_remove_posix+0xb1/0xc0
[<c016dc8d>] filp_close+0x2d/0x70
[<c01248a6>] close_files+0x56/0x70
[<c012490c>] put_files_struct+0x1c/0x50
[<c012533a>] do_exit+0x13a/0x3f0
[<c0125649>] do_group_exit+0x29/0x70
[<c012e73f>] get_signal_to_deliver+0x21f/0x2b0
[<c0103e96>] do_signal+0x56/0x160
[<c0103fde>] do_notify_resume+0x3e/0x40
[<c01041ae>] work_notifysig+0x13/0x25

Two processes each held an independent shared read lock on a different
file, and when I killed them with ^C they got stuck in state 'D' with the
above stack trace. I'm not sure what brought them there, other than that
the server went through a number of unusual reboots for testing purposes.

-- 
Frank

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
@ 2007-05-14 15:59 ` Trond Myklebust
  2007-05-14 16:05   ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 15:59 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> this trace:
> 
> [...]
> 
> Two processes each held an independent shared read lock on a different
> file, and when I killed them with ^C they got stuck in state 'D' with the
> above stack trace. I'm not sure what brought them there, other than that
> the server went through a number of unusual reboots for testing purposes.

Are there any processes with a name of the form '<hostname>-reclaim'
hanging too?

Trond



* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 15:59 ` Trond Myklebust
@ 2007-05-14 16:05   ` Frank van Maarseveen
  2007-05-14 16:11     ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:05 UTC
  To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > [...]
> 
> Are there any processes with a name of the form '<hostname>-reclaim'
> hanging too?

yes, two of them, each for a different NFS server (as I would expect).
The traces are identical:

[<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
[<c0529114>] __wait_on_bit+0x44/0x70
[<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
[<c05137f5>] __rpc_execute+0xa5/0x1e0
[<c0513949>] rpc_execute+0x19/0x20
[<c050da56>] rpc_call_sync+0x96/0xa0
[<c0225b17>] nlmclnt_call+0x77/0x1e0
[<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
[<c0225236>] reclaimer+0x106/0x1f0
[<c0105317>] kernel_thread_helper+0x7/0x10

-- 
Frank


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:05   ` Frank van Maarseveen
@ 2007-05-14 16:11     ` Trond Myklebust
  2007-05-14 16:15       ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:11 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > [...]
> > 
> > Are there any processes with a name of the form '<hostname>-reclaim'
> > hanging too?
> 
> yes, two of them, each for a different NFS server (as I would expect).

Are the NFS servers up and running?

> The traces are identical:
> 
> [<c0512f1d>] rpc_wait_bit_interruptible+0x1d/0x30
> [<c0529114>] __wait_on_bit+0x44/0x70
> [<c05291bd>] out_of_line_wait_on_bit+0x7d/0x90
> [<c05137f5>] __rpc_execute+0xa5/0x1e0
> [<c0513949>] rpc_execute+0x19/0x20
> [<c050da56>] rpc_call_sync+0x96/0xa0
> [<c0225b17>] nlmclnt_call+0x77/0x1e0
> [<c02261ac>] nlmclnt_reclaim+0x6c/0xc0
> [<c0225236>] reclaimer+0x106/0x1f0
> [<c0105317>] kernel_thread_helper+0x7/0x10

Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
find out on which rpc queue these tasks are sleeping?

Cheers
  Trond



* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:11     ` Trond Myklebust
@ 2007-05-14 16:15       ` Frank van Maarseveen
  2007-05-14 16:32         ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:15 UTC
  To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:11:34PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> > [...]
> 
> Are the NFS servers up and running?

yes, I also ran a tcpdump for one of them but did not see any activity.

> 
> > [...]
> 
> Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> find out on which rpc queue these tasks are sleeping?

-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
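For anyone reading these dumps, here is a small sketch that pulls the leading columns out of one row, following the header line above. Two details are assumptions, not something stated in the thread: that -timeout is printed in decimal jiffies, and that proc 2 under program 100021 (the NLM program number) is the NLM LOCK procedure. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of the RPC task list dumped by writing to
 * /proc/sys/sunrpc/rpc_debug. Column order follows the header
 * "-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait";
 * the trailing -action- and ---ops-- pointers are ignored. */
struct rpc_task_row {
    unsigned int  pid;
    unsigned int  proc;      /* assumed: 2 is the NLM LOCK call for prog 100021 */
    unsigned int  flags;     /* printed as 4 hex digits */
    int           status;
    unsigned long client;    /* kernel pointer, hex */
    unsigned long prog;      /* RPC program number, decimal */
    unsigned long rqstp;     /* kernel pointer, hex */
    long          timeout;   /* assumed: decimal jiffies */
    char          rpcwait[32];
};

/* Returns 0 if all nine leading columns were parsed, -1 otherwise. */
static int parse_rpc_task_row(const char *line, struct rpc_task_row *row)
{
    assert(line != NULL && row != NULL);
    memset(row, 0, sizeof(*row));
    int n = sscanf(line, "%u %u %x %d %lx %lu %lx %ld %31s",
                   &row->pid, &row->proc, &row->flags, &row->status,
                   &row->client, &row->prog, &row->rqstp,
                   &row->timeout, row->rpcwait);
    return n == 9 ? 0 : -1;
}
```

Fed the first row above, it yields timeout == 10000000 on the xprt_pending queue for program 100021. If the column really is in jiffies, that is 10000 seconds at CONFIG_HZ=1000, almost three hours.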

-- 
Frank


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:15       ` Frank van Maarseveen
@ 2007-05-14 16:32         ` Trond Myklebust
  2007-05-14 16:39           ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:32 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > find out on which rpc queue these tasks are sleeping?
> 
> -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
                                                  ^^^^^^^^ Ouch!

That is a pretty massive timeout. What is your value
of /proc/sys/fs/nfs/nlm_timeout ?

Cheers
  Trond



* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:32         ` Trond Myklebust
@ 2007-05-14 16:39           ` Frank van Maarseveen
  2007-05-14 16:56             ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 16:39 UTC
  To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:15 +0200, Frank van Maarseveen wrote:
> > > Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> > > find out on which rpc queue these tasks are sleeping?
> > 
> > -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
> > 30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
> > 30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4
>                                                   ^^^^^^^^ Ouch!
> 
> That is a pretty massive timeout. What is your value
> of /proc/sys/fs/nfs/nlm_timeout ?

Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.

-- 
Frank


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:39           ` Frank van Maarseveen
@ 2007-05-14 16:56             ` Trond Myklebust
  2007-05-14 17:02               ` Frank van Maarseveen
From: Trond Myklebust @ 2007-05-14 16:56 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:32:59PM -0400, Trond Myklebust wrote:
> > [...]
> > That is a pretty massive timeout. What is your value
> > of /proc/sys/fs/nfs/nlm_timeout ?
> 
> Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.

10 seconds looks like the correct default. I assume that you hadn't
changed that value prior to the reboot...

One last question, just in case: what value are you using for CONFIG_HZ?

Trond



* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 16:56             ` Trond Myklebust
@ 2007-05-14 17:02               ` Frank van Maarseveen
  2007-05-14 17:05                 ` Frank van Maarseveen
From: Frank van Maarseveen @ 2007-05-14 17:02 UTC
  To: Trond Myklebust; +Cc: Linux NFS mailing list, Frank van Maarseveen

On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:39 +0200, Frank van Maarseveen wrote:
> > [...]
> > Unfortunately it became necessary to reboot the machine :-(. Right now it says 10.
> 
> 10 seconds looks like the correct default. I assume that you hadn't
> changed that value prior to the reboot...

Right, I didn't know it existed, and I'm not aware of any command that
can change it. Some grepping around didn't turn up anything either.

> 
> One last question, just in case: what value are you using for CONFIG_HZ?

1000

-- 
Frank


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:02               ` Frank van Maarseveen
@ 2007-05-14 17:05                 ` Frank van Maarseveen
  2007-05-14 17:15                   ` Trond Myklebust
From: Frank van Maarseveen @ 2007-05-14 17:05 UTC
  To: Linux NFS mailing list; +Cc: Trond Myklebust

On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > [...]
> > 
> > One last question, just in case: what value are you using for CONFIG_HZ?
> 
> 1000

hmm, so the timeout has become 10 * HZ * HZ?


-- 
Frank


* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:05                 ` Frank van Maarseveen
@ 2007-05-14 17:15                   ` Trond Myklebust
  2007-05-14 17:17                     ` Trond Myklebust
From: Trond Myklebust @ 2007-05-14 17:15 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 19:05 +0200, Frank van Maarseveen wrote:
> On Mon, May 14, 2007 at 07:02:16PM +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 12:56:03PM -0400, Trond Myklebust wrote:
> > > [...]
> > > 
> > > One last question, just in case: what value are you using for CONFIG_HZ?
> > 
> > 1000
> 
> hmm, so the timeout has become 10 * HZ * HZ?

Yeah... nlmsvc_timeout is already in jiffies (nlm_timeout * HZ), so the
extra '* HZ' in nlm_bind_host() in fs/lockd/host.c is wrong...
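The arithmetic behind the diagnosis, written out as a small C sketch (not kernel code). It assumes lockd converts the nlm_timeout sysctl (10 seconds here) to jiffies exactly once, and uses the CONFIG_HZ=1000 reported above. The extra multiplication then gives 10 * 1000 * 1000 = 10,000,000 jiffies, matching the -timeout column in the rpc_debug dump: 10,000 seconds instead of 10. All macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumed values from this thread: the nlm_timeout sysctl reads 10
 * (seconds) and the kernel was built with CONFIG_HZ=1000. */
#define MODEL_HZ            1000UL
#define MODEL_NLM_TIMEOUT_S 10UL

/* lockd converts the sysctl from seconds to jiffies once; this is the
 * value held in nlmsvc_timeout. */
static unsigned long model_nlmsvc_timeout(unsigned long nlm_timeout_s)
{
    return nlm_timeout_s * MODEL_HZ;
}

/* Before the patch: nlm_bind_host() multiplied by HZ a second time. */
static unsigned long increment_before(unsigned long nlmsvc_timeout)
{
    return nlmsvc_timeout * MODEL_HZ;
}

/* After the patch: the value is used as-is, since it is already jiffies. */
static unsigned long increment_after(unsigned long nlmsvc_timeout)
{
    return nlmsvc_timeout;
}

/* Whole seconds represented by a jiffies count at MODEL_HZ. */
static unsigned long model_jiffies_to_secs(unsigned long j)
{
    assert(MODEL_HZ > 0);
    return j / MODEL_HZ;
}
```

With these values, model_nlmsvc_timeout(10) is 10000 jiffies. increment_before() turns that into 10000000 jiffies (10000 s), and increment_after() leaves it at 10000 jiffies (10 s).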

Trond



* Re: nfsv3 client process stuck in rwsem_down_failed_common()
  2007-05-14 17:15                   ` Trond Myklebust
@ 2007-05-14 17:17                     ` Trond Myklebust
From: Trond Myklebust @ 2007-05-14 17:17 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Mon, 2007-05-14 at 13:15 -0400, Trond Myklebust wrote:
> > > > 
> > > > One last question, just in case: what value are you using for CONFIG_HZ?
> > > 
> > > 1000
> > 
> > hmm, so the timeout has become 10 * HZ * HZ?
> 
> Yeah... nlmsvc_timeout is already in jiffies (nlm_timeout * HZ), so the
> extra '* HZ' in nlm_bind_host() in fs/lockd/host.c is wrong...

The following patch should fix it.

Trond

---------------------------------------------
commit 471c10cf35e2743746d9a5f671d9cceeb61393c7
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon May 14 13:16:36 2007 -0400

    NLM: Fix locking client timeouts...
    
    nlmsvc_timeout is already in units of HZ...
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index ad21c07..96070bf 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -221,7 +221,7 @@ nlm_bind_host(struct nlm_host *host)
 					host->h_nextrebind - jiffies);
 		}
 	} else {
-		unsigned long increment = nlmsvc_timeout * HZ;
+		unsigned long increment = nlmsvc_timeout;
 		struct rpc_timeout timeparms = {
 			.to_initval	= increment,
 			.to_increment	= increment,
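This is one way to make this class of unit bug harder to write. It is offered only as a hedged sketch of a general technique, not as how lockd or the kernel handles it. The idea is to give seconds and jiffies distinct single-member wrapper types, so the seconds-to-jiffies conversion can happen in exactly one place. Every name below (model_secs_t, model_jiffies_t, model_make_timeout() and so on) is hypothetical. struct model_timeout only mirrors the two rpc_timeout fields visible in the patch.

```c
#include <assert.h>

#define MODEL_HZ 1000UL   /* hypothetical stand-in for CONFIG_HZ */

/* Hypothetical single-member wrapper types. C does not convert between
 * distinct struct types, so passing seconds where jiffies are expected
 * (or vice versa) is a compile error rather than a silent unit bug. */
typedef struct { unsigned long secs; } model_secs_t;
typedef struct { unsigned long ticks; } model_jiffies_t;

/* The only place where seconds become jiffies. */
static model_jiffies_t model_secs_to_jiffies(model_secs_t s)
{
    model_jiffies_t j = { s.secs * MODEL_HZ };
    return j;
}

/* Mirrors the to_initval/to_increment fields that nlm_bind_host()
 * fills in from 'increment' in the patch above. */
struct model_timeout {
    unsigned long to_initval;
    unsigned long to_increment;
};

/* Accepts jiffies only; a second conversion would have to be spelled
 * out explicitly on .ticks instead of slipping in unnoticed. */
static struct model_timeout model_make_timeout(model_jiffies_t increment)
{
    assert(increment.ticks > 0);
    struct model_timeout t = { increment.ticks, increment.ticks };
    return t;
}
```

Calling model_make_timeout(model_secs_to_jiffies((model_secs_t){ 10 })) yields 10000 jiffies for both fields. Passing a model_secs_t directly does not compile.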





Thread overview: 12+ messages
2007-05-14 15:54 nfsv3 client process stuck in rwsem_down_failed_common() Frank van Maarseveen
2007-05-14 15:59 ` Trond Myklebust
2007-05-14 16:05   ` Frank van Maarseveen
2007-05-14 16:11     ` Trond Myklebust
2007-05-14 16:15       ` Frank van Maarseveen
2007-05-14 16:32         ` Trond Myklebust
2007-05-14 16:39           ` Frank van Maarseveen
2007-05-14 16:56             ` Trond Myklebust
2007-05-14 17:02               ` Frank van Maarseveen
2007-05-14 17:05                 ` Frank van Maarseveen
2007-05-14 17:15                   ` Trond Myklebust
2007-05-14 17:17                     ` Trond Myklebust
