From mboxrd@z Thu Jan 1 00:00:00 1970
From: Frank van Maarseveen
Subject: Re: nfsv3 client process stuck in rwsem_down_failed_common()
Date: Mon, 14 May 2007 18:15:12 +0200
Message-ID: <20070514161512.GC5169@janus>
References: <20070514155449.GA5169@janus> <1179158385.6474.11.camel@heimdal.trondhjem.org> <20070514160547.GB5169@janus> <1179159094.6474.21.camel@heimdal.trondhjem.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Linux NFS mailing list, Frank van Maarseveen
To: Trond Myklebust
Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.92] helo=mail.sourceforge.net) by sc8-sf-list2-new.sourceforge.net with esmtp (Exim 4.43) id 1HndCZ-00085P-Q8 for nfs@lists.sourceforge.net; Mon, 14 May 2007 09:15:12 -0700
Received: from frankvm.xs4all.nl ([80.126.170.174] helo=janus.localdomain) by mail.sourceforge.net with esmtp (Exim 4.44) id 1HndCc-00089r-4R for nfs@lists.sourceforge.net; Mon, 14 May 2007 09:15:14 -0700
In-Reply-To: <1179159094.6474.21.camel@heimdal.trondhjem.org>
List-Id: "Discussion of NFS under Linux development, interoperability, and testing."
Sender: nfs-bounces@lists.sourceforge.net
Errors-To: nfs-bounces@lists.sourceforge.net

On Mon, May 14, 2007 at 12:11:34PM -0400, Trond Myklebust wrote:
> On Mon, 2007-05-14 at 18:05 +0200, Frank van Maarseveen wrote:
> > On Mon, May 14, 2007 at 11:59:45AM -0400, Trond Myklebust wrote:
> > > On Mon, 2007-05-14 at 17:54 +0200, Frank van Maarseveen wrote:
> > > > On a 2.6.21.1 NFSv3 client box multiple processes got stuck with
> > > > this trace:
> > > >
> > > > [] rwsem_down_failed_common+0x85/0x180
> > > > [] rwsem_down_read_failed+0x1d/0x30
> > > > [] call_rwsem_down_read_failed+0x7/0x10
> > > > [] nlmclnt_unlock+0x2e/0xc0
> > > > [] nlmclnt_proc+0x29a/0x2d0
> > > > [] nfs3_proc_lock+0xe/0x10
> > > > [] do_unlk+0x44/0x70
> > > > [] nfs_lock+0xbd/0x120
> > > > [] locks_remove_posix+0xb1/0xc0
> > > > [] filp_close+0x2d/0x70
> > > > [] close_files+0x56/0x70
> > > > [] put_files_struct+0x1c/0x50
> > > > [] do_exit+0x13a/0x3f0
> > > > [] do_group_exit+0x29/0x70
> > > > [] get_signal_to_deliver+0x21f/0x2b0
> > > > [] do_signal+0x56/0x160
> > > > [] do_notify_resume+0x3e/0x40
> > > > [] work_notifysig+0x13/0x25
> > > >
> > > > Two processes had an independent shared read lock on different files
> > > > and when killing them with ^C they got stuck in state 'D' with above
> > > > stack trace. I'm not sure what brought them there other than that the
> > > > server went through a number of unusual reboots for testing purposes.
> > >
> > > Are there any processes with a name of the form '-reclaim'
> > > hanging too?
> >
> > yes, two of them, each for a different NFS server (as I would expect).
>
> Are the NFS servers up and running?

yes, I also ran a tcpdump for one of them but did not see any activity.

>
> > The traces are identical:
> >
> > [] rpc_wait_bit_interruptible+0x1d/0x30
> > [] __wait_on_bit+0x44/0x70
> > [] out_of_line_wait_on_bit+0x7d/0x90
> > [] __rpc_execute+0xa5/0x1e0
> > [] rpc_execute+0x19/0x20
> > [] rpc_call_sync+0x96/0xa0
> > [] nlmclnt_call+0x77/0x1e0
> > [] nlmclnt_reclaim+0x6c/0xc0
> > [] reclaimer+0x106/0x1f0
> > [] kernel_thread_helper+0x7/0x10
>
> Could you please use 'echo 0 >/proc/sys/sunrpc/rpc_debug' in order to
> find out on which rpc queue these tasks are sleeping?

-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
30871 0002 0480      0 c7708614 100021 f43f4000 10000000 xprt_pending c050e4d0 c057f3f4
30873 0002 0480      0 f00b4eb4 100021 cc809000 10000000 xprt_pending c050e4d0 c057f3f4

-- 
Frank

_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs