Subject: Re: nfsd stuckage
From: Peter Zijlstra
To: Andrew Morton
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Neil Brown, "J. Bruce Fields", Christoph Hellwig
In-Reply-To: <20090106145612.d4d9948d.akpm@linux-foundation.org>
References: <20090106145612.d4d9948d.akpm@linux-foundation.org>
Date: Thu, 08 Jan 2009 15:57:30 +0100
Message-Id: <1231426650.11687.459.camel@twins>

On Tue, 2009-01-06 at 14:56 -0800, Andrew Morton wrote:
> I just built current mainline plus the just-sent 266 -mm patches.
>
> The machine failed to power off when hit with `halt -pfn'. dmesg output:
> [ 672.162677] INFO: task nfsd4:4324 blocked for more than 480 seconds.
> [ 672.162706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 672.162725] ffff880251df1d60 0000000000000046 ffff88025e1c0580 ffff8802488013d8
> [ 672.162753] ffff880251df1d20 ffff88024c49a7a0 ffff88025e088760 ffff88024c49ab18
> [ 672.162834] 000000002807ee00 00000000ffff59ec ffff880251df1d50 0000000000000282
> [ 672.162865] Call Trace:
> [ 672.162880] [] __mutex_lock_slowpath+0x6a/0xac
> [ 672.162895] [] mutex_lock+0x2c/0x30
> [ 672.162908] [] vfs_fsync+0x63/0xa9
> [ 672.162933] [] nfsd_sync_dir+0x10/0x12 [nfsd]
> [ 672.162960] [] nfsd4_sync_rec_dir+0x27/0x40 [nfsd]
> [ 672.162984] [] nfsd4_recdir_purge_old+0x3d/0x6a [nfsd]
> [ 672.163023] [] laundromat_main+0x62/0x225 [nfsd]
> [ 672.163049] [] ? laundromat_main+0x0/0x225 [nfsd]
> [ 672.163064] [] run_workqueue+0x8d/0x124
> [ 672.163076] [] ? worker_thread+0x0/0xe5
> [ 672.163089] [] worker_thread+0xd8/0xe5
> [ 672.163102] [] ? autoremove_wake_function+0x0/0x36
> [ 672.163115] [] ? worker_thread+0x0/0xe5
> [ 672.163127] [] kthread+0x44/0x6b
> [ 672.163140] [] child_rip+0xa/0x20
> [ 672.163151] [] ? kthread+0x0/0x6b
> [ 672.163162] [] ? child_rip+0x0/0x20

FWIW lockdep seems to warn about this... All I have to do to trigger
this is boot the machine and let it sit for a few minutes.

[ 113.552497] =============================================
[ 113.553289] [ INFO: possible recursive locking detected ]
[ 113.553289] 2.6.28-tip #592
[ 113.553289] ---------------------------------------------
[ 113.553289] nfsd4/1914 is trying to acquire lock:
[ 113.553289] (&type->i_mutex_dir_key#4){--..}, at: [] vfs_fsync+0x6c/0xb1
[ 113.553289]
[ 113.553289] but task is already holding lock:
[ 113.553289] (&type->i_mutex_dir_key#4){--..}, at: [] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]
[ 113.553289]
[ 113.553289] other info that might help us debug this:
[ 113.553289] 4 locks held by nfsd4/1914:
[ 113.553289] #0: (nfsd4){--..}, at: [] run_workqueue+0xb6/0x21b
[ 113.553289] #1: ((laundromat_work).work){--..}, at: [] run_workqueue+0xb6/0x21b
[ 113.553289] #2: (client_mutex){--..}, at: [] laundromat_main+0x33/0x24e [nfsd]
[ 113.553289] #3: (&type->i_mutex_dir_key#4){--..}, at: [] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]
[ 113.553289]
[ 113.553289] stack backtrace:
[ 113.553289] Pid: 1914, comm: nfsd4 Not tainted 2.6.28-tip #592
[ 113.553289] Call Trace:
[ 113.553289] [] __lock_acquire+0xe42/0x161a
[ 113.553289] [] ? __call_rcu+0x7a/0x107
[ 113.553289] [] lock_acquire+0x55/0x71
[ 113.553289] [] ? vfs_fsync+0x6c/0xb1
[ 113.553289] [] mutex_lock_nested+0x4e/0x320
[ 113.553289] [] ? vfs_fsync+0x6c/0xb1
[ 113.553289] [] ? __filemap_fdatawrite_range+0x57/0x5f
[ 113.553289] [] vfs_fsync+0x6c/0xb1
[ 113.553289] [] nfsd_sync_dir+0x15/0x17 [nfsd]
[ 113.553289] [] nfsd4_sync_rec_dir+0x2e/0x47 [nfsd]
[ 113.553289] [] nfsd4_recdir_purge_old+0x45/0x73 [nfsd]
[ 113.553289] [] laundromat_main+0x72/0x24e [nfsd]
[ 113.553289] [] run_workqueue+0x108/0x21b
[ 113.553289] [] ? run_workqueue+0xb6/0x21b
[ 113.553289] [] ? laundromat_main+0x0/0x24e [nfsd]
[ 113.553289] [] worker_thread+0xe5/0xf6
[ 113.553289] [] ? autoremove_wake_function+0x0/0x3d
[ 113.553289] [] ? worker_thread+0x0/0xf6
[ 113.553289] [] kthread+0x4e/0x7b
[ 113.553289] [] child_rip+0xa/0x20
[ 113.553289] [] ? restore_args+0x0/0x30
[ 113.553289] [] ? kthread+0x0/0x7b
[ 113.553289] [] ? child_rip+0x0/0x20
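
The pattern lockdep flags here (the same task taking the directory's i_mutex in nfsd4_sync_rec_dir and then again inside vfs_fsync) can be shown in userspace with a rough sketch. This is not the kernel code: dir_mutex, dir_mutex_init(), sync_dir() and sync_rec_dir_nested() are hypothetical stand-ins, and it assumes a POSIX error-checking mutex so that the second acquisition returns EDEADLK instead of blocking forever the way the kernel mutex does in the hung-task trace.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the directory inode's i_mutex. */
static pthread_mutex_t dir_mutex;

/*
 * Set up dir_mutex as an error-checking mutex, so a relock by the
 * owning thread is reported rather than deadlocking.
 * Returns 0 on success or an errno value.
 */
static int dir_mutex_init(void)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!err)
        err = pthread_mutex_init(&dir_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Stand-in for the vfs_fsync step: takes the directory mutex itself. */
static int sync_dir(void)
{
    int err = pthread_mutex_lock(&dir_mutex);

    if (err)
        return err; /* EDEADLK when the caller already holds dir_mutex */
    /* ... the actual flush would happen here ... */
    err = pthread_mutex_unlock(&dir_mutex);
    assert(err == 0);
    return 0;
}

/*
 * Stand-in for the nfsd4_sync_rec_dir step: takes the directory mutex
 * and then calls a helper that tries to take the same mutex again.
 */
static int sync_rec_dir_nested(void)
{
    int err = pthread_mutex_lock(&dir_mutex);

    if (err)
        return err;
    err = sync_dir(); /* second acquisition by the same thread */
    pthread_mutex_unlock(&dir_mutex);
    return err;
}
```

After dir_mutex_init(), a plain sync_dir() call succeeds, while sync_rec_dir_nested() reports EDEADLK. With a non-recursive kernel mutex there is no such error return, which would match the task sitting in __mutex_lock_slowpath in the quoted hung-task report.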