From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Chinner
Subject: Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
Date: Tue, 22 Jul 2008 12:20:50 +1000
Message-ID: <20080722022050.GG6761@disturbed>
References: <18565.6095.988483.628391@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: "Mr. James W. Laferriere"
Cc: Neil Brown, linux-raid maillist, xfs@oss.sgi.com
List-Id: linux-raid.ids

On Mon, Jul 21, 2008 at 03:43:03PM -0800, Mr. James W. Laferriere wrote:
> Hello Neil ,
>
> On Tue, 22 Jul 2008, Neil Brown wrote:
>> On Monday July 21, babydr@baby-dragons.com wrote:
>>> INFO: task pdflush:393 blocked for more than 120 seconds.
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> pdflush D c8209f80 4748 393 2
>>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>>> Call Trace:
>>> [] ? mark_held_locks+0x67/0x80
>>> [] ? add_wait_queue+0x33/0x50
>>> [] xfs_buf_wait_unpin+0xb5/0xe0
>>> [] ? default_wake_function+0x0/0x10
>>> [] ? default_wake_function+0x0/0x10
>>> [] xfs_buf_iorequest+0x4b/0x80
>>> [] xfs_bdstrat_cb+0x3e/0x50
>>> [] xfs_bwrite+0x5c/0xe0
>>> [] xfs_syncsub+0x121/0x2b0
>>> [] ? lock_super+0x1b/0x20
>>> [] ? lock_super+0x1b/0x20
>>> [] xfs_sync+0x48/0x70
>>> [] xfs_fs_write_super+0x23/0x30
>>> [] sync_supers+0xaf/0xc0
>>
>> Looks a lot like an XFS problem to me.
>> Or at least, XFS people would be able to interpret this stack the
>> best.
> Hmm , Ok , I'll post there , I can provide a -complete- boot ->
> renboot log of the actions , But it ain't small ~ 649K .
> So I'll post
> that on the back of my website , ie:
>
> http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log

Given that it's a log hang on 2.6.26-rc9, I'd first say add this
commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=49641f1acfdfd437ed9b0a70b86bf36626c02afe

to your build (went in after -rc9 but before 2.6.26 was released) and
see if that solves the problem.

In more detail, this stack trace implies log I/O has not completed
after the log force was triggered in xfs_buf_wait_unpin(). The above
patch fixes a bug in log I/O dispatch where a non-atomic compare and
decrement would result in log I/O not being dispatched.

So, you've got a hang waiting for log I/O to complete on a kernel
that has a known problem with log I/O dispatch, so it's likely that's
what you've hit.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
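
For anyone wondering how a non-atomic compare and decrement can lose an
I/O dispatch, here is a minimal userspace sketch in C11. It is not the
XFS code and not the patch itself: struct buf, racy_check(), racy_drop(),
atomic_drop() and the replay_*() helpers are all made-up names for
illustration. The assumption is that the last holder to drop its
reference is the one responsible for dispatching the I/O. The bad
interleaving is replayed in a single thread so the outcome is
deterministic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a log buffer: a reference count plus a
 * counter of how many times its I/O was dispatched. */
struct buf {
    atomic_int refs;
    int dispatched;
};

/* Racy pattern, step 1: compare. Is the caller the last holder? */
int racy_check(struct buf *b)
{
    return atomic_load(&b->refs) == 1;
}

/* Racy pattern, step 2: decrement, then act on the stale comparison. */
void racy_drop(struct buf *b, int was_last)
{
    atomic_fetch_sub(&b->refs, 1);
    if (was_last)
        b->dispatched++;
}

/* Safe pattern: decrement and test in one atomic operation, so exactly
 * one caller observes the transition from 1 to 0. */
void atomic_drop(struct buf *b)
{
    if (atomic_fetch_sub(&b->refs, 1) == 1)
        b->dispatched++;
}

/* Two holders release; both compare before either decrements. */
int replay_racy(void)
{
    struct buf b = { .dispatched = 0 };
    atomic_init(&b.refs, 2);
    int a_last = racy_check(&b);    /* holder A sees refs == 2 */
    int b_last = racy_check(&b);    /* holder B sees refs == 2 */
    racy_drop(&b, a_last);
    racy_drop(&b, b_last);
    assert(atomic_load(&b.refs) == 0);
    return b.dispatched;            /* count is zero, nobody dispatched */
}

/* Same two releases using the combined decrement-and-test. */
int replay_atomic(void)
{
    struct buf b = { .dispatched = 0 };
    atomic_init(&b.refs, 2);
    atomic_drop(&b);
    atomic_drop(&b);
    assert(atomic_load(&b.refs) == 0);
    return b.dispatched;
}
```

In the racy replay the count reaches zero but neither holder believes it
was the last one, so the dispatch never happens and any waiter would
block indefinitely, which is the general shape of the hang in the trace.
In kernel code the usual primitive for the safe pattern is
atomic_dec_and_test().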