From mboxrd@z Thu Jan 1 00:00:00 1970
From: Timothy Shimmin
Subject: Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
Date: Tue, 22 Jul 2008 11:25:35 +1000
Message-ID: <4885370F.9000301@sgi.com>
References: <18565.6095.988483.628391@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <18565.6095.988483.628391@notabene.brown>
Sender: linux-raid-owner@vger.kernel.org
To: Neil Brown
Cc: "Mr. James W. Laferriere" , linux-raid maillist , xfs@oss.sgi.com
List-Id: linux-raid.ids

Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush D c8209f80 4748 393 2
>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>> [] ? mark_held_locks+0x67/0x80
>> [] ? add_wait_queue+0x33/0x50
>> [] xfs_buf_wait_unpin+0xb5/0xe0
>> [] ? default_wake_function+0x0/0x10
>> [] ? default_wake_function+0x0/0x10
>> [] xfs_buf_iorequest+0x4b/0x80
>> [] xfs_bdstrat_cb+0x3e/0x50
>> [] xfs_bwrite+0x5c/0xe0
>> [] xfs_syncsub+0x121/0x2b0
>> [] ? lock_super+0x1b/0x20
>> [] ? lock_super+0x1b/0x20
>> [] xfs_sync+0x48/0x70
>> [] xfs_fs_write_super+0x23/0x30
>> [] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
>

I presume that if it is waiting in xfs_buf_wait_unpin() for a long time
(>2min), then maybe a journal log I/O completion hasn't come back to say
that the matching buffer item has made it to the on-disk log.
I.e. the buffer hasn't been unpinned yet (pincount>0), which is supposed
to happen when its data hits the on-disk log.

>> [] wb_kupdate+0x29/0x100
>> [] ? __pdflush+0xcc/0x1a0
>> [] __pdflush+0xd2/0x1a0
>> [] ? pdflush+0x0/0x40
>> [] pdflush+0x31/0x40
>> [] ? wb_kupdate+0x0/0x100
>> [] ? pdflush+0x0/0x40
>> [] kthread+0x5c/0xa0
>> [] ? kthread+0x0/0xa0
>> [] kernel_thread_helper+0x7/0x10
>> =======================
>> 2 locks held by pdflush/393:
>> #0: (&type->s_umount_key#17){----}, at: [] sync_supers+0x52/0xc0
>> #1: (&type->s_lock_key#7){--..}, at: [] lock_super+0x1b/0x20
>>
>> ...snip... Repeats of above message ad-infintum .
>
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6. and it gets
> this warning a lot and essentially hangs.
>

Just for the record, in rc-9 we hadn't removed the QUEUE_ORDERED tag
check yet, so I presume that for md/raid6 barriers will be disabled
and barrier writes on the log won't be issued.
I don't see that as having anything to do with the problem here (it is
more of an issue on replay if we have the cache on and no barrier
support); I just thought I'd mention it.

--Tim
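
For readers unfamiliar with the pin/unpin mechanism described in the reply, the wait can be modelled roughly as below. This is a toy user-space sketch in Python, not XFS code: the class PinnedBuffer, its methods, and the timeouts are hypothetical names and values chosen for illustration. It only shows the general idea that a writer blocks while the pin count is above zero, and that a log completion which never arrives leaves the writer stuck.

```python
import threading


class PinnedBuffer:
    """Toy model (hypothetical) of a buffer that must not be written while pinned."""

    def __init__(self):
        self._cond = threading.Condition()
        self.pin_count = 0

    def pin(self):
        # Stands in for logging the buffer in a transaction not yet on disk.
        with self._cond:
            self.pin_count += 1

    def unpin(self):
        # Stands in for the log I/O completion reporting the data is on disk.
        with self._cond:
            if self.pin_count == 0:
                raise RuntimeError("unpin without matching pin")
            self.pin_count -= 1
            if self.pin_count == 0:
                self._cond.notify_all()

    def wait_unpin(self, timeout):
        # Block until pin_count reaches zero; returns False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self.pin_count == 0, timeout=timeout)


def demo():
    # Case 1: the simulated log completion arrives, so the writer proceeds.
    buf = PinnedBuffer()
    buf.pin()
    completion = threading.Timer(0.05, buf.unpin)
    completion.start()
    completed = buf.wait_unpin(timeout=2.0)
    completion.join()

    # Case 2: the completion never arrives, so the writer would stay blocked.
    # The short timeout here plays the role of the 120 second hung-task check.
    stuck = PinnedBuffer()
    stuck.pin()
    lost = stuck.wait_unpin(timeout=0.1)
    return completed, lost


if __name__ == "__main__":
    print(demo())
```

In this model the second case corresponds to the situation suspected above: the pin count stays above zero because the completion that would drop it never comes back.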