From: Dave Chinner <david@fromorbit.com>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
xfs@oss.sgi.com, Alan Piszcz <ap@solarrain.com>
Subject: Re: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available)
Date: Mon, 19 Oct 2009 14:04:56 +1100
Message-ID: <20091019030456.GS9464@discord.disaster>
In-Reply-To: <alpine.DEB.2.00.0910181607040.27363@p34.internal.lan>

On Sun, Oct 18, 2009 at 04:17:42PM -0400, Justin Piszcz wrote:
> It has happened again, all sysrq-X output was saved this time.
>
> wget http://home.comcast.net/~jpiszcz/20091018/crash.txt
> wget http://home.comcast.net/~jpiszcz/20091018/dmesg.txt
> wget http://home.comcast.net/~jpiszcz/20091018/interrupts.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-l.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-m.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-p.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-q.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-t.txt
> wget http://home.comcast.net/~jpiszcz/20091018/sysrq-w.txt
.....
> Again, some more D-state processes:
>
> [76325.608073] pdflush D 0000000000000001 0 362 2 0x00000000
> [76325.608087] Call Trace:
> [76325.608095] [<ffffffff811ea1c0>] ? xfs_trans_brelse+0x30/0x130
> [76325.608099] [<ffffffff811dc44c>] ? xlog_state_sync+0x26c/0x2a0
> [76325.608103] [<ffffffff810513e0>] ? default_wake_function+0x0/0x10
> [76325.608106] [<ffffffff811dc4d1>] ? _xfs_log_force+0x51/0x80
> [76325.608108] [<ffffffff811dc50b>] ? xfs_log_force+0xb/0x40
>
> [76325.608202] xfssyncd D 0000000000000000 0 831 2 0x00000000
> [76325.608214] Call Trace:
> [76325.608216] [<ffffffff811dc229>] ? xlog_state_sync+0x49/0x2a0
> [76325.608220] [<ffffffff811d3485>] ? __xfs_iunpin_wait+0x95/0xe0
> [76325.608222] [<ffffffff81069c20>] ? autoremove_wake_function+0x0/0x30
> [76325.608225] [<ffffffff811d566d>] ? xfs_iflush+0xdd/0x2f0
> [76325.608228] [<ffffffff811fbe28>] ? xfs_reclaim_inode+0x148/0x190
> [76325.608231] [<ffffffff811fbe70>] ? xfs_reclaim_inode_now+0x0/0xa0
> [76325.608233] [<ffffffff811fc8dc>] ? xfs_inode_ag_walk+0x6c/0xc0
> [76325.608236] [<ffffffff811fbe70>] ? xfs_reclaim_inode_now+0x0/0xa0
>
> All of the D-state processes:

All pointing to log IO not completing.

That is, all of the D state processes are backed up on locks or
waiting for IO completion processing. A lot of the processes are
waiting for _xfs_log_force to complete, others are waiting for
inodes to be unpinned or are backed up behind locked inodes that are
waiting on log IO to complete before they can complete the
transaction and unlock the inode, and so on.
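
For anyone wanting to repeat this kind of triage on their own sysrq-w
output, a rough sketch follows. It groups D-state tasks by the first
log-related function found in each call trace. The helper names
(parse_blocked, summarize) and the LOG_WAIT list are hypothetical,
chosen only from the traces quoted above, and the regexes assume the
dmesg layout shown in this thread; other kernels may print task headers
differently.

```python
import re
from collections import Counter

# Header of a blocked task, e.g.
# "[76325.608073] pdflush D 0000000000000001 0 362 2 0x00000000"
# An optional leading ">" lets it cope with email-quoted logs.
TASK_RE = re.compile(
    r"^(?:>\s*)?\[\s*\d+\.\d+\]\s+(\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(\d+)\s"
)
# Stack frame, e.g. "[<ffffffff811dc4d1>] ? _xfs_log_force+0x51/0x80"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_]\w*)\+0x")

# Assumption: functions from the quoted traces that indicate a log wait.
LOG_WAIT = ("_xfs_log_force", "xlog_state_sync", "__xfs_iunpin_wait")


def parse_blocked(text):
    """Map (task name, pid) to the list of functions in its call trace."""
    tasks = {}
    current = None
    for line in text.splitlines():
        header = TASK_RE.match(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            tasks[current].append(frame.group(1))
    return tasks


def summarize(tasks):
    """Count tasks by the first LOG_WAIT function seen in their trace."""
    counts = Counter()
    for frames in tasks.values():
        hit = next((fn for fn in LOG_WAIT if fn in frames), "other")
        counts[hit] += 1
    return counts


# Lines copied from the traces quoted earlier in this message.
SAMPLE = """\
[76325.608073] pdflush D 0000000000000001 0 362 2 0x00000000
[76325.608087] Call Trace:
[76325.608095] [<ffffffff811ea1c0>] ? xfs_trans_brelse+0x30/0x130
[76325.608099] [<ffffffff811dc44c>] ? xlog_state_sync+0x26c/0x2a0
[76325.608106] [<ffffffff811dc4d1>] ? _xfs_log_force+0x51/0x80
[76325.608202] xfssyncd D 0000000000000000 0 831 2 0x00000000
[76325.608214] Call Trace:
[76325.608216] [<ffffffff811dc229>] ? xlog_state_sync+0x49/0x2a0
[76325.608220] [<ffffffff811d3485>] ? __xfs_iunpin_wait+0x95/0xe0
"""

if __name__ == "__main__":
    for fn, n in summarize(parse_blocked(SAMPLE)).most_common():
        print(f"{n:3d} task(s) waiting in {fn}")
```

Feeding it the whole of sysrq-w.txt instead of SAMPLE should give a
quick count of how many tasks sit behind the log versus elsewhere. A
large "other" bucket would suggest the pile-up is not purely log
related.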
Unfortunately, the xfslogd and xfsdatad kernel threads are not
present in any of the output given, so I can't tell if these have
deadlocked themselves and caused the problem. However, my experience
with such pile-ups is that an I/O completion has not been run for
some reason and that is the cause of the problem. I don't know if
you can provide enough information to tell us if this happened or
not. Instead, do you have a test case that you can share?
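
As a quick check on whether a full task dump (sysrq-t) includes the
completion threads at all, something like the sketch below may help.
It is only an illustration: find_threads is a hypothetical helper, the
header regex assumes the same dmesg layout as the traces quoted above,
and the thread names are assumed to start with "xfslogd" and
"xfsdatad" (typically with a per-CPU suffix such as "/0"). The xfslogd
line in the sample is made up for the example and is not from the
reporter's logs.

```python
import re

# Any task header in a sysrq-t dump: name, one-letter state, then either
# a hex address or the words "running task" (assumed layout), followed
# by free stack, pid and parent pid.
HEADER_RE = re.compile(
    r"^(?:>\s*)?\[\s*\d+\.\d+\]\s+(\S+)\s+([A-Z])\s+"
    r"(?:[0-9a-f]+|running\s+task)\s+\d+\s+(\d+)\s"
)


def find_threads(text, prefixes=("xfslogd", "xfsdatad")):
    """Return {prefix: [(name, state, pid), ...]} for matching tasks."""
    found = {p: [] for p in prefixes}
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if not m:
            continue
        name, state, pid = m.group(1), m.group(2), int(m.group(3))
        for p in prefixes:
            if name.startswith(p):
                found[p].append((name, state, pid))
    return found


# First line is a made-up example; second is from the quoted trace.
SAMPLE_DUMP = """\
[76325.700000] xfslogd/0 S 0000000000000000 0 820 2 0x00000000
[76325.608073] pdflush D 0000000000000001 0 362 2 0x00000000
"""

if __name__ == "__main__":
    for prefix, hits in find_threads(SAMPLE_DUMP).items():
        print(prefix, hits if hits else "not present in dump")
```

An empty list for either prefix means the dump is truncated or the
threads were not captured, which is the situation described above.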
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com