public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Ben Myers <bpm@sgi.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 03/19] xfs: prevent deadlock trying to cover an active log
Date: Thu, 17 Oct 2013 13:51:16 -0500	[thread overview]
Message-ID: <20131017185116.GG1935@sgi.com> (raw)
In-Reply-To: <52600835.9010802@sandeen.net>

On Thu, Oct 17, 2013 at 10:54:29AM -0500, Eric Sandeen wrote:
> On 10/14/13 5:17 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Recent analysis of a deadlocked XFS filesystem from a kernel
> > crash dump indicated that the filesystem was stuck waiting for log
> > space. The short story of the hang on the RHEL6 kernel is this:
> > 
> > 	- the tail of the log is pinned by an inode
> > 	- the inode has been pushed by the xfsaild
> > 	- the inode has been flushed to it's backing buffer and is
> > 	  currently flush locked and hence waiting for backing
> > 	  buffer IO to complete and remove it from the AIL
> > 	- the backing buffer is marked for write - it is on the
> > 	  delayed write queue
> > 	- the inode buffer has been modified directly and logged
> > 	  recently due to unlinked inode list modification
> > 	- the backing buffer is pinned in memory as it is in the
> > 	  active CIL context.
> > 	- the xfsbufd won't start buffer writeback because it is
> > 	  pinned
> > 	- xfssyncd won't force the log because it sees the log as
> > 	  needing to be covered and hence wants to issue a dummy
> > 	  transaction to move the log covering state machine along.
> > 
> > Hence there is no trigger to force the CIL to the log and hence
> > unpin the inode buffer and therefore complete the inode IO, remove
> > it from the AIL and hence move the tail of the log along, allowing
> > transactions to start again.
> > 
> > Mainline kernels also have the same deadlock, though the signature
> > is slightly different - the inode buffer never reaches the delayed
> > write lists because xfs_buf_item_push() sees that it is pinned and
> > hence never adds it to the delayed write list that the xfsaild
> > flushes.
> > 
> > There are two possible solutions here. The first is to simply force
> > the log before trying to cover the log and so ensure that the CIL is
> > emptied before we try to reserve space for the dummy transaction in
> > the xfs_log_worker(). While this might work most of the time, it is
> > still racy and is no guarantee that we don't get stuck in
> > xfs_trans_reserve waiting for log space to come free. Hence it's not
> > the best way to solve the problem.
> > 
> > The second solution is to modify xfs_log_need_covered() to be aware
> > of the CIL. We only should be attempting to cover the log if there
> > is no current activity in the log - covering the log is the process
> > of ensuring that the head and tail in the log on disk are identical
> > (i.e. the log is clean and at idle). Hence, by definition, if there
> > are items in the CIL then the log is not at idle and so we don't
> > need to attempt to cover it.
> > 
> > When we don't need to cover the log because it is active or idle, we
> > issue a log force from xfs_log_worker() - if the log is idle, then
> > this does nothing.  However, if the log is active due to there being
> > items in the CIL, it will force the items in the CIL to the log and
> > unpin them.
> > 
> > In the case of the above deadlock scenario, instead of
> > xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
> > cover the log, it will instead force the log, thereby unpinning the
> > inode buffer, allowing IO to be issued and complete and hence
> > removing the inode that was pinning the tail of the log from the
> > AIL. At that point, everything will start moving along again. i.e.
> > the xfs_log_worker turns back into a watchdog that can alleviate
> > deadlocks based around pinned items that prevent the tail of the log
> > from being moved...
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Applied.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2013-10-17 18:51 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-14 22:17 [PATCH 00/19 V2] xfs: patches for 3.13 Dave Chinner
2013-10-14 22:17 ` [PATCH 01/19] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
2013-10-14 22:17 ` [PATCH 02/19] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
2013-10-14 22:17 ` [PATCH 03/19] xfs: prevent deadlock trying to cover an active log Dave Chinner
2013-10-17 15:54   ` Eric Sandeen
2013-10-17 18:51     ` Ben Myers [this message]
2013-10-14 22:17 ` [PATCH 04/19] xfs: create a shared header file for format-related information Dave Chinner
2013-10-22 23:12   ` Ben Myers
2013-10-22 23:36     ` [PATCH 04/19, V2] " Dave Chinner
2013-10-14 22:17 ` [PATCH 05/19] xfs: unify directory/attribute format definitions Dave Chinner
2013-10-21 23:11   ` Ben Myers
2013-10-21 23:33     ` Dave Chinner
2013-10-22 14:57       ` Christoph Hellwig
2013-10-14 22:17 ` [PATCH 06/19] xfs: split dquot buffer operations out Dave Chinner
2013-10-14 22:17 ` [PATCH 07/19] xfs: remove unused transaction callback variables Dave Chinner
2013-10-22 14:25   ` Ben Myers
2013-10-14 22:17 ` [PATCH 08/19] xfs: decouple log and transaction headers Dave Chinner
2013-10-22 17:16   ` Ben Myers
2013-10-22 23:50     ` [PATCH 08/19, V2] " Dave Chinner
2013-10-14 22:17 ` [PATCH 09/19] xfs: decouple inode and bmap btree header files Dave Chinner
2013-10-22 18:29   ` Ben Myers
2013-10-22 23:40     ` [PATCH 09/19, V2] " Dave Chinner
2013-10-22 23:48       ` Dave Chinner
2013-10-22 23:51     ` [PATCH 09/19, V3] " Dave Chinner
2013-10-14 22:17 ` [PATCH 10/19] xfs: split xfs_rtalloc.c for userspace sanity Dave Chinner
2013-10-14 22:17 ` [PATCH 11/19] xfs: abstract the differences in dir2/dir3 via an ops vector Dave Chinner
2013-10-14 22:17 ` [PATCH 12/19] xfs: vectorise remaining shortform dir2 ops Dave Chinner
2013-10-14 22:17 ` [PATCH 13/19] xfs: vectorise directory data operations Dave Chinner
2013-10-24 18:39   ` Ben Myers
2013-10-24 21:31     ` Dave Chinner
2013-10-24 21:41       ` Ben Myers
2013-10-24 22:08         ` Dave Chinner
2013-10-24 22:28           ` Ben Myers
2013-10-27 23:01             ` Dave Chinner
2013-10-14 22:18 ` [PATCH 14/19] xfs: vectorise directory data operations part 2 Dave Chinner
2013-10-14 22:18 ` [PATCH 15/19] xfs: vectorise directory leaf operations Dave Chinner
2013-10-20 23:25   ` [PATCH 15/19, V2] " Dave Chinner
2013-10-14 22:18 ` [PATCH 16/19] xfs: vectorise DA btree operations Dave Chinner
2013-10-25 17:17   ` Ben Myers
2013-10-14 22:18 ` [PATCH 17/19] xfs: vectorise encoding/decoding directory headers Dave Chinner
2013-10-25 18:46   ` Ben Myers
2013-10-25 19:20     ` Ben Myers
2013-10-14 22:18 ` [PATCH 18/19] xfs: vectorise directory leaf operations Dave Chinner
2013-10-25 20:18   ` Ben Myers
2013-10-25 21:51   ` Ben Myers
2013-10-14 22:18 ` [PATCH 19/19] xfs: convert directory vector functions to constants Dave Chinner
2013-10-16 21:16   ` Ben Myers
2013-10-16 22:23     ` Dave Chinner
2013-10-16 22:52       ` Ben Myers
2013-10-16 23:06         ` Ben Myers
2013-10-17  0:17           ` Dave Chinner
2013-10-23 22:23             ` Ben Myers
2013-10-25 21:47   ` Ben Myers
2013-10-23 22:23 ` [PATCH 00/19 V2] xfs: patches for 3.13 Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131017185116.GG1935@sgi.com \
    --to=bpm@sgi.com \
    --cc=sandeen@sandeen.net \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox