public inbox for linux-xfs@vger.kernel.org
From: Timothy Shimmin <tes@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Lachlan McIlroy <lachlan@sgi.com>,
	Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: deadlock with latest xfs
Date: Mon, 27 Oct 2008 13:30:58 +1100
Message-ID: <490527E2.5000600@sgi.com>
In-Reply-To: <20081026223940.GN18495@disturbed>

Dave Chinner wrote:
> Ok, I think I've found the regression - it's introduced by the AIL
> cursor modifications. The patch below has been running for 15
> minutes now on my UML box that would have hung in a couple of
> minutes otherwise.
> 
> FYI, the way I found this was:
> 
> 	- put a breakpoint on xfs_create() once the fs hung
> 	- `touch /mnt/xfs2/fred` to trigger the break point.
> 	- look at:
> 		- mp->m_ail->xa_target
> 		- mp->m_ail->xa_ail.next->li_lsn
> 		- mp->m_log->l_tail_lsn
> 	  which indicated the push target was way ahead of the
> 	  tail of the log, so AIL pushing was obviously not
> 	  happening, otherwise we'd be making progress.
> 	- added breakpoint on xfsaild_push() and continued
> 	- xfsaild_push() bp triggered, looked at *last_lsn
> 	  and found it way behind the tail of the log (like
> 	  3 cycles behind), which meant the cursor lookup would
> 	  return NULL instead of the first object and AIL pushing
> 	  would abort. Confirmed with single stepping.
> 
> Cheers,
> 
> Dave.
> XFS: correctly select first log item to push
> 
> Under heavy metadata load we are seeing log hangs. The
> AIL has items in it ready to be pushed, and they are within
> the push target window. However, we are not pushing them
> when the last pushed LSN is less than the LSN of the
> first log item on the AIL. This is a regression introduced
> by the AIL push cursor modifications.
> ---
>  fs/xfs/xfs_trans_ail.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 67ee466..2d47f10 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -228,7 +228,7 @@ xfs_trans_ail_cursor_first(
>  
>  	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
>  		if (XFS_LSN_CMP(lip->li_lsn, lsn) >= 0)
> -			break;
> +			goto out;
>  	}
>  	lip = NULL;
>  out:

Yeah, the fix looks good. The previous code is pretty
obviously broken - a search which always returns NULL.

Which raises the question of how best to test this AIL code.
I dunno - independent testing of the data structures would be nice,
but perhaps that is too ambitious.

OOC, the call path for this code is:
xfsaild -> xfsaild_push(ailp, &last_pushed_lsn)
           -> lip = xfs_trans_ail_cursor_first(ailp, cur, *last_lsn)
Initially, last_lsn = 0 in xfsaild,
but it gets updated via last_pushed_lsn.
So things work initially when lsn == 0, because
xfs_trans_ail_cursor_first special-cases that and uses the min.
But as soon as the lsn is set to non-zero,
xfs_trans_ail_cursor_first returns NULL,
and xfsaild_push returns early.
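
To make the failure mode concrete, here is a rough userspace model of the
search. It is only a sketch and not the kernel code: struct item,
cursor_first_broken(), cursor_first_fixed() and ail_model_selfcheck() are
made-up names, a sorted array stands in for the xa_ail list, and a plain
integer >= stands in for XFS_LSN_CMP. The only thing it tries to show is
how "break" falls through to the NULL assignment while "goto out" does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a log item; only the LSN matters here. */
struct item {
	uint64_t lsn;
};

/*
 * Pre-fix shape: "break" leaves the loop, but control then reaches
 * "ip = NULL", so any match found by the loop is thrown away.
 */
static const struct item *
cursor_first_broken(const struct item *ail, size_t n, uint64_t lsn)
{
	const struct item *ip = NULL;
	size_t i;

	/* lsn == 0 is special-cased: return the minimum (first) item. */
	if (lsn == 0)
		return n ? &ail[0] : NULL;

	for (i = 0; i < n; i++) {
		ip = &ail[i];
		if (ip->lsn >= lsn)
			break;
	}
	ip = NULL;
	return ip;
}

/*
 * Post-fix shape: "goto out" skips the NULL assignment, so the first
 * item at or beyond lsn is returned. NULL now means "no such item".
 */
static const struct item *
cursor_first_fixed(const struct item *ail, size_t n, uint64_t lsn)
{
	const struct item *ip = NULL;
	size_t i;

	if (lsn == 0)
		return n ? &ail[0] : NULL;

	for (i = 0; i < n; i++) {
		ip = &ail[i];
		if (ip->lsn >= lsn)
			goto out;
	}
	ip = NULL;
out:
	return ip;
}

/* Small self-check covering the cases discussed above. */
int
ail_model_selfcheck(void)
{
	const struct item ail[] = { { 100 }, { 200 }, { 300 } };

	/* lsn == 0 works even in the broken version. */
	assert(cursor_first_broken(ail, 3, 0) == &ail[0]);
	/* Non-zero lsn behind the first item: broken version gives NULL. */
	assert(cursor_first_broken(ail, 3, 50) == NULL);
	/* Fixed version returns the first item at or beyond lsn. */
	assert(cursor_first_fixed(ail, 3, 50) == &ail[0]);
	assert(cursor_first_fixed(ail, 3, 250) == &ail[2]);
	return 0;
}
```

Calling ail_model_selfcheck() from a test program exercises both variants.
The lsn = 50 case corresponds to the hang Dave describes, where the last
pushed LSN is behind the first item on the AIL.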

--Tim
