From: Timothy Shimmin <tes@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Lachlan McIlroy <lachlan@sgi.com>,
	Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: deadlock with latest xfs
Date: Mon, 27 Oct 2008 13:30:58 +1100
Message-ID: <490527E2.5000600@sgi.com>
In-Reply-To: <20081026223940.GN18495@disturbed>

Dave Chinner wrote:
> Ok, I think I've found the regression - it's introduced by the AIL
> cursor modifications. The patch below has been running for 15
> minutes now on my UML box that would have hung in a couple of
> minutes otherwise.
> 
> FYI, the way I found this was:
> 
> 	- put a breakpoint on xfs_create() once the fs hung
> 	- `touch /mnt/xfs2/fred` to trigger the break point.
> 	- look at:
> 		- mp->m_ail->xa_target
> 		- mp->m_ail->xa_ail.next->li_lsn
> 		- mp->m_log->l_tail_lsn
> 	  which indicated the push target was way ahead of the
> 	  tail of the log, so AIL pushing was obviously not
> 	  happening otherwise we'd be making progress.
> 	- added breakpoint on xfsaild_push() and continued
> 	- xfsaild_push() bp triggered, looked at *last_lsn
> 	  and found it way behind the tail of the log (like
> 	  3 cycles behind), which meant the lookup would return
> 	  NULL instead of the first object and AIL pushing
> 	  would abort. Confirmed with single stepping.
> 
> Cheers,
> 
> Dave.
> XFS: correctly select first log item to push
> 
> Under heavy metadata load we are seeing log hangs. The
> AIL has items in it ready to be pushed, and they are within
> the push target window. However, we are not pushing them
> when the last pushed LSN is less than the LSN of the
> first log item on the AIL. This is a regression introduced
> by the AIL push cursor modifications.
> ---
>  fs/xfs/xfs_trans_ail.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 67ee466..2d47f10 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -228,7 +228,7 @@ xfs_trans_ail_cursor_first(
>  
>  	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
>  		if (XFS_LSN_CMP(lip->li_lsn, lsn) >= 0)
> -			break;
> +			goto out;
>  	}
>  	lip = NULL;
>  out:

Yeah, the fix looks good. The previous code is pretty
obviously broken - a search which always returns NULL.

Which raises the question of how best to test this AIL code.
I dunno - it would be nice to have independent testing of the data
structures, but perhaps that is too ambitious.

OOC, the call path for this code is:
xfsaild -> xfsaild_push(ailp, &last_pushed_lsn)
           -> lip = xfs_trans_ail_cursor_first(ailp, cur, *last_lsn)
Initially, last_pushed_lsn = 0 in xfsaild,
but xfsaild_push updates it through last_lsn.
So it looks like things work at first, when lsn == 0, because
xfs_trans_ail_cursor_first special-cases that and uses the min.
But as soon as the lsn is non-zero,
xfs_trans_ail_cursor_first returns NULL
and xfsaild_push returns early.
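
For illustration, here is a rough user-space sketch of the break-vs-goto
problem and of the kind of standalone data-structure check I have in mind.
It is not the kernel code: struct item, the array standing in for the AIL
list, and both function names are made up for this example, and a plain
integer compare stands in for XFS_LSN_CMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AIL entry; only the LSN matters here. */
struct item {
	uint64_t lsn;
};

/*
 * Pre-fix shape: "break" leaves the loop but then falls through to
 * "ip = NULL", so a successful match is thrown away.
 */
static const struct item *
first_at_or_after_broken(const struct item *ail, size_t n, uint64_t lsn)
{
	const struct item *ip = NULL;
	size_t i;

	if (n == 0)
		return NULL;
	if (lsn == 0)
		return &ail[0];	/* special case: lsn == 0 means "use the min" */

	for (i = 0; i < n; i++) {
		ip = &ail[i];
		if (ip->lsn >= lsn)
			break;
	}
	ip = NULL;		/* reached on a match as well as on a miss */
	return ip;
}

/* Post-fix shape: "goto out" skips the NULL assignment on a match. */
static const struct item *
first_at_or_after_fixed(const struct item *ail, size_t n, uint64_t lsn)
{
	const struct item *ip = NULL;
	size_t i;

	if (n == 0)
		return NULL;
	if (lsn == 0)
		return &ail[0];

	for (i = 0; i < n; i++) {
		ip = &ail[i];
		if (ip->lsn >= lsn)
			goto out;
	}
	ip = NULL;		/* only reached when nothing matched */
out:
	return ip;
}

/* Standalone check: returns 0 if both variants behave as described. */
static int
check_cursor_first(void)
{
	const struct item ail[] = { { 10 }, { 20 }, { 30 } };

	/* lsn == 0 works in both variants because of the special case. */
	assert(first_at_or_after_broken(ail, 3, 0) == &ail[0]);
	assert(first_at_or_after_fixed(ail, 3, 0) == &ail[0]);

	/* Non-zero lsn: the broken variant loses the match. */
	assert(first_at_or_after_broken(ail, 3, 5) == NULL);
	assert(first_at_or_after_fixed(ail, 3, 5) == &ail[0]);
	assert(first_at_or_after_fixed(ail, 3, 20) == &ail[1]);
	assert(first_at_or_after_fixed(ail, 3, 31) == NULL);
	return 0;
}
```

Calling check_cursor_first() from a small main() would be enough to catch
this particular regression, assuming the real list walk can be pulled out
of the kernel in a similar way.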

--Tim
