public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: xfs ioend batching log reservation deadlock
Date: Wed, 31 Mar 2021 10:57:22 +1100	[thread overview]
Message-ID: <20210330235722.GX63242@dread.disaster.area> (raw)
In-Reply-To: <YGM46ABWNVO5FKld@bfoster>

On Tue, Mar 30, 2021 at 10:42:48AM -0400, Brian Foster wrote:
> On Tue, Mar 30, 2021 at 10:51:42AM +1100, Dave Chinner wrote:
> > On Mon, Mar 29, 2021 at 02:04:25PM -0400, Brian Foster wrote:
> > > On Mon, Mar 29, 2021 at 01:28:26PM +1100, Dave Chinner wrote:
> ...
> > 
> > > @@ -182,12 +185,10 @@ xfs_end_ioend(
> > >  		error = xfs_reflink_end_cow(ip, offset, size);
> > >  	else if (ioend->io_type == IOMAP_UNWRITTEN)
> > >  		error = xfs_iomap_write_unwritten(ip, offset, size, false);
> > > -	else
> > > -		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
> > >  
> > >  done:
> > > -	if (ioend->io_private)
> > > -		error = xfs_setfilesize_ioend(ioend, error);
> > > +	if (ioend->io_flags & IOMAP_F_APPEND)
> > > +		error = xfs_setfilesize(ip, offset, size);
> > >  	iomap_finish_ioends(ioend, error);
> > >  	memalloc_nofs_restore(nofs_flag);
> > >  }
> > > @@ -221,16 +222,28 @@ xfs_end_io(
> > >  	struct iomap_ioend	*ioend;
> > >  	struct list_head	tmp;
> > >  	unsigned long		flags;
> > > +	xfs_off_t		maxendoff;
> > >  
> > >  	spin_lock_irqsave(&ip->i_ioend_lock, flags);
> > >  	list_replace_init(&ip->i_ioend_list, &tmp);
> > >  	spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
> > >  
> > >  	iomap_sort_ioends(&tmp);
> > > +
> > > +	/* XXX: track max endoff manually? */
> > > +	ioend = list_last_entry(&tmp, struct iomap_ioend, io_list);
> > > +	if (((ioend->io_flags & IOMAP_F_SHARED) ||
> > > +	     (ioend->io_type != IOMAP_UNWRITTEN)) &&
> > > +	    xfs_ioend_is_append(ioend)) {
> > > +		ioend->io_flags |= IOMAP_F_APPEND;
> > > +		maxendoff = ioend->io_offset + ioend->io_size;
> > > +	}
> > > +
> > >  	while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend,
> > >  			io_list))) {
> > >  		list_del_init(&ioend->io_list);
> > >  		iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private);
> > > +		ASSERT(ioend->io_offset + ioend->io_size <= maxendoff);
> > >  		xfs_end_ioend(ioend);
> > >  	}
> > >  }
> > 
> > So now when I run a workload that is untarring a large tarball full
> > of small files, we have as many transaction reserve operations
> > running concurrently as there are IO completions queued.
> > 
> 
> This patch has pretty much no effect on a typical untar workload because
> it is dominated by delalloc -> unwritten extent conversions that already
> require completion time transaction reservations. Indeed, a quick test
> to untar a kernel source tree produces no setfilesize events at all.

Great, so it's not the obvious case, thanks to the previous
delalloc->unwritten change. What you need to do now is find out
which workload is generating so many setfilesize completions
despite delalloc->unwritten, so we can understand which workloads
this change will actually impact.

> I'm not sure we have many situations upstream where append transactions
> are used outside of perhaps cow completions (which already have a
> completion time transaction allocation for fork remaps) or intra-block
> file extending writes (that thus produce an inode size change within a
> mapped, already converted block). Otherwise a truncate down should
> always remove post-eof blocks and speculative prealloc originates from
> delalloc, so afaict those should follow the same general sequence. Eh?
> 
> As it is, I think the performance concern is overstated but I'm happy to
> run any tests to confirm or deny that if you want to make more concrete
> suggestions.

As I said: I'm happy to change the code as long as we understand
what workloads it will impact and by how much. We don't know what
workload is generating so many setfilesize transactions yet, so we
can't actually make educated guesses on the wider impact that this
change will have. We also don't have numbers from typical workloads,
just one data point, so nobody actually knows what the impact is.

> This patch is easy to test and pretty much survived an
> overnight regression run (outside of one or two things I have to look
> into..). I'm happy to adjust the approach from there, but I also think
> if it proves necessary there are fallback options (based around the
> original suggestion in my first mail) to preserve the current
> submission-time (serial) append transaction reservation without
> having to categorize/split or partially process the pending ioend
> list.

IO path behaviour changes require more than functional regression
tests; a documented amount of performance regression testing is
required, too. The argument being made here is that "this won't
affect performance", so all I'm asking for is the evidence that
shows this assertion to be true.

This is part of the reason I suggested breaking this up into
separate bug fix and removal patches - a pure bug fix doesn't need
performance regression testing to be done. Further, having the bug
fix separate to changing the behaviour of the code mitigates the
risk of finding an unexpected performance regression from changing
behaviour. Combining the bug fix with a significant change of
behaviour doesn't provide us with a simple method of addressing such
a regression...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 10+ messages
2021-03-26 15:39 xfs ioend batching log reservation deadlock Brian Foster
2021-03-26 17:32 ` Darrick J. Wong
2021-03-26 18:13   ` Brian Foster
2021-03-29  2:28   ` Dave Chinner
2021-03-29 18:04     ` Brian Foster
2021-03-29 23:51       ` Dave Chinner
2021-03-30 14:42         ` Brian Foster
2021-03-30 23:57           ` Dave Chinner [this message]
2021-03-31 12:26             ` Brian Foster
2021-03-29  5:45 ` Christoph Hellwig
