Re: [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim
Date: Sun, 16 Jan 2022 09:40:30 +1100	[thread overview]
Message-ID: <20220115224030.GA59729@dread.disaster.area> (raw)
In-Reply-To: <20220114213043.GB90423@magnolia>

On Fri, Jan 14, 2022 at 01:30:43PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 14, 2022 at 02:45:10PM -0500, Brian Foster wrote:
> > On Fri, Jan 14, 2022 at 09:35:35AM -0800, Darrick J. Wong wrote:
> > > On Fri, Jan 14, 2022 at 09:38:10AM +1100, Dave Chinner wrote:
> > > > On Thu, Jan 13, 2022 at 08:37:01AM -0500, Brian Foster wrote:
> > > > >  STATIC void
> > > > >  xfs_fs_destroy_inode(
> > > > > @@ -764,6 +769,16 @@ xfs_fs_sync_fs(
> > > > >  	 * when the state is either SB_FREEZE_FS or SB_FREEZE_COMPLETE.
> > > > >  	 */
> > > > >  	if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> > > > > +		struct xfs_icwalk	icw = {0};
> > > > > +
> > > > > +		/*
> > > > > +		 * Clear out eofb and cowblocks inodes so eviction while frozen
> > > > > +		 * doesn't leave them sitting in the inactivation queue where
> > > > > +		 * they cannot be processed.
> > > > > +		 */
> > > > > +		icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> > > > > +		xfs_blockgc_free_space(mp, &icw);
> > > > 
> > > > Is a SYNC walk safe to run here? I know we run
> > > > xfs_blockgc_free_space() from XFS_IOC_FREE_EOFBLOCKS under
> > > > SB_FREEZE_WRITE protection, but here we have both frozen writes and
> > > > page faults we're running in a much more constrained freeze context
> > > > here.
> > > > 
> > > > i.e. the SYNC walk will keep busy looping if it can't get the
> > > > IOLOCK_EXCL on an inode that is in cache, so if we end up with an
> > > > inode locked and blocked on SB_FREEZE_WRITE or SB_FREEZE_PAGEFAULT
> > > > for whatever reason this will never return....
> > > 
> > > Are you referring to the case where one could be read()ing from a file
> > > into a buffer that's really a mmap'd page from another file while the
> > > underlying fs is being frozen?
> > > 
> > 
> > I thought this was generally safe as freeze protection sits outside of
> > the locks, but I'm not terribly sure about the read to a mapped buffer
> > case. If that allows an iolock holder to block on a pagefault due to
> > freeze, then SB_FREEZE_PAGEFAULT might be too late for a synchronous
> > scan (i.e. assuming nothing blocks this earlier or prefaults/pins the
> > target buffer)..?
> 
> I think so.  xfs_file_buffered_read takes IOLOCK_SHARED and calls
> filemap_read, which calls copy_page_to_iter.  I /think/ the messy iovec
> code calls copyout, which can then hit a write page fault, which takes
> us to __xfs_filemap_fault.  That takes SB_PAGEFAULT, which is the
> opposite order of what now goes on during a freeze.

Yeah, so this was the sort of thing I was concerned about. I
couldn't put my finger on an actual case, but the general locking
situation felt wrong which is why I asked the question.

That is, normal case for a -write- operation is:

	Prevent write freeze (shared lock!)
	take IO lock (shared or exclusive)
	<page fault>
	Prevent fault freeze (shared)
	take MMAP lock
	do allocation
	Prevent internal freeze (shared)
	run transaction

Which means the stack cannot happen unless a freeze is completely
locked out at the first level and it has to wait for the write
operation to complete before proceeding. So for modification
operations, doing:

	lock out write freeze (excl lock)
	lock out fault freeze (excl lock)
	take IOLOCK excl

would be safe.

However, in the case of a read operation, we don't have that inital
FREEZE_WRITE protection, so the first level an operation might hit
is the page fault freeze protection. Hence we now get inversion
on the write freeze/IOLOCK/fault freeze because we have IOLOCK/fault
freeze without any write freeze protection so this can happen:

task 1				freeze
				freeze write
				.....
				freeze page fault
read()
IOLOCK_SHARED
<page fault>
sb_start_pagefault()
<blocks on fault freeze>
				xfs_blockgc_free_space(SYNC)
				try IOLOCK_EXCL
				<busy loops on failed IOLOCK_EXCL>

So the sync walk here doesn't seem safe. A non-blocking, best-effort
walk would be fine, but busy-looping on inodes we can't lock looks
to be a deadlock vector.

ISTR that we recently wne through a similar readdir vs page fault
locking discussion, so it's not just read() file IO that could
trigger page fault freeze first with an unprotected IOLOCK already
held.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2022-01-15 22:40 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13 13:36 [PATCH 0/2] xfs: a couple misc/small deferred inactivation tweaks Brian Foster
2022-01-13 13:37 ` [PATCH 1/2] xfs: flush inodegc workqueue tasks before cancel Brian Foster
2022-01-13 18:35   ` Darrick J. Wong
2022-01-13 22:19   ` Dave Chinner
2022-01-13 13:37 ` [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim Brian Foster
2022-01-13 17:13   ` Darrick J. Wong
2022-01-13 19:58     ` Brian Foster
2022-01-13 20:43       ` Darrick J. Wong
2022-01-13 21:01         ` Darrick J. Wong
2022-01-13 22:38   ` Dave Chinner
2022-01-14 17:35     ` Darrick J. Wong
2022-01-14 19:45       ` Brian Foster
2022-01-14 21:30         ` Darrick J. Wong
2022-01-15  4:09           ` Darrick J. Wong
2022-01-15 22:40           ` Dave Chinner [this message]
2022-01-17 13:37           ` Brian Foster
2022-01-18 18:56             ` Darrick J. Wong
2022-01-19 20:07               ` Brian Foster
2022-01-20  0:36                 ` Darrick J. Wong
2022-01-20  5:18                   ` Dave Chinner
2022-01-24 16:57                   ` Brian Foster
2022-02-02  2:22                     ` Dave Chinner
2022-02-10 19:03                       ` Brian Foster
2022-02-10 23:08                         ` Dave Chinner
2022-02-15  1:54                           ` Darrick J. Wong
2022-02-15  9:26                             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220115224030.GA59729@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.