From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 01/12] xfs: free COW staging extents when freezing filesystem
Date: Thu, 17 Jan 2019 12:20:39 -0800 [thread overview]
Message-ID: <20190117202039.GH4424@magnolia> (raw)
In-Reply-To: <20190117181406.GF37591@bfoster>
On Thu, Jan 17, 2019 at 01:14:06PM -0500, Brian Foster wrote:
> On Thu, Jan 17, 2019 at 09:24:28AM -0800, Darrick J. Wong wrote:
> > On Fri, Jan 11, 2019 at 11:28:38AM -0500, Brian Foster wrote:
> > > On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > When we're freezing the filesystem, free all the COW staging extents
> > > > before we shut the log down so that we can minimize the amount of
> > > > recovery work that will be necessary during the next mount.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > fs/xfs/xfs_icache.c | 17 ++++++++++++-----
> > > > fs/xfs/xfs_super.c | 18 ++++++++++++++++++
> > > > 2 files changed, 30 insertions(+), 5 deletions(-)
> > > >
> > > >
> > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > index 245483cc282b..36d986087abb 100644
> > > > --- a/fs/xfs/xfs_icache.c
> > > > +++ b/fs/xfs/xfs_icache.c
> > > > @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks(
> > > > void *args)
> > > > {
> > > > struct xfs_eofblocks *eofb = args;
> > > > + uint lock_mode = 0;
> > > > int match;
> > > > int ret = 0;
> > > >
> > > > @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks(
> > > > return 0;
> > > > }
> > > >
> > > > - /* Free the CoW blocks */
> > > > - xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > > - xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > > + /*
> > > > + * Free the CoW blocks. We don't need to lock the inode if we're in
> > > > + * the process of freezing the filesystem because we've already locked
> > > > + * out writes and page faults.
> > > > + */
> > > > + if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN)
> > > > + lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > >
> > > Ok, but is there a problem with actually locking in this context? If so,
> > > I'd expect the comment to say something like: "Free the COW blocks.
> > > Don't lock the inode if we're in the process of freezing the filesystem
> > > because <some bad thing happens>. This is safe because we've already
> > > locked out writes and page faults."
> >
> > Hmmm, it's now been so long since I wrote this patch I don't remember
> > why I did this. Wait something's coming back...
> >
> > > > + if (lock_mode)
> > > > + xfs_ilock(ip, lock_mode);
> > > >
> > > > /*
> > > > * Check again, nobody else should be able to dirty blocks or change
> > > > @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks(
> > > > if (xfs_prep_free_cowblocks(ip))
> > > > ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > >
> > > > - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > > - xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > > + if (lock_mode)
> > > > + xfs_iunlock(ip, lock_mode);
> > > >
> > > > return ret;
> > > > }
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index 6903b36eac5d..bb4953cfd683 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs(
> > > > if (!wait)
> > > > return 0;
> > > >
> > > > + /*
> > > > + * If we're in the middle of freezing the filesystem, this is our last
> > > > + * chance to run regular transactions. Clear all the COW staging
> > > > + * extents so that we can freeze the filesystem with as little recovery
> > > > + * work to do at the next mount as possible. It's safe to do this
> > > > + * without locking inodes because the freezer code flushed all the
> > > > + * dirty data from the page cache and locked out writes and page faults.
> > > > + */
> > >
> > > Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb
> > > freeze flag or some such to notify the scan that we're in a special
> > > freeze context and in turn use that to tweak the locking (instead of the
> > > SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would
> > > that allow us to run this in xfs_fs_freeze() context?
> >
> > ...oh, right. Another thread could have taken the io/mmap locks (say
> > for fallocate) before the fs entered SB_FROZEN, and is now blocked
> > in xfs_trans_alloc waiting for the fs to unfreeze. I can't remember the
> > exact circumstances though.
> >
>
> Ok, so we'd basically deadlock on the freeze IIUC. That's exactly what
> we'd want to document in the comment. :P
>
> > > I'm also a little curious about the high-level tradeoff we're making.
> > > This patch means that a freeze, and thus something like an LVM snapshot,
> > > would clear/reset all COW blocks in the fs, right? If so, ISTM that
> > > could be annoying if the user had a bunch of reflinked vdisk images or
> > > something on the fs.
> >
> > Yeah, it's sort of painful to drop all those speculative preallocations.
> >
> > > Is the tradeoff just that if the user freezes and then doesn't unfreeze
> > > before the next mount that log recovery will have to deal with COW
> > > blocks, or are there considerations for a typical freeze/unfreeze cycle
> > > as well? Is this more expensive at recovery time than at freeze time?
> >
> > It's entirely to reduce the recovery work necessary after freezing the
> > fs, snapshotting the fs (via lvm or whatever) and then mounting the
> > snapshot separately. TBH this patch has languished for years now
> > because it's only necessary to optimize this one usage.
> >
>
> So the snapshot will run log recovery and end up freeing all of these
> extents "the hard way," right?
Yes. I suspect it's faster to let log recovery clean it up because it
doesn't have to walk the cowblocks inodes to free cow extents piece by
piece.
> > Maybe it's easier to drop this one, unless anyone feels particularly
> > passionate about clearing out cow staging extents before a freeze?
> >
>
> Not really.
>
> On principle, it seems more appropriate to me to put the work onto the
> snapshot rather than disrupt the primary fs. I could see doing something
> like this without much thought if the blocks were already freed or
> something and the change was thus invisible to a user, but ISTM that
> losing the preallocation can have a significant affect on the primary
> workload.
Agreed.
> Then again, this doesn't affect me much and I'm not terribly familiar
> with the use case or whether there have been complaints or anything. I
> don't feel strongly about it so long as the tradeoffs are documented
> somewhere so it's clear what needs fixing if this behavior became
> problematic down the road.
Welll... if nobody else yells I'm happy to set this patch aside and let
the code be preserved in mailing list lore for a while.
--D
> Brian
>
> > --D
> >
> > > Brian
> > >
> > > > + if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) {
> > > > + int error;
> > > > +
> > > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > > + if (error) {
> > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > + return error;
> > > > + }
> > > > + }
> > > > +
> > > > xfs_log_force(mp, XFS_LOG_SYNC);
> > > > if (laptop_mode) {
> > > > /*
> > > >
next prev parent reply other threads:[~2019-01-17 20:20 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-01 2:16 [PATCH 00/12] xfs: deferred inode inactivation Darrick J. Wong
2019-01-01 2:16 ` [PATCH 01/12] xfs: free COW staging extents when freezing filesystem Darrick J. Wong
2019-01-11 16:28 ` Brian Foster
2019-01-17 17:24 ` Darrick J. Wong
2019-01-17 18:14 ` Brian Foster
2019-01-17 20:20 ` Darrick J. Wong [this message]
2019-01-01 2:17 ` [PATCH 02/12] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
2019-01-11 19:05 ` Brian Foster
2019-01-17 17:33 ` Darrick J. Wong
2019-01-01 2:17 ` [PATCH 03/12] xfs: decide if inode needs inactivation Darrick J. Wong
2019-01-01 2:17 ` [PATCH 04/12] xfs: track unlinked inactive inode fs summary counters Darrick J. Wong
2019-01-01 2:17 ` [PATCH 05/12] xfs: track unlinked inactive inode quota counters Darrick J. Wong
2019-01-01 2:17 ` [PATCH 06/12] xfs: refactor walking of per-AG RECLAIM inodes Darrick J. Wong
2019-01-11 19:06 ` Brian Foster
2019-01-17 17:43 ` Darrick J. Wong
2019-01-01 2:17 ` [PATCH 07/12] xfs: refactor eofblocks inode match code Darrick J. Wong
2019-01-02 9:50 ` Nikolay Borisov
2019-01-17 18:05 ` Darrick J. Wong
2019-01-01 2:17 ` [PATCH 08/12] xfs: deferred inode inactivation Darrick J. Wong
2019-01-01 2:17 ` [PATCH 09/12] xfs: retry fs writes when there isn't space Darrick J. Wong
2019-01-01 2:17 ` [PATCH 10/12] xfs: force inactivation before fallocate when space is low Darrick J. Wong
2019-01-01 2:17 ` [PATCH 11/12] xfs: parallelize inode inactivation Darrick J. Wong
2019-01-01 2:18 ` [PATCH 12/12] xfs: wait for deferred inactivation when destroying unlinked inodes Darrick J. Wong
2019-01-03 12:46 ` Dave Chinner
2019-01-17 18:41 ` Darrick J. Wong
2019-01-17 22:21 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190117202039.GH4424@magnolia \
--to=darrick.wong@oracle.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).