Re: xfs_clear_incompat_log_features considered harmful?

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs_clear_incompat_log_features considered harmful?
Date: Mon, 5 Feb 2024 21:30:11 -0800	[thread overview]
Message-ID: <20240206053011.GB6226@frogsfrogsfrogs> (raw)
In-Reply-To: <ZcA1Q5gvboA/uFCC@dread.disaster.area>

On Mon, Feb 05, 2024 at 12:09:23PM +1100, Dave Chinner wrote:
> On Wed, Jan 31, 2024 at 03:00:43PM -0800, Darrick J. Wong wrote:
> > Hi everyone,
> > 
> > Christoph spied the xfs_swapext_can_use_without_log_assistance
> > function[0] in the atomic file updates patchset[1] and wondered why we
> > go through this inverted-bitmask dance to avoid setting the
> > XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature.
> > 
> > (The same principles apply to xfs_attri_can_use_without_log_assistance
> > from the since-merged LARP series.)
> > 
> > The reason for this dance is that xfs_add_incompat_log_feature is an
> > expensive operation -- it forces the log, pushes the AIL, and then if
> > nobody's beaten us to it, sets the feature bit and issues a synchronous
> > write of the primary superblock.  That could be a one-time cost
> > amortized over the life of the filesystem, but the log quiesce and cover
> > operations call xfs_clear_incompat_log_features to remove feature bits
> > opportunistically.  On a moderately loaded filesystem this leads to us
> > cycling those bits on and off over and over, which hurts performance.
> > 
> > Why do we clear the log incompat bits?  Back in ~2020 I think Dave and I
> > had a conversation on IRC[2] about what the log incompat bits represent.
> > IIRC in that conversation we decided that the log incompat bits protect
> > unrecovered log items so that old kernels won't try to recover them and
> > barf.  Since a clean log has no protected log items, we could clear the
> > bits at cover/quiesce time.  At the time, I think we decided to go with
> > this idea because we didn't like the idea of reducing the span of
> > kernels that can recover a filesystem over the lifetime of that
> > filesystem.
> 
> I don't think that was the issue - it was a practical concern that
> having unnecessary log incompat fields set would prevent the common
> practice of provisioning VMs with newer kernels than the host is
> running.

aha, thanks for refreshing my memory.

> The issue arises if the host tries to mount the guest VM image to
> configure the clone of a golden image prior to first start. If there
> are log incompat fields set in the golden image that was generated
> by a newer kernel/OS image builder then the provisioning
> host cannot mount the filesystem even though the log is clean and
> recovery is unnecessary to mount the filesystem.
> 
> Hence on unmount we really want the journal contents based log
> incompat bits cleared because there is nothing incompatible in the
> log and so there is no reason to prevent older kernels from
> mounting the filesytsem.

<nod> Will it be a problem if we crash with a log incompat bit set but
none of the actual log items that it protects in the ondisk log?

> > [ISTR Eric pointing out at some point that adding incompat feature bits
> > at runtime could confuse users who crash and try to recover with Ye Olde
> > Knoppix CD because there's now a log incompat bit set that wasn't there
> > at format time, but my memory is a bit hazy.]
> > 
> > Christoph wondered why I don't just set the log incompat bits at mkfs
> > time, especially for filesystems that have some newer feature set (e.g.
> > parent pointers, metadir, rtgroups...) to avoid the runtime cost of
> > adding the feature flag.  I don't bother with that because of the log
> > clearing behavior.  He also noted that _can_use_without_log_assistance
> > is potentially dangerous if distro vendors backport features to old
> > kernels in a different order than they land in upstream.
> 
> This is what incompat feature bits are for, not log incompat bits.
> We don't need log incompat bits for pp, metaddir, rtgroups, etc
> because older kernels won't even get to log recovery as they see
> incompat feature bits set when they first read the superblock.
> 
> IOWs, a log incompat flag should only be necessary to prevent log
> recovery on older kernels if we change how something is logged
> without otherwise changing the on disk format of those items (e.g.
> LARP). There are no incompat feature bits that are set in these
> cases, so we need a log incompat bit to be set whilst the journal
> has those items in it....

Ok.

> > Another problem with this scheme is the l_incompat_users rwsem that we
> > use to protect a log cleaning operation from clearing a feature bit that
> > a frontend thread is trying to set -- this lock adds another way to fail
> > w.r.t. locking.  For the swapext series, I shard that into multiple
> > locks just to work around the lockdep complaints, and that's fugly.
> 
> We can avoid that simply by not clearing the incompat bit at cover
> time, and instead do it only at unmount. Then it only gets set once
> per mount, and only gets cleared when we are running single threaded
> on unmount. No need for locking at this point holding the superblock
> buffer locked will serialise feature bit addition....

Ok.  I can live with that -- clearing the log-incompat bits on a clean
unmount, and leaving them set at cleaning time.  I think this means
l_incompat_users can go away too.

> > Going forward, I'd make mkfs set the log incompat features during a
> > fresh format if any of the currently-undefined feature bits are set,
> > which means that they'll be enabled by default on any filesystem with
> > directory parent pointers and/or metadata directories.  I'd also add
> > mkfs -l options so that sysadmins can turn it on at format time.
> 
> At this point they are just normal incompat feature bits. Why not
> just use those instead? i.e. Why do we need log incompat bits to be
> permanently sticky when we've already got incompat feature bits for
> that?

Without the "clear log incompat on clean" behavior, I don't think we
need anything I wrote about in that paragraph anymore either.

So, if I'm reading you both correctly, I'll:

1. change the log incompat handling code to clear them on umount only;
2. add a mount option to allow admins to permit setting of log incompat
   flags;
3. leave the xfs_{swapext,attri}_can_use_without_log_assistance
   functions as they are in djwong-wtf so that new incompat filesystem
   features will eliminate the need for setting/clearing the log
   incompat flags

Did I get that right?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

next prev parent reply	other threads:[~2024-02-06  5:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 23:00 xfs_clear_incompat_log_features considered harmful? Darrick J. Wong
2024-02-01  4:33 ` Christoph Hellwig
2024-02-05  1:09 ` Dave Chinner
2024-02-05  6:45   ` Christoph Hellwig
2024-02-06  5:23     ` Darrick J. Wong
2024-02-09  6:33       ` Christoph Hellwig
2024-02-06  5:30   ` Darrick J. Wong [this message]
2024-02-07  0:14     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240206053011.GB6226@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox