From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs_clear_incompat_log_features considered harmful?
Date: Mon, 5 Feb 2024 21:30:11 -0800 [thread overview]
Message-ID: <20240206053011.GB6226@frogsfrogsfrogs> (raw)
In-Reply-To: <ZcA1Q5gvboA/uFCC@dread.disaster.area>
On Mon, Feb 05, 2024 at 12:09:23PM +1100, Dave Chinner wrote:
> On Wed, Jan 31, 2024 at 03:00:43PM -0800, Darrick J. Wong wrote:
> > Hi everyone,
> >
> > Christoph spied the xfs_swapext_can_use_without_log_assistance
> > function[0] in the atomic file updates patchset[1] and wondered why we
> > go through this inverted-bitmask dance to avoid setting the
> > XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature.
> >
> > (The same principles apply to xfs_attri_can_use_without_log_assistance
> > from the since-merged LARP series.)
> >
> > The reason for this dance is that xfs_add_incompat_log_feature is an
> > expensive operation -- it forces the log, pushes the AIL, and then if
> > nobody's beaten us to it, sets the feature bit and issues a synchronous
> > write of the primary superblock. That could be a one-time cost
> > amortized over the life of the filesystem, but the log quiesce and cover
> > operations call xfs_clear_incompat_log_features to remove feature bits
> > opportunistically. On a moderately loaded filesystem this leads to us
> > cycling those bits on and off over and over, which hurts performance.
> >
> > Why do we clear the log incompat bits? Back in ~2020 I think Dave and I
> > had a conversation on IRC[2] about what the log incompat bits represent.
> > IIRC in that conversation we decided that the log incompat bits protect
> > unrecovered log items so that old kernels won't try to recover them and
> > barf. Since a clean log has no protected log items, we could clear the
> > bits at cover/quiesce time. At the time, I think we decided to go with
> > this idea because we didn't like the idea of reducing the span of
> > kernels that can recover a filesystem over the lifetime of that
> > filesystem.
>
> I don't think that was the issue - it was a practical concern that
> having unnecessary log incompat fields set would prevent the common
> practice of provisioning VMs with newer kernels than the host is
> running.
aha, thanks for refreshing my memory.
> The issue arises if the host tries to mount the guest VM image to
> configure the clone of a golden image prior to first start. If there
> are log incompat fields set in the golden image that was generated
> by a newer kernel/OS image builder then the provisioning
> host cannot mount the filesystem even though the log is clean and
> recovery is unnecessary to mount the filesystem.
>
> Hence on unmount we really want the journal contents based log
> incompat bits cleared because there is nothing incompatible in the
> log and so there is no reason to prevent older kernels from
> mounting the filesytsem.
<nod> Will it be a problem if we crash with a log incompat bit set but
none of the actual log items that it protects in the ondisk log?
> > [ISTR Eric pointing out at some point that adding incompat feature bits
> > at runtime could confuse users who crash and try to recover with Ye Olde
> > Knoppix CD because there's now a log incompat bit set that wasn't there
> > at format time, but my memory is a bit hazy.]
> >
> > Christoph wondered why I don't just set the log incompat bits at mkfs
> > time, especially for filesystems that have some newer feature set (e.g.
> > parent pointers, metadir, rtgroups...) to avoid the runtime cost of
> > adding the feature flag. I don't bother with that because of the log
> > clearing behavior. He also noted that _can_use_without_log_assistance
> > is potentially dangerous if distro vendors backport features to old
> > kernels in a different order than they land in upstream.
>
> This is what incompat feature bits are for, not log incompat bits.
> We don't need log incompat bits for pp, metaddir, rtgroups, etc
> because older kernels won't even get to log recovery as they see
> incompat feature bits set when they first read the superblock.
>
> IOWs, a log incompat flag should only be necessary to prevent log
> recovery on older kernels if we change how something is logged
> without otherwise changing the on disk format of those items (e.g.
> LARP). There are no incompat feature bits that are set in these
> cases, so we need a log incompat bit to be set whilst the journal
> has those items in it....
Ok.
> > Another problem with this scheme is the l_incompat_users rwsem that we
> > use to protect a log cleaning operation from clearing a feature bit that
> > a frontend thread is trying to set -- this lock adds another way to fail
> > w.r.t. locking. For the swapext series, I shard that into multiple
> > locks just to work around the lockdep complaints, and that's fugly.
>
> We can avoid that simply by not clearing the incompat bit at cover
> time, and instead do it only at unmount. Then it only gets set once
> per mount, and only gets cleared when we are running single threaded
> on unmount. No need for locking at this point holding the superblock
> buffer locked will serialise feature bit addition....
Ok. I can live with that -- clearing the log-incompat bits on a clean
unmount, and leaving them set at cleaning time. I think this means
l_incompat_users can go away too.
> > Going forward, I'd make mkfs set the log incompat features during a
> > fresh format if any of the currently-undefined feature bits are set,
> > which means that they'll be enabled by default on any filesystem with
> > directory parent pointers and/or metadata directories. I'd also add
> > mkfs -l options so that sysadmins can turn it on at format time.
>
> At this point they are just normal incompat feature bits. Why not
> just use those instead? i.e. Why do we need log incompat bits to be
> permanently sticky when we've already got incompat feature bits for
> that?
Without the "clear log incompat on clean" behavior, I don't think we
need anything I wrote about in that paragraph anymore either.
So, if I'm reading you both correctly, I'll:
1. change the log incompat handling code to clear them on umount only;
2. add a mount option to allow admins to permit setting of log incompat
flags;
3. leave the xfs_{swapext,attri}_can_use_without_log_assistance
functions as they are in djwong-wtf so that new incompat filesystem
features will eliminate the need for setting/clearing the log
incompat flags
Did I get that right?
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2024-02-06 5:30 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 23:00 xfs_clear_incompat_log_features considered harmful? Darrick J. Wong
2024-02-01 4:33 ` Christoph Hellwig
2024-02-05 1:09 ` Dave Chinner
2024-02-05 6:45 ` Christoph Hellwig
2024-02-06 5:23 ` Darrick J. Wong
2024-02-09 6:33 ` Christoph Hellwig
2024-02-06 5:30 ` Darrick J. Wong [this message]
2024-02-07 0:14 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240206053011.GB6226@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox