Re: [PATCH v2 0/4] xfs: fix v5 AGFL wrapping

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 0/4] xfs: fix v5 AGFL wrapping
Date: Wed, 21 Feb 2018 09:35:46 -0800	[thread overview]
Message-ID: <20180221173546.GL27629@magnolia> (raw)
In-Reply-To: <20180221133915.GA29048@bfoster.bfoster>

On Wed, Feb 21, 2018 at 08:39:16AM -0500, Brian Foster wrote:
> On Tue, Feb 20, 2018 at 09:08:30AM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 19, 2018 at 08:54:57AM -0500, Brian Foster wrote:
> > > On Fri, Feb 16, 2018 at 09:57:01AM -0800, Darrick J. Wong wrote:
> > > > On Thu, Feb 15, 2018 at 08:16:30AM -0500, Brian Foster wrote:
> > > > > On Wed, Feb 14, 2018 at 06:05:48PM -0800, Darrick J. Wong wrote:
> > > > > > On Wed, Feb 14, 2018 at 02:56:51PM -0500, Brian Foster wrote:
> ...
> > > > > 
> > > > > Ok, so the intent is to "protect" those old kernels from a filesystem
> > > > > that has seen a "fixed" kernel so the _old_ kernel doesn't explode on a
> > > > > filesystem that has a wrapped agfl from a fixed kernel. That sounds
> > > > > reasonable.. I guess what concerns me is that the bit doesn't seem to be
> > > > > used in that manner. A user can always run a fixed kernel, wrap the
> > > > > agfl, then go back to the bad one and cry when it explodes.
> > > > > 
> > > > > IOW, if the goal is this kind of retroactive protection, I'd expect the
> > > > > bit to be enabled any time the filesystem is in a state that would cause
> > > > > problems on the old kernel. Technically, that could mean doing something
> > > > > like setting the bit any time an AGFL is in a wrapped state and clearing
> > > > > it when that state clears. Or perhaps setting it unconditionally on
> > > > > mount and leaving it permanently would be another way to satisfy the
> > > > > requirement, though certainly more heavy handed.
> > > > 
> > > > The previous version of this series did that, though Dave suggested I
> > > > change it to set the bit only if we actually touched an agfl.  I
> > > > probably should have pushed back harder, but it was only rfc at the
> > > > time.
> > > > 
> > > 
> > > It's not clear which of the above options you're referring to. FWIW, the
> > > former seems notably more usable to me than the latter.
> > 
> > Sorry about that.  The previous iteration would do:
> > 
> > if (!has_fixedagfl) {
> > 	fix_agfls();
> > 	set_fixedagfl();
> > }
> > 
> > Which probably doesn't satisfy your usability test. :)
> > 
> > Hmm... another possible strategy would be to fix the agfls and set the
> > fixedagfl bit at mount time, then at unmount/freeze/remount-ro we fix
> > the agfls again and clear the fixedagfl bit.  Then the only way an
> > unpatched kernel hits this is if we crash with a patched kernel and the
> > recovery mount is an unpatched kernel.
> > 
> 
> What's the purpose of "fixing the agfls again" at unmount time? Hasn't
> the patched kernel already addressed the size mismatch at mount time?

Sorry, my wording there didn't match what was in my head.  I'll try
again:

At mount/remount-rw time:
if agfl list is wrapped assuming the shorter agfl length:
	fix wrapping to fit the longer 4.5+ agfl length
set fixedagfl bit

At unmount/freeze/remount-ro time:
if agfl list touches the last agfl item:
	reset agfl list to the start of the agfl
clear fixedagfl bit

This way, we're only using the rocompat bit to prevent unpatched kernels
from *recovering* filesystems that might have an agfl wrap that it
doesn't understand.

> > What does everyone think of this strategy?  I think I like it the most
> > of all the things we've put forth because the only time anyone will trip
> > over this rocompat bit is this one specific scenario.
> > 
> > (Apologies if we've already talked about this, I've gotten lost in this
> > thread.)
> > 
> ...
> > > > > 
> > > > > Hmm, I still think my preferred approach is to mount time detect,
> > > > > enforce read-only and tell the user to xfs_repair[1]. It's the most
> > > > 
> > > > I think this is difficult to do for the root fs -- in general I don't
> > > > think initrds ship with xfs_repair and the logic to detect the "refuses
> > > > to mount rw" condition and react to that with xfs_repair and a second
> > > > mount attempt.
> > > > 
> > > 
> > > Good point. I guess what concerns me is leaving around such highly
> > > specialized code. I figured an error check is less risk than a function
> > > that modifies the fs. Do we have a test case that would exercise this
> > > codepath going forward, at least?
> > 
> > I did write xfstests for the general case, though apparently I've been
> > lagging xfstests and haven't sent it.  :/
> > 
> > Oh, right, we're still discussing semantics & existence of an rocompat
> > bit, which affects the golden output.
> > 
> 
> Ok..
> 
> > > > Also I think this doesn't work if we do an initial ro mount and then try
> > > > to remount-rw...
> > > > 
> > > 
> > > That one seems like it would just require a bit more code. E.g., we'd
> > > have to cache or re-check state on ro -> rw transitions. That's
> > > secondary to the rootfs use case justification, though..
> > 
> > <nod>
> > 
> > > > > simple implementation as we don't have to carry fixup code in the kernel
> > > > > for such an uncommon situation (which facilitates a broad backport
> > > > > strategy). It protects the users on stable kernels who pull in the
> > > > > latest bug fixes, including the guy who goes from a bad kernel to a good
> > > > > one, and all kernels going forward without having to use a feature bit.
> > > > > We don't do anything for the guy who wraps the agfl on newer kernels and
> > > > > goes back to the bad one, but at some point he just needs to update the
> > > > > broken kernel.
> > > > > 
> > > > > [1] I guess doing a mount time fix up vs. a read-only mount + warning is
> > > > > a separate question from the feature bit. I'd prefer the latter for
> > > > > simplicity, but perhaps there are use cases where the benefit outweighs
> > > > > the risk.
> > > > > 
> > > > > Anyways, that's just my .02. There is no real great option here, I
> > > > > guess. I just think that at some point maybe it's best to fix all the
> > > > > kernels we can to handle the size mismatch, live with the fact that some
> > > > > users might have those old, unfixed kernels and bonk them on the head to
> > > > > upgrade when they (hopefully decreasingly) pop up. Perhaps others can
> > > > > chime in with more thoughts..
> > > > 
> > > > Yeah, we could do that too (make all the next releases of
> > > > rhel/ol/debian/ubuntu/sles/whatever quietly fix the bad-wrap agfls and
> > > > skip the feature bit) and just let the unpatched kernels rot.  It keeps
> > > > coming up, though...
> > > > 
> > > 
> > > I guess I haven't really noticed it enough to consider it an ongoing
> > > issue (which doesn't mean it isn't :P). Any pattern to the use cache
> > > behind these continued reports..?
> > 
> > No particular pattern, other than using/freeing agfl blocks.
> > 
> 
> I'm more curious about the use case than the workload. E.g., what's the
> purpose for using two kernels? Was it bad luck on an upgrade or
> something that implies the problematic state could reoccur (i.e., active
> switching between good/bad kernels)?
> 
> IOW, if the "it keeps coming up" situation is simply because the
> existing userbase hasn't fully upgraded yet, then a simple detect/repair
> forward-fix patch is going to be sufficient (perhaps with a warning not
> to revert to $previous kernel) to resolve the problem for the remaining
> set of users who are susceptible to the problem.
> 
> If the problem is instead a set of users that are switching back and
> forth for whatever reason, then perhaps a feature bit policy is more
> justified.

$employer has customers that want or have to run mixed environments.

Not many, though.

> > > From an upstream perspective, RHEL continuing to operate in this mode
> > > certainly does create an ongoing situation where a "current" distro
> > > has/causes compatibility issues, particularly since some other
> > > downstream distros may have kernels that use the upstream/fixed format.
> > > 
> > > I'll reiterate that in principle I don't think downstream decisions
> > > should prohibit doing the right thing upstream, but for somebody on a
> > > downstream distro who happens to test an upstream kernel (say as part of
> > > a regression test/investigation), having the upstream kernel decide the
> > > downstream kernel is no longer allowed to mount the fs seems kind of
> > > mean. :P
> > 
> > Agreed, though agfl-related crashes are also mean. :/
> > 
> 
> Yeah, but you can recover from that. In the use case above, the feature
> bit has permanently prevented the user from going back to the distro
> kernel. So this is one semi-realistic "switching between kernels" use
> case where I think this feature bit actually hurts more than it helps.
> If it were a big enough problem, I'd much rather have the downstream
> kernel implement the feature bit to indicate that the upstream kernel
> shouldn't/can't mount this filesystem than retroactively doing the
> opposite. Even then, I still think I'd prefer the downstream kernel just
> detect, fail to mount and encourage repair so we don't have to support
> the strange transition from a clearly unsupported sequence (but this is
> all downstream/distro discussion).
> 
> FWIW, if you actually want to make progress on this in absense of any
> other feedback, I think the best approach is to refactor this series to
> separate the detect/repair mechanism from all of this feature bit policy
> stuff. The former all seems reasonable/justified to me based on your
> explanations, so I'd be happy to review those portions of it without the
> feature bit (which could still be tacked on for further review,
> reordered appropriately on merge if ultimately necessary, etc.) and
> consider the feature bit separately based on justification for added
> benefit over the former.

Ok, I'll split out the part that fixes the AGFL and we can review those
parts separately from the feature bit stuff.

--D

> 
> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > 
> > > > > Brian
> > > > > 
> > > > > > > caused this in the first place ..." Then we write a FAQ entry that
> > > > > > > elaborates on how/why a user hit that problem and backport to stable
> > > > > > > kernels as far and wide as possible. Hm?
> > > > > > 
> > > > > > I /do/ like the part about writing FAQ to update the kernel.
> > > > > > 
> > > > > > --D
> > > > > > 
> > > > > > > Brian
> > > > > > > 
> > > > > > > > --D
> > > > > > > > --
> > > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > > the body of a message to majordomo@vger.kernel.org
> > > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > > --
> > > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > > the body of a message to majordomo@vger.kernel.org
> > > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > the body of a message to majordomo@vger.kernel.org
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2018-02-21 17:35 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-14 18:12 [PATCH v2 0/4] xfs: fix v5 AGFL wrapping Darrick J. Wong
2018-02-14 18:12 ` [PATCH 1/4] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
2018-02-14 18:12 ` [PATCH 2/4] xfs: introduce 'fixed agfl' feature Darrick J. Wong
2018-02-14 18:12 ` [PATCH 3/4] xfs: fix agfl wrapping on v5 filesystems Darrick J. Wong
2018-02-14 18:12 ` [PATCH 4/4] xfs: enable fixed agfl feature Darrick J. Wong
2018-02-14 19:56 ` [PATCH v2 0/4] xfs: fix v5 AGFL wrapping Brian Foster
2018-02-15  2:05   ` Darrick J. Wong
2018-02-15 13:16     ` Brian Foster
2018-02-16 17:57       ` Darrick J. Wong
2018-02-19 13:54         ` Brian Foster
2018-02-20 17:08           ` Darrick J. Wong
2018-02-21 13:39             ` Brian Foster
2018-02-21 17:35               ` Darrick J. Wong [this message]
2018-02-22 11:55                 ` Brian Foster
2018-02-22 18:06                   ` Darrick J. Wong
2018-02-22 18:11                     ` Darrick J. Wong
2018-02-22 18:28                       ` Brian Foster
2018-02-21 21:16               ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180221173546.GL27629@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).