From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: v5 filesystem corruption due to log recovery lsn ordering
Date: Mon, 24 Aug 2015 10:10:35 +1000 [thread overview]
Message-ID: <20150824001035.GZ714@dastard> (raw)
In-Reply-To: <20150821143923.GC46372@bfoster.bfoster>
On Fri, Aug 21, 2015 at 10:39:23AM -0400, Brian Foster wrote:
> On Fri, Aug 21, 2015 at 10:34:49AM +1000, Dave Chinner wrote:
> > On Thu, Aug 20, 2015 at 12:25:29PM -0400, Brian Foster wrote:
> > > On Thu, Aug 20, 2015 at 08:44:53AM +1000, Dave Chinner wrote:
> > > That's an interesting idea, but I wonder if it truly fixes the problem
> > > or just makes it potentially more difficult to reproduce. One concern is
> > > that even with this in place, all it takes to reintroduce the problem is
> > > for a filesystem to run a bit on an older kernel in a situation where
> > > the LSN is reset and enough modification has occurred to update whatever
> > > key metadata we determine as important with the reset LSNs.
> >
> > We can't prevent that - that horse has already bolted.
>
> Well, the feature bit can potentially do that. I'm not terribly fond of
> it (overkill) and the goal was't to fix older kernels so much as to
> prevent the problem from being reintroduced on newer kernels going
> forward (provided newer kernels implemented this mount-time LSN
> inheritance scheme).
We can prevent the problem of old userspace on newer kernels by
detecting a zeroed log - we don't need a feature bit for that.
> In other words, the proposed solution depends on trust of the metadata
> LSNs. One way we can trust those metadata LSNs is to prevent older
> kernels/xfsprogs from messing with them.
Well, it's old userspace we need to worry about here - kernels don't
zero the log...
> That said, I don't really agree with the assertion that upgraded
> userspace alone sufficiently alleviates the problem. A filesystem that's
> been around a while and has more recently been repaired is susceptible
> to this problem for the foreseeable future, upgraded userspace or not,
> until either all metadata with stale LSNs is updated or the problem is
> identified and repaired explicitly.
But there's nothing we can do in userspace to fix that. More on that
below.
> > > Another concern is that we're assuming that the key metadata will always
> > > have the most recent LSN. I think the closest thing to a guarantee we
> > > have of that is the superblock being updated on umount and every so
> > > often by xfs_log_worker() to cover the log. After a bit of playing
> > > around, I'm not so sure that is enough.
> >
> > Sure, I just threw it out as a way to get a more recent LSN. The
> > only way to reliably get the highest LSN is to walk all the metadata
> > in the filesystem. We can't do that at mount time, so perhaps it
> > is best to just refuse to mount read-write and issue a warning to
> > upgrade xfsprogs and re-run xfs_repair.
> >
>
> Indeed, that sounds like a nicer approach to me. We're really only
> considering the superblock at mount time though (note that repair resets
> AG header LSNs), so we're not going to be guaranteed the mount fails for
> all affected fs'. What do you think about also firing a (one-time,
> probably) warning if we ever update a bit of metadata with a smaller
> LSN?
Larger LSN, you mean? ;)
But, yes, that is what I was thinking for solving the "horse has
already bolted" condition you described above. We can detect LSNs
beyond the current log in the metadata we read from disk and issue a
warning. e.g in each metadata verifiers we add a:
if (xfs_verify_lsn(mp, hdr->lsn))
return false;
then we can warn immediately that this is a problem. If we discover
it during log recovery (i.e. lsn beyond the current head in the log
in a block that passes all the other metadata self checks) then we
can shut down or force the filesystem into read-only mode so that
the user has to update xfsprogs and run repair to fix it because we
can't safely run log recovery at this point.
> That bit of metadata is now "fixed" once written out, but the point is
> that the log item for this metadata was probably not effective and to
> notify the user that the fs' LSNs are potentially out of whack and a
> repair is in order.
That's true - if we don't hit a bad lsn in recovery, we just issue a
one-time warning from the verifiers and continue onwards. If the
user is lucky, the problem will fix itself before it is ever a
problem for recovery.
> > So, to repair: have it record LSNs as it walks the metadata and
> > at the start of phase 5 have it select a new cycle number higher
> > than the highest LSN found in the filesystem. Stamp the log with
> > that cycle number as well as all metadata that is being rebuilt
> > in phase 5, and all the superblocks.
> >
> > Problem solved.
> >
>
> Yeah, I was thinking of just using the current head of the log, as
> described in my random note. Using the LSN from the metadata is probably
> necessary in the cases where the log might be corrupted, however.
And as a minor detail, I'd take the cycle number from the highest
LSN and bump it a few cycles just to make sure that the new LSN is
well beyond anything in the filesystem...
> The solution of detection in the kernel and a fix in repair sounds
> pretty good to me provided the detection is robust (e.g., the point
> above wrt a warning). Then we have something in current kernels that can
> effectively handle the problem regardless of how it might have been
> (re)introduced and it's fairly simple as well.
*nod*
I think we have a plan :)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-08-24 0:12 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-19 18:39 v5 filesystem corruption due to log recovery lsn ordering Brian Foster
2015-08-19 22:44 ` Dave Chinner
2015-08-20 16:25 ` Brian Foster
2015-08-21 0:34 ` Dave Chinner
2015-08-21 14:39 ` Brian Foster
2015-08-24 0:10 ` Dave Chinner [this message]
2015-08-24 12:34 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150824001035.GZ714@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.