From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Darrick Wong <darrick.wong@oracle.com>,
Srikanth C S <srikanth.c.s@oracle.com>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
Rajesh Sivaramasubramaniom
<rajesh.sivaramasubramaniom@oracle.com>,
Junxiao Bi <junxiao.bi@oracle.com>
Subject: Re: [PATCH] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
Date: Tue, 11 Oct 2022 09:08:00 -0700 [thread overview]
Message-ID: <Y0WU4FANNYTKNXzL@magnolia> (raw)
In-Reply-To: <20221011042602.GX3600936@dread.disaster.area>
On Tue, Oct 11, 2022 at 03:26:02PM +1100, Dave Chinner wrote:
> On Mon, Oct 10, 2022 at 05:30:48PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 11, 2022 at 10:29:32AM +1100, Dave Chinner wrote:
> > > On Mon, Oct 10, 2022 at 03:40:51PM +0000, Darrick Wong wrote:
> > > > LGTM, want to send this to the upstream list to start that discussion?
> >
> > UGH, so I thought this was an internal thread, but it turns out that
> > linux-xfs has been cc'd for a while but none of the messages made it to
> > lore.
> >
> > I'll fill in some missing context below.
> >
> > > > --D
> > > >
> > > > ________________________________________
> > > > From: Srikanth C S <srikanth.c.s@oracle.com>
> > > > Sent: Monday, October 10, 2022 08:24
> > > > To: linux-xfs@vger.kernel.org; Darrick Wong
> > > > Cc: Rajesh Sivaramasubramaniom; Junxiao Bi
> > > > Subject: [PATCH] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
> > > >
> > > > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > > > possible that when the machine crashes, the fs is in inconsistent
> > > > state with the journal log not yet replayed. This can put the
> > > > machine into rescue shell. To address this problem, mount and
> > > > umount the fs before running xfs_repair.
> > >
> > > What's the purpose of forcing xfs_repair to be run on every boot?
> > > The whole point of having a journalling filesystem is to avoid
> > > needing to run fsck on every boot.
> >
> > I don't think repair-at-every-boot here is the overall goal for our
> > customer base.
> >
> > We've had some <cough> major support events over the last few months.
> > There are two things going on here: some of the problems are due to are
> > datacenters abending, and the rest of it are the pile of data corruption
>
> /me has to look up what "abending" means in this context, because I
> don't think you mean that the data denter was "Absent By Enforced
> Net Deprivation"...
>
> Ah, it's an ancient abbreviation from the early days of IBM
> mainframes meaning "abnormal end of task". i.e. something crashed.
Yes.
> > problems that you and I and Chandan have been working through for months
> > now.
> >
> > This means that customer support has 30,000 VMs to reboot. 90% of the
> > systems involved seem to have survived more or less intact, but that
> > leaves 10% of them with latent errors, unmountable root filesystems,
> > etc. They probably have even more than that, but I don't recommend
> > inviting the G-men for a visit to find out the real sum.
> >
> > Since these machines are remotely manageable, support /can/ inject the
> > kernel command line with 'fsck.mode=force' to kick off xfs_repair if the
> > machine won't come up or if they suspect there might be deeper issues
> > with latent errors in the fs metadata, which is what they did to try to
> > get everyone running ASAP while anticipating any future problems...
>
> This context is kinda important :)
>
> [....]
>
> > > More explanation, please!
> >
> > Frankly, I /don't/ want to expend a lot of time wringing our hands over
> > how exactly do we hammer 2022 XFS tools into 1994 ext2 behavioral
> > semantics.
>
> Neither do I - I just want to understand the problem that the
> change is trying to solve. The commit message gave no indication of
> why the change was being proposed; commit messages need to tell the
> whole story, not describe the code change being made. If it started
> like:
>
> "After a recent data center crash, we recently had to recover root
> filesystems on several thousand VMs via a boot time fsck. Many of
> them failed unnecessarily because.... "
>
> then it becomes self evident that replaying the log in these
> situations is necessary.
<nod> Srikanth, would you reword the patch message to include some of
the details about why we want this patch?
> > What they really want is online fsck to detect and fix problems in real
> > time, but I can't seem to engage the community on how exactly do we land
> > this thing now that I've finished writing it.
>
> I only want to look at the interfacing with the runtime structures
> and locking to ensure we don't back ourselves into a nasty corner.
> This is really only a small part of the bigger online repair
> patchset. Once we've got that sorted, I think we should just commit
> the rest of it as it stands (i.e. tested but unreviewed) and fix as
> needed, just like we do with the userspace xfs_repair code...
Ok.
> > --D
> >
> > > > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > > > ---
> > > > fsck/xfs_fsck.sh | 20 ++++++++++++++++++--
> > > > 1 file changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > > > index 6af0f22..21a8c19 100755
> > > > --- a/fsck/xfs_fsck.sh
> > > > +++ b/fsck/xfs_fsck.sh
> > > > @@ -63,8 +63,24 @@ if [ -n "$PS1" -o -t 0 ]; then
> > > > fi
> > > >
> > > > if $FORCE; then
> > > > - xfs_repair -e $DEV
> > > > - repair2fsck_code $?
> > > > + if $AUTO; then
> > > > + xfs_repair -e $DEV
> > > > + error=$?
> > > > + if [ $error -eq 2 ]; then
To reiterate something earlier in the thread: I would only do the
mount/umount if the caller passes -fy. I would drop to the emergency
shell for -fa or -f.
(Also: whitespace damage)
> > > > + echo "Replaying log for $DEV"
> > > > + mkdir -p /tmp/tmp_mnt
> > > > + mount $DEV /tmp/tmp_mnt
>
> Need to handle mount failure here - this will need to drop to
> a rescue shell for manual recovery, I think.
Yep.
> Also, what happens if there are mount options set like quota?
> Doesn't this make more work for the recovery process if we elide
> things like that here?
Agreed, I think we need to extract rootflags= from /proc/cmdline.
> > > > + umount /tmp/tmp_mnt
> > > > + xfs_repair -e $DEV
> > > > + error=$?
> > > > + rmdir /tmp/tmp_mnt
> > > > + fi
> > > > + else
> > > > + #fsck.mode=force is set but fsck.repair=no
> > > > + xfs_repair -n $DEV
> > > > + error=$?
> > > > + fi
>
> I think that adding a "check only" mode should be a separate patch.
> We have to consider different things here, such as it will run on an
> unrecovered filesystem and potentially report it damaged when, in
> fact, all that is needed is log recovery to run....
<nod> I'm not sure what we should return in that case. Maybe 8 for
operational error?
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
prev parent reply other threads:[~2022-10-11 16:08 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <MWHPR10MB1486754F03696347F4E7FEE5A3209@MWHPR10MB1486.namprd10.prod.outlook.com>
2022-10-10 15:40 ` [PATCH] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Darrick Wong
2022-10-10 23:29 ` Dave Chinner
2022-10-11 0:30 ` Darrick J. Wong
2022-10-11 4:26 ` Dave Chinner
2022-10-11 16:08 ` Darrick J. Wong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y0WU4FANNYTKNXzL@magnolia \
--to=djwong@kernel.org \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=junxiao.bi@oracle.com \
--cc=linux-xfs@vger.kernel.org \
--cc=rajesh.sivaramasubramaniom@oracle.com \
--cc=srikanth.c.s@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).