From: "Darrick J. Wong" <djwong@kernel.org>
To: Srikanth C S <srikanth.c.s@oracle.com>
Cc: linux-xfs@vger.kernel.org, darrick.wong@oracle.com,
rajesh.sivaramasubramaniom@oracle.com, junxiao.bi@oracle.com
Subject: Re: [PATCH V2] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
Date: Wed, 2 Nov 2022 10:49:15 -0700
Message-ID: <Y2Ktm96molO8Kd6r@magnolia>
In-Reply-To: <20221102142946.3454-1-srikanth.c.s@oracle.com>
On Wed, Nov 02, 2022 at 07:59:46PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
>
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can put the
> machine into rescue shell. To address this problem, mount and umount
> the fs before running xfs_repair.
I'd reword the last two sentences as:
"This can drop the machine into the rescue shell because xfs_fsck.sh
does not know how to clean the log. Since the administrator told us to
force repairs, address the deficiency by cleaning the log and rerunning
xfs_repair."
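The retry flow proposed here can be modelled without a real device. In this sketch, xfs_repair, mount and umount are replaced by stubs. The names fake_repair, fake_mount_umount, force_repair and LOG_DIRTY are hypothetical and do not come from xfsprogs. The only external fact it relies on is the xfs_repair(8) convention that exit status 2 means "dirty log, cannot proceed". It shows the intended behaviour: a second pass happens only when the log was dirty *and* repairs were requested.

```shell
#!/bin/sh
# Simplified model of the patched force-repair path in fsck.xfs.
# All functions below are hypothetical stubs, not real xfsprogs code.

LOG_DIRTY=true          # stub state: pretend the journal needs replaying

fake_repair() {         # stands in for "xfs_repair -e $DEV"
	if $LOG_DIRTY; then
		return 2        # xfs_repair(8): 2 = dirty log, cannot proceed
	fi
	return 0            # 0 = no problems found
}

fake_mount_umount() {   # stands in for mount + umount, which replays the log
	LOG_DIRTY=false
}

force_repair() {
	repair=$1           # "true" when fsck.repair=yes (fsck -y)
	fake_repair
	error=$?
	# Replay and retry only for a dirty log when repairs are allowed.
	if [ "$error" -eq 2 ] && [ "$repair" = true ]; then
		fake_mount_umount
		fake_repair
		error=$?
	fi
	echo "$error"       # value that would be handed to repair2fsck_code
}

# Subshells keep each run's LOG_DIRTY change from leaking into the next.
(LOG_DIRTY=true; force_repair true)    # prints 0: log replayed, second pass clean
(LOG_DIRTY=true; force_repair false)   # prints 2: no permission to touch the log
```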
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the resuce shell if repair detects
s/resuce/rescue/
> any corruptions
>
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
Ah good, your email works again.
> ---
> fsck/xfs_fsck.sh | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..4ef61db 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
>
> AUTO=false
> FORCE=false
> +REPAIR=false
> while getopts ":aApyf" c
> do
> case $c in
> - a|A|p|y) AUTO=true;;
> + a|A|p) AUTO=true;;
> + y) REPAIR=true;;
> f) FORCE=true;;
> esac
> done
> @@ -64,7 +66,24 @@ fi
>
> if $FORCE; then
> xfs_repair -e $DEV
> - repair2fsck_code $?
> + error=$?
> + if [ $error -eq 2 ] && [ -n "$REPAIR" ]; then
test -n only checks that its argument "$REPAIR" is a non-empty string.
Since you set REPAIR=false above, "$REPAIR" expands to the five-character
string "false", so this test always succeeds. I think you wanted:
if [ $error -eq 2 ] && [ $REPAIR = true ]; then
here?
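The pitfall can be reproduced in isolation. This is a minimal sketch: buggy_check and fixed_check are hypothetical names used only for the demonstration, and REPAIR is set to the strings "false" and "true" exactly as the patch does. The fixed version quotes "$REPAIR", which additionally keeps the test well-formed if the variable were ever empty.

```shell
#!/bin/sh
# Demonstrates why [ -n "$REPAIR" ] cannot distinguish REPAIR=false
# from REPAIR=true: both are non-empty strings.

buggy_check() {   # the patch's test: only asks "is the string non-empty?"
	REPAIR=$1
	if [ -n "$REPAIR" ]; then echo replay; else echo skip; fi
}

fixed_check() {   # the suggested test: compare against the literal "true"
	REPAIR=$1
	if [ "$REPAIR" = true ]; then echo replay; else echo skip; fi
}

buggy_check false   # prints "replay": "false" is a non-empty string
fixed_check false   # prints "skip"
fixed_check true    # prints "replay"
```

Because REPAIR always holds either "true" or "false", writing `if $REPAIR; then` would also work, matching the existing `if $FORCE; then` idiom in the script.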
> + echo "Replaying log for $DEV"
> + mkdir -p /tmp/repair_mnt || exit 1
> + for x in $(cat /proc/cmdline); do
> + case $x in
> + rootflags=*)
> + ROOTFLAGS="-o ${x#rootflags=}"
> + ;;
> + esac
> + done
> + mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> + umount /tmp/repair_mnt
> + xfs_repair -e $DEV
> + error=$?
> + rm -d /tmp/repair_mnt
> + fi
> + repair2fsck_code $error
The rest of the logic looks ok to me. The new behavior needs to be
documented in the manpage. Here's a fugly troff snippet that could be
added towards the end of man/man8/fsck.xfs.8:
If the system administrator adds "fsck.mode=force fsck.repair=yes" to
the kernel command line,
.B fsck.xfs
will detect a dirty log and mount and unmount the filesystem to clean
the log before running
.BR xfs_repair (8).
--D
> exit $?
> fi
>
> --
> 1.8.3.1
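The rootflags= handling in the replay path can be tried without rebooting. This is a minimal sketch that assumes a hypothetical rootflags_opt helper. Instead of reading /proc/cmdline, the helper takes the command line as an argument. It uses the same `${x#rootflags=}` parameter expansion as the patch, and as in the patch's loop the last rootflags= entry wins. It prints with printf rather than echo because the result begins with a dash.

```shell
#!/bin/sh
# Hypothetical helper mirroring the patch's /proc/cmdline loop.
# Prints "-o <flags>" for the last rootflags= entry, or an empty line.
rootflags_opt() {
	opts=""
	for x in $1; do             # unquoted on purpose: split on whitespace
		case $x in
		rootflags=*) opts="-o ${x#rootflags=}" ;;
		esac
	done
	printf '%s\n' "$opts"
}

rootflags_opt "ro root=/dev/sda1 rootflags=nouuid,logbufs=8 quiet"
# prints "-o nouuid,logbufs=8"
rootflags_opt "ro root=/dev/sda1 quiet"
# prints an empty line
```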