public inbox for linux-xfs@vger.kernel.org
* [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
@ 2022-11-23  6:30 ` Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Srikanth C S @ 2022-11-23  6:30 UTC
  To: linux-xfs
  Cc: srikanth.c.s, darrick.wong, rajesh.sivaramasubramaniom,
	junxiao.bi, david, cem

After a recent data center crash, we had to recover root filesystems
on several thousand VMs via a boot-time fsck. Since these machines
are remotely manageable, support can inject
'fsck.mode=force fsck.repair=yes' into the kernel command line to kick
off xfs_repair if a machine won't come up, or if they suspect deeper
issues with latent errors in the fs metadata. That is what they did
to get everyone running ASAP while anticipating future problems.
However, fsck.xfs does not handle journal replay after a crash.

fsck.xfs runs xfs_repair -e if fsck.mode=force is set. When the
machine crashes, the fs can be left in an inconsistent state with the
journal not yet replayed. This can drop the machine into the rescue
shell because xfs_fsck.sh does not know how to clean the log. Since
the administrator told us to force repairs, address the deficiency by
replaying the log (via a mount/umount cycle) and rerunning xfs_repair.

Run xfs_repair -e when fsck.mode=force is set and fsck.repair is auto
or yes. Replay the log only if fsck.mode=force and fsck.repair=yes.
For the other option combinations, -fa and -f, drop to the rescue
shell if xfs_repair detects any corruption.

Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
---
 fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index 6af0f22..62a1e0b 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -31,10 +31,12 @@ repair2fsck_code() {
 
 AUTO=false
 FORCE=false
+REPAIR=false
 while getopts ":aApyf" c
 do
        case $c in
-       a|A|p|y)        AUTO=true;;
+       a|A|p)          AUTO=true;;
+       y)              REPAIR=true;;
        f)              FORCE=true;;
        esac
 done
@@ -64,7 +66,32 @@ fi
 
 if $FORCE; then
        xfs_repair -e $DEV
-       repair2fsck_code $?
+       error=$?
+       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
+               echo "Replaying log for $DEV"
+               mkdir -p /tmp/repair_mnt || exit 1
+               for x in $(cat /proc/cmdline); do
+                       case $x in
+                               root=*)
+                                       ROOT="${x#root=}"
+                               ;;
+                               rootflags=*)
+                                       ROOTFLAGS="-o ${x#rootflags=}"
+                               ;;
+                       esac
+               done
+               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
+               if [ "$(basename "$DEV")" = "$(basename "$ROOT")" ]; then
+                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
+               else
+                       mount $DEV /tmp/repair_mnt || exit 1
+               fi
+               umount /tmp/repair_mnt
+               xfs_repair -e $DEV
+               error=$?
+               rm -d /tmp/repair_mnt
+       fi
+       repair2fsck_code $error
        exit $?
 fi
 
-- 
1.8.3.1
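
The command-line parsing and the replay decision in the patch above are
plain shell logic, so they can be tried in isolation without a damaged
filesystem. The following is a minimal sketch, not part of the patch:
parse_root_args and should_replay_log are hypothetical helper names, and
the sample command line is made up. It assumes, as the patch's
'[ $error -eq 2 ]' check implies, that status 2 from xfs_repair is the
signal that the log needs replaying.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of the patch.

# Extract root= and rootflags= from a kernel command line string
# (normally the contents of /proc/cmdline).
parse_root_args() {
        ROOT=""
        ROOTFLAGS=""
        set -f                  # no globbing while we split on whitespace
        for x in $1; do
                case $x in
                        root=*)
                                ROOT="${x#root=}"
                                ;;
                        rootflags=*)
                                ROOTFLAGS="-o ${x#rootflags=}"
                                ;;
                esac
        done
        set +f
}

# Succeed (return 0) only when the first xfs_repair run reported status 2
# and the caller asked for repairs with -y (REPAIR=true).
should_replay_log() {
        [ "$1" -eq 2 ] && [ "$2" = true ]
}

parse_root_args "BOOT_IMAGE=/vmlinuz root=UUID=1234-abcd ro rootflags=noatime quiet"
echo "ROOT=$ROOT ROOTFLAGS=$ROOTFLAGS"

if should_replay_log 2 true; then
        echo "would mount/umount to replay the log"
fi
```

Keeping the mount/umount step out of these helpers means the parsing and
the decision can be checked on any machine; only the actual replay needs
a real block device and root privileges.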


* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
@ 2022-11-23  8:36   ` Carlos Maiolino
  2022-12-12 12:13   ` Carlos Maiolino
  2022-12-13  9:39   ` Carlos Maiolino
  2 siblings, 1 reply; 10+ messages in thread
From: Carlos Maiolino @ 2022-11-23  8:36 UTC
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

Hi.

Did you plan to resend V3 again, or is this supposed to be V4?


-- 
Carlos Maiolino


* Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
@ 2022-11-23 12:23         ` Carlos Maiolino
  2022-11-25 12:09           ` Srikanth C S
  0 siblings, 1 reply; 10+ messages in thread
From: Carlos Maiolino @ 2022-11-23 12:23 UTC
  To: Srikanth C S
  Cc: linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com

On Wed, Nov 23, 2022 at 11:40:53AM +0000, Srikanth C S wrote:
>    Hi
> 
>    I resent the same patch as I did not see any review comments.

Unless I'm looking at the wrong patch, there were comments on your previous
submission:

https://lore.kernel.org/linux-xfs/Y2ie54fcHDx5bcG4@B-P7TQMD6M-0146.local/T/#t

Am I missing something?

Also, if you are sending the same patch, you can 'flag' it as a resend, so, it's
easier to identify you are simply resending the same patch. You can do it by
appending/prepending 'RESEND', to the patch tag:

[RESEND PATCH] <subject>
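
One way to produce such a tag, assuming the patch is generated with
git format-patch, is its --subject-prefix option. The sketch below is
illustrative only: it builds a throwaway repository so the command can
be tried safely, and the commit subject, author name and address are
made up.

```shell
#!/bin/sh
# Sketch only: a throwaway repository with one illustrative commit.
repo=$(mktemp -d)
git init -q "$repo"
echo demo > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name="Example Dev" -c user.email="dev@example.com" \
        commit -q -m "fsck.xfs: example subject"

# --subject-prefix replaces the default "PATCH" inside the brackets.
subject=$(git -C "$repo" format-patch -1 --stdout \
        --subject-prefix="RESEND PATCH v3" | sed -n '/^Subject:/{p;q;}')
echo "$subject"

rm -rf "$repo"
```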

Cheers.

-- 
Carlos Maiolino


* RE: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23 12:23         ` [External] : " Carlos Maiolino
@ 2022-11-25 12:09           ` Srikanth C S
  2022-11-28 23:04             ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Srikanth C S @ 2022-11-25 12:09 UTC
  To: Carlos Maiolino
  Cc: linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com



> -----Original Message-----
> From: Carlos Maiolino <cem@kernel.org>
> Sent: 23 November 2022 05:53 PM
> To: Srikanth C S <srikanth.c.s@oracle.com>
> Cc: linux-xfs@vger.kernel.org; Darrick Wong <darrick.wong@oracle.com>;
> Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>;
> Junxiao Bi <junxiao.bi@oracle.com>; david@fromorbit.com
> Subject: Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> replay log before running xfs_repair
> 
> On Wed, Nov 23, 2022 at 11:40:53AM +0000, Srikanth C S wrote:
> >    Hi
> >
> >    I resent the same patch as I did not see any review comments.
> 
> Unless I'm looking at the wrong patch, there were comments on your previous
> submission:
> 
> https://lore.kernel.org/linux-xfs/Y2ie54fcHDx5bcG4@B-P7TQMD6M-0146.local/T/#t
> 
> Am I missing something?

All the previous comments on this patch were about having journal replay
code in userspace. But Darrick's comments indicate that this would require
making the log endian-safe, because the kernel cannot recover a log written
on a platform with a different endianness.

So I am still wondering how to proceed with this patch. Any comments would
be helpful.

> 
> Also, if you are sending the same patch, you can 'flag' it as a resend, so, it's
> easier to identify you are simply resending the same patch. You can do it by
> appending/prepending 'RESEND', to the patch tag:
> 
> [RESEND PATCH] <subject>
Thanks for the info. Didn't know this.

Regards,
Srikanth


* Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-25 12:09           ` Srikanth C S
@ 2022-11-28 23:04             ` Darrick J. Wong
  2022-12-06 11:48               ` Srikanth C S
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2022-11-28 23:04 UTC
  To: Srikanth C S
  Cc: Carlos Maiolino, linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com

On Fri, Nov 25, 2022 at 12:09:39PM +0000, Srikanth C S wrote:
> 
> 
> > -----Original Message-----
> > From: Carlos Maiolino <cem@kernel.org>
> > Sent: 23 November 2022 05:53 PM
> > To: Srikanth C S <srikanth.c.s@oracle.com>
> > Cc: linux-xfs@vger.kernel.org; Darrick Wong <darrick.wong@oracle.com>;
> > Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>;
> > Junxiao Bi <junxiao.bi@oracle.com>; david@fromorbit.com
> > Subject: Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> > replay log before running xfs_repair
> > 
> > On Wed, Nov 23, 2022 at 11:40:53AM +0000, Srikanth C S wrote:
> > >    Hi
> > >
> > >    I resent the same patch as I did not see any review comments.
> > 
> > Unless I'm looking at the wrong patch, there were comments on your previous
> > submission:
> > 
> > https://lore.kernel.org/linux-xfs/Y2ie54fcHDx5bcG4@B-P7TQMD6M-0146.local/T/#t
> > 
> > Am I missing something?

Err.... whose comments, Joseph's or Gao's?

> All the previous comments addressing this patch were about having
> journal replay code in the userspace. But Darricks comments indicate
> that this requires making the log endian safe because of kernel's
> inability to recover a log from a platform with a different
> endianness.
> 
> So I am still wondering on how to proceed with this patch. Any
> comments would be helpful.

Same here, though the long holiday weekend probably didn't help.

--D


* RE: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-28 23:04             ` Darrick J. Wong
@ 2022-12-06 11:48               ` Srikanth C S
  2022-12-09  9:51                 ` Carlos Maiolino
  0 siblings, 1 reply; 10+ messages in thread
From: Srikanth C S @ 2022-12-06 11:48 UTC
  To: Carlos Maiolino
  Cc: linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com,
	Darrick J. Wong



> -----Original Message-----
> From: Darrick J. Wong <djwong@kernel.org>
> Sent: 29 November 2022 04:34 AM
> To: Srikanth C S <srikanth.c.s@oracle.com>
> Cc: Carlos Maiolino <cem@kernel.org>; linux-xfs@vger.kernel.org; Darrick
> Wong <darrick.wong@oracle.com>; Rajesh Sivaramasubramaniom
> <rajesh.sivaramasubramaniom@oracle.com>; Junxiao Bi
> <junxiao.bi@oracle.com>; david@fromorbit.com
> Subject: Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> replay log before running xfs_repair
> 
> On Fri, Nov 25, 2022 at 12:09:39PM +0000, Srikanth C S wrote:
> >
> >
> > > -----Original Message-----
> > > From: Carlos Maiolino <cem@kernel.org>
> > > Sent: 23 November 2022 05:53 PM
> > > To: Srikanth C S <srikanth.c.s@oracle.com>
> > > Cc: linux-xfs@vger.kernel.org; Darrick Wong
> > > <darrick.wong@oracle.com>; Rajesh Sivaramasubramaniom
> > > <rajesh.sivaramasubramaniom@oracle.com>;
> > > Junxiao Bi <junxiao.bi@oracle.com>; david@fromorbit.com
> > > Subject: Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs
> > > fs to replay log before running xfs_repair
> > >
> > > On Wed, Nov 23, 2022 at 11:40:53AM +0000, Srikanth C S wrote:
> > > >    Hi
> > > >
> > > >    I resent the same patch as I did not see any review comments.
> > >
> > > Unless I'm looking at the wrong patch, there were comments on your
> > > previous submission:
> > >
> > > https://lore.kernel.org/linux-xfs/Y2ie54fcHDx5bcG4@B-P7TQMD6M-0146.local/T/#t
> > >
> > > Am I missing something?
> 
> Err.... whose comments, Joseph's or Gao's?
> 
> > All the previous comments addressing this patch were about having
> > journal replay code in the userspace. But Darricks comments indicate
> > that this requires making the log endian safe because of kernel's
> > inability to recover a log from a platform with a different
> > endianness.
> >
> > So I am still wondering on how to proceed with this patch. Any
> > comments would be helpful.
> 
@Carlos Maiolino, any comments or thoughts on this patch?

-Srikanth
> Same here, though the long holiday weekend probably didn't help.
> 
> --D
> 
> > > Also, if you are sending the same patch, you can 'flag' it as a
> > > resend, so, it's easier to identify you are simply resending the
> > > same patch. You can do it by appending/prepending 'RESEND', to the
> patch tag:
> > >
> > > [RESEND PATCH] <subject>
> > Thanks for the info. Didn't know this.
> > >
> > > Cheers.
> > >
> > > >
> > > >    -Srikanth
> > > >
> > > >
> > >
> __________________________________________________________
> > > ________
> > > >
> > > >    From: Carlos Maiolino <cem@kernel.org>
> > > >    Sent: Wednesday, November 23, 2022 2:06 PM
> > > >    To: Srikanth C S <srikanth.c.s@oracle.com>
> > > >    Cc: linux-xfs@vger.kernel.org <linux-xfs@vger.kernel.org>; Darrick
> Wong
> > > >    <darrick.wong@oracle.com>; Rajesh Sivaramasubramaniom
> > > >    <rajesh.sivaramasubramaniom@oracle.com>; Junxiao Bi
> > > >    <junxiao.bi@oracle.com>; david@fromorbit.com
> > > <david@fromorbit.com>
> > > >    Subject: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> > > >    replay log before running xfs_repair
> > > >
> > > >    Hi.
> > > >    Did you plan to resend V3 again, or is this supposed to be V4?
> > > >    On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> > > >    > After a recent data center crash, we had to recover root filesystems
> > > >    > on several thousands of VMs via a boot time fsck. Since these
> > > >    > machines are remotely manageable, support can inject the kernel
> > > >    > command line with 'fsck.mode=force fsck.repair=yes' to kick off
> > > >    > xfs_repair if the machine won't come up or if they suspect there
> > > >    > might be deeper issues with latent errors in the fs metadata, which
> > > >    > is what they did to try to get everyone running ASAP while
> > > >    > anticipating any future problems. But, fsck.xfs does not address the
> > > >    > journal replay in case of a crash.
> > > >    >
> > > >    > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > > >    > possible that when the machine crashes, the fs is in inconsistent
> > > >    > state with the journal log not yet replayed. This can drop the machine
> > > >    > into the rescue shell because xfs_fsck.sh does not know how to clean the
> > > >    > log. Since the administrator told us to force repairs, address the
> > > >    > deficiency by cleaning the log and rerunning xfs_repair.
> > > >    >
> > > >    > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > > >    > Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> > > >    > other option -fa and -f drop to the rescue shell if repair detects
> > > >    > any corruptions.
> > > >    >
> > > >    > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > > >    > ---
> > > >    >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> > > >    >  1 file changed, 29 insertions(+), 2 deletions(-)
> > > >    >
> > > >    > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > > >    > index 6af0f22..62a1e0b 100755
> > > >    > --- a/fsck/xfs_fsck.sh
> > > >    > +++ b/fsck/xfs_fsck.sh
> > > >    > @@ -31,10 +31,12 @@ repair2fsck_code() {
> > > >    >
> > > >    >  AUTO=false
> > > >    >  FORCE=false
> > > >    > +REPAIR=false
> > > >    >  while getopts ":aApyf" c
> > > >    >  do
> > > >    >         case $c in
> > > >    > -       a|A|p|y)        AUTO=true;;
> > > >    > +       a|A|p)          AUTO=true;;
> > > >    > +       y)              REPAIR=true;;
> > > >    >         f)              FORCE=true;;
> > > >    >         esac
> > > >    >  done
> > > >    > @@ -64,7 +66,32 @@ fi
> > > >    >
> > > >    >  if $FORCE; then
> > > >    >         xfs_repair -e $DEV
> > > >    > -       repair2fsck_code $?
> > > >    > +       error=$?
> > > >    > +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > > >    > +               echo "Replaying log for $DEV"
> > > >    > +               mkdir -p /tmp/repair_mnt || exit 1
> > > >    > +               for x in $(cat /proc/cmdline); do
> > > >    > +                       case $x in
> > > >    > +                               root=*)
> > > >    > +                                       ROOT="${x#root=}"
> > > >    > +                               ;;
> > > >    > +                               rootflags=*)
> > > >    > +                                       ROOTFLAGS="-o ${x#rootflags=}"
> > > >    > +                               ;;
> > > >    > +                       esac
> > > >    > +               done
> > > >    > +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> > > >    > +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> > > >    > +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> > > >    > +               else
> > > >    > +                       mount $DEV /tmp/repair_mnt || exit 1
> > > >    > +               fi
> > > >    > +               umount /tmp/repair_mnt
> > > >    > +               xfs_repair -e $DEV
> > > >    > +               error=$?
> > > >    > +               rm -d /tmp/repair_mnt
> > > >    > +       fi
> > > >    > +       repair2fsck_code $error
> > > >    >         exit $?
> > > >    >  fi
> > > >    >
> > > >    > --
> > > >    > 1.8.3.1
> > > >    --
> > > >    Carlos Maiolino
> > >
> > > --
> > > Carlos Maiolino
> >
> > Regards,
> > Srikanth


* Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-12-06 11:48               ` Srikanth C S
@ 2022-12-09  9:51                 ` Carlos Maiolino
  0 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2022-12-09  9:51 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com,
	Darrick J. Wong

> > > So I am still wondering on how to proceed with this patch. Any
> > > comments would be helpful.
> >
> @Carlos Maiolino, Any comments or thoughts on this patch?

Sorry, didn't have time to look into it yet.


-- 
Carlos Maiolino


* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
@ 2022-12-12 12:13   ` Carlos Maiolino
  2022-12-13  9:39   ` Carlos Maiolino
  2 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2022-12-12 12:13 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>

Apologies it took so long, the patch seems fine to me. Will test.

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
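
For anyone following along, here is a minimal sketch of the two decisions the
patch makes: when to attempt a log replay, and which root mount options to
reuse. It is not the patch itself. The helper names parse_cmdline and
should_replay_log are hypothetical, and the command line is passed in as a
string (instead of being read from /proc/cmdline) purely so the logic can be
exercised without a real boot.

```shell
#!/bin/sh
# Illustrative sketch only; helper names are assumptions, not part of the patch.

# parse_cmdline "<kernel command line>"
# Sets ROOT to the value of root= and ROOTFLAGS to "-o <rootflags value>",
# using the same case patterns as the patch. Relies on word splitting of $1.
parse_cmdline() {
	ROOT=""
	ROOTFLAGS=""
	for x in $1; do
		case $x in
			root=*)
				ROOT="${x#root=}"
			;;
			rootflags=*)
				ROOTFLAGS="-o ${x#rootflags=}"
			;;
		esac
	done
}

# should_replay_log <xfs_repair exit status> <REPAIR flag: true|false>
# Succeeds only for the combination the patch tests:
#   [ $error -eq 2 ] && [ $REPAIR = true ]
should_replay_log() {
	[ "$1" -eq 2 ] && [ "$2" = true ]
}
```

With a command line such as "root=/dev/sda1 ro rootflags=noquota", the sketch
leaves ROOT set to /dev/sda1 and ROOTFLAGS set to "-o noquota". The replay
check passes only when the first xfs_repair -e run returned 2 and -y was given,
which matches the fsck.mode=force fsck.repair=yes case in the description.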

> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
> 
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
> 
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
> 
> --
> 1.8.3.1

-- 
Carlos Maiolino


* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
  2022-12-12 12:13   ` Carlos Maiolino
@ 2022-12-13  9:39   ` Carlos Maiolino
  2022-12-13 12:32     ` [External] : " Srikanth C S
  2 siblings, 1 reply; 10+ messages in thread
From: Carlos Maiolino @ 2022-12-13  9:39 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

Hi Srikanth.

On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)

Did you by any chance write this patch on top of something else you have in your
tree?

It doesn't apply to the tree without tweaking it, and the last changes we have in
the fsck/xfs_fsck.sh file are from 2018, so I assume you have something before
this patch in your tree.

Could you please rebase this patch against xfsprogs for-next and resend it? Feel
free to keep my RwB as long as you don't change the code semantics.

Cheers.

> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
> 
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
> 
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
> 
> --
> 1.8.3.1

-- 
Carlos Maiolino


* RE: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-12-13  9:39   ` Carlos Maiolino
@ 2022-12-13 12:32     ` Srikanth C S
  0 siblings, 0 replies; 10+ messages in thread
From: Srikanth C S @ 2022-12-13 12:32 UTC (permalink / raw)
  To: Carlos Maiolino
  Cc: linux-xfs@vger.kernel.org, Darrick Wong,
	Rajesh Sivaramasubramaniom, Junxiao Bi, david@fromorbit.com



> -----Original Message-----
> From: Carlos Maiolino <cem@kernel.org>
> Sent: 13 December 2022 03:10 PM
> To: Srikanth C S <srikanth.c.s@oracle.com>
> Cc: linux-xfs@vger.kernel.org; Darrick Wong <darrick.wong@oracle.com>;
> Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>;
> Junxiao Bi <junxiao.bi@oracle.com>; david@fromorbit.com
> Subject: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay
> log before running xfs_repair
> 
> Hi Srikanth.
> 
> On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> > After a recent data center crash, we had to recover root filesystems
> > on several thousands of VMs via a boot time fsck. Since these machines
> > are remotely manageable, support can inject the kernel command line
> > with 'fsck.mode=force fsck.repair=yes' to kick off xfs_repair if the
> > machine won't come up or if they suspect there might be deeper issues
> > with latent errors in the fs metadata, which is what they did to try
> > to get everyone running ASAP while anticipating any future problems.
> > But, fsck.xfs does not address the journal replay in case of a crash.
> >
> > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is possible
> > that when the machine crashes, the fs is in inconsistent state with
> > the journal log not yet replayed. This can drop the machine into the
> > rescue shell because xfs_fsck.sh does not know how to clean the log.
> > Since the administrator told us to force repairs, address the
> > deficiency by cleaning the log and rerunning xfs_repair.
> >
> > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > Replay the logs only if fsck.mode=force and fsck.repair=yes. For other
> > option -fa and -f drop to the rescue shell if repair detects any
> > corruptions.
> >
> > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > ---
> >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> >  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> Did you by any chance write this patch on top of something else you have in
> your tree?
> 
> It doesn't apply to the tree without tweaking it, and the last changes we have in
> the fsck/xfs_fsck.sh file are from 2018, so I assume you have something
> before this patch in your tree.
> 
Sorry for the inconvenience, will verify this.

> Could you please rebase this patch against xfsprogs for-next and resend it?
> Feel free to keep my RwB as long as you don't change the code semantics.
> 
Let me rebase the patch and resend it. Thanks for the Reviewed-by.

> Cheers.
> 
> >
> > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > index 6af0f22..62a1e0b 100755
> > --- a/fsck/xfs_fsck.sh
> > +++ b/fsck/xfs_fsck.sh
> > @@ -31,10 +31,12 @@ repair2fsck_code() {
> >
> >  AUTO=false
> >  FORCE=false
> > +REPAIR=false
> >  while getopts ":aApyf" c
> >  do
> >         case $c in
> > -       a|A|p|y)        AUTO=true;;
> > +       a|A|p)          AUTO=true;;
> > +       y)              REPAIR=true;;
> >         f)              FORCE=true;;
> >         esac
> >  done
> > @@ -64,7 +66,32 @@ fi
> >
> >  if $FORCE; then
> >         xfs_repair -e $DEV
> > -       repair2fsck_code $?
> > +       error=$?
> > +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > +               echo "Replaying log for $DEV"
> > +               mkdir -p /tmp/repair_mnt || exit 1
> > +               for x in $(cat /proc/cmdline); do
> > +                       case $x in
> > +                               root=*)
> > +                                       ROOT="${x#root=}"
> > +                               ;;
> > +                               rootflags=*)
> > +                                       ROOTFLAGS="-o ${x#rootflags=}"
> > +                               ;;
> > +                       esac
> > +               done
> > +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> > +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> > +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> > +               else
> > +                       mount $DEV /tmp/repair_mnt || exit 1
> > +               fi
> > +               umount /tmp/repair_mnt
> > +               xfs_repair -e $DEV
> > +               error=$?
> > +               rm -d /tmp/repair_mnt
> > +       fi
> > +       repair2fsck_code $error
> >         exit $?
> >  fi
> >
> > --
> > 1.8.3.1
> 
> --
> Carlos Maiolino

Thanks,
Srikanth


end of thread, other threads:[~2022-12-13 12:32 UTC | newest]

Thread overview: 10+ messages
     [not found] <NdSU2Rq0FpWJ3II4JAnJNk-0HW5bns_UxhQ03sSOaek-nu9QPA-ZMx0HDXFtVx8ahgKhWe0Wcfh13NH0ZSwJjg==@protonmail.internalid>
2022-11-23  6:30 ` [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
2022-11-23  8:36   ` Carlos Maiolino
     [not found]     ` <c-vuqhpmmrL6JSN0ZRnqX7c1BUcXw5gJ9L2UZ2lG3H8hCJRNIn_uan2rVHLDUPwgY24Nv3WZpiBt2nflhVadtA==@protonmail.internalid>
     [not found]       ` <CY4PR10MB1479D19A047EAB8558445EC7A30C9@CY4PR10MB1479.namprd10.prod.outlook.com>
2022-11-23 12:23         ` [External] : " Carlos Maiolino
2022-11-25 12:09           ` Srikanth C S
2022-11-28 23:04             ` Darrick J. Wong
2022-12-06 11:48               ` Srikanth C S
2022-12-09  9:51                 ` Carlos Maiolino
2022-12-12 12:13   ` Carlos Maiolino
2022-12-13  9:39   ` Carlos Maiolino
2022-12-13 12:32     ` [External] : " Srikanth C S
