[PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
@ 2022-11-04  6:10 Srikanth C S
  2022-11-07  6:00 ` Gao Xiang
  2022-11-11  6:24 ` Joseph Qi
  0 siblings, 2 replies; 10+ messages in thread
From: Srikanth C S @ 2022-11-04  6:10 UTC (permalink / raw)
  To: linux-xfs
  Cc: srikanth.c.s, darrick.wong, rajesh.sivaramasubramaniom,
	junxiao.bi, david

After a recent data center crash, we had to recover root filesystems
on several thousands of VMs via a boot time fsck. Since these
machines are remotely manageable, support can inject the kernel
command line with 'fsck.mode=force fsck.repair=yes' to kick off
xfs_repair if the machine won't come up or if they suspect there
might be deeper issues with latent errors in the fs metadata, which
is what they did to try to get everyone running ASAP while
anticipating any future problems. But, fsck.xfs does not address the
journal replay in case of a crash.

fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
possible that when the machine crashes, the fs is in inconsistent
state with the journal log not yet replayed. This can drop the machine
into the rescue shell because xfs_fsck.sh does not know how to clean the
log. Since the administrator told us to force repairs, address the
deficiency by cleaning the log and rerunning xfs_repair.

Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
Replay the logs only if fsck.mode=force and fsck.repair=yes. For
other option -fa and -f drop to the rescue shell if repair detects
any corruptions.

Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
---
 fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index 6af0f22..62a1e0b 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -31,10 +31,12 @@ repair2fsck_code() {

 AUTO=false
 FORCE=false
+REPAIR=false
 while getopts ":aApyf" c
 do
        case $c in
-       a|A|p|y)        AUTO=true;;
+       a|A|p)          AUTO=true;;
+       y)              REPAIR=true;;
        f)              FORCE=true;;
        esac
 done
@@ -64,7 +66,32 @@ fi

 if $FORCE; then
        xfs_repair -e $DEV
-       repair2fsck_code $?
+       error=$?
+       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
+               echo "Replaying log for $DEV"
+               mkdir -p /tmp/repair_mnt || exit 1
+               for x in $(cat /proc/cmdline); do
+                       case $x in
+                               root=*)
+                                       ROOT="${x#root=}"
+                               ;;
+                               rootflags=*)
+                                       ROOTFLAGS="-o ${x#rootflags=}"
+                               ;;
+                       esac
+               done
+               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
+               if [ $(basename $DEV) = $(basename $ROOT) ]; then
+                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
+               else
+                       mount $DEV /tmp/repair_mnt || exit 1
+               fi
+               umount /tmp/repair_mnt
+               xfs_repair -e $DEV
+               error=$?
+               rm -d /tmp/repair_mnt
+       fi
+       repair2fsck_code $error
        exit $?
 fi

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-04  6:10 [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
@ 2022-11-07  6:00 ` Gao Xiang
  2022-11-07 16:55   ` Darrick J. Wong
  2022-11-11  6:24 ` Joseph Qi
  1 sibling, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2022-11-07  6:00 UTC (permalink / raw)
  To: Srikanth C S, darrick.wong, david
  Cc: linux-xfs, rajesh.sivaramasubramaniom, junxiao.bi, Joseph Qi

Hi folks,

On Fri, Nov 04, 2022 at 11:40:11AM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
>  
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
>  
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)

We'd also like to get a formal solution about this for our production
so that xfs_repair can work properly with log recovery.

However, may I ask if it's the preferred way to implement this which
just acts as another mount-unmount cycle, since I'm not sure if there
are some customized initramfs-es which could get the fs busy so that it
won't unmount properly.

Alternatively, do we consider another way like exporting the log
recovery functionality with ioctl() so that log recovery can work
without the actual fs mounting? Is it affordable?

Thanks,
Gao Xiang

> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-07  6:00 ` Gao Xiang
@ 2022-11-07 16:55   ` Darrick J. Wong
  2022-11-07 17:24     ` Gao Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2022-11-07 16:55 UTC (permalink / raw)
  To: Srikanth C S, darrick.wong, david, linux-xfs,
	rajesh.sivaramasubramaniom, junxiao.bi, Joseph Qi

On Mon, Nov 07, 2022 at 02:00:07PM +0800, Gao Xiang wrote:
> Hi folks,
> 
> On Fri, Nov 04, 2022 at 11:40:11AM +0530, Srikanth C S wrote:
> > After a recent data center crash, we had to recover root filesystems
> > on several thousands of VMs via a boot time fsck. Since these
> > machines are remotely manageable, support can inject the kernel
> > command line with 'fsck.mode=force fsck.repair=yes' to kick off
> > xfs_repair if the machine won't come up or if they suspect there
> > might be deeper issues with latent errors in the fs metadata, which
> > is what they did to try to get everyone running ASAP while
> > anticipating any future problems. But, fsck.xfs does not address the
> > journal replay in case of a crash.
> > 
> > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > possible that when the machine crashes, the fs is in inconsistent
> > state with the journal log not yet replayed. This can drop the machine
> > into the rescue shell because xfs_fsck.sh does not know how to clean the
> > log. Since the administrator told us to force repairs, address the
> > deficiency by cleaning the log and rerunning xfs_repair.
> > 
> > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> > other option -fa and -f drop to the rescue shell if repair detects
> > any corruptions.
> > 
> > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > ---
> >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> >  1 file changed, 29 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > index 6af0f22..62a1e0b 100755
> > --- a/fsck/xfs_fsck.sh
> > +++ b/fsck/xfs_fsck.sh
> > @@ -31,10 +31,12 @@ repair2fsck_code() {
> >  
> >  AUTO=false
> >  FORCE=false
> > +REPAIR=false
> >  while getopts ":aApyf" c
> >  do
> >         case $c in
> > -       a|A|p|y)        AUTO=true;;
> > +       a|A|p)          AUTO=true;;
> > +       y)              REPAIR=true;;
> >         f)              FORCE=true;;
> >         esac
> >  done
> > @@ -64,7 +66,32 @@ fi
> >  
> >  if $FORCE; then
> >         xfs_repair -e $DEV
> > -       repair2fsck_code $?
> > +       error=$?
> > +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > +               echo "Replaying log for $DEV"
> > +               mkdir -p /tmp/repair_mnt || exit 1
> > +               for x in $(cat /proc/cmdline); do
> > +                       case $x in
> > +                               root=*)
> > +                                       ROOT="${x#root=}"
> > +                               ;;
> > +                               rootflags=*)
> > +                                       ROOTFLAGS="-o ${x#rootflags=}"
> > +                               ;;
> > +                       esac
> > +               done
> > +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> 
> We'd also like to get a formal solution about this for our production
> so that xfs_repair can work properly with log recovery.

My preferred solution is to port the log recovery code to userspace, and
then train xfs_repair to invoke it.  Handling the trivial case where
xfs_repair can recover logs created on the same platform as the support
script wouldn't be that hard (I think?) because log recovery is fairly
selfcontained nowadays.

But.

Inevitably someone will suggest fixing the kernel's inability to recover
a log from a platform with a different endianness, which will lead to a
discussion of making the ondisk log format endian safe.  Someone else
may also ask why not make userspace xfs_trans transactional, and... ;)

(All those extra asks are ok, but anyone taking on these task sets
should make it /very/ clear where the scope of each set begins and ends,
and in which order they'll be worked on.)

> However, may I ask if it's the preferred way to implement this which
> just acts as another mount-unmount cycle, since I'm not sure if there
> are some customized initramfs-es which could get the fs busy so that it
> won't unmount properly.

Seeing as initramfses are only supposed to turn on enough hardware so
that mount can find the root volume, I really hope there aren't
*background services* running here.

> Alternatively, do we consider another way like exporting the log
> recovery functionality with ioctl() so that log recovery can work
> without the actual fs mounting? Is it affordable?

I guess you could create a 'recoveryonly' mount option that would abort
the mount after recovering the log.  I'm not really a fan of that
approach.

--D

> Thanks,
> Gao Xiang
> 
> > +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> > +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> > +               else
> > +                       mount $DEV /tmp/repair_mnt || exit 1
> > +               fi
> > +               umount /tmp/repair_mnt
> > +               xfs_repair -e $DEV
> > +               error=$?
> > +               rm -d /tmp/repair_mnt
> > +       fi
> > +       repair2fsck_code $error
> >         exit $?
> >  fi
> >  
> > -- 
> > 1.8.3.1
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-07 16:55   ` Darrick J. Wong
@ 2022-11-07 17:24     ` Gao Xiang
  0 siblings, 0 replies; 10+ messages in thread
From: Gao Xiang @ 2022-11-07 17:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Srikanth C S, darrick.wong, david, linux-xfs,
	rajesh.sivaramasubramaniom, junxiao.bi, Joseph Qi

Hi Darrick,

On Mon, Nov 07, 2022 at 08:55:47AM -0800, Darrick J. Wong wrote:
> On Mon, Nov 07, 2022 at 02:00:07PM +0800, Gao Xiang wrote:
> > Hi folks,
> > 
> > On Fri, Nov 04, 2022 at 11:40:11AM +0530, Srikanth C S wrote:
> > > After a recent data center crash, we had to recover root filesystems
> > > on several thousands of VMs via a boot time fsck. Since these
> > > machines are remotely manageable, support can inject the kernel
> > > command line with 'fsck.mode=force fsck.repair=yes' to kick off
> > > xfs_repair if the machine won't come up or if they suspect there
> > > might be deeper issues with latent errors in the fs metadata, which
> > > is what they did to try to get everyone running ASAP while
> > > anticipating any future problems. But, fsck.xfs does not address the
> > > journal replay in case of a crash.
> > > 
> > > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > > possible that when the machine crashes, the fs is in inconsistent
> > > state with the journal log not yet replayed. This can drop the machine
> > > into the rescue shell because xfs_fsck.sh does not know how to clean the
> > > log. Since the administrator told us to force repairs, address the
> > > deficiency by cleaning the log and rerunning xfs_repair.
> > > 
> > > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > > Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> > > other option -fa and -f drop to the rescue shell if repair detects
> > > any corruptions.
> > > 
> > > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > > ---
> > >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> > >  1 file changed, 29 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > > index 6af0f22..62a1e0b 100755
> > > --- a/fsck/xfs_fsck.sh
> > > +++ b/fsck/xfs_fsck.sh
> > > @@ -31,10 +31,12 @@ repair2fsck_code() {
> > >  
> > >  AUTO=false
> > >  FORCE=false
> > > +REPAIR=false
> > >  while getopts ":aApyf" c
> > >  do
> > >         case $c in
> > > -       a|A|p|y)        AUTO=true;;
> > > +       a|A|p)          AUTO=true;;
> > > +       y)              REPAIR=true;;
> > >         f)              FORCE=true;;
> > >         esac
> > >  done
> > > @@ -64,7 +66,32 @@ fi
> > >  
> > >  if $FORCE; then
> > >         xfs_repair -e $DEV
> > > -       repair2fsck_code $?
> > > +       error=$?
> > > +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > > +               echo "Replaying log for $DEV"
> > > +               mkdir -p /tmp/repair_mnt || exit 1
> > > +               for x in $(cat /proc/cmdline); do
> > > +                       case $x in
> > > +                               root=*)
> > > +                                       ROOT="${x#root=}"
> > > +                               ;;
> > > +                               rootflags=*)
> > > +                                       ROOTFLAGS="-o ${x#rootflags=}"
> > > +                               ;;
> > > +                       esac
> > > +               done
> > > +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> > 
> > We'd also like to get a formal solution about this for our production
> > so that xfs_repair can work properly with log recovery.
> 
> My preferred solution is to port the log recovery code to userspace, and
> then train xfs_repair to invoke it.  Handling the trivial case where
> xfs_repair can recover logs created on the same platform as the support
> script wouldn't be that hard (I think?) because log recovery is fairly
> selfcontained nowadays.
> 

Yeah, my preferred way is also that it could be done like this, but
sadly in practice currently such ROI is low (considering only a small
number of ECS uses XFS at Alibaba Cloud..)

So.. Hopefully I could promote XFS first so we could have more manpower
and take more time on XFS ;)

> But.
> 
> Inevitably someone will suggest fixing the kernel's inability to recover
> a log from a platform with a different endianness, which will lead to a
> discussion of making the ondisk log format endian safe.  Someone else
> may also ask why not make userspace xfs_trans transactional, and... ;)
> 

Yeah, actually I talked with Eric two years ago about this.  The
log format endianness is still in a mess.  Yeah... You're right.

> (All those extra asks are ok, but anyone taking on these task sets
> should make it /very/ clear where the scope of each set begins and ends,
> and in which order they'll be worked on.)

Currently the part of my job is to aim at stablizing XFS first in order
to promote XFS to our production.  At least reflink with good performance
is somewhat a killer combination for the Cloud providers like us ;)

> 
> > However, may I ask if it's the preferred way to implement this which
> > just acts as another mount-unmount cycle, since I'm not sure if there
> > are some customized initramfs-es which could get the fs busy so that it
> > won't unmount properly.
> 
> Seeing as initramfses are only supposed to turn on enough hardware so
> that mount can find the root volume, I really hope there aren't
> *background services* running here.

Although it's much like impossible, from stability POV, I'm not sure
if some unusual users could behave like this. At least, it's currently
one of our concerns if it just acts as a mount-unmount cycle as a public
cloud provider.

> 
> > Alternatively, do we consider another way like exporting the log
> > recovery functionality with ioctl() so that log recovery can work
> > without the actual fs mounting? Is it affordable?
> 
> I guess you could create a 'recoveryonly' mount option that would abort
> the mount after recovering the log.  I'm not really a fan of that
> approach.

I'm not quite such fan to introduce a weird mount option too.  We
might evaluate such way internally as well, but if upstream prefers a
full mount-unmount cycle, I guess we will finally follow it.

Thank a lot,
Gao Xiang

> 
> --D
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-04  6:10 [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
  2022-11-07  6:00 ` Gao Xiang
@ 2022-11-11  6:24 ` Joseph Qi
  2022-11-14 22:59   ` Darrick J. Wong
  1 sibling, 1 reply; 10+ messages in thread
From: Joseph Qi @ 2022-11-11  6:24 UTC (permalink / raw)
  To: Srikanth C S, linux-xfs
  Cc: darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi, david

Hi，

On 11/4/22 2:10 PM, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
>  
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
>  
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi

If do normal boot, it will try to mount according to fstab.
So in the crash case you've described, it seems that it can't mount
successfully? Or am I missing something?

Thanks,
Joseph

> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-11  6:24 ` Joseph Qi
@ 2022-11-14 22:59   ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2022-11-14 22:59 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Srikanth C S, linux-xfs, darrick.wong, rajesh.sivaramasubramaniom,
	junxiao.bi, david

On Fri, Nov 11, 2022 at 02:24:25PM +0800, Joseph Qi wrote:
> Hi，
> 
> On 11/4/22 2:10 PM, Srikanth C S wrote:
> > After a recent data center crash, we had to recover root filesystems
> > on several thousands of VMs via a boot time fsck. Since these
> > machines are remotely manageable, support can inject the kernel
> > command line with 'fsck.mode=force fsck.repair=yes' to kick off
> > xfs_repair if the machine won't come up or if they suspect there
> > might be deeper issues with latent errors in the fs metadata, which
> > is what they did to try to get everyone running ASAP while
> > anticipating any future problems. But, fsck.xfs does not address the
> > journal replay in case of a crash.
> > 
> > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > possible that when the machine crashes, the fs is in inconsistent
> > state with the journal log not yet replayed. This can drop the machine
> > into the rescue shell because xfs_fsck.sh does not know how to clean the
> > log. Since the administrator told us to force repairs, address the
> > deficiency by cleaning the log and rerunning xfs_repair.
> > 
> > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> > other option -fa and -f drop to the rescue shell if repair detects
> > any corruptions.
> > 
> > Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> > ---
> >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> >  1 file changed, 29 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > index 6af0f22..62a1e0b 100755
> > --- a/fsck/xfs_fsck.sh
> > +++ b/fsck/xfs_fsck.sh
> > @@ -31,10 +31,12 @@ repair2fsck_code() {
> >  
> >  AUTO=false
> >  FORCE=false
> > +REPAIR=false
> >  while getopts ":aApyf" c
> >  do
> >         case $c in
> > -       a|A|p|y)        AUTO=true;;
> > +       a|A|p)          AUTO=true;;
> > +       y)              REPAIR=true;;
> >         f)              FORCE=true;;
> >         esac
> >  done
> > @@ -64,7 +66,32 @@ fi
> >  
> >  if $FORCE; then
> >         xfs_repair -e $DEV
> > -       repair2fsck_code $?
> > +       error=$?
> > +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > +               echo "Replaying log for $DEV"
> > +               mkdir -p /tmp/repair_mnt || exit 1
> > +               for x in $(cat /proc/cmdline); do
> > +                       case $x in
> > +                               root=*)
> > +                                       ROOT="${x#root=}"
> > +                               ;;
> > +                               rootflags=*)
> > +                                       ROOTFLAGS="-o ${x#rootflags=}"
> > +                               ;;
> > +                       esac
> > +               done
> > +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> > +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> > +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> > +               else
> > +                       mount $DEV /tmp/repair_mnt || exit 1
> > +               fi
> 
> If do normal boot, it will try to mount according to fstab.
> So in the crash case you've described, it seems that it can't mount
> successfully? Or am I missing something?

Yes, we're assuming that support has injected the magic command lines
into the bootloader to trigger xfs_repair after boot failed due to a
bad/corrupt rootfs.

--D

> Thanks,
> Joseph
> 
> > +               umount /tmp/repair_mnt
> > +               xfs_repair -e $DEV
> > +               error=$?
> > +               rm -d /tmp/repair_mnt
> > +       fi
> > +       repair2fsck_code $error
> >         exit $?
> >  fi
> >  

^ permalink raw reply	[flat|nested] 10+ messages in thread

[parent not found: <NdSU2Rq0FpWJ3II4JAnJNk-0HW5bns_UxhQ03sSOaek-nu9QPA-ZMx0HDXFtVx8ahgKhWe0Wcfh13NH0ZSwJjg==@protonmail.internalid>]

* [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
@ 2022-11-23  6:30 ` Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Srikanth C S @ 2022-11-23  6:30 UTC (permalink / raw)
  To: linux-xfs
  Cc: srikanth.c.s, darrick.wong, rajesh.sivaramasubramaniom,
	junxiao.bi, david, cem

After a recent data center crash, we had to recover root filesystems
on several thousands of VMs via a boot time fsck. Since these
machines are remotely manageable, support can inject the kernel
command line with 'fsck.mode=force fsck.repair=yes' to kick off
xfs_repair if the machine won't come up or if they suspect there
might be deeper issues with latent errors in the fs metadata, which
is what they did to try to get everyone running ASAP while
anticipating any future problems. But, fsck.xfs does not address the
journal replay in case of a crash.

fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
possible that when the machine crashes, the fs is in inconsistent
state with the journal log not yet replayed. This can drop the machine
into the rescue shell because xfs_fsck.sh does not know how to clean the
log. Since the administrator told us to force repairs, address the
deficiency by cleaning the log and rerunning xfs_repair.

Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
Replay the logs only if fsck.mode=force and fsck.repair=yes. For
other option -fa and -f drop to the rescue shell if repair detects
any corruptions.

Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
---
 fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index 6af0f22..62a1e0b 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -31,10 +31,12 @@ repair2fsck_code() {

 AUTO=false
 FORCE=false
+REPAIR=false
 while getopts ":aApyf" c
 do
        case $c in
-       a|A|p|y)        AUTO=true;;
+       a|A|p)          AUTO=true;;
+       y)              REPAIR=true;;
        f)              FORCE=true;;
        esac
 done
@@ -64,7 +66,32 @@ fi

 if $FORCE; then
        xfs_repair -e $DEV
-       repair2fsck_code $?
+       error=$?
+       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
+               echo "Replaying log for $DEV"
+               mkdir -p /tmp/repair_mnt || exit 1
+               for x in $(cat /proc/cmdline); do
+                       case $x in
+                               root=*)
+                                       ROOT="${x#root=}"
+                               ;;
+                               rootflags=*)
+                                       ROOTFLAGS="-o ${x#rootflags=}"
+                               ;;
+                       esac
+               done
+               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
+               if [ $(basename $DEV) = $(basename $ROOT) ]; then
+                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
+               else
+                       mount $DEV /tmp/repair_mnt || exit 1
+               fi
+               umount /tmp/repair_mnt
+               xfs_repair -e $DEV
+               error=$?
+               rm -d /tmp/repair_mnt
+       fi
+       repair2fsck_code $error
        exit $?
 fi

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` Srikanth C S
@ 2022-11-23  8:36   ` Carlos Maiolino
  2022-12-12 12:13   ` Carlos Maiolino
  2022-12-13  9:39   ` Carlos Maiolino
  2 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2022-11-23  8:36 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

Hi.

Did you plan to resend V3 again, or is this supposed to be V4?


On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
> 
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
> 
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
> 
> --
> 1.8.3.1

-- 
Carlos Maiolino

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
@ 2022-12-12 12:13   ` Carlos Maiolino
  2022-12-13  9:39   ` Carlos Maiolino
  2 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2022-12-12 12:13 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>

Apologies it took so long, the patch seems fine to me. Will test.

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
> 
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
> 
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
> 
> --
> 1.8.3.1

-- 
Carlos Maiolino

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair
  2022-11-23  6:30 ` Srikanth C S
  2022-11-23  8:36   ` Carlos Maiolino
  2022-12-12 12:13   ` Carlos Maiolino
@ 2022-12-13  9:39   ` Carlos Maiolino
  2 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2022-12-13  9:39 UTC (permalink / raw)
  To: Srikanth C S
  Cc: linux-xfs, darrick.wong, rajesh.sivaramasubramaniom, junxiao.bi,
	david

Hi Srikanth.

On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
> 
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can drop the machine
> into the rescue shell because xfs_fsck.sh does not know how to clean the
> log. Since the administrator told us to force repairs, address the
> deficiency by cleaning the log and rerunning xfs_repair.
> 
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the rescue shell if repair detects
> any corruptions.
> 
> Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
> ---
>  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)

Did you by any chance wrote this patch on top of something else you have in your
tree?

It doesn't apply to the tree without tweaking it, and the last changes we've in
the fsck/xfs_fsck.sh file are from 2018, so I assume you have something before
this patch in your tree.

Could you please rebase this patch against xfsprogs for-next and resend it? Feel
free to keep my RwB as long as you don't change the code semantics.

Cheers.

> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..62a1e0b 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
> 
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,32 @@ fi
> 
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               root=*)
> +                                       ROOT="${x#root=}"
> +                               ;;
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;
> +                       esac
> +               done
> +               test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> +               if [ $(basename $DEV) = $(basename $ROOT) ]; then
> +                       mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> +               else
> +                       mount $DEV /tmp/repair_mnt || exit 1
> +               fi
> +               umount /tmp/repair_mnt
> +               xfs_repair -e $DEV
> +               error=$?
> +               rm -d /tmp/repair_mnt
> +       fi
> +       repair2fsck_code $error
>         exit $?
>  fi
> 
> --
> 1.8.3.1

-- 
Carlos Maiolino

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-13  9:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-04  6:10 [PATCH v3] fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair Srikanth C S
2022-11-07  6:00 ` Gao Xiang
2022-11-07 16:55   ` Darrick J. Wong
2022-11-07 17:24     ` Gao Xiang
2022-11-11  6:24 ` Joseph Qi
2022-11-14 22:59   ` Darrick J. Wong
     [not found] <NdSU2Rq0FpWJ3II4JAnJNk-0HW5bns_UxhQ03sSOaek-nu9QPA-ZMx0HDXFtVx8ahgKhWe0Wcfh13NH0ZSwJjg==@protonmail.internalid>
2022-11-23  6:30 ` Srikanth C S
2022-11-23  8:36   ` Carlos Maiolino
2022-12-12 12:13   ` Carlos Maiolino
2022-12-13  9:39   ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox