* [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-03-22 15:23 [PATCH 0/3] Failed array handling improvements Mariusz Tkaczyk
@ 2022-03-22 15:23 ` Mariusz Tkaczyk
2022-04-08 0:16 ` Song Liu
2022-03-22 15:23 ` [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10 Mariusz Tkaczyk
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2022-03-22 15:23 UTC
To: song; +Cc: linux-raid, guoqing.jiang
Commit 62f7b1989c0 ("md raid0/linear: Mark array as 'broken' and fail BIOs
if a member is gone") allowed writes to be failed earlier (before the
level-dependent actions) for non-redundant arrays.
To achieve that, MD_BROKEN is added to mddev->flags when a drive
disappearance is detected. This is done in is_mddev_broken(), which is
confusing and inconsistent with the other levels, where error_handler()
is used. This patch adds an appropriate error_handler for raid0 and
linear and adapts md_error() to the change.
Using an error_handler means that a disk failure can be requested from
userspace: the user can fail the array via the #mdadm --set-faulty
command. This is not safe and will be fixed in mdadm. It is correctable
because the failed state is not recorded in the metadata; after the next
assembly the array will be read-write again. For safety reasons it is
better to keep MD_BROKEN runtime-only.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
drivers/md/md-linear.c | 14 +++++++++++++-
drivers/md/md.c | 6 +++++-
drivers/md/md.h | 10 ++--------
drivers/md/raid0.c | 14 +++++++++++++-
4 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 1ff51647a682..c33cd28f1dba 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -233,7 +233,8 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
bio_sector < start_sector))
goto out_of_bounds;
- if (unlikely(is_mddev_broken(tmp_dev->rdev, "linear"))) {
+ if (unlikely(is_rdev_broken(tmp_dev->rdev))) {
+ md_error(mddev, tmp_dev->rdev);
bio_io_error(bio);
return true;
}
@@ -281,6 +282,16 @@ static void linear_status (struct seq_file *seq, struct mddev *mddev)
seq_printf(seq, " %dk rounding", mddev->chunk_sectors / 2);
}
+static void linear_error(struct mddev *mddev, struct md_rdev *rdev)
+{
+ if (!test_and_set_bit(MD_BROKEN, &mddev->flags)) {
+ char *md_name = mdname(mddev);
+
+ pr_crit("md/linear%s: Disk failure on %pg detected, failing array.\n",
+ md_name, rdev->bdev);
+ }
+}
+
static void linear_quiesce(struct mddev *mddev, int state)
{
}
@@ -297,6 +308,7 @@ static struct md_personality linear_personality =
.hot_add_disk = linear_add,
.size = linear_size,
.quiesce = linear_quiesce,
+ .error_handler = linear_error,
};
static int __init linear_init (void)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0a89f072dae0..3354afc9d2a3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7985,7 +7985,11 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
if (!mddev->pers || !mddev->pers->error_handler)
return;
- mddev->pers->error_handler(mddev,rdev);
+ mddev->pers->error_handler(mddev, rdev);
+
+ if (mddev->pers->level == 0 || mddev->pers->level == LEVEL_LINEAR)
+ return;
+
if (mddev->degraded)
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
sysfs_notify_dirent_safe(rdev->sysfs_state);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index f1bf3625ef4c..13d435a303fa 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -764,15 +764,9 @@ extern void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);
-static inline bool is_mddev_broken(struct md_rdev *rdev, const char *md_type)
+static inline bool is_rdev_broken(struct md_rdev *rdev)
{
- if (!disk_live(rdev->bdev->bd_disk)) {
- if (!test_and_set_bit(MD_BROKEN, &rdev->mddev->flags))
- pr_warn("md: %s: %s array has a missing/failed member\n",
- mdname(rdev->mddev), md_type);
- return true;
- }
- return false;
+ return !disk_live(rdev->bdev->bd_disk);
}
static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index b59a77b31b90..b402e3dc4ead 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -582,8 +582,9 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
return true;
}
- if (unlikely(is_mddev_broken(tmp_dev, "raid0"))) {
+ if (unlikely(is_rdev_broken(tmp_dev))) {
bio_io_error(bio);
+ md_error(mddev, tmp_dev);
return true;
}
@@ -606,6 +607,16 @@ static void raid0_status(struct seq_file *seq, struct mddev *mddev)
return;
}
+static void raid0_error(struct mddev *mddev, struct md_rdev *rdev)
+{
+ if (!test_and_set_bit(MD_BROKEN, &mddev->flags)) {
+ char *md_name = mdname(mddev);
+
+ pr_crit("md/raid0%s: Disk failure on %pg detected, failing array.\n",
+ md_name, rdev->bdev);
+ }
+}
+
static void *raid0_takeover_raid45(struct mddev *mddev)
{
struct md_rdev *rdev;
@@ -781,6 +792,7 @@ static struct md_personality raid0_personality=
.size = raid0_size,
.takeover = raid0_takeover,
.quiesce = raid0_quiesce,
+ .error_handler = raid0_error,
};
static int __init raid0_init (void)
--
2.26.2
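For readers skimming the diff, the failure path that results from this
patch can be condensed as follows (a sketch stitched together from the
hunks above, not compilable as-is; note the two call sites order
md_error()/bio_io_error() slightly differently):

	/* raid0_make_request() / linear_make_request() */
	if (unlikely(is_rdev_broken(tmp_dev->rdev))) {
		md_error(mddev, tmp_dev->rdev);	/* -> raid0_error()/linear_error() */
		bio_io_error(bio);
		return true;
	}

	/* md_error() */
	if (!mddev->pers || !mddev->pers->error_handler)
		return;
	mddev->pers->error_handler(mddev, rdev);	/* sets MD_BROKEN once, logs pr_crit() */
	if (mddev->pers->level == 0 || mddev->pers->level == LEVEL_LINEAR)
		return;	/* skip the Faulty/recovery bookkeeping for non-redundant levels */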
* Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-03-22 15:23 ` [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear Mariusz Tkaczyk
@ 2022-04-08 0:16 ` Song Liu
2022-04-08 14:35 ` Mariusz Tkaczyk
0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2022-04-08 0:16 UTC
To: Mariusz Tkaczyk; +Cc: linux-raid, Guoqing Jiang
On Tue, Mar 22, 2022 at 8:24 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> [...]
Sorry for the late response.
> [...]
> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> index 1ff51647a682..c33cd28f1dba 100644
> --- a/drivers/md/md-linear.c
> +++ b/drivers/md/md-linear.c
> @@ -233,7 +233,8 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
> bio_sector < start_sector))
> goto out_of_bounds;
>
> - if (unlikely(is_mddev_broken(tmp_dev->rdev, "linear"))) {
> + if (unlikely(is_rdev_broken(tmp_dev->rdev))) {
> + md_error(mddev, tmp_dev->rdev);
I apologize if we discussed this before. Shall we just call linear_error()
here?If we go this way, we don't really need ...
> bio_io_error(bio);
> return true;
> }
> @@ -281,6 +282,16 @@ static void linear_status (struct seq_file *seq, struct mddev *mddev)
> seq_printf(seq, " %dk rounding", mddev->chunk_sectors / 2);
> }
>
> +static void linear_error(struct mddev *mddev, struct md_rdev *rdev)
> +{
> + if (!test_and_set_bit(MD_BROKEN, &mddev->flags)) {
> + char *md_name = mdname(mddev);
> +
> + pr_crit("md/linear%s: Disk failure on %pg detected, failing array.\n",
> + md_name, rdev->bdev);
> + }
> +}
> +
> static void linear_quiesce(struct mddev *mddev, int state)
> {
> }
> @@ -297,6 +308,7 @@ static struct md_personality linear_personality =
> .hot_add_disk = linear_add,
> .size = linear_size,
> .quiesce = linear_quiesce,
> + .error_handler = linear_error,
... set error_handler here, and ...
> };
>
> static int __init linear_init (void)
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0a89f072dae0..3354afc9d2a3 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7985,7 +7985,11 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
>
> if (!mddev->pers || !mddev->pers->error_handler)
> return;
> - mddev->pers->error_handler(mddev,rdev);
> + mddev->pers->error_handler(mddev, rdev);
> +
> + if (mddev->pers->level == 0 || mddev->pers->level == LEVEL_LINEAR)
> + return;
... this check here.
Did I miss something?
Thanks,
Song
[...]
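Concretely, the alternative Song describes would look something like the
following (hypothetical code, not part of the posted series):

	/* linear_make_request(), calling the personality handler directly */
	if (unlikely(is_rdev_broken(tmp_dev->rdev))) {
		linear_error(mddev, tmp_dev->rdev);	/* set MD_BROKEN without routing through md_error() */
		bio_io_error(bio);
		return true;
	}

With that, linear_personality would need no .error_handler at all, and
md_error() would not need the level == 0 / LEVEL_LINEAR early return.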
* Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-04-08 0:16 ` Song Liu
@ 2022-04-08 14:35 ` Mariusz Tkaczyk
2022-04-08 16:18 ` Song Liu
0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2022-04-08 14:35 UTC
To: Song Liu; +Cc: linux-raid, Guoqing Jiang
On Thu, 7 Apr 2022 17:16:37 -0700
Song Liu <song@kernel.org> wrote:
> On Tue, Mar 22, 2022 at 8:24 AM Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > [...]
>
> I apologize if we discussed this before. Shall we just call
> linear_error() here? If we go this way, we don't really need ...
>
> > [...]
>
> ... set error_handler here, and ...
>
> > [...]
>
> ... this check here.
>
> Did I miss something?
>
Hi Song,
That is correct, we can do the same for raid0. I did it this way to
make it similar to redundant levels.
If you think that it is overhead, I can drop it.
Thanks,
Mariusz
* Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-04-08 14:35 ` Mariusz Tkaczyk
@ 2022-04-08 16:18 ` Song Liu
2022-04-12 15:31 ` Mariusz Tkaczyk
0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2022-04-08 16:18 UTC
To: Mariusz Tkaczyk; +Cc: linux-raid, Guoqing Jiang
On Fri, Apr 8, 2022 at 7:35 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> On Thu, 7 Apr 2022 17:16:37 -0700
> Song Liu <song@kernel.org> wrote:
>
> > [...]
> Hi Song,
> That is correct, we can do the same for raid0. I did it this way to
> make it similar to redundant levels.
> If you think that it is overhead, I can drop it.
Yeah, it feels like more overhead to me.
I applied 2/3 and 3/3 to md-next. Please take a look and let me know
if anything needs to be fixed.
Thanks,
Song
* Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-04-08 16:18 ` Song Liu
@ 2022-04-12 15:31 ` Mariusz Tkaczyk
2022-04-12 16:36 ` Song Liu
0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2022-04-12 15:31 UTC
To: Song Liu; +Cc: linux-raid, Guoqing Jiang
On Fri, 8 Apr 2022 09:18:28 -0700
Song Liu <song@kernel.org> wrote:
> [...]
> > Hi Song,
> > That is correct, we can do the same for raid0. I did it this way to
> > make it similar to redundant levels.
> > If you think that it is overhead, I can drop it.
>
> Yeah, it feels like more overhead to me.
>
> I applied 2/3 and 3/3 to md-next. Please take a look and let me know
> if anything needs to be fixed.
>
Now the first patch is just a code refactor with a different error message.
I think that we can drop it. Do you agree?
Thanks,
Mariusz
* Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear
2022-04-12 15:31 ` Mariusz Tkaczyk
@ 2022-04-12 16:36 ` Song Liu
0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2022-04-12 16:36 UTC
To: Mariusz Tkaczyk; +Cc: linux-raid, Guoqing Jiang
On Tue, Apr 12, 2022 at 8:32 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> [...]
> > Yeah, it feels like more overhead to me.
> >
> > I applied 2/3 and 3/3 to md-next. Please take a look and let me know
> > if anything needs to be fixed.
> >
> Now the first patch is just a code refactor with a different error message.
> I think that we can drop it. Do you agree?
Yes, I think we can drop it.
Thanks,
Song
* [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10
2022-03-22 15:23 [PATCH 0/3] Failed array handling improvements Mariusz Tkaczyk
2022-03-22 15:23 ` [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear Mariusz Tkaczyk
@ 2022-03-22 15:23 ` Mariusz Tkaczyk
2022-03-22 15:23 ` [PATCH 3/3] raid5: introduce MD_BROKEN Mariusz Tkaczyk
2022-03-24 7:09 ` [PATCH 0/3] Failed array handling improvements Xiao Ni
3 siblings, 0 replies; 11+ messages in thread
From: Mariusz Tkaczyk @ 2022-03-22 15:23 UTC
To: song; +Cc: linux-raid, guoqing.jiang
There is no direct mechanism to determine raid failure outside the
personality. It is done by checking rdev->flags after executing
md_error(). If the "faulty" flag is not set, then -EBUSY is returned to
userspace. -EBUSY means that the array would be failed by the drive
removal. mdadm has a special routine to handle the array failure, and it
is executed if -EBUSY is returned by md.
There are at least two known reasons not to consider this mechanism
correct:
1. The drive can be removed even if the array would be failed by it [1].
2. -EBUSY seems to be the wrong status. The array is not busy, but the
removal process cannot proceed safely.
The -EBUSY expectation cannot be removed without breaking compatibility
with userspace. In this patch the first issue is resolved by adding
support for the MD_BROKEN flag for RAID1 and RAID10. Support for RAID456
is added in the next commit.
The idea is to set MD_BROKEN once we are sure that the raid is in a
failed state. This is done in each error_handler(). In md_error() the
MD_BROKEN flag is checked; if it is set, then -EBUSY is returned to
userspace.
As in the previous commit, this means that #mdadm --set-faulty is able to
fail the array. The previously proposed workaround remains valid if the
optional functionality [1] is disabled.
[1] commit 9a567843f7ce ("md: allow last device to be forcibly removed
from RAID1/RAID10.")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
drivers/md/md.c | 24 ++++++++++--------
drivers/md/md.h | 62 +++++++++++++++++++++++++--------------------
drivers/md/raid1.c | 43 ++++++++++++++++++-------------
drivers/md/raid10.c | 40 +++++++++++++++++------------
4 files changed, 98 insertions(+), 71 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3354afc9d2a3..3613b22b9097 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2984,10 +2984,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
md_error(rdev->mddev, rdev);
- if (test_bit(Faulty, &rdev->flags))
- err = 0;
- else
+
+ if (test_bit(MD_BROKEN, &rdev->mddev->flags))
err = -EBUSY;
+ else
+ err = 0;
} else if (cmd_match(buf, "remove")) {
if (rdev->mddev->pers) {
clear_bit(Blocked, &rdev->flags);
@@ -4353,10 +4354,9 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
* like active, but no writes have been seen for a while (100msec).
*
* broken
- * RAID0/LINEAR-only: same as clean, but array is missing a member.
- * It's useful because RAID0/LINEAR mounted-arrays aren't stopped
- * when a member is gone, so this state will at least alert the
- * user that something is wrong.
+ *     Array is failed. It's useful because mounted-arrays aren't stopped
+ *     when array is failed, so this state will at least alert the user that
+ *     something is wrong.
*/
enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
write_pending, active_idle, broken, bad_word};
@@ -7444,7 +7444,7 @@ static int set_disk_faulty(struct mddev *mddev, dev_t dev)
err = -ENODEV;
else {
md_error(mddev, rdev);
- if (!test_bit(Faulty, &rdev->flags))
+ if (test_bit(MD_BROKEN, &mddev->flags))
err = -EBUSY;
}
rcu_read_unlock();
@@ -7990,12 +7990,14 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
if (mddev->pers->level == 0 || mddev->pers->level == LEVEL_LINEAR)
return;
- if (mddev->degraded)
+ if (mddev->degraded && !test_bit(MD_BROKEN, &mddev->flags))
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
sysfs_notify_dirent_safe(rdev->sysfs_state);
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ if (!test_bit(MD_BROKEN, &mddev->flags)) {
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ }
if (mddev->event_work.func)
queue_work(md_misc_wq, &mddev->event_work);
md_new_event();
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 13d435a303fa..3f056ec92473 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -234,34 +234,42 @@ extern int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
int is_new);
struct md_cluster_info;
-/* change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added */
+/**
+ * enum mddev_flags - md device flags.
+ * @MD_ARRAY_FIRST_USE: First use of array, needs initialization.
+ * @MD_CLOSING: If set, we are closing the array, do not open it then.
+ * @MD_JOURNAL_CLEAN: A raid with journal is already clean.
+ * @MD_HAS_JOURNAL: The raid array has journal feature set.
+ * @MD_CLUSTER_RESYNC_LOCKED: cluster raid only, which means node, already took
+ * resync lock, need to release the lock.
+ * @MD_FAILFAST_SUPPORTED: Using MD_FAILFAST on metadata writes is supported as
+ * calls to md_error() will never cause the array to
+ * become failed.
+ * @MD_HAS_PPL: The raid array has PPL feature set.
+ * @MD_HAS_MULTIPLE_PPLS: The raid array has multiple PPLs feature set.
+ * @MD_ALLOW_SB_UPDATE: md_check_recovery is allowed to update the metadata
+ * without taking reconfig_mutex.
+ * @MD_UPDATING_SB: md_check_recovery is updating the metadata without
+ * explicitly holding reconfig_mutex.
+ * @MD_NOT_READY: do_md_run() is active, so 'array_state' must not report that
+ * array is ready yet.
+ * @MD_BROKEN: This is used to stop writes and mark array as failed.
+ *
+ * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
+ */
enum mddev_flags {
- MD_ARRAY_FIRST_USE, /* First use of array, needs initialization */
- MD_CLOSING, /* If set, we are closing the array, do not open
- * it then */
- MD_JOURNAL_CLEAN, /* A raid with journal is already clean */
- MD_HAS_JOURNAL, /* The raid array has journal feature set */
- MD_CLUSTER_RESYNC_LOCKED, /* cluster raid only, which means node
- * already took resync lock, need to
- * release the lock */
- MD_FAILFAST_SUPPORTED, /* Using MD_FAILFAST on metadata writes is
- * supported as calls to md_error() will
- * never cause the array to become failed.
- */
- MD_HAS_PPL, /* The raid array has PPL feature set */
- MD_HAS_MULTIPLE_PPLS, /* The raid array has multiple PPLs feature set */
- MD_ALLOW_SB_UPDATE, /* md_check_recovery is allowed to update
- * the metadata without taking reconfig_mutex.
- */
- MD_UPDATING_SB, /* md_check_recovery is updating the metadata
- * without explicitly holding reconfig_mutex.
- */
- MD_NOT_READY, /* do_md_run() is active, so 'array_state'
- * must not report that array is ready yet
- */
- MD_BROKEN, /* This is used in RAID-0/LINEAR only, to stop
- * I/O in case an array member is gone/failed.
- */
+ MD_ARRAY_FIRST_USE,
+ MD_CLOSING,
+ MD_JOURNAL_CLEAN,
+ MD_HAS_JOURNAL,
+ MD_CLUSTER_RESYNC_LOCKED,
+ MD_FAILFAST_SUPPORTED,
+ MD_HAS_PPL,
+ MD_HAS_MULTIPLE_PPLS,
+ MD_ALLOW_SB_UPDATE,
+ MD_UPDATING_SB,
+ MD_NOT_READY,
+ MD_BROKEN,
};
enum mddev_sb_flags {
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 9b2f9745b4e0..bd1acfb42997 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1638,30 +1638,39 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
seq_printf(seq, "]");
}
+/**
+ * raid1_error() - RAID1 error handler.
+ * @mddev: affected md device.
+ * @rdev: member device to fail.
+ *
+ * The routine acknowledges &rdev failure and determines new @mddev state.
+ * If it failed, then:
+ * - &MD_BROKEN flag is set in &mddev->flags.
+ * - recovery is disabled.
+ * Otherwise, it must be degraded:
+ * - recovery is interrupted.
+ * - &mddev->degraded is bumped.
+ *
+ * @rdev is marked as &Faulty excluding case when array is failed and
+ * &mddev->fail_last_dev is off.
+ */
static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
{
char b[BDEVNAME_SIZE];
struct r1conf *conf = mddev->private;
unsigned long flags;
- /*
- * If it is not operational, then we have already marked it as dead
- * else if it is the last working disks with "fail_last_dev == false",
- * ignore the error, let the next level up know.
- * else mark the drive as failed
- */
spin_lock_irqsave(&conf->device_lock, flags);
- if (test_bit(In_sync, &rdev->flags) && !mddev->fail_last_dev
- && (conf->raid_disks - mddev->degraded) == 1) {
- /*
- * Don't fail the drive, act as though we were just a
- * normal single drive.
- * However don't try a recovery from this drive as
- * it is very likely to fail.
- */
- conf->recovery_disabled = mddev->recovery_disabled;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- return;
+
+ if (test_bit(In_sync, &rdev->flags) &&
+ (conf->raid_disks - mddev->degraded) == 1) {
+ set_bit(MD_BROKEN, &mddev->flags);
+
+ if (!mddev->fail_last_dev) {
+ conf->recovery_disabled = mddev->recovery_disabled;
+ spin_unlock_irqrestore(&conf->device_lock, flags);
+ return;
+ }
}
set_bit(Blocked, &rdev->flags);
if (test_and_clear_bit(In_sync, &rdev->flags))
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c2b5a9ff85cc..06f8c48c13c3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1963,32 +1963,40 @@ static int enough(struct r10conf *conf, int ignore)
_enough(conf, 1, ignore);
}
+/**
+ * raid10_error() - RAID10 error handler.
+ * @mddev: affected md device.
+ * @rdev: member device to fail.
+ *
+ * The routine acknowledges &rdev failure and determines new @mddev state.
+ * If it failed, then:
+ * - &MD_BROKEN flag is set in &mddev->flags.
+ * Otherwise, it must be degraded:
+ * - recovery is interrupted.
+ * - &mddev->degraded is bumped.
+ *
+ * @rdev is marked as &Faulty excluding case when array is failed and
+ * &mddev->fail_last_dev is off.
+ */
static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
{
char b[BDEVNAME_SIZE];
struct r10conf *conf = mddev->private;
unsigned long flags;
- /*
- * If it is not operational, then we have already marked it as dead
- * else if it is the last working disks with "fail_last_dev == false",
- * ignore the error, let the next level up know.
- * else mark the drive as failed
- */
spin_lock_irqsave(&conf->device_lock, flags);
- if (test_bit(In_sync, &rdev->flags) && !mddev->fail_last_dev
- && !enough(conf, rdev->raid_disk)) {
- /*
- * Don't fail the drive, just return an IO error.
- */
- spin_unlock_irqrestore(&conf->device_lock, flags);
- return;
+
+ if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
+ set_bit(MD_BROKEN, &mddev->flags);
+
+ if (!mddev->fail_last_dev) {
+ spin_unlock_irqrestore(&conf->device_lock, flags);
+ return;
+ }
}
if (test_and_clear_bit(In_sync, &rdev->flags))
mddev->degraded++;
- /*
- * If recovery is running, make sure it aborts.
- */
+
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags);
set_bit(Faulty, &rdev->flags);
--
2.26.2
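The userspace-visible contract after this patch can be condensed as
follows (a sketch of the logic that state_store() and set_disk_faulty()
now share, paraphrased from the hunks above):

	md_error(mddev, rdev);			/* personality decides: Faulty and/or MD_BROKEN */
	if (test_bit(MD_BROKEN, &mddev->flags))
		return -EBUSY;			/* array is failed; mdadm runs its failure routine */
	return 0;				/* member handled; removal may proceed */

Note the inversion: "did this fail the array?" is now read from
mddev->flags instead of being inferred from the absence of Faulty on the
rdev, which is what lets md report the failure even when fail_last_dev
allows the last member to be marked Faulty [1].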
* [PATCH 3/3] raid5: introduce MD_BROKEN
2022-03-22 15:23 [PATCH 0/3] Failed array handling improvements Mariusz Tkaczyk
2022-03-22 15:23 ` [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear Mariusz Tkaczyk
2022-03-22 15:23 ` [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10 Mariusz Tkaczyk
@ 2022-03-22 15:23 ` Mariusz Tkaczyk
2022-04-08 0:29 ` Song Liu
2022-03-24 7:09 ` [PATCH 0/3] Failed array handling improvements Xiao Ni
3 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2022-03-22 15:23 UTC
To: song; +Cc: linux-raid, guoqing.jiang
The raid456 module used to allow the array to reach a failed state. That
was fixed by fb73b357fb9 ("raid5: block failing device if raid will be
failed"). This fix introduced a bug: now if raid5 fails during IO, it
may end up with a hung task that never completes. The Faulty flag on the
device is necessary to process all requests and is checked many times,
mainly in analyze_stripe().
Allow the Faulty flag to be set on a drive again, and set MD_BROKEN if
the raid is failed. As a result, this level can reach the failed state
again, but the communication with userspace (via the -EBUSY status) is
preserved.
This restores the possibility to fail the array via the #mdadm
--set-faulty command; that will be guarded by additional verification on
the mdadm side.
Reproduction steps:
mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
mkfs.xfs /dev/md126 -f
mount /dev/md126 /mnt/root/
fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &
echo 1 > /sys/block/nvme2n1/device/device/remove
echo 1 > /sys/block/nvme1n1/device/device/remove
[ 1475.787779] Call Trace:
[ 1475.793111] __schedule+0x2a6/0x700
[ 1475.799460] schedule+0x38/0xa0
[ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
[ 1475.813856] ? finish_wait+0x80/0x80
[ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
[ 1475.828281] ? finish_wait+0x80/0x80
[ 1475.834727] ? finish_wait+0x80/0x80
[ 1475.841127] ? finish_wait+0x80/0x80
[ 1475.847480] md_handle_request+0x119/0x190
[ 1475.854390] md_make_request+0x8a/0x190
[ 1475.861041] generic_make_request+0xcf/0x310
[ 1475.868145] submit_bio+0x3c/0x160
[ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
[ 1475.882070] iomap_dio_bio_actor+0x175/0x390
[ 1475.889149] iomap_apply+0xff/0x310
[ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.909974] iomap_dio_rw+0x2f2/0x490
[ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.923680] ? atime_needs_update+0x77/0xe0
[ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
[ 1475.953403] aio_read+0xd5/0x180
[ 1475.959395] ? _cond_resched+0x15/0x30
[ 1475.965907] io_submit_one+0x20b/0x3c0
[ 1475.972398] __x64_sys_io_submit+0xa2/0x180
[ 1475.979335] ? do_io_getevents+0x7c/0xc0
[ 1475.986009] do_syscall_64+0x5b/0x1a0
[ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 1476.000255] RIP: 0033:0x7f11fc27978d
[ 1476.006631] Code: Bad RIP value.
[ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
Cc: stable@vger.kernel.org
Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
drivers/md/raid5.c | 48 +++++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7c119208a214..4d76e3a89aa5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -686,17 +686,20 @@ int raid5_calc_degraded(struct r5conf *conf)
return degraded;
}
-static int has_failed(struct r5conf *conf)
+static bool has_failed(struct r5conf *conf)
{
- int degraded;
+ int degraded = conf->mddev->degraded;
- if (conf->mddev->reshape_position == MaxSector)
- return conf->mddev->degraded > conf->max_degraded;
+ if (test_bit(MD_BROKEN, &conf->mddev->flags))
+ return true;
+
+ if (conf->mddev->reshape_position != MaxSector)
+ degraded = raid5_calc_degraded(conf);
- degraded = raid5_calc_degraded(conf);
if (degraded > conf->max_degraded)
- return 1;
- return 0;
+ return true;
+
+ return false;
}
struct stripe_head *
@@ -2877,34 +2880,31 @@ static void raid5_error(struct mddev *mddev, struct md_rdev *rdev)
unsigned long flags;
pr_debug("raid456: error called\n");
+ pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n",
+ mdname(mddev), bdevname(rdev->bdev, b));
+
spin_lock_irqsave(&conf->device_lock, flags);
+ set_bit(Faulty, &rdev->flags);
+ clear_bit(In_sync, &rdev->flags);
+ mddev->degraded = raid5_calc_degraded(conf);
- if (test_bit(In_sync, &rdev->flags) &&
- mddev->degraded == conf->max_degraded) {
- /*
- * Don't allow to achieve failed state
- * Don't try to recover this device
- */
+ if (has_failed(conf)) {
+ set_bit(MD_BROKEN, &conf->mddev->flags);
conf->recovery_disabled = mddev->recovery_disabled;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- return;
+
+ pr_crit("md/raid:%s: Cannot continue operation (%d/%d failed).\n",
+ mdname(mddev), mddev->degraded, conf->raid_disks);
+ } else {
+ pr_crit("md/raid:%s: Operation continuing on %d devices.\n",
+ mdname(mddev), conf->raid_disks - mddev->degraded);
}
- set_bit(Faulty, &rdev->flags);
- clear_bit(In_sync, &rdev->flags);
- mddev->degraded = raid5_calc_degraded(conf);
spin_unlock_irqrestore(&conf->device_lock, flags);
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags);
set_mask_bits(&mddev->sb_flags, 0,
BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
- pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n"
- "md/raid:%s: Operation continuing on %d devices.\n",
- mdname(mddev),
- bdevname(rdev->bdev, b),
- mdname(mddev),
- conf->raid_disks - mddev->degraded);
r5c_update_on_rdev_error(mddev, rdev);
}
--
2.26.2
* Re: [PATCH 3/3] raid5: introduce MD_BROKEN
2022-03-22 15:23 ` [PATCH 3/3] raid5: introduce MD_BROKEN Mariusz Tkaczyk
@ 2022-04-08 0:29 ` Song Liu
0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2022-04-08 0:29 UTC
To: Mariusz Tkaczyk; +Cc: linux-raid, Guoqing Jiang
On Tue, Mar 22, 2022 at 8:24 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> The raid456 module used to allow the array to reach a failed state. That
> was fixed by fb73b357fb9 ("raid5: block failing device if raid will be
> failed"). This fix introduced a bug: now if raid5 fails during IO, it
> may end up with a hung task that never completes. The Faulty flag on the
> device is necessary to process all requests and is checked many times,
> mainly in analyze_stripe().
[...]
> Cc: stable@vger.kernel.org
> Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
> drivers/md/raid5.c | 48 +++++++++++++++++++++++-----------------------
> 1 file changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 7c119208a214..4d76e3a89aa5 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -686,17 +686,20 @@ int raid5_calc_degraded(struct r5conf *conf)
> return degraded;
> }
>
> -static int has_failed(struct r5conf *conf)
> +static bool has_failed(struct r5conf *conf)
> {
> - int degraded;
> + int degraded = conf->mddev->degraded;
>
> - if (conf->mddev->reshape_position == MaxSector)
> - return conf->mddev->degraded > conf->max_degraded;
> + if (test_bit(MD_BROKEN, &conf->mddev->flags))
> + return true;
> +
> + if (conf->mddev->reshape_position != MaxSector)
> + degraded = raid5_calc_degraded(conf);
>
> - degraded = raid5_calc_degraded(conf);
> if (degraded > conf->max_degraded)
nit: we can just do
return degraded > conf->max_degraded;
[...]
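For reference, with that nit applied the whole helper would read (a
sketch based on the hunk above):

	static bool has_failed(struct r5conf *conf)
	{
		int degraded = conf->mddev->degraded;

		if (test_bit(MD_BROKEN, &conf->mddev->flags))
			return true;

		if (conf->mddev->reshape_position != MaxSector)
			degraded = raid5_calc_degraded(conf);

		return degraded > conf->max_degraded;
	}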
* Re: [PATCH 0/3] Failed array handling improvements
2022-03-22 15:23 [PATCH 0/3] Failed array handling improvements Mariusz Tkaczyk
` (2 preceding siblings ...)
2022-03-22 15:23 ` [PATCH 3/3] raid5: introduce MD_BROKEN Mariusz Tkaczyk
@ 2022-03-24 7:09 ` Xiao Ni
3 siblings, 0 replies; 11+ messages in thread
From: Xiao Ni @ 2022-03-24 7:09 UTC
To: Mariusz Tkaczyk; +Cc: Song Liu, linux-raid, Guoqing Jiang
On Tue, Mar 22, 2022 at 11:24 PM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> Hi Song,
> In v4 the following changes were made:
> - raid1_error and raid10_error description reworked, suggested by Guoqing.
> - Error messages for raid0 and linear reworked, suggested by Guoqing.
> - check for sync_request replaced by level checks, suggested by Guoqing.
>
> I did manual tests (mainly on IMSM); here are my next TODOs in mdadm:
> - Broken state handling for redundant arrays in mdadm (it is exposed in
> sysfs). Currently it works the same as before, because broken is not a
> case for redundant arrays in mdadm. It requires dealing with the
> already-defined "FAILED" state in mdadm.
> - Blocking manual removal of devices (#mdadm --set-faulty).
>
> I ran the following native mdadm tests with and without the changes; all passed:
> #./test --disks=/dev/nullb[1-5] --tests=00raid1,00raid4,00raid5,00raid6,01r1fail,
> 01r5fail,01replace,02r1add,05r1failfast,05r1re-add,05r1re-add-nosuper
>
> Mariusz Tkaczyk (3):
> raid0, linear, md: add error_handlers for raid0 and linear
> md: Set MD_BROKEN for RAID1 and RAID10
> raid5: introduce MD_BROKEN
>
> drivers/md/md-linear.c | 14 +++++++-
> drivers/md/md.c | 30 +++++++++++-------
> drivers/md/md.h | 72 ++++++++++++++++++++++--------------------
> drivers/md/raid0.c | 14 +++++++-
> drivers/md/raid1.c | 43 +++++++++++++++----------
> drivers/md/raid10.c | 40 +++++++++++++----------
> drivers/md/raid5.c | 48 ++++++++++++++--------------
> 7 files changed, 155 insertions(+), 106 deletions(-)
>
> --
> 2.26.2
>
Reviewed-by: Xiao Ni <xni@redhat.com>