From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Xiao Ni <xni@redhat.com>
Cc: Song Liu <song@kernel.org>, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10
Date: Mon, 27 Dec 2021 13:34:29 +0100 [thread overview]
Message-ID: <20211227133429.00002062@linux.intel.com> (raw)
In-Reply-To: <CALTww29eqakZmp4oiDDZOWZtiz7q2yXCPidBoJVfVpodDcYdzw@mail.gmail.com>
On Wed, 22 Dec 2021 15:24:54 +0800
Xiao Ni <xni@redhat.com> wrote:
> On Thu, Dec 16, 2021 at 10:53 PM Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > There was no direct mechanism to determine raid failure outside
> > personality. It was done by checking rdev->flags after executing
> > md_error(). If "faulty" was not set then -EBUSY was returned to
> > userspace. It causes that mdadm expects -EBUSY if the array
> > becomes failed. There are some reasons to not consider this
> > mechanism as correct:
> > - drive can't be failed for different reasons.
> > - there are path where -EBUSY is not reported and drive removal
> > leads to failed array, without notification for userspace.
> > - in the array failure case -EBUSY seems to be wrong status. Array
> > is not busy, but removal process cannot proceed safe.
> >
> > -EBUSY expectation cannot be removed without breaking compatibility
> > with userspace, but we can adopt the failed state verification
> > method.
> >
> > In this patch MD_BROKEN flag support, used to mark non-redundant
> > array as dead, is added to RAID1 and RAID10. Support for RAID456 is
> > added in next commit.
> >
> > Now the array failure can be checked, so verify MD_BROKEN flag,
> > however still return -EBUSY to userspace.
> >
> > As in previous commit, it causes that #mdadm --set-faulty is able to
> > mark array as failed. Previously proposed workaround is valid if
> > optional functionality 9a567843f79("md: allow last device to be
> > forcibly removed from RAID1/RAID10.") is disabled.
> >
> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> > ---
> > drivers/md/md.c | 17 ++++++++++-------
> > drivers/md/md.h | 4 ++--
> > drivers/md/raid1.c | 1 +
> > drivers/md/raid10.c | 1 +
> > 4 files changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index f888ef197765..fda8473f96b8 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2983,10 +2983,11 @@ state_store(struct md_rdev *rdev, const
> > char *buf, size_t len)
> >
> > if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
> > md_error(rdev->mddev, rdev);
> > - if (test_bit(Faulty, &rdev->flags))
> > - err = 0;
> > - else
> > +
> > + if (test_bit(MD_BROKEN, &rdev->mddev->flags))
> > err = -EBUSY;
> > + else
> > + err = 0;
> > } else if (cmd_match(buf, "remove")) {
> > if (rdev->mddev->pers) {
> > clear_bit(Blocked, &rdev->flags);
> > @@ -7441,7 +7442,7 @@ static int set_disk_faulty(struct mddev
> > *mddev, dev_t dev) err = -ENODEV;
> > else {
> > md_error(mddev, rdev);
> > - if (!test_bit(Faulty, &rdev->flags))
> > + if (test_bit(MD_BROKEN, &mddev->flags))
> > err = -EBUSY;
> > }
> > rcu_read_unlock();
> > @@ -7987,12 +7988,14 @@ void md_error(struct mddev *mddev, struct
> > md_rdev *rdev) if (!mddev->pers->sync_request)
> > return;
> >
> > - if (mddev->degraded)
> > + if (mddev->degraded && !test_bit(MD_BROKEN, &mddev->flags))
> > set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> > sysfs_notify_dirent_safe(rdev->sysfs_state);
> > set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> > - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > - md_wakeup_thread(mddev->thread);
> > + if (!test_bit(MD_BROKEN, &mddev->flags)) {
> > + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > + md_wakeup_thread(mddev->thread);
> > + }
> > if (mddev->event_work.func)
> > queue_work(md_misc_wq, &mddev->event_work);
> > md_new_event();
> > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > index bc3f2094d0b6..d3a897868695 100644
> > --- a/drivers/md/md.h
> > +++ b/drivers/md/md.h
> > @@ -259,8 +259,8 @@ enum mddev_flags {
> > MD_NOT_READY, /* do_md_run() is active, so
> > 'array_state'
> > * must not report that array is
> > ready yet */
> > - MD_BROKEN, /* This is used in RAID-0/LINEAR
> > only, to stop
> > - * I/O in case an array member is
> > gone/failed.
> > + MD_BROKEN, /* This is used to stop I/O and
> > mark device as
> > + * dead in case an array becomes
> > failed. */
> > };
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 7dc8026cf6ee..45dc75f90476 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -1638,6 +1638,7 @@ static void raid1_error(struct mddev *mddev,
> > struct md_rdev *rdev) */
> > conf->recovery_disabled = mddev->recovery_disabled;
> > spin_unlock_irqrestore(&conf->device_lock, flags);
> > + set_bit(MD_BROKEN, &mddev->flags);
> > return;
> > }
> > set_bit(Blocked, &rdev->flags);
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index dde98f65bd04..d7cefd212e6b 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -1964,6 +1964,7 @@ static void raid10_error(struct mddev *mddev,
> > struct md_rdev *rdev)
> > * Don't fail the drive, just return an IO error.
> > */
> > spin_unlock_irqrestore(&conf->device_lock, flags);
> > + set_bit(MD_BROKEN, &mddev->flags);
> > return;
> > }
> > if (test_and_clear_bit(In_sync, &rdev->flags))
> > --
> > 2.26.2
> >
>
> Hi Mariusz
>
> In your first Version, it checks MD_BROKEN in super_written. Does it
> need to do this in this version?
> From my understanding, it needs to retry the write when failfast is
> supported. If it checks MD_BROKEN
> in super_written, it can't send re-write anymore. Am I right?
>
Hi Xiao,
We have discussion about it with Song and as a result a dropped it.
I just left it as before. V2 shouldn't race with failfast because it
still takes care about faulty flag on device (MD_BROKEN is set
additionally).
I don't know a lot about failfast so, idea proposed in first patch could
be wrong. I did it this way because I've updated all md_error() and
later test_bit(Faulty, &rdev->flags) paths, with strict assumption
faulty == MD_BROKEN. For failfast and last_dev it is not obvious.
If you have any recommendation, I will be grateful.
Thanks,
Mariusz
next prev parent reply other threads:[~2021-12-27 12:34 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-16 14:52 [PATCH v2 0/3] Use MD_BROKEN for redundant arrays Mariusz Tkaczyk
2021-12-16 14:52 ` [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear Mariusz Tkaczyk
2021-12-17 2:00 ` Guoqing Jiang
2021-12-17 2:07 ` Guoqing Jiang
2021-12-19 3:26 ` Xiao Ni
2021-12-22 1:22 ` Guoqing Jiang
2021-12-20 9:39 ` Mariusz Tkaczyk
2021-12-19 3:20 ` Xiao Ni
2021-12-20 8:45 ` Mariusz Tkaczyk
2021-12-21 1:40 ` Xiao Ni
2021-12-21 13:56 ` Mariusz Tkaczyk
2021-12-22 1:54 ` Guoqing Jiang
2021-12-22 3:08 ` Xiao Ni
2021-12-16 14:52 ` [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10 Mariusz Tkaczyk
2021-12-17 2:16 ` Guoqing Jiang
2021-12-22 7:24 ` Xiao Ni
2021-12-27 12:34 ` Mariusz Tkaczyk [this message]
2021-12-16 14:52 ` [PATCH 3/3] raid5: introduce MD_BROKEN Mariusz Tkaczyk
2021-12-17 2:26 ` Guoqing Jiang
2021-12-17 8:37 ` Mariusz Tkaczyk
2021-12-22 1:46 ` Guoqing Jiang
2021-12-17 0:52 ` [PATCH v2 0/3] Use MD_BROKEN for redundant arrays Song Liu
2021-12-17 8:02 ` Mariusz Tkaczyk
2022-01-25 15:52 ` Mariusz Tkaczyk
2022-01-26 1:13 ` Song Liu
-- strict thread matches above, loose matches on Subject: below --
2022-01-27 15:39 [PATCH v3 0/3] Improve failed arrays handling Mariusz Tkaczyk
2022-01-27 15:39 ` [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10 Mariusz Tkaczyk
2022-01-31 8:29 ` Xiao Ni
2022-01-31 9:06 ` Mariusz Tkaczyk
2022-02-08 7:13 ` Song Liu
2022-01-31 12:23 ` Wols Lists
2022-02-12 1:17 ` Guoqing Jiang
2022-02-14 8:55 ` Mariusz Tkaczyk
2022-03-22 15:23 [PATCH 0/3] Failed array handling improvements Mariusz Tkaczyk
2022-03-22 15:23 ` [PATCH 2/3] md: Set MD_BROKEN for RAID1 and RAID10 Mariusz Tkaczyk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211227133429.00002062@linux.intel.com \
--to=mariusz.tkaczyk@linux.intel.com \
--cc=linux-raid@vger.kernel.org \
--cc=song@kernel.org \
--cc=xni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.