From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>, song@kernel.org
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 3/3] raid5: introduce MD_BROKEN
Date: Fri, 17 Dec 2021 10:26:27 +0800
Message-ID: <3d5fe975-265f-557e-5d13-88ef6b06bcba@linux.dev>
In-Reply-To: <20211216145222.15370-4-mariusz.tkaczyk@linux.intel.com>
On 12/16/21 10:52 PM, Mariusz Tkaczyk wrote:
> The raid456 module used to allow an array to reach the failed state. This
> was fixed by fb73b357fb9 ("raid5: block failing device if raid will be failed").
> That fix introduced a bug: if raid5 fails during IO, it may end in a hung
> task that never completes. The Faulty flag on the device is necessary to
> process all requests and is checked many times, mainly in
> analyze_stripe().
> Allow setting Faulty on a drive again, and set MD_BROKEN if the raid is failed.
>
> As a result, this level can reach the failed state again, but
> communication with userspace (via the -EBUSY status) is preserved.
>
> This restores the possibility of failing the array via the
> mdadm --set-faulty command; that will be addressed by additional
> verification on the mdadm side.
>
> Reproduction steps:
> mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
> mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
> mkfs.xfs /dev/md126 -f
> mount /dev/md126 /mnt/root/
>
> fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw \
>     --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 \
>     --time_based --group_reporting --name=throughput-test-job \
>     --eta-newline=1 &
>
> echo 1 > /sys/block/nvme2n1/device/device/remove
> echo 1 > /sys/block/nvme1n1/device/device/remove
>
> [ 1475.787779] Call Trace:
> [ 1475.793111] __schedule+0x2a6/0x700
> [ 1475.799460] schedule+0x38/0xa0
> [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
> [ 1475.813856] ? finish_wait+0x80/0x80
> [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
> [ 1475.828281] ? finish_wait+0x80/0x80
> [ 1475.834727] ? finish_wait+0x80/0x80
> [ 1475.841127] ? finish_wait+0x80/0x80
> [ 1475.847480] md_handle_request+0x119/0x190
> [ 1475.854390] md_make_request+0x8a/0x190
> [ 1475.861041] generic_make_request+0xcf/0x310
> [ 1475.868145] submit_bio+0x3c/0x160
> [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
> [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
> [ 1475.889149] iomap_apply+0xff/0x310
> [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
> [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
> [ 1475.909974] iomap_dio_rw+0x2f2/0x490
> [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
> [ 1475.923680] ? atime_needs_update+0x77/0xe0
> [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
> [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
> [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
> [ 1475.953403] aio_read+0xd5/0x180
> [ 1475.959395] ? _cond_resched+0x15/0x30
> [ 1475.965907] io_submit_one+0x20b/0x3c0
> [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
> [ 1475.979335] ? do_io_getevents+0x7c/0xc0
> [ 1475.986009] do_syscall_64+0x5b/0x1a0
> [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
> [ 1476.000255] RIP: 0033:0x7f11fc27978d
> [ 1476.006631] Code: Bad RIP value.
> [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
>
> Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
> drivers/md/raid5.c | 34 ++++++++++++++++------------------
> 1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 1240a5c16af8..8b5561811431 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -690,6 +690,9 @@ static int has_failed(struct r5conf *conf)
>  {
>  	int degraded;
>
> +	if (test_bit(MD_BROKEN, &conf->mddev->flags))
> +		return 1;
> +
>  	if (conf->mddev->reshape_position == MaxSector)
>  		return conf->mddev->degraded > conf->max_degraded;
>
> @@ -2877,34 +2880,29 @@ static void raid5_error(struct mddev *mddev, struct md_rdev *rdev)
>  	unsigned long flags;
>  	pr_debug("raid456: error called\n");
>
> -	spin_lock_irqsave(&conf->device_lock, flags);
> -
> -	if (test_bit(In_sync, &rdev->flags) &&
> -	    mddev->degraded == conf->max_degraded) {
> -		/*
> -		 * Don't allow to achieve failed state
> -		 * Don't try to recover this device
> -		 */
> -		conf->recovery_disabled = mddev->recovery_disabled;
> -		spin_unlock_irqrestore(&conf->device_lock, flags);
> -		return;
> -	}
> +	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n",
> +		mdname(mddev), bdevname(rdev->bdev, b));
>
> +	spin_lock_irqsave(&conf->device_lock, flags);
>  	set_bit(Faulty, &rdev->flags);
>  	clear_bit(In_sync, &rdev->flags);
>  	mddev->degraded = raid5_calc_degraded(conf);
> +
> +	if (has_failed(conf)) {
> +		set_bit(MD_BROKEN, &mddev->flags);
What about other callers of has_failed()? Do they need to set the MD_BROKEN
flag too? Alternatively, set the flag inside has_failed() whenever it
returns true, just FYI.
Thanks,
Guoqing