public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	Song Liu <song@kernel.org>, Xiao Ni <xni@redhat.com>
Subject: [PATCH 5.4 27/34] raid5: introduce MD_BROKEN
Date: Fri,  3 Jun 2022 19:43:23 +0200	[thread overview]
Message-ID: <20220603173816.972587698@linuxfoundation.org> (raw)
In-Reply-To: <20220603173815.990072516@linuxfoundation.org>

From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream.

Raid456 module had allowed to achieve failed state. It was fixed by
fb73b357fb9 ("raid5: block failing device if raid will be failed").
This fix introduces a bug, now if raid5 fails during IO, it may result
with a hung task without completion. Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow to set faulty on drive again and set MD_BROKEN if raid is failed.

As a result, this level is allowed to achieve failed state again, but
communication with userspace (via -EBUSY status) will be preserved.

This restores possibility to fail array via #mdadm --set-faulty command
and will be fixed by additional verification on mdadm side.

Reproduction steps:
 mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
 mkfs.xfs /dev/md126 -f
 mount /dev/md126 /mnt/root/

 fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &

 echo 1 > /sys/block/nvme2n1/device/device/remove
 echo 1 > /sys/block/nvme1n1/device/device/remove

 [ 1475.787779] Call Trace:
 [ 1475.793111] __schedule+0x2a6/0x700
 [ 1475.799460] schedule+0x38/0xa0
 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
 [ 1475.813856] ? finish_wait+0x80/0x80
 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
 [ 1475.828281] ? finish_wait+0x80/0x80
 [ 1475.834727] ? finish_wait+0x80/0x80
 [ 1475.841127] ? finish_wait+0x80/0x80
 [ 1475.847480] md_handle_request+0x119/0x190
 [ 1475.854390] md_make_request+0x8a/0x190
 [ 1475.861041] generic_make_request+0xcf/0x310
 [ 1475.868145] submit_bio+0x3c/0x160
 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
 [ 1475.889149] iomap_apply+0xff/0x310
 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.909974] iomap_dio_rw+0x2f2/0x490
 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.923680] ? atime_needs_update+0x77/0xe0
 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
 [ 1475.953403] aio_read+0xd5/0x180
 [ 1475.959395] ? _cond_resched+0x15/0x30
 [ 1475.965907] io_submit_one+0x20b/0x3c0
 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
 [ 1475.979335] ? do_io_getevents+0x7c/0xc0
 [ 1475.986009] do_syscall_64+0x5b/0x1a0
 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
 [ 1476.000255] RIP: 0033:0x7f11fc27978d
 [ 1476.006631] Code: Bad RIP value.
 [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.

Cc: stable@vger.kernel.org
Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/raid5.c |   47 ++++++++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 25 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -609,17 +609,17 @@ int raid5_calc_degraded(struct r5conf *c
 	return degraded;
 }
 
-static int has_failed(struct r5conf *conf)
+static bool has_failed(struct r5conf *conf)
 {
-	int degraded;
+	int degraded = conf->mddev->degraded;
 
-	if (conf->mddev->reshape_position == MaxSector)
-		return conf->mddev->degraded > conf->max_degraded;
+	if (test_bit(MD_BROKEN, &conf->mddev->flags))
+		return true;
 
-	degraded = raid5_calc_degraded(conf);
-	if (degraded > conf->max_degraded)
-		return 1;
-	return 0;
+	if (conf->mddev->reshape_position != MaxSector)
+		degraded = raid5_calc_degraded(conf);
+
+	return degraded > conf->max_degraded;
 }
 
 struct stripe_head *
@@ -2679,34 +2679,31 @@ static void raid5_error(struct mddev *md
 	unsigned long flags;
 	pr_debug("raid456: error called\n");
 
+	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n",
+		mdname(mddev), bdevname(rdev->bdev, b));
+
 	spin_lock_irqsave(&conf->device_lock, flags);
+	set_bit(Faulty, &rdev->flags);
+	clear_bit(In_sync, &rdev->flags);
+	mddev->degraded = raid5_calc_degraded(conf);
 
-	if (test_bit(In_sync, &rdev->flags) &&
-	    mddev->degraded == conf->max_degraded) {
-		/*
-		 * Don't allow to achieve failed state
-		 * Don't try to recover this device
-		 */
+	if (has_failed(conf)) {
+		set_bit(MD_BROKEN, &conf->mddev->flags);
 		conf->recovery_disabled = mddev->recovery_disabled;
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-		return;
+
+		pr_crit("md/raid:%s: Cannot continue operation (%d/%d failed).\n",
+			mdname(mddev), mddev->degraded, conf->raid_disks);
+	} else {
+		pr_crit("md/raid:%s: Operation continuing on %d devices.\n",
+			mdname(mddev), conf->raid_disks - mddev->degraded);
 	}
 
-	set_bit(Faulty, &rdev->flags);
-	clear_bit(In_sync, &rdev->flags);
-	mddev->degraded = raid5_calc_degraded(conf);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 
 	set_bit(Blocked, &rdev->flags);
 	set_mask_bits(&mddev->sb_flags, 0,
 		      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
-	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n"
-		"md/raid:%s: Operation continuing on %d devices.\n",
-		mdname(mddev),
-		bdevname(rdev->bdev, b),
-		mdname(mddev),
-		conf->raid_disks - mddev->degraded);
 	r5c_update_on_rdev_error(mddev, rdev);
 }
 



  parent reply	other threads:[~2022-06-03 17:48 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 02/34] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 03/34] staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 04/34] Input: goodix - fix spurious key release events Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 05/34] tcp: change source port randomizarion at connect() time Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 06/34] secure_seq: use the 64 bits of the siphash for port offset calculation Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 07/34] media: vim2m: Register video device after setting up internals Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 08/34] media: vim2m: initialize the media device earlier Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 09/34] ACPI: sysfs: Make sparse happy about address space in use Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 10/34] ACPI: sysfs: Fix BERT error region memory mapping Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 11/34] pinctrl: sunxi: fix f1c100s uart2 function Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 12/34] net: af_key: check encryption module availability consistency Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 13/34] net: ftgmac100: Disable hardware checksum on AST2600 Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 14/34] i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 15/34] drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 16/34] assoc_array: Fix BUG_ON during garbage collect Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 17/34] cfg80211: set custom regdomain after wiphy registration Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 18/34] drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 19/34] exec: Force single empty string when argv is empty Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 20/34] netfilter: conntrack: re-fetch conntrack after insertion Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 21/34] crypto: ecrdsa - Fix incorrect use of vli_cmp Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 22/34] zsmalloc: fix races between asynchronous zspage free and page migration Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 23/34] dm integrity: fix error code in dm_integrity_ctr() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 24/34] dm crypt: make printing of the key constant-time Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 25/34] dm stats: add cond_resched when looping over entries Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag Greg Kroah-Hartman
2022-06-10  4:22   ` Oleksandr Tymoshenko
2022-06-10  5:15     ` Greg KH
2022-06-10  8:10       ` Oleksandr Tymoshenko
2022-06-10 15:11       ` Mike Snitzer
2022-06-13  9:13         ` Greg KH
2022-06-15 14:36           ` Guenter Roeck
2022-06-15 15:29             ` Mike Snitzer
2022-06-15 17:50               ` Guenter Roeck
2022-06-15 20:02                 ` Mike Snitzer
2022-06-15 20:40                   ` Guenter Roeck
2022-06-15 23:59                   ` Guenter Roeck
2022-06-16 23:22                   ` Guenter Roeck
2022-06-20 11:44                   ` Greg KH
2022-06-21 16:35                     ` [5.4.y PATCH v2] dm: remove special-casing of bio-based immutable singleton target on NVMe Mike Snitzer
2022-06-23 15:48                       ` Greg KH
2022-06-03 17:43 ` Greg Kroah-Hartman [this message]
2022-06-03 17:43 ` [PATCH 5.4 28/34] HID: multitouch: Add support for Google Whiskers Touchpad Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 29/34] tpm: Fix buffer access in tpm2_get_tpm_pt() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 30/34] tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 31/34] docs: submitting-patches: Fix crossref to The canonical patch format Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 32/34] NFS: Memory allocation failures are not server fatal errors Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 33/34] NFSD: Fix possible sleep during nfsd4_release_lockowner() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 34/34] bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes Greg Kroah-Hartman
2022-06-04 12:21 ` [PATCH 5.4 00/34] 5.4.197-rc1 review Sudip Mukherjee
2022-06-04 17:31 ` Naresh Kamboju
2022-06-04 18:54 ` Guenter Roeck
2022-06-06  1:08 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220603173816.972587698@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mariusz.tkaczyk@linux.intel.com \
    --cc=song@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox