From: "Mateusz Jończyk" <mat.jonczyk@o2.pl>
To: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "Mateusz Jończyk" <mat.jonczyk@o2.pl>,
stable@vger.kernel.org, "Song Liu" <song@kernel.org>,
"Yu Kuai" <yukuai3@huawei.com>,
"Paul Luse" <paul.e.luse@linux.intel.com>,
"Xiao Ni" <xni@redhat.com>,
"Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
Subject: [PATCH] md/raid1: set max_sectors during early return from choose_slow_rdev()
Date: Thu, 11 Jul 2024 22:23:16 +0200 [thread overview]
Message-ID: <20240711202316.10775-1-mat.jonczyk@o2.pl> (raw)
In-Reply-To: <349e4894-b6ea-6bc4-b040-4a816b6960ab@huaweicloud.com>
Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
when that drive has a write-mostly flag set. During such an attempt,
the following assertion in bio_split() is hit:
BUG_ON(sectors <= 0);
Call Trace:
? bio_split+0x96/0xb0
? exc_invalid_op+0x53/0x70
? bio_split+0x96/0xb0
? asm_exc_invalid_op+0x1b/0x20
? bio_split+0x96/0xb0
? raid1_read_request+0x890/0xd20
? __call_rcu_common.constprop.0+0x97/0x260
raid1_make_request+0x81/0xce0
? __get_random_u32_below+0x17/0x70
? new_slab+0x2b3/0x580
md_handle_request+0x77/0x210
md_submit_bio+0x62/0xa0
__submit_bio+0x17b/0x230
submit_bio_noacct_nocheck+0x18e/0x3c0
submit_bio_noacct+0x244/0x670
After investigation, it turned out that choose_slow_rdev() does not set
the value of max_sectors in some cases and because of it,
raid1_read_request calls bio_split with sectors == 0.
Fix it by filling in this variable.
This bug was introduced in
commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
but apparently hidden until
commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
shortly thereafter.
Cc: stable@vger.kernel.org # 6.9.x+
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Paul Luse <paul.e.luse@linux.intel.com>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/
--
Tested on both Linux 6.10 and 6.9.8.
Inside a VM, mdadm testsuite for RAID1 on 6.10 did not find any problems:
./test --dev=loop --no-error --raidtype=raid1
(on 6.9.8 there was one failure, caused by external bitmap support not
compiled in).
Notes:
- I was reliably getting deadlocks when adding / removing devices
on such an array - while the array was loaded with fsstress with 20
concurrent processes. When the array was idle or loaded with fsstress
with 8 processes, no such deadlocks happened in my tests.
This occurred also on unpatched Linux 6.8.0 though, but not on
6.1.97-rc1, so this is likely an independent regression (to be
investigated).
- I was also getting deadlocks when adding / removing the bitmap on the
array in similar conditions - this happened on Linux 6.1.97-rc1
also though. fsstress with 8 concurrent processes did cause it only
once during many tests.
- in my testing, there was once a problem with hot adding an
internal bitmap to the array:
mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
mdadm: failed to set internal bitmap.
even though no such reshaping was happening according to /proc/mdstat.
This seems unrelated, though.
---
drivers/md/raid1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dd..82f70a4ce6ed 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -680,6 +680,7 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
len = r1_bio->sectors;
read_len = raid1_check_read_range(rdev, this_sector, &len);
if (read_len == r1_bio->sectors) {
+ *max_sectors = read_len;
update_read_sectors(conf, disk, this_sector, read_len);
return disk;
}
base-commit: 256abd8e550ce977b728be79a74e1729438b4948
--
2.25.1
next prev parent reply other threads:[~2024-07-11 20:30 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-06 14:30 [REGRESSION] Cannot start degraded RAID1 array with device with write-mostly flag Mateusz Jończyk
2024-07-07 19:50 ` Mateusz Jończyk
2024-07-08 1:54 ` Yu Kuai
2024-07-08 20:09 ` Mateusz Jończyk
2024-07-09 2:57 ` Yu Kuai
2024-07-11 20:23 ` Mateusz Jończyk [this message]
2024-07-11 21:14 ` [PATCH] md/raid1: set max_sectors during early return from choose_slow_rdev() Paul E Luse
2024-07-12 1:16 ` Yu Kuai
2024-07-12 15:11 ` Song Liu
2024-07-13 12:40 ` Mateusz Jończyk
2024-07-09 6:49 ` [REGRESSION] Cannot start degraded RAID1 array with device with write-mostly flag Mariusz Tkaczyk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240711202316.10775-1-mat.jonczyk@o2.pl \
--to=mat.jonczyk@o2.pl \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=mariusz.tkaczyk@linux.intel.com \
--cc=paul.e.luse@linux.intel.com \
--cc=song@kernel.org \
--cc=stable@vger.kernel.org \
--cc=xni@redhat.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).