From: "Mateusz Jończyk" <mat.jonczyk@o2.pl>
To: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: regressions@lists.linux.dev, "Song Liu" <song@kernel.org>,
"Yu Kuai" <yukuai3@huawei.com>,
"Paul Luse" <paul.e.luse@linux.intel.com>,
"Xiao Ni" <xni@redhat.com>, "Mateusz Jończyk" <mat.jonczyk@o2.pl>
Subject: [REGRESSION] Cannot start degraded RAID1 array with device with write-mostly flag
Date: Sat, 6 Jul 2024 16:30:38 +0200
Message-ID: <20240706143038.7253-1-mat.jonczyk@o2.pl>
Hello,
Linux 6.9+ cannot start a degraded RAID1 array when the only remaining
device has the write-mostly flag set. Linux 6.8.0 works fine, as does
6.1.96.
#regzbot introduced: v6.8.0..v6.9.0
On my laptop, I used to have two RAID1 arrays on top of an NVMe drive and a
SATA SSD: /dev/md0 for /boot and /dev/md1 for the remaining data. For
performance, I marked the RAID component devices on the SATA SSD as
write-mostly, which "means that the 'md' driver will avoid reading
from these devices if at all possible".
Recently, the NVMe drive started failing, so I removed it from the arrays:
$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb5[1](W)
471727104 blocks super 1.2 [2/1] [_U]
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active raid1 sdb4[1](W)
2094080 blocks super 1.2 [2/1] [_U]
unused devices: <none>
and wiped it. Since then, Linux 6.9+ fails to assemble the arrays on startup,
with the following stack traces in dmesg:
md/raid1:md0: active with 1 out of 2 mirrors
md0: detected capacity change from 0 to 4188160
------------[ cut here ]------------
kernel BUG at block/bio.c:1659!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 174 Comm: mdadm Not tainted 6.10.0-rc6unif33 #493
Hardware name: HP HP Laptop 17-by0xxx/84CA, BIOS F.72 05/31/2024
RIP: 0010:bio_split+0x96/0xb0
Code: df ff ff 41 f6 45 14 80 74 08 66 41 81 4c 24 14 80 00 5b 4c 89 e0 41 5c 41 5d 5d c3 cc cc cc cc 41 c7 45 28 00 00 00 00 eb d9 <0f> 0b 0f 0b 0f 0b 45 31 e4 eb dd 66 66 2e 0f 1f 84 00 00 00 00 00
RSP: 0018:ffffa7588041b330 EFLAGS: 00010246
RAX: 0000000000000008 RBX: 0000000000000001 RCX: ffff9f22cb08f938
RDX: 0000000000000c00 RSI: 0000000000000000 RDI: ffff9f22c1199400
RBP: ffffa7588041b420 R08: ffff9f22c3587b30 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000008 R12: ffff9f22cc9da700
R13: ffff9f22cb08f800 R14: ffff9f22c6a35fa0 R15: ffff9f22c1846800
FS: 00007f5f88404740(0000) GS:ffff9f2621e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056299cb95000 CR3: 000000010c82a002 CR4: 00000000003706f0
Call Trace:
<TASK>
? show_regs+0x67/0x70
? __die_body+0x20/0x70
? die+0x3e/0x60
? do_trap+0xd6/0xf0
? do_error_trap+0x71/0x90
? bio_split+0x96/0xb0
? exc_invalid_op+0x53/0x70
? bio_split+0x96/0xb0
? asm_exc_invalid_op+0x1b/0x20
? bio_split+0x96/0xb0
? raid1_read_request+0x890/0xd20
? __call_rcu_common.constprop.0+0x97/0x260
raid1_make_request+0x81/0xce0
? __get_random_u32_below+0x17/0x70 // is not present in other stacktraces
? new_slab+0x2b3/0x580 // is not present in other stacktraces
md_handle_request+0x77/0x210
md_submit_bio+0x62/0xa0
__submit_bio+0x17b/0x230
submit_bio_noacct_nocheck+0x18e/0x3c0
submit_bio_noacct+0x244/0x670
submit_bio+0xac/0xe0
submit_bh_wbc+0x168/0x190
block_read_full_folio+0x203/0x420
? __mod_memcg_lruvec_state+0xcd/0x210
? __pfx_blkdev_get_block+0x10/0x10
? __lruvec_stat_mod_folio+0x63/0xb0
? __filemap_add_folio+0x24d/0x450
? __pfx_blkdev_read_folio+0x10/0x10
blkdev_read_folio+0x18/0x20
filemap_read_folio+0x45/0x290
? __pfx_workingset_update_node+0x10/0x10
? folio_add_lru+0x5a/0x80
? filemap_add_folio+0xba/0xe0
? __pfx_blkdev_read_folio+0x10/0x10
do_read_cache_folio+0x10a/0x3c0
read_cache_folio+0x12/0x20
read_part_sector+0x36/0xc0
read_lba+0x96/0x1b0
find_valid_gpt+0xe8/0x770
? get_page_from_freelist+0x615/0x12e0
? __pfx_efi_partition+0x10/0x10
efi_partition+0x80/0x4e0
? vsnprintf+0x297/0x4f0
? snprintf+0x49/0x70
? __pfx_efi_partition+0x10/0x10
bdev_disk_changed+0x270/0x760
blkdev_get_whole+0x8b/0xb0
bdev_open+0x2bd/0x390
? __pfx_blkdev_open+0x10/0x10
blkdev_open+0x8f/0xc0
do_dentry_open+0x174/0x570
vfs_open+0x2b/0x40
path_openat+0xb20/0x1150
do_filp_open+0xa8/0x120
? alloc_fd+0xc2/0x180
do_sys_openat2+0x250/0x2a0
do_sys_open+0x46/0x80
__x64_sys_openat+0x20/0x30
x64_sys_call+0xe55/0x20d0
do_syscall_64+0x47/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5f88514f5b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffd8839cbe0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007ffd8839dbe0 RCX: 00007f5f88514f5b
RDX: 0000000000004000 RSI: 00007ffd8839cc70 RDI: 00000000ffffff9c
RBP: 00007ffd8839cc70 R08: 0000000000000000 R09: 00007ffd8839cae0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000004000
R13: 0000000000004000 R14: 00007ffd8839cc68 R15: 000055942d9dabe0
</TASK>
Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 drm_buddy r8169 i2c_algo_bit psmouse i2c_i801 drm_display_helper i2c_mux video i2c_smbus
xhci_pci realtek cec xhci_pci_renesas i2c_hid_acpi i2c_hid hid wmi aesni_intel crypto_simd cryptd
---[ end trace 0000000000000000 ]---
The trace was logged twice (once for each array).
The line
kernel BUG at block/bio.c:1659!
corresponds to
BUG_ON(sectors <= 0);
in bio_split().
After some investigation, I have determined that the bug is most likely in
choose_slow_rdev() in drivers/md/raid1.c, which does not set *max_sectors
before returning early. A test patch (below) seems to fix the issue: Linux
boots and appears to work correctly with it, but I have not done any more
advanced testing yet.
This points to
commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
as the most likely culprit. However, I was running into other bugs in mdadm when
trying to test this commit directly.
Distribution: Ubuntu 20.04; hardware: an HP 17-by0001nw laptop.
Greetings,
Mateusz
---------------------------------------------------
From e19348bc62eea385459ca1df67bd7c7c2afd7538 Mon Sep 17 00:00:00 2001
From: Mateusz Jończyk <mat.jonczyk@o2.pl>
Date: Sat, 6 Jul 2024 11:21:03 +0200
Subject: [RFC PATCH] md/raid1: fill in max_sectors
Not yet fully tested or carefully investigated.
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
---
drivers/md/raid1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dd..82f70a4ce6ed 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -680,6 +680,7 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
len = r1_bio->sectors;
read_len = raid1_check_read_range(rdev, this_sector, &len);
if (read_len == r1_bio->sectors) {
+ *max_sectors = read_len;
update_read_sectors(conf, disk, this_sector, read_len);
return disk;
}
--
2.25.1