* [RFC PATCH] dm-raid: only requeue bios when dm is suspending.
2026-04-14 18:19 ` Benjamin Marzinski
@ 2026-04-14 19:03 ` Benjamin Marzinski
2026-04-15 1:28 ` [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE Yang Xiuwei
1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Marzinski @ 2026-04-14 19:03 UTC (permalink / raw)
To: Yang Xiuwei
Cc: Yu Kuai, Li Nan, Song Liu, linux-raid, dm-devel, Xiao Ni,
Nigel Croxon
Returning DM_MAPIO_REQUEUE from the target map() function only requeues
the bio during noflush suspends. During regular operation or during
flushing suspends, it fails the bio. Failing the bio during flushing
suspends is the correct behavior here: we cannot handle the bio, and we
cannot suspend while it is outstanding. But during normal operation,
we should not push the bio back to dm. Instead, wait for the reshape
to be resumed.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
Yang Xiuwei, if you are still able to see I/O errors during LVM testing,
does this patch fix them?
drivers/md/dm-raid.c | 7 +++++++
drivers/md/md.h | 1 +
drivers/md/raid5.c | 6 ++++--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 4bacdc499984..cac61d57e7e2 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3831,6 +3831,7 @@ static void raid_presuspend(struct dm_target *ti)
* resume, raid_postsuspend() is too late.
*/
set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+ WRITE_ONCE(mddev->dm_suspending, 1);
if (!reshape_interrupted(mddev))
return;
@@ -3847,6 +3848,9 @@ static void raid_presuspend(struct dm_target *ti)
static void raid_presuspend_undo(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
+ struct mddev *mddev = &rs->md;
+
+ WRITE_ONCE(mddev->dm_suspending, 0);
clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
}
@@ -3854,6 +3858,7 @@ static void raid_presuspend_undo(struct dm_target *ti)
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
+ struct mddev *mddev = &rs->md;
if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
/*
@@ -3864,6 +3869,8 @@ static void raid_postsuspend(struct dm_target *ti)
mddev_suspend(&rs->md, false);
rs->md.ro = MD_RDONLY;
}
+ WRITE_ONCE(mddev->dm_suspending, 0);
+
}
static void attempt_restore_of_faulty_devices(struct raid_set *rs)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..e8d7332c5cb9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -463,6 +463,7 @@ struct mddev {
int delta_disks, new_level, new_layout;
int new_chunk_sectors;
int reshape_backwards;
+ int dm_suspending;
struct md_thread __rcu *thread; /* management thread */
struct md_thread __rcu *sync_thread; /* doing resync or reconstruct */
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8854e024f311..d528263f92a3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6042,8 +6042,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
raid5_release_stripe(sh);
out:
if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
- bi->bi_status = BLK_STS_RESOURCE;
- ret = STRIPE_WAIT_RESHAPE;
+ if (!mddev_is_dm(mddev) || READ_ONCE(mddev->dm_suspending)) {
+ bi->bi_status = BLK_STS_RESOURCE;
+ ret = STRIPE_WAIT_RESHAPE;
+ }
pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
}
return ret;
--
2.50.1
* Re: [PATCH] md/raid5: Don't set bi_status on STRIPE_WAIT_RESHAPE
2026-04-14 18:19 ` Benjamin Marzinski
2026-04-14 19:03 ` [RFC PATCH] dm-raid: only requeue bios when dm is suspending Benjamin Marzinski
@ 2026-04-15 1:28 ` Yang Xiuwei
1 sibling, 0 replies; 6+ messages in thread
From: Yang Xiuwei @ 2026-04-15 1:28 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Yu Kuai, Li Nan, Song Liu, Xiao Ni, Nigel Croxon, linux-raid,
dm-devel
Hi Ben,
On Tue, Apr 14, 2026 at 02:19:33PM -0400, Benjamin Marzinski wrote:
> Yang Xiuwei, have you verified that this fix actually solves your
> problems? If a dm map() function completes with DM_MAPIO_REQUEUE, and
> the device is in a noflush suspend, it shouldn't set the error on the
> original bio, regardless of the clone bio. It should requeue the bio. If
> a dm map() function completes with DM_MAPIO_REQUEUE, and the device
> isn't in a noflush suspend, the original bio will always be completed
> with an error.
>
> To me, it seems more likely that what you are seeing is
> make_stripe_request() returning STRIPE_WAIT_RESHAPE when the dm device
> isn't actually in a noflush suspend. I have seen this myself.
>
> -Ben
I tested the version that removes setting bi->bi_status to BLK_STS_RESOURCE
in the STRIPE_WAIT_RESHAPE path you described; in my environment it did
not fix the failure below.
Sorry for the slow response. I am not very familiar with this area yet and
wanted to learn more before continuing the analysis, but other work kept me
from picking it up again until now.
I have not yet tested the dm-raid RFC patch from your follow-up message,
but I plan to try it when I have time.
The failure was observed while running the LVM2 shell test
lvconvert-raid-reshape-stripes-load-fail.sh. Below is the test log (kernel
messages and harness output), followed by the script contents.
Test log:
| [ 0:10.630] WARNING: This metadata update is NOT backed up.
| [ 0:10.632] aux disable_dev $dev1
| [ 0:10.748] #lvconvert-raid-reshape-stripes-load-fail.sh:68+ aux disable_dev /dev/mapper/LVMTEST1351568pv1
| [ 0:10.748] Disabling device /dev/mapper/LVMTEST1351568pv1 (252:5)
| [ 0:10.868] [73439.222696] <6> 2026-01-20 13:59:47 md: reshape of RAID array mdX
| [ 0:10.868] aux delay_dev "$dev2" 0 50
| [ 0:10.871] #lvconvert-raid-reshape-stripes-load-fail.sh:69+ aux delay_dev /dev/mapper/LVMTEST1351568pv2 0 50
| [ 0:10.871] check lv_first_seg_field $vg/$lv1 segtype "raid5_ls"
| [ 0:10.886] [73439.231558] <3> 2026-01-20 13:59:47 Buffer I/O error on dev dm-5, logical block 0, async page read
| [ 0:10.886] #lvconvert-raid-reshape-stripes-load-fail.sh:70+ check lv_first_seg_field LVMTEST1351568vg/LV1 segtype raid5_ls
| [ 0:10.886] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ.
| [ 0:10.910] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1).
| [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices.
| [ 0:10.910] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices.
| [ 0:10.910] check lv_first_seg_field $vg/$lv1 stripesize "64.00k"
| [ 0:10.912] #lvconvert-raid-reshape-stripes-load-fail.sh:71+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripesize 64.00k
| [ 0:10.912] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ.
| [ 0:10.933] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1).
| [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices.
| [ 0:10.933] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices.
| [ 0:10.933] check lv_first_seg_field $vg/$lv1 data_stripes 15
| [ 0:10.935] #lvconvert-raid-reshape-stripes-load-fail.sh:72+ check lv_first_seg_field LVMTEST1351568vg/LV1 data_stripes 15
| [ 0:10.935] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ.
| [ 0:10.956] [73439.292632] <3> 2026-01-20 13:59:47 md: super_written gets error=-5
| [ 0:10.956] [73439.297679] <2> 2026-01-20 13:59:47 md/raid:mdX: Disk failure on dm-22, disabling device.
| [ 0:10.956] [73439.304626] <2> 2026-01-20 13:59:47 md/raid:mdX: Operation continuing on 15 devices.
| [ 0:10.956] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1).
| [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices.
| [ 0:10.956] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices.
| [ 0:10.956] check lv_first_seg_field $vg/$lv1 stripes 16
| [ 0:10.958] #lvconvert-raid-reshape-stripes-load-fail.sh:73+ check lv_first_seg_field LVMTEST1351568vg/LV1 stripes 16
| [ 0:10.958] WARNING: Couldn't find device with uuid Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ.
| [ 0:10.979] WARNING: VG LVMTEST1351568vg is missing PV Xprpyw-NTcw-RDRr-HzMg-LDZN-ZDIL-0Q2LoQ (last written to /dev/mapper/LVMTEST1351568pv1).
| [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rimage_0 while checking used and assumed devices.
| [ 0:10.979] WARNING: Couldn't find all devices for LV LVMTEST1351568vg/LV1_rmeta_0 while checking used and assumed devices.
| [ 0:10.979]
| [ 0:10.981] kill -9 %%
| [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:75+ kill -9 %%
| [ 0:10.981] wait
| [ 0:10.981] #lvconvert-raid-reshape-stripes-load-fail.sh:76+ wait
| [ 0:10.981] rm -fr "$mount_dir/[12]"
| [ 0:11.787] [73439.674065] <4> 2026-01-20 13:59:48 make_stripe_request: 24 callbacks suppressed
| [ 0:11.787] [73439.674074] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.674086] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1074, lost sync page write
| [ 0:11.787] [73439.681096] <6> 2026-01-20 13:59:48 md: mdX: reshape interrupted.
| [ 0:11.787] [73439.682723] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.691180] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.699766] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.708347] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.716934] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.725519] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.734099] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.734574] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_check_bdev_write_error:225: comm kworker/u388:2: Error while async write back metadata
| [ 0:11.787] [73439.742682] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.787] [73439.764081] <3> 2026-01-20 13:59:48 Aborting journal on device dm-43-8.
| [ 0:11.787] [73439.778040] <3> 2026-01-20 13:59:48 dm-raid456: io across reshape position while reshape can't make progress
| [ 0:11.788] [73439.778043] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 740, lost sync page write
| [ 0:11.788] [73439.795025] <3> 2026-01-20 13:59:48 JBD2: I/O error when updating journal superblock for dm-43-8.
| [ 0:11.788] [73439.802674] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal
| [ 0:11.788] [73439.802673] <2> 2026-01-20 13:59:48 EXT4-fs error (device dm-43): ext4_journal_check_start:85: comm cp: Detected aborted journal
| [ 0:11.788] [73440.032568] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write
| [ 0:11.788] [73440.040800] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock
| [ 0:11.788] [73440.040813] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): previous I/O error to superblock detected
| [ 0:11.788] [73440.047569] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only
| [ 0:11.788] [73440.054948] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 1, lost sync page write
| [ 0:11.788] [73440.069663] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock
| [ 0:11.788] [73440.076428] <2> 2026-01-20 13:59:48 EXT4-fs (dm-43): Remounting filesystem read-only
| [ 0:11.788] #lvconvert-raid-reshape-stripes-load-fail.sh:77+ rm -fr 'mnt/[12]'
| [ 0:11.788]
| [ 0:11.789] sync
| [ 0:11.789] #lvconvert-raid-reshape-stripes-load-fail.sh:79+ sync
| [ 0:11.789] umount "$mount_dir"
| [ 0:11.798] [73440.145596] <3> 2026-01-20 13:59:48 Buffer I/O error on dev dm-43, logical block 82, lost async page write
| [ 0:11.798] #lvconvert-raid-reshape-stripes-load-fail.sh:80+ umount mnt
| [ 0:11.798]
| [ 0:11.814] fsck -fn "$DM_DEV_DIR/$vg/$lv1"
| [ 0:11.814] [73440.162114] <6> 2026-01-20 13:59:48 EXT4-fs (dm-43): unmounting filesystem 86548d8e-e409-4ae8-b7d5-8b78a9b5fb50.
| [ 0:11.814] [73440.162336] <3> 2026-01-20 13:59:48 EXT4-fs (dm-43): I/O error while writing superblock
| [ 0:11.814] #lvconvert-raid-reshape-stripes-load-fail.sh:82+ fsck -fn /dev/LVMTEST1351568vg/LV1
| [ 0:11.814] fsck from util-linux 2.39.1
| [ 0:11.816] e2fsck 1.47.0 (5-Feb-2023)
| [ 0:11.821] fsck.ext2: Input/output error while trying to open /dev/mapper/LVMTEST1351568vg-LV1
| [ 0:11.821]
| [ 0:11.821] The superblock could not be read or does not describe a valid ext2/ext3/ext4
| [ 0:11.821] filesystem. If the device is valid and it really contains an ext2/ext3/ext4
| [ 0:11.821] filesystem (and not swap or ufs or something else), then the superblock
| [ 0:11.821] is corrupt, and you might try running e2fsck with an alternate superblock:
| [ 0:11.821] e2fsck -b 8193 <device>
| [ 0:11.821] or
| [ 0:11.821] e2fsck -b 32768 <device>
| [ 0:11.821]
| [ 0:11.821] set +vx; STACKTRACE; set -vx
| [ 0:11.822] ##lvconvert-raid-reshape-stripes-load-fail.sh:82+ set +vx
| [ 0:11.822] ## - /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82
| [ 0:11.822] ## 1 STACKTRACE() called from /opt/K2CI_agent_tool/lvm2/test/shell/lvconvert-raid-reshape-stripes-load-fail.sh:82
lvconvert-raid-reshape-stripes-load-fail.sh:
#!/usr/bin/env bash
# Copyright (C) 2017 Red Hat, Inc. All rights reserved.
#
# This copyrighted material is made available to anyone wishing to use,
# modify, copy, or redistribute it subject to the terms and conditions
# of the GNU General Public License v.2.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
SKIP_WITH_LVMPOLLD=1
. lib/inittest
# Test reshaping under io load
case "$(uname -r)" in
3.10.0-862*) skip "Cannot run this test on unfixed kernel." ;;
esac
which mkfs.ext4 || skip
aux have_raid 1 13 2 || skip
mount_dir="mnt"
cleanup_mounted_and_teardown()
{
umount "$mount_dir" || true
aux teardown
}
aux prepare_pvs 16 32
get_devs
vgcreate $SHARED -s 1M "$vg" "${DEVICES[@]}"
trap 'cleanup_mounted_and_teardown' EXIT
# Create 10-way striped raid5 (11 legs total)
lvcreate --yes --type raid5_ls --stripesize 64K --stripes 10 -L4 -n$lv1 $vg
check lv_first_seg_field $vg/$lv1 segtype "raid5_ls"
check lv_first_seg_field $vg/$lv1 stripesize "64.00k"
check lv_first_seg_field $vg/$lv1 data_stripes 10
check lv_first_seg_field $vg/$lv1 stripes 11
wipefs -a "$DM_DEV_DIR/$vg/$lv1"
mkfs -t ext4 "$DM_DEV_DIR/$vg/$lv1"
fsck -fn "$DM_DEV_DIR/$vg/$lv1"
mkdir -p "$mount_dir"
mount "$DM_DEV_DIR/$vg/$lv1" "$mount_dir"
mkdir -p "$mount_dir/1" "$mount_dir/2"
echo 3 >/proc/sys/vm/drop_caches
cp -r /usr/bin "$mount_dir/1" &>/dev/null &
cp -r /usr/bin "$mount_dir/2" &>/dev/null &
sync &
aux wait_for_sync $vg $lv1
aux delay_dev "$dev2" 0 100
# Reshape it to 15 data stripes
lvconvert --yes --stripes 15 $vg/$lv1
aux disable_dev $dev1
aux delay_dev "$dev2" 0 50
check lv_first_seg_field $vg/$lv1 segtype "raid5_ls"
check lv_first_seg_field $vg/$lv1 stripesize "64.00k"
check lv_first_seg_field $vg/$lv1 data_stripes 15
check lv_first_seg_field $vg/$lv1 stripes 16
kill -9 %%
wait
rm -fr "$mount_dir/[12]"
sync
umount "$mount_dir"
fsck -fn "$DM_DEV_DIR/$vg/$lv1"
vgremove -ff $vg
Thanks,
Yang Xiuwei