* [PATCH AUTOSEL 6.3 06/14] md: fix data corruption for raid456 when reshape restart while grow up
[not found] <20230702194053.1777356-1-sashal@kernel.org>
@ 2023-07-02 19:40 ` Sasha Levin
2023-07-02 19:40 ` [PATCH AUTOSEL 6.3 07/14] md/raid10: prevent soft lockup while flush writes Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2023-07-02 19:40 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Yu Kuai, Peter Neuwirth, Song Liu, Sasha Levin, linux-raid
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 873f50ece41aad5c4f788a340960c53774b5526e ]
Currently, if reshape is interrupted, echo "reshape" to sync_action will
restart reshape from scratch, for example:
echo frozen > sync_action
echo reshape > sync_action
This will corrupt data before reshape_position if the array is growing,
fix the problem by continue reshape from reshape_position.
Reported-by: Peter Neuwirth <reddunur@online.de>
Link: https://lore.kernel.org/linux-raid/e2f96772-bfbc-f43b-6da1-f520e5164536@online.de/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d479e1656ef33..61529d14e4dd6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4824,11 +4824,21 @@ action_store(struct mddev *mddev, const char *page, size_t len)
return -EINVAL;
err = mddev_lock(mddev);
if (!err) {
- if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
err = -EBUSY;
- else {
+ } else if (mddev->reshape_position == MaxSector ||
+ mddev->pers->check_reshape == NULL ||
+ mddev->pers->check_reshape(mddev)) {
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
err = mddev->pers->start_reshape(mddev);
+ } else {
+ /*
+ * If reshape is still in progress, and
+ * md_check_recovery() can continue to reshape,
+ * don't restart reshape because data can be
+ * corrupted for raid456.
+ */
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
}
mddev_unlock(mddev);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH AUTOSEL 6.3 07/14] md/raid10: prevent soft lockup while flush writes
[not found] <20230702194053.1777356-1-sashal@kernel.org>
2023-07-02 19:40 ` [PATCH AUTOSEL 6.3 06/14] md: fix data corruption for raid456 when reshape restart while grow up Sasha Levin
@ 2023-07-02 19:40 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2023-07-02 19:40 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Yu Kuai, Song Liu, Sasha Levin, linux-raid
From: Yu Kuai <yukuai3@huawei.com>
[ Upstream commit 010444623e7f4da6b4a4dd603a7da7469981e293 ]
Currently, there is no limit for raid1/raid10 plugged bio. While flushing
writes, raid1 has cond_resched() while raid10 doesn't, and too many
writes can cause soft lockup.
Follow up soft lockup can be triggered easily with writeback test for
raid10 with ramdisks:
watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
Call Trace:
<TASK>
call_rcu+0x16/0x20
put_object+0x41/0x80
__delete_object+0x50/0x90
delete_object_full+0x2b/0x40
kmemleak_free+0x46/0xa0
slab_free_freelist_hook.constprop.0+0xed/0x1a0
kmem_cache_free+0xfd/0x300
mempool_free_slab+0x1f/0x30
mempool_free+0x3a/0x100
bio_free+0x59/0x80
bio_put+0xcf/0x2c0
free_r10bio+0xbf/0xf0
raid_end_bio_io+0x78/0xb0
one_write_done+0x8a/0xa0
raid10_end_write_request+0x1b4/0x430
bio_endio+0x175/0x320
brd_submit_bio+0x3b9/0x9b7 [brd]
__submit_bio+0x69/0xe0
submit_bio_noacct_nocheck+0x1e6/0x5a0
submit_bio_noacct+0x38c/0x7e0
flush_pending_writes+0xf0/0x240
raid10d+0xac/0x1ed0
Fix the problem by adding cond_resched() to raid10 like what raid1 did.
Note that unlimited plugged bio still need to be optimized, for example,
in the case of lots of dirty pages writeback, this will take lots of
memory and io will spend a long time in plug, hence io latency is bad.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230529131106.2123367-2-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid10.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ea6967aeaa02a..3be84c828e549 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
else
submit_bio_noacct(bio);
bio = next;
+ cond_resched();
}
blk_finish_plug(&plug);
} else
@@ -1140,6 +1141,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
else
submit_bio_noacct(bio);
bio = next;
+ cond_resched();
}
kfree(plug);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2023-07-02 19:44 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20230702194053.1777356-1-sashal@kernel.org>
2023-07-02 19:40 ` [PATCH AUTOSEL 6.3 06/14] md: fix data corruption for raid456 when reshape restart while grow up Sasha Levin
2023-07-02 19:40 ` [PATCH AUTOSEL 6.3 07/14] md/raid10: prevent soft lockup while flush writes Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).