* [PATCH 0/2] md/raid5: account discard IO and allow llbitmap discard
@ 2026-06-05 3:22 Yu Kuai
2026-06-05 3:22 ` [PATCH] md/raid5: account discard IO Yu Kuai
2026-06-05 3:22 ` [PATCH] md/raid5: allow discard with llbitmap Yu Kuai
0 siblings, 2 replies; 3+ messages in thread
From: Yu Kuai @ 2026-06-05 3:22 UTC
To: Song Liu; +Cc: Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel
From: Yu Kuai <yukuai@fygo.io>
Hi,

This series fixes RAID5 discard accounting and then allows RAID5
discard when llbitmap is enabled.

Patch 1 routes processed RAID5 discard bios through md_account_bio().
Since RAID5 only discards whole data stripes, it accounts the exact
full-stripe range that make_discard_request() will submit, then
restores the original bio iterator for completion.

Patch 2 allows discard without devices_handle_discard_safely when
llbitmap is enabled. The legacy bitmap cannot record discarded/unwritten
data, so a later partial write followed by a member failure can
reconstruct inconsistent data. llbitmap records discarded ranges as
unwritten, and RAID5 already uses that state to force safe recovery
paths before relying on parity.

Runtime validation was done in QEMU with a RAID5 array using a lockless
bitmap (llbitmap) and discard-capable disks. Multiple aligned and
unaligned blkdiscard ranges were issued while tracing md_account_bio()
and llbitmap_start_discard(); the llbitmap unwritten-bit deltas matched
the traced bitmap ranges.
Yu Kuai (2):
  md/raid5: account discard IO
  md/raid5: allow discard with llbitmap

 drivers/md/raid5.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)
--
2.51.0
* [PATCH] md/raid5: account discard IO
From: Yu Kuai @ 2026-06-05 3:22 UTC
To: Song Liu; +Cc: Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel
From: Yu Kuai <yukuai@fygo.io>
Raid5 handles discard bios internally through make_discard_request() and
never passes them through md_account_bio(). As a result, discard IO is
missing the md-device iostat accounting that normal raid5 IO and discard
IO in other raid levels get from md_account_bio().

Before accounting the bio, trim the request to the full data stripes that
raid5 will actually discard. The first full stripe is the ceiling of the
bio start divided by data-stripe sectors, and the last full stripe is the
floor of the bio end divided by data-stripe sectors. Account that exact
MD logical full-stripe range, then restore the original iterator so bio
completion and iostat still cover the original request.
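
The trimming described above can be sketched in userspace C. This is a
minimal illustration, not the kernel code: struct full_stripe_range and
trim_to_full_stripes() are hypothetical names, plain 64-bit division
stands in for DIV_ROUND_UP_SECTOR_T() and sector_div(), and the geometry
in the note below is an assumed example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical result type for illustration; not a kernel structure. */
struct full_stripe_range {
	sector_t first_stripe;	/* ceil(start / stripe_sectors) */
	sector_t last_stripe;	/* floor(end / stripe_sectors), exclusive */
	sector_t md_start;	/* first accounted MD logical sector */
	sector_t md_sectors;	/* accounted length, 0 if no full stripe */
	sector_t dev_start;	/* first per-member sector to discard */
	sector_t dev_end;	/* per-member end sector, exclusive */
};

/*
 * Trim the request [start, start + nr) to the full data stripes it covers.
 * stripe_sectors is the number of data sectors in one stripe, i.e.
 * chunk_sectors * data_disks.
 */
static struct full_stripe_range
trim_to_full_stripes(sector_t start, sector_t nr,
		     sector_t chunk_sectors, unsigned int data_disks)
{
	struct full_stripe_range r = {0};
	sector_t stripe_sectors = chunk_sectors * data_disks;
	sector_t end = start + nr;

	assert(stripe_sectors != 0);

	/* Round the start up and the end down to stripe boundaries. */
	r.first_stripe = (start + stripe_sectors - 1) / stripe_sectors;
	r.last_stripe = end / stripe_sectors;

	/* No full stripe inside the request: nothing to account or discard. */
	if (r.first_stripe >= r.last_stripe)
		return r;

	r.md_start = r.first_stripe * stripe_sectors;
	r.md_sectors = (r.last_stripe - r.first_stripe) * stripe_sectors;
	r.dev_start = r.first_stripe * chunk_sectors;
	r.dev_end = r.last_stripe * chunk_sectors;
	return r;
}
```

For an assumed array with 3 data disks and 128-sector chunks (384 data
sectors per stripe), a discard covering MD sectors [100, 1100) trims to
stripe 1 only: MD sectors [384, 768), which maps to sectors [128, 256) on
each member. A discard covering [100, 600) contains no full stripe, so
nothing is discarded.
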
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 65ae7d8930fc..debf35342ae0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5688,34 +5688,47 @@ static void release_stripe_plug(struct mddev *mddev,
 
 static void make_discard_request(struct mddev *mddev, struct bio *bi)
 {
 	struct r5conf *conf = mddev->private;
 	sector_t logical_sector, last_sector;
+	sector_t first_stripe, last_stripe;
 	struct stripe_head *sh;
+	struct bvec_iter bi_iter;
+	struct bio *orig_bi = bi;
 	int stripe_sectors;
 
 	/* We need to handle this when io_uring supports discard/trim */
 	if (WARN_ON_ONCE(bi->bi_opf & REQ_NOWAIT))
 		return;
 
 	if (mddev->reshape_position != MaxSector)
 		/* Skip discard while reshape is happening */
 		return;
 
-	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
-	last_sector = bio_end_sector(bi);
-
-	bi->bi_next = NULL;
-
 	stripe_sectors = conf->chunk_sectors *
 		(conf->raid_disks - conf->max_degraded);
-	logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector,
-					       stripe_sectors);
-	sector_div(last_sector, stripe_sectors);
+	first_stripe = DIV_ROUND_UP_SECTOR_T(bi->bi_iter.bi_sector,
+					     stripe_sectors);
+	last_stripe = bio_end_sector(bi);
+	sector_div(last_stripe, stripe_sectors);
+
+	if (first_stripe >= last_stripe) {
+		bio_endio(bi);
+		return;
+	}
+
+	bi_iter = bi->bi_iter;
+	bi->bi_iter.bi_sector = first_stripe * stripe_sectors;
+	bi->bi_iter.bi_size = ((last_stripe - first_stripe) *
+			       stripe_sectors) << 9;
+	md_account_bio(mddev, &bi);
+	orig_bi->bi_iter = bi_iter;
+	bi->bi_iter = bi_iter;
+	bi->bi_next = NULL;
 
-	logical_sector *= conf->chunk_sectors;
-	last_sector *= conf->chunk_sectors;
+	logical_sector = first_stripe * conf->chunk_sectors;
+	last_sector = last_stripe * conf->chunk_sectors;
 
 	for (; logical_sector < last_sector;
 	     logical_sector += RAID5_STRIPE_SECTORS(conf)) {
 		DEFINE_WAIT(w);
 		int d;
--
2.51.0
* [PATCH] md/raid5: allow discard with llbitmap
From: Yu Kuai @ 2026-06-05 3:22 UTC
To: Song Liu; +Cc: Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel
From: Yu Kuai <yukuai@fygo.io>
Raid5 disables discard unless devices_handle_discard_safely is set
because lower devices that do not return zeroes after discard can
leave a discarded stripe inconsistent. A later partial write can then
reconstruct bad data if a member fails.

The legacy bitmap needs this restriction because it only records
write-intent dirty ranges. It cannot distinguish discarded data from
valid data, so discard can make data inconsistent after later partial
writes and failures.

llbitmap records discarded ranges as unwritten. Raid5 already consults
llbitmap state to force RCW or lazy recovery before using parity for
unwritten data. Therefore non-zeroing discard is safe with llbitmap,
while the existing full-stripe granularity and lower-device
discard-size checks still apply.
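
The resulting policy can be modelled as a small predicate. This is a
sketch under stated assumptions, not the kernel code: struct
discard_inputs and raid5_discard_allowed() are hypothetical names, the
uses_llbitmap flag stands in for the mddev->bitmap_id == ID_LLBITMAP
test, and the stripe size is assumed to be in bytes, as the ">> 9" in
the limit check suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the values the limit check consults. */
struct discard_inputs {
	bool devices_handle_discard_safely;	/* module parameter */
	bool uses_llbitmap;			/* models bitmap_id == ID_LLBITMAP */
	uint32_t max_discard_sectors;		/* stacked limit, 512-byte sectors */
	uint32_t discard_granularity;		/* stacked limit, bytes */
	uint32_t stripe_bytes;			/* full data stripe size, bytes (assumed) */
};

/* Returns true when discard stays enabled for the array. */
static bool raid5_discard_allowed(const struct discard_inputs *in)
{
	/* Either the admin vouched for the devices, or llbitmap tracks unwritten ranges. */
	if (!in->devices_handle_discard_safely && !in->uses_llbitmap)
		return false;

	/* Lower devices must be able to discard at least one full stripe at once. */
	if (in->max_discard_sectors < (in->stripe_bytes >> 9))
		return false;

	/* The stacked discard granularity must cover a full stripe. */
	if (in->discard_granularity < in->stripe_bytes)
		return false;

	return true;
}
```

Only the first condition changes with this patch: llbitmap alone is now
enough to pass it, but the two size checks can still disable discard.
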
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index debf35342ae0..4e9a758a8cc9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7829,11 +7829,12 @@ static int raid5_set_limits(struct mddev *mddev)
 	 * We only allow DISCARD if the sysadmin has confirmed that only safe
 	 * devices are in use by setting a module parameter. A better idea
 	 * might be to turn DISCARD into WRITE_ZEROES requests, as that is
 	 * required to be safe.
 	 */
-	if (!devices_handle_discard_safely ||
+	if ((!devices_handle_discard_safely &&
+	     mddev->bitmap_id != ID_LLBITMAP) ||
 	    lim.max_discard_sectors < (stripe >> 9) ||
 	    lim.discard_granularity < stripe)
 		lim.max_hw_discard_sectors = 0;
 
 	/*
--
2.51.0