* [PATCH RFC 0/7] block: fix disordered IO in the case recursive split
@ 2025-08-25 9:36 Yu Kuai
2025-08-25 9:36 ` [PATCH RFC 1/7] block: export helper bio_submit_split() Yu Kuai
` (6 more replies)
0 siblings, 7 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Patch 1 exports a bio split helper;
Patches 2-6 unify the bio split code in mdraid to use the helper;
Patch 7 converts the helper to insert the split bio at the head of
current->bio_list.
This set has only been tested with raid5 for now; see details in patch 7.
Yu Kuai (7):
block: export helper bio_submit_split()
md/raid0: convert raid0_handle_discard() to use bio_submit_split()
md/raid1: convert to use bio_submit_split()
md/raid10: convert read/write to use bio_submit_split()
md/raid5: convert to use bio_submit_split()
md/md-linear: convert to use bio_submit_split()
block: fix disordered IO in the case recursive split
block/blk-core.c | 54 ++++++++++++++++++++++++-------------
block/blk-merge.c | 60 +++++++++++++++++++++++++++---------------
block/blk-throttle.c | 2 +-
block/blk.h | 3 ++-
drivers/md/md-linear.c | 14 +++-------
drivers/md/raid0.c | 20 ++++++--------
drivers/md/raid1.c | 35 ++++++++++--------------
drivers/md/raid10.c | 53 ++++++++++++++++---------------------
drivers/md/raid10.h | 1 +
drivers/md/raid5.c | 12 +++++----
include/linux/bio.h | 2 ++
11 files changed, 135 insertions(+), 121 deletions(-)
--
2.39.2
* [PATCH RFC 1/7] block: export helper bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 10:53 ` Christoph Hellwig
2025-08-25 9:36 ` [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split() Yu Kuai
` (5 subsequent siblings)
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
No functional changes are intended. Some drivers, like mdraid, split bios
during internal processing; export this helper to prepare for unifying the
bio split code.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-merge.c | 60 +++++++++++++++++++++++++++++----------------
include/linux/bio.h | 2 ++
2 files changed, 41 insertions(+), 21 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 70d704615be5..c45d5e43e172 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -104,35 +104,49 @@ static unsigned int bio_allowed_max_sectors(const struct queue_limits *lim)
return round_down(UINT_MAX, lim->logical_block_size) >> SECTOR_SHIFT;
}
-static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
+/**
+ * bio_submit_split - Submit a bio, splitting it at a designated sector
+ * @bio: the original bio to be submitted and split
+ * @split_sectors: the sector count (from the start of @bio) at which to split
+ * @bs: the bio set used for allocating the new split bio
+ *
+ * The original bio is modified to contain the remaining sectors and submitted.
+ * The caller is responsible for submitting the returned bio.
+ *
+ * On success, the newly allocated bio for the initial part is returned;
+ * on failure, NULL is returned and the original bio is ended with an error.
+ */
+struct bio *bio_submit_split(struct bio *bio, int split_sectors,
+ struct bio_set *bs)
{
+ struct bio *split;
+
if (unlikely(split_sectors < 0))
goto error;
- if (split_sectors) {
- struct bio *split;
+ if (!split_sectors)
+ return bio;
- split = bio_split(bio, split_sectors, GFP_NOIO,
- &bio->bi_bdev->bd_disk->bio_split);
- if (IS_ERR(split)) {
- split_sectors = PTR_ERR(split);
- goto error;
- }
- split->bi_opf |= REQ_NOMERGE;
- blkcg_bio_issue_init(split);
- bio_chain(split, bio);
- trace_block_split(split, bio->bi_iter.bi_sector);
- WARN_ON_ONCE(bio_zone_write_plugging(bio));
- submit_bio_noacct(bio);
- return split;
+ split = bio_split(bio, split_sectors, GFP_NOIO, bs);
+ if (IS_ERR(split)) {
+ split_sectors = PTR_ERR(split);
+ goto error;
}
- return bio;
+ split->bi_opf |= REQ_NOMERGE;
+ blkcg_bio_issue_init(split);
+ bio_chain(split, bio);
+ trace_block_split(split, bio->bi_iter.bi_sector);
+ WARN_ON_ONCE(bio_zone_write_plugging(bio));
+ submit_bio_noacct(bio);
+ return split;
+
error:
bio->bi_status = errno_to_blk_status(split_sectors);
bio_endio(bio);
return NULL;
}
+EXPORT_SYMBOL_GPL(bio_submit_split);
struct bio *bio_split_discard(struct bio *bio, const struct queue_limits *lim,
unsigned *nsegs)
@@ -167,7 +181,8 @@ struct bio *bio_split_discard(struct bio *bio, const struct queue_limits *lim,
if (split_sectors > tmp)
split_sectors -= tmp;
- return bio_submit_split(bio, split_sectors);
+ return bio_submit_split(bio, split_sectors,
+ &bio->bi_bdev->bd_disk->bio_split);
}
static inline unsigned int blk_boundary_sectors(const struct queue_limits *lim,
@@ -357,7 +372,8 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
{
return bio_submit_split(bio,
bio_split_rw_at(bio, lim, nr_segs,
- get_max_io_size(bio, lim) << SECTOR_SHIFT));
+ get_max_io_size(bio, lim) << SECTOR_SHIFT),
+ &bio->bi_bdev->bd_disk->bio_split);
}
/*
@@ -376,7 +392,8 @@ struct bio *bio_split_zone_append(struct bio *bio,
lim->max_zone_append_sectors << SECTOR_SHIFT);
if (WARN_ON_ONCE(split_sectors > 0))
split_sectors = -EINVAL;
- return bio_submit_split(bio, split_sectors);
+ return bio_submit_split(bio, split_sectors,
+ &bio->bi_bdev->bd_disk->bio_split);
}
struct bio *bio_split_write_zeroes(struct bio *bio,
@@ -396,7 +413,8 @@ struct bio *bio_split_write_zeroes(struct bio *bio,
return bio;
if (bio_sectors(bio) <= max_sectors)
return bio;
- return bio_submit_split(bio, max_sectors);
+ return bio_submit_split(bio, max_sectors,
+ &bio->bi_bdev->bd_disk->bio_split);
}
/**
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 46ffac5caab7..2233261be5e8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -324,6 +324,8 @@ extern struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs);
int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
unsigned *segs, unsigned max_bytes);
+struct bio *bio_submit_split(struct bio *bio, int split_sectors,
+ struct bio_set *bs);
/**
* bio_next_split - get next @sectors from a bio, splitting if necessary
--
2.39.2
* [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
2025-08-25 9:36 ` [PATCH RFC 1/7] block: export helper bio_submit_split() Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 10:57 ` Christoph Hellwig
2025-08-25 9:36 ` [PATCH RFC 3/7] md/raid1: convert " Yu Kuai
` (4 subsequent siblings)
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
On the one hand, unify the bio split code to prepare for fixing disordered
split IO; on the other hand, fix the missing blkcg_bio_issue_init() and
trace_block_split() calls for split IO.
Note that raid0_make_request() already fixed disordered split IO in commit
319ff40a5427 ("md/raid0: Fix performance regression for large sequential
writes") by converting the bio to the underlying disks before
submit_bio_noacct(), given that md_submit_bio() has already split the bio
by max_sectors and raid0_make_request() splits at most once for unaligned
IO. This is a bit hacky, and we'll convert it to a general solution later.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/raid0.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f1d8811a542a..19b5faf238b7 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -463,21 +463,17 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
zone = find_zone(conf, &start);
if (bio_end_sector(bio) > zone->zone_end) {
- struct bio *split = bio_split(bio,
- zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO,
- &mddev->bio_set);
-
- if (IS_ERR(split)) {
- bio->bi_status = errno_to_blk_status(PTR_ERR(split));
- bio_endio(bio);
+ bio = bio_submit_split(bio,
+ zone->zone_end - bio->bi_iter.bi_sector,
+ &mddev->bio_set);
+ if (!bio)
return;
- }
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
+
+ bio->bi_opf &= ~REQ_NOMERGE;
end = zone->zone_end;
- } else
+ } else {
end = bio_end_sector(bio);
+ }
orig_end = end;
if (zone != conf->strip_zone)
--
2.39.2
* [PATCH RFC 3/7] md/raid1: convert to use bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
2025-08-25 9:36 ` [PATCH RFC 1/7] block: export helper bio_submit_split() Yu Kuai
2025-08-25 9:36 ` [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split() Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 10:57 ` Christoph Hellwig
2025-08-25 9:36 ` [PATCH RFC 4/7] md/raid10: convert read/write " Yu Kuai
` (3 subsequent siblings)
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
On the one hand, unify the bio split code to prepare for fixing disordered
split IO; on the other hand, fix the missing blkcg_bio_issue_init() and
trace_block_split() calls for split IO.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/raid1.c | 35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 408c26398321..95196c8749f9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1317,7 +1317,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
struct raid1_info *mirror;
struct bio *read_bio;
int max_sectors;
- int rdisk, error;
+ int rdisk;
bool r1bio_existed = !!r1_bio;
/*
@@ -1376,16 +1376,13 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
}
if (max_sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, max_sectors,
- gfp, &conf->bio_split);
-
- if (IS_ERR(split)) {
- error = PTR_ERR(split);
+ bio = bio_submit_split(bio, max_sectors,&conf->bio_split);
+ if (!bio) {
+ set_bit(R1BIO_Returned, &r1_bio->state);
goto err_handle;
}
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
+
+ bio->bi_opf &= ~REQ_NOMERGE;
r1_bio->master_bio = bio;
r1_bio->sectors = max_sectors;
}
@@ -1413,7 +1410,6 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
err_handle:
atomic_dec(&mirror->rdev->nr_pending);
- bio->bi_status = errno_to_blk_status(error);
set_bit(R1BIO_Uptodate, &r1_bio->state);
raid_end_bio_io(r1_bio);
}
@@ -1457,7 +1453,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
{
struct r1conf *conf = mddev->private;
struct r1bio *r1_bio;
- int i, disks, k, error;
+ int i, disks, k;
unsigned long flags;
int first_clone;
int max_sectors;
@@ -1562,7 +1558,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
* the benefit.
*/
if (bio->bi_opf & REQ_ATOMIC) {
- error = -EIO;
+ bio->bi_status =
+ errno_to_blk_status(-EIO);
goto err_handle;
}
@@ -1584,16 +1581,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
max_sectors = min_t(int, max_sectors,
BIO_MAX_VECS * (PAGE_SIZE >> 9));
if (max_sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, max_sectors,
- GFP_NOIO, &conf->bio_split);
-
- if (IS_ERR(split)) {
- error = PTR_ERR(split);
+ bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
+ if (!bio) {
+ set_bit(R1BIO_Returned, &r1_bio->state);
goto err_handle;
}
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
+
+ bio->bi_opf &= ~REQ_NOMERGE;
r1_bio->master_bio = bio;
r1_bio->sectors = max_sectors;
}
@@ -1683,7 +1677,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
}
}
- bio->bi_status = errno_to_blk_status(error);
set_bit(R1BIO_Uptodate, &r1_bio->state);
raid_end_bio_io(r1_bio);
}
--
2.39.2
* [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
` (2 preceding siblings ...)
2025-08-25 9:36 ` [PATCH RFC 3/7] md/raid1: convert " Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 10:59 ` Christoph Hellwig
2025-08-25 9:36 ` [PATCH RFC 5/7] md/raid5: convert " Yu Kuai
` (2 subsequent siblings)
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
On the one hand, unify the bio split code to prepare for fixing disordered
split IO; on the other hand, fix the missing blkcg_bio_issue_init() and
trace_block_split() calls for split IO.
Note that discard is not handled, because discard is only split for the
unaligned head and tail.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/raid10.c | 53 ++++++++++++++++++++-------------------------
drivers/md/raid10.h | 1 +
2 files changed, 24 insertions(+), 30 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b60c30bfb6c7..b8777661307b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -322,10 +322,12 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
struct bio *bio = r10_bio->master_bio;
struct r10conf *conf = r10_bio->mddev->private;
- if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
- bio->bi_status = BLK_STS_IOERR;
+ if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
+ if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
+ bio->bi_status = BLK_STS_IOERR;
+ bio_endio(bio);
+ }
- bio_endio(bio);
/*
* Wake up any possible resync thread that waits for the device
* to go idle.
@@ -1154,7 +1156,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
int slot = r10_bio->read_slot;
struct md_rdev *err_rdev = NULL;
gfp_t gfp = GFP_NOIO;
- int error;
if (slot >= 0 && r10_bio->devs[slot].rdev) {
/*
@@ -1203,17 +1204,16 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
rdev->bdev,
(unsigned long long)r10_bio->sector);
if (max_sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, max_sectors,
- gfp, &conf->bio_split);
- if (IS_ERR(split)) {
- error = PTR_ERR(split);
- goto err_handle;
- }
- bio_chain(split, bio);
allow_barrier(conf);
- submit_bio_noacct(bio);
+ bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
wait_barrier(conf, false);
- bio = split;
+
+ if (!bio) {
+ set_bit(R10BIO_Returned, &r10_bio->state);
+ goto err_handle;
+ }
+
+ bio->bi_opf &= ~REQ_NOMERGE;
r10_bio->master_bio = bio;
r10_bio->sectors = max_sectors;
}
@@ -1239,10 +1239,9 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
mddev_trace_remap(mddev, read_bio, r10_bio->sector);
submit_bio_noacct(read_bio);
return;
+
err_handle:
atomic_dec(&rdev->nr_pending);
- bio->bi_status = errno_to_blk_status(error);
- set_bit(R10BIO_Uptodate, &r10_bio->state);
raid_end_bio_io(r10_bio);
}
@@ -1351,7 +1350,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
int i, k;
sector_t sectors;
int max_sectors;
- int error;
if ((mddev_is_clustered(mddev) &&
mddev->cluster_ops->area_resyncing(mddev, WRITE,
@@ -1465,10 +1463,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
* complexity of supporting that is not worth
* the benefit.
*/
- if (bio->bi_opf & REQ_ATOMIC) {
- error = -EIO;
+ if (bio->bi_opf & REQ_ATOMIC)
goto err_handle;
- }
good_sectors = first_bad - dev_sector;
if (good_sectors < max_sectors)
@@ -1489,17 +1485,16 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
r10_bio->sectors = max_sectors;
if (r10_bio->sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, r10_bio->sectors,
- GFP_NOIO, &conf->bio_split);
- if (IS_ERR(split)) {
- error = PTR_ERR(split);
- goto err_handle;
- }
- bio_chain(split, bio);
allow_barrier(conf);
- submit_bio_noacct(bio);
+ bio = bio_submit_split(bio, r10_bio->sectors, &conf->bio_split);
wait_barrier(conf, false);
- bio = split;
+
+ if (!bio) {
+ set_bit(R10BIO_Returned, &r10_bio->state);
+ goto err_handle;
+ }
+
+ bio->bi_opf &= ~REQ_NOMERGE;
r10_bio->master_bio = bio;
}
@@ -1531,8 +1526,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
}
}
- bio->bi_status = errno_to_blk_status(error);
- set_bit(R10BIO_Uptodate, &r10_bio->state);
raid_end_bio_io(r10_bio);
}
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 3f16ad6904a9..cc167e708125 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -165,6 +165,7 @@ enum r10bio_state {
* so that raid10d knows what to do with them.
*/
R10BIO_ReadError,
+ R10BIO_Returned,
/* If a write for this request means we can clear some
* known-bad-block records, we set this flag.
*/
--
2.39.2
* [PATCH RFC 5/7] md/raid5: convert to use bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
` (3 preceding siblings ...)
2025-08-25 9:36 ` [PATCH RFC 4/7] md/raid10: convert read/write " Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 11:00 ` Christoph Hellwig
2025-08-25 9:36 ` [PATCH RFC 6/7] md/md-linear: " Yu Kuai
2025-08-25 9:37 ` [PATCH RFC 7/7] block: fix disordered IO in the case recursive split Yu Kuai
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
On the one hand, unify the bio split code to prepare for fixing disordered
split IO; on the other hand, fix the missing blkcg_bio_issue_init() and
trace_block_split() calls for split IO.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/raid5.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 023649fe2476..9ae749e66e9d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5468,17 +5468,19 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
{
- struct bio *split;
sector_t sector = raid_bio->bi_iter.bi_sector;
unsigned chunk_sects = mddev->chunk_sectors;
unsigned sectors = chunk_sects - (sector & (chunk_sects-1));
if (sectors < bio_sectors(raid_bio)) {
struct r5conf *conf = mddev->private;
- split = bio_split(raid_bio, sectors, GFP_NOIO, &conf->bio_split);
- bio_chain(split, raid_bio);
- submit_bio_noacct(raid_bio);
- raid_bio = split;
+
+ raid_bio = bio_submit_split(raid_bio, sectors,
+ &conf->bio_split);
+ if (!raid_bio)
+ return NULL;
+
+ raid_bio->bi_opf &= ~REQ_NOMERGE;
}
if (!raid5_read_one_chunk(mddev, raid_bio))
--
2.39.2
* [PATCH RFC 6/7] md/md-linear: convert to use bio_submit_split()
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
` (4 preceding siblings ...)
2025-08-25 9:36 ` [PATCH RFC 5/7] md/raid5: convert " Yu Kuai
@ 2025-08-25 9:36 ` Yu Kuai
2025-08-25 9:37 ` [PATCH RFC 7/7] block: fix disordered IO in the case recursive split Yu Kuai
6 siblings, 0 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:36 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
On the one hand, unify the bio split code to prepare for fixing disordered
split IO; on the other hand, fix the missing blkcg_bio_issue_init() and
trace_block_split() calls for split IO.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-linear.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 5d9b08115375..61375c61e4aa 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -256,18 +256,10 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
if (unlikely(bio_end_sector(bio) > end_sector)) {
/* This bio crosses a device boundary, so we have to split it */
- struct bio *split = bio_split(bio, end_sector - bio_sector,
- GFP_NOIO, &mddev->bio_set);
-
- if (IS_ERR(split)) {
- bio->bi_status = errno_to_blk_status(PTR_ERR(split));
- bio_endio(bio);
+ bio = bio_submit_split(bio, end_sector - bio_sector,
+ &mddev->bio_set);
+ if (!bio)
return true;
- }
-
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
}
md_account_bio(mddev, &bio);
--
2.39.2
* [PATCH RFC 7/7] block: fix disordered IO in the case recursive split
2025-08-25 9:36 [PATCH RFC 0/7] block: fix disordered IO in the case recursive split Yu Kuai
` (5 preceding siblings ...)
2025-08-25 9:36 ` [PATCH RFC 6/7] md/md-linear: " Yu Kuai
@ 2025-08-25 9:37 ` Yu Kuai
2025-08-25 11:07 ` Christoph Hellwig
6 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-25 9:37 UTC
To: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil
Cc: linux-block, linux-kernel, cgroups, linux-raid, yukuai1, yi.zhang,
yangerkun, johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Currently, a split bio is chained to the original bio, and the original bio
is resubmitted to the tail of current->bio_list, waiting for the split bio
to be issued. However, if the split bio gets split again, the IO order is
messed up. For example, in raid456, IO is first split by max_sectors in
md_submit_bio(), and then split again by chunksize for internal handling.
Assume max_sectors is 1M and chunksize is 512k:
1) issue a 2M IO:
bio issuing: 0+2M
current->bio_list: NULL
2) md_submit_bio() split by max_sector:
bio issuing: 0+1M
current->bio_list: 1M+1M
3) chunk_aligned_read() split by chunksize:
bio issuing: 0+512k
current->bio_list: 1M+1M -> 512k+512k
4) after the first bio is issued, __submit_bio_noacct() continues issuing
the next bio:
bio issuing: 1M+1M
current->bio_list: 512k+512k
bio issued: 0+512k
5) chunk_aligned_read() split by chunksize:
bio issuing: 1M+512k
current->bio_list: 512k+512k -> 1536k+512k
bio issued: 0+512k
6) no further splits; the final issue order is:
0+512k -> 1M+512k -> 512k+512k -> 1536k+512k
This behaviour causes large reads on raid456 to end up as small,
discontinuous IOs on the underlying disks. Fix this problem by placing the
split bio at the head of current->bio_list.
Test script, run on an 8-disk raid5 with 64k chunksize:
dd if=/dev/md0 of=/dev/null bs=4480k iflag=direct
Test results:
Before this patch
1) iostat results:
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
md0 52430.00 3276.87 0.00 0.00 0.62 64.00 32.60 80.10
sd* 4487.00 409.00 2054.00 31.40 0.82 93.34 3.68 71.20
2) blktrace G stage:
8,0 0 486445 11.357392936 843 G R 14071424 + 128 [dd]
8,0 0 486451 11.357466360 843 G R 14071168 + 128 [dd]
8,0 0 486454 11.357515868 843 G R 14071296 + 128 [dd]
8,0 0 486468 11.357968099 843 G R 14072192 + 128 [dd]
8,0 0 486474 11.358031320 843 G R 14071936 + 128 [dd]
8,0 0 486480 11.358096298 843 G R 14071552 + 128 [dd]
8,0 0 486490 11.358303858 843 G R 14071808 + 128 [dd]
3) io seek for sdx:
Note that io seek is computed from blktrace D-stage events as:
ABS((offset of next IO) - (offset + len of previous IO))
Read|Write seek
cnt 55175, zero cnt 25079
>=(KB) .. <(KB) : count ratio |distribution |
0 .. 1 : 25079 45.5% |########################################|
1 .. 2 : 0 0.0% | |
2 .. 4 : 0 0.0% | |
4 .. 8 : 0 0.0% | |
8 .. 16 : 0 0.0% | |
16 .. 32 : 0 0.0% | |
32 .. 64 : 12540 22.7% |##################### |
64 .. 128 : 2508 4.5% |##### |
128 .. 256 : 0 0.0% | |
256 .. 512 : 10032 18.2% |################# |
512 .. 1024 : 5016 9.1% |######### |
After this patch:
1) iostat results:
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
md0 87965.00 5271.88 0.00 0.00 0.16 61.37 14.03 90.60
sd* 6020.00 658.44 5117.00 45.95 0.44 112.00 2.68 86.50
2) blktrace G stage:
8,0 0 206296 5.354894072 664 G R 7156992 + 128 [dd]
8,0 0 206305 5.355018179 664 G R 7157248 + 128 [dd]
8,0 0 206316 5.355204438 664 G R 7157504 + 128 [dd]
8,0 0 206319 5.355241048 664 G R 7157760 + 128 [dd]
8,0 0 206333 5.355500923 664 G R 7158016 + 128 [dd]
8,0 0 206344 5.355837806 664 G R 7158272 + 128 [dd]
8,0 0 206353 5.355960395 664 G R 7158528 + 128 [dd]
8,0 0 206357 5.356020772 664 G R 7158784 + 128 [dd]
3) io seek for sdx:
Read|Write seek
cnt 28644, zero cnt 21483
>=(KB) .. <(KB) : count ratio |distribution |
0 .. 1 : 21483 75.0% |########################################|
1 .. 2 : 0 0.0% | |
2 .. 4 : 0 0.0% | |
4 .. 8 : 0 0.0% | |
8 .. 16 : 0 0.0% | |
16 .. 32 : 0 0.0% | |
32 .. 64 : 7161 25.0% |############## |
BTW, this looks like a long-standing problem dating back to day one, and
large sequential reads are a pretty common case, e.g. video playback.
Even with this patch, IO in this test case is merged to at most 128k
because of the block layer plug limit BLK_PLUG_FLUSH_SIZE; increasing this
limit can give even better performance. However, we'll figure out how to
do this properly later.
Fixes: d89d87965dcb ("When stacked block devices are in-use (e.g. md or dm), the recursive calls")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-core.c | 54 ++++++++++++++++++++++++++++----------------
block/blk-merge.c | 2 +-
block/blk-throttle.c | 2 +-
block/blk.h | 3 ++-
4 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 4201504158a1..cfb2179cc91e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -725,7 +725,7 @@ static void __submit_bio_noacct_mq(struct bio *bio)
current->bio_list = NULL;
}
-void submit_bio_noacct_nocheck(struct bio *bio)
+void submit_bio_noacct_nocheck(struct bio *bio, bool split)
{
blk_cgroup_bio_start(bio);
blkcg_bio_issue_init(bio);
@@ -745,12 +745,16 @@ void submit_bio_noacct_nocheck(struct bio *bio)
* to collect a list of requests submited by a ->submit_bio method while
* it is active, and then process them after it returned.
*/
- if (current->bio_list)
- bio_list_add(&current->bio_list[0], bio);
- else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
+ if (current->bio_list) {
+ if (split)
+ bio_list_add_head(&current->bio_list[0], bio);
+ else
+ bio_list_add(&current->bio_list[0], bio);
+ } else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
__submit_bio_noacct_mq(bio);
- else
+ } else {
__submit_bio_noacct(bio);
+ }
}
static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q,
@@ -765,16 +769,7 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q,
return BLK_STS_OK;
}
-/**
- * submit_bio_noacct - re-submit a bio to the block device layer for I/O
- * @bio: The bio describing the location in memory and on the device.
- *
- * This is a version of submit_bio() that shall only be used for I/O that is
- * resubmitted to lower level drivers by stacking block drivers. All file
- * systems and other upper level users of the block layer should use
- * submit_bio() instead.
- */
-void submit_bio_noacct(struct bio *bio)
+static bool submit_bio_check(struct bio *bio)
{
struct block_device *bdev = bio->bi_bdev;
struct request_queue *q = bdev_get_queue(bdev);
@@ -869,19 +864,40 @@ void submit_bio_noacct(struct bio *bio)
goto not_supported;
}
- if (blk_throtl_bio(bio))
- return;
- submit_bio_noacct_nocheck(bio);
- return;
+ return !blk_throtl_bio(bio);
not_supported:
status = BLK_STS_NOTSUPP;
end_io:
bio->bi_status = status;
bio_endio(bio);
+ return false;
+}
+
+/**
+ * submit_bio_noacct - re-submit a bio to the block device layer for I/O
+ * @bio: The bio describing the location in memory and on the device.
+ *
+ * This is a version of submit_bio() that shall only be used for I/O that is
+ * resubmitted to lower level drivers by stacking block drivers. All file
+ * systems and other upper level users of the block layer should use
+ * submit_bio() instead.
+ */
+void submit_bio_noacct(struct bio *bio)
+{
+ if (submit_bio_check(bio))
+ submit_bio_noacct_nocheck(bio, false);
}
EXPORT_SYMBOL(submit_bio_noacct);
+void submit_split_bio_noacct(struct bio *bio)
+{
+ WARN_ON_ONCE(!current->bio_list);
+
+ if (submit_bio_check(bio))
+ submit_bio_noacct_nocheck(bio, true);
+}
+
static void bio_set_ioprio(struct bio *bio)
{
/* Nobody set ioprio so far? Initialize it based on task's nice value */
diff --git a/block/blk-merge.c b/block/blk-merge.c
index c45d5e43e172..934bbafe0462 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -138,7 +138,7 @@ struct bio *bio_submit_split(struct bio *bio, int split_sectors,
bio_chain(split, bio);
trace_block_split(split, bio->bi_iter.bi_sector);
WARN_ON_ONCE(bio_zone_write_plugging(bio));
- submit_bio_noacct(bio);
+ submit_split_bio_noacct(bio);
return split;
error:
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 397b6a410f9e..ead7b0eb4846 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1224,7 +1224,7 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
if (!bio_list_empty(&bio_list_on_stack)) {
blk_start_plug(&plug);
while ((bio = bio_list_pop(&bio_list_on_stack)))
- submit_bio_noacct_nocheck(bio);
+ submit_bio_noacct_nocheck(bio, false);
blk_finish_plug(&plug);
}
}
diff --git a/block/blk.h b/block/blk.h
index 46f566f9b126..d804a49c6313 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -54,7 +54,8 @@ bool blk_queue_start_drain(struct request_queue *q);
bool __blk_freeze_queue_start(struct request_queue *q,
struct task_struct *owner);
int __bio_queue_enter(struct request_queue *q, struct bio *bio);
-void submit_bio_noacct_nocheck(struct bio *bio);
+void submit_bio_noacct_nocheck(struct bio *bio, bool split);
+void submit_split_bio_noacct(struct bio *bio);
void bio_await_chain(struct bio *bio);
static inline bool blk_try_enter_queue(struct request_queue *q, bool pm)
--
2.39.2
* Re: [PATCH RFC 1/7] block: export helper bio_submit_split()
2025-08-25 9:36 ` [PATCH RFC 1/7] block: export helper bio_submit_split() Yu Kuai
@ 2025-08-25 10:53 ` Christoph Hellwig
2025-08-26 0:51 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 10:53 UTC
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
On Mon, Aug 25, 2025 at 05:36:54PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> No functional changes are intended, some drivers like mdraid will split
> bio by internal processing, prepare to unify bio split codes.
Maybe name the exported helper bio_submit_split_bioset and keep
bio_submit_split() as a wrapper that passes the default split
bioset to keep the code a bit tidier in blk-merge.c?
> +struct bio *bio_submit_split(struct bio *bio, int split_sectors,
> + struct bio_set *bs)
> {
> + struct bio *split;
> +
> if (unlikely(split_sectors < 0))
> goto error;
>
> - if (split_sectors) {
> - struct bio *split;
> + if (!split_sectors)
> + return bio;
>
> - split = bio_split(bio, split_sectors, GFP_NOIO,
> - &bio->bi_bdev->bd_disk->bio_split);
> - if (IS_ERR(split)) {
> - split_sectors = PTR_ERR(split);
> - goto error;
> - }
> - split->bi_opf |= REQ_NOMERGE;
> - blkcg_bio_issue_init(split);
> - bio_chain(split, bio);
> - trace_block_split(split, bio->bi_iter.bi_sector);
> - WARN_ON_ONCE(bio_zone_write_plugging(bio));
> - submit_bio_noacct(bio);
Maybe skip the reformatting which makes this much harder to read?
If you think it is useful it can be done in a separate patch.
* Re: [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split()
2025-08-25 9:36 ` [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split() Yu Kuai
@ 2025-08-25 10:57 ` Christoph Hellwig
2025-08-26 1:08 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 10:57 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
On Mon, Aug 25, 2025 at 05:36:55PM +0800, Yu Kuai wrote:
> + bio = bio_submit_split(bio,
> + zone->zone_end - bio->bi_iter.bi_sector,
> + &mddev->bio_set);
Do you know why raid0 and linear use mddev->bio_set for splitting
instead of their own split bio_sets like raid1/10/5? Is this safe?
Otherwise this looks nice.
* Re: [PATCH RFC 3/7] md/raid1: convert to use bio_submit_split()
2025-08-25 9:36 ` [PATCH RFC 3/7] md/raid1: convert " Yu Kuai
@ 2025-08-25 10:57 ` Christoph Hellwig
2025-08-26 1:09 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 10:57 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
On Mon, Aug 25, 2025 at 05:36:56PM +0800, Yu Kuai wrote:
> + bio = bio_submit_split(bio, max_sectors,&conf->bio_split);
missing whitespace after the comma.
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-25 9:36 ` [PATCH RFC 4/7] md/raid10: convert read/write " Yu Kuai
@ 2025-08-25 10:59 ` Christoph Hellwig
2025-08-26 1:13 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 10:59 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
> allow_barrier(conf);
> + bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
> wait_barrier(conf, false);
> +
> + if (!bio) {
> + set_bit(R10BIO_Returned, &r10_bio->state);
> + goto err_handle;
> + }
The NULL return should only happen for REQ_NOWAIT here, so maybe
give R10BIO_Returned a more descriptive name? Also please document
the flag in the header.
And maybe the code wants a splitting helper instead of open-coding the
setting of this flag in multiple places?
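One possible shape for such a helper, as a self-contained sketch. raid10_split_bio() and the flag bit values are hypothetical. The stub split fails for REQ_NOWAIT or REQ_ATOMIC bios, which is how this thread describes the NULL return, and R10BIO_Returned is modelled as a state bit recording that the original bio has already been completed, which is our reading of the flag's purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag values and state bit (assumptions, not kernel values). */
#define REQ_NOWAIT (1u << 0)
#define REQ_ATOMIC (1u << 1)
enum { R10BIO_Returned = 0 };

struct bio { unsigned int opf; unsigned int sectors; };
struct r10bio { unsigned long state; };

/* Stub for bio_submit_split(): per this thread, a NULL return means the
 * bio could not be split (REQ_NOWAIT or REQ_ATOMIC) and was ended. */
static struct bio *bio_submit_split_stub(struct bio *bio, int sectors)
{
	if (bio->opf & (REQ_NOWAIT | REQ_ATOMIC))
		return NULL;
	bio->sectors = (unsigned int)sectors;
	return bio;
}

/* Hypothetical helper: the only place that splits and sets the flag. */
static struct bio *raid10_split_bio(struct r10bio *r10_bio, struct bio *bio,
				    int sectors)
{
	struct bio *split = bio_submit_split_stub(bio, sectors);

	if (!split)  /* original bio was already completed by the helper */
		r10_bio->state |= 1ul << R10BIO_Returned;
	return split;
}
```

Callers would then only check for NULL and jump to their error label, as in the hunk above.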
* Re: [PATCH RFC 5/7] md/raid5: convert to use bio_submit_split()
2025-08-25 9:36 ` [PATCH RFC 5/7] md/raid5: convert " Yu Kuai
@ 2025-08-25 11:00 ` Christoph Hellwig
2025-08-26 1:15 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 11:00 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
On Mon, Aug 25, 2025 at 05:36:58PM +0800, Yu Kuai wrote:
> + raid_bio = bio_submit_split(raid_bio, sectors,
> + &conf->bio_split);
> + if (!raid_bio)
> + return NULL;
> +
> + raid_bio->bi_opf &= ~REQ_NOMERGE;
It almost feels as if md wants a little helper that wraps
bio_submit_split and also clears REQ_NOMERGE?
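A sketch of that wrapper, assuming (as the quoted code shows) that bio_submit_split() tags the split bio it returns with REQ_NOMERGE. md_submit_split() is a hypothetical name and the flag value is a stand-in.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_NOMERGE (1u << 2)   /* stand-in value */

struct bio { unsigned int opf; unsigned int sectors; };

/* Stub for bio_submit_split(): NULL on error, the bio itself when no
 * split is needed, otherwise the front part tagged REQ_NOMERGE. */
static struct bio *bio_submit_split_stub(struct bio *bio, int sectors)
{
	if (sectors < 0)
		return NULL;
	if (sectors > 0) {
		bio->sectors = (unsigned int)sectors;
		bio->opf |= REQ_NOMERGE;
	}
	return bio;
}

/* Hypothetical md wrapper: split, then undo the REQ_NOMERGE tagging. */
static struct bio *md_submit_split(struct bio *bio, int sectors)
{
	bio = bio_submit_split_stub(bio, sectors);
	if (bio)
		bio->opf &= ~REQ_NOMERGE;
	return bio;
}
```

Later in this thread, the alternative agreed on is for the block layer to set REQ_NOMERGE itself after calling the helper, so md would not need to clear it at all.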
* Re: [PATCH RFC 7/7] block: fix disordered IO in the case recursive split
2025-08-25 9:37 ` [PATCH RFC 7/7] block: fix disordered IO in the case recursive split Yu Kuai
@ 2025-08-25 11:07 ` Christoph Hellwig
2025-08-26 1:20 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-25 11:07 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, colyli, hare, tieren, axboe, tj, josef, song, yukuai3, akpm,
neil, linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi
On Mon, Aug 25, 2025 at 05:37:00PM +0800, Yu Kuai wrote:
> +void submit_bio_noacct(struct bio *bio)
Maybe just have a version of submit_bio_noacct that takes the split
argument, and make submit_bio_noacct a tiny wrapper around it? I think
that should create less churn than this version. In fact, I suspect
we can bypass submit_bio_noacct entirely: all the checks and
accounting in it were already done when submitting the original bio, so
the bio split helper could just call into submit_bio_noacct_nocheck
directly.
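A toy model of the suggested wrapper. It assumes that "split" means the resubmitted bio goes to the head of the per-task pending list (current->bio_list in the kernel), as the cover letter describes for patch 7. submit_bio_noacct_split() is a hypothetical name, and the list helpers are simplified re-implementations of the kernel's bio_list_add() and bio_list_add_head().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bio { int id; struct bio *bi_next; };
struct bio_list { struct bio *head, *tail; };

/* Simplified versions of the kernel's bio_list helpers. */
static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static void bio_list_add_head(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = bl->head;
	bl->head = bio;
	if (!bl->tail)
		bl->tail = bio;
}

/* Stand-in for current->bio_list during an active submission. */
static struct bio_list pending;

/* Hypothetical variant carrying the split argument. */
static void submit_bio_noacct_split(struct bio *bio, bool split)
{
	if (split)
		bio_list_add_head(&pending, bio);  /* dispatched next */
	else
		bio_list_add(&pending, bio);       /* normal FIFO order */
}

/* Existing entry point as a tiny wrapper; its many callers are unchanged. */
static void submit_bio_noacct(struct bio *bio)
{
	submit_bio_noacct_split(bio, false);
}
```

Per Yu's reply in this thread, the split helper could instead call the nocheck variant directly, with blk_throtl_bio() handled separately.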
* Re: [PATCH RFC 1/7] block: export helper bio_submit_split()
2025-08-25 10:53 ` Christoph Hellwig
@ 2025-08-26 0:51 ` Yu Kuai
0 siblings, 0 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 0:51 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 18:53, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 05:36:54PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> No functional changes are intended. Some drivers, like mdraid, split
>> bios during internal processing; prepare to unify the bio split code.
>
> Maybe name the exported helper bio_submit_split_bioset() and keep
> bio_submit_split() as a wrapper that passes the default split
> bioset, to keep the code in blk-merge.c a bit tidier?
>
Sure.
>> +struct bio *bio_submit_split(struct bio *bio, int split_sectors,
>> + struct bio_set *bs)
>> {
>> + struct bio *split;
>> +
>> if (unlikely(split_sectors < 0))
>> goto error;
>>
>> - if (split_sectors) {
>> - struct bio *split;
>> + if (!split_sectors)
>> + return bio;
>>
>> - split = bio_split(bio, split_sectors, GFP_NOIO,
>> - &bio->bi_bdev->bd_disk->bio_split);
>> - if (IS_ERR(split)) {
>> - split_sectors = PTR_ERR(split);
>> - goto error;
>> - }
>> - split->bi_opf |= REQ_NOMERGE;
>> - blkcg_bio_issue_init(split);
>> - bio_chain(split, bio);
>> - trace_block_split(split, bio->bi_iter.bi_sector);
>> - WARN_ON_ONCE(bio_zone_write_plugging(bio));
>> - submit_bio_noacct(bio);
>
> Maybe skip the reformatting which makes this much harder to read?
> If you think it is useful it can be done in a separate patch.
>
Please ignore this; I'll skip the reformatting.
Thanks for the review!
Kuai
* Re: [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split()
2025-08-25 10:57 ` Christoph Hellwig
@ 2025-08-26 1:08 ` Yu Kuai
2025-08-26 7:54 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 1:08 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 18:57, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 05:36:55PM +0800, Yu Kuai wrote:
>> + bio = bio_submit_split(bio,
>> + zone->zone_end - bio->bi_iter.bi_sector,
>> + &mddev->bio_set);
>
> Do you know why raid0 and linear use mddev->bio_set for splitting
> instead of their own split bio_sets like raid1/10/5? Is this safe?
>
I think it's not safe: the mddev->bio_split pool size is just 2, and
reusing this pool to split multiple times before submitting would need
a greater pool size to work.
By the way, do you think it would be better to increase the
disk->bio_split pool size to 4 and convert all mdraid internal splits
to use disk->bio_split directly?
Thanks,
Kuai
> Otherwise this looks nice.
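A toy model of the worst case Yu describes, under simplifying assumptions: only the pool's min_nr reserved elements are available, and an element returns to the pool only once its bio is issued and completed. In the kernel, mempool_alloc() with GFP_NOIO would sleep rather than fail, which becomes a hang when the sleeping task itself holds the outstanding elements. nested_splits_fit() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Worst-case view of a mempool: only the reserved elements exist, and
 * none is returned until the bio holding it is issued and completes. */
struct pool_model { int min_nr; int in_use; };

static bool pool_alloc(struct pool_model *p)
{
	if (p->in_use == p->min_nr)
		return false;   /* kernel: mempool_alloc() would sleep here */
	p->in_use++;
	return true;
}

/* Allocate `depth` split bios from one pool before issuing any of them,
 * as happens when every split level reuses the same bio_set. */
static bool nested_splits_fit(int min_nr, int depth)
{
	struct pool_model p = { min_nr, 0 };

	assert(depth >= 0);
	for (int i = 0; i < depth; i++)
		if (!pool_alloc(&p))
			return false;
	return true;
}
```

With the three split sources Yu lists later in the thread (queue limits, chunk size, badblocks), a pool of 2 can be exhausted, while a pool of 4 leaves one element of headroom.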
* Re: [PATCH RFC 3/7] md/raid1: convert to use bio_submit_split()
2025-08-25 10:57 ` Christoph Hellwig
@ 2025-08-26 1:09 ` Yu Kuai
0 siblings, 0 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 1:09 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 18:57, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 05:36:56PM +0800, Yu Kuai wrote:
>> + bio = bio_submit_split(bio, max_sectors,&conf->bio_split);
>
> missing whitespace after the comma.
Ok,
Thanks for the review.
Kuai
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-25 10:59 ` Christoph Hellwig
@ 2025-08-26 1:13 ` Yu Kuai
2025-08-26 7:55 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 1:13 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 18:59, Christoph Hellwig wrote:
>> allow_barrier(conf);
>> + bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
>> wait_barrier(conf, false);
>> +
>> + if (!bio) {
>> + set_bit(R10BIO_Returned, &r10_bio->state);
>> + goto err_handle;
>> + }
>
> The NULL return should only happen for REQ_NOWAIT here, so maybe
> give R10BIO_Returned a more descriptive name? Also please document
> the flag in the header.
It also happens for atomic writes, if the bio has to be split due to
badblocks here. The flag follows the one in raid1. I can add
documentation for both raid1 and raid10 in this case.
>
> And maybe the code wants a splitting helper instead of open-coding the
> setting of this flag in multiple places?
Yes.
Thanks,
Kuai
* Re: [PATCH RFC 5/7] md/raid5: convert to use bio_submit_split()
2025-08-25 11:00 ` Christoph Hellwig
@ 2025-08-26 1:15 ` Yu Kuai
2025-08-26 7:56 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 1:15 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 19:00, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 05:36:58PM +0800, Yu Kuai wrote:
>> + raid_bio = bio_submit_split(raid_bio, sectors,
>> + &conf->bio_split);
>> + if (!raid_bio)
>> + return NULL;
>> +
>> + raid_bio->bi_opf &= ~REQ_NOMERGE;
>
> It almost feels as if md wants a little helper that wraps
> bio_submit_split and also clears REQ_NOMERGE?
>
Yes.
And given that bio_submit_split() sets this flag and we then clear
it, would it make more sense for the block layer to set this flag after
calling bio_submit_split()?
Thanks,
Kuai
* Re: [PATCH RFC 7/7] block: fix disordered IO in the case recursive split
2025-08-25 11:07 ` Christoph Hellwig
@ 2025-08-26 1:20 ` Yu Kuai
0 siblings, 0 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 1:20 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/25 19:07, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 05:37:00PM +0800, Yu Kuai wrote:
>> +void submit_bio_noacct(struct bio *bio)
>
> Maybe just have a version of submit_bio_noacct that takes the split
> argument, and make submit_bio_noacct a tiny wrapper around it? I think
> that should create less churn than this version. In fact, I suspect
> we can bypass submit_bio_noacct entirely: all the checks and
> accounting in it were already done when submitting the original bio, so
> the bio split helper could just call into submit_bio_noacct_nocheck
> directly.
>
I can do this. I was trying to avoid touching submit_bio_noacct()
because it has so many callers, but a tiny wrapper sounds good!
As for bypassing submit_bio_noacct(), I think it's OK; only
blk_throtl_bio() would need to be called separately. Perhaps we can do
this later.
Thanks,
Kuai
* Re: [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split()
2025-08-26 1:08 ` Yu Kuai
@ 2025-08-26 7:54 ` Christoph Hellwig
2025-08-26 9:11 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-26 7:54 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, colyli, hare, tieren, axboe, tj, josef, song,
akpm, neil, linux-block, linux-kernel, cgroups, linux-raid,
yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
On Tue, Aug 26, 2025 at 09:08:33AM +0800, Yu Kuai wrote:
> On 2025/08/25 18:57, Christoph Hellwig wrote:
> > On Mon, Aug 25, 2025 at 05:36:55PM +0800, Yu Kuai wrote:
> > > + bio = bio_submit_split(bio,
> > > + zone->zone_end - bio->bi_iter.bi_sector,
> > > + &mddev->bio_set);
> >
> > Do you know why raid0 and linear use mddev->bio_set for splitting
> > instead of their own split bio_sets like raid1/10/5? Is this safe?
> >
>
> I think it's not safe: the mddev->bio_split pool size is just 2, and
> reusing this pool to split multiple times before submitting would need
> a greater pool size to work.
>
> By the way, do you think it would be better to increase the
> disk->bio_split pool size to 4 and convert all mdraid internal splits
> to use disk->bio_split directly?
I don't really know where that magic number 4 or even the current number
comes from, but I think Jens might be amenable to a small increase with a
good explanation.
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-26 1:13 ` Yu Kuai
@ 2025-08-26 7:55 ` Christoph Hellwig
2025-08-26 9:14 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-26 7:55 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, colyli, hare, tieren, axboe, tj, josef, song,
akpm, neil, linux-block, linux-kernel, cgroups, linux-raid,
yi.zhang, yangerkun, johnny.chenyi, yukuai (C), John Garry
On Tue, Aug 26, 2025 at 09:13:41AM +0800, Yu Kuai wrote:
> > The NULL return should only happen for REQ_NOWAIT here, so maybe
> > give R10BIO_Returned a more descriptive name? Also please document
> > the flag in the header.
>
> It also happens for atomic writes, if the bio has to be split due to
> badblocks here. The flag follows the one in raid1. I can add
> documentation for both raid1 and raid10 in this case.
Umm, that's actually a red flag. If a device guarantees atomic behavior
it can't just fail it. So I think REQ_ATOMIC should be disallowed
for md raid with bad block tracking.
* Re: [PATCH RFC 5/7] md/raid5: convert to use bio_submit_split()
2025-08-26 1:15 ` Yu Kuai
@ 2025-08-26 7:56 ` Christoph Hellwig
0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-26 7:56 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, colyli, hare, tieren, axboe, tj, josef, song,
akpm, neil, linux-block, linux-kernel, cgroups, linux-raid,
yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
On Tue, Aug 26, 2025 at 09:15:34AM +0800, Yu Kuai wrote:
> > > + raid_bio->bi_opf &= ~REQ_NOMERGE;
> >
> > It almost feels as if md wants a little helper that wraps
> > bio_submit_split and also clears REQ_NOMERGE?
> >
>
> Yes.
>
> And given that bio_submit_split() sets this flag and we then clear
> it, would it make more sense for the block layer to set this flag after
> calling bio_submit_split()?
Yes.
* Re: [PATCH RFC 2/7] md/raid0: convert raid0_handle_discard() to use bio_submit_split()
2025-08-26 7:54 ` Christoph Hellwig
@ 2025-08-26 9:11 ` Yu Kuai
0 siblings, 0 replies; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 9:11 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/08/26 15:54, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 09:08:33AM +0800, Yu Kuai wrote:
>> On 2025/08/25 18:57, Christoph Hellwig wrote:
>>> On Mon, Aug 25, 2025 at 05:36:55PM +0800, Yu Kuai wrote:
>>>> + bio = bio_submit_split(bio,
>>>> + zone->zone_end - bio->bi_iter.bi_sector,
>>>> + &mddev->bio_set);
>>>
>>> Do you know why raid0 and linear use mddev->bio_set for splitting
>>> instead of their own split bio_sets like raid1/10/5? Is this safe?
>>>
>>
>> I think it's not safe: the mddev->bio_split pool size is just 2, and
>> reusing this pool to split multiple times before submitting would need
>> a greater pool size to work.
>>
>> By the way, do you think it would be better to increase the
>> disk->bio_split pool size to 4 and convert all mdraid internal splits
>> to use disk->bio_split directly?
>
> I don't really know where that magic number 4 or even the current number
> comes from, but I think Jens might be amenable to a small increase with a
> good explanation.
My thinking is that we have to make sure the allocated split bio is
issued before allocating a new one, and the pool size is the safe limit
on how many split bios we can allocate before issuing.
In the case of recursive splits, we can hold multiple split bios in
current->bio_list, and since this set handles split bios first, we can
guarantee we'll hold at most 3 split bios from mdraid:
- bio_split_to_limits(), for example by max_sectors
- bio_split() by internal chunk size
- bio_split() by badblocks
That's why I said 4 should be safe :) If gendisk->bio_split can be
expanded to 4, all the internal bio_split bio_sets can be removed now.
Thanks,
Kuai
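The ordering effect can be illustrated with a deliberately simplified model, not the kernel's __submit_bio_noacct() (which also separates bios for lower devices from same-level ones). Assumptions: a bio is split first by a queue limit and then by chunk size, each remainder is resubmitted to a single pending list standing in for current->bio_list, and each final piece is dispatched in pop order. run() and the sector sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BIOS 32

struct range { int start, end; };  /* sectors [start, end) */

/* Array-backed stand-in for current->bio_list. */
struct pending { struct range items[MAX_BIOS]; int n; };

static void push(struct pending *l, struct range r, bool at_head)
{
	assert(l->n < MAX_BIOS);
	if (at_head) {
		for (int i = l->n; i > 0; i--)
			l->items[i] = l->items[i - 1];
		l->items[0] = r;
	} else {
		l->items[l->n] = r;
	}
	l->n++;
}

static struct range pop(struct pending *l)
{
	struct range r = l->items[0];

	for (int i = 1; i < l->n; i++)
		l->items[i - 1] = l->items[i];
	l->n--;
	return r;
}

/* Split each bio by limit_sectors, then its front by chunk_sectors;
 * remainders are resubmitted, final pieces are "dispatched" into out[].
 * Returns the number of dispatched pieces. */
static int run(int total, int limit_sectors, int chunk_sectors,
	       bool at_head, int *out)
{
	struct pending l = { .n = 0 };
	int nout = 0;

	push(&l, (struct range){ 0, total }, at_head);
	while (l.n > 0) {
		struct range r = pop(&l);

		if (r.end - r.start > limit_sectors) {  /* split by limits */
			push(&l, (struct range){ r.start + limit_sectors, r.end },
			     at_head);
			r.end = r.start + limit_sectors;
		}
		if (r.end - r.start > chunk_sectors) {  /* split by chunk */
			push(&l, (struct range){ r.start + chunk_sectors, r.end },
			     at_head);
			r.end = r.start + chunk_sectors;
		}
		out[nout++] = r.start;                  /* dispatch */
	}
	return nout;
}
```

With tail insertion, a 16-sector bio split at 8 (limits) and 4 (chunk) dispatches sectors 0, 8, 4, 12; queuing remainders at the head yields 0, 4, 8, 12.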
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-26 7:55 ` Christoph Hellwig
@ 2025-08-26 9:14 ` Yu Kuai
2025-08-26 17:35 ` anthony
0 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-08-26 9:14 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, John Garry, yukuai (C)
Hi,
On 2025/08/26 15:55, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 09:13:41AM +0800, Yu Kuai wrote:
>>> The NULL return should only happen for REQ_NOWAIT here, so maybe
>>> give R10BIO_Returned a more descriptive name? Also please document
>>> the flag in the header.
>>
>> It also happens for atomic writes, if the bio has to be split due to
>> badblocks here. The flag follows the one in raid1. I can add
>> documentation for both raid1 and raid10 in this case.
>
> Umm, that's actually a red flag. If a device guarantees atomic behavior
> it can't just fail it. So I think REQ_ATOMIC should be disallowed
> for md raid with bad block tracking.
>
I agree that doesn't look good. However, John explained when adding
this that the user should retry and fall back to a write without
REQ_ATOMIC to make things work as usual.
Thanks,
Kuai
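A sketch of the caller-side fallback John describes, using a toy device instead of a real RWF_ATOMIC write: the atomic attempt fails with -EIO when it overlaps a bad-block range (the behaviour John reports for REQ_ATOMIC later in this thread), and the caller retries without the atomic flag. The retry gives up the atomicity guarantee, which is the crux of Christoph's objection. All names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy array member with one recorded bad range, sectors [bad_start, bad_end). */
struct toy_dev { long bad_start, bad_end; };

/* Stand-in for the md write path as described in this thread: an atomic
 * write that would have to be split around bad blocks fails with -EIO;
 * a normal write may be split, so it is modelled as succeeding. */
static int toy_write(const struct toy_dev *d, long start, long len,
		     bool atomic)
{
	bool overlaps = start < d->bad_end && d->bad_start < start + len;

	if (atomic && overlaps)
		return -EIO;
	return 0;
}

/* Caller-side policy: try atomically, retry without the atomic flag on
 * -EIO. *was_atomic reports whether the atomic guarantee actually held. */
static int write_with_fallback(const struct toy_dev *d, long start, long len,
			       bool *was_atomic)
{
	int ret = toy_write(d, start, len, true);

	*was_atomic = (ret == 0);
	if (ret == -EIO)
		ret = toy_write(d, start, len, false);
	return ret;
}
```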
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-26 9:14 ` Yu Kuai
@ 2025-08-26 17:35 ` anthony
2025-08-27 7:31 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: anthony @ 2025-08-26 17:35 UTC (permalink / raw)
To: Yu Kuai, Christoph Hellwig
Cc: colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, John Garry, yukuai (C)
On 26/08/2025 10:14, Yu Kuai wrote:
>> Umm, that's actually a red flag. If a device guarantees atomic behavior
>> it can't just fail it. So I think REQ_ATOMIC should be disallowed
>> for md raid with bad block tracking.
>>
>
> I agree that doesn't look good. However, John explained when adding
> this that the user should retry and fall back to a write without
> REQ_ATOMIC to make things work as usual.
Whether a device promises atomic writes is orthogonal to whether a
given write succeeds: it could fail for a whole host of reasons, so why
can't "this is too big to be atomic" just be another reason for failing?
Yes, you want to know *why* the write failed. If you can't pass that
back, then you have a problem, but if you can pass back the error "too
big for atomic write", then the caller can sort it out.
That then allows the driver (if it knows the block size of the device)
to manage atomic writes, in the sense that it can refuse writes that are
too large, even if the device doesn't claim to support them. It can just
force the caller to submit small enough blocks.
Cheers,
Wol
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-26 17:35 ` anthony
@ 2025-08-27 7:31 ` Christoph Hellwig
2025-09-02 6:18 ` John Garry
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-08-27 7:31 UTC (permalink / raw)
To: anthony
Cc: Yu Kuai, Christoph Hellwig, colyli, hare, tieren, axboe, tj,
josef, song, akpm, neil, linux-block, linux-kernel, cgroups,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, John Garry,
yukuai (C)
On Tue, Aug 26, 2025 at 06:35:10PM +0100, anthony wrote:
> On 26/08/2025 10:14, Yu Kuai wrote:
> > > Umm, that's actually a red flag. If a device guarantees atomic behavior
> > > it can't just fail it. So I think REQ_ATOMIC should be disallowed
> > > for md raid with bad block tracking.
> > >
> >
> > I agree that doesn't look good. However, John explained when adding
> > this that the user should retry and fall back to a write without
> > REQ_ATOMIC to make things work as usual.
>
> Whether a device promises atomic writes is orthogonal to whether a
> given write succeeds: it could fail for a whole host of reasons, so why
> can't "this is too big to be atomic" just be another reason for failing?
Too big to be atomic is a valid failure reason. But the limit needs
to be documented in the queue limits beforehand.
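A minimal sketch of what "documented in the queue limits beforehand" could look like: the limits advertise an atomic write size range, atomic writes outside it are rejected up front with -EINVAL (as John notes RWF_ATOMIC already does for size), and an array with bad block tracking simply advertises no atomic support, per Christoph's earlier suggestion. The struct, its field names (loosely modelled on struct queue_limits), the helpers, and the -EOPNOTSUPP choice are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified limits; names loosely follow struct queue_limits. */
struct toy_limits {
	unsigned int atomic_write_unit_min;  /* bytes */
	unsigned int atomic_write_unit_max;  /* bytes; 0 = no atomic writes */
};

/* Hypothetical policy: an array that tracks bad blocks advertises no
 * atomic write support instead of failing atomic writes later. */
static void toy_set_limits(struct toy_limits *lim, bool badblocks_tracking,
			   unsigned int hw_min, unsigned int hw_max)
{
	lim->atomic_write_unit_min = badblocks_tracking ? 0 : hw_min;
	lim->atomic_write_unit_max = badblocks_tracking ? 0 : hw_max;
}

/* Reject an atomic write before any I/O is issued. The errno for the
 * unsupported case is an assumption of this sketch. */
static int toy_check_atomic(const struct toy_limits *lim, unsigned int len)
{
	if (lim->atomic_write_unit_max == 0)
		return -EOPNOTSUPP;
	if (len < lim->atomic_write_unit_min ||
	    len > lim->atomic_write_unit_max)
		return -EINVAL;
	return 0;
}
```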
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-08-27 7:31 ` Christoph Hellwig
@ 2025-09-02 6:18 ` John Garry
2025-09-02 6:30 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2025-09-02 6:18 UTC (permalink / raw)
To: Christoph Hellwig, anthony
Cc: Yu Kuai, colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
On 27/08/2025 08:31, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 06:35:10PM +0100, anthony wrote:
>> On 26/08/2025 10:14, Yu Kuai wrote:
>>>> Umm, that's actually a red flag. If a device guarantees atomic behavior
>>>> it can't just fail it. So I think REQ_ATOMIC should be disallowed
>>>> for md raid with bad block tracking.
>>>>
>>>
>>> I agree that doesn't look good. However, John explained when adding
>>> this that the user should retry and fall back to a write without
>>> REQ_ATOMIC to make things work as usual.
>>
>> Whether a device promises atomic writes is orthogonal to whether a
>> given write succeeds: it could fail for a whole host of reasons, so why
>> can't "this is too big to be atomic" just be another reason for failing?
>
> Too big to be atomic is a valid failure reason. But the limit needs
> to be documented in the queue limits beforehand.
>
>
What exactly would need to be documented?
We just report -EIO in this case (when we try to write to a bad blocks
region with REQ_ATOMIC). In general, for RWF_ATOMIC, we report -EINVAL
for too large/small a size.
BTW, do we realistically expect atomic writes HW support and bad blocks
ever to meet?
Thanks,
John
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-09-02 6:18 ` John Garry
@ 2025-09-02 6:30 ` Christoph Hellwig
2025-09-02 6:58 ` John Garry
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-09-02 6:30 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, anthony, Yu Kuai, colyli, hare, tieren, axboe,
tj, josef, song, akpm, neil, linux-block, linux-kernel, cgroups,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
> BTW, do we realistically expect atomic writes HW support and bad blocks ever
> to meet?
That's the point I'm trying to make. Bad block tracking is stupid
with modern hardware. Both SSDs and HDDs are overprovisioned on
physical "blocks", and once they run out, fine-grained bad block tracking
is not going to help. I really do not understand why md even tries
to do this bad block tracking, but claiming to support atomic writes
while it does is actively harmful.
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-09-02 6:30 ` Christoph Hellwig
@ 2025-09-02 6:58 ` John Garry
2025-09-02 8:25 ` Yu Kuai
0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2025-09-02 6:58 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: anthony, colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
On 02/09/2025 07:30, Christoph Hellwig wrote:
> On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
>> BTW, do we realistically expect atomic writes HW support and bad blocks ever
>> to meet?
>
> That's the point I'm trying to make. Bad block tracking is stupid
> with modern hardware. Both SSDs and HDDs are overprovisioned on
> physical "blocks", and once they run out, fine-grained bad block tracking
> is not going to help. I really do not understand why md even tries
> to do this bad block tracking,
Just because they can try to deal with bad blocks for some (mirroring)
personalities, I suppose.
> but claiming to support atomic writes
> while it does is actively harmful.
>
There does not seem to be a switch to turn off bad block support, at
least from briefly checking raid10.c. Kuai, any thoughts on whether we
should allow this to be disabled?
Thanks,
John
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-09-02 6:58 ` John Garry
@ 2025-09-02 8:25 ` Yu Kuai
2025-09-02 14:46 ` John Garry
0 siblings, 1 reply; 32+ messages in thread
From: Yu Kuai @ 2025-09-02 8:25 UTC (permalink / raw)
To: John Garry, Christoph Hellwig, Yu Kuai
Cc: anthony, colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
Hi,
On 2025/09/02 14:58, John Garry wrote:
> On 02/09/2025 07:30, Christoph Hellwig wrote:
>> On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
>>> BTW, do we realistically expect atomic writes HW support and bad
>>> blocks ever
>>> to meet?
>>
>> That's the point I'm trying to make. Bad block tracking is stupid
>> with modern hardware. Both SSDs and HDDs are overprovisioned on
>> physical "blocks", and once they run out, fine-grained bad block tracking
>> is not going to help. I really do not understand why md even tries
>> to do this bad block tracking,
>
> Just because they can try to deal with bad blocks for some (mirroring)
> personalities, I suppose.
I agree it's useless for enterprise storage. However, for personal
storage, lots of users run cost-effective (often aging) disks, and
badblocks tracking can reduce the risk of data loss and keep these
devices from going to waste.
>
>> but claiming to support atomic writes
>> while it does is actively harmful.
>>
>
> There does not seem to be a switch to turn off bad block support, at
> least from briefly checking raid10.c. Kuai, any thoughts on whether we
> should allow this to be disabled?
>
I remember suggesting always enabling failfast in this case, so that
badblocks can be bypassed. Anyway, I think it's good to allow this to be
disabled; it will behave very similarly to failfast.
Thanks,
Kuai
> Thanks,
> John
* Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
2025-09-02 8:25 ` Yu Kuai
@ 2025-09-02 14:46 ` John Garry
0 siblings, 0 replies; 32+ messages in thread
From: John Garry @ 2025-09-02 14:46 UTC (permalink / raw)
To: Yu Kuai, Christoph Hellwig
Cc: anthony, colyli, hare, tieren, axboe, tj, josef, song, akpm, neil,
linux-block, linux-kernel, cgroups, linux-raid, yi.zhang,
yangerkun, johnny.chenyi, yukuai (C)
On 02/09/2025 09:25, Yu Kuai wrote:
>> There does not seem to be a switch to turn off bad block support, at
>> least from briefly checking raid10.c. Kuai, any thoughts on whether we
>> should allow this to be disabled?
>>
>
> I remember suggesting always enabling failfast in this case, so that
> badblocks can be bypassed. Anyway, I think it's good to allow this to be
> disabled; it will behave very similarly to failfast.
OK, I can put this on my TODO list, unless someone else wants it.
Thanks,
John