* fix unintentional splitting of zone append bios
@ 2024-08-26 17:37 Christoph Hellwig
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-26 17:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
Hi Jens,
this series fixes code that incorrectly splits zone append bios due
to checking the wrong max_sectors limit. A big part of the cause is
that the bio splitting code is a bit of a mess and full of landmines, so
I fixed that as well.
To hit this bug a submitter needs to submit a bio larger than the
max_sectors limit of the device, but smaller than max_hw_sectors. So far the only thing that
reproduces it is my not yet upstream zoned XFS code, but in theory this
could affect every submitter of zone append bios.
Diffstat:
block/blk-merge.c | 162 ++++++++++++++++++++++---------------------------
block/blk-mq.c | 11 +--
block/blk.h | 70 +++++++++++++++------
fs/btrfs/bio.c | 30 +++++----
include/linux/bio.h | 4 -
include/linux/blkdev.h | 3
6 files changed, 153 insertions(+), 127 deletions(-)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/4] block: rework bio splitting
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
@ 2024-08-26 17:37 ` Christoph Hellwig
2024-08-26 20:29 ` David Sterba
` (2 more replies)
2024-08-26 17:37 ` [PATCH 2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors Christoph Hellwig
` (5 subsequent siblings)
6 siblings, 3 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-26 17:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
The current setup with bio_may_exceed_limit and __bio_split_to_limits
is a bit of a mess.
Change it so that __bio_split_to_limits does all the work and is just
a variant of bio_split_to_limits that returns nr_segs. This is done
by inlining it and instead having the various bio_split_* helpers directly
submit the potentially split bios.
To support btrfs, the rw version has a lower level helper split out
that just returns the offset to split. This turns out to nicely clean
up the btrfs flow as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-merge.c | 146 +++++++++++++++++---------------------------
block/blk-mq.c | 11 ++--
block/blk.h | 63 +++++++++++++------
fs/btrfs/bio.c | 30 +++++----
include/linux/bio.h | 4 +-
5 files changed, 125 insertions(+), 129 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5281bcadc538..c7222c4685e060 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -105,9 +105,33 @@ static unsigned int bio_allowed_max_sectors(const struct queue_limits *lim)
return round_down(UINT_MAX, lim->logical_block_size) >> SECTOR_SHIFT;
}
-static struct bio *bio_split_discard(struct bio *bio,
- const struct queue_limits *lim,
- unsigned *nsegs, struct bio_set *bs)
+static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
+{
+ if (unlikely(split_sectors < 0)) {
+ bio->bi_status = errno_to_blk_status(split_sectors);
+ bio_endio(bio);
+ return NULL;
+ }
+
+ if (split_sectors) {
+ struct bio *split;
+
+ split = bio_split(bio, split_sectors, GFP_NOIO,
+ &bio->bi_bdev->bd_disk->bio_split);
+ split->bi_opf |= REQ_NOMERGE;
+ blkcg_bio_issue_init(split);
+ bio_chain(split, bio);
+ trace_block_split(split, bio->bi_iter.bi_sector);
+ WARN_ON_ONCE(bio_zone_write_plugging(bio));
+ submit_bio_noacct(bio);
+ return split;
+ }
+
+ return bio;
+}
+
+struct bio *bio_split_discard(struct bio *bio, const struct queue_limits *lim,
+ unsigned *nsegs)
{
unsigned int max_discard_sectors, granularity;
sector_t tmp;
@@ -121,10 +145,10 @@ static struct bio *bio_split_discard(struct bio *bio,
min(lim->max_discard_sectors, bio_allowed_max_sectors(lim));
max_discard_sectors -= max_discard_sectors % granularity;
if (unlikely(!max_discard_sectors))
- return NULL;
+ return bio;
if (bio_sectors(bio) <= max_discard_sectors)
- return NULL;
+ return bio;
split_sectors = max_discard_sectors;
@@ -139,19 +163,18 @@ static struct bio *bio_split_discard(struct bio *bio,
if (split_sectors > tmp)
split_sectors -= tmp;
- return bio_split(bio, split_sectors, GFP_NOIO, bs);
+ return bio_submit_split(bio, split_sectors);
}
-static struct bio *bio_split_write_zeroes(struct bio *bio,
- const struct queue_limits *lim,
- unsigned *nsegs, struct bio_set *bs)
+struct bio *bio_split_write_zeroes(struct bio *bio,
+ const struct queue_limits *lim, unsigned *nsegs)
{
*nsegs = 0;
if (!lim->max_write_zeroes_sectors)
- return NULL;
+ return bio;
if (bio_sectors(bio) <= lim->max_write_zeroes_sectors)
- return NULL;
- return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
+ return bio;
+ return bio_submit_split(bio, lim->max_write_zeroes_sectors);
}
static inline unsigned int blk_boundary_sectors(const struct queue_limits *lim,
@@ -274,27 +297,19 @@ static bool bvec_split_segs(const struct queue_limits *lim,
}
/**
- * bio_split_rw - split a bio in two bios
+ * bio_split_rw_at - check if and where to split a read/write bio
* @bio: [in] bio to be split
* @lim: [in] queue limits to split based on
* @segs: [out] number of segments in the bio with the first half of the sectors
- * @bs: [in] bio set to allocate the clone from
* @max_bytes: [in] maximum number of bytes per bio
*
- * Clone @bio, update the bi_iter of the clone to represent the first sectors
- * of @bio and update @bio->bi_iter to represent the remaining sectors. The
- * following is guaranteed for the cloned bio:
- * - That it has at most @max_bytes worth of data
- * - That it has at most queue_max_segments(@q) segments.
- *
- * Except for discard requests the cloned bio will point at the bi_io_vec of
- * the original bio. It is the responsibility of the caller to ensure that the
- * original bio is not freed before the cloned bio. The caller is also
- * responsible for ensuring that @bs is only destroyed after processing of the
- * split bio has finished.
+ * Find out if @bio needs to be split to fit the queue limits in @lim and a
+ * maximum size of @max_bytes. Returns a negative error number if @bio can't be
+ * split, 0 if the bio doesn't have to be split, or a positive sector offset if
+ * @bio needs to be split.
*/
-struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
- unsigned *segs, struct bio_set *bs, unsigned max_bytes)
+int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
+ unsigned *segs, unsigned max_bytes)
{
struct bio_vec bv, bvprv, *bvprvp = NULL;
struct bvec_iter iter;
@@ -324,22 +339,17 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
}
*segs = nsegs;
- return NULL;
+ return 0;
split:
- if (bio->bi_opf & REQ_ATOMIC) {
- bio->bi_status = BLK_STS_INVAL;
- bio_endio(bio);
- return ERR_PTR(-EINVAL);
- }
+ if (bio->bi_opf & REQ_ATOMIC)
+ return -EINVAL;
+
/*
* We can't sanely support splitting for a REQ_NOWAIT bio. End it
* with EAGAIN if splitting is required and return an error pointer.
*/
- if (bio->bi_opf & REQ_NOWAIT) {
- bio->bi_status = BLK_STS_AGAIN;
- bio_endio(bio);
- return ERR_PTR(-EAGAIN);
- }
+ if (bio->bi_opf & REQ_NOWAIT)
+ return -EAGAIN;
*segs = nsegs;
@@ -356,58 +366,16 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
* big IO can be trival, disable iopoll when split needed.
*/
bio_clear_polled(bio);
- return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
+ return bytes >> SECTOR_SHIFT;
}
-EXPORT_SYMBOL_GPL(bio_split_rw);
+EXPORT_SYMBOL_GPL(bio_split_rw_at);
-/**
- * __bio_split_to_limits - split a bio to fit the queue limits
- * @bio: bio to be split
- * @lim: queue limits to split based on
- * @nr_segs: returns the number of segments in the returned bio
- *
- * Check if @bio needs splitting based on the queue limits, and if so split off
- * a bio fitting the limits from the beginning of @bio and return it. @bio is
- * shortened to the remainder and re-submitted.
- *
- * The split bio is allocated from @q->bio_split, which is provided by the
- * block layer.
- */
-struct bio *__bio_split_to_limits(struct bio *bio,
- const struct queue_limits *lim,
- unsigned int *nr_segs)
+struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
+ unsigned *nr_segs)
{
- struct bio_set *bs = &bio->bi_bdev->bd_disk->bio_split;
- struct bio *split;
-
- switch (bio_op(bio)) {
- case REQ_OP_DISCARD:
- case REQ_OP_SECURE_ERASE:
- split = bio_split_discard(bio, lim, nr_segs, bs);
- break;
- case REQ_OP_WRITE_ZEROES:
- split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
- break;
- default:
- split = bio_split_rw(bio, lim, nr_segs, bs,
- get_max_io_size(bio, lim) << SECTOR_SHIFT);
- if (IS_ERR(split))
- return NULL;
- break;
- }
-
- if (split) {
- /* there isn't chance to merge the split bio */
- split->bi_opf |= REQ_NOMERGE;
-
- blkcg_bio_issue_init(split);
- bio_chain(split, bio);
- trace_block_split(split, bio->bi_iter.bi_sector);
- WARN_ON_ONCE(bio_zone_write_plugging(bio));
- submit_bio_noacct(bio);
- return split;
- }
- return bio;
+ return bio_submit_split(bio,
+ bio_split_rw_at(bio, lim, nr_segs,
+ get_max_io_size(bio, lim) << SECTOR_SHIFT));
}
/**
@@ -426,9 +394,7 @@ struct bio *bio_split_to_limits(struct bio *bio)
const struct queue_limits *lim = &bdev_get_queue(bio->bi_bdev)->limits;
unsigned int nr_segs;
- if (bio_may_exceed_limits(bio, lim))
- return __bio_split_to_limits(bio, lim, &nr_segs);
- return bio;
+ return __bio_split_to_limits(bio, lim, &nr_segs);
}
EXPORT_SYMBOL(bio_split_to_limits);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b5536..36abbaefe38749 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2939,7 +2939,7 @@ void blk_mq_submit_bio(struct bio *bio)
struct blk_plug *plug = current->plug;
const int is_sync = op_is_sync(bio->bi_opf);
struct blk_mq_hw_ctx *hctx;
- unsigned int nr_segs = 1;
+ unsigned int nr_segs;
struct request *rq;
blk_status_t ret;
@@ -2981,11 +2981,10 @@ void blk_mq_submit_bio(struct bio *bio)
goto queue_exit;
}
- if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
- bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
- if (!bio)
- goto queue_exit;
- }
+ bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+ if (!bio)
+ goto queue_exit;
+
if (!bio_integrity_prep(bio))
goto queue_exit;
diff --git a/block/blk.h b/block/blk.h
index e180863f918b15..0d8cd64c126064 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -331,33 +331,58 @@ ssize_t part_timeout_show(struct device *, struct device_attribute *, char *);
ssize_t part_timeout_store(struct device *, struct device_attribute *,
const char *, size_t);
-static inline bool bio_may_exceed_limits(struct bio *bio,
- const struct queue_limits *lim)
+struct bio *bio_split_discard(struct bio *bio, const struct queue_limits *lim,
+ unsigned *nsegs);
+struct bio *bio_split_write_zeroes(struct bio *bio,
+ const struct queue_limits *lim, unsigned *nsegs);
+struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
+ unsigned *nr_segs);
+
+/*
+ * All drivers must accept single-segments bios that are smaller than PAGE_SIZE.
+ *
+ * This is a quick and dirty check that relies on the fact that bi_io_vec[0] is
+ * always valid if a bio has data. The check might lead to occasional false
+ * positives when bios are cloned, but compared to the performance impact of
+ * cloned bios themselves the loop below doesn't matter anyway.
+ */
+static inline bool bio_may_need_split(struct bio *bio,
+ const struct queue_limits *lim)
+{
+ return lim->chunk_sectors || bio->bi_vcnt != 1 ||
+ bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
+}
+
+/**
+ * __bio_split_to_limits - split a bio to fit the queue limits
+ * @bio: bio to be split
+ * @lim: queue limits to split based on
+ * @nr_segs: returns the number of segments in the returned bio
+ *
+ * Check if @bio needs splitting based on the queue limits, and if so split off
+ * a bio fitting the limits from the beginning of @bio and return it. @bio is
+ * shortened to the remainder and re-submitted.
+ *
+ * The split bio is allocated from @q->bio_split, which is provided by the
+ * block layer.
+ */
+static inline struct bio *__bio_split_to_limits(struct bio *bio,
+ const struct queue_limits *lim, unsigned int *nr_segs)
{
switch (bio_op(bio)) {
+ default:
+ if (bio_may_need_split(bio, lim))
+ return bio_split_rw(bio, lim, nr_segs);
+ *nr_segs = 1;
+ return bio;
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
+ return bio_split_discard(bio, lim, nr_segs);
case REQ_OP_WRITE_ZEROES:
- return true; /* non-trivial splitting decisions */
- default:
- break;
+ return bio_split_write_zeroes(bio, lim, nr_segs);
}
-
- /*
- * All drivers must accept single-segments bios that are <= PAGE_SIZE.
- * This is a quick and dirty check that relies on the fact that
- * bi_io_vec[0] is always valid if a bio has data. The check might
- * lead to occasional false negatives when bios are cloned, but compared
- * to the performance impact of cloned bios themselves the loop below
- * doesn't matter anyway.
- */
- return lim->chunk_sectors || bio->bi_vcnt != 1 ||
- bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
}
-struct bio *__bio_split_to_limits(struct bio *bio,
- const struct queue_limits *lim,
- unsigned int *nr_segs);
int ll_back_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index f04d931099601a..5ae7691849072b 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -73,20 +73,13 @@ struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
struct btrfs_bio *orig_bbio,
- u64 map_length, bool use_append)
+ u64 map_length)
{
struct btrfs_bio *bbio;
struct bio *bio;
- if (use_append) {
- unsigned int nr_segs;
-
- bio = bio_split_rw(&orig_bbio->bio, &fs_info->limits, &nr_segs,
- &btrfs_clone_bioset, map_length);
- } else {
- bio = bio_split(&orig_bbio->bio, map_length >> SECTOR_SHIFT,
- GFP_NOFS, &btrfs_clone_bioset);
- }
+ bio = bio_split(&orig_bbio->bio, map_length >> SECTOR_SHIFT, GFP_NOFS,
+ &btrfs_clone_bioset);
bbio = btrfs_bio(bio);
btrfs_bio_init(bbio, fs_info, NULL, orig_bbio);
bbio->inode = orig_bbio->inode;
@@ -664,6 +657,19 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
return true;
}
+static u64 btrfs_append_map_length(struct btrfs_bio *bbio, u64 map_length)
+{
+ unsigned int nr_segs;
+ int sector_offset;
+
+ map_length = min(map_length, bbio->fs_info->max_zone_append_size);
+ sector_offset = bio_split_rw_at(&bbio->bio, &bbio->fs_info->limits,
+ &nr_segs, map_length);
+ if (sector_offset)
+ return sector_offset << SECTOR_SHIFT;
+ return map_length;
+}
+
static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
{
struct btrfs_inode *inode = bbio->inode;
@@ -691,10 +697,10 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
map_length = min(map_length, length);
if (use_append)
- map_length = min(map_length, fs_info->max_zone_append_size);
+ map_length = btrfs_append_map_length(bbio, map_length);
if (map_length < length) {
- bbio = btrfs_split_bio(fs_info, bbio, map_length, use_append);
+ bbio = btrfs_split_bio(fs_info, bbio, map_length);
bio = &bbio->bio;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a46e2047bea4d2..faceadb040f9ac 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -324,8 +324,8 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
void bio_trim(struct bio *bio, sector_t offset, sector_t size);
extern struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs);
-struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
- unsigned *segs, struct bio_set *bs, unsigned max_bytes);
+int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
+ unsigned *segs, unsigned max_bytes);
/**
* bio_next_split - get next @sectors from a bio, splitting if necessary
--
2.43.0
* [PATCH 2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
@ 2024-08-26 17:37 ` Christoph Hellwig
2024-08-26 22:27 ` Damien Le Moal
2024-08-26 17:37 ` [PATCH 3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits Christoph Hellwig
` (4 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-26 17:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
queue_limits_max_zone_append_sectors doesn't change the lim argument,
so mark it as const.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e85ec73a07d575..ec3ea5d1f99dfe 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1187,7 +1187,8 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q)
return q->limits.max_segment_size;
}
-static inline unsigned int queue_limits_max_zone_append_sectors(struct queue_limits *l)
+static inline unsigned int
+queue_limits_max_zone_append_sectors(const struct queue_limits *l)
{
unsigned int max_sectors = min(l->chunk_sectors, l->max_hw_sectors);
--
2.43.0
* [PATCH 3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
2024-08-26 17:37 ` [PATCH 2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors Christoph Hellwig
@ 2024-08-26 17:37 ` Christoph Hellwig
2024-08-26 22:32 ` Damien Le Moal
2024-08-26 17:37 ` [PATCH 4/4] block: don't use bio_split_rw on misc operations Christoph Hellwig
` (3 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-26 17:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
Currently REQ_OP_ZONE_APPEND is handled by the bio_split_rw case in
__bio_split_to_limits. This is harmful because REQ_OP_ZONE_APPEND
bios do not adhere to the soft max_sectors limit, but instead use their
own capped version of max_hw_sectors, leading to incorrect splits that
later blow up in bio_split.
We still need the bio_split_rw logic to count nr_segs for blk-mq code,
so add a new wrapper that passes in the right limit, and turns any bio
that would need a split into an error as an additional debugging aid.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-merge.c | 20 ++++++++++++++++++++
block/blk.h | 4 ++++
2 files changed, 24 insertions(+)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index c7222c4685e060..56769c4bcd799b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -378,6 +378,26 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
get_max_io_size(bio, lim) << SECTOR_SHIFT));
}
+/*
+ * REQ_OP_ZONE_APPEND bios must never be split by the block layer.
+ *
+ * But we want the nr_segs calculation provided by bio_split_rw_at, and having
+ * a good sanity check that the submitter built the bio correctly is nice to
+ * have as well.
+ */
+struct bio *bio_split_zone_append(struct bio *bio,
+ const struct queue_limits *lim, unsigned *nr_segs)
+{
+ unsigned int max_sectors = queue_limits_max_zone_append_sectors(lim);
+ int split_sectors;
+
+ split_sectors = bio_split_rw_at(bio, lim, nr_segs,
+ max_sectors << SECTOR_SHIFT);
+ if (WARN_ON_ONCE(split_sectors > 0))
+ split_sectors = -EINVAL;
+ return bio_submit_split(bio, split_sectors);
+}
+
/**
* bio_split_to_limits - split a bio to fit the queue limits
* @bio: bio to be split
diff --git a/block/blk.h b/block/blk.h
index 0d8cd64c126064..61c2afa67daabb 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -337,6 +337,8 @@ struct bio *bio_split_write_zeroes(struct bio *bio,
const struct queue_limits *lim, unsigned *nsegs);
struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
unsigned *nr_segs);
+struct bio *bio_split_zone_append(struct bio *bio,
+ const struct queue_limits *lim, unsigned *nr_segs);
/*
* All drivers must accept single-segments bios that are smaller than PAGE_SIZE.
@@ -375,6 +377,8 @@ static inline struct bio *__bio_split_to_limits(struct bio *bio,
return bio_split_rw(bio, lim, nr_segs);
*nr_segs = 1;
return bio;
+ case REQ_OP_ZONE_APPEND:
+ return bio_split_zone_append(bio, lim, nr_segs);
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
return bio_split_discard(bio, lim, nr_segs);
--
2.43.0
* [PATCH 4/4] block: don't use bio_split_rw on misc operations
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
` (2 preceding siblings ...)
2024-08-26 17:37 ` [PATCH 3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits Christoph Hellwig
@ 2024-08-26 17:37 ` Christoph Hellwig
2024-08-26 22:34 ` Damien Le Moal
2024-08-27 11:23 ` fix unintentional splitting of zone append bios Hans Holmberg
` (2 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-26 17:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
bio_split_rw is designed to split read and write bios with a payload.
Currently it is called by __bio_split_to_limits for all operations not
explicitly listed, which works because bio_may_need_split explicitly checks
for bi_vcnt == 1 and thus skips the bypass if there is no payload, and the
bio_for_each_bvec loop will never execute its body if bi_size is 0.
But all this is hard to understand, fragile, and wastes cycles.
Switch __bio_split_to_limits to only call bio_split_rw for READ and
WRITE commands and don't attempt any kind of split for operations that do
not require splitting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/block/blk.h b/block/blk.h
index 61c2afa67daabb..32f4e9f630a3ac 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -372,7 +372,8 @@ static inline struct bio *__bio_split_to_limits(struct bio *bio,
const struct queue_limits *lim, unsigned int *nr_segs)
{
switch (bio_op(bio)) {
- default:
+ case REQ_OP_READ:
+ case REQ_OP_WRITE:
if (bio_may_need_split(bio, lim))
return bio_split_rw(bio, lim, nr_segs);
*nr_segs = 1;
@@ -384,6 +385,10 @@ static inline struct bio *__bio_split_to_limits(struct bio *bio,
return bio_split_discard(bio, lim, nr_segs);
case REQ_OP_WRITE_ZEROES:
return bio_split_write_zeroes(bio, lim, nr_segs);
+ default:
+ /* other operations can't be split */
+ *nr_segs = 0;
+ return bio;
}
}
--
2.43.0
* Re: [PATCH 1/4] block: rework bio splitting
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
@ 2024-08-26 20:29 ` David Sterba
2024-08-26 22:26 ` Damien Le Moal
2024-08-27 4:08 ` Damien Le Moal
2 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2024-08-26 20:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
On Mon, Aug 26, 2024 at 07:37:54PM +0200, Christoph Hellwig wrote:
> The current setup with bio_may_exceed_limit and __bio_split_to_limits
> is a bit of a mess.
>
> Change it so that __bio_split_to_limits does all the work and is just
> a variant of bio_split_to_limits that returns nr_segs. This is done
> by inlining it and instead have the various bio_split_* helpers directly
> submit the potentially split bios.
>
> To support btrfs, the rw version has a lower level helper split out
> that just returns the offset to split. This turns out to nicely clean
> up the btrfs flow as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-merge.c | 146 +++++++++++++++++---------------------------
> block/blk-mq.c | 11 ++--
> block/blk.h | 63 +++++++++++++------
For
> fs/btrfs/bio.c | 30 +++++----
Acked-by: David Sterba <dsterba@suse.com>
* Re: [PATCH 1/4] block: rework bio splitting
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
2024-08-26 20:29 ` David Sterba
@ 2024-08-26 22:26 ` Damien Le Moal
2024-08-26 22:37 ` Damien Le Moal
2024-08-27 3:20 ` Christoph Hellwig
2024-08-27 4:08 ` Damien Le Moal
2 siblings, 2 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-26 22:26 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 02:37, Christoph Hellwig wrote:
> The current setup with bio_may_exceed_limit and __bio_split_to_limits
> is a bit of a mess.
>
> Change it so that __bio_split_to_limits does all the work and is just
> a variant of bio_split_to_limits that returns nr_segs. This is done
> by inlining it and instead have the various bio_split_* helpers directly
> submit the potentially split bios.
>
> To support btrfs, the rw version has a lower level helper split out
> that just returns the offset to split. This turns out to nicely clean
> up the btrfs flow as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-merge.c | 146 +++++++++++++++++---------------------------
> block/blk-mq.c | 11 ++--
> block/blk.h | 63 +++++++++++++------
> fs/btrfs/bio.c | 30 +++++----
> include/linux/bio.h | 4 +-
> 5 files changed, 125 insertions(+), 129 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5281bcadc538..c7222c4685e060 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -105,9 +105,33 @@ static unsigned int bio_allowed_max_sectors(const struct queue_limits *lim)
> return round_down(UINT_MAX, lim->logical_block_size) >> SECTOR_SHIFT;
> }
>
> -static struct bio *bio_split_discard(struct bio *bio,
> - const struct queue_limits *lim,
> - unsigned *nsegs, struct bio_set *bs)
> +static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
Why not "unsigned int" for split_sectors ? That would avoid the need for the
first "if" of the function. Note that bio_split() also takes an int for the
sector count and also checks for < 0 count with a BUG_ON(). We can clean that up
too. BIOs sector count is unsigned int...
> +{
> + if (unlikely(split_sectors < 0)) {
> + bio->bi_status = errno_to_blk_status(split_sectors);
> + bio_endio(bio);
> + return NULL;
> + }
> +
> + if (split_sectors) {
May be the simple case should come first ? E.g.:
if (!split_sectors)
return bio;
But shouldn't this check be:
if (split_sectors >= bio_sectors(bio))
return bio;
?
[...]
> +/**
> + * __bio_split_to_limits - split a bio to fit the queue limits
> + * @bio: bio to be split
> + * @lim: queue limits to split based on
> + * @nr_segs: returns the number of segments in the returned bio
> + *
> + * Check if @bio needs splitting based on the queue limits, and if so split off
> + * a bio fitting the limits from the beginning of @bio and return it. @bio is
> + * shortened to the remainder and re-submitted.
> + *
> + * The split bio is allocated from @q->bio_split, which is provided by the
> + * block layer.
> + */
> +static inline struct bio *__bio_split_to_limits(struct bio *bio,
> + const struct queue_limits *lim, unsigned int *nr_segs)
> {
> switch (bio_op(bio)) {
> + default:
> + if (bio_may_need_split(bio, lim))
> + return bio_split_rw(bio, lim, nr_segs);
> + *nr_segs = 1;
> + return bio;
Wouldn't it be safer to move the if check inside bio_split_rw(), so that here we
can have:
return bio_split_rw(bio, lim, nr_segs);
And at the top of bio_split_rw() add:
if (!bio_may_need_split(bio, lim)) {
*nr_segs = 1;
return bio;
}
so that bio_split_rw() always does the right thing ?
> case REQ_OP_DISCARD:
> case REQ_OP_SECURE_ERASE:
> + return bio_split_discard(bio, lim, nr_segs);
> case REQ_OP_WRITE_ZEROES:
> - return true; /* non-trivial splitting decisions */
> - default:
> - break;
> + return bio_split_write_zeroes(bio, lim, nr_segs);
> }
> -
> - /*
> - * All drivers must accept single-segments bios that are <= PAGE_SIZE.
> - * This is a quick and dirty check that relies on the fact that
> - * bi_io_vec[0] is always valid if a bio has data. The check might
> - * lead to occasional false negatives when bios are cloned, but compared
> - * to the performance impact of cloned bios themselves the loop below
> - * doesn't matter anyway.
> - */
> - return lim->chunk_sectors || bio->bi_vcnt != 1 ||
> - bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
> }
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors
2024-08-26 17:37 ` [PATCH 2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors Christoph Hellwig
@ 2024-08-26 22:27 ` Damien Le Moal
0 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-26 22:27 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 02:37, Christoph Hellwig wrote:
> queue_limits_max_zone_append_sectors doesn't change the lim argument,
> so mark it as const.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits
2024-08-26 17:37 ` [PATCH 3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits Christoph Hellwig
@ 2024-08-26 22:32 ` Damien Le Moal
0 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-26 22:32 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 02:37, Christoph Hellwig wrote:
> Currently REQ_OP_ZONE_APPEND is handled by the bio_split_rw case in
> __bio_split_to_limits. This is harmful because REQ_OP_ZONE_APPEND
> bios do not adhere to the soft max_limits value but instead use their
> own capped version of max_hw_sectors, leading to incorrect splits that
> later blow up in bio_split.
>
> We still need the bio_split_rw logic to count nr_segs for blk-mq code,
> so add a new wrapper that passes in the right limit, and turns any bio
> that would need a split into an error as an additional debugging aid.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-merge.c | 20 ++++++++++++++++++++
> block/blk.h | 4 ++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index c7222c4685e060..56769c4bcd799b 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -378,6 +378,26 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
> get_max_io_size(bio, lim) << SECTOR_SHIFT));
> }
>
> +/*
> + * REQ_OP_ZONE_APPEND bios must never be split by the block layer.
> + *
> + * But we want the nr_segs calculation provided by bio_split_rw_at, and having
> + * a good sanity check that the submitter built the bio correctly is nice to
> + * have as well.
> + */
> +struct bio *bio_split_zone_append(struct bio *bio,
> + const struct queue_limits *lim, unsigned *nr_segs)
> +{
> + unsigned int max_sectors = queue_limits_max_zone_append_sectors(lim);
> + int split_sectors;
> +
> + split_sectors = bio_split_rw_at(bio, lim, nr_segs,
> + max_sectors << SECTOR_SHIFT);
> + if (WARN_ON_ONCE(split_sectors > 0))
> + split_sectors = -EINVAL;
ah! OK, I see why you used an int for the sector count now.
Hmmm... This is subtle and I am not a huge fan. But I see how that saves some
cleanup code in patch 1. So OK.
Feel free to add:
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> + return bio_submit_split(bio, split_sectors);
> +}
> +
> /**
> * bio_split_to_limits - split a bio to fit the queue limits
> * @bio: bio to be split
> diff --git a/block/blk.h b/block/blk.h
> index 0d8cd64c126064..61c2afa67daabb 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -337,6 +337,8 @@ struct bio *bio_split_write_zeroes(struct bio *bio,
> const struct queue_limits *lim, unsigned *nsegs);
> struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
> unsigned *nr_segs);
> +struct bio *bio_split_zone_append(struct bio *bio,
> + const struct queue_limits *lim, unsigned *nr_segs);
>
> /*
> * All drivers must accept single-segments bios that are smaller than PAGE_SIZE.
> @@ -375,6 +377,8 @@ static inline struct bio *__bio_split_to_limits(struct bio *bio,
> return bio_split_rw(bio, lim, nr_segs);
> *nr_segs = 1;
> return bio;
> + case REQ_OP_ZONE_APPEND:
> + return bio_split_zone_append(bio, lim, nr_segs);
> case REQ_OP_DISCARD:
> case REQ_OP_SECURE_ERASE:
> return bio_split_discard(bio, lim, nr_segs);
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 4/4] block: don't use bio_split_rw on misc operations
2024-08-26 17:37 ` [PATCH 4/4] block: don't use bio_split_rw on misc operations Christoph Hellwig
@ 2024-08-26 22:34 ` Damien Le Moal
0 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-26 22:34 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 02:37, Christoph Hellwig wrote:
> bio_split_rw is designed to split read and write bios with a payload.
> Currently it is called by __bio_split_to_limits for all operations not
> explicitly listed, which works because bio_may_need_split explicitly checks
> for bi_vcnt == 1 and thus skips the bypass if there is no payload, and the
> bio_for_each_bvec loop will never execute its body if bi_size is 0.
>
> But all this is hard to understand, fragile and wastes cycles pointlessly.
> Switch __bio_split_to_limits to only call bio_split_rw for READ and
> WRITE commands and don't attempt any kind of split for operations that do
> not require splitting.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 1/4] block: rework bio splitting
2024-08-26 22:26 ` Damien Le Moal
@ 2024-08-26 22:37 ` Damien Le Moal
2024-08-27 3:20 ` Christoph Hellwig
1 sibling, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-26 22:37 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 07:26, Damien Le Moal wrote:
> On 8/27/24 02:37, Christoph Hellwig wrote:
>> The current setup with bio_may_exceed_limit and __bio_split_to_limits
>> is a bit of a mess.
>>
>> Change it so that __bio_split_to_limits does all the work and is just
>> a variant of bio_split_to_limits that returns nr_segs. This is done
>> by inlining it and instead have the various bio_split_* helpers directly
>> submit the potentially split bios.
>>
>> To support btrfs, the rw version has a lower level helper split out
>> that just returns the offset to split. This turns out to nicely clean
>> up the btrfs flow as well.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>> block/blk-merge.c | 146 +++++++++++++++++---------------------------
>> block/blk-mq.c | 11 ++--
>> block/blk.h | 63 +++++++++++++------
>> fs/btrfs/bio.c | 30 +++++----
>> include/linux/bio.h | 4 +-
>> 5 files changed, 125 insertions(+), 129 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5281bcadc538..c7222c4685e060 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -105,9 +105,33 @@ static unsigned int bio_allowed_max_sectors(const struct queue_limits *lim)
>> return round_down(UINT_MAX, lim->logical_block_size) >> SECTOR_SHIFT;
>> }
>>
>> -static struct bio *bio_split_discard(struct bio *bio,
>> - const struct queue_limits *lim,
>> - unsigned *nsegs, struct bio_set *bs)
>> +static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
>
> Why not "unsigned int" for split_sectors ? That would avoid the need for the
> first "if" of the function. Note that bio_split() also takes an int for the
> sector count and also checks for < 0 count with a BUG_ON(). We can clean that up
> too. BIOs sector count is unsigned int...
>
>> +{
>> + if (unlikely(split_sectors < 0)) {
>> + bio->bi_status = errno_to_blk_status(split_sectors);
>> + bio_endio(bio);
>> + return NULL;
>> + }
>> +
>> + if (split_sectors) {
>
> May be the simple case should come first ? E.g.:
>
> if (!split_sectors)
> return bio;
>
> But shouldn't this check be:
>
> if (split_sectors >= bio_sectors(bio))
> return bio;
Please ignore this one. The passed sector count is a limit, which can be 0, so
checking "if (!split_sectors)" is correct.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/4] block: rework bio splitting
2024-08-26 22:26 ` Damien Le Moal
2024-08-26 22:37 ` Damien Le Moal
@ 2024-08-27 3:20 ` Christoph Hellwig
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-27 3:20 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Chris Mason, Josef Bacik,
David Sterba, Hans Holmberg, Shin'ichiro Kawasaki,
linux-block, linux-btrfs
On Tue, Aug 27, 2024 at 07:26:40AM +0900, Damien Le Moal wrote:
> > +static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
>
> Why not "unsigned int" for split_sectors ? That would avoid the need for the
> first "if" of the function. Note that bio_split() also takes an int for the
> sector count and also checks for < 0 count with a BUG_ON(). We can clean that up
> too. BIOs sector count is unsigned int...
Because we need to handle the case where we do not want to submit any
bio and error out as well, and a negative error code works nicely
for that. Note that a bio has an unsigned int byte count, so we
have the extra precision for the sign bit.
> But shouldn't this check be:
>
> if (split_sectors >= bio_sectors(bio))
> return bio;
The API is to return the split position or 0 if no split is needed.
That is needed because splits are usually done for limits significantly
smaller than what the bio could take.
> > +static inline struct bio *__bio_split_to_limits(struct bio *bio,
> > + const struct queue_limits *lim, unsigned int *nr_segs)
> > {
> > switch (bio_op(bio)) {
> > + default:
> > + if (bio_may_need_split(bio, lim))
> > + return bio_split_rw(bio, lim, nr_segs);
> > + *nr_segs = 1;
> > + return bio;
>
> Wouldn't it be safer to move the if check inside bio_split_rw(), so that here we
> can have:
The !bio_may_need_split case is the existing fast path optimization
without a function call that I wanted to keep because it made a
difference for some workloads that Jens was testing.
> And at the top of bio_split_rw() add:
>
> if (!bio_may_need_split(bio, lim)) {
> *nr_segs = 1;
> return bio;
> }
>
> so that bio_split_rw() always does the right thing ?
Note that bio_split_rw still always does the right thing, it'll just
do it a little slower than this fast path optimization.
* Re: [PATCH 1/4] block: rework bio splitting
2024-08-26 17:37 ` [PATCH 1/4] block: rework bio splitting Christoph Hellwig
2024-08-26 20:29 ` David Sterba
2024-08-26 22:26 ` Damien Le Moal
@ 2024-08-27 4:08 ` Damien Le Moal
2 siblings, 0 replies; 17+ messages in thread
From: Damien Le Moal @ 2024-08-27 4:08 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On 8/27/24 02:37, Christoph Hellwig wrote:
> The current setup with bio_may_exceed_limit and __bio_split_to_limits
> is a bit of a mess.
>
> Change it so that __bio_split_to_limits does all the work and is just
> a variant of bio_split_to_limits that returns nr_segs. This is done
> by inlining it and instead have the various bio_split_* helpers directly
> submit the potentially split bios.
>
> To support btrfs, the rw version has a lower level helper split out
> that just returns the offset to split. This turns out to nicely clean
> up the btrfs flow as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: fix unintentional splitting of zone append bios
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
` (3 preceding siblings ...)
2024-08-26 17:37 ` [PATCH 4/4] block: don't use bio_split_rw on misc operations Christoph Hellwig
@ 2024-08-27 11:23 ` Hans Holmberg
2024-08-27 11:43 ` Niklas Cassel
2024-08-29 10:33 ` Jens Axboe
6 siblings, 0 replies; 17+ messages in thread
From: Hans Holmberg @ 2024-08-27 11:23 UTC (permalink / raw)
To: hch, Jens Axboe
Cc: Chris Mason, Josef Bacik, David Sterba, Damien Le Moal,
Shinichiro Kawasaki, linux-block@vger.kernel.org,
linux-btrfs@vger.kernel.org
On 26/08/2024 19:38, Christoph Hellwig wrote:
> Hi Jens,
>
> this series fixes code that incorrectly splits zone append bios due
> to checking for a wrong max_sectors limit. A big part of the cause is
> that the bio splitting code is a bit of a mess and full of landmines, so
> I fixed this as well.
>
> To hit this bug a submitter needs to submit a bio larger than max_sectors
> of device, but smaller than max_hw_sectors. So far the only thing that
> reproduces it is my not yet upstream zoned XFS code, but in theory this
> could affect every submitter of zone append bios.
>
> Diffstat:
> block/blk-merge.c | 162 ++++++++++++++++++++++---------------------------
> block/blk-mq.c | 11 +--
> block/blk.h | 70 +++++++++++++++------
> fs/btrfs/bio.c | 30 +++++----
> include/linux/bio.h | 4 -
> include/linux/blkdev.h | 3
> 6 files changed, 153 insertions(+), 127 deletions(-)
>
Looks good to me, and with these patches applied the
issues mentioned above go away in my zoned xfstests setup.
For this series,
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
* Re: fix unintentional splitting of zone append bios
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
` (4 preceding siblings ...)
2024-08-27 11:23 ` fix unintentional splitting of zone append bios Hans Holmberg
@ 2024-08-27 11:43 ` Niklas Cassel
2024-08-27 12:18 ` Christoph Hellwig
2024-08-29 10:33 ` Jens Axboe
6 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2024-08-27 11:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
On Mon, Aug 26, 2024 at 07:37:53PM +0200, Christoph Hellwig wrote:
> Hi Jens,
>
> this series fixes code that incorrectly splits zone append bios due
> to checking for a wrong max_sectors limit. A big part of the cause is
> that the bio splitting code is a bit of a mess and full of landmines, so
> I fixed this as well.
Hello Christoph,
You say that this series fixes code that could affect every submitter of
zone append bios, so I am a bit surprised not to see any Fixes tag(s)
in any of the patches in this series. Was that intentional?
Even for a theoretical fix, doesn't this sound serious enough to warrant
zone append splits to be fixed in stable kernels as well?
When was this bug introduced? Or has it been broken since the support for
zone append was first added?
Kind regards,
Niklas
* Re: fix unintentional splitting of zone append bios
2024-08-27 11:43 ` Niklas Cassel
@ 2024-08-27 12:18 ` Christoph Hellwig
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-27 12:18 UTC (permalink / raw)
To: Niklas Cassel
Cc: Christoph Hellwig, Jens Axboe, Chris Mason, Josef Bacik,
David Sterba, Hans Holmberg, Damien Le Moal,
Shin'ichiro Kawasaki, linux-block, linux-btrfs
On Tue, Aug 27, 2024 at 01:43:44PM +0200, Niklas Cassel wrote:
> You say that this series fixes code that could affect every submitter of
> zone append bios, thus I am a bit surprised to not see any Fixes-tag(s)
> in any of the patches in this series. Was that intentional?
I'm honestly not sure which exact commit it fixes.
> When was this bug introduced? Or has it been broken since the support for
> zone append was first added?
Possibly.
> Even for a theoretical fix, doesn't this sound serious enough to warrant
> zone append splits to be fixed in stable kernels as well?
I think pretty much every user could hit it, at least when manually
decreasing max_sectors.
* Re: fix unintentional splitting of zone append bios
2024-08-26 17:37 fix unintentional splitting of zone append bios Christoph Hellwig
` (5 preceding siblings ...)
2024-08-27 11:43 ` Niklas Cassel
@ 2024-08-29 10:33 ` Jens Axboe
6 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2024-08-29 10:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chris Mason, Josef Bacik, David Sterba, Hans Holmberg,
Damien Le Moal, Shin'ichiro Kawasaki, linux-block,
linux-btrfs
On Mon, 26 Aug 2024 19:37:53 +0200, Christoph Hellwig wrote:
> this series fixes code that incorrectly splits zone append bios due
> to checking for a wrong max_sectors limit. A big part of the cause is
> that the bio splitting code is a bit of a mess and full of landmines, so
> I fixed this as well.
>
> To hit this bug a submitter needs to submit a bio larger than max_sectors
> of device, but smaller than max_hw_sectors. So far the only thing that
> reproduces it is my not yet upstream zoned XFS code, but in theory this
> could affect every submitter of zone append bios.
>
> [...]
Applied, thanks!
[1/4] block: rework bio splitting
commit: b35243a447b9fe6457fa8e1352152b818436ba5a
[2/4] block: constify the lim argument to queue_limits_max_zone_append_sectors
commit: 379b122a3ec8033aa43cb70e8ecb6fb7f98aa68f
[3/4] block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limits
commit: 1e8a7f6af926e266cc1d7ac49b56bd064057d625
[4/4] block: don't use bio_split_rw on misc operations
commit: 1251580983f267e2e6b6505609a835119b68c513
Best regards,
--
Jens Axboe