From: Damien Le Moal <dlemoal@kernel.org>
To: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Keith Busch <keith.busch@wdc.com>, Christoph Hellwig <hch@lst.de>,
dm-devel@lists.linux.dev, Mike Snitzer <snitzer@kernel.org>,
Mikulas Patocka <mpatocka@redhat.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
linux-scsi@vger.kernel.org, linux-xfs@vger.kernel.org,
Carlos Maiolino <cem@kernel.org>,
linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Subject: [PATCH v2 01/15] block: handle zone management operations completions
Date: Mon, 3 Nov 2025 22:31:09 +0900 [thread overview]
Message-ID: <20251103133123.645038-2-dlemoal@kernel.org> (raw)
In-Reply-To: <20251103133123.645038-1-dlemoal@kernel.org>
The functions blk_zone_wplug_handle_reset_or_finish() and
blk_zone_wplug_handle_reset_all() both modify the zone write pointer
offset of zone write plugs that are the target of a reset, reset all or
finish zone management operation. However, these functions do this
modification before the BIO is executed. So if the zone operation fails,
the modified zone write pointer offsets become invalid.
Avoid this by modifying the zone write pointer offset of a zone write
plug that is the target of a zone management operation when the
operation completes. To do so, modify blk_zone_bio_endio() to call the
new function blk_zone_mgmt_bio_endio() which in turn calls the functions
blk_zone_reset_all_bio_endio(), blk_zone_reset_bio_endio() or
blk_zone_finish_bio_endio() depending on the operation of the completed
BIO, to modify a zone write plug write pointer offset accordingly.
These functions are called only if the BIO execution was successful.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/blk-zoned.c | 139 ++++++++++++++++++++++++++++++----------------
block/blk.h | 14 +++++
2 files changed, 104 insertions(+), 49 deletions(-)
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 5e2a5788dc3b..1621e8f78338 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -71,6 +71,11 @@ struct blk_zone_wplug {
struct gendisk *disk;
};
+static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk)
+{
+ return 1U << disk->zone_wplugs_hash_bits;
+}
+
/*
* Zone write plug flags bits:
* - BLK_ZONE_WPLUG_PLUGGED: Indicates that the zone write plug is plugged,
@@ -698,71 +703,91 @@ static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector)
disk_report_zones_cb, &args);
}
-static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio,
- unsigned int wp_offset)
+static void blk_zone_reset_bio_endio(struct bio *bio)
{
struct gendisk *disk = bio->bi_bdev->bd_disk;
- sector_t sector = bio->bi_iter.bi_sector;
struct blk_zone_wplug *zwplug;
- unsigned long flags;
-
- /* Conventional zones cannot be reset nor finished. */
- if (!bdev_zone_is_seq(bio->bi_bdev, sector)) {
- bio_io_error(bio);
- return true;
- }
-
- /*
- * No-wait reset or finish BIOs do not make much sense as the callers
- * issue these as blocking operations in most cases. To avoid issues
- * the BIO execution potentially failing with BLK_STS_AGAIN, warn about
- * REQ_NOWAIT being set and ignore that flag.
- */
- if (WARN_ON_ONCE(bio->bi_opf & REQ_NOWAIT))
- bio->bi_opf &= ~REQ_NOWAIT;
/*
- * If we have a zone write plug, set its write pointer offset to 0
- * (reset case) or to the zone size (finish case). This will abort all
- * BIOs plugged for the target zone. It is fine as resetting or
- * finishing zones while writes are still in-flight will result in the
+ * If we have a zone write plug, set its write pointer offset to 0.
+ * This will abort all BIOs plugged for the target zone. It is fine as
+ * resetting zones while writes are still in-flight will result in the
* writes failing anyway.
*/
- zwplug = disk_get_zone_wplug(disk, sector);
+ zwplug = disk_get_zone_wplug(disk, bio->bi_iter.bi_sector);
if (zwplug) {
+ unsigned long flags;
+
spin_lock_irqsave(&zwplug->lock, flags);
- disk_zone_wplug_set_wp_offset(disk, zwplug, wp_offset);
+ disk_zone_wplug_set_wp_offset(disk, zwplug, 0);
spin_unlock_irqrestore(&zwplug->lock, flags);
disk_put_zone_wplug(zwplug);
}
-
- return false;
}
-static bool blk_zone_wplug_handle_reset_all(struct bio *bio)
+static void blk_zone_reset_all_bio_endio(struct bio *bio)
{
struct gendisk *disk = bio->bi_bdev->bd_disk;
struct blk_zone_wplug *zwplug;
unsigned long flags;
- sector_t sector;
+ unsigned int i;
- /*
- * Set the write pointer offset of all zone write plugs to 0. This will
- * abort all plugged BIOs. It is fine as resetting zones while writes
- * are still in-flight will result in the writes failing anyway.
- */
- for (sector = 0; sector < get_capacity(disk);
- sector += disk->queue->limits.chunk_sectors) {
- zwplug = disk_get_zone_wplug(disk, sector);
- if (zwplug) {
+ /* Update the condition of all zone write plugs. */
+ rcu_read_lock();
+ for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++) {
+ hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[i],
+ node) {
spin_lock_irqsave(&zwplug->lock, flags);
disk_zone_wplug_set_wp_offset(disk, zwplug, 0);
spin_unlock_irqrestore(&zwplug->lock, flags);
- disk_put_zone_wplug(zwplug);
}
}
+ rcu_read_unlock();
+}
- return false;
+static void blk_zone_finish_bio_endio(struct bio *bio)
+{
+ struct block_device *bdev = bio->bi_bdev;
+ struct gendisk *disk = bdev->bd_disk;
+ struct blk_zone_wplug *zwplug;
+
+ /*
+ * If we have a zone write plug, set its write pointer offset to the
+ * zone size. This will abort all BIOs plugged for the target zone. It
+ * is fine as resetting zones while writes are still in-flight will
+ * result in the writes failing anyway.
+ */
+ zwplug = disk_get_zone_wplug(disk, bio->bi_iter.bi_sector);
+ if (zwplug) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&zwplug->lock, flags);
+ disk_zone_wplug_set_wp_offset(disk, zwplug,
+ bdev_zone_sectors(bdev));
+ spin_unlock_irqrestore(&zwplug->lock, flags);
+ disk_put_zone_wplug(zwplug);
+ }
+}
+
+void blk_zone_mgmt_bio_endio(struct bio *bio)
+{
+ /* If the BIO failed, we have nothing to do. */
+ if (bio->bi_status != BLK_STS_OK)
+ return;
+
+ switch (bio_op(bio)) {
+ case REQ_OP_ZONE_RESET:
+ blk_zone_reset_bio_endio(bio);
+ return;
+ case REQ_OP_ZONE_RESET_ALL:
+ blk_zone_reset_all_bio_endio(bio);
+ return;
+ case REQ_OP_ZONE_FINISH:
+ blk_zone_finish_bio_endio(bio);
+ return;
+ default:
+ return;
+ }
}
static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk,
@@ -1106,6 +1131,30 @@ static void blk_zone_wplug_handle_native_zone_append(struct bio *bio)
disk_put_zone_wplug(zwplug);
}
+static bool blk_zone_wplug_handle_zone_mgmt(struct bio *bio)
+{
+ if (bio_op(bio) != REQ_OP_ZONE_RESET_ALL &&
+ !bdev_zone_is_seq(bio->bi_bdev, bio->bi_iter.bi_sector)) {
+ /*
+ * Zone reset and zone finish operations do not apply to
+ * conventional zones.
+ */
+ bio_io_error(bio);
+ return true;
+ }
+
+ /*
+ * No-wait zone management BIOs do not make much sense as the callers
+ * issue these as blocking operations in most cases. To avoid issues
+ * with the BIO execution potentially failing with BLK_STS_AGAIN, warn
+ * about REQ_NOWAIT being set and ignore that flag.
+ */
+ if (WARN_ON_ONCE(bio->bi_opf & REQ_NOWAIT))
+ bio->bi_opf &= ~REQ_NOWAIT;
+
+ return false;
+}
+
/**
* blk_zone_plug_bio - Handle a zone write BIO with zone write plugging
* @bio: The BIO being submitted
@@ -1153,12 +1202,9 @@ bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
case REQ_OP_WRITE_ZEROES:
return blk_zone_wplug_handle_write(bio, nr_segs);
case REQ_OP_ZONE_RESET:
- return blk_zone_wplug_handle_reset_or_finish(bio, 0);
case REQ_OP_ZONE_FINISH:
- return blk_zone_wplug_handle_reset_or_finish(bio,
- bdev_zone_sectors(bdev));
case REQ_OP_ZONE_RESET_ALL:
- return blk_zone_wplug_handle_reset_all(bio);
+ return blk_zone_wplug_handle_zone_mgmt(bio);
default:
return false;
}
@@ -1332,11 +1378,6 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
disk_put_zone_wplug(zwplug);
}
-static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk)
-{
- return 1U << disk->zone_wplugs_hash_bits;
-}
-
void disk_init_zone_resources(struct gendisk *disk)
{
spin_lock_init(&disk->zone_wplugs_lock);
diff --git a/block/blk.h b/block/blk.h
index 32a10024efba..4d809588b771 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -489,9 +489,23 @@ static inline bool blk_req_bio_is_zone_append(struct request *rq,
void blk_zone_write_plug_bio_merged(struct bio *bio);
void blk_zone_write_plug_init_request(struct request *rq);
void blk_zone_append_update_request_bio(struct request *rq, struct bio *bio);
+void blk_zone_mgmt_bio_endio(struct bio *bio);
void blk_zone_write_plug_bio_endio(struct bio *bio);
static inline void blk_zone_bio_endio(struct bio *bio)
{
+ /*
+ * Zone management BIOs may impact zone write plugs (e.g. a zone reset
+ * changes a zone write plug zone write pointer offset), but these
+ * operation do not go through zone write plugging as they may operate
+ * on zones that do not have a zone write
+ * plug. blk_zone_mgmt_bio_endio() handles the potential changes to zone
+ * write plugs that are present.
+ */
+ if (op_is_zone_mgmt(bio_op(bio))) {
+ blk_zone_mgmt_bio_endio(bio);
+ return;
+ }
+
/*
* For write BIOs to zoned devices, signal the completion of the BIO so
* that the next write BIO can be submitted by zone write plugging.
--
2.51.0
next prev parent reply other threads:[~2025-11-03 13:35 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-03 13:31 [PATCH v2 00/15] Introduce cached report zones Damien Le Moal
2025-11-03 13:31 ` Damien Le Moal [this message]
2025-11-03 13:53 ` [PATCH v2 01/15] block: handle zone management operations completions Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 02/15] block: freeze queue when updating zone resources Damien Le Moal
2025-11-03 13:46 ` Christoph Hellwig
2025-11-03 13:55 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 03/15] block: cleanup blkdev_report_zones() Damien Le Moal
2025-11-03 13:56 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 04/15] block: introduce disk_report_zone() Damien Le Moal
2025-11-03 14:01 ` Johannes Thumshirn
2025-11-04 0:36 ` kernel test robot
2025-11-03 13:31 ` [PATCH v2 05/15] block: reorganize struct blk_zone_wplug Damien Le Moal
2025-11-03 14:01 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 06/15] block: use zone condition to determine conventional zones Damien Le Moal
2025-11-03 14:52 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 07/15] block: track zone conditions Damien Le Moal
2025-11-03 15:00 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 08/15] block: refactor blkdev_report_zones() code Damien Le Moal
2025-11-03 13:47 ` Christoph Hellwig
2025-11-03 15:01 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 09/15] block: introduce blkdev_get_zone_info() Damien Le Moal
2025-11-03 15:12 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 10/15] block: introduce blkdev_report_zones_cached() Damien Le Moal
2025-11-03 13:31 ` [PATCH v2 11/15] block: introduce BLKREPORTZONESV2 ioctl Damien Le Moal
2025-11-03 15:17 ` Johannes Thumshirn
2025-11-03 22:12 ` Bart Van Assche
2025-11-03 23:01 ` Damien Le Moal
2025-11-04 0:15 ` Damien Le Moal
2025-11-04 1:01 ` Bart Van Assche
2025-11-04 1:20 ` Damien Le Moal
2025-11-04 7:23 ` Johannes Thumshirn
2025-11-04 7:38 ` Damien Le Moal
2025-11-03 13:31 ` [PATCH v2 12/15] block: improve zone_wplugs debugfs attribute output Damien Le Moal
2025-11-03 13:47 ` Christoph Hellwig
2025-11-03 15:18 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 13/15] block: add zone write plug condition to debugfs zone_wplugs Damien Le Moal
2025-11-03 15:23 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 14/15] btrfs: use blkdev_report_zones_cached() Damien Le Moal
2025-11-03 15:26 ` Johannes Thumshirn
2025-11-03 13:31 ` [PATCH v2 15/15] xfs: " Damien Le Moal
2025-11-03 15:27 ` Johannes Thumshirn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251103133123.645038-2-dlemoal@kernel.org \
--to=dlemoal@kernel.org \
--cc=axboe@kernel.dk \
--cc=cem@kernel.org \
--cc=dm-devel@lists.linux.dev \
--cc=dsterba@suse.com \
--cc=hch@lst.de \
--cc=keith.busch@wdc.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-scsi@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mpatocka@redhat.com \
--cc=snitzer@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).