All of lore.kernel.org
 help / color / mirror / Atom feed
* add a flag to allow underordered zoned writes
@ 2025-11-18  7:03 Christoph Hellwig
  2025-11-18  7:03 ` [PATCH 1/3] block: add a bio_list_add_sorted helper Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-11-18  7:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

Hi all,

currently writes to sequential write zones either need to use Zone
Append, or have a local queue list to order writes before the
submission, just for the block layer than having another ordered list
in the zwplug to issue them one at a time.

This series allows to leverage the zwplug list to create ordering when
the submitter guarantees that it will fill any resulting gap, i.e. unless
an I/O error happens there will be no missing I/Os.  Users of zoned
devices have to do that anyway as they can't leave gaps, but we can't
guaranteed that for user I/O.  Kernel I/O on the other is trusted and can
set this flag.

This series adds the support and converts dm-kcopyd as the most trivial
user.  File system conversion will be a bit more complex as the
call chains are bit more complex and need a full audit.

Diffstat:
 block/blk-mq-debugfs.c    |    1 
 block/blk-zoned.c         |   61 ++++++++++++++++++++++++++++++++++++----------
 drivers/md/dm-kcopyd.c    |   48 ++++--------------------------------
 include/linux/bio.h       |   21 +++++++++++++++
 include/linux/blk_types.h |    2 +
 5 files changed, 79 insertions(+), 54 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] block: add a bio_list_add_sorted helper
  2025-11-18  7:03 add a flag to allow underordered zoned writes Christoph Hellwig
@ 2025-11-18  7:03 ` Christoph Hellwig
  2025-11-19  3:09   ` Damien Le Moal
  2025-11-18  7:03 ` [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-11-18  7:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

Add a helper to insert a bio into a bio_list so that the list remains
sorted by bi_sector.

While it is on the upper hand of the size for an inline function, it is a
trade off for having it with the common helpers, but not bloating the
kernel unless the so far only user is enabled.  If it gets more widely
used moving it out of line should be considered.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/bio.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index ad2d57908c1c..e5f92bc88eea 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -565,6 +565,27 @@ static inline void bio_list_add_head(struct bio_list *bl, struct bio *bio)
 		bl->tail = bio;
 }
 
+static inline void bio_list_add_sorted(struct bio_list *bl, struct bio *bio)
+{
+	if (bio_list_empty(bl) ||
+	    bio->bi_iter.bi_sector > bl->tail->bi_iter.bi_sector) {
+		bio_list_add(bl, bio);
+	} else if (bio->bi_iter.bi_sector < bl->head->bi_iter.bi_sector) {
+		bio_list_add_head(bl, bio);
+	} else {
+		struct bio *n = bl->head, *next;
+
+		while ((next = n->bi_next) != NULL) {
+			if (bio->bi_iter.bi_sector < next->bi_iter.bi_sector) {
+				bio->bi_next = next;
+				n->bi_next = bio;
+				break;
+			}
+			n = next;
+		}
+	}
+}
+
 static inline void bio_list_merge(struct bio_list *bl, struct bio_list *bl2)
 {
 	if (!bl2->head)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag
  2025-11-18  7:03 add a flag to allow underordered zoned writes Christoph Hellwig
  2025-11-18  7:03 ` [PATCH 1/3] block: add a bio_list_add_sorted helper Christoph Hellwig
@ 2025-11-18  7:03 ` Christoph Hellwig
  2025-11-19  3:17   ` Damien Le Moal
  2025-11-18  7:03 ` [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED Christoph Hellwig
  2025-11-19  6:10 ` add a flag to allow underordered zoned writes Bart Van Assche
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-11-18  7:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

This instructs the zone write plugging code to queue up any bio not
at the write pointer, and includes an implicit guarantee that the caller
will fill any sector gaps.  I.e., this can be used by file systems and
stacking block drivers, but not for untrusted user block device writes.

Because all writes through the write plug cancel all outstanding writes
for the plug there is no risk that queue up writes for higher sectors are
stuck in the zone write plug even on error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq-debugfs.c    |  1 +
 block/blk-zoned.c         | 61 +++++++++++++++++++++++++++++++--------
 include/linux/blk_types.h |  2 ++
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4896525b1c05..3b0e3ebf35b2 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -222,6 +222,7 @@ static const char *const cmd_flag_name[] = {
 	CMD_FLAG_NAME(FS_PRIVATE),
 	CMD_FLAG_NAME(ATOMIC),
 	CMD_FLAG_NAME(NOUNMAP),
+	CMD_FLAG_NAME(ZWPLUG_UNORDERED),
 };
 #undef CMD_FLAG_NAME
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index dcc295721c2c..5bed52d28ed8 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -607,8 +607,12 @@ static inline bool disk_should_remove_zone_wplug(struct gendisk *disk,
 	if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)
 		return false;
 
-	/* If the zone write plug is still plugged, it cannot be removed. */
-	if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)
+	/*
+	 * The zone write plug can't be removed if it is still plugged or there
+	 * are bios queued up behind the write pointer.
+	 */
+	if ((zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) ||
+	    !bio_list_empty(&zwplug->bio_list))
 		return false;
 
 	/*
@@ -1188,6 +1192,15 @@ void blk_zone_mgmt_bio_endio(struct bio *bio)
 	}
 }
 
+static bool blk_zwplug_has_bio_at_write_pointer(struct blk_zone_wplug *zwplug)
+{
+	struct bio *bio = bio_list_peek(&zwplug->bio_list);
+
+	return bio &&
+		(bio_op(bio) == REQ_OP_ZONE_APPEND ||
+		 bio_offset_from_zone_start(bio) == zwplug->wp_offset);
+}
+
 static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk,
 					      struct blk_zone_wplug *zwplug)
 {
@@ -1231,10 +1244,15 @@ static inline void disk_zone_wplug_add_bio(struct gendisk *disk,
 	/*
 	 * We always receive BIOs after they are split and ready to be issued.
 	 * The block layer passes the parts of a split BIO in order, and the
-	 * user must also issue write sequentially. So simply add the new BIO
-	 * at the tail of the list to preserve the sequential write order.
+	 * user must also issue write sequentially unless REQ_ZWPLUG_UNORDERED
+	 * is set. So simply add the new BIO at the tail of the list to preserve
+	 * the sequential write order.
 	 */
-	bio_list_add(&zwplug->bio_list, bio);
+	if (bio->bi_opf & REQ_ZWPLUG_UNORDERED)
+		bio_list_add_sorted(&zwplug->bio_list, bio);
+	else
+		bio_list_add(&zwplug->bio_list, bio);
+
 	trace_disk_zone_wplug_add_bio(zwplug->disk->queue, zwplug->zone_no,
 				      bio->bi_iter.bi_sector, bio_sectors(bio));
 }
@@ -1403,7 +1421,8 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug,
 static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
 {
 	struct gendisk *disk = bio->bi_bdev->bd_disk;
-	sector_t sector = bio->bi_iter.bi_sector;
+	sector_t zone_offset = bio_offset_from_zone_start(bio);
+	sector_t zone_start = bio->bi_iter.bi_sector - zone_offset;
 	struct blk_zone_wplug *zwplug;
 	gfp_t gfp_mask = GFP_NOIO;
 	unsigned long flags;
@@ -1422,7 +1441,7 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
 	}
 
 	/* Conventional zones do not need write plugging. */
-	if (!bdev_zone_is_seq(bio->bi_bdev, sector)) {
+	if (!bdev_zone_is_seq(bio->bi_bdev, zone_start)) {
 		/* Zone append to conventional zones is not allowed. */
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 			bio_io_error(bio);
@@ -1434,7 +1453,8 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
 	if (bio->bi_opf & REQ_NOWAIT)
 		gfp_mask = GFP_NOWAIT;
 
-	zwplug = disk_get_and_lock_zone_wplug(disk, sector, gfp_mask, &flags);
+	zwplug = disk_get_and_lock_zone_wplug(disk, zone_start, gfp_mask,
+			&flags);
 	if (!zwplug) {
 		if (bio->bi_opf & REQ_NOWAIT)
 			bio_wouldblock_error(bio);
@@ -1459,6 +1479,15 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
 	if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)
 		goto queue_bio;
 
+	/*
+	 * For REQ_ZWPLUG_UNORDERED, the caller guarantees it will submit all
+	 * bios that fill unordered gaps, so just queue up the bio if it is
+	 * past the write pointer.
+	 */
+	if ((bio->bi_opf & REQ_ZWPLUG_UNORDERED) &&
+	    zone_offset > zwplug->wp_offset)
+		goto queue_bio;
+
 	if (!blk_zone_wplug_prepare_bio(zwplug, bio)) {
 		spin_unlock_irqrestore(&zwplug->lock, flags);
 		bio_io_error(bio);
@@ -1475,7 +1504,15 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
 queue_bio:
 	disk_zone_wplug_add_bio(disk, zwplug, bio, nr_segs);
 
-	if (!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)) {
+	/*
+	 * If this bio is at the write pointer, schedule the work now.  Normally
+	 * bios at the write pointer are immediately submitted unless there is
+	 * another write to the zone in-flight, but for NOWAIT I/O we can end up
+	 * here even for bios at the write pointer, and for REQ_ZWPLUG_UNORDERED
+	 * we might have to queue up bios even when no I/O is in-flight.
+	 */
+	if (!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) &&
+	    blk_zwplug_has_bio_at_write_pointer(zwplug)) {
 		zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
 		disk_zone_wplug_schedule_bio_work(disk, zwplug);
 	}
@@ -1619,7 +1656,7 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk,
 	spin_lock_irqsave(&zwplug->lock, flags);
 
 	/* Schedule submission of the next plugged BIO if we have one. */
-	if (!bio_list_empty(&zwplug->bio_list)) {
+	if (blk_zwplug_has_bio_at_write_pointer(zwplug)) {
 		disk_zone_wplug_schedule_bio_work(disk, zwplug);
 		spin_unlock_irqrestore(&zwplug->lock, flags);
 		return;
@@ -1738,13 +1775,13 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
 	 */
 again:
 	spin_lock_irqsave(&zwplug->lock, flags);
-	bio = bio_list_pop(&zwplug->bio_list);
-	if (!bio) {
+	if (!blk_zwplug_has_bio_at_write_pointer(zwplug)) {
 		zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
 		spin_unlock_irqrestore(&zwplug->lock, flags);
 		goto put_zwplug;
 	}
 
+	bio = bio_list_pop(&zwplug->bio_list);
 	trace_blk_zone_wplug_bio(zwplug->disk->queue, zwplug->zone_no,
 				 bio->bi_iter.bi_sector, bio_sectors(bio));
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d884cc1256ec..e41ab9404a74 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -390,6 +390,7 @@ enum req_flag_bits {
 	__REQ_POLLED,		/* caller polls for completion using bio_poll */
 	__REQ_ALLOC_CACHE,	/* allocate IO from cache if available */
 	__REQ_SWAP,		/* swap I/O */
+	__REQ_ZWPLUG_UNORDERED,	/* might not be at write pointer */
 	__REQ_DRV,		/* for driver use */
 	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
 	__REQ_ATOMIC,		/* for atomic write operations */
@@ -422,6 +423,7 @@ enum req_flag_bits {
 #define REQ_POLLED	(__force blk_opf_t)(1ULL << __REQ_POLLED)
 #define REQ_ALLOC_CACHE	(__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE)
 #define REQ_SWAP	(__force blk_opf_t)(1ULL << __REQ_SWAP)
+#define REQ_ZWPLUG_UNORDERED (__force blk_opf_t)(1ULL << __REQ_ZWPLUG_UNORDERED)
 #define REQ_DRV		(__force blk_opf_t)(1ULL << __REQ_DRV)
 #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
 #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED
  2025-11-18  7:03 add a flag to allow underordered zoned writes Christoph Hellwig
  2025-11-18  7:03 ` [PATCH 1/3] block: add a bio_list_add_sorted helper Christoph Hellwig
  2025-11-18  7:03 ` [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag Christoph Hellwig
@ 2025-11-18  7:03 ` Christoph Hellwig
  2025-11-19  3:18   ` Damien Le Moal
  2025-11-19  6:10 ` add a flag to allow underordered zoned writes Bart Van Assche
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-11-18  7:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

Use the new REQ_ZWPLUG_UNORDERED bio flag so that even copies to
sequential write required zones can queue up writes bios as soon as the
reads finish.  This simplifies the code and reduces the submission
latency as all writes are immediately available in the zoned write plug
list when the previous write completes instead of going through two
layers of queueing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-kcopyd.c | 48 ++++++------------------------------------
 1 file changed, 6 insertions(+), 42 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 6ea75436a433..37e181c79760 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -383,7 +383,6 @@ struct kcopyd_job {
 	struct mutex lock;
 	atomic_t sub_jobs;
 	sector_t progress;
-	sector_t write_offset;
 
 	struct kcopyd_job *master_job;
 };
@@ -411,50 +410,17 @@ void dm_kcopyd_exit(void)
 }
 
 /*
- * Functions to push and pop a job onto the head of a given job
- * list.
+ * Functions to push and pop a job onto the head of a given job list.
  */
-static struct kcopyd_job *pop_io_job(struct list_head *jobs,
-				     struct dm_kcopyd_client *kc)
-{
-	struct kcopyd_job *job;
-
-	/*
-	 * For I/O jobs, pop any read, any write without sequential write
-	 * constraint and sequential writes that are at the right position.
-	 */
-	list_for_each_entry(job, jobs, list) {
-		if (job->op == REQ_OP_READ ||
-		    !(job->flags & BIT(DM_KCOPYD_WRITE_SEQ))) {
-			list_del(&job->list);
-			return job;
-		}
-
-		if (job->write_offset == job->master_job->write_offset) {
-			job->master_job->write_offset += job->source.count;
-			list_del(&job->list);
-			return job;
-		}
-	}
-
-	return NULL;
-}
-
 static struct kcopyd_job *pop(struct list_head *jobs,
 			      struct dm_kcopyd_client *kc)
 {
 	struct kcopyd_job *job = NULL;
 
 	spin_lock_irq(&kc->job_lock);
-
-	if (!list_empty(jobs)) {
-		if (jobs == &kc->io_jobs)
-			job = pop_io_job(jobs, kc);
-		else {
-			job = list_entry(jobs->next, struct kcopyd_job, list);
-			list_del(&job->list);
-		}
-	}
+	job = list_first_entry_or_null(jobs, struct kcopyd_job, list);
+	if (job)
+		list_del(&job->list);
 	spin_unlock_irq(&kc->job_lock);
 
 	return job;
@@ -541,7 +507,7 @@ static void complete_io(unsigned long error, void *context)
 		push(&kc->complete_jobs, job);
 
 	else {
-		job->op = REQ_OP_WRITE;
+		job->op = REQ_OP_WRITE | REQ_ZWPLUG_UNORDERED;
 		push(&kc->io_jobs, job);
 	}
 
@@ -730,7 +696,6 @@ static void segment_complete(int read_err, unsigned long write_err,
 		int i;
 
 		*sub_job = *job;
-		sub_job->write_offset = progress;
 		sub_job->source.sector += progress;
 		sub_job->source.count = count;
 
@@ -833,7 +798,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		/*
 		 * Use WRITE ZEROES to optimize zeroing if all dests support it.
 		 */
-		job->op = REQ_OP_WRITE_ZEROES;
+		job->op = REQ_OP_WRITE_ZEROES | REQ_ZWPLUG_UNORDERED;
 		for (i = 0; i < job->num_dests; i++)
 			if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
 				job->op = REQ_OP_WRITE;
@@ -844,7 +809,6 @@ void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 	job->fn = fn;
 	job->context = context;
 	job->master_job = job;
-	job->write_offset = 0;
 
 	if (job->source.count <= kc->sub_job_size)
 		dispatch_job(job);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] block: add a bio_list_add_sorted helper
  2025-11-18  7:03 ` [PATCH 1/3] block: add a bio_list_add_sorted helper Christoph Hellwig
@ 2025-11-19  3:09   ` Damien Le Moal
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2025-11-19  3:09 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Mike Snitzer, Mikulas Patocka, linux-block, dm-devel

On 11/18/25 16:03, Christoph Hellwig wrote:
> Add a helper to insert a bio into a bio_list so that the list remains
> sorted by bi_sector.
> 
> While it is on the upper hand of the size for an inline function, it is a
> trade off for having it with the common helpers, but not bloating the
> kernel unless the so far only user is enabled.  If it gets more widely
> used moving it out of line should be considered.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag
  2025-11-18  7:03 ` [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag Christoph Hellwig
@ 2025-11-19  3:17   ` Damien Le Moal
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2025-11-19  3:17 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Mike Snitzer, Mikulas Patocka, linux-block, dm-devel

On 11/18/25 16:03, Christoph Hellwig wrote:
> This instructs the zone write plugging code to queue up any bio not
> at the write pointer, and includes an implicit guarantee that the caller
> will fill any sector gaps.  I.e., this can be used by file systems and
> stacking block drivers, but not for untrusted user block device writes.
> 
> Because all writes through the write plug cancel all outstanding writes
> for the plug there is no risk that queue up writes for higher sectors are
> stuck in the zone write plug even on error.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED
  2025-11-18  7:03 ` [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED Christoph Hellwig
@ 2025-11-19  3:18   ` Damien Le Moal
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2025-11-19  3:18 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Mike Snitzer, Mikulas Patocka, linux-block, dm-devel

On 11/18/25 16:03, Christoph Hellwig wrote:
> Use the new REQ_ZWPLUG_UNORDERED bio flag so that even copies to
> sequential write required zones can queue up writes bios as soon as the
> reads finish.  This simplifies the code and reduces the submission
> latency as all writes are immediately available in the zoned write plug
> list when the previous write completes instead of going through two
> layers of queueing.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Nice cleanup !

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: add a flag to allow underordered zoned writes
  2025-11-18  7:03 add a flag to allow underordered zoned writes Christoph Hellwig
                   ` (2 preceding siblings ...)
  2025-11-18  7:03 ` [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED Christoph Hellwig
@ 2025-11-19  6:10 ` Bart Van Assche
  2025-11-19  7:58   ` Christoph Hellwig
  3 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2025-11-19  6:10 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

On 11/17/25 11:03 PM, Christoph Hellwig wrote:
> currently writes to sequential write zones either need to use Zone
> Append, or have a local queue list to order writes before the
> submission, just for the block layer than having another ordered list
> in the zwplug to issue them one at a time.
> 
> This series allows to leverage the zwplug list to create ordering when
> the submitter guarantees that it will fill any resulting gap, i.e. unless
> an I/O error happens there will be no missing I/Os.  Users of zoned
> devices have to do that anyway as they can't leave gaps, but we can't
> guaranteed that for user I/O.  Kernel I/O on the other is trusted and can
> set this flag.
> 
> This series adds the support and converts dm-kcopyd as the most trivial
> user.  File system conversion will be a bit more complex as the
> call chains are bit more complex and need a full audit.

Hi Christoph,

Is the following deadlock inherent to this approach?
- Several bios are present on zwplug->bio_list and these bios cannot be
   submitted because their starting offset is past the write pointer.
- No new bios can be allocated because all memory is in use.
- A deadlock occurs because none of the queued bios can be submitted
   and no new bio can be allocated.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: add a flag to allow underordered zoned writes
  2025-11-19  6:10 ` add a flag to allow underordered zoned writes Bart Van Assche
@ 2025-11-19  7:58   ` Christoph Hellwig
  2025-11-19 17:00     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-11-19  7:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, Mike Snitzer,
	Mikulas Patocka, linux-block, dm-devel

On Tue, Nov 18, 2025 at 10:10:33PM -0800, Bart Van Assche wrote:
> Is the following deadlock inherent to this approach?
> - Several bios are present on zwplug->bio_list and these bios cannot be
>   submitted because their starting offset is past the write pointer.
> - No new bios can be allocated because all memory is in use.
> - A deadlock occurs because none of the queued bios can be submitted
>   and no new bio can be allocated.

This can if the number of in-flight bios is larger than bio mempool
reserve.  Which is real, but we also have this in various other
submitters that hold bios to submit in a list.  This suggests it is
mostly theoretical, but I'd still love to address it for all these
cases.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: add a flag to allow underordered zoned writes
  2025-11-19  7:58   ` Christoph Hellwig
@ 2025-11-19 17:00     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2025-11-19 17:00 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Damien Le Moal, Mike Snitzer, Mikulas Patocka, linux-block,
	dm-devel

On 11/19/25 12:58 AM, Christoph Hellwig wrote:
> On Tue, Nov 18, 2025 at 10:10:33PM -0800, Bart Van Assche wrote:
>> Is the following deadlock inherent to this approach?
>> - Several bios are present on zwplug->bio_list and these bios cannot be
>>   submitted because their starting offset is past the write pointer.
>> - No new bios can be allocated because all memory is in use.
>> - A deadlock occurs because none of the queued bios can be submitted
>>   and no new bio can be allocated.
> 
> This can if the number of in-flight bios is larger than bio mempool
> reserve.  Which is real, but we also have this in various other
> submitters that hold bios to submit in a list.  This suggests it is
> mostly theoretical, but I'd still love to address it for all these
> cases.

The default bio pool size is 2 entries, which is perfectly fine for the
"normal" alloc-and-submit cases. But yes if you're doing a bunch of
allocations before submitting, then it could certainly deadlock if the
normal allocation fails and we end up sleeping in the mempool waiting
for previous unsubmitted IO to finish.

It's certainly a bit tricky to handle... But definitely something we
should take a stab at, as it basically voids the forward progress
guarantee and is surely something that will cause production workload
deadlocks if REQ_ZWPLUG_UNORDERED is used.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-11-19 17:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-18  7:03 add a flag to allow underordered zoned writes Christoph Hellwig
2025-11-18  7:03 ` [PATCH 1/3] block: add a bio_list_add_sorted helper Christoph Hellwig
2025-11-19  3:09   ` Damien Le Moal
2025-11-18  7:03 ` [PATCH 2/3] block: add a REQ_ZWPLUG_UNORDERED flag Christoph Hellwig
2025-11-19  3:17   ` Damien Le Moal
2025-11-18  7:03 ` [PATCH 3/3] dm-kcopyd: use REQ_ZWPLUG_UNORDERED Christoph Hellwig
2025-11-19  3:18   ` Damien Le Moal
2025-11-19  6:10 ` add a flag to allow underordered zoned writes Bart Van Assche
2025-11-19  7:58   ` Christoph Hellwig
2025-11-19 17:00     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.