linux-block.vger.kernel.org archive mirror
* [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices
@ 2023-05-16 22:33 Bart Van Assche
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
                   ` (10 more replies)
  0 siblings, 11 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche

Hi Jens,

This patch series improves support for zoned block devices in the mq-deadline
scheduler by preserving the order of requeued writes (REQ_OP_WRITE and
REQ_OP_WRITE_ZEROES).

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v4:
- Changed blk_rq_is_seq_zoned_write() into an inline function.
- Reworked patch "Reduce lock contention" such that all merged requests are
  freed at once.

Changes compared to v3:
- Addressed Christoph's review feedback.
- Dropped patch "block: Micro-optimize blk_req_needs_zone_write_lock()".
- Added three new patches:
  * block: Fix the type of the second bdev_op_is_zoned_write() argument
  * block: Introduce op_is_zoned_write()
  * block: mq-deadline: Reduce lock contention

Changes compared to v2:
- In the patch that micro-optimizes blk_req_needs_zone_write_lock(), inline
  bdev_op_is_zoned_write() instead of modifying it.
- In patch "block: Introduce blk_rq_is_seq_zoned_write()", converted "case
  REQ_OP_ZONE_APPEND" into a source code comment.
- Reworked deadline_skip_seq_writes() as suggested by Christoph.
- Dropped the patch that disabled head insertion for zoned writes.
- Dropped patch "mq-deadline: Fix a race condition related to zoned writes".
- Reworked handling of requeued requests: the 'next_rq' pointer has been
  removed and instead the position of the most recently dispatched request is
  tracked.
- Dropped the patches for tracking zone capacity and for restricting the number
  of active zones.

Changes compared to v1:
- Left out the patches related to request insertion and requeuing since
  Christoph is busy with reworking these patches.
- Added a patch for enforcing the active zone limit.

Bart Van Assche (11):
  block: Simplify blk_req_needs_zone_write_lock()
  block: Fix the type of the second bdev_op_is_zoned_write() argument
  block: Introduce op_is_zoned_write()
  block: Introduce blk_rq_is_seq_zoned_write()
  block: mq-deadline: Clean up deadline_check_fifo()
  block: mq-deadline: Simplify deadline_skip_seq_writes()
  block: mq-deadline: Improve deadline_skip_seq_writes()
  block: mq-deadline: Reduce lock contention
  block: mq-deadline: Track the dispatch position
  block: mq-deadline: Handle requeued requests correctly
  block: mq-deadline: Fix handling of at-head zoned writes

 block/blk-zoned.c      |   8 +--
 block/mq-deadline.c    | 123 +++++++++++++++++++++++++++--------------
 include/linux/blk-mq.h |  16 ++++++
 include/linux/blkdev.h |  13 +++--
 4 files changed, 108 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-16 23:23   ` Damien Le Moal
                     ` (2 more replies)
  2023-05-16 22:33 ` [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument Bart Van Assche
                   ` (9 subsequent siblings)
  10 siblings, 3 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Remove the blk_rq_is_passthrough() check because it is redundant:
blk_req_needs_zone_write_lock() also calls bdev_op_is_zoned_write()
and the latter function returns false for pass-through requests.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-zoned.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index fce9082384d6..835d9e937d4d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -57,9 +57,6 @@ EXPORT_SYMBOL_GPL(blk_zone_cond_str);
  */
 bool blk_req_needs_zone_write_lock(struct request *rq)
 {
-	if (blk_rq_is_passthrough(rq))
-		return false;
-
 	if (!rq->q->disk->seq_zones_wlock)
 		return false;
 


* [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-16 23:26   ` Damien Le Moal
  2023-05-17  7:37   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 03/11] block: Introduce op_is_zoned_write() Bart Van Assche
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Johannes Thumshirn, Pankaj Raghav, Damien Le Moal, Ming Lei

Change the type of the second argument of bdev_op_is_zoned_write() from
blk_opf_t to enum req_op because this function expects an operation
without flags as its second argument.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blkdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b441e633f4dd..db24cf98ccfb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1282,7 +1282,7 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
 }
 
 static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
-					  blk_opf_t op)
+					  enum req_op op)
 {
 	if (!bdev_is_zoned(bdev))
 		return false;


* [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
  2023-05-16 22:33 ` [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-16 23:30   ` Damien Le Moal
  2023-05-17  7:37   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Introduce a helper function for checking whether write serialization is
required if the operation will be sent to a zoned device. A second caller
for op_is_zoned_write() will be introduced in the next patch in this
series.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blkdev.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index db24cf98ccfb..a4f85781060c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1281,13 +1281,16 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
 	return disk_zone_no(bdev->bd_disk, sec);
 }
 
+/* Whether write serialization is required for @op on zoned devices. */
+static inline bool op_is_zoned_write(enum req_op op)
+{
+	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
+}
+
 static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
 					  enum req_op op)
 {
-	if (!bdev_is_zoned(bdev))
-		return false;
-
-	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
+	return bdev_is_zoned(bdev) && op_is_zoned_write(op);
 }
 
 static inline sector_t bdev_zone_sectors(struct block_device *bdev)


* [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 03/11] block: Introduce op_is_zoned_write() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  0:01   ` Damien Le Moal
                     ` (2 more replies)
  2023-05-16 22:33 ` [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo() Bart Van Assche
                   ` (6 subsequent siblings)
  10 siblings, 3 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Introduce the function blk_rq_is_seq_zoned_write(). This function will
be used in later patches to preserve the order of zoned writes that
require write serialization.

This patch includes an optimization: instead of using
rq->q->disk->part0->bd_queue to check whether or not the queue is
associated with a zoned block device, use rq->q->disk->queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-zoned.c      |  5 +----
 include/linux/blk-mq.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 835d9e937d4d..096b6b47561f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -60,10 +60,7 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
 	if (!rq->q->disk->seq_zones_wlock)
 		return false;
 
-	if (bdev_op_is_zoned_write(rq->q->disk->part0, req_op(rq)))
-		return blk_rq_zone_is_seq(rq);
-
-	return false;
+	return blk_rq_is_seq_zoned_write(rq);
 }
 EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..301d72f85486 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1164,6 +1164,17 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
 	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
 }
 
+/**
+ * blk_rq_is_seq_zoned_write() - Check if @rq requires write serialization.
+ * @rq: Request to examine.
+ *
+ * Note: REQ_OP_ZONE_APPEND requests do not require serialization.
+ */
+static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
+{
+	return op_is_zoned_write(req_op(rq)) && blk_rq_zone_is_seq(rq);
+}
+
 bool blk_req_needs_zone_write_lock(struct request *rq);
 bool blk_req_zone_write_trylock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
@@ -1194,6 +1205,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
 	return !blk_req_zone_is_write_locked(rq);
 }
 #else /* CONFIG_BLK_DEV_ZONED */
+static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
+{
+	return false;
+}
+
 static inline bool blk_req_needs_zone_write_lock(struct request *rq)
 {
 	return false;


* [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:02   ` Damien Le Moal
  2023-05-17  7:39   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Change the return type of deadline_check_fifo() from 'int' to 'bool'.
Use time_is_before_eq_jiffies() instead of time_after_eq(). No
functionality has been changed.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 5839a027e0f0..a016cafa54b3 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -272,21 +272,15 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 }
 
 /*
- * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
- * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
+ * deadline_check_fifo returns true if and only if there are expired requests
+ * in the FIFO list. Requires !list_empty(&dd->fifo_list[data_dir]).
  */
-static inline int deadline_check_fifo(struct dd_per_prio *per_prio,
-				      enum dd_data_dir data_dir)
+static inline bool deadline_check_fifo(struct dd_per_prio *per_prio,
+				       enum dd_data_dir data_dir)
 {
 	struct request *rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
 
-	/*
-	 * rq is expired!
-	 */
-	if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
-		return 1;
-
-	return 0;
+	return time_is_before_eq_jiffies((unsigned long)rq->fifo_time);
 }
 
 /*


* [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  7:40   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes() Bart Van Assche
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Make the deadline_skip_seq_writes() code shorter without changing its
functionality.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index a016cafa54b3..6276afede9cd 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -304,14 +304,11 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 						struct request *rq)
 {
 	sector_t pos = blk_rq_pos(rq);
-	sector_t skipped_sectors = 0;
 
-	while (rq) {
-		if (blk_rq_pos(rq) != pos + skipped_sectors)
-			break;
-		skipped_sectors += blk_rq_sectors(rq);
+	do {
+		pos += blk_rq_sectors(rq);
 		rq = deadline_latter_request(rq);
-	}
+	} while (rq && blk_rq_pos(rq) == pos);
 
 	return rq;
 }


* [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:06   ` Damien Le Moal
  2023-05-17  7:41   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Make deadline_skip_seq_writes() do what its name suggests: skip
sequential zoned writes only, instead of any run of contiguous requests.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6276afede9cd..dbc0feca963e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 	do {
 		pos += blk_rq_sectors(rq);
 		rq = deadline_latter_request(rq);
-	} while (rq && blk_rq_pos(rq) == pos);
+	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));
 
 	return rq;
 }


* [PATCH v5 08/11] block: mq-deadline: Reduce lock contention
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (6 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes() Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:07   ` Damien Le Moal
                     ` (2 more replies)
  2023-05-16 22:33 ` [PATCH v5 09/11] block: mq-deadline: Track the dispatch position Bart Van Assche
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

blk_mq_free_requests() calls dd_finish_request() indirectly. Prevent
nested locking of dd->lock and dd->zone_lock by moving the freeing of
requests out of the section that holds dd->lock.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index dbc0feca963e..06af9c28a3bf 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -757,7 +757,7 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
  * add rq to rbtree and fifo
  */
 static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
-			      blk_insert_t flags)
+			      blk_insert_t flags, struct list_head *free)
 {
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
@@ -766,7 +766,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
-	LIST_HEAD(free);
 
 	lockdep_assert_held(&dd->lock);
 
@@ -783,10 +782,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		rq->elv.priv[0] = (void *)(uintptr_t)1;
 	}
 
-	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
-		blk_mq_free_requests(&free);
+	if (blk_mq_sched_try_insert_merge(q, rq, free))
 		return;
-	}
 
 	trace_block_rq_insert(rq);
 
@@ -819,6 +816,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 {
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
+	LIST_HEAD(free);
 
 	spin_lock(&dd->lock);
 	while (!list_empty(list)) {
@@ -826,9 +824,11 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
-		dd_insert_request(hctx, rq, flags);
+		dd_insert_request(hctx, rq, flags, &free);
 	}
 	spin_unlock(&dd->lock);
+
+	blk_mq_free_requests(&free);
 }
 
 /* Callback from inside blk_mq_rq_ctx_init(). */


* [PATCH v5 09/11] block: mq-deadline: Track the dispatch position
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (7 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:13   ` Damien Le Moal
  2023-05-17  7:45   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
  2023-05-16 22:33 ` [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes Bart Van Assche
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Track the position (sector_t) of the most recently dispatched request
instead of tracking a pointer to the next request to dispatch. This
patch is the basis for patch "Handle requeued requests correctly".
Without this patch it would be significantly more complicated to make
sure that zoned writes are dispatched in LBA order per zone.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 06af9c28a3bf..6d0b99042c96 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -74,8 +74,8 @@ struct dd_per_prio {
 	struct list_head dispatch;
 	struct rb_root sort_list[DD_DIR_COUNT];
 	struct list_head fifo_list[DD_DIR_COUNT];
-	/* Next request in FIFO order. Read, write or both are NULL. */
-	struct request *next_rq[DD_DIR_COUNT];
+	/* Position of the most recently dispatched request. */
+	sector_t latest_pos[DD_DIR_COUNT];
 	struct io_stats_per_prio stats;
 };
 
@@ -156,6 +156,25 @@ deadline_latter_request(struct request *rq)
 	return NULL;
 }
 
+/* Return the first request for which blk_rq_pos() >= pos. */
+static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
+				enum dd_data_dir data_dir, sector_t pos)
+{
+	struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
+	struct request *rq, *res = NULL;
+
+	while (node) {
+		rq = rb_entry_rq(node);
+		if (blk_rq_pos(rq) >= pos) {
+			res = rq;
+			node = node->rb_left;
+		} else {
+			node = node->rb_right;
+		}
+	}
+	return res;
+}
+
 static void
 deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 {
@@ -167,11 +186,6 @@ deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 static inline void
 deadline_del_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 {
-	const enum dd_data_dir data_dir = rq_data_dir(rq);
-
-	if (per_prio->next_rq[data_dir] == rq)
-		per_prio->next_rq[data_dir] = deadline_latter_request(rq);
-
 	elv_rb_del(deadline_rb_root(per_prio, rq), rq);
 }
 
@@ -251,10 +265,6 @@ static void
 deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		      struct request *rq)
 {
-	const enum dd_data_dir data_dir = rq_data_dir(rq);
-
-	per_prio->next_rq[data_dir] = deadline_latter_request(rq);
-
 	/*
 	 * take it off the sort and fifo list
 	 */
@@ -363,7 +373,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	struct request *rq;
 	unsigned long flags;
 
-	rq = per_prio->next_rq[data_dir];
+	rq = deadline_from_pos(per_prio, data_dir,
+			       per_prio->latest_pos[data_dir]);
 	if (!rq)
 		return NULL;
 
@@ -426,6 +437,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 		if (started_after(dd, rq, latest_start))
 			return NULL;
 		list_del_init(&rq->queuelist);
+		data_dir = rq_data_dir(rq);
 		goto done;
 	}
 
@@ -433,9 +445,11 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	 * batches are currently reads XOR writes
 	 */
 	rq = deadline_next_request(dd, per_prio, dd->last_dir);
-	if (rq && dd->batching < dd->fifo_batch)
+	if (rq && dd->batching < dd->fifo_batch) {
 		/* we have a next request are still entitled to batch */
+		data_dir = rq_data_dir(rq);
 		goto dispatch_request;
+	}
 
 	/*
 	 * at this point we are not running a batch. select the appropriate
@@ -513,6 +527,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 done:
 	ioprio_class = dd_rq_ioclass(rq);
 	prio = ioprio_class_to_prio[ioprio_class];
+	dd->per_prio[prio].latest_pos[data_dir] = blk_rq_pos(rq);
 	dd->per_prio[prio].stats.dispatched++;
 	/*
 	 * If the request needs its target zone locked, do it.
@@ -1026,8 +1041,10 @@ static int deadline_##name##_next_rq_show(void *data,			\
 	struct request_queue *q = data;					\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
-	struct request *rq = per_prio->next_rq[data_dir];		\
+	struct request *rq;						\
 									\
+	rq = deadline_from_pos(per_prio, data_dir,			\
+			       per_prio->latest_pos[data_dir]);		\
 	if (rq)								\
 		__blk_mq_debugfs_rq_show(m, rq);			\
 	return 0;							\


* [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (8 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 09/11] block: mq-deadline: Track the dispatch position Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:22   ` Damien Le Moal
  2023-05-17  7:46   ` Hannes Reinecke
  2023-05-16 22:33 ` [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes Bart Van Assche
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Start dispatching from the start of a zone instead of from the starting
position of the most recently dispatched request.

If a zoned write is requeued with an LBA that is lower than already
inserted zoned writes, make sure that it is submitted first.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6d0b99042c96..059727fa4b98 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -156,13 +156,28 @@ deadline_latter_request(struct request *rq)
 	return NULL;
 }
 
-/* Return the first request for which blk_rq_pos() >= pos. */
+/*
+ * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
+ * return the first request after the highest zone start <= @pos.
+ */
 static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
 				enum dd_data_dir data_dir, sector_t pos)
 {
 	struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
 	struct request *rq, *res = NULL;
 
+	if (!node)
+		return NULL;
+
+	rq = rb_entry_rq(node);
+	/*
+	 * A zoned write may have been requeued with a starting position that
+	 * is below that of the most recently dispatched request. Hence, for
+	 * zoned writes, start searching from the start of a zone.
+	 */
+	if (blk_rq_is_seq_zoned_write(rq))
+		pos = round_down(pos, rq->q->limits.chunk_sectors);
+
 	while (node) {
 		rq = rb_entry_rq(node);
 		if (blk_rq_pos(rq) >= pos) {
@@ -806,6 +821,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		list_add(&rq->queuelist, &per_prio->dispatch);
 		rq->fifo_time = jiffies;
 	} else {
+		struct list_head *insert_before;
+
 		deadline_add_rq_rb(per_prio, rq);
 
 		if (rq_mergeable(rq)) {
@@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		 * set expire time and add to fifo list
 		 */
 		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
-		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
+		insert_before = &per_prio->fifo_list[data_dir];
+#ifdef CONFIG_BLK_DEV_ZONED
+		/*
+		 * Insert zoned writes such that requests are sorted by
+		 * position per zone.
+		 */
+		if (blk_rq_is_seq_zoned_write(rq)) {
+			struct request *rq2 = deadline_latter_request(rq);
+
+			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
+				insert_before = &rq2->queuelist;
+		}
+#endif
+		list_add_tail(&rq->queuelist, insert_before);
 	}
 }
 


* [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes
  2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
                   ` (9 preceding siblings ...)
  2023-05-16 22:33 ` [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
@ 2023-05-16 22:33 ` Bart Van Assche
  2023-05-17  1:24   ` Damien Le Moal
  2023-05-17  7:47   ` Hannes Reinecke
  10 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-16 22:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei

Before dispatching a zoned write from the FIFO list, check whether there
are any zoned writes in the RB-tree with a lower LBA for the same zone.
This patch ensures that zoned writes happen in order even if at_head is
set for some writes for a zone and not for others.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 059727fa4b98..67989f8d29a5 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -346,7 +346,7 @@ static struct request *
 deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		      enum dd_data_dir data_dir)
 {
-	struct request *rq;
+	struct request *rq, *rb_rq, *next;
 	unsigned long flags;
 
 	if (list_empty(&per_prio->fifo_list[data_dir]))
@@ -364,7 +364,12 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	 * zones and these zones are unlocked.
 	 */
 	spin_lock_irqsave(&dd->zone_lock, flags);
-	list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
+	list_for_each_entry_safe(rq, next, &per_prio->fifo_list[DD_WRITE],
+				 queuelist) {
+		/* Check whether a prior request exists for the same zone. */
+		rb_rq = deadline_from_pos(per_prio, data_dir, blk_rq_pos(rq));
+		if (rb_rq && blk_rq_pos(rb_rq) < blk_rq_pos(rq))
+			rq = rb_rq;
 		if (blk_req_can_dispatch_to_zone(rq) &&
 		    (blk_queue_nonrot(rq->q) ||
 		     !deadline_is_seq_write(dd, rq)))

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock()
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
@ 2023-05-16 23:23   ` Damien Le Moal
  2023-05-17  7:36   ` Hannes Reinecke
  2023-05-17 10:00   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-16 23:23 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Remove the blk_rq_is_passthrough() check because it is redundant:
> blk_req_needs_zone_write_lock() also calls bdev_op_is_zoned_write()
> and the latter function returns false for pass-through requests.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument
  2023-05-16 22:33 ` [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument Bart Van Assche
@ 2023-05-16 23:26   ` Damien Le Moal
  2023-05-17  7:37   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-16 23:26 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Johannes Thumshirn,
	Pankaj Raghav, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Change the type of the second argument of bdev_op_is_zoned_write() from
> blk_opf_t into enum req_op because this function expects an operation
> without flags as second argument.
> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-16 22:33 ` [PATCH v5 03/11] block: Introduce op_is_zoned_write() Bart Van Assche
@ 2023-05-16 23:30   ` Damien Le Moal
  2023-05-17  0:00     ` Bart Van Assche
  2023-05-17  7:37   ` Hannes Reinecke
  1 sibling, 1 reply; 50+ messages in thread
From: Damien Le Moal @ 2023-05-16 23:30 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Introduce a helper function for checking whether write serialization is
> required if the operation will be sent to a zoned device. A second caller
> for op_is_zoned_write() will be introduced in the next patch in this
> series.
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  include/linux/blkdev.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index db24cf98ccfb..a4f85781060c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1281,13 +1281,16 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
>  	return disk_zone_no(bdev->bd_disk, sec);
>  }
>  
> +/* Whether write serialization is required for @op on zoned devices. */
> +static inline bool op_is_zoned_write(enum req_op op)

I do not like the name of this function as it has nothing to do with zoned
devices. What about something like op_is_data_write() to clarify what is being
tested?

> +{
> +	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
> +}
> +
>  static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
>  					  enum req_op op)
>  {
> -	if (!bdev_is_zoned(bdev))
> -		return false;
> -
> -	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
> +	return bdev_is_zoned(bdev) && op_is_zoned_write(op);
>  }

Or if you really want to rewrite this, maybe something like:

static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
  					  enum req_op op)
{
	return bdev_is_zoned(bdev) &&
	       (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES);
}

which is very easy to understand.


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-16 23:30   ` Damien Le Moal
@ 2023-05-17  0:00     ` Bart Van Assche
  2023-05-17  6:45       ` Christoph Hellwig
  0 siblings, 1 reply; 50+ messages in thread
From: Bart Van Assche @ 2023-05-17  0:00 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/16/23 16:30, Damien Le Moal wrote:
> Or if you really want to rewrite this, maybe something like:
> 
> static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
>    					  enum req_op op)
> {
> 	return bdev_is_zoned(bdev) &&
> 	       (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES);
> }
> 
> which is very easy to understand.

The op_is_zoned_write() function was introduced to use it in patch 4/11 
of this series. Anyway, I will look into open-coding it.

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write()
  2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
@ 2023-05-17  0:01   ` Damien Le Moal
  2023-05-17  7:38   ` Hannes Reinecke
  2023-05-17 10:02   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  0:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Introduce the function blk_rq_is_seq_zoned_write(). This function will
> be used in later patches to preserve the order of zoned writes that
> require write serialization.
> 
> This patch includes an optimization: instead of using
> rq->q->disk->part0->bd_queue to check whether or not the queue is
> associated with a zoned block device, use rq->q->disk->queue.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo()
  2023-05-16 22:33 ` [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo() Bart Van Assche
@ 2023-05-17  1:02   ` Damien Le Moal
  2023-05-17 15:01     ` Bart Van Assche
  2023-05-17  7:39   ` Hannes Reinecke
  1 sibling, 1 reply; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Change the return type of deadline_check_fifo() from 'int' into 'bool'.
> Use time_is_before_eq_jiffies() instead of time_after_eq(). No
> functionality has been changed.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 5839a027e0f0..a016cafa54b3 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -272,21 +272,15 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>  }
>  
>  /*
> - * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
> - * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
> + * deadline_check_fifo returns true if and only if there are expired requests
> + * in the FIFO list. Requires !list_empty(&dd->fifo_list[data_dir]).
>   */
> -static inline int deadline_check_fifo(struct dd_per_prio *per_prio,
> -				      enum dd_data_dir data_dir)
> +static inline bool deadline_check_fifo(struct dd_per_prio *per_prio,
> +				       enum dd_data_dir data_dir)
>  {
>  	struct request *rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
>  
> -	/*
> -	 * rq is expired!
> -	 */
> -	if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
> -		return 1;
> -
> -	return 0;
> +	return time_is_before_eq_jiffies((unsigned long)rq->fifo_time);

This looks wrong: isn't this reversing the return value?
Shouldn't this be:

	return time_after_eq(jiffies, (unsigned long)rq->fifo_time);

To return true if the first request in the fifo list has *expired*, as
indicated by the function's kdoc comment.

>  }
>  
>  /*

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-16 22:33 ` [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes() Bart Van Assche
@ 2023-05-17  1:06   ` Damien Le Moal
  2023-05-17 16:30     ` Bart Van Assche
  2023-05-17  7:41   ` Hannes Reinecke
  1 sibling, 1 reply; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:06 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Make deadline_skip_seq_writes() do what its name suggests, namely to
> skip sequential writes.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 6276afede9cd..dbc0feca963e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>  	do {
>  		pos += blk_rq_sectors(rq);
>  		rq = deadline_latter_request(rq);
> -	} while (rq && blk_rq_pos(rq) == pos);
> +	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));

No! The "seq write" skip here is to skip writes that are contiguous/sequential
to ensure that we keep issuing contiguous/sequential writes belonging to
different zones, regardless of the target zone type.

So drop this change please.

>  
>  	return rq;
>  }

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 08/11] block: mq-deadline: Reduce lock contention
  2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
@ 2023-05-17  1:07   ` Damien Le Moal
  2023-05-17  6:46   ` Christoph Hellwig
  2023-05-17  7:42   ` Hannes Reinecke
  2 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:07 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> blk_mq_free_requests() calls dd_finish_request() indirectly. Prevent
> nested locking of dd->lock and dd->zone_lock by moving the code for
> freeing requests.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 09/11] block: mq-deadline: Track the dispatch position
  2023-05-16 22:33 ` [PATCH v5 09/11] block: mq-deadline: Track the dispatch position Bart Van Assche
@ 2023-05-17  1:13   ` Damien Le Moal
  2023-05-17  7:45   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:13 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Track the position (sector_t) of the most recently dispatched request
> instead of tracking a pointer to the next request to dispatch. This
> patch is the basis for patch "Handle requeued requests correctly".
> Without this patch it would be significantly more complicated to make
> sure that zoned writes are dispatched in LBA order per zone.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK. One nit below.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> @@ -433,9 +445,11 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	 * batches are currently reads XOR writes
>  	 */
>  	rq = deadline_next_request(dd, per_prio, dd->last_dir);
> -	if (rq && dd->batching < dd->fifo_batch)
> +	if (rq && dd->batching < dd->fifo_batch) {
>  		/* we have a next request are still entitled to batch */

Nit: while at it, please fix this comment:

...next request are still... -> ...next request and are still...

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-16 22:33 ` [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
@ 2023-05-17  1:22   ` Damien Le Moal
  2023-05-17 16:28     ` Bart Van Assche
  2023-05-17  7:46   ` Hannes Reinecke
  1 sibling, 1 reply; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:22 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Start dispatching from the start of a zone instead of from the starting
> position of the most recently dispatched request.
> 
> If a zoned write is requeued with an LBA that is lower than already
> inserted zoned writes, make sure that it is submitted first.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 6d0b99042c96..059727fa4b98 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -156,13 +156,28 @@ deadline_latter_request(struct request *rq)
>  	return NULL;
>  }
>  
> -/* Return the first request for which blk_rq_pos() >= pos. */
> +/*
> + * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
> + * return the first request after the highest zone start <= @pos.

This comment is strange. What about:

For zoned devices, return the first request after the start of the zone
containing @pos.

> + */
>  static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
>  				enum dd_data_dir data_dir, sector_t pos)
>  {
>  	struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
>  	struct request *rq, *res = NULL;
>  
> +	if (!node)
> +		return NULL;
> +
> +	rq = rb_entry_rq(node);
> +	/*
> +	 * A zoned write may have been requeued with a starting position that
> +	 * is below that of the most recently dispatched request. Hence, for
> +	 * zoned writes, start searching from the start of a zone.
> +	 */
> +	if (blk_rq_is_seq_zoned_write(rq))
> +		pos = round_down(pos, rq->q->limits.chunk_sectors);
> +
>  	while (node) {
>  		rq = rb_entry_rq(node);
>  		if (blk_rq_pos(rq) >= pos) {
> @@ -806,6 +821,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		list_add(&rq->queuelist, &per_prio->dispatch);
>  		rq->fifo_time = jiffies;
>  	} else {
> +		struct list_head *insert_before;
> +
>  		deadline_add_rq_rb(per_prio, rq);
>  
>  		if (rq_mergeable(rq)) {
> @@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		 * set expire time and add to fifo list
>  		 */
>  		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
> +		insert_before = &per_prio->fifo_list[data_dir];
> +#ifdef CONFIG_BLK_DEV_ZONED
> +		/*
> +		 * Insert zoned writes such that requests are sorted by
> +		 * position per zone.
> +		 */
> +		if (blk_rq_is_seq_zoned_write(rq)) {
> +			struct request *rq2 = deadline_latter_request(rq);
> +
> +			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
> +				insert_before = &rq2->queuelist;
> +		}
> +#endif
> +		list_add_tail(&rq->queuelist, insert_before);

Why not:

		insert_before = &per_prio->fifo_list[data_dir];
		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) 	
		    && blk_rq_is_seq_zoned_write(rq)) {
			/*
			 * Insert zoned writes such that requests are sorted by
			 * position per zone.
		 	*/
			struct request *rq2 = deadline_latter_request(rq);

			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
				insert_before = &rq2->queuelist;
		}
		list_add_tail(&rq->queuelist, insert_before);

to avoid the ugly #ifdef?


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes
  2023-05-16 22:33 ` [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes Bart Van Assche
@ 2023-05-17  1:24   ` Damien Le Moal
  2023-05-17  7:47   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  1:24 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 07:33, Bart Van Assche wrote:
> Before dispatching a zoned write from the FIFO list, check whether there
> are any zoned writes in the RB-tree with a lower LBA for the same zone.
> This patch ensures that zoned writes happen in order even if at_head is
> set for some writes for a zone and not for others.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-17  0:00     ` Bart Van Assche
@ 2023-05-17  6:45       ` Christoph Hellwig
  2023-05-17  6:47         ` Damien Le Moal
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2023-05-17  6:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, Jens Axboe, linux-block, Jaegeuk Kim,
	Christoph Hellwig, Ming Lei

On Tue, May 16, 2023 at 05:00:29PM -0700, Bart Van Assche wrote:
> On 5/16/23 16:30, Damien Le Moal wrote:
>> Or if you really want to rewrite this, maybe something like:
>>
>> static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
>>    					  enum req_op op)
>> {
>> 	return bdev_is_zoned(bdev) &&
>> 	       (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES);
>> }
>>
>> which is very easy to understand.
>
> The op_is_zoned_write() function was introduced to use it in patch 4/11 of 
> this series. Anyway, I will look into open-coding it.

I think the idea here is that we're testing for an operation that needs
zone locking.  Maybe that needs to be reflected in the name?
op_needs_zone_write_locking()?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 08/11] block: mq-deadline: Reduce lock contention
  2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
  2023-05-17  1:07   ` Damien Le Moal
@ 2023-05-17  6:46   ` Christoph Hellwig
  2023-05-17  7:42   ` Hannes Reinecke
  2 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2023-05-17  6:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-17  6:45       ` Christoph Hellwig
@ 2023-05-17  6:47         ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  6:47 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei

On 2023/05/17 15:45, Christoph Hellwig wrote:
> On Tue, May 16, 2023 at 05:00:29PM -0700, Bart Van Assche wrote:
>> On 5/16/23 16:30, Damien Le Moal wrote:
>>> Or if you really want to rewrite this, maybe something like:
>>>
>>> static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
>>>    					  enum req_op op)
>>> {
>>> 	return bdev_is_zoned(bdev) &&
>>> 	       (op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES);
>>> }
>>>
>>> which is very easy to understand.
>>
>> The op_is_zoned_write() function was introduced to use it in patch 4/11 of 
>> this series. Anyway, I will look into open-coding it.
> 
> I think the idea here is that we're testing for an operation that needs
> zone locking.  Maybe that needs to be reflected in the name?
> op_needs_zone_write_locking()?

Sounds better!

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock()
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
  2023-05-16 23:23   ` Damien Le Moal
@ 2023-05-17  7:36   ` Hannes Reinecke
  2023-05-17 10:00   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:36 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Remove the blk_rq_is_passthrough() check because it is redundant:
> blk_req_needs_zone_write_lock() also calls bdev_op_is_zoned_write()
> and the latter function returns false for pass-through requests.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/blk-zoned.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index fce9082384d6..835d9e937d4d 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -57,9 +57,6 @@ EXPORT_SYMBOL_GPL(blk_zone_cond_str);
>    */
>   bool blk_req_needs_zone_write_lock(struct request *rq)
>   {
> -	if (blk_rq_is_passthrough(rq))
> -		return false;
> -
>   	if (!rq->q->disk->seq_zones_wlock)
>   		return false;
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument
  2023-05-16 22:33 ` [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument Bart Van Assche
  2023-05-16 23:26   ` Damien Le Moal
@ 2023-05-17  7:37   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:37 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Johannes Thumshirn,
	Pankaj Raghav, Damien Le Moal, Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Change the type of the second argument of bdev_op_is_zoned_write() from
> blk_opf_t into enum req_op because this function expects an operation
> without flags as second argument.
> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   include/linux/blkdev.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b441e633f4dd..db24cf98ccfb 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1282,7 +1282,7 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
>   }
>   
>   static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
> -					  blk_opf_t op)
> +					  enum req_op op)
>   {
>   	if (!bdev_is_zoned(bdev))
>   		return false;

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/11] block: Introduce op_is_zoned_write()
  2023-05-16 22:33 ` [PATCH v5 03/11] block: Introduce op_is_zoned_write() Bart Van Assche
  2023-05-16 23:30   ` Damien Le Moal
@ 2023-05-17  7:37   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:37 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Introduce a helper function for checking whether write serialization is
> required if the operation will be sent to a zoned device. A second caller
> for op_is_zoned_write() will be introduced in the next patch in this
> series.
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   include/linux/blkdev.h | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index db24cf98ccfb..a4f85781060c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1281,13 +1281,16 @@ static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
>   	return disk_zone_no(bdev->bd_disk, sec);
>   }
>   
> +/* Whether write serialization is required for @op on zoned devices. */
> +static inline bool op_is_zoned_write(enum req_op op)
> +{
> +	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
> +}
> +
>   static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
>   					  enum req_op op)
>   {
> -	if (!bdev_is_zoned(bdev))
> -		return false;
> -
> -	return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
> +	return bdev_is_zoned(bdev) && op_is_zoned_write(op);
>   }
>   
>   static inline sector_t bdev_zone_sectors(struct block_device *bdev)

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write()
  2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
  2023-05-17  0:01   ` Damien Le Moal
@ 2023-05-17  7:38   ` Hannes Reinecke
  2023-05-17 10:02   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:38 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Introduce the function blk_rq_is_seq_zoned_write(). This function will
> be used in later patches to preserve the order of zoned writes that
> require write serialization.
> 
> This patch includes an optimization: instead of using
> rq->q->disk->part0->bd_queue to check whether or not the queue is
> associated with a zoned block device, use rq->q->disk->queue.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/blk-zoned.c      |  5 +----
>   include/linux/blk-mq.h | 16 ++++++++++++++++
>   2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 835d9e937d4d..096b6b47561f 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -60,10 +60,7 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
>   	if (!rq->q->disk->seq_zones_wlock)
>   		return false;
>   
> -	if (bdev_op_is_zoned_write(rq->q->disk->part0, req_op(rq)))
> -		return blk_rq_zone_is_seq(rq);
> -
> -	return false;
> +	return blk_rq_is_seq_zoned_write(rq);
>   }
>   EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 06caacd77ed6..301d72f85486 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -1164,6 +1164,17 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
>   	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
>   }
>   
> +/**
> + * blk_rq_is_seq_zoned_write() - Check if @rq requires write serialization.
> + * @rq: Request to examine.
> + *
> + * Note: REQ_OP_ZONE_APPEND requests do not require serialization.
> + */
> +static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
> +{
> +	return op_is_zoned_write(req_op(rq)) && blk_rq_zone_is_seq(rq);
> +}
> +
>   bool blk_req_needs_zone_write_lock(struct request *rq);
>   bool blk_req_zone_write_trylock(struct request *rq);
>   void __blk_req_zone_write_lock(struct request *rq);
> @@ -1194,6 +1205,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
>   	return !blk_req_zone_is_write_locked(rq);
>   }
>   #else /* CONFIG_BLK_DEV_ZONED */
> +static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
> +{
> +	return false;
> +}
> +
>   static inline bool blk_req_needs_zone_write_lock(struct request *rq)
>   {
>   	return false;
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



* Re: [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo()
  2023-05-16 22:33 ` [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo() Bart Van Assche
  2023-05-17  1:02   ` Damien Le Moal
@ 2023-05-17  7:39   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:39 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Change the return type of deadline_check_fifo() from 'int' into 'bool'.
> Use time_is_before_eq_jiffies() instead of time_after_eq(). No
> functionality has been changed.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 16 +++++-----------
>   1 file changed, 5 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

* Re: [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes()
  2023-05-16 22:33 ` [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
@ 2023-05-17  7:40   ` Hannes Reinecke
  0 siblings, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:40 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Make the deadline_skip_seq_writes() code shorter without changing its
> functionality.
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index a016cafa54b3..6276afede9cd 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -304,14 +304,11 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>   						struct request *rq)
>   {
>   	sector_t pos = blk_rq_pos(rq);
> -	sector_t skipped_sectors = 0;
>   
> -	while (rq) {
> -		if (blk_rq_pos(rq) != pos + skipped_sectors)
> -			break;
> -		skipped_sectors += blk_rq_sectors(rq);
> +	do {
> +		pos += blk_rq_sectors(rq);
>   		rq = deadline_latter_request(rq);
> -	}
> +	} while (rq && blk_rq_pos(rq) == pos);
>   
>   	return rq;
>   }

Description could be improved (you're not changing the functionality, 
you are changing the implementation), but anyway:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-16 22:33 ` [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes() Bart Van Assche
  2023-05-17  1:06   ` Damien Le Moal
@ 2023-05-17  7:41   ` Hannes Reinecke
  2023-05-17  7:55     ` Damien Le Moal
  1 sibling, 1 reply; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:41 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Make deadline_skip_seq_writes() do what its name suggests, namely to
> skip sequential writes.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 6276afede9cd..dbc0feca963e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>   	do {
>   		pos += blk_rq_sectors(rq);
>   		rq = deadline_latter_request(rq);
> -	} while (rq && blk_rq_pos(rq) == pos);
> +	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));
>   
>   	return rq;
>   }

Please merge it with the previous patch.

Cheers,

Hannes

* Re: [PATCH v5 08/11] block: mq-deadline: Reduce lock contention
  2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
  2023-05-17  1:07   ` Damien Le Moal
  2023-05-17  6:46   ` Christoph Hellwig
@ 2023-05-17  7:42   ` Hannes Reinecke
  2 siblings, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:42 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> blk_mq_free_requests() calls dd_finish_request() indirectly. Prevent
> nested locking of dd->lock and dd->zone_lock by moving the code for
> freeing requests.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

* Re: [PATCH v5 09/11] block: mq-deadline: Track the dispatch position
  2023-05-16 22:33 ` [PATCH v5 09/11] block: mq-deadline: Track the dispatch position Bart Van Assche
  2023-05-17  1:13   ` Damien Le Moal
@ 2023-05-17  7:45   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:45 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Track the position (sector_t) of the most recently dispatched request
> instead of tracking a pointer to the next request to dispatch. This
> patch is the basis for patch "Handle requeued requests correctly".
> Without this patch it would be significantly more complicated to make
> sure that zoned writes are dispatched in LBA order per zone.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 45 +++++++++++++++++++++++++++++++--------------
>   1 file changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 06af9c28a3bf..6d0b99042c96 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -74,8 +74,8 @@ struct dd_per_prio {
>   	struct list_head dispatch;
>   	struct rb_root sort_list[DD_DIR_COUNT];
>   	struct list_head fifo_list[DD_DIR_COUNT];
> -	/* Next request in FIFO order. Read, write or both are NULL. */
> -	struct request *next_rq[DD_DIR_COUNT];
> +	/* Position of the most recently dispatched request. */
> +	sector_t latest_pos[DD_DIR_COUNT];
>   	struct io_stats_per_prio stats;
>   };
>   
> @@ -156,6 +156,25 @@ deadline_latter_request(struct request *rq)
>   	return NULL;
>   }
>   
> +/* Return the first request for which blk_rq_pos() >= pos. */
> +static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
> +				enum dd_data_dir data_dir, sector_t pos)
> +{
> +	struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
> +	struct request *rq, *res = NULL;
> +
> +	while (node) {
> +		rq = rb_entry_rq(node);
> +		if (blk_rq_pos(rq) >= pos) {
> +			res = rq;
> +			node = node->rb_left;
> +		} else {
> +			node = node->rb_right;
> +		}
> +	}
> +	return res;
> +}
> +
This is getting ever more awkward; you have to traverse the entire tree 
to get the result. Doesn't that impact performance?

Cheers,

Hannes



* Re: [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-16 22:33 ` [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
  2023-05-17  1:22   ` Damien Le Moal
@ 2023-05-17  7:46   ` Hannes Reinecke
  1 sibling, 0 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:46 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Start dispatching from the start of a zone instead of from the starting
> position of the most recently dispatched request.
> 
> If a zoned write is requeued with an LBA that is lower than already
> inserted zoned writes, make sure that it is submitted first.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 34 ++++++++++++++++++++++++++++++++--
>   1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 6d0b99042c96..059727fa4b98 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -156,13 +156,28 @@ deadline_latter_request(struct request *rq)
>   	return NULL;
>   }
>   
> -/* Return the first request for which blk_rq_pos() >= pos. */
> +/*
> + * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
> + * return the first request after the highest zone start <= @pos.
> + */
>   static inline struct request *deadline_from_pos(struct dd_per_prio *per_prio,
>   				enum dd_data_dir data_dir, sector_t pos)
>   {
>   	struct rb_node *node = per_prio->sort_list[data_dir].rb_node;
>   	struct request *rq, *res = NULL;
>   
> +	if (!node)
> +		return NULL;
> +
> +	rq = rb_entry_rq(node);
> +	/*
> +	 * A zoned write may have been requeued with a starting position that
> +	 * is below that of the most recently dispatched request. Hence, for
> +	 * zoned writes, start searching from the start of a zone.
> +	 */
> +	if (blk_rq_is_seq_zoned_write(rq))
> +		pos = round_down(pos, rq->q->limits.chunk_sectors);
> +
>   	while (node) {
>   		rq = rb_entry_rq(node);
>   		if (blk_rq_pos(rq) >= pos) {
> @@ -806,6 +821,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>   		list_add(&rq->queuelist, &per_prio->dispatch);
>   		rq->fifo_time = jiffies;
>   	} else {
> +		struct list_head *insert_before;
> +
>   		deadline_add_rq_rb(per_prio, rq);
>   
>   		if (rq_mergeable(rq)) {
> @@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>   		 * set expire time and add to fifo list
>   		 */
>   		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
> +		insert_before = &per_prio->fifo_list[data_dir];
> +#ifdef CONFIG_BLK_DEV_ZONED
> +		/*
> +		 * Insert zoned writes such that requests are sorted by
> +		 * position per zone.
> +		 */
> +		if (blk_rq_is_seq_zoned_write(rq)) {
> +			struct request *rq2 = deadline_latter_request(rq);
> +
> +			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
> +				insert_before = &rq2->queuelist;
> +		}
> +#endif
> +		list_add_tail(&rq->queuelist, insert_before);
>   	}
>   }
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes



* Re: [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes
  2023-05-16 22:33 ` [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes Bart Van Assche
  2023-05-17  1:24   ` Damien Le Moal
@ 2023-05-17  7:47   ` Hannes Reinecke
  2023-05-17  7:53     ` Damien Le Moal
  2023-05-17 17:13     ` Bart Van Assche
  1 sibling, 2 replies; 50+ messages in thread
From: Hannes Reinecke @ 2023-05-17  7:47 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:33, Bart Van Assche wrote:
> Before dispatching a zoned write from the FIFO list, check whether there
> are any zoned writes in the RB-tree with a lower LBA for the same zone.
> This patch ensures that zoned writes happen in order even if at_head is
> set for some writes for a zone and not for others.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/mq-deadline.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 059727fa4b98..67989f8d29a5 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -346,7 +346,7 @@ static struct request *
>   deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>   		      enum dd_data_dir data_dir)
>   {
> -	struct request *rq;
> +	struct request *rq, *rb_rq, *next;
>   	unsigned long flags;
>   
>   	if (list_empty(&per_prio->fifo_list[data_dir]))
> @@ -364,7 +364,12 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>   	 * zones and these zones are unlocked.
>   	 */
>   	spin_lock_irqsave(&dd->zone_lock, flags);
> -	list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
> +	list_for_each_entry_safe(rq, next, &per_prio->fifo_list[DD_WRITE],
> +				 queuelist) {
> +		/* Check whether a prior request exists for the same zone. */
> +		rb_rq = deadline_from_pos(per_prio, data_dir, blk_rq_pos(rq));
> +		if (rb_rq && blk_rq_pos(rb_rq) < blk_rq_pos(rq))
> +			rq = rb_rq;
>   		if (blk_req_can_dispatch_to_zone(rq) &&
>   		    (blk_queue_nonrot(rq->q) ||
>   		     !deadline_is_seq_write(dd, rq)))

Similar concern here; we'll have to traverse the entire tree here.
But if that's of no concern...

Cheers,

Hannes

* Re: [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes
  2023-05-17  7:47   ` Hannes Reinecke
@ 2023-05-17  7:53     ` Damien Le Moal
  2023-05-17 17:13     ` Bart Van Assche
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  7:53 UTC (permalink / raw)
  To: Hannes Reinecke, Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 16:47, Hannes Reinecke wrote:
> On 5/17/23 00:33, Bart Van Assche wrote:
>> Before dispatching a zoned write from the FIFO list, check whether there
>> are any zoned writes in the RB-tree with a lower LBA for the same zone.
>> This patch ensures that zoned writes happen in order even if at_head is
>> set for some writes for a zone and not for others.
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/mq-deadline.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index 059727fa4b98..67989f8d29a5 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -346,7 +346,7 @@ static struct request *
>>   deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>>   		      enum dd_data_dir data_dir)
>>   {
>> -	struct request *rq;
>> +	struct request *rq, *rb_rq, *next;
>>   	unsigned long flags;
>>   
>>   	if (list_empty(&per_prio->fifo_list[data_dir]))
>> @@ -364,7 +364,12 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>>   	 * zones and these zones are unlocked.
>>   	 */
>>   	spin_lock_irqsave(&dd->zone_lock, flags);
>> -	list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
>> +	list_for_each_entry_safe(rq, next, &per_prio->fifo_list[DD_WRITE],
>> +				 queuelist) {
>> +		/* Check whether a prior request exists for the same zone. */
>> +		rb_rq = deadline_from_pos(per_prio, data_dir, blk_rq_pos(rq));
>> +		if (rb_rq && blk_rq_pos(rb_rq) < blk_rq_pos(rq))
>> +			rq = rb_rq;
>>   		if (blk_req_can_dispatch_to_zone(rq) &&
>>   		    (blk_queue_nonrot(rq->q) ||
>>   		     !deadline_is_seq_write(dd, rq)))
> 
> Similar concern here; we'll have to traverse the entire tree here.
> But if that's of no concern...

Should be fine for HDDs. Not so sure about much faster UFS devices.
And for NVMe ZNS, using a scheduler in itself already halves the max perf you can
get...

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-17  7:41   ` Hannes Reinecke
@ 2023-05-17  7:55     ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17  7:55 UTC (permalink / raw)
  To: Hannes Reinecke, Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 16:41, Hannes Reinecke wrote:
> On 5/17/23 00:33, Bart Van Assche wrote:
>> Make deadline_skip_seq_writes() do what its name suggests, namely to
>> skip sequential writes.
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/mq-deadline.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index 6276afede9cd..dbc0feca963e 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>>   	do {
>>   		pos += blk_rq_sectors(rq);
>>   		rq = deadline_latter_request(rq);
>> -	} while (rq && blk_rq_pos(rq) == pos);
>> +	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));
>>   
>>   	return rq;
>>   }
> 
> Please merge it with the previous patch.

Please no. Let's drop this change.

> 
> Cheers,
> 
> Hannes


* Re: [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock()
  2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
  2023-05-16 23:23   ` Damien Le Moal
  2023-05-17  7:36   ` Hannes Reinecke
@ 2023-05-17 10:00   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Johannes Thumshirn @ 2023-05-17 10:00 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block@vger.kernel.org, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write()
  2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
  2023-05-17  0:01   ` Damien Le Moal
  2023-05-17  7:38   ` Hannes Reinecke
@ 2023-05-17 10:02   ` Johannes Thumshirn
  2 siblings, 0 replies; 50+ messages in thread
From: Johannes Thumshirn @ 2023-05-17 10:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block@vger.kernel.org, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo()
  2023-05-17  1:02   ` Damien Le Moal
@ 2023-05-17 15:01     ` Bart Van Assche
  2023-05-17 22:07       ` Damien Le Moal
  0 siblings, 1 reply; 50+ messages in thread
From: Bart Van Assche @ 2023-05-17 15:01 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/16/23 18:02, Damien Le Moal wrote:
> On 5/17/23 07:33, Bart Van Assche wrote:
>> -	if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
>> -		return 1;
>> -
>> -	return 0;
>> +	return time_is_before_eq_jiffies((unsigned long)rq->fifo_time);
> 
> This looks wrong: isn't this reversing the return value ?
> Shouldn't this be:
> 
> 	return time_after_eq(jiffies, (unsigned long)rq->fifo_time));
> 
> To return true if the first request in fifo list *expired* as indicated by the
> function kdoc comment.

Hi Damien,

From include/linux/jiffies.h:

#define time_is_before_eq_jiffies(a) time_after_eq(jiffies, a)

Hence, time_is_before_eq_jiffies((unsigned long)rq->fifo_time) is 
equivalent to time_after_eq(jiffies, (unsigned long)rq->fifo_time). Both 
expressions check whether or not rq->fifo_time is in the past.

Bart.


* Re: [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-17  1:22   ` Damien Le Moal
@ 2023-05-17 16:28     ` Bart Van Assche
  2023-05-17 22:05       ` Damien Le Moal
  2023-05-18 12:58       ` Damien Le Moal
  0 siblings, 2 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-17 16:28 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/16/23 18:22, Damien Le Moal wrote:
> On 5/17/23 07:33, Bart Van Assche wrote:
>> -/* Return the first request for which blk_rq_pos() >= pos. */
>> +/*
>> + * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
>> + * return the first request after the highest zone start <= @pos.
> 
> This comment is strange. What about:
> 
> For zoned devices, return the first request after the start of the zone
> containing @pos.

I will make this change.

>> @@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>   		 * set expire time and add to fifo list
>>   		 */
>>   		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
>> +		insert_before = &per_prio->fifo_list[data_dir];
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +		/*
>> +		 * Insert zoned writes such that requests are sorted by
>> +		 * position per zone.
>> +		 */
>> +		if (blk_rq_is_seq_zoned_write(rq)) {
>> +			struct request *rq2 = deadline_latter_request(rq);
>> +
>> +			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
>> +				insert_before = &rq2->queuelist;
>> +		}
>> +#endif
>> +		list_add_tail(&rq->queuelist, insert_before);
> 
> Why not:
> 
> 		insert_before = &per_prio->fifo_list[data_dir];
> 		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) 	
> 		    && blk_rq_is_seq_zoned_write(rq)) {
> 			/*
> 			 * Insert zoned writes such that requests are sorted by
> 			 * position per zone.
> 		 	*/
> 			struct request *rq2 = deadline_latter_request(rq);
> 
> 			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
> 				insert_before = &rq2->queuelist;
> 		}
> 		list_add_tail(&rq->queuelist, insert_before);
> 
> to avoid the ugly #ifdef ?

I think the above code won't build because no blk_rq_zone_no() 
definition is available if CONFIG_BLK_DEV_ZONED=n. I prefer it this way 
because my view is that using blk_rq_zone_no() if CONFIG_BLK_DEV_ZONED=n 
is wrong.

Thanks,

Bart.



* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-17  1:06   ` Damien Le Moal
@ 2023-05-17 16:30     ` Bart Van Assche
  2023-05-17 22:15       ` Damien Le Moal
  0 siblings, 1 reply; 50+ messages in thread
From: Bart Van Assche @ 2023-05-17 16:30 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/16/23 18:06, Damien Le Moal wrote:
> On 5/17/23 07:33, Bart Van Assche wrote:
>> Make deadline_skip_seq_writes() do what its name suggests, namely to
>> skip sequential writes.
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/mq-deadline.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index 6276afede9cd..dbc0feca963e 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>>   	do {
>>   		pos += blk_rq_sectors(rq);
>>   		rq = deadline_latter_request(rq);
>> -	} while (rq && blk_rq_pos(rq) == pos);
>> +	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));
> 
> No ! The "seq write" skip here is to skip writes that are contiguous/sequential
> to ensure that we keep issuing contiguous/sequential writes belonging to
> different zones, regardless of the target zone type.
> 
> So drop this change please.

Hi Damien,

I'm fine with dropping this patch. I came up with this patch because it 
surprised me to see that deadline_skip_seq_writes() does not check the 
type of the requests that it is skipping. If e.g. a WRITE is followed by 
two contiguous READs, all three requests are skipped. Is this intentional?

Thanks,

Bart.



* Re: [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes
  2023-05-17  7:47   ` Hannes Reinecke
  2023-05-17  7:53     ` Damien Le Moal
@ 2023-05-17 17:13     ` Bart Van Assche
  1 sibling, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-17 17:13 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei

On 5/17/23 00:47, Hannes Reinecke wrote:
> Similar concern here; we'll have to traverse the entire tree here.
> But if that's of no concern...

Hi Hannes,

If I measure IOPS for a null_blk device instance then I see the 
following performance results for a single CPU core:
* No I/O scheduler:                       690 K IOPS.
* mq-deadline, without this patch series: 147 K IOPS.
* mq-deadline, with this patch series:    146 K IOPS.

In other words, the performance impact of this patch series on the 
mq-deadline scheduler is small.

Thanks,

Bart.


* Re: [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-17 16:28     ` Bart Van Assche
@ 2023-05-17 22:05       ` Damien Le Moal
  2023-05-18 12:58       ` Damien Le Moal
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17 22:05 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/18/23 01:28, Bart Van Assche wrote:
> On 5/16/23 18:22, Damien Le Moal wrote:
>> On 5/17/23 07:33, Bart Van Assche wrote:
>>> -/* Return the first request for which blk_rq_pos() >= pos. */
>>> +/*
>>> + * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
>>> + * return the first request after the highest zone start <= @pos.
>>
>> This comment is strange. What about:
>>
>> For zoned devices, return the first request after the start of the zone
>> containing @pos.
> 
> I will make this change.
> 
>>> @@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>   		 * set expire time and add to fifo list
>>>   		 */
>>>   		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>>> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
>>> +		insert_before = &per_prio->fifo_list[data_dir];
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +		/*
>>> +		 * Insert zoned writes such that requests are sorted by
>>> +		 * position per zone.
>>> +		 */
>>> +		if (blk_rq_is_seq_zoned_write(rq)) {
>>> +			struct request *rq2 = deadline_latter_request(rq);
>>> +
>>> +			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
>>> +				insert_before = &rq2->queuelist;
>>> +		}
>>> +#endif
>>> +		list_add_tail(&rq->queuelist, insert_before);
>>
>> Why not:
>>
>> 		insert_before = &per_prio->fifo_list[data_dir];
>> 		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) 	
>> 		    && blk_rq_is_seq_zoned_write(rq)) {
>> 			/*
>> 			 * Insert zoned writes such that requests are sorted by
>> 			 * position per zone.
>> 		 	*/
>> 			struct request *rq2 = deadline_latter_request(rq);
>>
>> 			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
>> 				insert_before = &rq2->queuelist;
>> 		}
>> 		list_add_tail(&rq->queuelist, insert_before);
>>
>> to avoid the ugly #ifdef ?
> 
> I think the above code won't build because no blk_rq_zone_no() 
> definition is available if CONFIG_BLK_DEV_ZONED=n. I prefer it this way 
> because my view is that using blk_rq_zone_no() if CONFIG_BLK_DEV_ZONED=n 
> is wrong.

The compiler should drop the entire if block in the CONFIG_BLK_DEV_ZONED=n case.
Worth checking, as the code is much nicer without the #ifdef.
I will test this series today and check.

> 
> Thanks,
> 
> Bart.
> 


* Re: [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo()
  2023-05-17 15:01     ` Bart Van Assche
@ 2023-05-17 22:07       ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17 22:07 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/18/23 00:01, Bart Van Assche wrote:
> On 5/16/23 18:02, Damien Le Moal wrote:
>> On 5/17/23 07:33, Bart Van Assche wrote:
>>> -	if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
>>> -		return 1;
>>> -
>>> -	return 0;
>>> +	return time_is_before_eq_jiffies((unsigned long)rq->fifo_time);
>>
>> This looks wrong: isn't this reversing the return value ?
>> Shouldn't this be:
>>
>> 	return time_after_eq(jiffies, (unsigned long)rq->fifo_time));
>>
>> To return true if the first request in fifo list *expired* as indicated by the
>> function kdoc comment.
> 
> Hi Damien,
> 
>  From include/linux/jiffies.h:
> 
> #define time_is_before_eq_jiffies(a) time_after_eq(jiffies, a)

Thanks for clarifying. However, this begs the question: is this macro name correct
at all? Why is "after" changed to "before"? That seems bogus to me at
worst and super confusing at best. This macro should really be:

#define time_after_eq_jiffies(a) time_after_eq(jiffies, a)

> 
> Hence, time_is_before_eq_jiffies((unsigned long)rq->fifo_time) is 
> equivalent to time_after_eq(jiffies, (unsigned long)rq->fifo_time). Both 
> expressions check whether or not rq->fifo_time is in the past.
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-17 16:30     ` Bart Van Assche
@ 2023-05-17 22:15       ` Damien Le Moal
  2023-05-18 18:48         ` Bart Van Assche
  0 siblings, 1 reply; 50+ messages in thread
From: Damien Le Moal @ 2023-05-17 22:15 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/18/23 01:30, Bart Van Assche wrote:
> On 5/16/23 18:06, Damien Le Moal wrote:
>> On 5/17/23 07:33, Bart Van Assche wrote:
>>> Make deadline_skip_seq_writes() do what its name suggests, namely to
>>> skip sequential writes.
>>>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>>   block/mq-deadline.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>> index 6276afede9cd..dbc0feca963e 100644
>>> --- a/block/mq-deadline.c
>>> +++ b/block/mq-deadline.c
>>> @@ -308,7 +308,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>>>   	do {
>>>   		pos += blk_rq_sectors(rq);
>>>   		rq = deadline_latter_request(rq);
>>> -	} while (rq && blk_rq_pos(rq) == pos);
>>> +	} while (rq && blk_rq_pos(rq) == pos && blk_rq_is_seq_zoned_write(rq));
>>
>> No ! The "seq write" skip here is to skip writes that are contiguous/sequential
>> to ensure that we keep issuing contiguous/sequential writes belonging to
>> different zones, regardless of the target zone type.
>>
>> So drop this change please.
> 
> Hi Damien,
> 
> I'm fine with dropping this patch. I came up with this patch because it 
> surprised me to see that deadline_skip_seq_writes() does not check the 
> type of the requests that it is skipping. If e.g. a WRITE is followed by 
> two contiguous READs, all three requests are skipped. Is this intentional?

Hmmm... mq-deadline is not supposed to mix up reads and writes in the same
scheduling batch, and there is one sort_list (rbtree) per data direction. So we
should not be seeing different data directions in this loop, no ?

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly
  2023-05-17 16:28     ` Bart Van Assche
  2023-05-17 22:05       ` Damien Le Moal
@ 2023-05-18 12:58       ` Damien Le Moal
  1 sibling, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2023-05-18 12:58 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/18/23 01:28, Bart Van Assche wrote:
> On 5/16/23 18:22, Damien Le Moal wrote:
>> On 5/17/23 07:33, Bart Van Assche wrote:
>>> -/* Return the first request for which blk_rq_pos() >= pos. */
>>> +/*
>>> + * Return the first request for which blk_rq_pos() >= @pos. For zoned devices,
>>> + * return the first request after the highest zone start <= @pos.
>>
>> This comment is strange. What about:
>>
>> For zoned devices, return the first request after the start of the zone
>> containing @pos.
> 
> I will make this change.
> 
>>> @@ -818,7 +835,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>   		 * set expire time and add to fifo list
>>>   		 */
>>>   		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>>> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
>>> +		insert_before = &per_prio->fifo_list[data_dir];
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +		/*
>>> +		 * Insert zoned writes such that requests are sorted by
>>> +		 * position per zone.
>>> +		 */
>>> +		if (blk_rq_is_seq_zoned_write(rq)) {
>>> +			struct request *rq2 = deadline_latter_request(rq);
>>> +
>>> +			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
>>> +				insert_before = &rq2->queuelist;
>>> +		}
>>> +#endif
>>> +		list_add_tail(&rq->queuelist, insert_before);
>>
>> Why not:
>>
>> 		insert_before = &per_prio->fifo_list[data_dir];
>> 		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) 	
>> 		    && blk_rq_is_seq_zoned_write(rq)) {
>> 			/*
>> 			 * Insert zoned writes such that requests are sorted by
>> 			 * position per zone.
>> 		 	*/
>> 			struct request *rq2 = deadline_latter_request(rq);
>>
>> 			if (rq2 && blk_rq_zone_no(rq2) == blk_rq_zone_no(rq))
>> 				insert_before = &rq2->queuelist;
>> 		}
>> 		list_add_tail(&rq->queuelist, insert_before);
>>
>> to avoid the ugly #ifdef ?
> 
> I think the above code won't build because no blk_rq_zone_no() 
> definition is available if CONFIG_BLK_DEV_ZONED=n. I prefer it this way 
> because my view is that using blk_rq_zone_no() if CONFIG_BLK_DEV_ZONED=n 
> is wrong.

I tried and it does not work. The compiler does not remove the if block for the
!CONFIG_BLK_DEV_ZONED case. Unfortunate.

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes()
  2023-05-17 22:15       ` Damien Le Moal
@ 2023-05-18 18:48         ` Bart Van Assche
  0 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2023-05-18 18:48 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 5/17/23 15:15, Damien Le Moal wrote:
> Hmmm... mq-deadline is not supposed to mix up reads and writes in the same
> scheduling batch, and there is one sort_list (rbtree) per data direction. So we
> should not be seeing different data directions in this loop, no ?

Right, we don't need this patch. I will drop it.

Bart.


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-05-18 18:48 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-16 22:33 [PATCH v5 00/11] mq-deadline: Improve support for zoned block devices Bart Van Assche
2023-05-16 22:33 ` [PATCH v5 01/11] block: Simplify blk_req_needs_zone_write_lock() Bart Van Assche
2023-05-16 23:23   ` Damien Le Moal
2023-05-17  7:36   ` Hannes Reinecke
2023-05-17 10:00   ` Johannes Thumshirn
2023-05-16 22:33 ` [PATCH v5 02/11] block: Fix the type of the second bdev_op_is_zoned_write() argument Bart Van Assche
2023-05-16 23:26   ` Damien Le Moal
2023-05-17  7:37   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 03/11] block: Introduce op_is_zoned_write() Bart Van Assche
2023-05-16 23:30   ` Damien Le Moal
2023-05-17  0:00     ` Bart Van Assche
2023-05-17  6:45       ` Christoph Hellwig
2023-05-17  6:47         ` Damien Le Moal
2023-05-17  7:37   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 04/11] block: Introduce blk_rq_is_seq_zoned_write() Bart Van Assche
2023-05-17  0:01   ` Damien Le Moal
2023-05-17  7:38   ` Hannes Reinecke
2023-05-17 10:02   ` Johannes Thumshirn
2023-05-16 22:33 ` [PATCH v5 05/11] block: mq-deadline: Clean up deadline_check_fifo() Bart Van Assche
2023-05-17  1:02   ` Damien Le Moal
2023-05-17 15:01     ` Bart Van Assche
2023-05-17 22:07       ` Damien Le Moal
2023-05-17  7:39   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 06/11] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
2023-05-17  7:40   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 07/11] block: mq-deadline: Improve deadline_skip_seq_writes() Bart Van Assche
2023-05-17  1:06   ` Damien Le Moal
2023-05-17 16:30     ` Bart Van Assche
2023-05-17 22:15       ` Damien Le Moal
2023-05-18 18:48         ` Bart Van Assche
2023-05-17  7:41   ` Hannes Reinecke
2023-05-17  7:55     ` Damien Le Moal
2023-05-16 22:33 ` [PATCH v5 08/11] block: mq-deadline: Reduce lock contention Bart Van Assche
2023-05-17  1:07   ` Damien Le Moal
2023-05-17  6:46   ` Christoph Hellwig
2023-05-17  7:42   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 09/11] block: mq-deadline: Track the dispatch position Bart Van Assche
2023-05-17  1:13   ` Damien Le Moal
2023-05-17  7:45   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 10/11] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
2023-05-17  1:22   ` Damien Le Moal
2023-05-17 16:28     ` Bart Van Assche
2023-05-17 22:05       ` Damien Le Moal
2023-05-18 12:58       ` Damien Le Moal
2023-05-17  7:46   ` Hannes Reinecke
2023-05-16 22:33 ` [PATCH v5 11/11] block: mq-deadline: Fix handling of at-head zoned writes Bart Van Assche
2023-05-17  1:24   ` Damien Le Moal
2023-05-17  7:47   ` Hannes Reinecke
2023-05-17  7:53     ` Damien Le Moal
2023-05-17 17:13     ` Bart Van Assche
