linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS
@ 2023-07-26  0:57 Bart Van Assche
  2023-07-26  0:57 ` [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche

Hi Jens,

This patch series improves write performance for zoned UFS devices. Please
consider these patches for the next merge window.

Thank you,

Bart.

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (5):
  block: Introduce a request queue flag for pipelining zoned writes
  block/mq-deadline: Only use zone locking if necessary
  block/null_blk: Add support for pipelining zoned writes
  scsi: Retry unaligned zoned writes
  scsi: ufs: Enable zoned write pipelining

Bart Van Assche (6):
  block: Introduce the flag REQ_NO_WRITE_LOCK
  block/mq-deadline: Only use zone locking if necessary
  block/null_blk: Support disabling zone write locking
  scsi: Retry unaligned zoned writes
  scsi: ufs: Disable zone write locking
  fs/f2fs: Disable zone write locking

 block/blk-flush.c                 |  3 ++-
 block/mq-deadline.c               | 28 ++++++++++++++-----
 drivers/block/null_blk/main.c     |  2 ++
 drivers/block/null_blk/null_blk.h |  1 +
 drivers/block/null_blk/zoned.c    |  3 +++
 drivers/scsi/scsi_error.c         | 37 +++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c           |  1 +
 drivers/scsi/sd.c                 |  3 +++
 drivers/ufs/core/ufshcd.c         | 45 ++++++++++++++++++++++++++++---
 fs/f2fs/data.c                    |  1 +
 include/linux/blk_types.h         |  8 ++++++
 include/scsi/scsi.h               |  1 +
 12 files changed, 123 insertions(+), 10 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  2023-07-26  8:37   ` Damien Le Moal
  2023-07-26  0:57 ` [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Damien Le Moal, Ming Lei

Not all software that supports zoned storage allocates and submits zoned
writes in LBA order per zone. Introduce the REQ_NO_WRITE_LOCK flag such
that submitters of zoned writes can indicate that zoned writes are
allocated and submitted in LBA order per zone.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-flush.c         | 3 ++-
 include/linux/blk_types.h | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..038350ed7cce 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -336,7 +336,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		flush_rq->internal_tag = first_rq->internal_tag;
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
-	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
+	flush_rq->cmd_flags |= flags &
+		(REQ_FAILFAST_MASK | REQ_NO_ZONE_WRITE_LOCK | REQ_DRV);
 	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
 	flush_rq->end_io = flush_end_io;
 	/*
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0bad62cca3d0..68f7934d8fa6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -416,6 +416,12 @@ enum req_flag_bits {
 	__REQ_PREFLUSH,		/* request for cache flush */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
+	/*
+	 * Do not use zone write locking. Setting this flag is only safe if
+	 * the request submitter allocates and submit requests in LBA order per
+	 * zone.
+	 */
+	__REQ_NO_ZONE_WRITE_LOCK,
 	__REQ_NOWAIT,           /* Don't wait if request will block */
 	__REQ_POLLED,		/* caller polls for completion using bio_poll */
 	__REQ_ALLOC_CACHE,	/* allocate IO from cache if available */
@@ -448,6 +454,8 @@ enum req_flag_bits {
 #define REQ_PREFLUSH	(__force blk_opf_t)(1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD	(__force blk_opf_t)(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND	(__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
+#define REQ_NO_ZONE_WRITE_LOCK	\
+			(__force blk_opf_t)(1ULL << __REQ_NO_ZONE_WRITE_LOCK)
 #define REQ_NOWAIT	(__force blk_opf_t)(1ULL << __REQ_NOWAIT)
 #define REQ_POLLED	(__force blk_opf_t)(1ULL << __REQ_POLLED)
 #define REQ_ALLOC_CACHE	(__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
  2023-07-26  0:57 ` [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  2023-07-26  8:41   ` Damien Le Moal
  2023-07-26  0:57 ` [PATCH v3 3/6] block/null_blk: Support disabling zone write locking Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Damien Le Moal, Ming Lei

Measurements have shown that limiting the queue depth to one for zoned
writes has a significant negative performance impact on zoned UFS devices.
Hence this patch that disables zone locking by the mq-deadline scheduler
if zoned writes are submitted in order and if the storage controller
preserves the command order. This patch is based on the following
assumptions:
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
  submits write requests per zone in LBA order.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916ba62ee..ce5b5048935e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -338,6 +338,18 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 	return rq;
 }
 
+/*
+ * Use write locking if either QUEUE_FLAG_NO_ZONE_WRITE_LOCK or
+ * REQ_NO_ZONE_WRITE_LOCK has not been set. Not using zone write locking is
+ * only safe if the submitter allocates and submit requests in LBA order per
+ * zone and if the block driver preserves the request order.
+ */
+static bool dd_use_write_locking(struct request *rq)
+{
+	return !blk_queue_no_zone_write_lock(rq->q) ||
+		!(rq->cmd_flags & REQ_NO_ZONE_WRITE_LOCK);
+}
+
 /*
  * For the specified data direction, return the next request to
  * dispatch using arrival ordered lists.
@@ -353,7 +365,8 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;
 
 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    !dd_use_write_locking(rq))
 		return rq;
 
 	/*
@@ -398,7 +411,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    !dd_use_write_locking(rq))
 		return rq;
 
 	/*
@@ -526,8 +540,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}
 
 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -552,7 +567,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	/*
 	 * If the request needs its target zone locked, do it.
 	 */
-	blk_req_zone_write_lock(rq);
+	if (dd_use_write_locking(rq))
+		blk_req_zone_write_lock(rq);
 	rq->rq_flags |= RQF_STARTED;
 	return rq;
 }
@@ -933,7 +949,7 @@ static void dd_finish_request(struct request *rq)
 
 	atomic_inc(&per_prio->stats.completed);
 
-	if (blk_queue_is_zoned(q)) {
+	if (blk_queue_is_zoned(rq->q) && dd_use_write_locking(rq)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&dd->zone_lock, flags);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/6] block/null_blk: Support disabling zone write locking
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
  2023-07-26  0:57 ` [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK Bart Van Assche
  2023-07-26  0:57 ` [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  2023-07-26  8:43   ` Damien Le Moal
  2023-07-26  0:57 ` [PATCH v3 4/6] scsi: Retry unaligned zoned writes Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Damien Le Moal, Ming Lei, Chaitanya Kulkarni, Damien Le Moal,
	Nitesh Shetty, Vincent Fu, Dan Carpenter, Akinobu Mita,
	Shin'ichiro Kawasaki, Martin K. Petersen, Johannes Thumshirn

Add a new configfs attribute for disabling zone write locking. The test
script below reports 250 K IOPS with no I/O scheduler, 6 K IOPS with
mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
and write locking disabled. This shows that disabling write locking
results in about 20 times more IOPS for this particular test case.

    #!/bin/bash

    for mode in "none 0" "mq-deadline 0" "mq-deadline 1"; do
        set +e
        for d in /sys/kernel/config/nullb/*; do
            [ -d "$d" ] && rmdir "$d"
        done
        modprobe -r null_blk
        set -e
        read -r iosched no_write_locking <<<"$mode"
        modprobe null_blk nr_devices=0
        (
            cd /sys/kernel/config/nullb
            mkdir nullb0
            cd nullb0
            params=(
                completion_nsec=100000
                hw_queue_depth=64
                irqmode=2                # NULL_IRQ_TIMER
                max_sectors=$((4096/512))
                memory_backed=1
                no_zone_write_lock="${no_write_locking}"
                size=1
                submit_queues=1
                zone_size=1
                zoned=1
                power=1
            )
            for p in "${params[@]}"; do
                echo "${p//*=}" > "${p//=*}"
            done
        )
        udevadm settle
        dev=/dev/nullb0
        [ -b "${dev}" ]
        params=(
            --direct=1
            --filename="${dev}"
            --iodepth=64
            --iodepth_batch=16
            --ioengine=io_uring
            --ioscheduler="${iosched}"
            --gtod_reduce=1
            --hipri=0
            --name=nullb0
            --runtime=30
            --rw=write
            --time_based=1
            --zonemode=zbd
        )
        fio "${params[@]}"
    done

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c     | 2 ++
 drivers/block/null_blk/null_blk.h | 1 +
 drivers/block/null_blk/zoned.c    | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 864013019d6b..5c0578137f51 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
 NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
+NULLB_DEVICE_ATTR(no_zone_write_lock, bool, NULL);
 NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
 NULLB_DEVICE_ATTR(no_sched, bool, NULL);
 NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
@@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_zone_max_active,
 	&nullb_device_attr_zone_readonly,
 	&nullb_device_attr_zone_offline,
+	&nullb_device_attr_no_zone_write_lock,
 	&nullb_device_attr_virt_boundary,
 	&nullb_device_attr_no_sched,
 	&nullb_device_attr_shared_tag_bitmap,
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 929f659dd255..b521096bcc3f 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -117,6 +117,7 @@ struct nullb_device {
 	bool memory_backed; /* if data is stored in memory */
 	bool discard; /* if support discard */
 	bool zoned; /* if device is zoned */
+	bool no_zone_write_lock;
 	bool virt_boundary; /* virtual boundary on/off for the device */
 	bool no_sched; /* no IO scheduler for the device */
 	bool shared_tag_bitmap; /* use hostwide shared tags */
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 55c5b48bc276..31c8364a63e9 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -96,6 +96,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 
 	spin_lock_init(&dev->zone_res_lock);
 
+	if (dev->no_zone_write_lock)
+		blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+
 	if (dev->zone_nr_conv >= dev->nr_zones) {
 		dev->zone_nr_conv = dev->nr_zones - 1;
 		pr_info("changed the number of conventional zones to %u",

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/6] scsi: Retry unaligned zoned writes
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-07-26  0:57 ` [PATCH v3 3/6] block/null_blk: Support disabling zone write locking Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  2023-07-26  8:47   ` Damien Le Moal
  2023-07-26  0:57 ` [PATCH v3 5/6] scsi: ufs: Disable zone write locking Bart Van Assche
  2023-07-26  0:57 ` [PATCH v3 6/6] fs/f2fs: " Bart Van Assche
  5 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Martin K . Petersen, Damien Le Moal, Ming Lei,
	James E.J. Bottomley

From ZBC-2: "The device server terminates with CHECK CONDITION status, with
the sense key set to ILLEGAL REQUEST, and the additional sense code set to
UNALIGNED WRITE COMMAND a write command, other than an entire medium write
same command, that specifies: a) the starting LBA in a sequential write
required zone set to a value that is not equal to the write pointer for
that sequential write required zone; or b) an ending LBA that is not equal
to the last logical block within a physical block (see SBC-5)."

Send commands that failed with an unaligned write error to the SCSI error
handler. Let the SCSI error handler sort SCSI commands per LBA before
resubmitting these.

Increase the number of retries for write commands sent to a sequential
zone to the maximum number of outstanding commands.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 37 +++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  3 +++
 include/scsi/scsi.h       |  1 +
 4 files changed, 42 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..2b9aec05dc36 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/jiffies.h>
+#include <linux/list_sort.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -698,6 +699,17 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This indicates that zoned writes
+		 * have been received by the device in the wrong order. If zone
+		 * write locking is disabled, retry after all pending commands
+		 * have completed.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    blk_queue_no_zone_write_lock(sdev->request_queue) &&
+		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd))
+			return NEEDS_DELAYED_RETRY;
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
@@ -2223,6 +2235,25 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 }
 EXPORT_SYMBOL(scsi_eh_flush_done_q);
 
+/*
+ * Returns a negative value if @_a has a lower LBA than @_b, zero if
+ * both have the same LBA and a positive value otherwise.
+ */
+static int scsi_cmp_lba(void *priv, const struct list_head *_a,
+			const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
+	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
+
+	if (pos_a < pos_b)
+		return -1;
+	if (pos_a > pos_b)
+		return 1;
+	return 0;
+}
+
 /**
  * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
  * @shost:	Host to unjam.
@@ -2258,6 +2289,12 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
+	/*
+	 * Sort pending SCSI commands in LBA order. This is important if zone
+	 * write locking is disabled for a zoned SCSI device.
+	 */
+	list_sort(NULL, &eh_work_q, scsi_cmp_lba);
+
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
 		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 59176946ab56..69da8aee13df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
 	case ADD_TO_MLQUEUE:
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 		break;
+	case NEEDS_DELAYED_RETRY:
 	default:
 		scsi_eh_scmd_add(cmd);
 		break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..27b9ebe05b90 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1235,6 +1235,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	if (blk_queue_no_zone_write_lock(rq->q) &&
+	    blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 
 	SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index ec093594ba53..6600db046227 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
  * Internal return values.
  */
 enum scsi_disposition {
+	NEEDS_DELAYED_RETRY	= 0x2000,
 	NEEDS_RETRY		= 0x2001,
 	SUCCESS			= 0x2002,
 	FAILED			= 0x2003,

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/6] scsi: ufs: Disable zone write locking
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-07-26  0:57 ` [PATCH v3 4/6] scsi: Retry unaligned zoned writes Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  2023-07-26 12:04   ` kernel test robot
  2023-07-26  0:57 ` [PATCH v3 6/6] fs/f2fs: " Bart Van Assche
  5 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Martin K . Petersen, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Can Guo, Asutosh Das,
	Bao D. Nguyen, Bean Huo, Arthur Simchaev

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one
  command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller.

This patch improves small write IOPS with a factor four on my test
setup.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 45 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..0f7f91e2cda9 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4337,29 +4337,67 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
 
+void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
+				      bool set_no_zone_write_lock)
+{
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	shost_for_each_device(sdev, hba->host) {
+		struct request_queue *q = sdev->request_queue;
+
+		blk_mq_freeze_queue_wait(q);
+		if (set_no_zone_write_lock)
+			blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+		else
+			blk_queue_flag_clear(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+		blk_mq_unfreeze_queue(q);
+	}
+}
+
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
 	unsigned long flags;
-	bool update = false;
+	bool prev_state, new_state, update = false;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	prev_state = ufshcd_is_auto_hibern8_enabled(hba);
 	if (hba->ahit != ahit) {
 		hba->ahit = ahit;
 		update = true;
 	}
+	new_state = ufshcd_is_auto_hibern8_enabled(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	if (update &&
-	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
+	if (!update)
+		return;
+	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
+		/*
+		 * Auto-hibernation will be enabled. Enable write locking for
+		 * zoned writes since auto-hibernation may cause reordering of
+		 * zoned writes when using the legacy mode of the UFS host
+		 * controller.
+		 */
+		ufshcd_update_no_zone_write_lock(hba, false);
+	}
+	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
 		ufshcd_auto_hibern8_enable(hba);
 		ufshcd_release(hba);
 		ufshcd_rpm_put_sync(hba);
 	}
+	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
+		/*
+		 * Auto-hibernation has been disabled. Disable write locking
+		 * for zoned writes.
+		 */
+		ufshcd_update_no_zone_write_lock(hba, true);
+	}
 }
 EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
 
@@ -5139,6 +5177,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 
 	ufshcd_hpb_configure(hba, sdev);
 
+	blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
 		blk_queue_update_dma_alignment(q, SZ_4K - 1);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 6/6] fs/f2fs: Disable zone write locking
  2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-07-26  0:57 ` [PATCH v3 5/6] scsi: ufs: Disable zone write locking Bart Van Assche
@ 2023-07-26  0:57 ` Bart Van Assche
  5 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Damien Le Moal, Ming Lei, Chao Yu

Set the REQ_NO_ZONE_WRITE_LOCK flag to inform the block layer that F2FS
allocates and submits zoned writes in LBA order per zone.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/f2fs/data.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5882afe71d82..6361553f4ab1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -569,6 +569,7 @@ static void f2fs_submit_write_bio(struct f2fs_sb_info *sbi, struct bio *bio,
 		}
 	}
 
+	bio->bi_opf |= REQ_NO_ZONE_WRITE_LOCK;
 	trace_f2fs_submit_write_bio(sbi->sb, type, bio);
 	iostat_update_submit_ctx(bio, type);
 	submit_bio(bio);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK
  2023-07-26  0:57 ` [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK Bart Van Assche
@ 2023-07-26  8:37   ` Damien Le Moal
  2023-07-26 14:58     ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2023-07-26  8:37 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei

On 7/26/23 09:57, Bart Van Assche wrote:
> Not all software that supports zoned storage allocates and submits zoned
> writes in LBA order per zone. Introduce the REQ_NO_WRITE_LOCK flag such
> that submitters of zoned writes can indicate that zoned writes are
> allocated and submitted in LBA order per zone.

That is absolutely NOT true. *ALL* in-kernel and application software supporting
zoned block devices issue writes sequentially. Otherwise, they are considered
buggy. The justification for zone write locking is to preserve sequential write
streams in the presence of storage adapters that do not preserve command orders,
or if the low level driver requeue write commands. If both of these conditions
are false (e.g. UFS), with the help of proper scsi requeue handling, zone write
locking can be disabled. REQ_NO_WRITE_LOCK indicates that a write request does
not need zone write locking when the underlying device signaled that it can
preserve command ordering. So we also need a queue flag for that...

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-flush.c         | 3 ++-
>  include/linux/blk_types.h | 8 ++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e73dc22d05c1..038350ed7cce 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -336,7 +336,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
>  		flush_rq->internal_tag = first_rq->internal_tag;
>  
>  	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
> +	flush_rq->cmd_flags |= flags &
> +		(REQ_FAILFAST_MASK | REQ_NO_ZONE_WRITE_LOCK | REQ_DRV);

Why ? I do not see why this change is necessary. If it is, the commit message
should mention it and this change should probably be its own patch.

>  	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
>  	flush_rq->end_io = flush_end_io;
>  	/*
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0bad62cca3d0..68f7934d8fa6 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -416,6 +416,12 @@ enum req_flag_bits {
>  	__REQ_PREFLUSH,		/* request for cache flush */
>  	__REQ_RAHEAD,		/* read ahead, can fail anytime */
>  	__REQ_BACKGROUND,	/* background IO */
> +	/*
> +	 * Do not use zone write locking. Setting this flag is only safe if
> +	 * the request submitter allocates and submit requests in LBA order per
> +	 * zone.

...if the request submitter submits write requests sequentially per zone.

> +	 */
> +	__REQ_NO_ZONE_WRITE_LOCK,
>  	__REQ_NOWAIT,           /* Don't wait if request will block */
>  	__REQ_POLLED,		/* caller polls for completion using bio_poll */
>  	__REQ_ALLOC_CACHE,	/* allocate IO from cache if available */
> @@ -448,6 +454,8 @@ enum req_flag_bits {
>  #define REQ_PREFLUSH	(__force blk_opf_t)(1ULL << __REQ_PREFLUSH)
>  #define REQ_RAHEAD	(__force blk_opf_t)(1ULL << __REQ_RAHEAD)
>  #define REQ_BACKGROUND	(__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
> +#define REQ_NO_ZONE_WRITE_LOCK	\
> +			(__force blk_opf_t)(1ULL << __REQ_NO_ZONE_WRITE_LOCK)
>  #define REQ_NOWAIT	(__force blk_opf_t)(1ULL << __REQ_NOWAIT)
>  #define REQ_POLLED	(__force blk_opf_t)(1ULL << __REQ_POLLED)
>  #define REQ_ALLOC_CACHE	(__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE)

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary
  2023-07-26  0:57 ` [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-07-26  8:41   ` Damien Le Moal
  0 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2023-07-26  8:41 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei

On 7/26/23 09:57, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one for zoned
> writes has a significant negative performance impact on zoned UFS devices.
> Hence this patch that disables zone locking by the mq-deadline scheduler
> if zoned writes are submitted in order and if the storage controller
> preserves the command order. This patch is based on the following
> assumptions:

if zoned writes are submitted in order and if the storage... -> if the storage
controller preserves...

> - It happens infrequently that zoned write requests are reordered by the
>   block layer.
> - The I/O priority of all write requests is the same per zone.
> - Either no I/O scheduler is used or an I/O scheduler is used that
>   submits write requests per zone in LBA order.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Please use dlemoal@kernel.org

> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916ba62ee..ce5b5048935e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -338,6 +338,18 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>  	return rq;
>  }
>  
> +/*
> + * Use write locking if either QUEUE_FLAG_NO_ZONE_WRITE_LOCK or
> + * REQ_NO_ZONE_WRITE_LOCK has not been set. Not using zone write locking is
> + * only safe if the submitter allocates and submit requests in LBA order per
> + * zone and if the block driver preserves the request order.
> + */
> +static bool dd_use_write_locking(struct request *rq)
> +{
> +	return !blk_queue_no_zone_write_lock(rq->q) ||
> +		!(rq->cmd_flags & REQ_NO_ZONE_WRITE_LOCK);
> +}
> +
>  /*
>   * For the specified data direction, return the next request to
>   * dispatch using arrival ordered lists.
> @@ -353,7 +365,8 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		return NULL;
>  
>  	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
> +	    !dd_use_write_locking(rq))

The test for !blk_queue_is_zoned(rq->q) can be moved inside
dd_use_write_locking() I think. That will avoid repeating it again below.

>  		return rq;
>  
>  	/*
> @@ -398,7 +411,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	if (!rq)
>  		return NULL;
>  
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
> +	    !dd_use_write_locking(rq))
>  		return rq;
>  
>  	/*
> @@ -526,8 +540,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	}
>  
>  	/*
> -	 * For a zoned block device, if we only have writes queued and none of
> -	 * them can be dispatched, rq will be NULL.
> +	 * For a zoned block device that requires write serialization, if we
> +	 * only have writes queued and none of them can be dispatched, rq will
> +	 * be NULL.
>  	 */
>  	if (!rq)
>  		return NULL;
> @@ -552,7 +567,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	/*
>  	 * If the request needs its target zone locked, do it.
>  	 */
> -	blk_req_zone_write_lock(rq);
> +	if (dd_use_write_locking(rq))
> +		blk_req_zone_write_lock(rq);
>  	rq->rq_flags |= RQF_STARTED;
>  	return rq;
>  }
> @@ -933,7 +949,7 @@ static void dd_finish_request(struct request *rq)
>  
>  	atomic_inc(&per_prio->stats.completed);
>  
> -	if (blk_queue_is_zoned(q)) {
> +	if (blk_queue_is_zoned(rq->q) && dd_use_write_locking(rq)) {
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&dd->zone_lock, flags);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/6] block/null_blk: Support disabling zone write locking
  2023-07-26  0:57 ` [PATCH v3 3/6] block/null_blk: Support disabling zone write locking Bart Van Assche
@ 2023-07-26  8:43   ` Damien Le Moal
  0 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2023-07-26  8:43 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Damien Le Moal,
	Ming Lei, Chaitanya Kulkarni, Nitesh Shetty, Vincent Fu,
	Dan Carpenter, Akinobu Mita, Shin'ichiro Kawasaki,
	Martin K. Petersen, Johannes Thumshirn

On 7/26/23 09:57, Bart Van Assche wrote:
> Add a new configfs attribute for disabling zone write locking. The test
> script below reports 250 K IOPS with no I/O scheduler, 6 K IOPS with
> mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
> and write locking disabled. This shows that disabling write locking
> results in about 20 times more IOPS for this particular test case.
> 
>     #!/bin/bash
> 
>     for mode in "none 0" "mq-deadline 0" "mq-deadline 1"; do
>         set +e
>         for d in /sys/kernel/config/nullb/*; do
>             [ -d "$d" ] && rmdir "$d"
>         done
>         modprobe -r null_blk
>         set -e
>         read -r iosched no_write_locking <<<"$mode"
>         modprobe null_blk nr_devices=0
>         (
>             cd /sys/kernel/config/nullb
>             mkdir nullb0
>             cd nullb0
>             params=(
>                 completion_nsec=100000
>                 hw_queue_depth=64
>                 irqmode=2                # NULL_IRQ_TIMER
>                 max_sectors=$((4096/512))
>                 memory_backed=1
>                 no_zone_write_lock="${no_write_locking}"
>                 size=1
>                 submit_queues=1
>                 zone_size=1
>                 zoned=1
>                 power=1
>             )
>             for p in "${params[@]}"; do
>                 echo "${p//*=}" > "${p//=*}"
>             done
>         )
>         udevadm settle
>         dev=/dev/nullb0
>         [ -b "${dev}" ]
>         params=(
>             --direct=1
>             --filename="${dev}"
>             --iodepth=64
>             --iodepth_batch=16
>             --ioengine=io_uring
>             --ioscheduler="${iosched}"
>             --gtod_reduce=1
>             --hipri=0
>             --name=nullb0
>             --runtime=30
>             --rw=write
>             --time_based=1
>             --zonemode=zbd
>         )
>         fio "${params[@]}"
>     done
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/block/null_blk/main.c     | 2 ++
>  drivers/block/null_blk/null_blk.h | 1 +
>  drivers/block/null_blk/zoned.c    | 3 +++
>  3 files changed, 6 insertions(+)
> 
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 864013019d6b..5c0578137f51 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
>  NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
>  NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
>  NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
> +NULLB_DEVICE_ATTR(no_zone_write_lock, bool, NULL);
>  NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
>  NULLB_DEVICE_ATTR(no_sched, bool, NULL);
>  NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
> @@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
>  	&nullb_device_attr_zone_max_active,
>  	&nullb_device_attr_zone_readonly,
>  	&nullb_device_attr_zone_offline,
> +	&nullb_device_attr_no_zone_write_lock,
>  	&nullb_device_attr_virt_boundary,
>  	&nullb_device_attr_no_sched,
>  	&nullb_device_attr_shared_tag_bitmap,
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 929f659dd255..b521096bcc3f 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -117,6 +117,7 @@ struct nullb_device {
>  	bool memory_backed; /* if data is stored in memory */
>  	bool discard; /* if support discard */
>  	bool zoned; /* if device is zoned */
> +	bool no_zone_write_lock;
>  	bool virt_boundary; /* virtual boundary on/off for the device */
>  	bool no_sched; /* no IO scheduler for the device */
>  	bool shared_tag_bitmap; /* use hostwide shared tags */
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index 55c5b48bc276..31c8364a63e9 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -96,6 +96,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
>  
>  	spin_lock_init(&dev->zone_res_lock);
>  
> +	if (dev->no_zone_write_lock)
> +		blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);

This patch look OK, but there is no patch introducing
QUEUE_FLAG_NO_ZONE_WRITE_LOCK... Patch 1 only introduced REQ_NO_ZONE_WRITE_LOCK.

So the series as is will not compile.

> +
>  	if (dev->zone_nr_conv >= dev->nr_zones) {
>  		dev->zone_nr_conv = dev->nr_zones - 1;
>  		pr_info("changed the number of conventional zones to %u",

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] scsi: Retry unaligned zoned writes
  2023-07-26  0:57 ` [PATCH v3 4/6] scsi: Retry unaligned zoned writes Bart Van Assche
@ 2023-07-26  8:47   ` Damien Le Moal
  2023-07-26 15:02     ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2023-07-26  8:47 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Martin K . Petersen,
	Damien Le Moal, Ming Lei, James E.J. Bottomley

On 7/26/23 09:57, Bart Van Assche wrote:
> From ZBC-2: "The device server terminates with CHECK CONDITION status, with
> the sense key set to ILLEGAL REQUEST, and the additional sense code set to
> UNALIGNED WRITE COMMAND a write command, other than an entire medium write
> same command, that specifies: a) the starting LBA in a sequential write
> required zone set to a value that is not equal to the write pointer for
> that sequential write required zone; or b) an ending LBA that is not equal
> to the last logical block within a physical block (see SBC-5)."
> 
> Send commands that failed with an unaligned write error to the SCSI error
> handler. Let the SCSI error handler sort SCSI commands per LBA before
> resubmitting these.
> 
> Increase the number of retries for write commands sent to a sequential
> zone to the maximum number of outstanding commands.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 37 +++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_lib.c   |  1 +
>  drivers/scsi/sd.c         |  3 +++
>  include/scsi/scsi.h       |  1 +
>  4 files changed, 42 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index c67cdcdc3ba8..2b9aec05dc36 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -27,6 +27,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/delay.h>
>  #include <linux/jiffies.h>
> +#include <linux/list_sort.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -698,6 +699,17 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		fallthrough;
>  
>  	case ILLEGAL_REQUEST:
> +		/*
> +		 * Unaligned write command. This indicates that zoned writes
> +		 * have been received by the device in the wrong order. If zone
> +		 * write locking is disabled, retry after all pending commands
> +		 * have completed.
> +		 */
> +		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
> +		    blk_queue_no_zone_write_lock(sdev->request_queue) &&
> +		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd))
> +			return NEEDS_DELAYED_RETRY;
> +
>  		if (sshdr.asc == 0x20 || /* Invalid command operation code */
>  		    sshdr.asc == 0x21 || /* Logical block address out of range */
>  		    sshdr.asc == 0x22 || /* Invalid function */
> @@ -2223,6 +2235,25 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>  }
>  EXPORT_SYMBOL(scsi_eh_flush_done_q);
>  
> +/*
> + * Returns a negative value if @_a has a lower LBA than @_b, zero if
> + * both have the same LBA and a positive value otherwise.
> + */
> +static int scsi_cmp_lba(void *priv, const struct list_head *_a,
> +			const struct list_head *_b)

The argument priv is unused.

> +{
> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
> +	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
> +	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
> +
> +	if (pos_a < pos_b)
> +		return -1;
> +	if (pos_a > pos_b)
> +		return 1;
> +	return 0;
> +}
> +
>  /**
>   * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
>   * @shost:	Host to unjam.
> @@ -2258,6 +2289,12 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>  
>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>  
> +	/*
> +	 * Sort pending SCSI commands in LBA order. This is important if zone
> +	 * write locking is disabled for a zoned SCSI device.
> +	 */
> +	list_sort(NULL, &eh_work_q, scsi_cmp_lba);

Should we do this only for zoned devices ?

> +
>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
>  		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 59176946ab56..69da8aee13df 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
>  	case ADD_TO_MLQUEUE:
>  		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  		break;
> +	case NEEDS_DELAYED_RETRY:
>  	default:
>  		scsi_eh_scmd_add(cmd);
>  		break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 68b12afa0721..27b9ebe05b90 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1235,6 +1235,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	cmd->transfersize = sdp->sector_size;
>  	cmd->underflow = nr_blocks << 9;
>  	cmd->allowed = sdkp->max_retries;
> +	if (blk_queue_no_zone_write_lock(rq->q) &&
> +	    blk_rq_is_seq_zoned_write(rq))
> +		cmd->allowed += rq->q->nr_requests;

Aouch... that could be a lot...

>  	cmd->sdb.length = nr_blocks * sdp->sector_size;
>  
>  	SCSI_LOG_HLQUEUE(1,
> diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
> index ec093594ba53..6600db046227 100644
> --- a/include/scsi/scsi.h
> +++ b/include/scsi/scsi.h
> @@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
>   * Internal return values.
>   */
>  enum scsi_disposition {
> +	NEEDS_DELAYED_RETRY	= 0x2000,
>  	NEEDS_RETRY		= 0x2001,
>  	SUCCESS			= 0x2002,
>  	FAILED			= 0x2003,

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 5/6] scsi: ufs: Disable zone write locking
  2023-07-26  0:57 ` [PATCH v3 5/6] scsi: ufs: Disable zone write locking Bart Van Assche
@ 2023-07-26 12:04   ` kernel test robot
  0 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2023-07-26 12:04 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: llvm, oe-kbuild-all, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Bart Van Assche, Martin K . Petersen, Avri Altman, Damien Le Moal,
	Ming Lei, James E.J. Bottomley, Stanley Chu, Can Guo, Asutosh Das,
	Bao D. Nguyen, Bean Huo, Arthur Simchaev

Hi Bart,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on jejb-scsi/for-next linus/master v6.5-rc3]
[cannot apply to mkp-scsi/for-next next-20230726]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bart-Van-Assche/block-Introduce-the-flag-REQ_NO_WRITE_LOCK/20230726-085936
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230726005742.303865-6-bvanassche%40acm.org
patch subject: [PATCH v3 5/6] scsi: ufs: Disable zone write locking
config: s390-randconfig-r044-20230726 (https://download.01.org/0day-ci/archive/20230726/202307261940.mazVXKtc-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230726/202307261940.mazVXKtc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307261940.mazVXKtc-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/ufs/core/ufshcd.c:25:
   In file included from include/scsi/scsi_cmnd.h:5:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
     102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
         |                                                      ^
   In file included from drivers/ufs/core/ufshcd.c:25:
   In file included from include/scsi/scsi_cmnd.h:5:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
         |                                                           ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
     115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
         |                                                      ^
   In file included from drivers/ufs/core/ufshcd.c:25:
   In file included from include/scsi/scsi_cmnd.h:5:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> drivers/ufs/core/ufshcd.c:4352:23: error: use of undeclared identifier 'QUEUE_FLAG_NO_ZONE_WRITE_LOCK'; did you mean '__REQ_NO_ZONE_WRITE_LOCK'?
    4352 |                         blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
         |                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                                            __REQ_NO_ZONE_WRITE_LOCK
   include/linux/blk_types.h:424:2: note: '__REQ_NO_ZONE_WRITE_LOCK' declared here
     424 |         __REQ_NO_ZONE_WRITE_LOCK,
         |         ^
   drivers/ufs/core/ufshcd.c:4354:25: error: use of undeclared identifier 'QUEUE_FLAG_NO_ZONE_WRITE_LOCK'; did you mean '__REQ_NO_ZONE_WRITE_LOCK'?
    4354 |                         blk_queue_flag_clear(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
         |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                                              __REQ_NO_ZONE_WRITE_LOCK
   include/linux/blk_types.h:424:2: note: '__REQ_NO_ZONE_WRITE_LOCK' declared here
     424 |         __REQ_NO_ZONE_WRITE_LOCK,
         |         ^
>> drivers/ufs/core/ufshcd.c:4340:6: warning: no previous prototype for function 'ufshcd_update_no_zone_write_lock' [-Wmissing-prototypes]
    4340 | void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
         |      ^
   drivers/ufs/core/ufshcd.c:4340:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    4340 | void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
         | ^
         | static 
   drivers/ufs/core/ufshcd.c:5180:21: error: use of undeclared identifier 'QUEUE_FLAG_NO_ZONE_WRITE_LOCK'; did you mean '__REQ_NO_ZONE_WRITE_LOCK'?
    5180 |         blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
         |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                            __REQ_NO_ZONE_WRITE_LOCK
   include/linux/blk_types.h:424:2: note: '__REQ_NO_ZONE_WRITE_LOCK' declared here
     424 |         __REQ_NO_ZONE_WRITE_LOCK,
         |         ^
   drivers/ufs/core/ufshcd.c:10277:44: warning: shift count >= width of type [-Wshift-count-overflow]
    10277 |                 if (!dma_set_mask_and_coherent(hba->dev, DMA_BIT_MASK(64)))
          |                                                          ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:77:54: note: expanded from macro 'DMA_BIT_MASK'
      77 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
         |                                                      ^ ~~~
   14 warnings and 3 errors generated.


vim +4352 drivers/ufs/core/ufshcd.c

  4339	
> 4340	void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
  4341					      bool set_no_zone_write_lock)
  4342	{
  4343		struct scsi_device *sdev;
  4344	
  4345		shost_for_each_device(sdev, hba->host)
  4346			blk_freeze_queue_start(sdev->request_queue);
  4347		shost_for_each_device(sdev, hba->host) {
  4348			struct request_queue *q = sdev->request_queue;
  4349	
  4350			blk_mq_freeze_queue_wait(q);
  4351			if (set_no_zone_write_lock)
> 4352				blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
  4353			else
  4354				blk_queue_flag_clear(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
  4355			blk_mq_unfreeze_queue(q);
  4356		}
  4357	}
  4358	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK
  2023-07-26  8:37   ` Damien Le Moal
@ 2023-07-26 14:58     ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26 14:58 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei

On 7/26/23 01:37, Damien Le Moal wrote:
> REQ_NO_WRITE_LOCK indicates that a write request does
> not need zone write locking when the underlying device signaled that it can
> preserve command ordering. So we also need a queue flag for that...

My apologies: I accidentally left out the patch "block: Introduce the 
flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK" from this series. I will repost this 
series.

Bart.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] scsi: Retry unaligned zoned writes
  2023-07-26  8:47   ` Damien Le Moal
@ 2023-07-26 15:02     ` Bart Van Assche
  2023-07-26 23:46       ` Damien Le Moal
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2023-07-26 15:02 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Martin K . Petersen,
	Damien Le Moal, Ming Lei, James E.J. Bottomley

On 7/26/23 01:47, Damien Le Moal wrote:
> On 7/26/23 09:57, Bart Van Assche wrote:
>> +/*
>> + * Returns a negative value if @_a has a lower LBA than @_b, zero if
>> + * both have the same LBA and a positive value otherwise.
>> + */
>> +static int scsi_cmp_lba(void *priv, const struct list_head *_a,
>> +			const struct list_head *_b)
> 
> The argument priv is unused.

I cannot remove the 'priv' argument. From include/linux/list_sort.h:

typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
		const struct list_head *, const struct list_head *);

__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);

>>   /**
>>    * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
>>    * @shost:	Host to unjam.
>> @@ -2258,6 +2289,12 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>>   
>>   	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>>   
>> +	/*
>> +	 * Sort pending SCSI commands in LBA order. This is important if zone
>> +	 * write locking is disabled for a zoned SCSI device.
>> +	 */
>> +	list_sort(NULL, &eh_work_q, scsi_cmp_lba);
> 
> Should we do this only for zoned devices ?

I'm not sure this is possible. Error handling happens per SCSI host. 
Some of the logical units associated with a host may be zoned while 
others may represent conventional storage or have no storage associated 
with them (WLUN).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] scsi: Retry unaligned zoned writes
  2023-07-26 15:02     ` Bart Van Assche
@ 2023-07-26 23:46       ` Damien Le Moal
  0 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2023-07-26 23:46 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Martin K . Petersen,
	Ming Lei, James E.J. Bottomley

On 7/27/23 00:02, Bart Van Assche wrote:
> On 7/26/23 01:47, Damien Le Moal wrote:
>> On 7/26/23 09:57, Bart Van Assche wrote:
>>> +/*
>>> + * Returns a negative value if @_a has a lower LBA than @_b, zero if
>>> + * both have the same LBA and a positive value otherwise.
>>> + */
>>> +static int scsi_cmp_lba(void *priv, const struct list_head *_a,
>>> +            const struct list_head *_b)
>>
>> The argument priv is unused.
> 
> I cannot remove the 'priv' argument. From include/linux/list_sort.h:
> 
> typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
>         const struct list_head *, const struct list_head *);
> 
> __attribute__((nonnull(2,3)))
> void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);

Yes. I missed that this is a callbalck. Sorry for the noise.

> 
>>>   /**
>>>    * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
>>>    * @shost:    Host to unjam.
>>> @@ -2258,6 +2289,12 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>>>         SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>>>   +    /*
>>> +     * Sort pending SCSI commands in LBA order. This is important if zone
>>> +     * write locking is disabled for a zoned SCSI device.
>>> +     */
>>> +    list_sort(NULL, &eh_work_q, scsi_cmp_lba);
>>
>> Should we do this only for zoned devices ?
> 
> I'm not sure this is possible. Error handling happens per SCSI host. Some of
> the logical units associated with a host may be zoned while others may
> represent conventional storage or have no storage associated with them (WLUN).
> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-07-26 23:46 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-26  0:57 [PATCH v3 0/6] Improve the performance of F2FS on zoned UFS Bart Van Assche
2023-07-26  0:57 ` [PATCH v3 1/6] block: Introduce the flag REQ_NO_WRITE_LOCK Bart Van Assche
2023-07-26  8:37   ` Damien Le Moal
2023-07-26 14:58     ` Bart Van Assche
2023-07-26  0:57 ` [PATCH v3 2/6] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
2023-07-26  8:41   ` Damien Le Moal
2023-07-26  0:57 ` [PATCH v3 3/6] block/null_blk: Support disabling zone write locking Bart Van Assche
2023-07-26  8:43   ` Damien Le Moal
2023-07-26  0:57 ` [PATCH v3 4/6] scsi: Retry unaligned zoned writes Bart Van Assche
2023-07-26  8:47   ` Damien Le Moal
2023-07-26 15:02     ` Bart Van Assche
2023-07-26 23:46       ` Damien Le Moal
2023-07-26  0:57 ` [PATCH v3 5/6] scsi: ufs: Disable zone write locking Bart Van Assche
2023-07-26 12:04   ` kernel test robot
2023-07-26  0:57 ` [PATCH v3 6/6] fs/f2fs: " Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).