linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/7] Improve performance for zoned UFS devices
@ 2023-08-09 20:23 Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche

Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with a UFSHCI 3.0 controller. Please
consider this patch series for the next merge window.

Thank you,

Bart.

Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (7):
  block: Introduce the use_zone_write_lock member variable
  block/mq-deadline: Only use zone locking if necessary
  scsi: core: Retry unaligned zoned writes
  scsi: scsi_debug: Support disabling zone write locking
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: ufs: Split an if-condition
  scsi: ufs: Disable zone write locking

 block/blk-settings.c      |  6 ++++++
 block/mq-deadline.c       | 24 ++++++++++++++++------
 drivers/scsi/scsi_debug.c | 20 ++++++++++++++++++-
 drivers/scsi/scsi_error.c | 37 ++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  3 +++
 drivers/ufs/core/ufshcd.c | 42 ++++++++++++++++++++++++++++++++++++---
 include/linux/blkdev.h    |  1 +
 include/scsi/scsi.h       |  1 +
 9 files changed, 125 insertions(+), 10 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-10  1:33   ` Damien Le Moal
  2023-08-09 20:23 ` [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei

Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.

The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new request queue limit member
variable to allow block drivers to indicate that they preserve the order
of write commands and thus do not require serialization of writes per
zone.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-settings.c   | 6 ++++++
 include/linux/blkdev.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0046b447268f..b75c97971860 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->alignment_offset = 0;
 	lim->io_opt = 0;
 	lim->misaligned = 0;
+	lim->use_zone_write_lock = true;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
@@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 						   b->max_secure_erase_sectors);
 	t->zone_write_granularity = max(t->zone_write_granularity,
 					b->zone_write_granularity);
+	/*
+	 * Whether or not the zone write lock should be used depends on the
+	 * bottom driver only.
+	 */
+	t->use_zone_write_lock = b->use_zone_write_lock;
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f5371b8482c..deffa1f13444 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,7 @@ struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		raid_partial_stripes_expensive;
+	bool			use_zone_write_lock;
 	enum blk_zoned_model	zoned;
 
 	/*

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-10  1:36   ` Damien Le Moal
  2023-08-09 20:23 ` [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei

Measurements have shown that limiting the queue depth to one per zone for
zoned writes has a significant negative performance impact on zoned UFS
devices. Hence this patch that disables zone locking by the mq-deadline
scheduler if the storage controller preserves the command order. This
patch is based on the following assumptions:
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
  serializes write requests per zone.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..cd2504205ff8 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -338,6 +338,16 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 	return rq;
 }
 
+/*
+ * Whether or not to use zone write locking. Not using zone write locking for
+ * sequential write required zones is only safe if the block driver preserves
+ * the request order.
+ */
+static bool dd_use_zone_write_locking(struct request_queue *q)
+{
+	return q->limits.use_zone_write_lock && blk_queue_is_zoned(q);
+}
+
 /*
  * For the specified data direction, return the next request to
  * dispatch using arrival ordered lists.
@@ -353,7 +363,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;
 
 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !dd_use_zone_write_locking(rq->q))
 		return rq;
 
 	/*
@@ -398,7 +408,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !dd_use_zone_write_locking(rq->q))
 		return rq;
 
 	/*
@@ -526,8 +536,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}
 
 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -552,7 +563,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	/*
 	 * If the request needs its target zone locked, do it.
 	 */
-	blk_req_zone_write_lock(rq);
+	if (dd_use_zone_write_locking(rq->q))
+		blk_req_zone_write_lock(rq);
 	rq->rq_flags |= RQF_STARTED;
 	return rq;
 }
@@ -934,7 +946,7 @@ static void dd_finish_request(struct request *rq)
 
 	atomic_inc(&per_prio->stats.completed);
 
-	if (blk_queue_is_zoned(q)) {
+	if (dd_use_zone_write_locking(rq->q)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&dd->zone_lock, flags);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-11 13:29   ` Christoph Hellwig
  2023-08-09 20:23 ` [PATCH v7 4/7] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Martin K . Petersen, Ming Lei,
	James E.J. Bottomley

If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
a starting LBA that differs from the write pointer, e.g. because zoned
writes have been reordered, then the storage device will respond with an
UNALIGNED WRITE COMMAND error. Send commands that failed with an
unaligned write error to the SCSI error handler if zone write locking is
disabled. Let the SCSI error handler sort SCSI commands per LBA before
resubmitting these.

If zone write locking is disabled, increase the number of retries for
write commands sent to a sequential zone to the maximum number of
outstanding commands because in the worst case the number of times
reordered zoned writes have to be retried is (number of outstanding
writes per sequential zone) - 1.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 37 +++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  3 +++
 include/scsi/scsi.h       |  1 +
 4 files changed, 42 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..c624b9c8fdab 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/jiffies.h>
+#include <linux/list_sort.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -698,6 +699,16 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This may indicate that zoned writes
+		 * have been received by the device in the wrong order. If zone
+		 * write locking is disabled, retry after all pending commands
+		 * have completed.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    scsi_cmd_to_rq(scmd)->q->limits.use_zone_write_lock)
+			return NEEDS_DELAYED_RETRY;
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
@@ -2186,6 +2197,25 @@ void scsi_eh_ready_devs(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
 
+/*
+ * Returns a negative value if @_a has a lower starting sector than @_b, zero if
+ * both have the same starting sector and a positive value otherwise.
+ */
+static int scsi_cmp_sector(void *priv, const struct list_head *_a,
+			   const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
+	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
+
+	if (pos_a < pos_b)
+		return -1;
+	if (pos_a > pos_b)
+		return 1;
+	return 0;
+}
+
 /**
  * scsi_eh_flush_done_q - finish processed commands or retry them.
  * @done_q:	list_head of processed commands.
@@ -2194,6 +2224,13 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *next;
 
+	/*
+	 * Sort pending SCSI commands in starting sector order. This is
+	 * important if one of the SCSI devices associated with @shost is a
+	 * zoned block device for which zone write locking is disabled.
+	 */
+	list_sort(NULL, done_q, scsi_cmp_sector);
+
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
 		list_del_init(&scmd->eh_entry);
 		if (scsi_device_online(scmd->device) &&
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 59176946ab56..69da8aee13df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
 	case ADD_TO_MLQUEUE:
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 		break;
+	case NEEDS_DELAYED_RETRY:
 	default:
 		scsi_eh_scmd_add(cmd);
 		break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3c668cfb146d..3716392daa2c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1235,6 +1235,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	if (rq->q->limits.use_zone_write_lock &&
+	    blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 
 	SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index ec093594ba53..6600db046227 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
  * Internal return values.
  */
 enum scsi_disposition {
+	NEEDS_DELAYED_RETRY	= 0x2000,
 	NEEDS_RETRY		= 0x2001,
 	SUCCESS			= 0x2002,
 	FAILED			= 0x2003,

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 4/7] scsi: scsi_debug: Support disabling zone write locking
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-08-09 20:23 ` [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 5/7] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Martin K . Petersen, Douglas Gilbert,
	James E.J. Bottomley

Make it easier to test not using zone write locking by supporting
disabling zone write locking in the scsi_debug driver.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 9c0af50501f9..22485726c534 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -832,6 +832,7 @@ static int dix_reads;
 static int dif_errors;
 
 /* ZBC global data */
+static bool sdeb_no_zwrl;
 static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
 static int sdeb_zbc_zone_cap_mb;
 static int sdeb_zbc_zone_size_mb;
@@ -5138,9 +5139,13 @@ static struct sdebug_dev_info *find_build_dev_info(struct scsi_device *sdev)
 
 static int scsi_debug_slave_alloc(struct scsi_device *sdp)
 {
+	struct request_queue *q = sdp->request_queue;
+
 	if (sdebug_verbose)
 		pr_info("slave_alloc <%u %u %u %llu>\n",
 		       sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
+	if (sdeb_no_zwrl)
+		q->limits.use_zone_write_lock = false;
 	return 0;
 }
 
@@ -5738,6 +5743,7 @@ module_param_named(ndelay, sdebug_ndelay, int, S_IRUGO | S_IWUSR);
 module_param_named(no_lun_0, sdebug_no_lun_0, int, S_IRUGO | S_IWUSR);
 module_param_named(no_rwlock, sdebug_no_rwlock, bool, S_IRUGO | S_IWUSR);
 module_param_named(no_uld, sdebug_no_uld, int, S_IRUGO);
+module_param_named(no_zone_write_lock, sdeb_no_zwrl, bool, S_IRUGO);
 module_param_named(num_parts, sdebug_num_parts, int, S_IRUGO);
 module_param_named(num_tgts, sdebug_num_tgts, int, S_IRUGO | S_IWUSR);
 module_param_named(opt_blks, sdebug_opt_blks, int, S_IRUGO);
@@ -5812,6 +5818,8 @@ MODULE_PARM_DESC(ndelay, "response delay in nanoseconds (def=0 -> ignore)");
 MODULE_PARM_DESC(no_lun_0, "no LU number 0 (def=0 -> have lun 0)");
 MODULE_PARM_DESC(no_rwlock, "don't protect user data reads+writes (def=0)");
 MODULE_PARM_DESC(no_uld, "stop ULD (e.g. sd driver) attaching (def=0))");
+MODULE_PARM_DESC(no_zone_write_lock,
+		 "Disable serialization of zoned writes (def=0)");
 MODULE_PARM_DESC(num_parts, "number of partitions(def=0)");
 MODULE_PARM_DESC(num_tgts, "number of targets per host to simulate(def=1)");
 MODULE_PARM_DESC(opt_blks, "optimal transfer length in blocks (def=1024)");

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 5/7] scsi: scsi_debug: Support injecting unaligned write errors
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-08-09 20:23 ` [PATCH v7 4/7] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 6/7] scsi: ufs: Split an if-condition Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 7/7] scsi: ufs: Disable zone write locking Bart Van Assche
  6 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Martin K . Petersen, Douglas Gilbert,
	James E.J. Bottomley

Allow user space software, e.g. a blktests test, to inject unaligned
write errors.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 22485726c534..b9712275de9d 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -181,6 +181,7 @@ static const char *sdebug_version_date = "20210520";
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_UNALIGNED_WRITE	0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -188,7 +189,8 @@ static const char *sdebug_version_date = "20210520";
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
 				  SDEBUG_OPT_SHORT_TRANSFER | \
 				  SDEBUG_OPT_HOST_BUSY | \
-				  SDEBUG_OPT_CMD_ABORT)
+				  SDEBUG_OPT_CMD_ABORT | \
+				  SDEBUG_OPT_UNALIGNED_WRITE)
 #define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
 
@@ -3587,6 +3589,14 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 	struct sdeb_store_info *sip = devip2sip(devip, true);
 	u8 *cmd = scp->cmnd;
 
+	if (unlikely(sdebug_opts & SDEBUG_OPT_UNALIGNED_WRITE &&
+		     atomic_read(&sdeb_inject_pending))) {
+		atomic_set(&sdeb_inject_pending, 0);
+		mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE,
+				UNALIGNED_WRITE_ASCQ);
+		return check_condition_result;
+	}
+
 	switch (cmd[0]) {
 	case WRITE_16:
 		ei_lba = 0;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 6/7] scsi: ufs: Split an if-condition
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-08-09 20:23 ` [PATCH v7 5/7] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  2023-08-09 20:23 ` [PATCH v7 7/7] scsi: ufs: Disable zone write locking Bart Van Assche
  6 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Can Guo, Martin K . Petersen, Avri Altman,
	Damien Le Moal, Ming Lei, James E.J. Bottomley, Stanley Chu,
	Bean Huo, Asutosh Das, Bao D. Nguyen, Arthur Simchaev

Make the next patch in this series easier to read. No functionality is
changed.

Reviewed-by: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..ae7b868f9c26 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4352,8 +4352,9 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	if (update &&
-	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
+	if (!update)
+		return;
+	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
 		ufshcd_auto_hibern8_enable(hba);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 7/7] scsi: ufs: Disable zone write locking
  2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-08-09 20:23 ` [PATCH v7 6/7] scsi: ufs: Split an if-condition Bart Van Assche
@ 2023-08-09 20:23 ` Bart Van Assche
  6 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2023-08-09 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig,
	Bart Van Assche, Martin K . Petersen, Can Guo, Avri Altman,
	Damien Le Moal, Ming Lei, James E.J. Bottomley, Stanley Chu,
	Bean Huo, Asutosh Das, Bao D. Nguyen, Arthur Simchaev

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one
  command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller. Hence, enable
  zone write locking if auto-hibernation is enabled.

This patch improves performance as follows on my test setup:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
  zone locking: 4x more IOPS for small writes.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Can Guo <quic_cang@quicinc.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index ae7b868f9c26..ae6b63b02930 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4337,23 +4337,50 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
 
+static void ufshcd_update_zone_write_lock(struct ufs_hba *hba,
+					  bool use_zone_write_lock)
+{
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	shost_for_each_device(sdev, hba->host) {
+		struct request_queue *q = sdev->request_queue;
+
+		blk_mq_freeze_queue_wait(q);
+		q->limits.use_zone_write_lock = use_zone_write_lock;
+		blk_mq_unfreeze_queue(q);
+	}
+}
+
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
 	unsigned long flags;
-	bool update = false;
+	bool prev_state, new_state, update = false;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	prev_state = ufshcd_is_auto_hibern8_enabled(hba);
 	if (hba->ahit != ahit) {
 		hba->ahit = ahit;
 		update = true;
 	}
+	new_state = ufshcd_is_auto_hibern8_enabled(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	if (!update)
 		return;
+	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
+		/*
+		 * Auto-hibernation will be enabled. Enable write locking for
+		 * zoned writes since auto-hibernation may cause reordering of
+		 * zoned writes when using the legacy mode of the UFS host
+		 * controller.
+		 */
+		ufshcd_update_zone_write_lock(hba, true);
+	}
 	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
@@ -4361,6 +4388,13 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 		ufshcd_release(hba);
 		ufshcd_rpm_put_sync(hba);
 	}
+	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
+		/*
+		 * Auto-hibernation has been disabled. Disable write locking
+		 * for zoned writes.
+		 */
+		ufshcd_update_zone_write_lock(hba, false);
+	}
 }
 EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
 
@@ -5140,6 +5174,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 
 	ufshcd_hpb_configure(hba, sdev);
 
+	q->limits.use_zone_write_lock = false;
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
 		blk_queue_update_dma_alignment(q, SZ_4K - 1);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-09 20:23 ` [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable Bart Van Assche
@ 2023-08-10  1:33   ` Damien Le Moal
  2023-08-10 14:02     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2023-08-10  1:33 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 05:23, Bart Van Assche wrote:
> Writes in sequential write required zones must happen at the write
> pointer. Even if the submitter of the write commands (e.g. a filesystem)
> submits writes for sequential write required zones in order, the block
> layer or the storage controller may reorder these write commands.
> 
> The zone locking mechanism in the mq-deadline I/O scheduler serializes
> write commands for sequential zones. Some but not all storage controllers
> require this serialization. Introduce a new request queue limit member
> variable to allow block drivers to indicate that they preserve the order
> of write commands and thus do not require serialization of writes per
> zone.
> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-settings.c   | 6 ++++++
>  include/linux/blkdev.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 0046b447268f..b75c97971860 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->alignment_offset = 0;
>  	lim->io_opt = 0;
>  	lim->misaligned = 0;
> +	lim->use_zone_write_lock = true;
>  	lim->zoned = BLK_ZONED_NONE;

Given that the default for zoned is BLK_ZONED_NONE, having use_zone_write_lock
default to true is strange. It would be better to set the default to false and
have disk_set_zoned() set it to true if needed, with an additional argument to
specify if it should be the case or not. E.g., for SMR drives, sd.c would call
something like:

disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock);

sd.c would default to sdp->use_zone_write_lock == true and UFS driver can set
it to false. That would be cleaner I think.

>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  						   b->max_secure_erase_sectors);
>  	t->zone_write_granularity = max(t->zone_write_granularity,
>  					b->zone_write_granularity);
> +	/*
> +	 * Whether or not the zone write lock should be used depends on the
> +	 * bottom driver only.
> +	 */
> +	t->use_zone_write_lock = b->use_zone_write_lock;

Given that DM bio targets do not have a scheduler and do not have a zone lock
bitmap allocated, I do not think this is necessary at all. This can remain to
false, thus in sync with the fact that there is no IO scheduler.

>  	t->zoned = max(t->zoned, b->zoned);
>  	return ret;
>  }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f5371b8482c..deffa1f13444 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -316,6 +316,7 @@ struct queue_limits {
>  	unsigned char		misaligned;
>  	unsigned char		discard_misaligned;
>  	unsigned char		raid_partial_stripes_expensive;
> +	bool			use_zone_write_lock;
>  	enum blk_zoned_model	zoned;
>  
>  	/*

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-09 20:23 ` [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-08-10  1:36   ` Damien Le Moal
  2023-08-10 14:00     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2023-08-10  1:36 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 05:23, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one per zone for
> zoned writes has a significant negative performance impact on zoned UFS
> devices. Hence this patch that disables zone locking by the mq-deadline
> scheduler if the storage controller preserves the command order. This
> patch is based on the following assumptions:
> - It happens infrequently that zoned write requests are reordered by the
>   block layer.
> - The I/O priority of all write requests is the same per zone.
> - Either no I/O scheduler is used or an I/O scheduler is used that
>   serializes write requests per zone.
> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..cd2504205ff8 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -338,6 +338,16 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>  	return rq;
>  }
>  
> +/*
> + * Whether or not to use zone write locking. Not using zone write locking for
> + * sequential write required zones is only safe if the block driver preserves
> + * the request order.
> + */
> +static bool dd_use_zone_write_locking(struct request_queue *q)
> +{
> +	return q->limits.use_zone_write_lock && blk_queue_is_zoned(q);

use_zone_write_lock should be true ONLY if the queue is zoned.
So the "&& blk_queue_is_zoned(q)" seems unnecessary to me.
This little helper could be moved to be generic in blkdev.h too.

> +}
> +
>  /*
>   * For the specified data direction, return the next request to
>   * dispatch using arrival ordered lists.
> @@ -353,7 +363,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		return NULL;
>  
>  	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !dd_use_zone_write_locking(rq->q))
>  		return rq;
>  
>  	/*
> @@ -398,7 +408,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	if (!rq)
>  		return NULL;
>  
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !dd_use_zone_write_locking(rq->q))
>  		return rq;
>  
>  	/*
> @@ -526,8 +536,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	}
>  
>  	/*
> -	 * For a zoned block device, if we only have writes queued and none of
> -	 * them can be dispatched, rq will be NULL.
> +	 * For a zoned block device that requires write serialization, if we
> +	 * only have writes queued and none of them can be dispatched, rq will
> +	 * be NULL.
>  	 */
>  	if (!rq)
>  		return NULL;
> @@ -552,7 +563,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	/*
>  	 * If the request needs its target zone locked, do it.
>  	 */
> -	blk_req_zone_write_lock(rq);
> +	if (dd_use_zone_write_locking(rq->q))
> +		blk_req_zone_write_lock(rq);
>  	rq->rq_flags |= RQF_STARTED;
>  	return rq;
>  }
> @@ -934,7 +946,7 @@ static void dd_finish_request(struct request *rq)
>  
>  	atomic_inc(&per_prio->stats.completed);
>  
> -	if (blk_queue_is_zoned(q)) {
> +	if (dd_use_zone_write_locking(rq->q)) {
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&dd->zone_lock, flags);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-10  1:36   ` Damien Le Moal
@ 2023-08-10 14:00     ` Bart Van Assche
  2023-08-11  0:45       ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-10 14:00 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/9/23 18:36, Damien Le Moal wrote:
> On 8/10/23 05:23, Bart Van Assche wrote:
>> +static bool dd_use_zone_write_locking(struct request_queue *q)
>> +{
>> +	return q->limits.use_zone_write_lock && blk_queue_is_zoned(q);
> 
> use_zone_write_lock should be true ONLY if the queue is zoned.
> So the "&& blk_queue_is_zoned(q)" seems unnecessary to me.
> This little helper could be moved to be generic in blkdev.h too.

Hi Damien,

use_zone_write_lock should be set by the block driver (e.g. a SCSI
LLD) before I/O starts. The zone model information is retrieved by
submitting I/O. It is not clear to me how to set use_zone_write_lock
to true only for zoned block devices before I/O starts since I/O is
required to retrieve information about the zone model.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-10  1:33   ` Damien Le Moal
@ 2023-08-10 14:02     ` Bart Van Assche
  2023-08-11  0:39       ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-10 14:02 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/9/23 18:33, Damien Le Moal wrote:
> On 8/10/23 05:23, Bart Van Assche wrote:
>> Writes in sequential write required zones must happen at the write
>> pointer. Even if the submitter of the write commands (e.g. a filesystem)
>> submits writes for sequential write required zones in order, the block
>> layer or the storage controller may reorder these write commands.
>>
>> The zone locking mechanism in the mq-deadline I/O scheduler serializes
>> write commands for sequential zones. Some but not all storage controllers
>> require this serialization. Introduce a new request queue limit member
>> variable to allow block drivers to indicate that they preserve the order
>> of write commands and thus do not require serialization of writes per
>> zone.
>>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/blk-settings.c   | 6 ++++++
>>   include/linux/blkdev.h | 1 +
>>   2 files changed, 7 insertions(+)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 0046b447268f..b75c97971860 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>   	lim->alignment_offset = 0;
>>   	lim->io_opt = 0;
>>   	lim->misaligned = 0;
>> +	lim->use_zone_write_lock = true;
>>   	lim->zoned = BLK_ZONED_NONE;
> 
> Given that the default for zoned is BLK_ZONED_NONE, having use_zone_write_lock
> default to true is strange. It would be better to set the default to false and
> have disk_set_zoned() set it to true if needed, with an additional argument to
> specify if it should be the case or not. E.g., for SMR drives, sd.c would call
> something like:
> 
> disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock);
> 
> sd.c would default to sdp->use_zone_write_lock == true and UFS driver can set
> it to false. That would be cleaner I think.

Hi Damien,

Thanks for the detailed feedback.

My concerns about the above proposal are as follows:
* The information about whether or not the zone write lock should be used comes
   from the block driver, e.g. a SCSI LLD.
* sdp, the SCSI disk pointer, is owned by the ULD.
* An ULD may be attached and detached multiple times during the lifetime of a
   logical unit without the LLD being informed about this. So how to set
   sdp->use_zone_write_lock without introducing a new callback or member variable
   in a data structure owned by the LLD?

Hence my preference to store use_zone_write_lock in a data structure that has the
same lifetime as the logical unit and not in any data structure controlled by the
ULD.

>>   	lim->zone_write_granularity = 0;
>>   	lim->dma_alignment = 511;
>> @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>   						   b->max_secure_erase_sectors);
>>   	t->zone_write_granularity = max(t->zone_write_granularity,
>>   					b->zone_write_granularity);
>> +	/*
>> +	 * Whether or not the zone write lock should be used depends on the
>> +	 * bottom driver only.
>> +	 */
>> +	t->use_zone_write_lock = b->use_zone_write_lock;
> 
> Given that DM bio targets do not have a scheduler and do not have a zone lock
> bitmap allocated, I do not think this is necessary at all. This can remain to
> false, thus in sync with the fact that there is no IO scheduler.

How about the request-based dm drivers (dm-mpath and dm-target)? Isn't the dm-mpath
driver request based because that allows the I/O scheduler to be configured on top
of the dm-mpath driver? From https://lwn.net/Articles/274292/: "The basic idea to
resolve the issue is to move multipathing layer down below the I/O scheduler, and it
was proposed from Mike Christie as the block layer (request-based) multipath".

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-10 14:02     ` Bart Van Assche
@ 2023-08-11  0:39       ` Damien Le Moal
  2023-08-11 15:41         ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2023-08-11  0:39 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 23:02, Bart Van Assche wrote:
> On 8/9/23 18:33, Damien Le Moal wrote:
>> On 8/10/23 05:23, Bart Van Assche wrote:
>>> Writes in sequential write required zones must happen at the write
>>> pointer. Even if the submitter of the write commands (e.g. a filesystem)
>>> submits writes for sequential write required zones in order, the block
>>> layer or the storage controller may reorder these write commands.
>>>
>>> The zone locking mechanism in the mq-deadline I/O scheduler serializes
>>> write commands for sequential zones. Some but not all storage controllers
>>> require this serialization. Introduce a new request queue limit member
>>> variable to allow block drivers to indicate that they preserve the order
>>> of write commands and thus do not require serialization of writes per
>>> zone.
>>>
>>> Cc: Damien Le Moal <dlemoal@kernel.org>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>>   block/blk-settings.c   | 6 ++++++
>>>   include/linux/blkdev.h | 1 +
>>>   2 files changed, 7 insertions(+)
>>>
>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>> index 0046b447268f..b75c97971860 100644
>>> --- a/block/blk-settings.c
>>> +++ b/block/blk-settings.c
>>> @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>>   	lim->alignment_offset = 0;
>>>   	lim->io_opt = 0;
>>>   	lim->misaligned = 0;
>>> +	lim->use_zone_write_lock = true;
>>>   	lim->zoned = BLK_ZONED_NONE;
>>
>> Given that the default for zoned is BLK_ZONED_NONE, having use_zone_write_lock
>> default to true is strange. It would be better to set the default to false and
>> have disk_set_zoned() set it to true if needed, with an additional argument to
>> specify if it should be the case or not. E.g., for SMR drives, sd.c would call
>> something like:
>>
>> disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock);
>>
>> sd.c would default to sdp->use_zone_write_lock == true and UFS driver can set
>> it to false. That would be cleaner I think.
> 
> Hi Damien,
> 
> Thanks for the detailed feedback.
> 
> My concerns about the above proposal are as follows:
> * The information about whether or not the zone write lock should be used comes
>    from the block driver, e.g. a SCSI LLD.

Yes.

> * sdp, the SCSI disk pointer, is owned by the ULD.
> * An ULD may be attached and detached multiple times during the lifetime of a
>    logical unit without the LLD being informed about this. So how to set
>    sdp->use_zone_write_lock without introducing a new callback or member variable
>    in a data structure owned by the LLD?

That would be set during device scan and device revalidate. And if the value
changes, then disk_set_zoned() should be called again to update the queue limit.
That is already what is done for the zoned limit indicating the type of the
drive. My point is that the zoned limit should dictate if use_zone_write_lock
can be true. The default should be be:

	q->limits.use_zone_write_lock = q->limits.zoned != BLK_ZONED_NONE.

And as proposed, if the UFS driver wants to disable zone write locking, all it
needs to do is call "disk_set_zoned(disk, BLK_ZONED_HM, false)". I did not try
to actually code that, and the scsi disk driver may be in the way and that may
need to be done there, hence the suggestion of having a use_zone_write_lock flag
in the scsi device structure so that the UFS driver can set it as needed as well
(and detect changes when revalidating). That should work, but I may be missing
something.

> Hence my preference to store use_zone_write_lock in a data structure that has the
> same lifetime as the logical unit and not in any data structure controlled by the
> ULD.

See above. The actual storage would still be in q->limits so that the block
layer can see it.

> 
>>>   	lim->zone_write_granularity = 0;
>>>   	lim->dma_alignment = 511;
>>> @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>>   						   b->max_secure_erase_sectors);
>>>   	t->zone_write_granularity = max(t->zone_write_granularity,
>>>   					b->zone_write_granularity);
>>> +	/*
>>> +	 * Whether or not the zone write lock should be used depends on the
>>> +	 * bottom driver only.
>>> +	 */
>>> +	t->use_zone_write_lock = b->use_zone_write_lock;
>>
>> Given that DM bio targets do not have a scheduler and do not have a zone lock
>> bitmap allocated, I do not think this is necessary at all. This can remain to
>> false, thus in sync with the fact that there is no IO scheduler.
> 
> How about the request-based dm drivers (dm-mpath and dm-target)? Isn't the dm-mpath
> driver request based because that allows the I/O scheduler to be configured on top
> of the dm-mpath driver? From https://lwn.net/Articles/274292/: "The basic idea to
> resolve the issue is to move multipathing layer down below the I/O scheduler, and it
> was proposed from Mike Christie as the block layer (request-based) multipath".

These DM targets do not support zoned devices, so I do not think it is an issue.

That said, you can still keep the parameter stacking, but at the very least, I
think it should be:

t->use_zone_write_lock = t->use_zone_write_lock || b->use_zone_write_lock;

so that for a target composed of multiple devices, if one needs zone write
locking, the DM device request that as well. Otherwise, problems may occur I think.

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-10 14:00     ` Bart Van Assche
@ 2023-08-11  0:45       ` Damien Le Moal
  2023-08-11 15:49         ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2023-08-11  0:45 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 23:00, Bart Van Assche wrote:
> On 8/9/23 18:36, Damien Le Moal wrote:
>> On 8/10/23 05:23, Bart Van Assche wrote:
>>> +static bool dd_use_zone_write_locking(struct request_queue *q)
>>> +{
>>> +	return q->limits.use_zone_write_lock && blk_queue_is_zoned(q);
>>
>> use_zone_write_lock should be true ONLY if the queue is zoned.
>> So the "&& blk_queue_is_zoned(q)" seems unnecessary to me.
>> This little helper could be moved to be generic in blkdev.h too.
> 
> Hi Damien,
> 
> use_zone_write_lock should be set by the block driver (e.g. a SCSI
> LLD) before I/O starts. The zone model information is retrieved by
> submitting I/O. It is not clear to me how to set use_zone_write_lock
> to true only for zoned block devices before I/O starts since I/O is
> required to retrieve information about the zone model.

See my other email. Once you know that the drive is zoned, then
use_zone_write_lock can default to true. That is trivial to do in
disk_set_zoned(), which is called by all drivers. I proposed adding an argument
to this function to override the default, thus allowing a device driver to set
it to false.

The limit default set to true as you have it in your current patch does not make
sense to me. It should be false by default and turned on only for zoned drive
that require zone write locking (e.g. SMR HDDs). With that, mq-deadline can even
be simplified to not even look at the q zoned model and simply us
q->limits.use_zone_write_lock to determine if locking a zone is needed or not.
That would be a lot cleaner.

Not that device scan and revalidation never write to the device. So that can be
done safely regardless of the current value for the use_zone_write_lock limit.


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes
  2023-08-09 20:23 ` [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes Bart Van Assche
@ 2023-08-11 13:29   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2023-08-11 13:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Christoph Hellwig, Damien Le Moal, Martin K . Petersen, Ming Lei,
	James E.J. Bottomley

Just as last time:  comparing the lbas really has no business in the
SCSI core.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-11  0:39       ` Damien Le Moal
@ 2023-08-11 15:41         ` Bart Van Assche
  2023-08-12  2:44           ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-11 15:41 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 17:39, Damien Le Moal wrote:
> On 8/10/23 23:02, Bart Van Assche wrote:
>> My concerns about the above proposal are as follows:
>> * The information about whether or not the zone write lock should be used comes
>>     from the block driver, e.g. a SCSI LLD.
> 
> Yes.
> 
>> * sdp, the SCSI disk pointer, is owned by the ULD.
>> * An ULD may be attached and detached multiple times during the lifetime of a
>>     logical unit without the LLD being informed about this. So how to set
>>     sdp->use_zone_write_lock without introducing a new callback or member variable
>>     in a data structure owned by the LLD?
> 
> That would be set during device scan and device revalidate. And if the value
> changes, then disk_set_zoned() should be called again to update the queue limit.
> That is already what is done for the zoned limit indicating the type of the
> drive. My point is that the zoned limit should dictate if use_zone_write_lock
> can be true. The default should be be:
> 
> 	q->limits.use_zone_write_lock = q->limits.zoned != BLK_ZONED_NONE.
> 
> And as proposed, if the UFS driver wants to disable zone write locking, all it
> needs to do is call "disk_set_zoned(disk, BLK_ZONED_HM, false)". I did not try
> to actually code that, and the scsi disk driver may be in the way and that may
> need to be done there, hence the suggestion of having a use_zone_write_lock flag
> in the scsi device structure so that the UFS driver can set it as needed as well
> (and detect changes when revalidating). That should work, but I may be missing
> something.

Hi Damien,

Adding a use_zone_write_lock argument to disk_set_zoned() seems wrong to 
me. The information about whether or not to use the zone write lock 
comes from the LLD. The zone model is retrieved by the ULD. Since both 
pieces of information come from different drivers, both properties 
should be modified independently.

Moving the use_zone_write_lock member variable into a data structure 
owned by the ULD seems wrong to me because that member variable is set 
by the LLD.

Reviewers are allowed to request changes for well-designed and working 
code but should be able to explain why they request these changes. Can 
you please explain why you care about the value of 
q->limits.use_zone_write_lock for the BLK_ZONED_NONE case? If 
dd_use_zone_write_locking() would be renamed and would be moved into 
include/linux/blkdev.h and if reads of q->limits.use_zone_write_lock 
would be changed into blk_use_zone_write_locking() calls then the value 
of q->limits.use_zone_write_lock for the BLK_ZONED_NONE case wouldn't 
matter at all.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-11  0:45       ` Damien Le Moal
@ 2023-08-11 15:49         ` Bart Van Assche
  2023-08-12  2:49           ` Damien Le Moal
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2023-08-11 15:49 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/10/23 17:45, Damien Le Moal wrote:
> With that, mq-deadline can even be simplified to not even look at the
> q zoned model and simply us q->limits.use_zone_write_lock to
> determine if locking a zone is needed or not.

Hi Damien,

I think implementing the above proposal requires splitting 
'use_zone_write_lock' into two variables:
(a) One member variable that indicates whether the zone write lock
     should be used.
(b) Another member variable that indicates whether or not the LLD
     preserves the order of SCSI commands.

Member variable (b) should be set by the LLD and member variable (a) can 
be set by disk_set_zoned().

Do you want me to implement this approach?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable
  2023-08-11 15:41         ` Bart Van Assche
@ 2023-08-12  2:44           ` Damien Le Moal
  0 siblings, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2023-08-12  2:44 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/12/23 00:41, Bart Van Assche wrote:
> On 8/10/23 17:39, Damien Le Moal wrote:
>> On 8/10/23 23:02, Bart Van Assche wrote:
>>> My concerns about the above proposal are as follows:
>>> * The information about whether or not the zone write lock should be used comes
>>>     from the block driver, e.g. a SCSI LLD.
>>
>> Yes.
>>
>>> * sdp, the SCSI disk pointer, is owned by the ULD.
>>> * An ULD may be attached and detached multiple times during the lifetime of a
>>>     logical unit without the LLD being informed about this. So how to set
>>>     sdp->use_zone_write_lock without introducing a new callback or member variable
>>>     in a data structure owned by the LLD?
>>
>> That would be set during device scan and device revalidate. And if the value
>> changes, then disk_set_zoned() should be called again to update the queue limit.
>> That is already what is done for the zoned limit indicating the type of the
>> drive. My point is that the zoned limit should dictate if use_zone_write_lock
>> can be true. The default should be be:
>>
>> 	q->limits.use_zone_write_lock = q->limits.zoned != BLK_ZONED_NONE.
>>
>> And as proposed, if the UFS driver wants to disable zone write locking, all it
>> needs to do is call "disk_set_zoned(disk, BLK_ZONED_HM, false)". I did not try
>> to actually code that, and the scsi disk driver may be in the way and that may
>> need to be done there, hence the suggestion of having a use_zone_write_lock flag
>> in the scsi device structure so that the UFS driver can set it as needed as well
>> (and detect changes when revalidating). That should work, but I may be missing
>> something.
> 
> Hi Damien,
> 
> Adding a use_zone_write_lock argument to disk_set_zoned() seems wrong to 
> me. The information about whether or not to use the zone write lock 
> comes from the LLD. The zone model is retrieved by the ULD. Since both 
> pieces of information come from different drivers, both properties 
> should be modified independently.

OK. But I still think that disk_set_zoned() is the place where the default for
use_zone_write_lock should be set.

And we need a clean way for the LLD to change use_zone_write_lock.

> 
> Moving the use_zone_write_lock member variable into a data structure 
> owned by the ULD seems wrong to me because that member variable is set 
> by the LLD.
> 
> Reviewers are allowed to request changes for well-designed and working 
> code but should be able to explain why they request these changes. Can 
> you please explain why you care about the value of 
> q->limits.use_zone_write_lock for the BLK_ZONED_NONE case? If 

Because use_zone_write_lock should ever be true for !BLK_ZONED_NONE case.
Allowing use_zone_write_lock to be true even with zoned == BLK_ZONED_NONE forces
to always check that the device is zoned to ensure that the value of
use_zone_write_lock is valid. This is awckward and uselessly complicate things.

> dd_use_zone_write_locking() would be renamed and would be moved into 
> include/linux/blkdev.h and if reads of q->limits.use_zone_write_lock 
> would be changed into blk_use_zone_write_locking() calls then the value 
> of q->limits.use_zone_write_lock for the BLK_ZONED_NONE case wouldn't 
> matter at all.

Sure, that would work. But again, why the need to check both model and actual
value of use_zone_write_lock ? I do not think it is that hard to keep
use_zone_write_lock to false for the BLK_ZONED_NONE case. Then
blk_use_zone_write_locking() is reduced to returning the value of
use_zone_write_lock.

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary
  2023-08-11 15:49         ` Bart Van Assche
@ 2023-08-12  2:49           ` Damien Le Moal
  0 siblings, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2023-08-12  2:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Christoph Hellwig, Ming Lei

On 8/12/23 00:49, Bart Van Assche wrote:
> On 8/10/23 17:45, Damien Le Moal wrote:
>> With that, mq-deadline can even be simplified to not even look at the
>> q zoned model and simply us q->limits.use_zone_write_lock to
>> determine if locking a zone is needed or not.
> 
> Hi Damien,
> 
> I think implementing the above proposal requires splitting 
> 'use_zone_write_lock' into two variables:
> (a) One member variable that indicates whether the zone write lock
>      should be used.
> (b) Another member variable that indicates whether or not the LLD
>      preserves the order of SCSI commands.
> 
> Member variable (b) should be set by the LLD and member variable (a) can 
> be set by disk_set_zoned().
> 
> Do you want me to implement this approach?
> 

Looking at your v8, this seems better.
I will have a more in-depth look Monday.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-08-12  2:49 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-09 20:23 [PATCH v7 0/7] Improve performance for zoned UFS devices Bart Van Assche
2023-08-09 20:23 ` [PATCH v7 1/7] block: Introduce the use_zone_write_lock member variable Bart Van Assche
2023-08-10  1:33   ` Damien Le Moal
2023-08-10 14:02     ` Bart Van Assche
2023-08-11  0:39       ` Damien Le Moal
2023-08-11 15:41         ` Bart Van Assche
2023-08-12  2:44           ` Damien Le Moal
2023-08-09 20:23 ` [PATCH v7 2/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
2023-08-10  1:36   ` Damien Le Moal
2023-08-10 14:00     ` Bart Van Assche
2023-08-11  0:45       ` Damien Le Moal
2023-08-11 15:49         ` Bart Van Assche
2023-08-12  2:49           ` Damien Le Moal
2023-08-09 20:23 ` [PATCH v7 3/7] scsi: core: Retry unaligned zoned writes Bart Van Assche
2023-08-11 13:29   ` Christoph Hellwig
2023-08-09 20:23 ` [PATCH v7 4/7] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
2023-08-09 20:23 ` [PATCH v7 5/7] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2023-08-09 20:23 ` [PATCH v7 6/7] scsi: ufs: Split an if-condition Bart Van Assche
2023-08-09 20:23 ` [PATCH v7 7/7] scsi: ufs: Disable zone write locking Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).