public inbox for linux-block@vger.kernel.org
* [PATCH v4 0/6] Increase SCSI IOPS
@ 2025-12-16 22:30 Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 1/6] block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche

Hi Martin,

This patch series increases scsi_debug IOPS by 5% on my test setup by disabling
SCSI budget management when it is not needed. It improves the performance of
many SCSI LLDs, including the UFS and ATA drivers. On my UFS 4 test setup this
series improves IOPS by 1% and reduces the time spent in scsi_mq_get_budget()
from 0.22% to 0.01%. The improvement for UFS 5 devices is expected to be
significantly larger than what I measured on my UFS 4 test setup.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v3:
 - Instead of removing the use of cmd->budget_token from the ATA core, introduce
   the SCSI host flag .needs_budget_token and set it from the ATA core.

Changes compared to v2:
 - Fixed a hang during LUN scanning for ATA devices.

Changes compared to v1:
 - Added three block layer patches to introduce the function
   blk_mq_tagset_iter().
 - Applied the optimization not only for host-wide tags but also if there is
   only a single hardware queue.
 - Renamed scsi_device_check_in_flight() into scsi_device_check_allocated().
 - Added support for set->shared_tags == NULL in scsi_device_busy().
 
Bart Van Assche (6):
  block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn
  block: Introduce __blk_mq_tagset_iter()
  block: Introduce blk_mq_tagset_iter()
  ata: libata: Set .needs_budget_token
  scsi: core: Generalize scsi_device_busy()
  scsi: core: Improve IOPS in case of host-wide tags

 block/blk-mq-tag.c         | 67 ++++++++++++++++++++++++++------------
 block/blk-mq.h             |  4 +--
 drivers/ata/libata-scsi.c  |  1 +
 drivers/scsi/scsi.c        |  6 ++--
 drivers/scsi/scsi_lib.c    | 38 +++++++++++++++++++++
 drivers/scsi/scsi_scan.c   | 20 +++++++++++-
 include/linux/blk-mq.h     |  6 ++--
 include/scsi/scsi_device.h |  5 +--
 include/scsi/scsi_host.h   |  3 ++
 9 files changed, 116 insertions(+), 34 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/6] block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 2/6] block: Introduce __blk_mq_tagset_iter() Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Jens Axboe,
	Ming Lei

The name 'busy_tag_iter_fn' is not correct since blk_mq_all_tag_iter()
uses this function pointer type for requests that may not be "busy"
(started). Hence rename 'busy_tag_iter_fn' into 'blk_mq_rq_iter_fn'.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c     | 16 ++++++++--------
 block/blk-mq.h         |  4 ++--
 include/linux/blk-mq.h |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 5b664dbdf655..e8e1fd398d4b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -247,7 +247,7 @@ void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)
 struct bt_iter_data {
 	struct blk_mq_hw_ctx *hctx;
 	struct request_queue *q;
-	busy_tag_iter_fn *fn;
+	blk_mq_rq_iter_fn *fn;
 	void *data;
 	bool reserved;
 };
@@ -310,7 +310,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
  *		bitmap_tags member of struct blk_mq_tags.
  */
 static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct request_queue *q,
-			struct sbitmap_queue *bt, busy_tag_iter_fn *fn,
+			struct sbitmap_queue *bt, blk_mq_rq_iter_fn *fn,
 			void *data, bool reserved)
 {
 	struct bt_iter_data iter_data = {
@@ -326,7 +326,7 @@ static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct request_queue *q,
 
 struct bt_tags_iter_data {
 	struct blk_mq_tags *tags;
-	busy_tag_iter_fn *fn;
+	blk_mq_rq_iter_fn *fn;
 	void *data;
 	unsigned int flags;
 };
@@ -378,7 +378,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
  * @flags:	BT_TAG_ITER_*
  */
 static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
-			     busy_tag_iter_fn *fn, void *data, unsigned int flags)
+			blk_mq_rq_iter_fn *fn, void *data, unsigned int flags)
 {
 	struct bt_tags_iter_data iter_data = {
 		.tags = tags,
@@ -392,7 +392,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
 }
 
 static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
-		busy_tag_iter_fn *fn, void *priv, unsigned int flags)
+		blk_mq_rq_iter_fn *fn, void *priv, unsigned int flags)
 {
 	WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
 
@@ -413,7 +413,7 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
  *
  * Caller has to pass the tag map from which requests are allocated.
  */
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter(struct blk_mq_tags *tags, blk_mq_rq_iter_fn *fn,
 		void *priv)
 {
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
@@ -432,7 +432,7 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  * @fn returns.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
-		busy_tag_iter_fn *fn, void *priv)
+		blk_mq_rq_iter_fn *fn, void *priv)
 {
 	unsigned int flags = tagset->flags;
 	int i, nr_tags, srcu_idx;
@@ -493,7 +493,7 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
  * called for all requests on all queues that share that tag set and not only
  * for requests associated with @q.
  */
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, blk_mq_rq_iter_fn *fn,
 		void *priv)
 {
 	int srcu_idx;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index c4fccdeb5441..ae119cb12136 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -190,9 +190,9 @@ void blk_mq_tag_update_sched_shared_tags(struct request_queue *q,
 					 unsigned int nr);
 
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, blk_mq_rq_iter_fn *fn,
 		void *priv);
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter(struct blk_mq_tags *tags, blk_mq_rq_iter_fn *fn,
 		void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b25d12545f46..3467cacb281c 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -549,7 +549,7 @@ struct blk_mq_queue_data {
 	bool last;
 };
 
-typedef bool (busy_tag_iter_fn)(struct request *, void *);
+typedef bool (blk_mq_rq_iter_fn)(struct request *, void *);
 
 /**
  * struct blk_mq_ops - Callback functions that implements block driver
@@ -926,7 +926,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
-		busy_tag_iter_fn *fn, void *priv);
+		blk_mq_rq_iter_fn *fn, void *priv);
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue_nomemsave(struct request_queue *q);
 void blk_mq_unfreeze_queue_nomemrestore(struct request_queue *q);


* [PATCH v4 2/6] block: Introduce __blk_mq_tagset_iter()
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 1/6] block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 3/6] block: Introduce blk_mq_tagset_iter() Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Jens Axboe,
	Ming Lei

Move the body of blk_mq_tagset_busy_iter() into the new helper function
__blk_mq_tagset_iter() to prepare for introducing a second caller of that
helper. No functionality has been changed.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e8e1fd398d4b..783addc52e09 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -419,6 +419,24 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, blk_mq_rq_iter_fn *fn,
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
 }
 
+static void __blk_mq_tagset_iter(struct blk_mq_tag_set *tagset,
+			blk_mq_rq_iter_fn *fn, void *priv, unsigned long flags)
+{
+	int i, nr_tags, srcu_idx;
+
+	srcu_idx = srcu_read_lock(&tagset->tags_srcu);
+
+	nr_tags = blk_mq_is_shared_tags(tagset->flags) ? 1 :
+		tagset->nr_hw_queues;
+
+	for (i = 0; i < nr_tags; i++) {
+		if (tagset->tags && tagset->tags[i])
+			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
+					      flags);
+	}
+	srcu_read_unlock(&tagset->tags_srcu, srcu_idx);
+}
+
 /**
  * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
  * @tagset:	Tag set to iterate over.
@@ -434,19 +452,7 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, blk_mq_rq_iter_fn *fn,
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		blk_mq_rq_iter_fn *fn, void *priv)
 {
-	unsigned int flags = tagset->flags;
-	int i, nr_tags, srcu_idx;
-
-	srcu_idx = srcu_read_lock(&tagset->tags_srcu);
-
-	nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
-
-	for (i = 0; i < nr_tags; i++) {
-		if (tagset->tags && tagset->tags[i])
-			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
-					      BT_TAG_ITER_STARTED);
-	}
-	srcu_read_unlock(&tagset->tags_srcu, srcu_idx);
+	__blk_mq_tagset_iter(tagset, fn, priv, BT_TAG_ITER_STARTED);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 


* [PATCH v4 3/6] block: Introduce blk_mq_tagset_iter()
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 1/6] block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 2/6] block: Introduce __blk_mq_tagset_iter() Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 4/6] ata: libata: Set .needs_budget_token Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Jens Axboe,
	Ming Lei

Support iterating over all requests in a tag set, including requests
that have not yet been started. A later patch will call this function
from scsi_device_busy().
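
A minimal user-space sketch of the difference between visiting only started
requests and visiting all allocated requests. All names below (model_rq,
model_tagset_iter(), MODEL_ITER_STARTED) are hypothetical stand-ins and not the
block layer implementation; SRCU protection and request reference counting are
left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT };

struct model_rq {
	bool allocated;		/* a tag has been assigned to this slot */
	enum rq_state state;	/* RQ_IN_FLIGHT once the request was started */
};

#define MODEL_ITER_STARTED 0x1u	/* plays the role of BT_TAG_ITER_STARTED */

typedef bool (model_rq_iter_fn)(struct model_rq *rq, void *priv);

/*
 * Visit every allocated request. With MODEL_ITER_STARTED set, requests that
 * have not been started are skipped. Iteration stops when @fn returns false.
 */
static void model_tagset_iter(struct model_rq *rqs, size_t nr,
			      model_rq_iter_fn *fn, void *priv,
			      unsigned int flags)
{
	for (size_t i = 0; i < nr; i++) {
		struct model_rq *rq = &rqs[i];

		if (!rq->allocated)
			continue;
		if ((flags & MODEL_ITER_STARTED) && rq->state != RQ_IN_FLIGHT)
			continue;
		if (!fn(rq, priv))
			break;
	}
}

/* Callback that counts every request it is called for. */
static bool model_count_rq(struct model_rq *rq, void *priv)
{
	(void)rq;
	(*(unsigned int *)priv)++;
	return true;
}

/* Count the requests selected by @flags. */
static unsigned int model_count(struct model_rq *rqs, size_t nr,
				unsigned int flags)
{
	unsigned int count = 0;

	model_tagset_iter(rqs, nr, model_count_rq, &count, flags);
	return count;
}
```

In this model, passing MODEL_ITER_STARTED corresponds to what
blk_mq_tagset_busy_iter() does and passing 0 corresponds to what
blk_mq_tagset_iter() does in this patch.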

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c     | 19 +++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 783addc52e09..0e58e615af87 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -456,6 +456,25 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+/**
+ * blk_mq_tagset_iter - iterate over all requests in a tag set
+ * @tagset:	Tag set to iterate over.
+ * @fn:		Pointer to the function that will be called for each request.
+ *		@fn will be called as follows: @fn(rq, @priv) where rq is a
+ *		pointer to a request. Return true to continue iterating tags,
+ *		false to stop.
+ * @priv:	Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
+ */
+void blk_mq_tagset_iter(struct blk_mq_tag_set *tagset, blk_mq_rq_iter_fn *fn,
+			void *priv)
+{
+	__blk_mq_tagset_iter(tagset, fn, priv, 0);
+}
+EXPORT_SYMBOL(blk_mq_tagset_iter);
+
 static bool blk_mq_tagset_count_completed_rqs(struct request *rq, void *data)
 {
 	unsigned *count = data;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3467cacb281c..20a22c1cd067 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -927,6 +927,8 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		blk_mq_rq_iter_fn *fn, void *priv);
+void blk_mq_tagset_iter(struct blk_mq_tag_set *tagset, blk_mq_rq_iter_fn *fn,
+		void *priv);
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue_nomemsave(struct request_queue *q);
 void blk_mq_unfreeze_queue_nomemrestore(struct request_queue *q);


* [PATCH v4 4/6] ata: libata: Set .needs_budget_token
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
                   ` (2 preceding siblings ...)
  2025-12-16 22:30 ` [PATCH v4 3/6] block: Introduce blk_mq_tagset_iter() Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 5/6] scsi: core: Generalize scsi_device_busy() Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
  5 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Niklas Cassel,
	Christoph Hellwig, James E.J. Bottomley

Make the SCSI core set cmd->budget_token because there is code in the
ATA core that uses this member variable directly. Prepare for skipping
the SCSI budget map allocation if this map is not needed.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Niklas Cassel <cassel@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ata/libata-scsi.c | 1 +
 include/scsi/scsi_host.h  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 026122bb6f2f..66f69116de60 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4499,6 +4499,7 @@ int ata_scsi_add_hosts(struct ata_host *host, const struct scsi_host_template *s
 		shost->max_lun = 1;
 		shost->max_channel = 1;
 		shost->max_cmd_len = 32;
+		shost->needs_budget_token = true;
 
 		/* Schedule policy is determined by ->qc_defer()
 		 * callback and it needs to see every deferred qc.
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index e87cf7eadd26..2b3fc8dcbf0b 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -695,6 +695,9 @@ struct Scsi_Host {
 	/* The transport requires the LUN bits NOT to be stored in CDB[1] */
 	unsigned no_scsi2_lun_in_cdb:1;
 
+	/* Whether the LLD uses cmd->budget_token */
+	unsigned needs_budget_token:1;
+
 	/*
 	 * Optional work queue to be utilized by the transport
 	 */


* [PATCH v4 5/6] scsi: core: Generalize scsi_device_busy()
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
                   ` (3 preceding siblings ...)
  2025-12-16 22:30 ` [PATCH v4 4/6] ata: libata: Set .needs_budget_token Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-16 22:30 ` [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
  5 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Jens Axboe,
	Ming Lei, James E.J. Bottomley

Instead of only handling sdev->budget_map.map != NULL, also handle
sdev->budget_map.map == NULL. This patch prepares for supporting logical
units without a budget map.
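
The two code paths can be illustrated with a small user-space model. This is
only a sketch under stated assumptions: model_sdev, model_cmd and
model_device_busy() are hypothetical names, the bitmap is a plain array instead
of a struct sbitmap, and the tag set is a flat array of commands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; the names below are not kernel APIs. */
struct model_sdev {
	const unsigned long *budget_bits; /* NULL models budget_map.map == NULL */
	size_t budget_words;
};

struct model_cmd {
	const struct model_sdev *sdev;	/* NULL: this tag is not allocated */
};

/* Number of set bits in the bitmap, similar in spirit to sbitmap_weight(). */
static int model_bitmap_weight(const unsigned long *bits, size_t words)
{
	int weight = 0;

	for (size_t i = 0; i < words; i++)
		for (unsigned long w = bits[i]; w; w &= w - 1)
			weight++;
	return weight;
}

/*
 * With a budget map: report the number of budget tokens in use.
 * Without a budget map: walk all tags and count the commands that have been
 * allocated for @sdev.
 */
static int model_device_busy(const struct model_sdev *sdev,
			     const struct model_cmd *cmds, size_t nr_tags)
{
	int count = 0;

	if (sdev->budget_bits)
		return model_bitmap_weight(sdev->budget_bits,
					   sdev->budget_words);

	for (size_t i = 0; i < nr_tags; i++)
		if (cmds[i].sdev == sdev)
			count++;
	return count;
}
```

The fallback path costs one pass over the tag set per call, which is why it is
only used when no budget map has been allocated.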

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c    | 38 ++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_device.h |  5 +----
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 93031326ac3e..2f9ebf526d89 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -446,6 +446,44 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
+struct sdev_cmds_allocated_data {
+	const struct scsi_device *sdev;
+	int count;
+};
+
+static bool scsi_device_check_allocated(struct request *rq, void *data)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+	struct sdev_cmds_allocated_data *sifd = data;
+
+	if (cmd->device == sifd->sdev)
+		sifd->count++;
+
+	return true;
+}
+
+/**
+ * scsi_device_busy() - Number of commands allocated for a SCSI device
+ * @sdev: SCSI device.
+ *
+ * Note: There is a subtle difference between this function and
+ * scsi_host_busy(). scsi_host_busy() counts the number of commands that have
+ * been started. This function counts the number of commands that have been
+ * allocated. At least the UFS driver depends on this function counting commands
+ * that have already been allocated but that have not yet been started.
+ */
+int scsi_device_busy(const struct scsi_device *sdev)
+{
+	struct sdev_cmds_allocated_data sifd = { .sdev = sdev };
+	struct blk_mq_tag_set *set = &sdev->host->tag_set;
+
+	if (sdev->budget_map.map)
+		return sbitmap_weight(&sdev->budget_map);
+	blk_mq_tagset_iter(set, scsi_device_check_allocated, &sifd);
+	return sifd.count;
+}
+EXPORT_SYMBOL(scsi_device_busy);
+
 static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
 	if (scsi_device_busy(sdev) >= sdev->queue_depth)
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d32f5841f4f8..0dd078ac9b89 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -713,10 +713,7 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev)
 	return 0;
 }
 
-static inline int scsi_device_busy(struct scsi_device *sdev)
-{
-	return sbitmap_weight(&sdev->budget_map);
-}
+int scsi_device_busy(const struct scsi_device *sdev);
 
 /* Macros to access the UNIT ATTENTION counters */
 #define scsi_get_ua_new_media_ctr(sdev)	atomic_read(&sdev->ua_new_media_ctr)


* [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
                   ` (4 preceding siblings ...)
  2025-12-16 22:30 ` [PATCH v4 5/6] scsi: core: Generalize scsi_device_busy() Bart Van Assche
@ 2025-12-16 22:30 ` Bart Van Assche
  2025-12-17  3:24   ` Damien Le Moal
  5 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-12-16 22:30 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Damien Le Moal, Bart Van Assche, Jens Axboe,
	Ming Lei, James E.J. Bottomley

The SCSI core uses the budget map to restrict the number of commands
that are in flight per logical unit. That limit check can be left out if
host->cmd_per_lun >= host->can_queue and if the host tag set is shared
across all hardware queues or if there is only one hardware queue  Since
scsi_mq_get_budget() shows up in all CPU profiles for fast SCSI devices,
do not allocate a budget map if cmd_per_lun >= can_queue and if the host
tag set is shared across all hardware queues.

For the following test this patch increases IOPS by 5%:

modprobe scsi_debug delay=0 no_rwlock=1 host_max_queue=192 submit_queues=$(nproc)

fio --bs=4096 --disable_clat=1 --disable_slat=1 --group_reporting=1 \
  --gtod_reduce=1 --invalidate=1 --ioengine=io_uring --ioscheduler=none \
  --norandommap --runtime=60 --rw=randread --thread --time_based=1 \
  --buffered=0 --numjobs=1 --iodepth=192 --iodepth_batch=24 --name=/dev/sda \
  --filename=/dev/sda

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi.c      |  6 ++----
 drivers/scsi/scsi_scan.c | 20 +++++++++++++++++++-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 76cdad063f7b..3dc93dd9fda2 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -216,9 +216,6 @@ int scsi_device_max_queue_depth(struct scsi_device *sdev)
  */
 int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
 {
-	if (!sdev->budget_map.map)
-		return -EINVAL;
-
 	depth = min_t(int, depth, scsi_device_max_queue_depth(sdev));
 
 	if (depth > 0) {
@@ -229,7 +226,8 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
 	if (sdev->request_queue)
 		blk_set_queue_depth(sdev->request_queue, depth);
 
-	sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
+	if (sdev->budget_map.map)
+		sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
 
 	return sdev->queue_depth;
 }
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 7acbfcfc2172..35bfc118e048 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -215,9 +215,19 @@ static void scsi_unlock_floptical(struct scsi_device *sdev,
 			 SCSI_TIMEOUT, 3, NULL);
 }
 
+static bool scsi_needs_budget_map(struct Scsi_Host *shost, unsigned int depth)
+{
+	if (shost->needs_budget_token)
+		return true;
+	if (shost->host_tagset || shost->tag_set.nr_hw_queues == 1)
+		return depth < shost->can_queue;
+	return true;
+}
+
 static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
 					unsigned int depth)
 {
+	struct Scsi_Host *shost = sdev->host;
 	int new_shift = sbitmap_calculate_shift(depth);
 	bool need_alloc = !sdev->budget_map.map;
 	bool need_free = false;
@@ -225,6 +235,13 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
 	int ret;
 	struct sbitmap sb_backup;
 
+	if (!scsi_needs_budget_map(shost, depth)) {
+		memflags = blk_mq_freeze_queue(sdev->request_queue);
+		sbitmap_free(&sdev->budget_map);
+		blk_mq_unfreeze_queue(sdev->request_queue, memflags);
+		return 0;
+	}
+
 	depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev));
 
 	/*
@@ -1120,7 +1137,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	scsi_cdl_check(sdev);
 
 	sdev->max_queue_depth = sdev->queue_depth;
-	WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth);
+	WARN_ON_ONCE(sdev->budget_map.map &&
+		     sdev->max_queue_depth > sdev->budget_map.depth);
 
 	/*
 	 * Ok, the device is now all set up, we can


* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-16 22:30 ` [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
@ 2025-12-17  3:24   ` Damien Le Moal
  2025-12-19 17:35     ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2025-12-17  3:24 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/17/25 07:30, Bart Van Assche wrote:
> The SCSI core uses the budget map to restrict the number of commands
> that are in flight per logical unit. That limit check can be left out if
> host->cmd_per_lun >= host->can_queue and if the host tag set is shared
> across all hardware queues or if there is only one hardware queue  Since

Missing a period at the end of the sentence (before Since). But more
importantly, this does not explain why the above is true, and frankly, I do not
see it...

> scsi_mq_get_budget() shows up in all CPU profiles for fast SCSI devices,
> do not allocate a budget map if cmd_per_lun >= can_queue and if the host
> tag set is shared across all hardware queues.
> 
> For the following test this patch increases IOPS by 5%:
> 
> modprobe scsi_debug delay=0 no_rwlock=1 host_max_queue=192 submit_queues=$(nproc)
> 
> fio --bs=4096 --disable_clat=1 --disable_slat=1 --group_reporting=1 \
>   --gtod_reduce=1 --invalidate=1 --ioengine=io_uring --ioscheduler=none \
>   --norandommap --runtime=60 --rw=randread --thread --time_based=1 \
>   --buffered=0 --numjobs=1 --iodepth=192 --iodepth_batch=24 --name=/dev/sda \
>   --filename=/dev/sda
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi.c      |  6 ++----
>  drivers/scsi/scsi_scan.c | 20 +++++++++++++++++++-
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 76cdad063f7b..3dc93dd9fda2 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -216,9 +216,6 @@ int scsi_device_max_queue_depth(struct scsi_device *sdev)
>   */
>  int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>  {
> -	if (!sdev->budget_map.map)
> -		return -EINVAL;
> -
>  	depth = min_t(int, depth, scsi_device_max_queue_depth(sdev));
>  
>  	if (depth > 0) {
> @@ -229,7 +226,8 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>  	if (sdev->request_queue)
>  		blk_set_queue_depth(sdev->request_queue, depth);
>  
> -	sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
> +	if (sdev->budget_map.map)
> +		sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
>  
>  	return sdev->queue_depth;
>  }
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 7acbfcfc2172..35bfc118e048 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -215,9 +215,19 @@ static void scsi_unlock_floptical(struct scsi_device *sdev,
>  			 SCSI_TIMEOUT, 3, NULL);
>  }
>  
> +static bool scsi_needs_budget_map(struct Scsi_Host *shost, unsigned int depth)
> +{
> +	if (shost->needs_budget_token)
> +		return true;
> +	if (shost->host_tagset || shost->tag_set.nr_hw_queues == 1)
> +		return depth < shost->can_queue;
> +	return true;
> +}
> +
>  static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
>  					unsigned int depth)
>  {
> +	struct Scsi_Host *shost = sdev->host;
>  	int new_shift = sbitmap_calculate_shift(depth);
>  	bool need_alloc = !sdev->budget_map.map;
>  	bool need_free = false;
> @@ -225,6 +235,13 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
>  	int ret;
>  	struct sbitmap sb_backup;
>  
> +	if (!scsi_needs_budget_map(shost, depth)) {
> +		memflags = blk_mq_freeze_queue(sdev->request_queue);
> +		sbitmap_free(&sdev->budget_map);
> +		blk_mq_unfreeze_queue(sdev->request_queue, memflags);
> +		return 0;
> +	}
> +
>  	depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev));
>  
>  	/*
> @@ -1120,7 +1137,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
>  	scsi_cdl_check(sdev);
>  
>  	sdev->max_queue_depth = sdev->queue_depth;
> -	WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth);
> +	WARN_ON_ONCE(sdev->budget_map.map &&
> +		     sdev->max_queue_depth > sdev->budget_map.depth);
>  
>  	/*
>  	 * Ok, the device is now all set up, we can


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-17  3:24   ` Damien Le Moal
@ 2025-12-19 17:35     ` Bart Van Assche
  2025-12-19 23:06       ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-12-19 17:35 UTC (permalink / raw)
  To: Damien Le Moal, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/16/25 7:24 PM, Damien Le Moal wrote:
> On 12/17/25 07:30, Bart Van Assche wrote:
>> The SCSI core uses the budget map to restrict the number of commands
>> that are in flight per logical unit. That limit check can be left out if
>> host->cmd_per_lun >= host->can_queue and if the host tag set is shared
>> across all hardware queues or if there is only one hardware queue  Since
> 
> Missing a period at the end of the sentence (before Since). But more
> importantly, this does not explain why the above is true, and frankly, I do not
> see it...
Hi Damien,

The purpose of the SCSI device budget map is to prevent that the queue
depth limit for that SCSI device is exceeded. If there is only a single
hardware queue or there is a host-wide tag set and host->cmd_per_lun is
identical to host->can_queue, it is not possible that the queue depth
for a single SCSI device is exceeded and hence the SCSI device budget
map is not needed.
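
A minimal sketch of that argument, assuming a simplified user-space model. The
struct model_host and both functions are hypothetical; only the decision logic
follows scsi_needs_budget_map() from this patch. The second function models
that every command of a device has to hold one of the can_queue host-wide tags,
so the number of commands in flight for one device cannot exceed can_queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the host properties consulted by the patch. */
struct model_host {
	bool needs_budget_token;
	bool host_tagset;
	unsigned int nr_hw_queues;
	unsigned int can_queue;
};

/* Follows the decision made by scsi_needs_budget_map() in patch 6/6. */
static bool model_needs_budget_map(const struct model_host *h,
				   unsigned int depth)
{
	if (h->needs_budget_token)
		return true;
	if (h->host_tagset || h->nr_hw_queues == 1)
		return depth < h->can_queue;
	return true;
}

/*
 * Worst case for a host with a single pool of can_queue tags: one device
 * tries to submit @attempts commands and none of them complete. Returns the
 * resulting number of commands in flight for that device.
 */
static unsigned int model_max_in_flight(const struct model_host *h,
					unsigned int attempts)
{
	unsigned int in_flight = 0;

	for (unsigned int i = 0; i < attempts; i++)
		if (in_flight < h->can_queue)	/* a free tag is available */
			in_flight++;
	return in_flight;
}
```

For a device queue depth of at least can_queue, the value returned by
model_max_in_flight() can never exceed the queue depth, which is the case in
which model_needs_budget_map() returns false. The model does not cover hosts
with multiple hardware queues and per-queue tags; for those the function keeps
the budget map.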

Please help with reviewing the ATA patch in this series.

Thanks,

Bart.


* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-19 17:35     ` Bart Van Assche
@ 2025-12-19 23:06       ` Damien Le Moal
  2025-12-20  0:05         ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2025-12-19 23:06 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/20/25 02:35, Bart Van Assche wrote:
> On 12/16/25 7:24 PM, Damien Le Moal wrote:
>> On 12/17/25 07:30, Bart Van Assche wrote:
>>> The SCSI core uses the budget map to restrict the number of commands
>>> that are in flight per logical unit. That limit check can be left out if
>>> host->cmd_per_lun >= host->can_queue and if the host tag set is shared
>>> across all hardware queues or if there is only one hardware queue  Since
>>
>> Missing a period at the end of the sentence (before Since). But more
>> importantly, this does not explain why the above is true, and frankly, I do not
>> see it...
> Hi Damien,
> 
> The purpose of the SCSI device budget map is to prevent that the queue
> depth limit for that SCSI device is exceeded. If there is only a single
> hardware queue or there is a host-wide tag set and host->cmd_per_lun is
> identical to host->can_queue, it is not possible that the queue depth
> for a single SCSI device is exceeded and hence the SCSI device budget
> map is not needed.

Still very confusing because as far as I understand things, a host is not
necessarily a LUN/block device (you can have several devices/LUNs on the same
host). So in general host->can_queue != device max queue depth. Also, I am not
entirely sure if host->cmd_per_lun and max queue depth of the LUN are the same
thing, given that SCSI does not define a maximum device queue depth...

> Please help with reviewing the ATA patch in this series.

For AHCI, we are dealing with single queue devices, always. For this case, I do
not think that the scsi budget is needed since we will always have scsi tag ==
ATA tag, between 0 and 31. So if you can allocate a tag, you can always submit
the command.

But for libsas HBAs, I am not sure at all if what you did is solid/works,
because I still do not understand it. Please provide more detailed explanations
in your code (comments) and commit messages to better explain why what you are
doing is safe.
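
To illustrate the AHCI case above, here is a small sketch of a 32-entry tag
pool. It is not libata or blk-mq code: struct tag_pool_sketch and the
sketch_*() helpers are hypothetical. The point is only that when the number
of tags equals the device queue depth, a successful tag allocation already
implies that the command can be issued.

```c
#include <stdint.h>

#define SKETCH_NR_TAGS 32	/* tags 0..31, as for the AHCI case above */

/* Hypothetical tag pool: bit n set means tag n is in use. */
struct tag_pool_sketch {
	uint32_t busy;
};

/* Return the lowest free tag, or -1 if all tags are in use. */
static int sketch_get_tag(struct tag_pool_sketch *p)
{
	for (int tag = 0; tag < SKETCH_NR_TAGS; tag++) {
		uint32_t bit = UINT32_C(1) << tag;

		if (!(p->busy & bit)) {
			p->busy |= bit;
			return tag;
		}
	}
	return -1;
}

/* Release a tag that was obtained with sketch_get_tag(). */
static void sketch_put_tag(struct tag_pool_sketch *p, int tag)
{
	p->busy &= ~(UINT32_C(1) << tag);
}

/* Number of commands in flight; can never exceed SKETCH_NR_TAGS. */
static int sketch_in_flight(const struct tag_pool_sketch *p)
{
	int n = 0;

	for (uint32_t b = p->busy; b; b &= b - 1)
		n++;
	return n;
}
```

Because the pool itself caps the number of in-flight commands at 32, a
separate per-device counter with the same limit would never reject a
command that already holds a tag.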

> 
> Thanks,
> 
> Bart.


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-19 23:06       ` Damien Le Moal
@ 2025-12-20  0:05         ` Bart Van Assche
  2025-12-20  0:13           ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-12-20  0:05 UTC (permalink / raw)
  To: Damien Le Moal, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/19/25 3:06 PM, Damien Le Moal wrote:
> On 12/20/25 02:35, Bart Van Assche wrote:
>> On 12/16/25 7:24 PM, Damien Le Moal wrote:
>>> On 12/17/25 07:30, Bart Van Assche wrote:
>>>> The SCSI core uses the budget map to restrict the number of commands
>>>> that are in flight per logical unit. That limit check can be left out if
>>>> host->cmd_per_lun >= host->can_queue and if the host tag set is shared
>>>> across all hardware queues or if there is only one hardware queue  Since
>>>
>>> Missing a period at the end of the sentence (before Since). But more
>>> importantly, this does not explain why the above is true, and frankly, I do not
>>> see it...
>> Hi Damien,
>>
>> The purpose of the SCSI device budget map is to prevent that the queue
>> depth limit for that SCSI device is exceeded. If there is only a single
>> hardware queue or there is a host-wide tag set and host->cmd_per_lun is
>> identical to host->can_queue, it is not possible that the queue depth
>> for a single SCSI device is exceeded and hence the SCSI device budget
>> map is not needed.
> 
> Still very confusing because as far as I understand things, a host is not
> necessarily a LUN/block device (you can have several devices/LUNs on the same
> host). So in general host->can_queue != device max queue depth. Also, I am not
> entirely sure if host->cmd_per_lun and max queue depth of the LUN are the same
> thing, given that SCSI does not define a maximum device queue depth...

Hi Damien,

There are important use cases, e.g. the UFS driver, where
host->can_queue is identical to the maximum queue depth per logical
unit. A single UFS device typically supports multiple logical units.

>> Please help with reviewing the ATA patch in this series.
> 
> For AHCI, we are dealing with single queue devices, always. For this case, I do
> not think that the scsi budget is needed since we will always have scsi tag ==
> ATA tag, between 0 and 31. So if you can allocate a tag, you can always submit
> the command.
> 
> But for libsas HBAs, I am not sure at all if what you did is solid/works,
> because I still do not understand it. Please provide more detailed explanations
> in your code (comments) and commit messages to better explain what you are doing
> is safe.

I plan to modify scsi_needs_budget_map() in patch 6/6 such that the budget
map is retained for SCSI hosts that define a .change_queue_depth method
and/or that set .track_queue_depth. This will prevent this optimization
from being applied to libsas HBAs. From include/scsi/libsas.h:

#define __LIBSAS_SHT_BASE						\
	[ ... ]
	.change_queue_depth		= sas_change_queue_depth,	\
	[ ... ]
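
A rough sketch of the planned check, under the assumption that it only
looks at the two host template members named above. This is not the final
patch: struct host_template_sketch, struct sdev_sketch and
sketch_depth_may_change() are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct sdev_sketch;	/* opaque, hypothetical stand-in for a SCSI device */

/* Hypothetical subset of a host template: only the two members named above. */
struct host_template_sketch {
	int (*change_queue_depth)(struct sdev_sketch *sdev, int depth);
	bool track_queue_depth;
};

/*
 * Return true if the per-device queue depth may change at runtime. In that
 * case the per-device limit can drop below can_queue, so the budget map
 * would have to be kept.
 */
static bool sketch_depth_may_change(const struct host_template_sketch *t)
{
	return t->change_queue_depth != NULL || t->track_queue_depth;
}

/* Example callback, used only to populate the function pointer. */
static int sketch_noop_change_queue_depth(struct sdev_sketch *sdev, int depth)
{
	(void)sdev;
	return depth;
}
```

A template that sets .change_queue_depth, as the libsas macro above does,
would make this helper return true and therefore keep the budget map.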

Thanks,

Bart.

* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-20  0:05         ` Bart Van Assche
@ 2025-12-20  0:13           ` Damien Le Moal
  2025-12-20  0:28             ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2025-12-20  0:13 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/20/25 09:05, Bart Van Assche wrote:
> On 12/19/25 3:06 PM, Damien Le Moal wrote:
>> On 12/20/25 02:35, Bart Van Assche wrote:
>>> On 12/16/25 7:24 PM, Damien Le Moal wrote:
>>>> On 12/17/25 07:30, Bart Van Assche wrote:
>>>>> The SCSI core uses the budget map to restrict the number of commands
>>>>> that are in flight per logical unit. That limit check can be left out if
>>>>> host->cmd_per_lun >= host->can_queue and if the host tag set is shared
>>>>> across all hardware queues or if there is only one hardware queue  Since
>>>>
>>>> Missing a period at the end of the sentence (before Since). But more
>>>> importantly, this does not explain why the above is true, and frankly, I do not
>>>> see it...
>>> Hi Damien,
>>>
>>> The purpose of the SCSI device budget map is to prevent that the queue
>>> depth limit for that SCSI device is exceeded. If there is only a single
>>> hardware queue or there is a host-wide tag set and host->cmd_per_lun is
>>> identical to host->can_queue, it is not possible that the queue depth
>>> for a single SCSI device is exceeded and hence the SCSI device budget
>>> map is not needed.
>>
>> Still very confusing because as far as I understand things, a host is not
>> necessarily a LUN/block device (you can have several devices/LUNs on the same
>> host). So in general host->can_queue != device max queue depth. Also, I am not
>> entirely sure if host->cmd_per_lun and max queue depth of the LUN are the same
>> thing, given that SCSI does not define a maximum device queue depth...
> 
> Hi Damien,
> 
> There are important use cases, e.g. the UFS driver, where
> host->can_queue is identical to the maximum queue depth per logical
> unit. A single UFS device typically supports multiple logical units.

I get the use case aspect of this. But the above still does not clearly explain
things. E.g.: "host->can_queue is identical to the maximum queue depth per
logical unit" -> As I mentioned, SCSI does not define/advertise a maximum queue
depth per LU (besides the transport defined maximum of course). So is this
something that UFS defines outside of SCSI/SBC? Also, for UFS, is it always one
host per LU? (that would be odd, the "device" here should be the host and you
say it can have multiple LUs).

But if I understand this correctly, you are saying that a UFS device is like
SATA and can_queue == device max queue depth, so we are always guaranteed that
if you can allocate a tag, you will be able to issue the command, right?

>>> Please help with reviewing the ATA patch in this series.
>>
>> For AHCI, we are dealing with single queue devices, always. For this case, I do
>> not think that the scsi budget is needed since we will always have scsi tag ==
>> ATA tag, between 0 and 31. So if you can allocate a tag, you can always submit
>> the command.
>>
>> But for libsas HBAs, I am not sure at all if what you did is solid/works,
>> because I still do not understand it. Please provide more detailed explanations
>> in your code (comments) and commit messages to better explain what you are doing
>> is safe.
> 
> I plan to modify scsi_needs_budget_map() in patch 6/6 such that SCSI
> hosts that define a .change_queue_depth method and/or that set
> .track_queue_depth. This will prevent that this optimization applies to
> libsas HBAs. From include/scsi/libsas.h:
> 
> #define __LIBSAS_SHT_BASE						\
> 	[ ... ]
> 	.change_queue_depth		= sas_change_queue_depth,	\
> 	[ ... ]

.change_queue_depth is defined for AHCI too, with ata_scsi_change_queue_depth().

I am still not sure what you are trying to say here and what this proposed
change will do.

> 
> Thanks,
> 
> Bart.


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags
  2025-12-20  0:13           ` Damien Le Moal
@ 2025-12-20  0:28             ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-12-20  0:28 UTC (permalink / raw)
  To: Damien Le Moal, Martin K . Petersen
  Cc: linux-scsi, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Jens Axboe, Ming Lei, James E.J. Bottomley

On 12/19/25 4:13 PM, Damien Le Moal wrote:
> E.g.: "host->can_queue is identical to the maximum queue depth per
> logical unit" -> As I mentioned, SCSI does not define/advertise a maximum queue
> depth per LU (besides the transport defined maximum of course). So is this
> something that UFS defines outside of SCSI/SBC?

No, this is something that the Linux kernel has supported for a long
time. scsi_alloc_sdev() uses host->cmd_per_lun when allocating the
SCSI device budget map. Hence, host->cmd_per_lun is the maximum queue
depth for a SCSI device. This limit has been enforced for a very long
time. Before the budget map was introduced, the number of commands per
SCSI device was set as follows:

        scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?: 1);
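
For readers unfamiliar with the "?:" form used above: it is the GNU C
conditional with an omitted middle operand, so a cmd_per_lun of zero falls
back to a depth of one. A standard C sketch of the same selection follows;
sketch_initial_queue_depth() is a hypothetical helper, not a kernel
function.

```c
/*
 * Standard C spelling of the GNU C expression "cmd_per_lun ?: 1":
 * use cmd_per_lun if it is non-zero, otherwise fall back to 1.
 */
static int sketch_initial_queue_depth(int cmd_per_lun)
{
	return cmd_per_lun ? cmd_per_lun : 1;
}
```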

> Also, for UFS, is it always one
> host per LU ? (that would be odd, the "device" here should be the host and you
> say it can have multiple LUs).

No. There is one SCSI host per UFS device and there can be multiple
logical units per UFS device.

> But if I understand this correctly, you are saying that a UFS device is like
> SATA and can_queue == device max queue depth, so we are always guaranteed that
> if you can allocate a tag, you will be able to issue the command, right ?

That's correct.

Thanks,

Bart.

end of thread, other threads:[~2025-12-20  0:28 UTC | newest]

Thread overview: 13+ messages
2025-12-16 22:30 [PATCH v4 0/6] Increase SCSI IOPS Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 1/6] block: Rename busy_tag_iter_fn into blk_mq_rq_iter_fn Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 2/6] block: Introduce __blk_mq_tagset_iter() Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 3/6] block: Introduce blk_mq_tagset_iter() Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 4/6] ata: libata: Set .needs_budget_token Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 5/6] scsi: core: Generalize scsi_device_busy() Bart Van Assche
2025-12-16 22:30 ` [PATCH v4 6/6] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
2025-12-17  3:24   ` Damien Le Moal
2025-12-19 17:35     ` Bart Van Assche
2025-12-19 23:06       ` Damien Le Moal
2025-12-20  0:05         ` Bart Van Assche
2025-12-20  0:13           ` Damien Le Moal
2025-12-20  0:28             ` Bart Van Assche
