[PATCH 0/3] Improve host-wide tag IOPS

public inbox for linux-block@vger.kernel.org

* [PATCH 0/3] Improve host-wide tag IOPS
@ 2025-09-10 21:32 Bart Van Assche
  2025-09-10 21:32 ` [PATCH 1/3] block: Export blk_mq_all_tag_iter() Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-09-10 21:32 UTC (permalink / raw)
  To: Martin K . Petersen; +Cc: linux-scsi, linux-block, Bart Van Assche

Hi Martin,

As UFS device IOPS increase, SCSI budget management increasingly becomes a
bottleneck. Hence this patch series, which disables SCSI budget management for
host drivers that don't need it. Please consider this patch series for the next
merge window.

Thanks,

Bart.

Bart Van Assche (3):
  block: Export blk_mq_all_tag_iter()
  ufs: core: Use scsi_device_busy()
  scsi: core: Improve IOPS in case of host-wide tags

 block/blk-mq-tag.c         |  1 +
 block/blk-mq.h             |  2 --
 drivers/scsi/scsi.c        |  7 ++++-
 drivers/scsi/scsi_lib.c    | 60 +++++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_scan.c   | 11 ++++++-
 drivers/ufs/core/ufshcd.c  |  4 +--
 include/linux/blk-mq.h     |  2 ++
 include/scsi/scsi_device.h |  5 +---
 8 files changed, 75 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] block: Export blk_mq_all_tag_iter()
  2025-09-10 21:32 [PATCH 0/3] Improve host-wide tag IOPS Bart Van Assche
@ 2025-09-10 21:32 ` Bart Van Assche
  2025-09-11  8:32   ` Ming Lei
  2025-09-10 21:32 ` [PATCH 2/3] ufs: core: Use scsi_device_busy() Bart Van Assche
  2025-09-10 21:32 ` [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
  2 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-09-10 21:32 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, Bart Van Assche, Jens Axboe,
	Christoph Hellwig, Ming Lei, John Garry

Prepare for using blk_mq_all_tag_iter() in the SCSI core.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c     | 1 +
 block/blk-mq.h         | 2 --
 include/linux/blk-mq.h | 2 ++
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d880c50629d6..1d56ee8722c5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -419,6 +419,7 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 {
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
 }
+EXPORT_SYMBOL(blk_mq_all_tag_iter);
 
 /**
  * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
diff --git a/block/blk-mq.h b/block/blk-mq.h
index affb2e14b56e..944668f34856 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -179,8 +179,6 @@ void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
 		void *priv);
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
-		void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2a5a828f19a0..8ed09783f289 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -921,6 +921,8 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs);
+void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+		void *priv);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);

* [PATCH 2/3] ufs: core: Use scsi_device_busy()
  2025-09-10 21:32 [PATCH 0/3] Improve host-wide tag IOPS Bart Van Assche
  2025-09-10 21:32 ` [PATCH 1/3] block: Export blk_mq_all_tag_iter() Bart Van Assche
@ 2025-09-10 21:32 ` Bart Van Assche
  2025-09-11  9:18   ` Peter Wang (王信友)
  2025-09-10 21:32 ` [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
  2 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-09-10 21:32 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, Bart Van Assche, Jens Axboe,
	Christoph Hellwig, Ming Lei, John Garry, James E.J. Bottomley,
	Peter Wang, Avri Altman, Bean Huo

Use scsi_device_busy() instead of open-coding it. This patch prepares
for skipping the SCSI device budget map initialization in certain cases.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e2157128e3bf..e03e555cc148 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1287,13 +1287,13 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
  */
 static u32 ufshcd_pending_cmds(struct ufs_hba *hba)
 {
-	const struct scsi_device *sdev;
+	struct scsi_device *sdev;
 	unsigned long flags;
 	u32 pending = 0;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	__shost_for_each_device(sdev, hba->host)
-		pending += sbitmap_weight(&sdev->budget_map);
+		pending += scsi_device_busy(sdev);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	return pending;

* [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-10 21:32 [PATCH 0/3] Improve host-wide tag IOPS Bart Van Assche
  2025-09-10 21:32 ` [PATCH 1/3] block: Export blk_mq_all_tag_iter() Bart Van Assche
  2025-09-10 21:32 ` [PATCH 2/3] ufs: core: Use scsi_device_busy() Bart Van Assche
@ 2025-09-10 21:32 ` Bart Van Assche
  2025-09-11  6:40   ` Hannes Reinecke
  2025-09-11  8:15   ` John Garry
  2 siblings, 2 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-09-10 21:32 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, linux-block, Bart Van Assche, Jens Axboe,
	Christoph Hellwig, Ming Lei, John Garry, James E.J. Bottomley

The SCSI core uses the budget map to enforce the cmd_per_lun limit.
That limit cannot be exceeded if host->cmd_per_lun >= host->can_queue
and if the host tag set is shared across all hardware queues.
Since scsi_mq_get_budget() shows up in all CPU profiles for fast SCSI
devices, do not allocate a budget map if cmd_per_lun >= can_queue and
if the host tag set is shared across all hardware queues.

On my UFS 4 test setup this patch improves IOPS by 1% and reduces the
time spent in scsi_mq_get_budget() from 0.22% to 0.01%.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi.c        |  7 ++++-
 drivers/scsi/scsi_lib.c    | 60 +++++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_scan.c   | 11 ++++++-
 include/scsi/scsi_device.h |  5 +---
 4 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 9a0f467264b3..06066b694d8a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -216,6 +216,8 @@ int scsi_device_max_queue_depth(struct scsi_device *sdev)
  */
 int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
 {
+	struct Scsi_Host *shost = sdev->host;
+
 	depth = min_t(int, depth, scsi_device_max_queue_depth(sdev));
 
 	if (depth > 0) {
@@ -226,7 +228,10 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
 	if (sdev->request_queue)
 		blk_set_queue_depth(sdev->request_queue, depth);
 
-	sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
+	if (shost->host_tagset && depth >= shost->can_queue)
+		sbitmap_free(&sdev->budget_map);
+	else
+		sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
 
 	return sdev->queue_depth;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0c65ecfedfbd..c546514d1049 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -396,7 +396,8 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd)
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
-	sbitmap_put(&sdev->budget_map, cmd->budget_token);
+	if (sdev->budget_map.map)
+		sbitmap_put(&sdev->budget_map, cmd->budget_token);
 	cmd->budget_token = -1;
 }
 
@@ -445,6 +446,47 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
+struct sdev_in_flight_data {
+	const struct scsi_device *sdev;
+	int count;
+};
+
+static bool scsi_device_check_in_flight(struct request *rq, void *data)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+	struct sdev_in_flight_data *sifd = data;
+
+	if (cmd->device == sifd->sdev)
+		sifd->count++;
+
+	return true;
+}
+
+/**
+ * scsi_device_busy() - Number of commands allocated for a SCSI device
+ * @sdev: SCSI device.
+ *
+ * Note: There is a subtle difference between this function and
+ * scsi_host_busy(). scsi_host_busy() counts the number of commands that have
+ * been started. This function counts the number of commands that have been
+ * allocated. At least the UFS driver depends on this function counting commands
+ * that have already been allocated but that have not yet been started.
+ */
+int scsi_device_busy(const struct scsi_device *sdev)
+{
+	struct sdev_in_flight_data sifd = { .sdev = sdev };
+	struct blk_mq_tag_set *set = &sdev->host->tag_set;
+
+	if (sdev->budget_map.map)
+		return sbitmap_weight(&sdev->budget_map);
+	if (WARN_ON_ONCE(!set->shared_tags))
+		return 0;
+	blk_mq_all_tag_iter(set->shared_tags, scsi_device_check_in_flight,
+			    &sifd);
+	return sifd.count;
+}
+EXPORT_SYMBOL(scsi_device_busy);
+
 static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
 	if (scsi_device_busy(sdev) >= sdev->queue_depth)
@@ -1358,11 +1400,13 @@ scsi_device_state_check(struct scsi_device *sdev, struct request *req)
 static inline int scsi_dev_queue_ready(struct request_queue *q,
 				  struct scsi_device *sdev)
 {
-	int token;
+	int token = INT_MAX;
 
-	token = sbitmap_get(&sdev->budget_map);
-	if (token < 0)
-		return -1;
+	if (sdev->budget_map.map) {
+		token = sbitmap_get(&sdev->budget_map);
+		if (token < 0)
+			return -1;
+	}
 
 	if (!atomic_read(&sdev->device_blocked))
 		return token;
@@ -1373,7 +1417,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	 */
 	if (scsi_device_busy(sdev) > 1 ||
 	    atomic_dec_return(&sdev->device_blocked) > 0) {
-		sbitmap_put(&sdev->budget_map, token);
+		if (sdev->budget_map.map)
+			sbitmap_put(&sdev->budget_map, token);
 		return -1;
 	}
 
@@ -1749,7 +1794,8 @@ static void scsi_mq_put_budget(struct request_queue *q, int budget_token)
 {
 	struct scsi_device *sdev = q->queuedata;
 
-	sbitmap_put(&sdev->budget_map, budget_token);
+	if (sdev->budget_map.map)
+		sbitmap_put(&sdev->budget_map, budget_token);
 }
 
 /*
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3c6e089e80c3..6f2d0bf0e3ec 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -218,6 +218,7 @@ static void scsi_unlock_floptical(struct scsi_device *sdev,
 static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
 					unsigned int depth)
 {
+	struct Scsi_Host *shost = sdev->host;
 	int new_shift = sbitmap_calculate_shift(depth);
 	bool need_alloc = !sdev->budget_map.map;
 	bool need_free = false;
@@ -225,6 +226,13 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
 	int ret;
 	struct sbitmap sb_backup;
 
+	if (shost->host_tagset && depth >= shost->can_queue) {
+		memflags = blk_mq_freeze_queue(sdev->request_queue);
+		sbitmap_free(&sb_backup);
+		blk_mq_unfreeze_queue(sdev->request_queue, memflags);
+		return 0;
+	}
+
 	depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev));
 
 	/*
@@ -1112,7 +1120,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	scsi_cdl_check(sdev);
 
 	sdev->max_queue_depth = sdev->queue_depth;
-	WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth);
+	WARN_ON_ONCE(sdev->budget_map.map &&
+		     sdev->max_queue_depth > sdev->budget_map.depth);
 	sdev->sdev_bflags = *bflags;
 
 	/*
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 6d6500148c4b..3c7a95fa9b67 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -687,10 +687,7 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev)
 	return 0;
 }
 
-static inline int scsi_device_busy(struct scsi_device *sdev)
-{
-	return sbitmap_weight(&sdev->budget_map);
-}
+int scsi_device_busy(const struct scsi_device *sdev);
 
 /* Macros to access the UNIT ATTENTION counters */
 #define scsi_get_ua_new_media_ctr(sdev) \

* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-10 21:32 ` [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
@ 2025-09-11  6:40   ` Hannes Reinecke
  2025-09-11 15:45     ` Bart Van Assche
  2025-09-11  8:15   ` John Garry
  1 sibling, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2025-09-11  6:40 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	John Garry, James E.J. Bottomley

On 9/10/25 23:32, Bart Van Assche wrote:
> The SCSI core uses the budget map to enforce the cmd_per_lun limit.
> That limit cannot be exceeded if host->cmd_per_lun >= host->can_queue
> and if the host tag set is shared across all hardware queues.
> Since scsi_mq_get_budget() shows up in all CPU profiles for fast SCSI
> devices, do not allocate a budget map if cmd_per_lun >= can_queue and
> if the host tag set is shared across all hardware queues.
> 
> On my UFS 4 test setup this patch improves IOPS by 1% and reduces the
> time spent in scsi_mq_get_budget() from 0.22% to 0.01%.
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/scsi.c        |  7 ++++-
>   drivers/scsi/scsi_lib.c    | 60 +++++++++++++++++++++++++++++++++-----
>   drivers/scsi/scsi_scan.c   | 11 ++++++-
>   include/scsi/scsi_device.h |  5 +---
>   4 files changed, 70 insertions(+), 13 deletions(-)
> 
That is actually a valid point.
There are devices which set 'cmd_per_lun' to the same value
as 'can_queue', rendering the budget map a bit pointless.
But calling blk_mq_all_tag_iter() is more expensive than a simple
sbitmap_weight(), so the improvement isn't _that_ big
(as demonstrated by just 1% performance increase).
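
The redundancy described above can be sketched with a small userspace
model. This is an illustration only, not kernel code: struct toy_host,
struct toy_dev, toy_budget_redundant() and toy_submit() are hypothetical
names, and the model assumes a single tag pool shared by every device on
the host.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_host {
	int can_queue;   /* size of the host-wide tag pool */
	int tags_in_use; /* tags currently held by all devices together */
};

struct toy_dev {
	struct toy_host *host;
	int queue_depth; /* per-device limit */
	int busy;        /* commands this device currently owns */
};

/*
 * A per-device budget is redundant when the device limit can never be
 * reached before the host-wide tag pool is exhausted.
 */
static bool toy_budget_redundant(const struct toy_dev *d)
{
	assert(d && d->host);
	return d->queue_depth >= d->host->can_queue;
}

/* Allocate one host-wide tag; note that no per-device check is made. */
static bool toy_submit(struct toy_dev *d)
{
	assert(d && d->host);
	if (d->host->tags_in_use >= d->host->can_queue)
		return false;
	d->host->tags_in_use++;
	d->busy++;
	return true;
}
```

In this model a device whose queue_depth equals can_queue can never own
more than queue_depth commands, because toy_submit() fails once the
shared pool is empty.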

> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 9a0f467264b3..06066b694d8a 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -216,6 +216,8 @@ int scsi_device_max_queue_depth(struct scsi_device *sdev)
>    */
>   int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>   {
> +	struct Scsi_Host *shost = sdev->host;
> +
>   	depth = min_t(int, depth, scsi_device_max_queue_depth(sdev));
>   
>   	if (depth > 0) {
> @@ -226,7 +228,10 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>   	if (sdev->request_queue)
>   		blk_set_queue_depth(sdev->request_queue, depth);
>   
> -	sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
> +	if (shost->host_tagset && depth >= shost->can_queue)
> +		sbitmap_free(&sdev->budget_map);
> +	else
> +		sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
>   
>   	return sdev->queue_depth;
>   }
I would make this static, and only allocate a budget_map if the
'cmd_per_lun' setting is smaller than the 'can_queue' setting.

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0c65ecfedfbd..c546514d1049 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -396,7 +396,8 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd)
>   	if (starget->can_queue > 0)
>   		atomic_dec(&starget->target_busy);
>   
> -	sbitmap_put(&sdev->budget_map, cmd->budget_token);
> +	if (sdev->budget_map.map)
> +		sbitmap_put(&sdev->budget_map, cmd->budget_token);
>   	cmd->budget_token = -1;
>   }
>   
> @@ -445,6 +446,47 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   }
>   
> +struct sdev_in_flight_data {
> +	const struct scsi_device *sdev;
> +	int count;
> +};
> +
> +static bool scsi_device_check_in_flight(struct request *rq, void *data)
> +{
> +	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> +	struct sdev_in_flight_data *sifd = data;
> +
> +	if (cmd->device == sifd->sdev)
> +		sifd->count++;
> +
> +	return true;
> +}
> +
> +/**
> + * scsi_device_busy() - Number of commands allocated for a SCSI device
> + * @sdev: SCSI device.
> + *
> + * Note: There is a subtle difference between this function and
> + * scsi_host_busy(). scsi_host_busy() counts the number of commands that have
> + * been started. This function counts the number of commands that have been
> + * allocated. At least the UFS driver depends on this function counting commands

But then please don't name the callback 'scsi_device_check_in_flight',
as 'in flight' means 'commands which have been started'.
Please name it 'scsi_device_check_allocated' to make the distinction
clear.
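
The distinction between 'allocated' and 'in flight' can be sketched as
two separate counters over the same request array. This is a userspace
illustration only; enum toy_rq_state, struct toy_rq and both
toy_count_*() helpers are hypothetical names and do not correspond to
block layer or SCSI core symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request states for illustration only. */
enum toy_rq_state { TOY_RQ_FREE, TOY_RQ_ALLOCATED, TOY_RQ_STARTED };

struct toy_rq {
	enum toy_rq_state state;
	int dev_id; /* which device owns this request */
};

/* Count requests of dev_id that hold a tag, whether started or not. */
static int toy_count_allocated(const struct toy_rq *rqs, size_t n, int dev_id)
{
	int count = 0;

	assert(rqs || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rqs[i].state != TOY_RQ_FREE && rqs[i].dev_id == dev_id)
			count++;
	return count;
}

/* Count only requests of dev_id that have been started ("in flight"). */
static int toy_count_in_flight(const struct toy_rq *rqs, size_t n, int dev_id)
{
	int count = 0;

	assert(rqs || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rqs[i].state == TOY_RQ_STARTED && rqs[i].dev_id == dev_id)
			count++;
	return count;
}
```

The two counters differ exactly by the requests that hold a tag but have
not been started yet, which is the case the kernel-doc comment calls out.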

> + * that have already been allocated but that have not yet been started.
> + */
> +int scsi_device_busy(const struct scsi_device *sdev)
> +{
> +	struct sdev_in_flight_data sifd = { .sdev = sdev };
> +	struct blk_mq_tag_set *set = &sdev->host->tag_set;
> +
> +	if (sdev->budget_map.map)
> +		return sbitmap_weight(&sdev->budget_map);
> +	if (WARN_ON_ONCE(!set->shared_tags))
> +		return 0;

One wonders: what would happen if you would return '0' here if
there is only one LUN?

> +	blk_mq_all_tag_iter(set->shared_tags, scsi_device_check_in_flight,
> +			    &sifd);
> +	return sifd.count;
> +}
> +EXPORT_SYMBOL(scsi_device_busy);
> +
>   static inline bool scsi_device_is_busy(struct scsi_device *sdev)
>   {
>   	if (scsi_device_busy(sdev) >= sdev->queue_depth)
> @@ -1358,11 +1400,13 @@ scsi_device_state_check(struct scsi_device *sdev, struct request *req)
>   static inline int scsi_dev_queue_ready(struct request_queue *q,
>   				  struct scsi_device *sdev)
>   {
> -	int token;
> +	int token = INT_MAX;
>   
> -	token = sbitmap_get(&sdev->budget_map);
> -	if (token < 0)
> -		return -1;
> +	if (sdev->budget_map.map) {
> +		token = sbitmap_get(&sdev->budget_map);
> +		if (token < 0)
> +			return -1;
> +	}
>   
>   	if (!atomic_read(&sdev->device_blocked))
>   		return token;
> @@ -1373,7 +1417,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
>   	 */
>   	if (scsi_device_busy(sdev) > 1 ||
>   	    atomic_dec_return(&sdev->device_blocked) > 0) {
> -		sbitmap_put(&sdev->budget_map, token);
> +		if (sdev->budget_map.map)
> +			sbitmap_put(&sdev->budget_map, token);
>   		return -1;
>   	}
>   
> @@ -1749,7 +1794,8 @@ static void scsi_mq_put_budget(struct request_queue *q, int budget_token)
>   {
>   	struct scsi_device *sdev = q->queuedata;
>   
> -	sbitmap_put(&sdev->budget_map, budget_token);
> +	if (sdev->budget_map.map)
> +		sbitmap_put(&sdev->budget_map, budget_token);
>   }
>   
>   /*
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 3c6e089e80c3..6f2d0bf0e3ec 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -218,6 +218,7 @@ static void scsi_unlock_floptical(struct scsi_device *sdev,
>   static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
>   					unsigned int depth)
>   {
> +	struct Scsi_Host *shost = sdev->host;
>   	int new_shift = sbitmap_calculate_shift(depth);
>   	bool need_alloc = !sdev->budget_map.map;
>   	bool need_free = false;
> @@ -225,6 +226,13 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev,
>   	int ret;
>   	struct sbitmap sb_backup;
>   
> +	if (shost->host_tagset && depth >= shost->can_queue) {
> +		memflags = blk_mq_freeze_queue(sdev->request_queue);
> +		sbitmap_free(&sb_backup);

What are you freeing here?
The sbitmap was never allocated, so you should be able to simply
return 0 here...

> +		blk_mq_unfreeze_queue(sdev->request_queue, memflags);
> +		return 0;
> +	}
> +
>   	depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev));
>   
>   	/*
> @@ -1112,7 +1120,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
>   	scsi_cdl_check(sdev);
>   
>   	sdev->max_queue_depth = sdev->queue_depth;
> -	WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth);
> +	WARN_ON_ONCE(sdev->budget_map.map &&
> +		     sdev->max_queue_depth > sdev->budget_map.depth);
>   	sdev->sdev_bflags = *bflags;
>   
>   	/*
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 6d6500148c4b..3c7a95fa9b67 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -687,10 +687,7 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev)
>   	return 0;
>   }
>   
> -static inline int scsi_device_busy(struct scsi_device *sdev)
> -{
> -	return sbitmap_weight(&sdev->budget_map);
> -}
> +int scsi_device_busy(const struct scsi_device *sdev);
>   
>   /* Macros to access the UNIT ATTENTION counters */
>   #define scsi_get_ua_new_media_ctr(sdev) \
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-10 21:32 ` [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags Bart Van Assche
  2025-09-11  6:40   ` Hannes Reinecke
@ 2025-09-11  8:15   ` John Garry
  2025-09-11 15:59     ` Bart Van Assche
  2025-09-11 17:37     ` Bart Van Assche
  1 sibling, 2 replies; 13+ messages in thread
From: John Garry @ 2025-09-11  8:15 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	James E.J. Bottomley

On 10/09/2025 22:32, Bart Van Assche wrote:
> The SCSI core uses the budget map to enforce the cmd_per_lun limit.

That's not strictly true, as I mentioned in 
https://lore.kernel.org/linux-scsi/e7708546-c001-4f31-b895-69720755c3ac@acm.org/T/#m16d3bf6266faefee60addb48ae4b5cdd65e90a68

cmd_per_lun may be completely ignored by the LLD setting its own sdev 
queue depth.

> That limit cannot be exceeded if host->cmd_per_lun >= host->can_queue

Can host->cmd_per_lun > host->can_queue ever be true?

> and if the host tag set is shared across all hardware queues.

Sure, but what about the single HW queue scenario? We should also enforce 
host->cmd_per_lun <= host->can_queue && sdev->max_queue_depth <= 
host->can_queue for that, right?

Most/all single HW queue SCSI LLDs do not set .host_tagset (even though 
they could).

> Since scsi_mq_get_budget() shows up in all CPU profiles for fast SCSI
> devices, do not allocate a budget map if cmd_per_lun >= can_queue and
> if the host tag set is shared across all hardware queues.
> 
> On my UFS 4 test setup this patch improves IOPS by 1% and reduces the
> time spent in scsi_mq_get_budget() from 0.22% to 0.01%.
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/scsi.c        |  7 ++++-
>   drivers/scsi/scsi_lib.c    | 60 +++++++++++++++++++++++++++++++++-----
>   drivers/scsi/scsi_scan.c   | 11 ++++++-
>   include/scsi/scsi_device.h |  5 +---
>   4 files changed, 70 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 9a0f467264b3..06066b694d8a 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -216,6 +216,8 @@ int scsi_device_max_queue_depth(struct scsi_device *sdev)
>    */
>   int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>   {
> +	struct Scsi_Host *shost = sdev->host;
> +
>   	depth = min_t(int, depth, scsi_device_max_queue_depth(sdev));
>   
>   	if (depth > 0) {
> @@ -226,7 +228,10 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
>   	if (sdev->request_queue)
>   		blk_set_queue_depth(sdev->request_queue, depth);
>   
> -	sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
> +	if (shost->host_tagset && depth >= shost->can_queue)
> +		sbitmap_free(&sdev->budget_map);

eh, what happens if we call this twice?

> +	else
> +		sbitmap_resize(&sdev->budget_map, sdev->queue_depth);

what if we set queue_depth = shost->can_queue (and free the budget map) 
and then later set lower than shost->can_queue (and try to reference the 
budget map)?
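
The sequence in question (free at full depth, free again, then shrink)
can be sketched in userspace. This is one possible way to make the
transitions safe, not what the patch does: struct toy_budget,
toy_needs_budget() and toy_change_depth() are hypothetical names, and a
plain byte array stands in for the sbitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device budget map. */
struct toy_budget {
	unsigned char *map; /* NULL means "no budget map" */
	int depth;
};

/* Same condition as in the patch: shared host tags and depth >= can_queue. */
static bool toy_needs_budget(bool host_tagset, int depth, int can_queue)
{
	return !(host_tagset && depth >= can_queue);
}

/* Returns 0 on success and -1 if allocation fails. */
static int toy_change_depth(struct toy_budget *b, bool host_tagset,
			    int depth, int can_queue)
{
	unsigned char *m;

	assert(b && depth > 0);
	if (!toy_needs_budget(host_tagset, depth, can_queue)) {
		free(b->map); /* free(NULL) is a no-op, so repeating this is safe */
		b->map = NULL;
		b->depth = 0;
		return 0;
	}
	/*
	 * The map may have been freed earlier, so always build a fresh one.
	 * In-use bits are discarded here; a real implementation would have
	 * to quiesce the queue or migrate them.
	 */
	m = calloc((size_t)depth, 1);
	if (!m)
		return -1;
	free(b->map);
	b->map = m;
	b->depth = depth;
	return 0;
}
```

The point of the sketch is only that the "shrink after free" step must
allocate rather than resize, and that the free step must reset the
pointer so a second call is harmless.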

>   
>   	return sdev->queue_depth;
>   }
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0c65ecfedfbd..c546514d1049 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -396,7 +396,8 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd)
>   	if (starget->can_queue > 0)
>   		atomic_dec(&starget->target_busy);
>   
> -	sbitmap_put(&sdev->budget_map, cmd->budget_token);
> +	if (sdev->budget_map.map)
> +		sbitmap_put(&sdev->budget_map, cmd->budget_token);
>   	cmd->budget_token = -1;
>   }
>   
> @@ -445,6 +446,47 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   }
>   
> +struct sdev_in_flight_data {
> +	const struct scsi_device *sdev;
> +	int count;
> +};
> +
> +static bool scsi_device_check_in_flight(struct request *rq, void *data)

so this does not check the cmd state (like scsi_host_check_in_flight() 
does), but it uses the same naming (scsi_xxx_check_in_flight)

> +{
> +	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> +	struct sdev_in_flight_data *sifd = data;
> +
> +	if (cmd->device == sifd->sdev)
> +		sifd->count++;
> +
> +	return true;
> +}
> +
> +/**
> + * scsi_device_busy() - Number of commands allocated for a SCSI device
> + * @sdev: SCSI device.
> + *
> + * Note: There is a subtle difference between this function and
> + * scsi_host_busy(). scsi_host_busy() counts the number of commands that have
> + * been started. This function counts the number of commands that have been
> + * allocated. At least the UFS driver depends on this function counting commands
> + * that have already been allocated but that have not yet been started.
> + */
> +int scsi_device_busy(const struct scsi_device *sdev)
> +{
> +	struct sdev_in_flight_data sifd = { .sdev = sdev };
> +	struct blk_mq_tag_set *set = &sdev->host->tag_set;
> +
> +	if (sdev->budget_map.map)

I really dislike these checks

> +		return sbitmap_weight(&sdev->budget_map);
> +	if (WARN_ON_ONCE(!set->shared_tags))
> +		return 0;
> +	blk_mq_all_tag_iter(set->shared_tags, scsi_device_check_in_flight,
> +			    &sifd);
> +	return sifd.count;
> +}
> +EXPORT_SYMBOL(scsi_device_busy);
> +
>   static inline bool scsi_device_is_busy(struct scsi_device *sdev)
>   {
>   	if (scsi_device_busy(sdev) >= sdev->queue_depth)
> @@ -1358,11 +1400,13 @@ scsi_device_state_check(struct scsi_device *sdev, struct request *req)
>   static inline int scsi_dev_queue_ready(struct request_queue *q,
>   				  struct scsi_device *sdev)
>   {
> -	int token;
> +	int token = INT_MAX;
>   
> -	token = sbitmap_get(&sdev->budget_map);
> -	if (token < 0)
> -		return -1;
> +	if (sdev->budget_map.map) {

this can race with a call to scsi_change_queue_depth() (which may free 
sdev->budget_map.map), right?

scsi_change_queue_depth() does not seem to do any queue freezing.

> +		token = sbitmap_get(&sdev->budget_map);
> +		if (token < 0)
> +			return -1;
> +	}
>   

thanks,
John

* Re: [PATCH 1/3] block: Export blk_mq_all_tag_iter()
  2025-09-10 21:32 ` [PATCH 1/3] block: Export blk_mq_all_tag_iter() Bart Van Assche
@ 2025-09-11  8:32   ` Ming Lei
  2025-09-11 16:49     ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-09-11  8:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, linux-scsi, linux-block, Jens Axboe,
	Christoph Hellwig, John Garry

On Thu, Sep 11, 2025 at 5:33 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> Prepare for using blk_mq_all_tag_iter() in the SCSI core.
>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c     | 1 +
>  block/blk-mq.h         | 2 --
>  include/linux/blk-mq.h | 2 ++
>  3 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d880c50629d6..1d56ee8722c5 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -419,6 +419,7 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>  {
>         __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
>  }
> +EXPORT_SYMBOL(blk_mq_all_tag_iter);

IMO, it isn't correct to export to drivers an API for iterating over
static requests.

Thanks


* Re: [PATCH 2/3] ufs: core: Use scsi_device_busy()
  2025-09-10 21:32 ` [PATCH 2/3] ufs: core: Use scsi_device_busy() Bart Van Assche
@ 2025-09-11  9:18   ` Peter Wang (王信友)
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Wang (王信友) @ 2025-09-11  9:18 UTC (permalink / raw)
  To: bvanassche@acm.org, martin.petersen@oracle.com
  Cc: beanhuo@micron.com, hch@infradead.org, linux-scsi@vger.kernel.org,
	ming.lei@redhat.com, john.g.garry@oracle.com, axboe@kernel.dk,
	avri.altman@sandisk.com, linux-block@vger.kernel.org,
	James.Bottomley@HansenPartnership.com

On Wed, 2025-09-10 at 14:32 -0700, Bart Van Assche wrote:
> Use scsi_device_busy() instead of open-coding it. This patch prepares
> for skipping the SCSI device budget map initialization in certain
> cases.
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: Peter Wang <peter.wang@mediatek.com>


* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-11  6:40   ` Hannes Reinecke
@ 2025-09-11 15:45     ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-09-11 15:45 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	John Garry, James E.J. Bottomley

On 9/10/25 11:40 PM, Hannes Reinecke wrote:
> That is actually a valid point.
> There are devices which set 'cmd_per_lun' to the same value
> as 'can_queue', rendering the budget map a bit pointless.
> But calling blk_mq_all_tag_iter() is more expensive than a simple
> sbitmap_weight(), so the improvement isn't _that_ big
> (as demonstrated by just 1% performance increase).

Hi Hannes,

In the test I ran blk_mq_all_tag_iter() was not called at all from the
hot path. More generally, I think that blk_mq_all_tag_iter() should
never be called from the command processing path.

The performance improvement in my test was only 1% because the UFS
device in my test setup only supports about 100 K IOPS. The number of
IOPS supported by UFS devices is expected to increase significantly in
the near future. The faster a SCSI device is, the more IOPS will improve
by optimizing SCSI budget allocation.

>> + * that have already been allocated but that have not yet been started.
>> + */
>> +int scsi_device_busy(const struct scsi_device *sdev)
>> +{
>> +    struct sdev_in_flight_data sifd = { .sdev = sdev };
>> +    struct blk_mq_tag_set *set = &sdev->host->tag_set;
>> +
>> +    if (sdev->budget_map.map)
>> +        return sbitmap_weight(&sdev->budget_map);
>> +    if (WARN_ON_ONCE(!set->shared_tags))
>> +        return 0;
> 
> One wonders: what would happen if you would return '0' here if
> there is only one LUN?

I don't think that the one LUN case should be handled separately.
The single hardware queue case however could be treated in the same way 
as the host-wide tag set case.
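
A minimal userspace sketch of the two paths in the quoted
scsi_device_busy(): a bit count when a budget map exists, and a walk over
the host-wide tags otherwise. All model_* names are hypothetical stand-ins
and not kernel APIs; the 64-bit word merely imitates an sbitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_request {
	bool allocated;	/* the tag is currently in use */
	int sdev_id;	/* device that owns the request */
};

struct model_sdev {
	int id;
	bool has_budget_map;	/* models sdev->budget_map.map != NULL */
	uint64_t budget_bits;	/* one bit per allocated budget token */
};

/* Count the set bits; plays the role of sbitmap_weight(). */
static int model_weight(uint64_t bits)
{
	int n = 0;

	for (; bits; bits &= bits - 1)
		n++;
	return n;
}

/* Cheap path: bit count. Fallback: count this device's allocated tags. */
int model_device_busy(const struct model_sdev *sdev,
		      const struct model_request *tags, size_t nr_tags)
{
	int busy = 0;

	assert(sdev && (tags || nr_tags == 0));
	if (sdev->has_budget_map)
		return model_weight(sdev->budget_bits);
	for (size_t i = 0; i < nr_tags; i++)
		if (tags[i].allocated && tags[i].sdev_id == sdev->id)
			busy++;
	return busy;
}
```

The fallback visits every host-wide tag, which is why it only suits
callers outside the command processing path.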

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-11  8:15   ` John Garry
@ 2025-09-11 15:59     ` Bart Van Assche
  2025-09-11 17:37     ` Bart Van Assche
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-09-11 15:59 UTC (permalink / raw)
  To: John Garry, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	James E.J. Bottomley

On 9/11/25 1:15 AM, John Garry wrote:
> On 10/09/2025 22:32, Bart Van Assche wrote:
>> -    sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
>> +    if (shost->host_tagset && depth >= shost->can_queue)
>> +        sbitmap_free(&sdev->budget_map);
> 
> eh, what happens if we call this twice?

I have checked that calling sbitmap_free() twice is safe.

>> +    else
>> +        sbitmap_resize(&sdev->budget_map, sdev->queue_depth);
> 
> what if we set queue_depth = shost->can_queue (and free the budget map) 
> and then later set lower than shost->can_queue (and try to reference the 
> budget map)?

I will modify scsi_change_queue_depth() such that it allocates a budget
map if sdev->budget_map.map == NULL.
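
A minimal userspace sketch of that plan, assuming a freed map is
represented by a NULL pointer. All model_* names are hypothetical and not
kernel APIs. The model does not track in-flight tokens, so an existing map
is simply reallocated instead of resized.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_map {
	unsigned long *words;	/* NULL means "no budget map" */
	unsigned int depth;
};

struct model_host {
	bool host_tagset;
	unsigned int can_queue;
};

struct model_sdev {
	struct model_map budget_map;
	unsigned int queue_depth;
};

/* Safe to call twice: the pointer is cleared after the first call. */
static void model_map_free(struct model_map *m)
{
	free(m->words);	/* free(NULL) is a no-op */
	m->words = NULL;
	m->depth = 0;
}

static int model_map_alloc(struct model_map *m, unsigned int depth)
{
	size_t bits = sizeof(unsigned long) * CHAR_BIT;
	size_t nwords = (depth + bits - 1) / bits;

	m->words = calloc(nwords, sizeof(unsigned long));
	if (!m->words)
		return -1;
	m->depth = depth;
	return 0;
}

int model_change_queue_depth(const struct model_host *shost,
			     struct model_sdev *sdev, unsigned int depth)
{
	assert(depth > 0);
	sdev->queue_depth = depth;
	/* The host-wide tag set already limits the depth: drop the map. */
	if (shost->host_tagset && depth >= shost->can_queue) {
		model_map_free(&sdev->budget_map);
		return 0;
	}
	/* The map was freed by an earlier call: allocate it again. */
	if (!sdev->budget_map.words)
		return model_map_alloc(&sdev->budget_map, depth);
	/* A map exists: this model reallocates instead of resizing. */
	model_map_free(&sdev->budget_map);
	return model_map_alloc(&sdev->budget_map, depth);
}
```

The sketch ignores concurrency entirely; the race with
scsi_dev_queue_ready() raised further down is not addressed by it.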

>> +static bool scsi_device_check_in_flight(struct request *rq, void *data)
> 
> so this does not check the cmd state (like scsi_host_check_in_flight() 
> does), but it uses the same naming (scsi_xxx_check_in_flight)

I will rename this function.

>> @@ -1358,11 +1400,13 @@ scsi_device_state_check(struct scsi_device 
>> *sdev, struct request *req)
>>   static inline int scsi_dev_queue_ready(struct request_queue *q,
>>                     struct scsi_device *sdev)
>>   {
>> -    int token;
>> +    int token = INT_MAX;
>> -    token = sbitmap_get(&sdev->budget_map);
>> -    if (token < 0)
>> -        return -1;
>> +    if (sdev->budget_map.map) {
> 
> this can race with a call to scsi_change_queue_depth() (which may free 
> sdev->budget_map.map), right?
> 
> scsi_change_queue_depth() does not seem to do any queue freezing.

Agreed, and I think that's a longstanding bug in the upstream kernel.
scsi_change_queue_depth() should freeze the request queue before it
modifies the budget map.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] block: Export blk_mq_all_tag_iter()
  2025-09-11  8:32   ` Ming Lei
@ 2025-09-11 16:49     ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2025-09-11 16:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K . Petersen, linux-scsi, linux-block, Jens Axboe,
	Christoph Hellwig, John Garry

On 9/11/25 1:32 AM, Ming Lei wrote:
> On Thu, Sep 11, 2025 at 5:33 AM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> Prepare for using blk_mq_all_tag_iter() in the SCSI core.
>>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: John Garry <john.g.garry@oracle.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/blk-mq-tag.c     | 1 +
>>   block/blk-mq.h         | 2 --
>>   include/linux/blk-mq.h | 2 ++
>>   3 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index d880c50629d6..1d56ee8722c5 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -419,6 +419,7 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>>   {
>>          __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
>>   }
>> +EXPORT_SYMBOL(blk_mq_all_tag_iter);
> 
> IMO, it isn't correct to export an API for iterating over static
> requests for drivers.

Hi Ming,

A possible alternative is to add a new function that is similar to
blk_mq_tagset_busy_iter() except that it passes 0 as fourth argument to
__blk_mq_all_tag_iter() instead of BT_TAG_ITER_STARTED. Something like
this:

void blk_mq_tagset_iter(struct blk_mq_tag_set *tagset,
		busy_tag_iter_fn *fn, void *priv)
{
	unsigned int flags = tagset->flags;
	int i, nr_tags, srcu_idx;

	srcu_idx = srcu_read_lock(&tagset->tags_srcu);

	nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;

	for (i = 0; i < nr_tags; i++) {
		if (tagset->tags && tagset->tags[i])
			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv, 0);
	}
	srcu_read_unlock(&tagset->tags_srcu, srcu_idx);
}
EXPORT_SYMBOL(blk_mq_tagset_iter);
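
A minimal userspace sketch of what the flags argument changes, assuming
that a "started" flag restricts the walk to started requests and that 0
visits every allocated request. All model_* names and MODEL_ITER_STARTED
are hypothetical stand-ins and not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_ITER_STARTED = 1 << 0 };	/* visit started requests only */

struct model_rq {
	bool allocated;	/* a tag has been assigned */
	bool started;	/* the request has been handed to the driver */
};

/* The callback returns false to stop the iteration early. */
typedef bool model_iter_fn(struct model_rq *rq, void *priv);

void model_tag_iter(struct model_rq *rqs, size_t nr, model_iter_fn *fn,
		    void *priv, unsigned int flags)
{
	assert(fn);
	for (size_t i = 0; i < nr; i++) {
		if (!rqs[i].allocated)
			continue;
		if ((flags & MODEL_ITER_STARTED) && !rqs[i].started)
			continue;
		if (!fn(&rqs[i], priv))
			break;
	}
}

/* Example callback: count every request that is visited. */
bool model_count(struct model_rq *rq, void *priv)
{
	(void)rq;
	(*(int *)priv)++;
	return true;
}
```

With flags set to 0 the callback also sees requests that have been
allocated but not yet started, which is the set that a budget-map
replacement would have to count.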

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-11  8:15   ` John Garry
  2025-09-11 15:59     ` Bart Van Assche
@ 2025-09-11 17:37     ` Bart Van Assche
  2025-09-12 14:37       ` John Garry
  1 sibling, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2025-09-11 17:37 UTC (permalink / raw)
  To: John Garry, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	James E.J. Bottomley

On 9/11/25 1:15 AM, John Garry wrote:
> this can race with a call to scsi_change_queue_depth() (which may free 
> sdev->budget_map.map), right?
> 
> scsi_change_queue_depth() does not seem to do any queue freezing.

Hi John,

Are there any SCSI devices left about which we care and for which queue
depth tracking is important?

scsi_change_queue_depth() can be called from interrupt context as
follows:

LLD completion interrupt
   scsi_done()
     scsi_done_internal()
       blk_mq_complete_request()
         scsi_complete()
           scsi_decide_disposition()
             scsi_handle_queue_ramp_up()
               scsi_change_queue_depth()

Freezing a request queue requires thread context. Hence, the queue depth
increase during queue ramp-up would have to happen asynchronously, e.g.
via queue_work().

Here is another call chain:

scsi_error_handler()
   scsi_unjam_host()
     scsi_eh_ready_devs()
       scsi_eh_host_reset()
         scsi_eh_test_devices()
           scsi_eh_tur()
             scsi_send_eh_cmnd()
               scsi_eh_completed_normally()
                 scsi_handle_queue_full()
                   scsi_track_queue_full()
                     scsi_change_queue_depth()

Freezing the request queue probably would result in a deadlock for this 
call chain. So also for this call chain, changing the queue depth would
have to happen asynchronously.
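
A minimal single-threaded sketch of such a deferral, assuming that the
caller only records the requested depth and that a worker applies it
later. All model_* names are hypothetical; the work_queued flag stands in
for queue_work(), and no locking or real queue freezing is modelled.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sdev {
	unsigned int queue_depth;	/* depth currently in effect */
	unsigned int pending_depth;	/* most recently requested depth */
	bool work_queued;		/* stands in for queue_work() */
};

/* Callable from a context that must not sleep: only records the request. */
void model_request_depth_change(struct model_sdev *sdev, unsigned int depth)
{
	assert(sdev && depth > 0);
	sdev->pending_depth = depth;	/* the last request wins */
	sdev->work_queued = true;
}

/*
 * Runs later in thread context, where freezing the queue would be
 * permitted. Returns true if a pending change was applied.
 */
bool model_depth_work(struct model_sdev *sdev)
{
	assert(sdev);
	if (!sdev->work_queued)
		return false;
	sdev->work_queued = false;
	/* A real implementation would freeze the request queue here. */
	sdev->queue_depth = sdev->pending_depth;
	return true;
}
```

Several requests that arrive before the worker runs collapse into one
change, so the depth seen by the hot path can lag behind the most recent
request.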

Does anyone see any options other than either making
scsi_change_queue_depth() a no-op or making all queue depth changes
asynchronous, except the call from scsi_alloc_sdev()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags
  2025-09-11 17:37     ` Bart Van Assche
@ 2025-09-12 14:37       ` John Garry
  0 siblings, 0 replies; 13+ messages in thread
From: John Garry @ 2025-09-12 14:37 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, linux-block, Jens Axboe, Christoph Hellwig, Ming Lei,
	James E.J. Bottomley

On 11/09/2025 18:37, Bart Van Assche wrote:
> On 9/11/25 1:15 AM, John Garry wrote:
>> this can race with a call to scsi_change_queue_depth() (which may free 
>> sdev->budget_map.map), right?
>>
>> scsi_change_queue_depth() does not seem to do any queue freezing.
> 
> Hi John,
> 

It was not specifically freezing that I was concerned with. It was that
freeing the sbitmap now looks racy. Just resizing the sbitmap would not
have such issues. I mentioned queue freezing because queue freezing was
introduced in scsi_realloc_sdev_budget_map() for freeing the queue - I'm
not sure why, though.

> Are there any SCSI devices left about which we care and for which queue
> depth tracking is important?

Many drivers still set .track_queue_depth.



^ permalink raw reply	[flat|nested] 13+ messages in thread

