* [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
@ 2026-04-17 22:57 Mike Christie
2026-04-17 22:57 ` [PATCH 1/4] scsi: Fix can_queue comments Mike Christie
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Mike Christie @ 2026-04-17 22:57 UTC (permalink / raw)
To: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, stefanha, eperezma
The following patches were made over Linus's and Martin's 7.1 trees.
They fix an issue where for virtio-scsi we export a lot of non-scsi
devices but are getting throttled by the cmd_per_lun limit too early.
For example we export 1 or more NVMe or block devices and would like
to just pass commands to them in a way where virtio-scsi's hw queue
limits match the physical hardware. Or in some cases we are doing
cgroup based throttling on the host side, and we don't want the guest
to block IO when the host knows we have extra bandwidth.
The patches add a new cmd_per_lun value so drivers can indicate
when to avoid tracking queueing at the device wide level. They
then rely on just the block layer hw queue limits. And the patches
convert virtio-scsi. They also fix some can_queue related issues
discovered while testing/reviewing.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/4] scsi: Fix can_queue comments
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
@ 2026-04-17 22:57 ` Mike Christie
2026-04-20 8:28 ` John Garry
2026-04-17 22:57 ` [PATCH 2/4] scsi: qedi: Fix command overqueueing Mike Christie
` (3 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Mike Christie @ 2026-04-17 22:57 UTC (permalink / raw)
To: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, stefanha, eperezma
Cc: Mike Christie
The Scsi_Host can_queue comment assumes the old pre-mq can_queue use
or assumes host_tagset is set. This updates the scsi_host_template and
Scsi_Host comments so they are in sync.
It also redirects the nr_hw_queues comment to can_queue so we only
have to describe how can_queue and nr_hw_queues are related in one
place.
I also dropped the non-interrupt vs interrupt driven comment because it
doesn't seem to apply anymore.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
include/scsi/scsi_host.h | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f6e12565a81d..7c747b566bc3 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -381,10 +381,13 @@ struct scsi_host_template {
const char *proc_name;
/*
- * This determines if we will use a non-interrupt driven
- * or an interrupt driven scheme. It is set to the maximum number
- * of simultaneous commands a single hw queue in HBA will accept
- * excluding internal commands.
+ * If host_tagset is set, this is the maximum number of simultaneous
+ * commands the host will accept excluding internal commands.
+ *
+ * If host_tagset is not set, this is the maximum number of simultaneous
+ * commands a single hw queue in the host will accept excluding
+ * internal commands. In other words, the total queue depth per host
+ * is nr_hw_queues * can_queue.
*/
int can_queue;
@@ -631,10 +634,7 @@ struct Scsi_Host {
int this_id;
- /*
- * Number of commands this host can handle at the same time.
- * This excludes reserved commands as specified by nr_reserved_cmds.
- */
+ /* See scsi_host_template's can_queue. */
int can_queue;
/*
* Number of reserved commands to allocate, if any.
@@ -653,10 +653,7 @@ struct Scsi_Host {
/*
* In scsi-mq mode, the number of hardware queues supported by the LLD.
*
- * Note: it is assumed that each hardware queue has a queue depth of
- * can_queue. In other words, the total queue depth per host
- * is nr_hw_queues * can_queue. However, for when host_tagset is set,
- * the total queue depth is can_queue.
+ * See scsi_host_template's can_queue for queueing requirements.
*/
unsigned nr_hw_queues;
unsigned nr_maps;
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/4] scsi: qedi: Fix command overqueueing
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
2026-04-17 22:57 ` [PATCH 1/4] scsi: Fix can_queue comments Mike Christie
@ 2026-04-17 22:57 ` Mike Christie
2026-04-20 16:45 ` Bart Van Assche
2026-04-17 22:57 ` [PATCH 3/4] scsi: Support scsi_devices without a device wide limit Mike Christie
` (2 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Mike Christie @ 2026-04-17 22:57 UTC (permalink / raw)
To: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, stefanha, eperezma
Cc: Mike Christie
qedi supports a total of can_queue commands over all queues so set
host_tagset when multiple queues are used.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/scsi/qedi/qedi_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 227ff7bd1bdc..0be0a9f30ee2 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -657,6 +657,8 @@ static struct qedi_ctx *qedi_host_alloc(struct pci_dev *pdev)
qedi->max_sqes = QEDI_SQ_SIZE;
shost->nr_hw_queues = MIN_NUM_CPUS_MSIX(qedi);
+ if (shost->nr_hw_queues > 1)
+ shost->host_tagset = 1;
pci_set_drvdata(pdev, qedi);
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
2026-04-17 22:57 ` [PATCH 1/4] scsi: Fix can_queue comments Mike Christie
2026-04-17 22:57 ` [PATCH 2/4] scsi: qedi: Fix command overqueueing Mike Christie
@ 2026-04-17 22:57 ` Mike Christie
2026-04-20 16:51 ` Bart Van Assche
2026-04-22 13:15 ` Hannes Reinecke
2026-04-17 22:57 ` [PATCH 4/4] virtio-scsi: " Mike Christie
2026-04-20 17:33 ` [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Stefan Hajnoczi
4 siblings, 2 replies; 23+ messages in thread
From: Mike Christie @ 2026-04-17 22:57 UTC (permalink / raw)
To: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, stefanha, eperezma
Cc: Mike Christie
For virtio-scsi, we export a wide variety of non-scsi devices like
NVMe (local and RDMA/TCP based) drives and block based devices using
ublk. And then it's common to have multiple high perf devices in an LVM
volume. The problem for these setups is that we can easily hit the 4096
scsi_device queue depth limit so we end up throttling IO in the guest
when the real device can handle more IO.
In these situations we don't have a device wide limit that maps to
cmd_per_lun. We have per hw queue limits or on the host we are doing
more dynamic throttling. To allow for these types of devices, this
patch allows drivers to set SCSI_UNLIMITED_CMD_PER_LUN for the
cmd_per_lun. When set, we will then only be limited by the per hw
queue limits.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/scsi/hosts.c | 5 +++--
drivers/scsi/scsi_scan.c | 25 ++++++++++++++-----------
include/scsi/scsi_host.h | 4 ++++
3 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index e047747d4ecf..c93c59e847c5 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -238,8 +238,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
}
/* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
- shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
- shost->can_queue);
+ if (shost->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN)
+ shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
+ shost->can_queue);
error = scsi_init_sense_cache(shost);
if (error)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 7b11bc7de0e3..ecc3638c1909 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -352,18 +352,20 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
if (scsi_device_is_pseudo_dev(sdev))
return sdev;
- depth = sdev->host->cmd_per_lun ?: 1;
+ if (sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN) {
+ depth = sdev->host->cmd_per_lun ?: 1;
- /*
- * Use .can_queue as budget map's depth because we have to
- * support adjusting queue depth from sysfs. Meantime use
- * default device queue depth to figure out sbitmap shift
- * since we use this queue depth most of times.
- */
- if (scsi_realloc_sdev_budget_map(sdev, depth))
- goto out_device_destroy;
+ /*
+ * Use .can_queue as budget map's depth because we have to
+ * support adjusting queue depth from sysfs. Meantime use
+ * default device queue depth to figure out sbitmap shift
+ * since we use this queue depth most of times.
+ */
+ if (scsi_realloc_sdev_budget_map(sdev, depth))
+ goto out_device_destroy;
- scsi_change_queue_depth(sdev, depth);
+ scsi_change_queue_depth(sdev, depth);
+ }
if (shost->hostt->sdev_init) {
ret = shost->hostt->sdev_init(sdev);
@@ -1108,7 +1110,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
* Set up budget map again since memory consumption of the map depends
* on actual queue depth.
*/
- if (hostt->sdev_configure)
+ if (hostt->sdev_configure &&
+ sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN)
scsi_realloc_sdev_budget_map(sdev, sdev->queue_depth);
if (sdev->scsi_level >= SCSI_3)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 7c747b566bc3..7555898dba25 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -443,6 +443,7 @@ struct scsi_host_template {
*/
#define SCSI_DEFAULT_MAX_SECTORS 1024
+#define SCSI_UNLIMITED_CMD_PER_LUN -1
/*
* True if this host adapter can make good use of linked commands.
* This will allow more than one command to be queued to a given
@@ -451,6 +452,9 @@ struct scsi_host_template {
* command block per lun, 2 for two, etc. Do not set this to 0.
* You should make sure that the host adapter will do the right thing
* before you try setting this above 1.
+ *
+ * Adapters that do not have a device limit can set this to
+ * SCSI_UNLIMITED_CMD_PER_LUN.
*/
short cmd_per_lun;
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/4] virtio-scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
` (2 preceding siblings ...)
2026-04-17 22:57 ` [PATCH 3/4] scsi: Support scsi_devices without a device wide limit Mike Christie
@ 2026-04-17 22:57 ` Mike Christie
2026-04-20 17:30 ` Stefan Hajnoczi
2026-04-20 17:37 ` Bart Van Assche
2026-04-20 17:33 ` [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Stefan Hajnoczi
4 siblings, 2 replies; 23+ messages in thread
From: Mike Christie @ 2026-04-17 22:57 UTC (permalink / raw)
To: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, stefanha, eperezma
Cc: Mike Christie
When exporting an NVMe drive or other high perf multiqueue enabled
devices we may want to pass commands from the guest to the physical
device without being throttled by artificial device wide limits. To
allow the user to tell virtio-scsi that we don't have an LU wide
command limit, this patch uses U32_MAX as a special cmd_per_lun value.
If U32_MAX is used for cmd_per_lun, virtio-scsi will set
SCSI_UNLIMITED_CMD_PER_LUN for the scsi_device's queue limit. In this
case there is no scsi_device wide queue limit and we only go by the
virtqueue limits (virtqueue limit is translated to scsi host
can_queue which is translated to block layer per hardware queue limit).
There's a small chance of regression where an existing user could be
using U32_MAX and we have been setting the cmd_per_lun to can_queue.
However, I think in the cases the user was doing this, they will want
the new behavior where they are only limited by can_queue because
they have been trying to get the highest queue value possible.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/scsi/virtio_scsi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0ed8558dad72..9b31f613ad7e 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -953,7 +953,10 @@ static int virtscsi_probe(struct virtio_device *vdev)
shost->can_queue = virtqueue_get_vring_size(vscsi->req_vqs[0].vq);
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
- shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
+ if (cmd_per_lun == U32_MAX)
+ shost->cmd_per_lun = SCSI_UNLIMITED_CMD_PER_LUN;
+ else
+ shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
/* LUNs > 256 are reported with format 1, so they go in the range
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] scsi: Fix can_queue comments
2026-04-17 22:57 ` [PATCH 1/4] scsi: Fix can_queue comments Mike Christie
@ 2026-04-20 8:28 ` John Garry
0 siblings, 0 replies; 23+ messages in thread
From: John Garry @ 2026-04-20 8:28 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 17/04/2026 23:57, Mike Christie wrote:
> The Scsi_Host can_queue comment assumes the old pre-mq can_queue use
> or assumes host_tagset is set. This updates the scsi_host_template and
> Scsi_Host comments so they are in sync.
>
> It also redirects the nr_hw_queues comment to can_queue so we only
> have to describe how can_queue and nr_hw_queues are related in one
> place.
>
> I also dropped the non-interrupt vs interrupt driven comment because it
> doesn't seem to apply anymore.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
Regardless of some nitpicking:
Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
> include/scsi/scsi_host.h | 21 +++++++++------------
> 1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index f6e12565a81d..7c747b566bc3 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -381,10 +381,13 @@ struct scsi_host_template {
> const char *proc_name;
>
> /*
> - * This determines if we will use a non-interrupt driven
> - * or an interrupt driven scheme. It is set to the maximum number
> - * of simultaneous commands a single hw queue in HBA will accept
> - * excluding internal commands.
> + * If host_tagset is set, this is the maximum number of simultaneous
> + * commands the host will accept excluding internal commands.
nit: I'd have "... simultaneous commands the host will accept excluding
internal commands over all HW queues".
> + *
> + * If host_tagset is not set, this is the maximum number of simultaneous
> + * commands a single hw queue in the host will accept excluding
> + * internal commands. In other words, the total queue depth per host
> + * is nr_hw_queues * can_queue.
> */
> int can_queue;
>
> @@ -631,10 +634,7 @@ struct Scsi_Host {
>
> int this_id;
>
> - /*
> - * Number of commands this host can handle at the same time.
> - * This excludes reserved commands as specified by nr_reserved_cmds.
> - */
> + /* See scsi_host_template's can_queue. */
> int can_queue;
> /*
> * Number of reserved commands to allocate, if any.
> @@ -653,10 +653,7 @@ struct Scsi_Host {
> /*
> * In scsi-mq mode, the number of hardware queues supported by the LLD.
I think that this scsi-mq comment can be removed or fixed, as we have
not had a separate mq mode in some time. I can do that as a separate
change if you like.
> *
> - * Note: it is assumed that each hardware queue has a queue depth of
> - * can_queue. In other words, the total queue depth per host
> - * is nr_hw_queues * can_queue. However, for when host_tagset is set,
> - * the total queue depth is can_queue.
> + * See scsi_host_template's can_queue for queueing requirements.
nit: I am not sure if we even need to mention this. By default, people
would or should reference scsi_host_template
> */
> unsigned nr_hw_queues;
> unsigned nr_maps;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] scsi: qedi: Fix command overqueueing
2026-04-17 22:57 ` [PATCH 2/4] scsi: qedi: Fix command overqueueing Mike Christie
@ 2026-04-20 16:45 ` Bart Van Assche
2026-04-20 17:47 ` Mike Christie
0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2026-04-20 16:45 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/17/26 3:57 PM, Mike Christie wrote:
> qedi supports a total of can_queue commands over all queues so set
> host_tagset when multiple queues are used.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
> drivers/scsi/qedi/qedi_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index 227ff7bd1bdc..0be0a9f30ee2 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -657,6 +657,8 @@ static struct qedi_ctx *qedi_host_alloc(struct pci_dev *pdev)
> qedi->max_sqes = QEDI_SQ_SIZE;
>
> shost->nr_hw_queues = MIN_NUM_CPUS_MSIX(qedi);
> + if (shost->nr_hw_queues > 1)
> + shost->host_tagset = 1;
>
> pci_set_drvdata(pdev, qedi);
>
Why "if (shost->nr_hw_queues > 1)"? It is safe to set host_tagset even
if shost->nr_hw_queues == 1. See e.g. "[PATCH] ufs: core: Use a host-
wide tagset in SDB mode"
(https://lore.kernel.org/linux-scsi/20260116180800.3085233-1-bvanassche@acm.org/).
Thanks,
Bart.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 ` [PATCH 3/4] scsi: Support scsi_devices without a device wide limit Mike Christie
@ 2026-04-20 16:51 ` Bart Van Assche
2026-04-22 13:15 ` Hannes Reinecke
1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2026-04-20 16:51 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/17/26 3:57 PM, Mike Christie wrote:
> +#define SCSI_UNLIMITED_CMD_PER_LUN -1
> /*
> * True if this host adapter can make good use of linked commands.
> * This will allow more than one command to be queued to a given
> @@ -451,6 +452,9 @@ struct scsi_host_template {
> * command block per lun, 2 for two, etc. Do not set this to 0.
> * You should make sure that the host adapter will do the right thing
> * before you try setting this above 1.
> + *
> + * Adapters that do not have a device limit can set this to
> + * SCSI_UNLIMITED_CMD_PER_LUN.
> */
> short cmd_per_lun;
Please make sure that SCSI_UNLIMITED_CMD_PER_LUN has type "short"
instead of "int". Otherwise comparisons like "shost->cmd_per_lun !=
SCSI_UNLIMITED_CMD_PER_LUN" will trigger a conversion from "short" to
"int" for "shost->cmd_per_lun" before the actual comparison happens. I
don't think that's what we want ...
Thanks,
Bart.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] virtio-scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 ` [PATCH 4/4] virtio-scsi: " Mike Christie
@ 2026-04-20 17:30 ` Stefan Hajnoczi
2026-04-20 17:37 ` Bart Van Assche
1 sibling, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2026-04-20 17:30 UTC (permalink / raw)
To: Mike Christie
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]
On Fri, Apr 17, 2026 at 05:57:24PM -0500, Mike Christie wrote:
> When exporting an NVMe drive or other high perf multiqueue enabled
> devices we may want to pass commands from the guest to the physical
> device without being throttled by artificial device wide limits. To
> allow the user to tell virtio-scsi that we don't have an LU wide
> command limit, this patch uses U32_MAX as a special cmd_per_lun value.
>
> If U32_MAX is used for cmd_per_lun, virtio-scsi will set
> SCSI_UNLIMITED_CMD_PER_LUN for the scsi_device's queue limit. In this
> case there is no scsi_device wide queue limit and we only go by the
> virtqueue limits (virtqueue limit is translated to scsi host
> can_queue which is translated to block layer per hardware queue limit).
>
> There's a small chance of regression where an existing user could be
> using U32_MAX and we have been setting the cmd_per_lun to can_queue.
> However, I think in the cases the user was doing this, they will want
> the new behavior where they are only limited by can_queue because
> they have been trying to get the highest queue value possible.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
> drivers/scsi/virtio_scsi.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
Hi Mike,
Please send a VIRTIO spec patch documenting the new meaning of U32_MAX
in the virtio-scsi's cmd_per_lun configuration field to
virtio-comment@lists.linux.dev. See
https://github.com/oasis-tcs/virtio-spec for details.
The Linux driver patches need to be merged after the VIRTIO spec
change has been merged so that Linux stays spec-compliant and to avoid
collisions between in-progress VIRTIO changes.
Thanks,
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
` (3 preceding siblings ...)
2026-04-17 22:57 ` [PATCH 4/4] virtio-scsi: " Mike Christie
@ 2026-04-20 17:33 ` Stefan Hajnoczi
2026-04-22 18:05 ` Mike Christie
4 siblings, 1 reply; 23+ messages in thread
From: Stefan Hajnoczi @ 2026-04-20 17:33 UTC (permalink / raw)
To: Mike Christie
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]
On Fri, Apr 17, 2026 at 05:57:20PM -0500, Mike Christie wrote:
> The following patches were made over Linus's and Martin's 7.1 trees.
> They fix an issue where for virtio-scsi we export a lot of non-scsi
> devices but are getting throttled by the cmd_per_lun limit too early.
> For example we export 1 or more NVMe or block devices and would like
> to just pass commands to them in a way where virtio-scsi's hw queue
> limits match the physical hardware. Or in some cases we are doing
> cgroup based throttling on the host side, and we don't want the guest
> to block IO when the host knows we have extra bandwidth.
>
> The patches add a new cmd_per_lun value so drivers can indicate
> when to avoid tracking queueing at the device wide level. They
> then rely on just the block layer hw queue limits. And the patches
> convert virtio-scsi. They also fix some can_queue related issues
> discovered while testing/reviewing.
Hi Mike,
Is there a difference between setting cmd_per_lun to U32_MAX with your
patches versus setting cmd_per_lun to the virtqueue size without your
patches (this can already be done today without code changes in the
driver)?
Thanks,
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] virtio-scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 ` [PATCH 4/4] virtio-scsi: " Mike Christie
2026-04-20 17:30 ` Stefan Hajnoczi
@ 2026-04-20 17:37 ` Bart Van Assche
1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2026-04-20 17:37 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/17/26 3:57 PM, Mike Christie wrote:
> cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
> - shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
> + if (cmd_per_lun == U32_MAX)
> + shost->cmd_per_lun = SCSI_UNLIMITED_CMD_PER_LUN;
> + else
> + shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
Although this has not been introduced by this patch: shost->cmd_per_lun
is a signed 16-bits variable and can_queue has type u32 so the above
assignment can cause integer truncation.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] scsi: qedi: Fix command overqueueing
2026-04-20 16:45 ` Bart Van Assche
@ 2026-04-20 17:47 ` Mike Christie
2026-04-20 18:02 ` Bart Van Assche
0 siblings, 1 reply; 23+ messages in thread
From: Mike Christie @ 2026-04-20 17:47 UTC (permalink / raw)
To: Bart Van Assche, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/20/26 11:45 AM, Bart Van Assche wrote:
> On 4/17/26 3:57 PM, Mike Christie wrote:
>> qedi supports a total of can_queue commands over all queues so set
>> host_tagset when multiple queues are used.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>> drivers/scsi/qedi/qedi_main.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/
>> qedi_main.c
>> index 227ff7bd1bdc..0be0a9f30ee2 100644
>> --- a/drivers/scsi/qedi/qedi_main.c
>> +++ b/drivers/scsi/qedi/qedi_main.c
>> @@ -657,6 +657,8 @@ static struct qedi_ctx *qedi_host_alloc(struct
>> pci_dev *pdev)
>> qedi->max_sqes = QEDI_SQ_SIZE;
>> shost->nr_hw_queues = MIN_NUM_CPUS_MSIX(qedi);
>> + if (shost->nr_hw_queues > 1)
>> + shost->host_tagset = 1;
>> pci_set_drvdata(pdev, qedi);
>
> Why "if (shost->nr_hw_queues > 1)"? It is safe to set host_tagset even
> if shost->nr_hw_queues == 1. See e.g. "[PATCH] ufs: core: Use a host-
> wide tagset in SDB mode" (https://lore.kernel.org/linux-
> scsi/20260116180800.3085233-1-bvanassche@acm.org/).
>
But you can't do batching with host_tagset right?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] scsi: qedi: Fix command overqueueing
2026-04-20 17:47 ` Mike Christie
@ 2026-04-20 18:02 ` Bart Van Assche
2026-04-20 18:48 ` Mike Christie
0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2026-04-20 18:02 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/20/26 10:47 AM, Mike Christie wrote:
> On 4/20/26 11:45 AM, Bart Van Assche wrote:
>> On 4/17/26 3:57 PM, Mike Christie wrote:
>>> qedi supports a total of can_queue commands over all queues so set
>>> host_tagset when multiple queues are used.
>>>
>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>> ---
>>> drivers/scsi/qedi/qedi_main.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/
>>> qedi_main.c
>>> index 227ff7bd1bdc..0be0a9f30ee2 100644
>>> --- a/drivers/scsi/qedi/qedi_main.c
>>> +++ b/drivers/scsi/qedi/qedi_main.c
>>> @@ -657,6 +657,8 @@ static struct qedi_ctx *qedi_host_alloc(struct
>>> pci_dev *pdev)
>>> qedi->max_sqes = QEDI_SQ_SIZE;
>>> shost->nr_hw_queues = MIN_NUM_CPUS_MSIX(qedi);
>>> + if (shost->nr_hw_queues > 1)
>>> + shost->host_tagset = 1;
>>> pci_set_drvdata(pdev, qedi);
>>
>> Why "if (shost->nr_hw_queues > 1)"? It is safe to set host_tagset even
>> if shost->nr_hw_queues == 1. See e.g. "[PATCH] ufs: core: Use a host-
>> wide tagset in SDB mode" (https://lore.kernel.org/linux-
>> scsi/20260116180800.3085233-1-bvanassche@acm.org/).
>>
> But you can't do batching with host_tagset right?
Batching? Does this refer to struct io_comp_batch or perhaps to another
batching feature?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] scsi: qedi: Fix command overqueueing
2026-04-20 18:02 ` Bart Van Assche
@ 2026-04-20 18:48 ` Mike Christie
0 siblings, 0 replies; 23+ messages in thread
From: Mike Christie @ 2026-04-20 18:48 UTC (permalink / raw)
To: Bart Van Assche, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/20/26 1:02 PM, Bart Van Assche wrote:
> On 4/20/26 10:47 AM, Mike Christie wrote:
>> On 4/20/26 11:45 AM, Bart Van Assche wrote:
>>> On 4/17/26 3:57 PM, Mike Christie wrote:
>>>> qedi supports a total of can_queue commands over all queues so set
>>>> host_tagset when multiple queues are used.
>>>>
>>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>>> ---
>>>> drivers/scsi/qedi/qedi_main.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/ qedi_main.c
>>>> index 227ff7bd1bdc..0be0a9f30ee2 100644
>>>> --- a/drivers/scsi/qedi/qedi_main.c
>>>> +++ b/drivers/scsi/qedi/qedi_main.c
>>>> @@ -657,6 +657,8 @@ static struct qedi_ctx *qedi_host_alloc(struct pci_dev *pdev)
>>>> qedi->max_sqes = QEDI_SQ_SIZE;
>>>> shost->nr_hw_queues = MIN_NUM_CPUS_MSIX(qedi);
>>>> + if (shost->nr_hw_queues > 1)
>>>> + shost->host_tagset = 1;
>>>> pci_set_drvdata(pdev, qedi);
>>>
>>> Why "if (shost->nr_hw_queues > 1)"? It is safe to set host_tagset even
>>> if shost->nr_hw_queues == 1. See e.g. "[PATCH] ufs: core: Use a host-
>>> wide tagset in SDB mode" (https://lore.kernel.org/linux- scsi/20260116180800.3085233-1-bvanassche@acm.org/).
>>>
>> But you can't do batching with host_tagset right?
>
> Batching? Does this refer to struct io_comp_batch or perhaps to another
> batching feature?
>
I was talking about when we unplug a queue we try to allocate
the tags/requests in a batch. But, ignore my comment. I mixed up
BLK_MQ_F_TAG_QUEUE_SHARED and BLK_MQ_F_TAG_HCTX_SHARED.
I'll fix my patch.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-17 22:57 ` [PATCH 3/4] scsi: Support scsi_devices without a device wide limit Mike Christie
2026-04-20 16:51 ` Bart Van Assche
@ 2026-04-22 13:15 ` Hannes Reinecke
2026-04-22 18:06 ` Mike Christie
2026-04-23 10:02 ` John Garry
1 sibling, 2 replies; 23+ messages in thread
From: Hannes Reinecke @ 2026-04-22 13:15 UTC (permalink / raw)
To: Mike Christie, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/18/26 00:57, Mike Christie wrote:
> For virtio-scsi, we export a wide variety of non-scsi devices like
> NVMe (local and RDMA/TCP based) drives and block based devices using
> ublk. And then it's common to have multiple high perf devices im a LVM
> volume. The problem for these setups, is we can easily hit the 4096
> scsi_device queue depth limit so we end up throttling IO in the guest
> when the real device can handle more IO.
>
> In these situations we don't have a device wide limit that maps to
> cmd_per_lun. We have per hw queue limits or on the host we are doing
> more dynamic throttling. To allow for these types of devices, this
> patch allows drivers to set SCSI_UNLIMITED_CMD_PER_LUN for the
> cmd_per_lun. When set, we will then only be limited by the per hw
> queue limits.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
> drivers/scsi/hosts.c | 5 +++--
> drivers/scsi/scsi_scan.c | 25 ++++++++++++++-----------
> include/scsi/scsi_host.h | 4 ++++
> 3 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index e047747d4ecf..c93c59e847c5 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -238,8 +238,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
> }
>
> /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
> - shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
> - shost->can_queue);
> + if (shost->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN)
> + shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
> + shost->can_queue);
>
> error = scsi_init_sense_cache(shost);
> if (error)
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 7b11bc7de0e3..ecc3638c1909 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -352,18 +352,20 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
> if (scsi_device_is_pseudo_dev(sdev))
> return sdev;
>
> - depth = sdev->host->cmd_per_lun ?: 1;
> + if (sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN) {
> + depth = sdev->host->cmd_per_lun ?: 1;
>
Why don't we use a simple flag in the host (or host template) to
indicate that cmd_per_lun should be ignored?
I'm not in favour of using magic values for a setting which
otherwise is a limit.
Look to dev_loss_tmo as a bad example ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
2026-04-20 17:33 ` [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Stefan Hajnoczi
@ 2026-04-22 18:05 ` Mike Christie
2026-04-23 9:45 ` Hannes Reinecke
0 siblings, 1 reply; 23+ messages in thread
From: Mike Christie @ 2026-04-22 18:05 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
On 4/20/26 12:33 PM, Stefan Hajnoczi wrote:
> On Fri, Apr 17, 2026 at 05:57:20PM -0500, Mike Christie wrote:
>> The following patches were made over Linus's and Martin's 7.1 trees.
>> They fix an issue where for virtio-scsi we export a lot of non-scsi
>> devices but are getting throttled by the cmd_per_lun limit too early.
>> For example we export 1 or more NVMe or block devices and would like
>> to just pass commands to them in a way where virtio-scsi's hw queue
>> limits match the physical hardware. Or in some cases we are doing
>> cgroup based throttling on the host side, and we don't want the guest
>> to block IO when the host knows we have extra bandwidth.
>>
>> The patches add a new cmd_per_lun value so drivers can indicate
>> when to avoid tracking queueing at the device wide level. They
>> then rely on just the block layer hw queue limits. And the patches
>> convert virtio-scsi. They also fix some can_queue related issues
>> discovered while testing/reviewing.
>
> Hi Mike,
> Is there a difference between setting cmd_per_lun to U32_MAX with your
> patches versus setting cmd_per_lun to the virtqueue size without your
> patches (this can already be done today without code changes in the
> driver)?
The problem today is that cmd_per_lun doesn't take into account the
multiple hw queues (virtqueues in virtio), so we have a low limit of 1024
commands total. On a 32-128 vCPU VM we can easily hit that as there's
lots of IO submission threads spread over lots of those CPUs. CPUs are
then mapped to block mq queues which are mapped to virtqueues so we are
hitting them hard.
That 1024 value comes from QEMU which limits virtqueue_size to 1024.
We could increase that to 4096 or 32K or whatever. The problem is that
we would then be wasting a lot of memory as we would be allocating lots
of really large virtqueues that would go underutilized (we are submitting
10s of thousands of total IOs, but not all to a single queue).
So a possibly good balance, avoiding both a magic number (U32_MAX)
and a spec update, would be to:
1. Fix up scsi-ml and virtio-scsi so they allow cmd_per_lun to be
greater than can_queue (virtqueue_size for virtio-scsi).
2. Increase the scsi-ml cmd_per_lun cap from 4096 to S16_MAX
(scsi-ml uses a short for cmd_per_lun).
The only drawback to this would be that for each scsi_device we track
running IO with a sbitmap. For my cases, we don't need it, so it would
be a waste of memory. For S16_MAX worth of commands I think it would
be 128K wasted so not too bad for us as we don't have lots of these
types of high perf devices per VM.
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-22 13:15 ` Hannes Reinecke
@ 2026-04-22 18:06 ` Mike Christie
2026-04-23 10:02 ` John Garry
1 sibling, 0 replies; 23+ messages in thread
From: Mike Christie @ 2026-04-22 18:06 UTC (permalink / raw)
To: Hannes Reinecke, martin.petersen, linux-scsi, james.bottomley,
virtualization, mst, pbonzini, stefanha, eperezma
On 4/22/26 8:15 AM, Hannes Reinecke wrote:
> On 4/18/26 00:57, Mike Christie wrote:
>> For virtio-scsi, we export a wide variety of non-scsi devices like
>> NVMe (local and RDMA/TCP based) drives and block based devices using
>> ublk. And then it's common to have multiple high perf devices in an LVM
>> volume. The problem for these setups is that we can easily hit the 4096
>> scsi_device queue depth limit so we end up throttling IO in the guest
>> when the real device can handle more IO.
>>
>> In these situations we don't have a device wide limit that maps to
>> cmd_per_lun. We have per hw queue limits or on the host we are doing
>> more dynamic throttling. To allow for these types of devices, this
>> patch allows drivers to set SCSI_UNLIMITED_CMD_PER_LUN for the
>> cmd_per_lun. When set, we will then only be limited by the per hw
>> queue limits.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>> drivers/scsi/hosts.c | 5 +++--
>> drivers/scsi/scsi_scan.c | 25 ++++++++++++++-----------
>> include/scsi/scsi_host.h | 4 ++++
>> 3 files changed, 21 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index e047747d4ecf..c93c59e847c5 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -238,8 +238,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
>> }
>> /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
>> - shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
>> - shost->can_queue);
>> + if (shost->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN)
>> + shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
>> + shost->can_queue);
>> error = scsi_init_sense_cache(shost);
>> if (error)
>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> index 7b11bc7de0e3..ecc3638c1909 100644
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -352,18 +352,20 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>> if (scsi_device_is_pseudo_dev(sdev))
>> return sdev;
>> - depth = sdev->host->cmd_per_lun ?: 1;
>> + if (sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN) {
>> + depth = sdev->host->cmd_per_lun ?: 1;
>>
> Why don't we use a simple flag in the host (or host template) to
> indicate that cmd_per_lun should be ignored?
That's fine with me. I don't need it per scsi_device, but had thought
someone might. We can always change it later to a scsi_device flag if
it comes up.
> I'm not in favour of using magic values for a setting which
> otherwise is a limit.
> Look to dev_loss_tmo as a bad example ...
>
> Cheers,
>
> Hannes
* Re: [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
2026-04-22 18:05 ` Mike Christie
@ 2026-04-23 9:45 ` Hannes Reinecke
2026-04-23 16:40 ` Bart Van Assche
0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2026-04-23 9:45 UTC (permalink / raw)
To: Mike Christie, Stefan Hajnoczi
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
On 4/22/26 20:05, Mike Christie wrote:
> On 4/20/26 12:33 PM, Stefan Hajnoczi wrote:
>> On Fri, Apr 17, 2026 at 05:57:20PM -0500, Mike Christie wrote:
>>> The following patches were made over Linus's and Martin's 7.1 trees.
>>> They fix an issue where for virtio-scsi we export a lot of non-scsi
>>> devices but are getting throttled by the cmd_per_lun limit too early.
>>> For example we export 1 or more NVMe or block devices and would like
>>> to just pass commands to them in a way where virtio-scsi's hw queue
>>> limits match the physical hardware. Or in some cases we are doing
>>> cgroup based throttling on the host side, and we don't want the guest
>>> to block IO when the host knows we have extra bandwidth.
>>>
>>> The patches add a new cmd_per_lun value so drivers can indicate
>>> when to avoid tracking queueing at the device wide level. They
>>> then rely on just the block layer hw queue limits. And the patches
>>> convert virtio-scsi. They also fix some can_queue related issues
>>> discovered while testing/reviewing.
>>
>> Hi Mike,
>> Is there a difference between setting cmd_per_lun to U32_MAX with your
>> patches versus setting cmd_per_lun to the virtqueue size without your
>> patches (this can already be done today without code changes in the
>> driver)?
>
> The problem today is that cmd_per_lun doesn't take into account the
> multiple hw queues (virtqueues in virtio), so we have a low limit of 1024
> commands total. On a 32-128 vCPU VM we can easily hit that as there's
> lots of IO submission threads spread over lots of those CPUs. CPUs are
> then mapped to block mq queues which are mapped to virtqueues so we are
> hitting them hard.
>
> That 1024 value comes from QEMU which limits virtqueue_size to 1024.
> We could increase that to 4096 or 32K or whatever. The problem is that
> we would then be wasting a lot of memory as we would be allocating lots
> of really large virtqueues that would go underutilized (we are submitting
> 10s of thousands of total IOs, but not all to a single queue).
>
> So a possibly good balance, avoiding both a magic number (U32_MAX)
> and a spec update, would be to:
>
> 1. Fix up scsi-ml and virtio-scsi so they allow cmd_per_lun to be
> greater than can_queue (virtqueue_size for virtio-scsi).
>
> 2. Increase the scsi-ml cmd_per_lun cap from 4096 to S16_MAX
> (scsi-ml uses a short for cmd_per_lun).
>
> The only drawback to this would be that for each scsi_device we track
> running IO with a sbitmap. For my cases, we don't need it, so it would
> be a waste of memory. For S16_MAX worth of commands I think it would
> be 128K wasted so not too bad for us as we don't have lots of these
> types of high perf devices per VM.
>
Ideally I would kill cmd_per_lun.
This really is a poor man's fairness algorithm (sole purpose is to
avoid starvation with many luns), and we really should look at
whether we can replace it with tagsets.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-22 13:15 ` Hannes Reinecke
2026-04-22 18:06 ` Mike Christie
@ 2026-04-23 10:02 ` John Garry
2026-04-23 10:32 ` Hannes Reinecke
1 sibling, 1 reply; 23+ messages in thread
From: John Garry @ 2026-04-23 10:02 UTC (permalink / raw)
To: Hannes Reinecke, Mike Christie, martin.petersen, linux-scsi,
james.bottomley, virtualization, mst, pbonzini, stefanha,
eperezma
On 22/04/2026 14:15, Hannes Reinecke wrote:
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -352,18 +352,20 @@ static struct scsi_device
>> *scsi_alloc_sdev(struct scsi_target *starget,
>> if (scsi_device_is_pseudo_dev(sdev))
>> return sdev;
>> - depth = sdev->host->cmd_per_lun ?: 1;
>> + if (sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN) {
>> + depth = sdev->host->cmd_per_lun ?: 1;
> Why don't we use a simple flag in the host (or host template) to
> indicate that cmd_per_lun should be ignored?
> I'm not in favour of using magic values for a setting which
> otherwise is a limit.
> Look to dev_loss_tmo as a bad example ...
I think it's better not to have both a flag and cmd_per_lun, as
then we would need to sanitize one vs. the other. As mentioned in the cover
letter response, cmd_per_lun could be got rid of / reworked.
Personally I also dislike how the scsi budget code checks for a budget
map being non-NULL (which is for reserved scsi devices, which doesn't
need a per-sdev budget map).
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-23 10:02 ` John Garry
@ 2026-04-23 10:32 ` Hannes Reinecke
2026-04-27 1:33 ` Martin K. Petersen
0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2026-04-23 10:32 UTC (permalink / raw)
To: John Garry, Mike Christie, martin.petersen, linux-scsi,
james.bottomley, virtualization, mst, pbonzini, stefanha,
eperezma
On 4/23/26 12:02, John Garry wrote:
> On 22/04/2026 14:15, Hannes Reinecke wrote:
>>> --- a/drivers/scsi/scsi_scan.c
>>> +++ b/drivers/scsi/scsi_scan.c
>>> @@ -352,18 +352,20 @@ static struct scsi_device
>>> *scsi_alloc_sdev(struct scsi_target *starget,
>>> if (scsi_device_is_pseudo_dev(sdev))
>>> return sdev;
>>> - depth = sdev->host->cmd_per_lun ?: 1;
>>> + if (sdev->host->cmd_per_lun != SCSI_UNLIMITED_CMD_PER_LUN) {
>>> + depth = sdev->host->cmd_per_lun ?: 1;
>> Why don't we use a simple flag in the host (or host template) to
>> indicate that cmd_per_lun should be ignored?
>> I'm not in favour of using magic values for a setting which
>> otherwise is a limit.
>> Look to dev_loss_tmo as a bad example ...
>
> I think it's better not to have both a flag and cmd_per_lun, as
> then we would need to sanitize one vs. the other. As mentioned in the cover
> letter response, cmd_per_lun could be got rid of / reworked.
>
> Personally I also dislike how the scsi budget code checks for a budget
> map being non-NULL (which is for reserved scsi devices, which doesn't
> need a per-sdev budget map).
Let's see if we can schedule a session at LSF; I really would like to
get this one sorted out.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
2026-04-23 9:45 ` Hannes Reinecke
@ 2026-04-23 16:40 ` Bart Van Assche
2026-04-24 5:45 ` Hannes Reinecke
0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2026-04-23 16:40 UTC (permalink / raw)
To: Hannes Reinecke, Mike Christie, Stefan Hajnoczi
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
On 4/23/26 2:45 AM, Hannes Reinecke wrote:
> Ideally I would kill cmd_per_lun.
> This really is a poor man's fairness algorithm (sole purpose is to
> avoid starvation with many luns), and we really should look at
> whether we can replace it with tagsets.
Hmm ... isn't cmd_per_lun essential since the introduction of scsi-mq?
Without a host-wide tagset, and with n hardware queues,
blk_mq_alloc_tag_set() allocates (number of hardware queues) *
(shost->can_queue + shost->nr_reserved_cmds) requests. Each request
maps to one SCSI command. Setting cmd_per_lun to shost->can_queue may
be essential to avoid BUSY responses from a SCSI device. Here is an
example from the ib_srp driver (there are many more SCSI LLDs that
follow this pattern):
* During connection establishment, the SCSI target reports the
maximum queue depth it supports. This response is used to initialize
can_queue and cmd_per_lun.
* Multiple hardware queues are allocated, all supporting can_queue
commands.
* cmd_per_lun is set to can_queue to avoid BUSY responses from the SCSI
target. My experience is that for high performance SCSI targets even
1% BUSY responses cause a significant performance drop.
Thanks,
Bart.
* Re: [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit
2026-04-23 16:40 ` Bart Van Assche
@ 2026-04-24 5:45 ` Hannes Reinecke
0 siblings, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2026-04-24 5:45 UTC (permalink / raw)
To: Bart Van Assche, Mike Christie, Stefan Hajnoczi
Cc: martin.petersen, linux-scsi, james.bottomley, virtualization, mst,
pbonzini, eperezma
On 4/23/26 18:40, Bart Van Assche wrote:
> On 4/23/26 2:45 AM, Hannes Reinecke wrote:
>> Ideally I would kill cmd_per_lun.
>> This really is a poor man's fairness algorithm (sole purpose is to
>> avoid starvation with many luns), and we really should look at
>> whether we can replace it with tagsets.
>
> Hmm ... isn't cmd_per_lun essential since the introduction of scsi-mq?
> Without a host-wide tagset, and with n hardware queues,
> blk_mq_alloc_tag_set() allocates (number of hardware queues) *
> (shost->can_queue + shost->nr_reserved_cmds) requests. Each request
> maps to one SCSI command. Setting cmd_per_lun to shost->can_queue may
> be essential to avoid BUSY responses from a SCSI device. Here is an
> example from the ib_srp driver (there are many more SCSI LLDs that
> follow this pattern):
> * During connection establishment, the SCSI target reports the
> maximum queue depth it supports. This response is used to initialize
> can_queue and cmd_per_lun.
> * Multiple hardware queues are allocated, all supporting can_queue
> commands.
> * cmd_per_lun is set to can_queue to avoid BUSY responses from the SCSI
> target. My experience is that for high performance SCSI targets even
> 1% BUSY responses cause a significant performance drop.
>
My point being that cmd_per_lun is a single setting, and is extremely
imprecise. At the same time we already have fine-grained request QoS
available by virtue of tagsets.
Seems like we need to have a 'device_tagset' setting, too.
Hmm.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 3/4] scsi: Support scsi_devices without a device wide limit
2026-04-23 10:32 ` Hannes Reinecke
@ 2026-04-27 1:33 ` Martin K. Petersen
0 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2026-04-27 1:33 UTC (permalink / raw)
To: Hannes Reinecke
Cc: John Garry, Mike Christie, martin.petersen, linux-scsi,
james.bottomley, virtualization, mst, pbonzini, stefanha,
eperezma
Hannes,
> Let's see if we can schedule a session at LSF; I really would like to
> get this one sorted out.
Done.
--
Martin K. Petersen
Thread overview: 23+ messages
2026-04-17 22:57 [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Mike Christie
2026-04-17 22:57 ` [PATCH 1/4] scsi: Fix can_queue comments Mike Christie
2026-04-20 8:28 ` John Garry
2026-04-17 22:57 ` [PATCH 2/4] scsi: qedi: Fix command overqueueing Mike Christie
2026-04-20 16:45 ` Bart Van Assche
2026-04-20 17:47 ` Mike Christie
2026-04-20 18:02 ` Bart Van Assche
2026-04-20 18:48 ` Mike Christie
2026-04-17 22:57 ` [PATCH 3/4] scsi: Support scsi_devices without a device wide limit Mike Christie
2026-04-20 16:51 ` Bart Van Assche
2026-04-22 13:15 ` Hannes Reinecke
2026-04-22 18:06 ` Mike Christie
2026-04-23 10:02 ` John Garry
2026-04-23 10:32 ` Hannes Reinecke
2026-04-27 1:33 ` Martin K. Petersen
2026-04-17 22:57 ` [PATCH 4/4] virtio-scsi: " Mike Christie
2026-04-20 17:30 ` Stefan Hajnoczi
2026-04-20 17:37 ` Bart Van Assche
2026-04-20 17:33 ` [PATCH 0/4] scsi: Support devices that don't have a cmd_per_lun limit Stefan Hajnoczi
2026-04-22 18:05 ` Mike Christie
2026-04-23 9:45 ` Hannes Reinecke
2026-04-23 16:40 ` Bart Van Assche
2026-04-24 5:45 ` Hannes Reinecke