Linux SCSI subsystem development
* [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
From: David Jeffery @ 2026-05-13 17:35 UTC
  To: linux-scsi, James E.J. Bottomley, Martin K. Petersen; +Cc: David Jeffery

While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
the requeue list for a requeued command to be kicked in the future. The
expectation is a call to scsi_run_host_queues will kick all scsi devices
once the recovery state is cleared.

However, scsi_run_host_queues uses shost_for_each_device which uses
scsi_device_get and so will ignore devices in a partially removed state like
SDEV_CANCEL. But these devices may also have requeued requests, leaving
their requests stuck from not being kicked and causing the removal process
of the device to hang.

scsi_run_host_queues needs to run against more devices than the macro
shost_for_each_device allows. Instead of using the too limiting
scsi_device_get state checks, only ignore devices in SDEV_DEL state or
when unable to acquire a reference. Attempt to run the queues for all other
devices when scsi_run_host_queues is called.

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
 drivers/scsi/scsi_lib.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6e8c7a42603e..bb7281dc3633 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -575,10 +575,27 @@ void scsi_requeue_run_queue(struct work_struct *work)
 
 void scsi_run_host_queues(struct Scsi_Host *shost)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *prev = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	__shost_for_each_device(sdev, shost) {
+		if (sdev->sdev_state == SDEV_DEL ||
+		    !get_device(&sdev->sdev_gendev))
+			continue;
+		spin_unlock_irqrestore(shost->host_lock, flags);
 
-	shost_for_each_device(sdev, shost)
+		if (prev)
+			put_device(&prev->sdev_gendev);
 		scsi_run_queue(sdev->request_queue);
+
+		prev = sdev;
+
+		spin_lock_irqsave(shost->host_lock, flags);
+	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
+	if (prev)
+		put_device(&prev->sdev_gendev);
 }
 
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
-- 
2.53.0
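
The difference between the two device filters described in the commit message can be modeled in user space. The sketch below is not kernel code: `model_sdev`, `old_filter_accepts`, `new_filter_accepts`, and `run_host_queues_model` are hypothetical names, the reference count is a plain integer, and host_lock is not modeled because the example is single-threaded. It only illustrates which devices get their queue run under each filter and that the "hold the current device, release the previous one" pattern from the patch keeps references balanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's scsi_device states. */
enum model_sdev_state { MODEL_RUNNING, MODEL_CANCEL, MODEL_DEL };

struct model_sdev {
	enum model_sdev_state state;
	int refcount;   /* 0 models an object whose reference can no longer be taken */
	int queue_runs; /* how many times this device's queue was run */
};

/* Old behavior: the state check the commit message attributes to scsi_device_get(). */
static int old_filter_accepts(const struct model_sdev *sdev)
{
	return sdev->state != MODEL_CANCEL && sdev->state != MODEL_DEL &&
	       sdev->refcount > 0;
}

/* New behavior: skip only SDEV_DEL devices or devices with no obtainable reference. */
static int new_filter_accepts(const struct model_sdev *sdev)
{
	return sdev->state != MODEL_DEL && sdev->refcount > 0;
}

/* Walk all devices, run the queue of each accepted one, return how many were run. */
static int run_host_queues_model(struct model_sdev *devs, size_t n,
				 int (*accepts)(const struct model_sdev *))
{
	struct model_sdev *prev = NULL;
	int kicked = 0;

	assert(devs != NULL && accepts != NULL);

	for (size_t i = 0; i < n; i++) {
		struct model_sdev *sdev = &devs[i];

		if (!accepts(sdev))
			continue;

		sdev->refcount++;         /* models get_device() */
		if (prev)
			prev->refcount--; /* models put_device() on the previous device */

		sdev->queue_runs++;       /* models scsi_run_queue() */
		kicked++;
		prev = sdev;
	}

	if (prev)
		prev->refcount--;         /* drop the reference on the last device */

	return kicked;
}
```

With one device in each of the three modeled states, the old filter runs only the MODEL_RUNNING device, while the new filter also runs the MODEL_CANCEL device and still skips MODEL_DEL. Every reference count returns to its starting value after the walk.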


* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
From: Bart Van Assche @ 2026-05-13 17:48 UTC
  To: David Jeffery, linux-scsi, James E.J. Bottomley,
	Martin K. Petersen

On 5/13/26 10:35 AM, David Jeffery wrote:
> While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
> the requeue list for a requeued command to be kicked in the future. The
> expectation is a call to scsi_run_host_queues will kick all scsi devices
> once the recovery state is cleared.
> 
> However, scsi_run_host_queues uses shost_for_each_device which uses
> scsi_device_get and so will ignore devices in a partially removed state like
> SDEV_CANCEL. But these devices may also have requeued requests, leaving
> their requests stuck from not being kicked and causing the removal process
> of the device to hang.
> 
> scsi_run_host_queues needs to run against more devices than the macro
> shost_for_each_device allows. Instead of using the too limiting
> scsi_device_get state checks, only ignore devices in SDEV_DEL state or
> when unable to acquire a reference. Attempt to run the queues for all other
> devices when scsi_run_host_queues is called.
From scsi_host.h:

static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
{
	return shost->shost_state == SHOST_RECOVERY ||
		shost->shost_state == SHOST_CANCEL_RECOVERY ||
		shost->shost_state == SHOST_DEL_RECOVERY ||
		shost->tmf_in_progress;
}

This function returns false for the SDEV_CANCEL state. Hence, even with
commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary") applied, the requeue list should still be kicked for SCSI
hosts in the SDEV_CANCEL state, shouldn't it?

Thanks,

Bart.

* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
From: David Jeffery @ 2026-05-13 18:20 UTC
  To: Bart Van Assche; +Cc: linux-scsi, James E.J. Bottomley, Martin K. Petersen

On Wed, May 13, 2026 at 1:48 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 5/13/26 10:35 AM, David Jeffery wrote:
> > While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
> > the requeue list for a requeued command to be kicked in the future. The
> > expectation is a call to scsi_run_host_queues will kick all scsi devices
> > once the recovery state is cleared.
> >
> > However, scsi_run_host_queues uses shost_for_each_device which uses
> > scsi_device_get and so will ignore devices in a partially removed state like
> > SDEV_CANCEL. But these devices may also have requeued requests, leaving
> > their requests stuck from not being kicked and causing the removal process
> > of the device to hang.
> >
> > scsi_run_host_queues needs to run against more devices than the macro
> > shost_for_each_device allows. Instead of using the too limiting
> > scsi_device_get state checks, only ignore devices in SDEV_DEL state or
> > when unable to acquire a reference. Attempt to run the queues for all other
> > devices when scsi_run_host_queues is called.
> From scsi_host.h:
>
> static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
> {
>         return shost->shost_state == SHOST_RECOVERY ||
>                 shost->shost_state == SHOST_CANCEL_RECOVERY ||
>                 shost->shost_state == SHOST_DEL_RECOVERY ||
>                 shost->tmf_in_progress;
> }
>
> This function returns false for the SDEV_CANCEL state. Hence, even with
> commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
> necessary") applied, the requeue list should still be kicked for SCSI
> hosts in the SDEV_CANCEL state, shouldn't it?

SDEV_CANCEL is a scsi_device state, while the scsi_host_in_recovery
function looks only at Scsi_Host state. Whether a scsi_device is in
SDEV_CANCEL has no bearing on the scsi_host_in_recovery return value.

I will note that checking the scsi_device state in scsi_mq_requeue_cmd,
in addition to the scsi_host_in_recovery call, does not fix the issue.
The scsi_device can be in SDEV_RUNNING state when the request is
requeued, with the attempted removal and transition of the scsi_device
to SDEV_CANCEL occurring later while error handling is still unfinished.

David Jeffery
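
The ordering described in this reply can be illustrated with a small user-space model. It is only a sketch under stated assumptions, not kernel code: `model_host`, `model_dev`, `model_requeue`, `model_run_host_queue`, and `model_scenario` are hypothetical names, and a "stuck" request is represented by a simple counter. The model assumes, as the commit message states, that a requeue during host recovery is not kicked and relies on a later host-wide queue run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: host state and device state are separate enums. */
enum model_host_state { MODEL_HOST_RUNNING, MODEL_HOST_RECOVERY };
enum model_dev_state { MODEL_DEV_RUNNING, MODEL_DEV_CANCEL, MODEL_DEV_DEL };

struct model_host {
	enum model_host_state state;
};

struct model_dev {
	struct model_host *host;
	enum model_dev_state state;
	int stuck_requests; /* requeued requests still waiting for a kick */
};

/* Models a requeue: only the host state decides whether the kick is deferred. */
static void model_requeue(struct model_dev *dev)
{
	assert(dev != NULL && dev->host != NULL);

	if (dev->host->state == MODEL_HOST_RECOVERY)
		dev->stuck_requests++; /* parked until a host-wide queue run */
	/* otherwise the requeue list is kicked right away; nothing is parked */
}

/* Models the per-device step of the host-wide queue run after recovery. */
static void model_run_host_queue(struct model_dev *dev, bool skip_cancel)
{
	if (dev->state == MODEL_DEV_DEL)
		return;
	if (skip_cancel && dev->state == MODEL_DEV_CANCEL)
		return; /* old behavior: device is ignored, requests stay parked */

	dev->stuck_requests = 0; /* the kick releases the parked requests */
}

/* Replays the sequence from the reply; returns requests left stuck. */
static int model_scenario(bool skip_cancel)
{
	struct model_host host = { MODEL_HOST_RECOVERY };
	struct model_dev dev = { &host, MODEL_DEV_RUNNING, 0 };

	model_requeue(&dev);             /* requeue while device is still running */
	dev.state = MODEL_DEV_CANCEL;    /* removal starts during error handling */
	host.state = MODEL_HOST_RUNNING; /* recovery finishes */
	model_run_host_queue(&dev, skip_cancel);

	return dev.stuck_requests;
}
```

In this model, `model_scenario(true)` leaves one request parked because the device was already in the cancel state when the host-wide run happened, while `model_scenario(false)` releases it. Checking the device state at requeue time would not help here, since the device was still running at that point.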


* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
From: Bart Van Assche @ 2026-05-13 21:24 UTC
  To: David Jeffery; +Cc: linux-scsi, James E.J. Bottomley, Martin K. Petersen

On 5/13/26 11:20 AM, David Jeffery wrote:
> SDEV_CANCEL is a scsi_device state, while the scsi_host_in_recovery
> function looks only at Scsi_Host state. Whether a scsi_device is in
> SDEV_CANCEL has no bearing on the scsi_host_in_recovery return value.
>
> I will note that checking the scsi_device state in scsi_mq_requeue_cmd,
> in addition to the scsi_host_in_recovery call, does not fix the issue.
> The scsi_device can be in SDEV_RUNNING state when the request is
> requeued, with the attempted removal and transition of the scsi_device
> to SDEV_CANCEL occurring later while error handling is still unfinished.

Oops, I got confused when I wrote my previous reply. Let me take another
look at your patch.

Bart.
