* [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
@ 2026-05-13 17:35 David Jeffery
2026-05-13 17:48 ` Bart Van Assche
0 siblings, 1 reply; 4+ messages in thread
From: David Jeffery @ 2026-05-13 17:35 UTC (permalink / raw)
To: linux-scsi, James E.J. Bottomley, Martin K. Petersen; +Cc: David Jeffery
While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
the requeue list for a requeued command to be kicked in the future. The
expectation is a call to scsi_run_host_queues will kick all scsi devices
once the recovery state is cleared.

However, scsi_run_host_queues uses shost_for_each_device which uses
scsi_device_get and so will ignore devices in a partially removed state like
SDEV_CANCEL. But these devices may also have requeued requests, leaving
their requests stuck from not being kicked and causing the removal process
of the device to hang.

scsi_run_host_queues needs to run against more devices than the macro
shost_for_each_device allows. Instead of using the too limiting
scsi_device_get state checks, only ignore devices in SDEV_DEL state or
when unable to acquire a reference. Attempt to run the queues for all other
devices when scsi_run_host_queues is called.

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
drivers/scsi/scsi_lib.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6e8c7a42603e..bb7281dc3633 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -575,10 +575,27 @@ void scsi_requeue_run_queue(struct work_struct *work)

 void scsi_run_host_queues(struct Scsi_Host *shost)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *prev = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	__shost_for_each_device(sdev, shost) {
+		if (sdev->sdev_state == SDEV_DEL ||
+		    !get_device(&sdev->sdev_gendev))
+			continue;
+		spin_unlock_irqrestore(shost->host_lock, flags);

-	shost_for_each_device(sdev, shost)
+		if (prev)
+			put_device(&prev->sdev_gendev);
 		scsi_run_queue(sdev->request_queue);
+
+		prev = sdev;
+
+		spin_lock_irqsave(shost->host_lock, flags);
+	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
+	if (prev)
+		put_device(&prev->sdev_gendev);
 }

 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
--
2.53.0
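
[Editorial sketch, not part of the patch.] The loop in the diff follows a common pattern: walk a list under a lock, pin the current element with a reference, drop the lock to do the real work, and release the previous element's reference only after the lock has been dropped. The userspace model below is a minimal sketch of that pattern under stated assumptions. Every toy_* name is hypothetical, the lock is modeled as a plain flag so the lock discipline can be asserted, and the reference count is a plain int because the model is single-threaded. None of it is kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for scsi_device states; not the kernel enum. */
enum toy_state { TOY_RUNNING, TOY_CANCEL, TOY_DEL };

struct toy_dev {
	struct toy_dev *next;
	enum toy_state state;
	int refs;	/* models the device reference count */
	int kicks;	/* number of times this device's queue was run */
};

struct toy_host {
	int locked;	/* models the host lock; a flag, not a real lock */
	struct toy_dev *devs;
};

static void toy_lock(struct toy_host *h)
{
	assert(!h->locked);
	h->locked = 1;
}

static void toy_unlock(struct toy_host *h)
{
	assert(h->locked);
	h->locked = 0;
}

/* Take a reference unless the count has already dropped to zero. */
static int toy_get(struct toy_dev *d)
{
	if (d->refs == 0)
		return 0;
	d->refs++;
	return 1;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/* The queue must be run with the host lock dropped. */
static void toy_run_queue(struct toy_host *h, struct toy_dev *d)
{
	assert(!h->locked);
	d->kicks++;
}

static void toy_run_host_queues(struct toy_host *h)
{
	struct toy_dev *d, *prev = NULL;

	toy_lock(h);
	for (d = h->devs; d; d = d->next) {
		/* Skip only deleted devices and devices that cannot be pinned. */
		if (d->state == TOY_DEL || !toy_get(d))
			continue;
		toy_unlock(h);

		/* Release the previous reference outside the lock. */
		if (prev)
			toy_put(prev);
		toy_run_queue(h, d);
		prev = d;

		/* d stays pinned, so reading d->next under the lock is safe. */
		toy_lock(h);
	}
	toy_unlock(h);
	if (prev)
		toy_put(prev);
}
```

In this model a device in TOY_CANCEL is kicked just like one in TOY_RUNNING, and only TOY_DEL is skipped. Holding the current device's reference across the unlocked section is what keeps the list cursor valid when the lock is retaken.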
* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
2026-05-13 17:35 [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues David Jeffery
@ 2026-05-13 17:48 ` Bart Van Assche
2026-05-13 18:20 ` David Jeffery
0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2026-05-13 17:48 UTC (permalink / raw)
To: David Jeffery, linux-scsi, James E.J. Bottomley,
Martin K. Petersen
On 5/13/26 10:35 AM, David Jeffery wrote:
> While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
> the requeue list for a requeued command to be kicked in the future. The
> expectation is a call to scsi_run_host_queues will kick all scsi devices
> once the recovery state is cleared.
>
> However, scsi_run_host_queues uses shost_for_each_device which uses
> scsi_device_get and so will ignore devices in a partially removed state like
> SDEV_CANCEL. But these devices may also have requeued requests, leaving
> their requests stuck from not being kicked and causing the removal process
> of the device to hang.
>
> scsi_run_host_queues needs to run against more devices than the macro
> shost_for_each_device allows. Instead of using the too limiting
> scsi_device_get state checks, only ignore devices in SDEV_DEL state or
> when unable to acquire a reference. Attempt to run the queues for all other
> devices when scsi_run_host_queues is called.
From scsi_host.h:
static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
{
	return shost->shost_state == SHOST_RECOVERY ||
		shost->shost_state == SHOST_CANCEL_RECOVERY ||
		shost->shost_state == SHOST_DEL_RECOVERY ||
		shost->tmf_in_progress;
}
This function returns false for the SDEV_CANCEL state. Hence, even with
commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary") applied, the requeue list should still be kicked for SCSI
hosts in the SDEV_CANCEL state, isn't it?
Thanks,
Bart.
* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
2026-05-13 17:48 ` Bart Van Assche
@ 2026-05-13 18:20 ` David Jeffery
2026-05-13 21:24 ` Bart Van Assche
0 siblings, 1 reply; 4+ messages in thread
From: David Jeffery @ 2026-05-13 18:20 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, James E.J. Bottomley, Martin K. Petersen
On Wed, May 13, 2026 at 1:48 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 5/13/26 10:35 AM, David Jeffery wrote:
> > While a scsi host is in a recovery state, scsi_mq_requeue_cmd will not set
> > the requeue list for a requeued command to be kicked in the future. The
> > expectation is a call to scsi_run_host_queues will kick all scsi devices
> > once the recovery state is cleared.
> >
> > However, scsi_run_host_queues uses shost_for_each_device which uses
> > scsi_device_get and so will ignore devices in a partially removed state like
> > SDEV_CANCEL. But these devices may also have requeued requests, leaving
> > their requests stuck from not being kicked and causing the removal process
> > of the device to hang.
> >
> > scsi_run_host_queues needs to run against more devices than the macro
> > shost_for_each_device allows. Instead of using the too limiting
> > scsi_device_get state checks, only ignore devices in SDEV_DEL state or
> > when unable to acquire a reference. Attempt to run the queues for all other
> > devices when scsi_run_host_queues is called.
> From scsi_host.h:
>
> static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
> {
>	return shost->shost_state == SHOST_RECOVERY ||
>		shost->shost_state == SHOST_CANCEL_RECOVERY ||
>		shost->shost_state == SHOST_DEL_RECOVERY ||
>		shost->tmf_in_progress;
> }
>
> This function returns false for the SDEV_CANCEL state. Hence, even with
> commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
> necessary") applied, the requeue list should still be kicked for SCSI
> hosts in the SDEV_CANCEL state, isn't it?

SDEV_CANCEL is a scsi_device state while the scsi_host_in_recovery
macro is looking only at Scsi_Host state. An SDEV_CANCEL scsi_device
is unrelated to the scsi_host_in_recovery return value.

I will note that looking at scsi_device state in scsi_mq_requeue_cmd
in addition to the scsi_host_in_recovery call does not fix the issue.
The scsi_device can be in SDEV_RUNNING state at the time of request
requeue, with the attempted removal and transition of the scsi_device
to SDEV_CANCEL occurring later while error handling is still unfinished.

David Jeffery
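
[Editorial sketch, not part of the thread.] The ordering described above can be modeled in a few lines of C: a request is requeued while the host is recovering and the device is still running, the device then moves to the cancel state, and only afterwards does recovery end and the host-wide queue run happen. The demo_* names and the single `pending` counter are hypothetical simplifications, not kernel structures. The `skip_cancel` flag stands in for the scsi_device_get-style state check that the patch replaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states; the real enums have more members. */
enum demo_host_state { DEMO_HOST_RUNNING, DEMO_HOST_RECOVERY };
enum demo_sdev_state { DEMO_SDEV_RUNNING, DEMO_SDEV_CANCEL, DEMO_SDEV_DEL };

struct demo_sdev {
	enum demo_sdev_state state;
	int pending;	/* requeued requests that have not been kicked */
};

/* Requeue path: the decision depends on host state only. */
static void demo_requeue(enum demo_host_state host, struct demo_sdev *sdev)
{
	sdev->pending++;
	if (host != DEMO_HOST_RECOVERY)
		sdev->pending = 0;	/* kicked right away */
}

/*
 * Host-wide queue run after recovery ends.
 * skip_cancel == true models iterating with a filter that rejects
 * devices in the cancel state; false models skipping only deleted ones.
 */
static void demo_run_host_queues(struct demo_sdev *sdev, bool skip_cancel)
{
	if (sdev->state == DEMO_SDEV_DEL)
		return;
	if (skip_cancel && sdev->state == DEMO_SDEV_CANCEL)
		return;
	sdev->pending = 0;
}

/* Replays the sequence and returns how many requests are left stuck. */
static int demo_pending_after_recovery(bool skip_cancel)
{
	struct demo_sdev sdev = { DEMO_SDEV_RUNNING, 0 };

	/* Requeue happens while the device still looks healthy. */
	demo_requeue(DEMO_HOST_RECOVERY, &sdev);
	assert(sdev.pending == 1);

	/* Removal starts while error handling is still unfinished. */
	sdev.state = DEMO_SDEV_CANCEL;

	/* Recovery completes and the host-wide run is attempted. */
	demo_run_host_queues(&sdev, skip_cancel);
	return sdev.pending;
}
```

With the filter that rejects cancelled devices, the request stays parked. With only deleted devices skipped, it is kicked. A device-state check at requeue time would not change the outcome in this model, because the device is still running when the requeue happens.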
* Re: [PATCH] scsi: core: run queues for all non-SDEV_DEL devices from scsi_run_host_queues
2026-05-13 18:20 ` David Jeffery
@ 2026-05-13 21:24 ` Bart Van Assche
0 siblings, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2026-05-13 21:24 UTC (permalink / raw)
To: David Jeffery; +Cc: linux-scsi, James E.J. Bottomley, Martin K. Petersen
On 5/13/26 11:20 AM, David Jeffery wrote:
> SDEV_CANCEL is a scsi_device state while the scsi_host_in_recovery
> macro is looking only at Scsi_Host state. An SDEV_CANCEL scsi_device
> is unrelated to the scsi_host_in_recovery return value.
>
> I will note that looking at scsi_device state in scsi_mq_requeue_cmd
> in addition to the scsi_host_in_recovery call does not fix the issue.
> The scsi_device can be in SDEV_RUNNING state at the time of request
> requeue with the attempted removal and transition of the scsi_device
> to SDEV_CANCEL occurring later while error handling is still unfinished.
Oops, I got confused when I wrote my previous reply. Let me take another
look at your patch.
Bart.