All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: core: Kick the requeue list after inserting when flushing
@ 2024-01-11 12:05 Niklas Cassel
  2024-01-11 18:01 ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Niklas Cassel @ 2024-01-11 12:05 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Bart Van Assche
  Cc: Damien Le Moal, Niklas Cassel, Kevin Locke, linux-scsi

When libata calls ata_link_abort() to abort all ata queued commands,
it calls blk_abort_request() on the SCSI command representing each QC.

This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add()
for each SCSI command.

scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds
the command to shost->eh_cmd_q.

This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either
flush retry or flush finish each failed command.

The commands that are flush retried by scsi_eh_flush_done_q() are
done so using scsi_queue_insert().

Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with
the second argument set to true, indicating that it should always
kick/run the requeue list after inserting.

After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list
after inserting, if the current SCSI host state is recovery (which is
the case in the libata example above).

This optimization is probably fine in most cases, as I can only assume
that most often someone will eventually kick/run the queues.

However, that is not the case for scsi_eh_flush_done_q(), where we can
see that the request gets inserted to the requeue list, but the queue is
never started after the request has been inserted, leading to the block
layer waiting for the completion of command that never gets to run.

Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.

Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps
the same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick
the requeue list if necessary").

Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke <kevin@kevinlocke.name>
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/scsi/scsi_error.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1223d34c04da..d983f4a0e9f1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 	struct scsi_cmnd *scmd, *next;
 
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
+		struct scsi_device *sdev = scmd->device;
+
 		list_del_init(&scmd->eh_entry);
-		if (scsi_device_online(scmd->device) &&
-		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
-			scsi_eh_should_retry_cmd(scmd)) {
+		if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) &&
+		    scsi_cmd_retry_allowed(scmd) &&
+		    scsi_eh_should_retry_cmd(scmd)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				scmd_printk(KERN_INFO, scmd,
 					     "%s: flush retry cmd\n",
 					     current->comm));
 				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+				blk_mq_kick_requeue_list(sdev->request_queue);
 		} else {
 			/*
 			 * If just we got sense for the device (called
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: core: Kick the requeue list after inserting when flushing
  2024-01-11 12:05 [PATCH] scsi: core: Kick the requeue list after inserting when flushing Niklas Cassel
@ 2024-01-11 18:01 ` Bart Van Assche
  2024-01-11 23:41 ` Damien Le Moal
  2024-01-12  2:38 ` Martin K. Petersen
  2 siblings, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2024-01-11 18:01 UTC (permalink / raw)
  To: Niklas Cassel, James E.J. Bottomley, Martin K. Petersen
  Cc: Damien Le Moal, Kevin Locke, linux-scsi

On 1/11/24 04:05, Niklas Cassel wrote:
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1223d34c04da..d983f4a0e9f1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>   	struct scsi_cmnd *scmd, *next;
>   
>   	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
> +		struct scsi_device *sdev = scmd->device;
> +
>   		list_del_init(&scmd->eh_entry);
> -		if (scsi_device_online(scmd->device) &&
> -		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
> -			scsi_eh_should_retry_cmd(scmd)) {
> +		if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) &&
> +		    scsi_cmd_retry_allowed(scmd) &&
> +		    scsi_eh_should_retry_cmd(scmd)) {
>   			SCSI_LOG_ERROR_RECOVERY(3,
>   				scmd_printk(KERN_INFO, scmd,
>   					     "%s: flush retry cmd\n",
>   					     current->comm));
>   				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
> +				blk_mq_kick_requeue_list(sdev->request_queue);
>   		} else {
>   			/*
>   			 * If just we got sense for the device (called

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: core: Kick the requeue list after inserting when flushing
  2024-01-11 12:05 [PATCH] scsi: core: Kick the requeue list after inserting when flushing Niklas Cassel
  2024-01-11 18:01 ` Bart Van Assche
@ 2024-01-11 23:41 ` Damien Le Moal
  2024-01-12  2:38 ` Martin K. Petersen
  2 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2024-01-11 23:41 UTC (permalink / raw)
  To: Niklas Cassel, James E.J. Bottomley, Martin K. Petersen,
	Bart Van Assche
  Cc: Kevin Locke, linux-scsi

On 1/11/24 21:05, Niklas Cassel wrote:
> When libata calls ata_link_abort() to abort all ata queued commands,
> it calls blk_abort_request() on the SCSI command representing each QC.
> 
> This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add()
> for each SCSI command.
> 
> scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds
> the command to shost->eh_cmd_q.
> 
> This will wake up the SCSI EH, and eventually the libata EH strategy
> handler will be called, which calls scsi_eh_flush_done_q() to either
> flush retry or flush finish each failed command.
> 
> The commands that are flush retried by scsi_eh_flush_done_q() are
> done so using scsi_queue_insert().
> 
> Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
> necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with
> the second argument set to true, indicating that it should always
> kick/run the requeue list after inserting.
> 
> After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
> necessary"), __scsi_queue_insert() does not kick/run the requeue list
> after inserting, if the current SCSI host state is recovery (which is
> the case in the libata example above).
> 
> This optimization is probably fine in most cases, as I can only assume
> that most often someone will eventually kick/run the queues.
> 
> However, that is not the case for scsi_eh_flush_done_q(), where we can
> see that the request gets inserted to the requeue list, but the queue is
> never started after the request has been inserted, leading to the block
> layer waiting for the completion of command that never gets to run.
> 
> Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
> state is most likely always in recovery when this function is called.
> 
> Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
> inserting a flush retry command, so that scsi_eh_flush_done_q() keeps
> the same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick
> the requeue list if necessary").
> 
> Simple reproducer for the libata example above:
> $ hdparm -Y /dev/sda
> $ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete
> 
> Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
> Reported-by: Kevin Locke <kevin@kevinlocke.name>
> Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/scsi/scsi_error.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1223d34c04da..d983f4a0e9f1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>  	struct scsi_cmnd *scmd, *next;
>  
>  	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
> +		struct scsi_device *sdev = scmd->device;
> +
>  		list_del_init(&scmd->eh_entry);
> -		if (scsi_device_online(scmd->device) &&
> -		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
> -			scsi_eh_should_retry_cmd(scmd)) {
> +		if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) &&
> +		    scsi_cmd_retry_allowed(scmd) &&
> +		    scsi_eh_should_retry_cmd(scmd)) {
>  			SCSI_LOG_ERROR_RECOVERY(3,
>  				scmd_printk(KERN_INFO, scmd,
>  					     "%s: flush retry cmd\n",
>  					     current->comm));
>  				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
> +				blk_mq_kick_requeue_list(sdev->request_queue);
>  		} else {
>  			/*
>  			 * If just we got sense for the device (called

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: core: Kick the requeue list after inserting when flushing
  2024-01-11 12:05 [PATCH] scsi: core: Kick the requeue list after inserting when flushing Niklas Cassel
  2024-01-11 18:01 ` Bart Van Assche
  2024-01-11 23:41 ` Damien Le Moal
@ 2024-01-12  2:38 ` Martin K. Petersen
  2 siblings, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2024-01-12  2:38 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: James E.J. Bottomley, Martin K. Petersen, Bart Van Assche,
	Damien Le Moal, Kevin Locke, linux-scsi


Niklas,

> When libata calls ata_link_abort() to abort all ata queued commands,
> it calls blk_abort_request() on the SCSI command representing each QC.

Applied to 6.8/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-01-12  2:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-11 12:05 [PATCH] scsi: core: Kick the requeue list after inserting when flushing Niklas Cassel
2024-01-11 18:01 ` Bart Van Assche
2024-01-11 23:41 ` Damien Le Moal
2024-01-12  2:38 ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.