public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out
@ 2024-10-23 10:55 Niklas Cassel
  2024-10-23 11:01 ` Niklas Cassel
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Niklas Cassel @ 2024-10-23 10:55 UTC (permalink / raw)
  To: Damien Le Moal, Niklas Cassel, Hannes Reinecke
  Cc: Martin K . Petersen, Igor Pylypiv, stable, Lai, Yi, linux-ide

When ata_qc_complete() schedules a command for EH using
ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
req->q->mq_ops->timeout() / scsi_timeout() being called.

scsi_timeout(), if the LLDD has no abort handler (libata has no abort
handler), will set host byte to DID_TIME_OUT, and then call
scsi_eh_scmd_add() to add the command to EH.

Thus, when commands first enter libata's EH strategy_handler, all the
commands that have been added to EH will have DID_TIME_OUT set.

Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands
with sense data") clears this bogus DID_TIME_OUT flag for all commands
that reached libata's EH strategy_handler.

libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
have not received a completion at the time of entering EH.

ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by
default timed out commands will get flag ATA_QCFLAG_RETRY set and will be
retried after the port has been reset (ata_eh_link_autopsy() always
triggers a port reset if any command has AC_ERR_TIMEOUT set).

For commands that have ATA_QCFLAG_RETRY set, but also has an error flag
set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment
scmd->allowed, so the command will at most be retried (scmd->allowed
number of times, which by default is set to 3).

However, scsi_eh_flush_done_q() will only retry commands for which
scsi_noretry_cmd() returns false.

For commands that has DID_TIME_OUT set, if the command is either
has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd()
will return true. Thus, such commands will never be retried.

Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that
actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out
commands will once again not be retried if they are also a FAILFAST or
passthrough command.

Cc: stable@vger.kernel.org
Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data")
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/ata/libata-eh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index fa41ea57a978..3b303d4ae37a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 			/* the scmd has an associated qc */
 			if (!(qc->flags & ATA_QCFLAG_EH)) {
 				/* which hasn't failed yet, timeout */
+				set_host_byte(scmd, DID_TIME_OUT);
 				qc->err_mask |= AC_ERR_TIMEOUT;
 				qc->flags |= ATA_QCFLAG_EH;
 				nr_timedout++;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out
  2024-10-23 10:55 [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out Niklas Cassel
@ 2024-10-23 11:01 ` Niklas Cassel
  2024-10-23 11:47 ` Damien Le Moal
  2024-10-24  9:20 ` Niklas Cassel
  2 siblings, 0 replies; 4+ messages in thread
From: Niklas Cassel @ 2024-10-23 11:01 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke
  Cc: Martin K . Petersen, Igor Pylypiv, stable, Lai, Yi, linux-ide

On Wed, Oct 23, 2024 at 12:55:41PM +0200, Niklas Cassel wrote:
> When ata_qc_complete() schedules a command for EH using
> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
> req->q->mq_ops->timeout() / scsi_timeout() being called.
> 
> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
> handler), will set host byte to DID_TIME_OUT, and then call
> scsi_eh_scmd_add() to add the command to EH.
> 
> Thus, when commands first enter libata's EH strategy_handler, all the
> commands that have been added to EH will have DID_TIME_OUT set.
> 
> Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands
> with sense data") clears this bogus DID_TIME_OUT flag for all commands
> that reached libata's EH strategy_handler.
> 
> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
> have not received a completion at the time of entering EH.
> 
> ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by
> default timed out commands will get flag ATA_QCFLAG_RETRY set and will be
> retried after the port has been reset (ata_eh_link_autopsy() always
> triggers a port reset if any command has AC_ERR_TIMEOUT set).
> 
> For commands that have ATA_QCFLAG_RETRY set, but also has an error flag
> set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment
> scmd->allowed, so the command will at most be retried (scmd->allowed
> number of times, which by default is set to 3).
> 
> However, scsi_eh_flush_done_q() will only retry commands for which
> scsi_noretry_cmd() returns false.
> 
> For commands that has DID_TIME_OUT set, if the command is either
> has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd()

"if the command has either FAILFAST flag set or if ..."

I could fix up this sentence when applying.


> will return true. Thus, such commands will never be retried.
> 
> Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that
> actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out
> commands will once again not be retried if they are also a FAILFAST or
> passthrough command.
> 
> Cc: stable@vger.kernel.org
> Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data")
> Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
> Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/ata/libata-eh.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index fa41ea57a978..3b303d4ae37a 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>  			/* the scmd has an associated qc */
>  			if (!(qc->flags & ATA_QCFLAG_EH)) {
>  				/* which hasn't failed yet, timeout */
> +				set_host_byte(scmd, DID_TIME_OUT);
>  				qc->err_mask |= AC_ERR_TIMEOUT;
>  				qc->flags |= ATA_QCFLAG_EH;
>  				nr_timedout++;
> -- 
> 2.47.0
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out
  2024-10-23 10:55 [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out Niklas Cassel
  2024-10-23 11:01 ` Niklas Cassel
@ 2024-10-23 11:47 ` Damien Le Moal
  2024-10-24  9:20 ` Niklas Cassel
  2 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2024-10-23 11:47 UTC (permalink / raw)
  To: Niklas Cassel, Hannes Reinecke
  Cc: Martin K . Petersen, Igor Pylypiv, stable, Lai, Yi, linux-ide

On 10/23/24 19:55, Niklas Cassel wrote:
> When ata_qc_complete() schedules a command for EH using
> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
> req->q->mq_ops->timeout() / scsi_timeout() being called.
> 
> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
> handler), will set host byte to DID_TIME_OUT, and then call
> scsi_eh_scmd_add() to add the command to EH.
> 
> Thus, when commands first enter libata's EH strategy_handler, all the
> commands that have been added to EH will have DID_TIME_OUT set.
> 
> Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands
> with sense data") clears this bogus DID_TIME_OUT flag for all commands
> that reached libata's EH strategy_handler.
> 
> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
> have not received a completion at the time of entering EH.
> 
> ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by
> default timed out commands will get flag ATA_QCFLAG_RETRY set and will be
> retried after the port has been reset (ata_eh_link_autopsy() always
> triggers a port reset if any command has AC_ERR_TIMEOUT set).
> 
> For commands that have ATA_QCFLAG_RETRY set, but also has an error flag
> set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment
> scmd->allowed, so the command will at most be retried (scmd->allowed
> number of times, which by default is set to 3).
> 
> However, scsi_eh_flush_done_q() will only retry commands for which
> scsi_noretry_cmd() returns false.
> 
> For commands that has DID_TIME_OUT set, if the command is either
> has FAILFAST or if the command is a passthrough command, scsi_noretry_cmd()
> will return true. Thus, such commands will never be retried.
> 
> Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that
> actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out
> commands will once again not be retried if they are also a FAILFAST or
> passthrough command.
> 
> Cc: stable@vger.kernel.org
> Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data")
> Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
> Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  drivers/ata/libata-eh.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index fa41ea57a978..3b303d4ae37a 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -651,6 +651,7 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>  			/* the scmd has an associated qc */
>  			if (!(qc->flags & ATA_QCFLAG_EH)) {
>  				/* which hasn't failed yet, timeout */
> +				set_host_byte(scmd, DID_TIME_OUT);
>  				qc->err_mask |= AC_ERR_TIMEOUT;
>  				qc->flags |= ATA_QCFLAG_EH;
>  				nr_timedout++;


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out
  2024-10-23 10:55 [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out Niklas Cassel
  2024-10-23 11:01 ` Niklas Cassel
  2024-10-23 11:47 ` Damien Le Moal
@ 2024-10-24  9:20 ` Niklas Cassel
  2 siblings, 0 replies; 4+ messages in thread
From: Niklas Cassel @ 2024-10-24  9:20 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, Niklas Cassel
  Cc: Martin K . Petersen, Igor Pylypiv, stable, Lai, Yi, linux-ide

On Wed, 23 Oct 2024 12:55:41 +0200, Niklas Cassel wrote:
> When ata_qc_complete() schedules a command for EH using
> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
> req->q->mq_ops->timeout() / scsi_timeout() being called.
> 
> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
> handler), will set host byte to DID_TIME_OUT, and then call
> scsi_eh_scmd_add() to add the command to EH.
> 
> [...]

Applied to libata/linux.git (for-6.12-fixes), thanks!

[1/1] ata: libata: Set DID_TIME_OUT for commands that actually timed out
      https://git.kernel.org/libata/linux/c/8e59a2a5

Kind regards,
Niklas


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-10-24  9:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 10:55 [PATCH] ata: libata: Set DID_TIME_OUT for commands that actually timed out Niklas Cassel
2024-10-23 11:01 ` Niklas Cassel
2024-10-23 11:47 ` Damien Le Moal
2024-10-24  9:20 ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox