public inbox for linux-scsi@vger.kernel.org
* [PATCH 0/3] libfc: fixup command abort handling
@ 2023-11-29 16:58 hare
  2023-11-29 16:58 ` [PATCH 1/3] libfc: don't schedule abort twice hare
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: hare @ 2023-11-29 16:58 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

Hi all,

When testing command timeouts with the help of XDP I found
that scsi_try_to_abort_cmd() would always return 'SUCCESS'
for FCoE, even if no commands could be sent over the wire.
This is not only surprising, but can also lead to data
corruption, as the commands were never actually aborted.
The root cause was that aborts were issued twice, once
from FC error recovery and once from SCSI EH, with the
former leading the latter to assume that the command
was already aborted.

As usual, comments and reviews are welcome.

Hannes Reinecke (3):
  libfc: don't schedule abort twice
  libfc: Fixup timeout error in fc_fcp_rec_error()
  libfc: map FC_TIMED_OUT to DID_TIME_OUT

 drivers/scsi/libfc/fc_fcp.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

-- 
2.35.3



* [PATCH 1/3] libfc: don't schedule abort twice
  2023-11-29 16:58 [PATCH 0/3] libfc: fixup command abort handling hare
@ 2023-11-29 16:58 ` hare
  2023-12-04  8:04   ` Christoph Hellwig
  2023-11-29 16:58 ` [PATCH 2/3] libfc: Fixup timeout error in fc_fcp_rec_error() hare
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: hare @ 2023-11-29 16:58 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

The current FC error recovery sends up to three REC (Read Exchange
Concise) frames at 10-second intervals and, as a final step, sends an
ABTS for the command itself after 30 seconds.
Unfortunately, sending an ABTS is also what the SCSI abort handler
does, and the default timeout for SCSI commands is also 30 seconds.
This causes two ABTS to be scheduled, with the libfc one slightly
earlier. The abort issued by SCSI EH then finds the command already
marked as aborted and always returns a 'GOOD' status, irrespective of
the actual result of the first ABTS.
As a consequence the SCSI EH abort handler always succeeds, and the
rest of SCSI EH is never engaged.
Fix this by not issuing an ABTS from libfc when a SCSI command is
present for the exchange, and instead waiting for the abort scheduled
by SCSI EH.
Also log a debug message and return -EBUSY if an abort is already
pending, to avoid similar errors in the future.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/libfc/fc_fcp.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 945adca5e72f..3f189cedf6db 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -265,6 +265,11 @@ static int fc_fcp_send_abort(struct fc_fcp_pkt *fsp)
 	if (!fsp->seq_ptr)
 		return -EINVAL;
 
+	if (fsp->state & FC_SRB_ABORT_PENDING) {
+		FC_FCP_DBG(fsp, "abort already pending\n");
+		return -EBUSY;
+	}
+
 	this_cpu_inc(fsp->lp->stats->FcpPktAborts);
 
 	fsp->state |= FC_SRB_ABORT_PENDING;
@@ -1690,11 +1695,12 @@ static void fc_fcp_recovery(struct fc_fcp_pkt *fsp, u8 code)
 	fsp->status_code = code;
 	fsp->cdb_status = 0;
 	fsp->io_status = 0;
-	/*
-	 * if this fails then we let the scsi command timer fire and
-	 * scsi-ml escalate.
-	 */
-	fc_fcp_send_abort(fsp);
+	if (!fsp->cmd)
+		/*
+		 * Only abort non-scsi commands; otherwise let the
+		 * scsi command timer fire and scsi-ml escalate.
+		 */
+		fc_fcp_send_abort(fsp);
 }
 
 /**
-- 
2.35.3
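
As a rough illustration of the two changes in this patch, the following
user-space C sketch models the "abort already pending" guard and the
"only abort when no SCSI command is attached" rule. It is not the libfc
code: struct model_pkt, MODEL_ABORT_PENDING, model_send_abort() and
model_recovery() are hypothetical stand-ins for struct fc_fcp_pkt,
FC_SRB_ABORT_PENDING, fc_fcp_send_abort() and fc_fcp_recovery(), and the
flag value is chosen arbitrarily.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for FC_SRB_ABORT_PENDING; the value is illustrative. */
#define MODEL_ABORT_PENDING 0x1u

/* Hypothetical, heavily reduced model of struct fc_fcp_pkt. */
struct model_pkt {
	unsigned int state;	/* bit flags, like fsp->state */
	bool has_seq;		/* models fsp->seq_ptr != NULL */
	bool has_cmd;		/* models fsp->cmd != NULL */
	int aborts_sent;	/* number of ABTS "frames" issued */
};

/* Issue an abort at most once per packet. */
static int model_send_abort(struct model_pkt *p)
{
	if (!p->has_seq)
		return -EINVAL;

	/* A second caller is told that an abort is already in flight. */
	if (p->state & MODEL_ABORT_PENDING)
		return -EBUSY;

	p->state |= MODEL_ABORT_PENDING;
	p->aborts_sent++;
	return 0;
}

/* Recovery path: leave packets with a SCSI command to the SCSI EH abort. */
static void model_recovery(struct model_pkt *p)
{
	if (!p->has_cmd)
		model_send_abort(p);
}
```

In this model, calling model_recovery() on a packet with has_cmd set
issues no abort, so a later model_send_abort() from the (modelled) SCSI
EH path is the first and only one, and a repeated call gets -EBUSY
instead of silently appearing to succeed.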



* [PATCH 2/3] libfc: Fixup timeout error in fc_fcp_rec_error()
  2023-11-29 16:58 [PATCH 0/3] libfc: fixup command abort handling hare
  2023-11-29 16:58 ` [PATCH 1/3] libfc: don't schedule abort twice hare
@ 2023-11-29 16:58 ` hare
  2023-12-04  8:04   ` Christoph Hellwig
  2023-11-29 16:58 ` [PATCH 3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT hare
  2023-12-14  4:29 ` [PATCH 0/3] libfc: fixup command abort handling Martin K. Petersen
  3 siblings, 1 reply; 8+ messages in thread
From: hare @ 2023-11-29 16:58 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

We should set the status to FC_TIMED_OUT when a timeout error is
passed to fc_fcp_rec_error().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/libfc/fc_fcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 3f189cedf6db..05be0810b5e3 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -1676,7 +1676,7 @@ static void fc_fcp_rec_error(struct fc_fcp_pkt *fsp, struct fc_frame *fp)
 		if (fsp->recov_retry++ < FC_MAX_RECOV_RETRY)
 			fc_fcp_rec(fsp);
 		else
-			fc_fcp_recovery(fsp, FC_ERROR);
+			fc_fcp_recovery(fsp, FC_TIMED_OUT);
 		break;
 	}
 	fc_fcp_unlock_pkt(fsp);
-- 
2.35.3
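
The branch touched by this hunk can be sketched in user-space C as a
bounded retry counter that reports a timeout status once the retries are
used up. This is only a model of the two lines visible in the diff
context: struct model_rec, model_rec_timeout() and the enum are
hypothetical, and MODEL_MAX_RECOV_RETRY is assumed to be 3 to match the
"up to three REC frames" wording in patch 1.

```c
/* Hypothetical status codes; only the names echo FC_ERROR / FC_TIMED_OUT. */
enum model_status {
	MODEL_PENDING,
	MODEL_ERROR,
	MODEL_TIMED_OUT,
};

/* Assumption: three retries, as described in the patch 1 commit message. */
#define MODEL_MAX_RECOV_RETRY 3

/* Hypothetical, reduced model of the recovery state in struct fc_fcp_pkt. */
struct model_rec {
	int recov_retry;		/* like fsp->recov_retry */
	int recs_sent;			/* REC frames re-issued so far */
	enum model_status status;	/* like fsp->status_code */
};

/* Called each time a REC times out. */
static void model_rec_timeout(struct model_rec *r)
{
	if (r->recov_retry++ < MODEL_MAX_RECOV_RETRY)
		r->recs_sent++;			/* retry: send another REC */
	else
		r->status = MODEL_TIMED_OUT;	/* give up: report a timeout */
}
```

With these assumptions, the first three timeouts each re-issue a REC and
the fourth one records MODEL_TIMED_OUT rather than a generic error.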



* [PATCH 3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT
  2023-11-29 16:58 [PATCH 0/3] libfc: fixup command abort handling hare
  2023-11-29 16:58 ` [PATCH 1/3] libfc: don't schedule abort twice hare
  2023-11-29 16:58 ` [PATCH 2/3] libfc: Fixup timeout error in fc_fcp_rec_error() hare
@ 2023-11-29 16:58 ` hare
  2023-12-04  8:04   ` Christoph Hellwig
  2023-12-14  4:29 ` [PATCH 0/3] libfc: fixup command abort handling Martin K. Petersen
  3 siblings, 1 reply; 8+ messages in thread
From: hare @ 2023-11-29 16:58 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

When an exchange is completed with FC_TIMED_OUT we should map it
to DID_TIME_OUT to inform the SCSI midlayer that this was a command
timeout; DID_BUS_BUSY implies that the command was never sent, which
is not the case here.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/libfc/fc_fcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 05be0810b5e3..80be3a936d92 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -2062,9 +2062,9 @@ static void fc_io_compl(struct fc_fcp_pkt *fsp)
 		sc_cmd->result = (DID_PARITY << 16);
 		break;
 	case FC_TIMED_OUT:
-		FC_FCP_DBG(fsp, "Returning DID_BUS_BUSY to scsi-ml "
+		FC_FCP_DBG(fsp, "Returning DID_TIME_OUT to scsi-ml "
 			   "due to FC_TIMED_OUT\n");
-		sc_cmd->result = (DID_BUS_BUSY << 16) | fsp->io_status;
+		sc_cmd->result = (DID_TIME_OUT << 16);
 		break;
 	default:
 		FC_FCP_DBG(fsp, "Returning DID_ERROR to scsi-ml "
-- 
2.35.3
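
To make the effect of the new assignment concrete, here is a small
user-space C sketch of how a host byte is packed into bits 16-23 of a
SCSI result word and read back out. The helper names are hypothetical;
the two constants are assumed to mirror the kernel's DID_BUS_BUSY (0x02)
and DID_TIME_OUT (0x03) host codes.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's host byte codes. */
#define MODEL_DID_BUS_BUSY 0x02
#define MODEL_DID_TIME_OUT 0x03

/* Pack a host byte (bits 16-23) and a low status byte (bits 0-7). */
static inline uint32_t model_result(uint8_t host, uint8_t status)
{
	return ((uint32_t)host << 16) | status;
}

/* Extract the host byte again, as the midlayer would. */
static inline uint8_t model_host_byte(uint32_t result)
{
	return (result >> 16) & 0xff;
}

/* Extract the low status byte. */
static inline uint8_t model_status_byte(uint32_t result)
{
	return result & 0xff;
}
```

Under these assumptions, model_result(MODEL_DID_TIME_OUT, 0) corresponds
to the new `(DID_TIME_OUT << 16)` assignment: the host byte reads back as
a timeout and the low byte is zero, whereas the old code also OR-ed
fsp->io_status into the low bits.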



* Re: [PATCH 1/3] libfc: don't schedule abort twice
  2023-11-29 16:58 ` [PATCH 1/3] libfc: don't schedule abort twice hare
@ 2023-12-04  8:04   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2023-12-04  8:04 UTC (permalink / raw)
  To: hare
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	linux-scsi, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/3] libfc: Fixup timeout error in fc_fcp_rec_error()
  2023-11-29 16:58 ` [PATCH 2/3] libfc: Fixup timeout error in fc_fcp_rec_error() hare
@ 2023-12-04  8:04   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2023-12-04  8:04 UTC (permalink / raw)
  To: hare
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	linux-scsi, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT
  2023-11-29 16:58 ` [PATCH 3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT hare
@ 2023-12-04  8:04   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2023-12-04  8:04 UTC (permalink / raw)
  To: hare
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	linux-scsi, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 0/3] libfc: fixup command abort handling
  2023-11-29 16:58 [PATCH 0/3] libfc: fixup command abort handling hare
                   ` (2 preceding siblings ...)
  2023-11-29 16:58 ` [PATCH 3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT hare
@ 2023-12-14  4:29 ` Martin K. Petersen
  3 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2023-12-14  4:29 UTC (permalink / raw)
  To: hare
  Cc: Martin K . Petersen, James Bottomley, Christoph Hellwig,
	linux-scsi, Hannes Reinecke

On Wed, 29 Nov 2023 17:58:29 +0100, hare@kernel.org wrote:

> When testing command timeouts with the help of XDP I found
> that scsi_try_to_abort_cmd() would always return 'SUCCESS'
> for FCoE, even if no commands could be sent over the wire.
> This is not only surprising, but can also lead to data
> corruption, as the commands were never actually aborted.
> The root cause was that aborts were issued twice, once
> from FC error recovery and once from SCSI EH, with the
> former leading the latter to assume that the command
> was already aborted.
> 
> [...]

Applied to 6.8/scsi-queue, thanks!

[1/3] libfc: don't schedule abort twice
      https://git.kernel.org/mkp/scsi/c/b57c4db5d23b
[2/3] libfc: Fixup timeout error in fc_fcp_rec_error()
      https://git.kernel.org/mkp/scsi/c/53122a49f497
[3/3] libfc: map FC_TIMED_OUT to DID_TIME_OUT
      https://git.kernel.org/mkp/scsi/c/be40572c22cc

-- 
Martin K. Petersen	Oracle Linux Engineering
