public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: "Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org
Cc: Sathya Prakash <sathya.prakash@broadcom.com>,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	Sreekanth Reddy <sreekanth.reddy@broadcom.com>,
	Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
	mpi3mr-linuxdrv.pdl@broadcom.com,
	MPT-FusionLinux.pdl@broadcom.com
Subject: [PATCH 1/2] scsi: mpi3mr: Correctly handle ATA device errors
Date: Fri,  6 Jun 2025 14:27:46 +0900	[thread overview]
Message-ID: <20250606052747.742998-2-dlemoal@kernel.org> (raw)
In-Reply-To: <20250606052747.742998-1-dlemoal@kernel.org>

With the ATA error model, an NCQ command failure always triggers an
abort (termination) of all NCQ commands queued on the device. In such
case, the SAT or the host must handle the failed command according to
the command sense data and immediately retry all other NCQ commands that
were aborted due to the failed NCQ command.

For SAS HBAs controlled by the mpi3mr driver, NCQ command aborts are not
handled by the HBA SAT and sent back to the host, with an ioc log
information equal to 0x31080000 (IOC_LOGINFO_PREFIX_PL with the PL code
PL_LOGINFO_CODE_SATA_NCQ_FAIL_ALL_CMDS_AFTR_ERR). The function
mpi3mr_process_op_reply_desc() always forces a retry of commands
terminated with the status MPI3_IOCSTATUS_SCSI_IOC_TERMINATED using the
SCSI result DID_SOFT_ERROR, regardless of the ioc_loginfo for the
command. This correctly forces the retry of collateral NCQ abort
commands, but with the retry counter for the command being incremented.
If a command to an ATA device is subject to too many retries due to
other NCQ commands failing (e.g. read commands trying to access
unreadable sectors), the collateral NCQ abort commands may be terminated
with an error as they run out of retries. This violates the SAT
specifications and causes hard-to-debug command errors.

Solve this issue by modifying the handling of the
MPI3_IOCSTATUS_SCSI_IOC_TERMINATED status to check if a command is for
an ATA device and if the command ioc_loginfo indicates an NCQ collateral
abort. If that is the case, force the command retry using the
SCSI result DID_IMM_RETRY to avoid incrementing the command retry count.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/scsi/mpi3mr/mpi3mr_os.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index ce444efd859e..87983ea4e06e 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -49,6 +49,13 @@ static void mpi3mr_send_event_ack(struct mpi3mr_ioc *mrioc, u8 event,
 
 #define MPI3_EVENT_WAIT_FOR_DEVICES_TO_REFRESH	(0xFFFE)
 
+/*
+ * SAS Log info code for a NCQ collateral abort after an NCQ error:
+ * IOC_LOGINFO_PREFIX_PL | PL_LOGINFO_CODE_SATA_NCQ_FAIL_ALL_CMDS_AFTR_ERR
+ * See: drivers/message/fusion/lsi/mpi_log_sas.h
+ */
+#define IOC_LOGINFO_SATA_NCQ_FAIL_AFTER_ERR	0x31080000
+
 /**
  * mpi3mr_host_tag_for_scmd - Get host tag for a scmd
  * @mrioc: Adapter instance reference
@@ -3430,7 +3437,18 @@ void mpi3mr_process_op_reply_desc(struct mpi3mr_ioc *mrioc,
 		scmd->result = DID_NO_CONNECT << 16;
 		break;
 	case MPI3_IOCSTATUS_SCSI_IOC_TERMINATED:
-		scmd->result = DID_SOFT_ERROR << 16;
+		if (ioc_loginfo == IOC_LOGINFO_SATA_NCQ_FAIL_AFTER_ERR) {
+			/*
+			 * This is a ATA NCQ command aborted due to another NCQ
+			 * command failure. We must retry this command
+			 * immediately but without incrementing its retry
+			 * counter.
+			 */
+			WARN_ON_ONCE(xfer_count != 0);
+			scmd->result = DID_IMM_RETRY << 16;
+		} else {
+			scmd->result = DID_SOFT_ERROR << 16;
+		}
 		break;
 	case MPI3_IOCSTATUS_SCSI_TASK_TERMINATED:
 	case MPI3_IOCSTATUS_SCSI_EXT_TERMINATED:
-- 
2.49.0


  reply	other threads:[~2025-06-06  5:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06  5:27 [PATCH 0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr Damien Le Moal
2025-06-06  5:27 ` Damien Le Moal [this message]
2025-06-06  5:27 ` [PATCH 2/2] scsi: mpt3sas: Correctly handle ATA device errors Damien Le Moal
2025-06-09  5:50 ` [PATCH 0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr Christoph Hellwig
2025-06-09  7:09   ` Yafang Shao
2025-06-09  7:17     ` Damien Le Moal
2025-06-11  3:27       ` Yafang Shao
2025-06-11  3:57         ` Damien Le Moal
2025-06-11  5:42           ` Yafang Shao
2025-06-16  2:13     ` Yafang Shao
2025-06-16  2:28       ` Damien Le Moal
2025-06-16 12:40         ` Yafang Shao
2025-06-16 20:51 ` Martin K. Petersen
2025-06-20  3:00 ` Martin K. Petersen
2025-06-20 17:28   ` Sathya Prakash Veerichetty
2025-06-25  1:44 ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250606052747.742998-2-dlemoal@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=MPT-FusionLinux.pdl@broadcom.com \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mpi3mr-linuxdrv.pdl@broadcom.com \
    --cc=sathya.prakash@broadcom.com \
    --cc=sreekanth.reddy@broadcom.com \
    --cc=suganath-prabu.subramani@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox