From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@gmail.com>
To: Kashyap Desai <kashyap.desai@broadcom.com>,
	Sumit Saxena <sumit.saxena@broadcom.com>,
	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
	Chandrakanth patil <chandrakanth.patil@broadcom.com>
Cc: "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	megaraidlinux.pdl@broadcom.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Ionut Nechita <ionut_n2001@yahoo.com>,
	Ionut Nechita <ionut.nechita@windriver.com>,
	stable@vger.kernel.org
Subject: [PATCH] scsi: megaraid_sas: return DID_SOFT_ERROR on zero-byte DONE_WITH_ERROR
Date: Sun,  8 Feb 2026 20:06:04 +0200
Message-ID: <20260208180603.568353-2-sunlightlinux@gmail.com>

From: Ionut Nechita <ionut_n2001@yahoo.com>

When the MegaRAID firmware returns MFI_STAT_SCSI_DONE_WITH_ERROR (0x2d)
with zero bytes transferred on a data-bearing command, the driver
currently returns DID_OK to the SCSI midlayer. The I/O then appears to
have completed successfully although no data was transferred, and the
tasks waiting on it hang indefinitely.

Production systems show the following repeated pattern:

  sd 0:0:9:0: [sdb] tag#24 BRCM Debug mfi stat 0x2d, data len
      requested/completed 0x1000/0x0

  INFO: task systemd-udevd:267 blocked for more than 245 seconds.
  INFO: task modprobe:296 blocked for more than 246 seconds.

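When the firmware reports DONE_WITH_ERROR on a data-bearing command with
no data transferred and no CHECK_CONDITION status, return DID_SOFT_ERROR
instead of DID_OK. The legacy MFI path does not report the number of
bytes transferred, so there the check is only that data was expected and
the status is not CHECK_CONDITION. DID_SOFT_ERROR causes the SCSI
midlayer to retry the command up to cmd->allowed times (default 5),
matching the established pattern used by mpt3sas and smartpqi for
similar conditions.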

Commands with CHECK_CONDITION sense data are not affected -- they
continue to be completed immediately with the sense data intact.

Fixes: 9c915a8c99bc ("[SCSI] megaraid_sas: Add 9565/9285 specific code")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 16 ++++++++++++++++
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 +++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)
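
For review, the decision made in the fusion path can be summarised by
the standalone sketch below. It is an illustration under stated
assumptions, not driver code: struct completion_info and
done_with_error_result() are hypothetical names, and the numeric
constants are copied in so the snippet builds outside the kernel tree.
It models only the MFI_STAT_SCSI_DONE_WITH_ERROR case. The legacy MFI
path would be the same check without the data_length test.

```c
#include <stdint.h>

/* Values copied from the kernel's SCSI headers for a standalone build. */
#define DID_OK                   0x00
#define DID_SOFT_ERROR           0x0b
#define SAM_STAT_CHECK_CONDITION 0x02

/* Hypothetical flattened view of one DONE_WITH_ERROR completion. */
struct completion_info {
	uint8_t  scsi_status;  /* SAM status byte reported by firmware */
	uint32_t bufflen;      /* bytes requested, as from scsi_bufflen() */
	uint32_t data_length;  /* bytes the firmware says were transferred */
};

/*
 * Hypothetical helper: build the 32-bit result word for a completion
 * that the firmware flagged as DONE_WITH_ERROR. The host byte lives in
 * bits 16-23 and the SCSI status byte in bits 0-7.
 */
uint32_t done_with_error_result(const struct completion_info *c)
{
	/* Data was expected, none arrived, and there is no sense data. */
	if (c->bufflen > 0 && c->data_length == 0 &&
	    c->scsi_status != SAM_STAT_CHECK_CONDITION)
		return (uint32_t)DID_SOFT_ERROR << 16;

	/* Otherwise complete as before: DID_OK plus the SCSI status. */
	return ((uint32_t)DID_OK << 16) | c->scsi_status;
}
```

With the values from the log above (0x1000 requested, 0x0 completed,
no CHECK_CONDITION) the sketch yields DID_SOFT_ERROR << 16, so the
command would be retried rather than completed.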

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index abbbc4b36cd1d..de35b7d5094d7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3682,6 +3682,22 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
 				       hdr->sense_len);
 			}
 
+			/*
+			 * MFI firmware does not report actual bytes
+			 * transferred, so we cannot compute residuals.
+			 * If data was expected and no CHECK_CONDITION,
+			 * retry via DID_SOFT_ERROR. The SCSI midlayer
+			 * retries up to cmd->allowed times (default 5).
+			 */
+			if (hdr->scsi_status != SAM_STAT_CHECK_CONDITION &&
+			    scsi_bufflen(cmd->scmd) > 0) {
+				cmd->scmd->result = DID_SOFT_ERROR << 16;
+				dev_warn(&instance->pdev->dev,
+					"megaraid_sas: DONE_WITH_ERROR (stat 0x%x) on cmd 0x%x to tgt %d, retrying\n",
+					hdr->cmd_status, hdr->cmd,
+					hdr->target_id);
+			}
+
 			break;
 
 		case MFI_STAT_LD_OFFLINE:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index a6794f49e9fae..6021f1363ef4c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2066,7 +2066,19 @@ map_cmd_status(struct fusion_context *fusion,
 		resid = (scsi_bufflen(scmd) - data_length);
 		scsi_set_resid(scmd, resid);
 
-		if (resid &&
+		/*
+		 * If data was expected but zero bytes were transferred
+		 * and there is no CHECK_CONDITION sense data, retry via
+		 * DID_SOFT_ERROR. The SCSI midlayer retries up to
+		 * cmd->allowed times (default 5).
+		 */
+		if (data_length == 0 && scsi_bufflen(scmd) > 0 &&
+		    ext_status != SAM_STAT_CHECK_CONDITION) {
+			scmd->result = DID_SOFT_ERROR << 16;
+			scmd_printk(KERN_WARNING, scmd,
+				"megaraid_sas: zero data on DONE_WITH_ERROR (stat 0x%x, bufflen 0x%x), retrying\n",
+				status, scsi_bufflen(scmd));
+		} else if (resid &&
 			((cmd_type == READ_WRITE_LDIO) ||
 			(cmd_type == READ_WRITE_SYSPDIO)))
 			scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
-- 
2.52.0

