From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@gmail.com>
To: Kashyap Desai <kashyap.desai@broadcom.com>,
	Sumit Saxena <sumit.saxena@broadcom.com>,
	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
	Chandrakanth patil <chandrakanth.patil@broadcom.com>
Cc: "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	megaraidlinux.pdl@broadcom.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Ionut Nechita <ionut_n2001@yahoo.com>,
	Ionut Nechita <ionut.nechita@windriver.com>,
	stable@vger.kernel.org
Subject: [PATCH] scsi: megaraid_sas: return DID_SOFT_ERROR on zero-byte DONE_WITH_ERROR
Date: Sun,  8 Feb 2026 20:06:04 +0200
Message-ID: <20260208180603.568353-2-sunlightlinux@gmail.com>

From: Ionut Nechita <ionut_n2001@yahoo.com>

When the MegaRAID firmware returns MFI_STAT_SCSI_DONE_WITH_ERROR (0x2d)
with zero bytes transferred on a data-bearing command, the driver
currently returns DID_OK to the SCSI midlayer. This causes the I/O to
appear complete with no data, leading to hung tasks that block
indefinitely.

Production systems show the following repeated pattern:

  sd 0:0:9:0: [sdb] tag#24 BRCM Debug mfi stat 0x2d, data len
      requested/completed 0x1000/0x0

  INFO: task systemd-udevd:267 blocked for more than 245 seconds.
  INFO: task modprobe:296 blocked for more than 246 seconds.

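Return DID_SOFT_ERROR instead of DID_OK when the firmware reports
DONE_WITH_ERROR on a data-bearing command without CHECK_CONDITION
status. The fusion path additionally requires that zero bytes were
transferred; the MFI path cannot check this because the firmware does
not report the transferred length. DID_SOFT_ERROR makes the SCSI
midlayer retry the command up to cmd->allowed times (default 5),
matching the established pattern used by mpt3sas and smartpqi for
similar conditions.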

Commands with CHECK_CONDITION sense data are not affected -- they
continue to be completed immediately with the sense data intact.

Fixes: 9c915a8c99bc ("[SCSI] megaraid_sas: Add 9565/9285 specific code")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
---
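Note for reviewers: the decision described above can be modelled in
userspace as the minimal sketch below. It is not driver code. The helper
names are hypothetical, the constants are redefined locally with the
values I believe the kernel headers use, and the non-retry branch assumes
the existing (DID_OK << 16) | status encoding. For the MFI path, which
has no transferred-length field, pass data_length = 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of kernel constants so the sketch builds in userspace. */
#define DID_OK                   0x00
#define DID_SOFT_ERROR           0x0b
#define SAM_STAT_CHECK_CONDITION 0x02

/* Hypothetical helper: true when the patch would request a retry. */
static bool zero_byte_retry_wanted(uint8_t scsi_status, uint32_t bufflen,
				   uint32_t data_length)
{
	return data_length == 0 && bufflen > 0 &&
	       scsi_status != SAM_STAT_CHECK_CONDITION;
}

/*
 * Hypothetical helper: build the 32-bit result word. The host byte
 * occupies bits 16..23 and the SCSI status byte occupies bits 0..7.
 */
static uint32_t result_for_done_with_error(uint8_t scsi_status,
					   uint32_t bufflen,
					   uint32_t data_length)
{
	if (zero_byte_retry_wanted(scsi_status, bufflen, data_length))
		return (uint32_t)DID_SOFT_ERROR << 16;

	/* Assumed pre-patch behaviour: host byte DID_OK plus the status. */
	return ((uint32_t)DID_OK << 16) | scsi_status;
}
```

With the values from the log above (0x1000 requested, 0x0 completed, no
CHECK_CONDITION) the helper yields a DID_SOFT_ERROR host byte; any
nonzero completed length or a CHECK_CONDITION status leaves the host
byte at DID_OK.
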
 drivers/scsi/megaraid/megaraid_sas_base.c   | 16 ++++++++++++++++
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 +++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)
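
On the retry budget: the sketch below counts how often a command that
keeps failing with DID_SOFT_ERROR would be issued. It assumes the
midlayer consumes one retry per failure and compares the running count
against cmd->allowed with a pre-increment check; that is my reading of
the error handler, not something this patch changes. The struct and
function names are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct scsi_cmnd fields involved. */
struct retry_state {
	int retries;	/* retries consumed so far */
	int allowed;	/* retry budget for this command */
};

/* Assumed check: consume one retry, then test it against the budget. */
static bool retry_allowed(struct retry_state *s)
{
	return ++s->retries <= s->allowed;
}

/* Total submissions of a command that fails on every attempt. */
static int total_attempts(int allowed)
{
	struct retry_state s = { .retries = 0, .allowed = allowed };
	int attempts = 1;	/* the initial submission */

	while (retry_allowed(&s))
		attempts++;

	return attempts;
}
```

Under that assumption a budget of 5 means one initial submission plus
five retries before the command is completed with an error.
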

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index abbbc4b36cd1d..de35b7d5094d7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3682,6 +3682,22 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
 				       hdr->sense_len);
 			}
 
+			/*
+			 * MFI firmware does not report actual bytes
+			 * transferred, so we cannot compute residuals.
+			 * If data was expected and no CHECK_CONDITION,
+			 * retry via DID_SOFT_ERROR. The SCSI midlayer
+			 * retries up to cmd->allowed times (default 5).
+			 */
+			if (hdr->scsi_status != SAM_STAT_CHECK_CONDITION &&
+			    scsi_bufflen(cmd->scmd) > 0) {
+				cmd->scmd->result = DID_SOFT_ERROR << 16;
+				dev_warn(&instance->pdev->dev,
+					"megaraid_sas: DONE_WITH_ERROR (stat 0x%x) on cmd 0x%x to tgt %d, retrying\n",
+					hdr->cmd_status, hdr->cmd,
+					hdr->target_id);
+			}
+
 			break;
 
 		case MFI_STAT_LD_OFFLINE:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index a6794f49e9fae..6021f1363ef4c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2066,7 +2066,19 @@ map_cmd_status(struct fusion_context *fusion,
 		resid = (scsi_bufflen(scmd) - data_length);
 		scsi_set_resid(scmd, resid);
 
-		if (resid &&
+		/*
+		 * If data was expected but zero bytes were transferred
+		 * and there is no CHECK_CONDITION sense data, retry via
+		 * DID_SOFT_ERROR. The SCSI midlayer retries up to
+		 * cmd->allowed times (default 5).
+		 */
+		if (data_length == 0 && scsi_bufflen(scmd) > 0 &&
+		    ext_status != SAM_STAT_CHECK_CONDITION) {
+			scmd->result = DID_SOFT_ERROR << 16;
+			scmd_printk(KERN_WARNING, scmd,
+				"megaraid_sas: zero data on DONE_WITH_ERROR (stat 0x%x, bufflen 0x%x), retrying\n",
+				status, scsi_bufflen(scmd));
+		} else if (resid &&
 			((cmd_type == READ_WRITE_LDIO) ||
 			(cmd_type == READ_WRITE_SYSPDIO)))
 			scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
-- 
2.52.0

