From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@gmail.com>
To: Kashyap Desai <kashyap.desai@broadcom.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
Chandrakanth patil <chandrakanth.patil@broadcom.com>
Cc: "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
megaraidlinux.pdl@broadcom.com, linux-scsi@vger.kernel.org,
linux-kernel@vger.kernel.org,
Ionut Nechita <ionut_n2001@yahoo.com>,
Ionut Nechita <ionut.nechita@windriver.com>,
stable@vger.kernel.org
Subject: [PATCH] scsi: megaraid_sas: return DID_SOFT_ERROR on zero-byte DONE_WITH_ERROR
Date: Sun, 8 Feb 2026 20:06:04 +0200
Message-ID: <20260208180603.568353-2-sunlightlinux@gmail.com>
From: Ionut Nechita <ionut_n2001@yahoo.com>
When the MegaRAID firmware returns MFI_STAT_SCSI_DONE_WITH_ERROR (0x2d)
with zero bytes transferred on a data-bearing command, the driver
currently returns DID_OK to the SCSI midlayer. The midlayer then treats
the I/O as complete even though no data was transferred, and the tasks
waiting on it hang indefinitely.

Production systems show the following repeated pattern:

  sd 0:0:9:0: [sdb] tag#24 BRCM Debug mfi stat 0x2d, data len
    requested/completed 0x1000/0x0
  INFO: task systemd-udevd:267 blocked for more than 245 seconds.
  INFO: task modprobe:296 blocked for more than 246 seconds.

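When the firmware reports DONE_WITH_ERROR with no data transferred and
no CHECK_CONDITION sense data, return DID_SOFT_ERROR instead of DID_OK.
The SCSI midlayer then retries the command up to cmd->allowed times
(default 5), matching the established pattern used by mpt3sas and
smartpqi for similar conditions.

On the fusion path the transferred length is known, so the retry is
limited to commands that expected data and transferred zero bytes. The
MFI firmware does not report the number of bytes transferred, so on
that path any data-bearing DONE_WITH_ERROR command without
CHECK_CONDITION is retried.

Commands with CHECK_CONDITION sense data are not affected -- they
continue to be completed immediately with the sense data intact.
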
Fixes: 9c915a8c99bc ("[SCSI] megaraid_sas: Add 9565/9285 specific code")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
---
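For readers less familiar with how the SCSI result word is packed, here
is a minimal userspace sketch of the layout this patch relies on: the
host byte sits in bits 16-23 and the SCSI status byte in bits 0-7. The
constant values are copied from the kernel SCSI headers as I understand
them, and the sketch_* helpers are hypothetical names used only for
illustration; they are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of kernel constants (assumed values, not kernel headers). */
#define DID_OK                   0x00
#define DID_SOFT_ERROR           0x0b
#define SAM_STAT_CHECK_CONDITION 0x02

/* Hypothetical helper: host byte lives in bits 16-23 of the result word. */
static inline uint8_t sketch_host_byte(uint32_t result)
{
	return (result >> 16) & 0xff;
}

/* Hypothetical helper: SCSI status byte lives in bits 0-7. */
static inline uint8_t sketch_status_byte(uint32_t result)
{
	return result & 0xff;
}

/* "DID_SOFT_ERROR << 16" sets only the host byte; the status byte is 0. */
static_assert((DID_SOFT_ERROR << 16) == 0x000b0000,
	      "host byte occupies bits 16-23");
```

With this layout, (DID_OK << 16) | SAM_STAT_CHECK_CONDITION yields host
byte 0x00 and status byte 0x02, while DID_SOFT_ERROR << 16 yields host
byte 0x0b and a zero status byte.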
drivers/scsi/megaraid/megaraid_sas_base.c | 16 ++++++++++++++++
drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 +++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
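
The decision made in the fusion hunk can be sketched in isolation as
follows. This is a userspace model under stated assumptions, not driver
code: sketch_done_with_error_result() is a hypothetical helper, its
parameters stand in for scsi_bufflen(scmd), data_length and ext_status,
and the constants are assumed copies of the kernel values. The MFI hunk
differs in that no transferred-byte count is available there, so it
checks only the buffer length and the status.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of kernel constants (assumed values, not kernel headers). */
#define DID_OK                   0x00
#define DID_SOFT_ERROR           0x0b
#define SAM_STAT_CHECK_CONDITION 0x02

/* With DID_OK == 0 the completed result word is just the status byte. */
static_assert(DID_OK == 0, "DID_OK is expected to be zero");

/*
 * Hypothetical model of the DONE_WITH_ERROR handling on the fusion path:
 * - data expected, zero bytes transferred, no CHECK_CONDITION:
 *   ask the midlayer to retry via DID_SOFT_ERROR;
 * - anything else: complete with DID_OK plus the SCSI status, as before.
 */
static uint32_t sketch_done_with_error_result(uint32_t bufflen,
					      uint32_t data_length,
					      uint8_t scsi_status)
{
	if (data_length == 0 && bufflen > 0 &&
	    scsi_status != SAM_STAT_CHECK_CONDITION)
		return (uint32_t)DID_SOFT_ERROR << 16;

	return ((uint32_t)DID_OK << 16) | scsi_status;
}
```

For the logged case (0x1000 bytes requested, 0x0 completed, no
CHECK_CONDITION) the helper returns DID_SOFT_ERROR << 16. With
CHECK_CONDITION set, or with a partial transfer, it returns the same
DID_OK-based result as before the patch.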
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index abbbc4b36cd1d..de35b7d5094d7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3682,6 +3682,22 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
hdr->sense_len);
}
+ /*
+ * MFI firmware does not report actual bytes
+ * transferred, so we cannot compute residuals.
+ * If data was expected and no CHECK_CONDITION,
+ * retry via DID_SOFT_ERROR. The SCSI midlayer
+ * retries up to cmd->allowed times (default 5).
+ */
+ if (hdr->scsi_status != SAM_STAT_CHECK_CONDITION &&
+ scsi_bufflen(cmd->scmd) > 0) {
+ cmd->scmd->result = DID_SOFT_ERROR << 16;
+ dev_warn(&instance->pdev->dev,
+ "megaraid_sas: DONE_WITH_ERROR (stat 0x%x) on cmd 0x%x to tgt %d, retrying\n",
+ hdr->cmd_status, hdr->cmd,
+ hdr->target_id);
+ }
+
break;
case MFI_STAT_LD_OFFLINE:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index a6794f49e9fae..6021f1363ef4c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2066,7 +2066,19 @@ map_cmd_status(struct fusion_context *fusion,
resid = (scsi_bufflen(scmd) - data_length);
scsi_set_resid(scmd, resid);
- if (resid &&
+ /*
+ * If data was expected but zero bytes were transferred
+ * and there is no CHECK_CONDITION sense data, retry via
+ * DID_SOFT_ERROR. The SCSI midlayer retries up to
+ * cmd->allowed times (default 5).
+ */
+ if (data_length == 0 && scsi_bufflen(scmd) > 0 &&
+ ext_status != SAM_STAT_CHECK_CONDITION) {
+ scmd->result = DID_SOFT_ERROR << 16;
+ scmd_printk(KERN_WARNING, scmd,
+ "megaraid_sas: zero data on DONE_WITH_ERROR (stat 0x%x, bufflen 0x%x), retrying\n",
+ status, scsi_bufflen(scmd));
+ } else if (resid &&
((cmd_type == READ_WRITE_LDIO) ||
(cmd_type == READ_WRITE_SYSPDIO)))
scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
--
2.52.0