* [PATCH] scsi: megaraid_sas: return DID_SOFT_ERROR on zero-byte DONE_WITH_ERROR
From: Ionut Nechita (Sunlight Linux) @ 2026-02-08 18:06 UTC
To: Kashyap Desai, Sumit Saxena, Shivasharan S, Chandrakanth patil
Cc: James E . J . Bottomley, Martin K . Petersen, megaraidlinux.pdl,
linux-scsi, linux-kernel, Ionut Nechita, Ionut Nechita, stable
From: Ionut Nechita <ionut_n2001@yahoo.com>
When the MegaRAID firmware returns MFI_STAT_SCSI_DONE_WITH_ERROR (0x2d)
with zero bytes transferred on a data-bearing command, the driver
currently returns DID_OK to the SCSI midlayer. The I/O then appears to
have completed successfully although no data was transferred, and the
tasks waiting on it hang indefinitely.
Production systems show the following repeated pattern:
  sd 0:0:9:0: [sdb] tag#24 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
  INFO: task systemd-udevd:267 blocked for more than 245 seconds.
  INFO: task modprobe:296 blocked for more than 246 seconds.
When the firmware reports DONE_WITH_ERROR with no data transferred and
no CHECK_CONDITION sense data, return DID_SOFT_ERROR instead of DID_OK.
This causes the SCSI midlayer to retry the command up to cmd->allowed
times (default 5), matching the established pattern used by mpt3sas and
smartpqi for similar conditions.
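
To make the retry bound above concrete, here is a small user-space model
of the retry budget. It is only a sketch: struct cmd_model and both
helpers are hypothetical names, and it assumes the midlayer's check is a
plain counter compared against cmd->allowed. It ignores failfast
requests, timeouts and the "no limit" setting.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two scsi_cmnd fields that matter here. */
struct cmd_model {
    int retries;  /* retries consumed so far */
    int allowed;  /* retry budget, e.g. 5 */
};

/* Assumed budget check: consume one retry, allow it while within budget. */
static bool retry_allowed(struct cmd_model *c)
{
    return ++c->retries <= c->allowed;
}

/*
 * Count how often a command is dispatched when every completion comes
 * back as DID_SOFT_ERROR: one initial attempt plus the permitted retries.
 */
static int dispatches_until_failure(int allowed)
{
    struct cmd_model c = { .retries = 0, .allowed = allowed };
    int dispatches = 1;  /* the initial attempt */

    while (retry_allowed(&c))
        dispatches++;
    return dispatches;
}
```

Under this model a persistently failing command is bounded rather than
retried forever: with allowed = 5 it is dispatched six times in total
before the error is reported to the upper layers.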
Commands with CHECK_CONDITION sense data are not affected -- they
continue to be completed immediately with the sense data intact.
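
The completion decision described above can be sketched in user space
as follows. This is an illustration, not the driver code: the helper
done_with_error_result() is a hypothetical name, the constants are
redefined locally with the values the kernel headers use, and the
bytes_completed argument only exists on the fusion path (the MFI path
has no transfer count, so the patch checks only for expected data and
the absence of CHECK_CONDITION there).

```c
#include <stdint.h>

/* Local copies of kernel values so the sketch builds outside the kernel. */
#define DID_OK                    0x00
#define DID_SOFT_ERROR            0x0b
#define SAM_STAT_GOOD             0x00
#define SAM_STAT_CHECK_CONDITION  0x02

/*
 * Hypothetical helper: compute the result word for a command that the
 * firmware completed with DONE_WITH_ERROR. The host byte occupies
 * bits 16-23 of the result, the SCSI status byte bits 0-7.
 */
static uint32_t done_with_error_result(uint8_t scsi_status,
                                       uint32_t bytes_requested,
                                       uint32_t bytes_completed)
{
    /* Data expected, none moved, no sense data: ask for a retry. */
    if (scsi_status != SAM_STAT_CHECK_CONDITION &&
        bytes_requested > 0 && bytes_completed == 0)
        return (uint32_t)DID_SOFT_ERROR << 16;

    /* Otherwise keep the previous behaviour: DID_OK plus SCSI status. */
    return ((uint32_t)DID_OK << 16) | scsi_status;
}
```

For the log line quoted earlier (0x1000 requested, 0x0 completed, no
CHECK_CONDITION) this yields 0x000b0000, while a CHECK_CONDITION
completion still yields 0x00000002 and is passed up with its sense data.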
Fixes: 9c915a8c99bc ("[SCSI] megaraid_sas: Add 9565/9285 specific code")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 16 ++++++++++++++++
drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 +++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index abbbc4b36cd1d..de35b7d5094d7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3682,6 +3682,22 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
hdr->sense_len);
}
+ /*
+ * MFI firmware does not report actual bytes
+ * transferred, so we cannot compute residuals.
+ * If data was expected and no CHECK_CONDITION,
+ * retry via DID_SOFT_ERROR. The SCSI midlayer
+ * retries up to cmd->allowed times (default 5).
+ */
+ if (hdr->scsi_status != SAM_STAT_CHECK_CONDITION &&
+ scsi_bufflen(cmd->scmd) > 0) {
+ cmd->scmd->result = DID_SOFT_ERROR << 16;
+ dev_warn(&instance->pdev->dev,
+ "megaraid_sas: DONE_WITH_ERROR (stat 0x%x) on cmd 0x%x to tgt %d, retrying\n",
+ hdr->cmd_status, hdr->cmd,
+ hdr->target_id);
+ }
+
break;
case MFI_STAT_LD_OFFLINE:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index a6794f49e9fae..6021f1363ef4c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2066,7 +2066,19 @@ map_cmd_status(struct fusion_context *fusion,
resid = (scsi_bufflen(scmd) - data_length);
scsi_set_resid(scmd, resid);
- if (resid &&
+ /*
+ * If data was expected but zero bytes were transferred
+ * and there is no CHECK_CONDITION sense data, retry via
+ * DID_SOFT_ERROR. The SCSI midlayer retries up to
+ * cmd->allowed times (default 5).
+ */
+ if (data_length == 0 && scsi_bufflen(scmd) > 0 &&
+ ext_status != SAM_STAT_CHECK_CONDITION) {
+ scmd->result = DID_SOFT_ERROR << 16;
+ scmd_printk(KERN_WARNING, scmd,
+ "megaraid_sas: zero data on DONE_WITH_ERROR (stat 0x%x, bufflen 0x%x), retrying\n",
+ status, scsi_bufflen(scmd));
+ } else if (resid &&
((cmd_type == READ_WRITE_LDIO) ||
(cmd_type == READ_WRITE_SYSPDIO)))
scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
--
2.52.0