From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Chandrakanth Patil <chandrakanth.patil@broadcom.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>,
sathya.prakash@broadcom.com, kashyap.desai@broadcom.com,
sumit.saxena@broadcom.com, sreekanth.reddy@broadcom.com,
mpi3mr-linuxdrv.pdl@broadcom.com, linux-scsi@vger.kernel.org
Subject: [PATCH AUTOSEL 6.17-5.15] scsi: mpi3mr: Fix controller init failure on fault during queue creation
Date: Sat, 25 Oct 2025 11:56:14 -0400 [thread overview]
Message-ID: <20251025160905.3857885-143-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
From: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
[ Upstream commit 829fa1582b6ff607b0e2fe41ba1c45c77f686618 ]
Firmware can enter a transient fault while creating operational queues.
The driver fails the load immediately.
Add a retry loop that checks controller status and history bit after
queue creation. If either indicates a fault, retry init up to a set
limit before failing.
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Link: https://lore.kernel.org/r/20250820084138.228471-3-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## BACKPORT RECOMMENDATION: **YES**
Based on my comprehensive analysis of commit 829fa1582b6ff ("scsi:
mpi3mr: Fix controller init failure on fault during queue creation"), I
**strongly recommend backporting this commit to stable kernel trees**.
---
### **DETAILED ANALYSIS**
#### **1. What the Commit Fixes**
This commit addresses a **critical controller initialization bug** in
the Broadcom mpi3mr SCSI driver:
**The Problem:**
- Firmware can enter a **transient fault state** while the driver is
creating operational queues during controller initialization
- Without this fix, the driver doesn't detect this fault condition
- The driver proceeds with a partially initialized controller that is
actually in a faulted state
- This results in **driver load failure** and the controller becoming
unusable
**The Fix:**
The commit adds fault detection immediately after operational queue
creation in `mpi3mr_create_op_queues()` (lines 2413-2420 in
`drivers/scsi/mpi3mr/mpi3mr_fw.c`):
```c
ioc_status = readl(&mrioc->sysif_regs->ioc_status);
ioc_state = mpi3mr_get_iocstate(mrioc);
if ((ioc_status & MPI3_SYSIF_IOC_STATUS_RESET_HISTORY) ||
ioc_state != MRIOC_STATE_READY) {
mpi3mr_print_fault_info(mrioc);
retval = -1;
goto out_failed;
}
```
This check:
1. **Reads the IOC status register** to check for the reset history bit
2. **Gets the IOC state** to verify the controller is in READY state
3. **If either check fails**, prints fault information and returns error
to trigger retry
#### **2. Integration with Existing Retry Mechanism**
The commit message mentions "Add a retry loop" but the code change
itself doesn't add a new loop. Instead, it **enables the existing retry
mechanism** that was already present in the calling functions:
- **`mpi3mr_init_ioc()`** (lines 4398-4405): Has `retry < 2` loop that
retries controller init up to 3 times total
- **`mpi3mr_reinit_ioc()`** (lines 4591-4598): Has identical retry logic
for controller reset/resume
By returning -1 when a fault is detected, this commit allows these retry
mechanisms to properly handle transient firmware faults during queue
creation, potentially recovering the controller instead of failing
immediately.
#### **3. Code Quality and Consistency**
**Excellent code quality:**
- **Follows established patterns**: The exact same fault checking
pattern appears in multiple locations throughout the driver:
- Line 1536-1538: In `mpi3mr_bring_ioc_ready()` (added by commit
9634bb07083cf)
- Line 4563-4565: In the reset/resume path
- Line 4588-4590: In port enable handling (mpi3mr_os.c)
- **Uses existing helper functions**:
- `mpi3mr_get_iocstate()` - Returns current IOC state enum
- `mpi3mr_print_fault_info()` - Prints detailed fault code information
for debugging
- **Minimal and focused**: Only 10 lines added (2 variable declarations
+ 8 lines of fault checking)
#### **4. Risk Assessment: VERY LOW RISK**
**Why this is safe to backport:**
1. **Defensive check only**: The code only triggers when the controller
is **actually in a fault state**
2. **No behavior change for normal operation**: When the controller is
healthy (the common case), this check passes immediately with no
impact
3. **Uses well-tested code paths**: The `goto out_failed` path already
existed and is used when queue creation fails for other reasons
4. **Hardware-specific impact**: Only affects Broadcom mpi3mr controller
users, no impact on other drivers or subsystems
5. **Small change scope**: Confined to a single function in a single
driver file
6. **No API changes**: Uses existing data structures and functions
**Regression risk analysis:**
- If the check incorrectly triggers: Would cause initialization retry
(at worst, slight delay)
- If the check fails to trigger: Same behavior as before (no worse than
current state)
- False positive potential: Very low - directly reads hardware registers
#### **5. Dependencies and Compatibility**
**All dependencies exist in stable kernels:**
- `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` constant: Defined in
`drivers/scsi/mpi3mr/mpi/mpi30_transport.h:91` (part of MPI3 headers,
present since driver introduction)
- `MRIOC_STATE_READY` enum: Defined in
`drivers/scsi/mpi3mr/mpi3mr.h:286` (core driver enum)
- `mpi3mr_get_iocstate()`: Function in `mpi3mr_fw.c:1115` (core driver
function)
- `mpi3mr_print_fault_info()`: Function in `mpi3mr_fw.c:1088` (core
driver function)
**Related work:**
- This builds on commit 9634bb07083cf ("scsi: mpi3mr: Enhance the Enable
Controller retry logic") merged in v6.12, which added similar fault
detection to the IOC enable phase
- This commit extends that pattern to the operational queue creation
phase
#### **6. User Impact**
**Benefits of backporting:**
- **Improves reliability** for systems with Broadcom mpi3mr controllers
- **Enables automatic recovery** from transient firmware faults during
initialization
- **Better diagnostics** via `mpi3mr_print_fault_info()` when faults
occur
- **Prevents complete driver load failures** that would require manual
intervention
**Who this helps:**
- Users with Broadcom MPI3 Storage Controllers (Broadcom/Avago/LSI
SAS/SATA/NVMe controllers)
- Data center environments where automatic recovery is critical
- Systems that may experience transient firmware issues during boot or
reset
#### **7. Stable Tree Criteria Alignment**
This commit **perfectly meets all stable tree backporting criteria:**
✅ **Fixes an important bug** - Controller initialization failure
✅ **Small and contained** - Only 10 lines added
✅ **No new features** - Pure bug fix
✅ **No architectural changes** - Uses existing patterns
✅ **Minimal regression risk** - Defensive check only
✅ **Confined to subsystem** - Only affects mpi3mr driver
✅ **Clear and obvious fix** - Easy to understand and verify
✅ **Vendor-supported** - From Broadcom engineer with proper testing
**Note:** While the commit doesn't include an explicit `Cc:
stable@vger.kernel.org` tag, this is not uncommon for driver-specific
fixes and doesn't diminish its suitability for backporting.
#### **8. Technical Details for Backport**
**Commit statistics:**
```
1 file changed, 10 insertions(+)
drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
```
**Applies cleanly to stable trees that have:**
- The mpi3mr driver (introduced in v5.15)
- The retry mechanism in calling functions (present since driver
introduction)
- The helper functions used (present since driver introduction)
**No cherry-pick conflicts expected** because:
- Changes are localized to one function
- No dependencies on unreleased features
- Uses only stable APIs and data structures
---
### **CONCLUSION**
**Backport Status: YES - STRONGLY RECOMMENDED**
This commit represents a **textbook example of a stable tree
candidate:**
- Fixes a real, user-visible bug
- Small, focused, and easy to verify
- Follows established patterns
- Minimal risk
- Improves system reliability
The commit should be backported to **all stable kernel series that
include the mpi3mr driver** (v5.15 and later) to improve controller
initialization reliability for users with Broadcom MPI3 storage
controllers.
drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 9e18cc2747104..8fe6e0bf342e2 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -2353,6 +2353,8 @@ static int mpi3mr_create_op_queues(struct mpi3mr_ioc *mrioc)
{
int retval = 0;
u16 num_queues = 0, i = 0, msix_count_op_q = 1;
+ u32 ioc_status;
+ enum mpi3mr_iocstate ioc_state;
num_queues = min_t(int, mrioc->facts.max_op_reply_q,
mrioc->facts.max_op_req_q);
@@ -2408,6 +2410,14 @@ static int mpi3mr_create_op_queues(struct mpi3mr_ioc *mrioc)
retval = -1;
goto out_failed;
}
+ ioc_status = readl(&mrioc->sysif_regs->ioc_status);
+ ioc_state = mpi3mr_get_iocstate(mrioc);
+ if ((ioc_status & MPI3_SYSIF_IOC_STATUS_RESET_HISTORY) ||
+ ioc_state != MRIOC_STATE_READY) {
+ mpi3mr_print_fault_info(mrioc);
+ retval = -1;
+ goto out_failed;
+ }
mrioc->num_op_reply_q = mrioc->num_op_req_q = i;
ioc_info(mrioc,
"successfully created %d operational queue pairs(default/polled) queue = (%d/%d)\n",
--
2.51.0
next prev parent reply other threads:[~2025-10-25 16:15 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-5.4] scsi: lpfc: Define size of debugfs entry for xri rebalancing Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] scsi: ufs: ufs-qcom: Disable lane clocks during phy hibern8 Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] PCI/ERR: Update device error_state already after reset Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] scsi: ufs: core: Change MCQ interrupt enable flow Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-5.4] scsi: lpfc: Check return status of lpfc_reset_flush_io_context during TGT_RESET Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Fix invalid access in vccqx handling Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Change reset sequence for improved stability Sasha Levin
2025-10-25 15:56 ` Sasha Levin [this message]
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Disable auto-hibern8 during power mode changes Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Clean up allocated queues when queue setup mbox commands fail Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] scsi: ufs: ufs-qcom: Align programming sequence of Shared ICE for UFS controller v5 Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Fix PWM mode switch issue Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Enhance recovery on hibernation exit failure Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-5.15] scsi: libfc: Fix potential buffer overflow in fc_ct_ms_fill() Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: exynos: fsd: Gate ref_clk and put UFS device in reset on suspend Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-5.10] scsi: pm80xx: Fix race condition caused by static variables Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17] scsi: ufs: host: mediatek: Fix adapt issue after PA_Init Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: core: Disable timestamp functionality if not supported Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Assign power mode userdata before FASTAUTO mode change Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.12] scsi: mpi3mr: Fix I/O failures during controller reset Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17] scsi: mpi3mr: Fix device loss during enclosure reboot due to zero link speed Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Ensure PLOGI_ACC is sent prior to PRLI in Point to Point topology Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Fix auto-hibern8 timer configuration Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Fix unbalanced IRQ enable issue Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Enhance recovery on resume failure Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] scsi: mpt3sas: Add support for 22.5 Gbps SAS link rate Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Decrement ndlp kref after FDISC retries exhausted Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-5.4] scsi: pm8001: Use int instead of u32 to store error codes Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Correct system PM flow Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-5.15] scsi: lpfc: Remove ndlp kref decrement clause for F_Port_Ctrl in lpfc_cleanup Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251025160905.3857885-143-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=chandrakanth.patil@broadcom.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mpi3mr-linuxdrv.pdl@broadcom.com \
--cc=patches@lists.linux.dev \
--cc=sathya.prakash@broadcom.com \
--cc=sreekanth.reddy@broadcom.com \
--cc=stable@vger.kernel.org \
--cc=sumit.saxena@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox