From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DA1B304BAB; Sat, 25 Oct 2025 16:15:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761408944; cv=none; b=Yxa9Q7owJ9PkRUVfO04mwxwLAyEBmYF6OtmnF495wSXUGR09i12/gHXm8cnLVRzaEjKd86II1V8BAZr0rNvrR4P9oB8yMZv9tYx157qvUj1XLJjdIw/NUx2sx6woAg7e7AJT0RKI40v2CZtGJELdgs44u6xssadcC4p65l6+0Ng= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761408944; c=relaxed/simple; bh=T9bUWhzKcNiJkbij6q64Jm/158oQJgeuuLcU3fQ1LHc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XzSrBNYN8q6uxAz29uWGKsspm7WNOpdcPpjSodaJumcB9f9Ky2Wxor664np83GHZLxvbbAd0dl83pqMfY8U+i/mzPQZWNejppZkrjTlcq1JBrF/ls21/69Vwv+hL4qwP4XSp69aDEDlJ0NIrKvmw0WjyVogghsF6WnR2NelLwo0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=J3qNX6zh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="J3qNX6zh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DFE57C116C6; Sat, 25 Oct 2025 16:15:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761408944; bh=T9bUWhzKcNiJkbij6q64Jm/158oQJgeuuLcU3fQ1LHc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=J3qNX6zhIjPRrofqkqE5k20+uE57j/HqNxlolYjvGWtwVX8C+N6bCYTvUT2py8jmx siunjsyisc1xZcuDgx8aIQTJtAtHRbD0oq/CheelsTQAbAbVJp0bcxE53ThmA65ZRj ITE1tGUFzheMQy7M0EJDE6rd6PjMgNEx4ZYg2sdfiEM6O54JxULwYQCTfMgouBDoNH +ssFRMSG8phxCGch7UzKM4lRjgwOY1B56baTvN+C4Qubsfz94SyEVQ4T3/SHcPymKd +0ztf7O20ipcdArDsrcuGrTbpoRBonUa6duWxJpXBNnpHvcYIfkoMzW7j1bPh/R+eE enuQhp1xYSM4A== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Chandrakanth Patil , "Martin K. Petersen" , Sasha Levin , sathya.prakash@broadcom.com, kashyap.desai@broadcom.com, sumit.saxena@broadcom.com, sreekanth.reddy@broadcom.com, mpi3mr-linuxdrv.pdl@broadcom.com, linux-scsi@vger.kernel.org Subject: [PATCH AUTOSEL 6.17-5.15] scsi: mpi3mr: Fix controller init failure on fault during queue creation Date: Sat, 25 Oct 2025 11:56:14 -0400 Message-ID: <20251025160905.3857885-143-sashal@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org> References: <20251025160905.3857885-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.17.5 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Chandrakanth Patil [ Upstream commit 829fa1582b6ff607b0e2fe41ba1c45c77f686618 ] Firmware can enter a transient fault while creating operational queues. The driver fails the load immediately. Add a retry loop that checks controller status and history bit after queue creation. If either indicates a fault, retry init up to a set limit before failing. Signed-off-by: Chandrakanth Patil Link: https://lore.kernel.org/r/20250820084138.228471-3-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: ## BACKPORT RECOMMENDATION: **YES** Based on my comprehensive analysis of commit 829fa1582b6ff ("scsi: mpi3mr: Fix controller init failure on fault during queue creation"), I **strongly recommend backporting this commit to stable kernel trees**. --- ### **DETAILED ANALYSIS** #### **1. What the Commit Fixes** This commit addresses a **critical controller initialization bug** in the Broadcom mpi3mr SCSI driver: **The Problem:** - Firmware can enter a **transient fault state** while the driver is creating operational queues during controller initialization - Without this fix, the driver doesn't detect this fault condition - The driver proceeds with a partially initialized controller that is actually in a faulted state - This results in **driver load failure** and the controller becoming unusable **The Fix:** The commit adds fault detection immediately after operational queue creation in `mpi3mr_create_op_queues()` (lines 2413-2420 in `drivers/scsi/mpi3mr/mpi3mr_fw.c`): ```c ioc_status = readl(&mrioc->sysif_regs->ioc_status); ioc_state = mpi3mr_get_iocstate(mrioc); if ((ioc_status & MPI3_SYSIF_IOC_STATUS_RESET_HISTORY) || ioc_state != MRIOC_STATE_READY) { mpi3mr_print_fault_info(mrioc); retval = -1; goto out_failed; } ``` This check: 1. **Reads the IOC status register** to check for the reset history bit 2. **Gets the IOC state** to verify the controller is in READY state 3. **If either check fails**, prints fault information and returns error to trigger retry #### **2. Integration with Existing Retry Mechanism** The commit message mentions "Add a retry loop" but the code change itself doesn't add a new loop. Instead, it **enables the existing retry mechanism** that was already present in the calling functions: - **`mpi3mr_init_ioc()`** (lines 4398-4405): Has `retry < 2` loop that retries controller init up to 3 times total - **`mpi3mr_reinit_ioc()`** (lines 4591-4598): Has identical retry logic for controller reset/resume By returning -1 when a fault is detected, this commit allows these retry mechanisms to properly handle transient firmware faults during queue creation, potentially recovering the controller instead of failing immediately. #### **3. Code Quality and Consistency** **Excellent code quality:** - **Follows established patterns**: The exact same fault checking pattern appears in multiple locations throughout the driver: - Line 1536-1538: In `mpi3mr_bring_ioc_ready()` (added by commit 9634bb07083cf) - Line 4563-4565: In the reset/resume path - Line 4588-4590: In port enable handling (mpi3mr_os.c) - **Uses existing helper functions**: - `mpi3mr_get_iocstate()` - Returns current IOC state enum - `mpi3mr_print_fault_info()` - Prints detailed fault code information for debugging - **Minimal and focused**: Only 10 lines added (2 variable declarations + 8 lines of fault checking) #### **4. Risk Assessment: VERY LOW RISK** **Why this is safe to backport:** 1. **Defensive check only**: The code only triggers when the controller is **actually in a fault state** 2. **No behavior change for normal operation**: When the controller is healthy (the common case), this check passes immediately with no impact 3. **Uses well-tested code paths**: The `goto out_failed` path already existed and is used when queue creation fails for other reasons 4. **Hardware-specific impact**: Only affects Broadcom mpi3mr controller users, no impact on other drivers or subsystems 5. **Small change scope**: Confined to a single function in a single driver file 6. **No API changes**: Uses existing data structures and functions **Regression risk analysis:** - If the check incorrectly triggers: Would cause initialization retry (at worst, slight delay) - If the check fails to trigger: Same behavior as before (no worse than current state) - False positive potential: Very low - directly reads hardware registers #### **5. Dependencies and Compatibility** **All dependencies exist in stable kernels:** - `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` constant: Defined in `drivers/scsi/mpi3mr/mpi/mpi30_transport.h:91` (part of MPI3 headers, present since driver introduction) - `MRIOC_STATE_READY` enum: Defined in `drivers/scsi/mpi3mr/mpi3mr.h:286` (core driver enum) - `mpi3mr_get_iocstate()`: Function in `mpi3mr_fw.c:1115` (core driver function) - `mpi3mr_print_fault_info()`: Function in `mpi3mr_fw.c:1088` (core driver function) **Related work:** - This builds on commit 9634bb07083cf ("scsi: mpi3mr: Enhance the Enable Controller retry logic") merged in v6.12, which added similar fault detection to the IOC enable phase - This commit extends that pattern to the operational queue creation phase #### **6. User Impact** **Benefits of backporting:** - **Improves reliability** for systems with Broadcom mpi3mr controllers - **Enables automatic recovery** from transient firmware faults during initialization - **Better diagnostics** via `mpi3mr_print_fault_info()` when faults occur - **Prevents complete driver load failures** that would require manual intervention **Who this helps:** - Users with Broadcom MPI3 Storage Controllers (Broadcom/Avago/LSI SAS/SATA/NVMe controllers) - Data center environments where automatic recovery is critical - Systems that may experience transient firmware issues during boot or reset #### **7. Stable Tree Criteria Alignment** This commit **perfectly meets all stable tree backporting criteria:** ✅ **Fixes an important bug** - Controller initialization failure ✅ **Small and contained** - Only 10 lines added ✅ **No new features** - Pure bug fix ✅ **No architectural changes** - Uses existing patterns ✅ **Minimal regression risk** - Defensive check only ✅ **Confined to subsystem** - Only affects mpi3mr driver ✅ **Clear and obvious fix** - Easy to understand and verify ✅ **Vendor-supported** - From Broadcom engineer with proper testing **Note:** While the commit doesn't include an explicit `Cc: stable@vger.kernel.org` tag, this is not uncommon for driver-specific fixes and doesn't diminish its suitability for backporting. #### **8. Technical Details for Backport** **Commit statistics:** ``` 1 file changed, 10 insertions(+) drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++ ``` **Applies cleanly to stable trees that have:** - The mpi3mr driver (introduced in v5.15) - The retry mechanism in calling functions (present since driver introduction) - The helper functions used (present since driver introduction) **No cherry-pick conflicts expected** because: - Changes are localized to one function - No dependencies on unreleased features - Uses only stable APIs and data structures --- ### **CONCLUSION** **Backport Status: YES - STRONGLY RECOMMENDED** This commit represents a **textbook example of a stable tree candidate:** - Fixes a real, user-visible bug - Small, focused, and easy to verify - Follows established patterns - Minimal risk - Improves system reliability The commit should be backported to **all stable kernel series that include the mpi3mr driver** (v5.15 and later) to improve controller initialization reliability for users with Broadcom MPI3 storage controllers. drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c index 9e18cc2747104..8fe6e0bf342e2 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_fw.c +++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c @@ -2353,6 +2353,8 @@ static int mpi3mr_create_op_queues(struct mpi3mr_ioc *mrioc) { int retval = 0; u16 num_queues = 0, i = 0, msix_count_op_q = 1; + u32 ioc_status; + enum mpi3mr_iocstate ioc_state; num_queues = min_t(int, mrioc->facts.max_op_reply_q, mrioc->facts.max_op_req_q); @@ -2408,6 +2410,14 @@ static int mpi3mr_create_op_queues(struct mpi3mr_ioc *mrioc) retval = -1; goto out_failed; } + ioc_status = readl(&mrioc->sysif_regs->ioc_status); + ioc_state = mpi3mr_get_iocstate(mrioc); + if ((ioc_status & MPI3_SYSIF_IOC_STATUS_RESET_HISTORY) || + ioc_state != MRIOC_STATE_READY) { + mpi3mr_print_fault_info(mrioc); + retval = -1; + goto out_failed; + } mrioc->num_op_reply_q = mrioc->num_op_req_q = i; ioc_info(mrioc, "successfully created %d operational queue pairs(default/polled) queue = (%d/%d)\n", -- 2.51.0