public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Ranjan Kumar <ranjan.kumar@broadcom.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	sathya.prakash@broadcom.com, kashyap.desai@broadcom.com,
	sumit.saxena@broadcom.com, sreekanth.reddy@broadcom.com,
	James.Bottomley@HansenPartnership.com,
	mpi3mr-linuxdrv.pdl@broadcom.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.12] scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
Date: Tue, 10 Mar 2026 05:01:19 -0400	[thread overview]
Message-ID: <20260310090145.2709021-19-sashal@kernel.org> (raw)
In-Reply-To: <20260310090145.2709021-1-sashal@kernel.org>

From: Ranjan Kumar <ranjan.kumar@broadcom.com>

[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ]

The driver retains reset history even after the IOC has successfully
reached the READY state. That leaves stale reset information active during
normal operation and can mislead recovery and diagnostics.  In addition, if
the IOC becomes READY just as the ready timeout loop exits, the driver
still follows the failure path and may retry or report failure incorrectly.

Clear reset history once READY is confirmed so driver state matches actual
IOC status. After the timeout loop, recheck the IOC state and treat READY
as success instead of failing.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear picture of the changes. Let me finalize my analysis.

## Analysis

### What the commit does

This commit makes two small changes to `mpi3mr_bring_ioc_ready()` in the
mpi3mr SCSI driver:

1. **Clears reset history on READY** (line 1533): When the IOC
   transitions to READY state inside the polling loop, it now calls
   `mpi3mr_clear_reset_history()` before returning success. Without
   this, stale reset history bits remain set, which can confuse
   subsequent recovery logic that checks
   `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY`.

2. **Rechecks state after timeout** (lines 1553-1560): After the
   `do...while` timeout loop exits, the code does one final check to see
   if the IOC became READY right at the boundary. Previously, if the IOC
   transitioned to READY during the last `msleep(100)` (after the
   elapsed time already exceeded the timeout), the driver would fall
   through to `out_failed` and potentially retry unnecessarily or report
   failure.

### Bug assessment

**Race at timeout boundary (bug #2):** This is a real timing bug. The
loop checks `elapsed_time_sec < mrioc->ready_timeout` *after*
`msleep(100)`. If the IOC becomes READY during that last sleep,
`elapsed_time_sec` has already exceeded the timeout, so the loop exits.
Without the post-loop recheck, the driver goes to `out_failed`, which
triggers up to 2 retries (unnecessary soft resets) or reports
initialization failure. This can cause:
- Unnecessary controller resets during probe/resume
- Spurious initialization failures if retries are exhausted
- I/O failures during reset recovery (since `mpi3mr_reinit_ioc` fails)

**Stale reset history (bug #1):** If reset history is not cleared when
the IOC reaches READY, subsequent iterations of the loop in future calls
to this function will see `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` set and
jump to `out_failed` (line 1537-1540), causing spurious failure.

### Stable kernel criteria

- **Fixes a real bug:** Yes - timing race causes false initialization
  failures; stale state causes spurious failures in subsequent
  operations.
- **Obviously correct:** Yes - `mpi3mr_clear_reset_history()` is already
  called elsewhere in the same function (at line 1510 after soft reset).
  The post-timeout recheck follows the same pattern as the in-loop
  check.
- **Small and contained:** Yes - 10 lines added, single file, single
  function.
- **No new features:** Correct - pure bug fix.
- **Risk:** Very low. The added code only adds a success path where
  there was none (post-timeout READY check) and clears a status bit that
  should be cleared (reset history). No existing success/failure paths
  are altered.

### Impact

This function is called on three critical paths: PCI probe, controller
reset recovery, and system resume. A false failure here means the SCSI
controller doesn't initialize, which means loss of access to all
attached storage devices. This affects real users with Broadcom MPI3
(SAS/NVMe) storage controllers.

### Verification

- Confirmed `mpi3mr_clear_reset_history` is a trivial inline that clears
  the RESET_HISTORY bit in IOC status register (lines 1256-1263).
- Confirmed the function is called from probe (`mpi3mr_init_ioc`), reset
  (`mpi3mr_reinit_ioc`), and resume paths.
- Confirmed that `mpi3mr_clear_reset_history` is already used elsewhere
  in the same function, so the pattern is established.
- Confirmed the timeout race: the `do...while` loop exits when
  `elapsed_time_sec >= ready_timeout`, but the IOC could have become
  READY during the final `msleep(100)`, which is not checked without
  this patch.
- Confirmed that falling through to `out_failed` triggers retries (up to
  2) or returns failure (line 1564-1577).
- The commit is authored by Broadcom (the hardware vendor), accepted by
  the SCSI maintainer Martin Petersen, indicating domain expertise.
- Could NOT verify specific user reports of this race triggering in the
  field (commit message implies it was found via code
  review/diagnostics), but the race window is real and the consequences
  are severe.

**YES**

 drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 8c4bb7169a87c..6d36575997871 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -1530,6 +1530,7 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 			ioc_info(mrioc,
 			    "successfully transitioned to %s state\n",
 			    mpi3mr_iocstate_name(ioc_state));
+			mpi3mr_clear_reset_history(mrioc);
 			return 0;
 		}
 		ioc_status = readl(&mrioc->sysif_regs->ioc_status);
@@ -1549,6 +1550,15 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 		elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	} while (elapsed_time_sec < mrioc->ready_timeout);
 
+	ioc_state = mpi3mr_get_iocstate(mrioc);
+	if (ioc_state == MRIOC_STATE_READY) {
+		ioc_info(mrioc,
+		    "successfully transitioned to %s state after %llu seconds\n",
+		    mpi3mr_iocstate_name(ioc_state), elapsed_time_sec);
+		mpi3mr_clear_reset_history(mrioc);
+		return 0;
+	}
+
 out_failed:
 	elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	if ((retry < 2) && (elapsed_time_sec < (mrioc->ready_timeout - 60))) {
-- 
2.51.0


  parent reply	other threads:[~2026-03-10  9:02 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-10  9:01 [PATCH AUTOSEL 6.19-6.18] ALSA: hda/hdmi: Add Tegra238 HDA codec device ID Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] ASoC: amd: acp: Add ACP6.3 match entries for Cirrus Logic parts Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] nvme-pci: ensure we're polling a polled queue Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] ASoC: cs35l56: Only patch ASP registers if the DAI is part of a DAIlink Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] ALSA: hda/senary: Ensure EAPD is enabled during init Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_set_reg() Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] kbuild: install-extmod-build: Package resolve_btfids if necessary Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] block: break pcpu_alloc_mutex dependency on freeze_lock Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: oxpec: Add support for OneXPlayer X1z Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19] spi: spi-dw-dma: fix print error log when wait finish transaction Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.6] HID: asus: add xg mobile 2023 external hardware support Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] ASoC: rt1321: fix DMIC ch2/3 mask issue Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] drm/ttm/tests: Fix build failure on PREEMPT_RT Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] bpf: Fix u32/s32 bounds when ranges cross min/max boundary Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] HID: mcp2221: cancel last I2C command on read error Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] platform/x86: intel-hid: Add Dell 14 Plus 2-in-1 to dmi_vgbs_allow_list Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] HID: asus: avoid memory leak in asus_report_fixup() Sasha Levin
2026-03-10  9:01 ` Sasha Levin [this message]
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: oxpec: Add support for Aokzoe A2 Pro Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19] platform/x86: hp-wmi: Add Victus 16-d0xxx support Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] platform/x86: touchscreen_dmi: Add quirk for y-inverted Goodix touchscreen on SUPI S10 Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.1] HID: apple: avoid memory leak in apple_report_fixup() Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] platform/x86: intel-hid: Enable 5-button array on ThinkPad X1 Fold 16 Gen 1 Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCD SKU Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: hp-wmi: Add Omen 16-xd0xxx fan and thermal support Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] HID: apple: Add EPOMAKER TH87 to the non-apple keyboards list Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: hp-wmi: Add Omen 16-wf0xxx fan and thermal support Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] nvme-pci: cap queue creation to used queues Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] dma-buf: Include ioctl.h in UAPI header Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: oxpec: Add support for OneXPlayer X1 Air Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19] platform/x86: hp-wmi: add Omen 14-fb1xxx (board 8E41) support Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] net: usb: r8152: add TRENDnet TUC-ET2G Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] platform/x86: oxpec: Add support for OneXPlayer APEX Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.10] ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_put_bits() Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] HID: magicmouse: fix battery reporting for Apple Magic Trackpad 2 Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.18] HID: intel-ish-hid: ipc: Add Nova Lake-H/S PCI device IDs Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.1] nvme-fabrics: use kfree_sensitive() for DHCHAP secrets Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] module: Fix kernel panic when a symbol st_shndx is out of bounds Sasha Levin
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-5.15] HID: magicmouse: avoid memory leak in magicmouse_report_fixup() Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260310090145.2709021-19-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mpi3mr-linuxdrv.pdl@broadcom.com \
    --cc=patches@lists.linux.dev \
    --cc=ranjan.kumar@broadcom.com \
    --cc=sathya.prakash@broadcom.com \
    --cc=sreekanth.reddy@broadcom.com \
    --cc=stable@vger.kernel.org \
    --cc=sumit.saxena@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox