public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Ranjan Kumar <ranjan.kumar@broadcom.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	sathya.prakash@broadcom.com, kashyap.desai@broadcom.com,
	sumit.saxena@broadcom.com, sreekanth.reddy@broadcom.com,
	James.Bottomley@HansenPartnership.com,
	mpi3mr-linuxdrv.pdl@broadcom.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.12] scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
Date: Tue, 10 Mar 2026 05:01:19 -0400	[thread overview]
Message-ID: <20260310090145.2709021-19-sashal@kernel.org> (raw)
In-Reply-To: <20260310090145.2709021-1-sashal@kernel.org>

From: Ranjan Kumar <ranjan.kumar@broadcom.com>

[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ]

The driver retains reset history even after the IOC has successfully
reached the READY state. That leaves stale reset information active during
normal operation and can mislead recovery and diagnostics.  In addition, if
the IOC becomes READY just as the ready timeout loop exits, the driver
still follows the failure path and may retry or report failure incorrectly.

Clear reset history once READY is confirmed so driver state matches actual
IOC status. After the timeout loop, recheck the IOC state and treat READY
as success instead of failing.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a clear picture of the changes. Let me finalize my analysis.

## Analysis

### What the commit does

This commit makes two small changes to `mpi3mr_bring_ioc_ready()` in the
mpi3mr SCSI driver:

1. **Clears reset history on READY** (line 1533): When the IOC
   transitions to READY state inside the polling loop, it now calls
   `mpi3mr_clear_reset_history()` before returning success. Without
   this, stale reset history bits remain set, which can confuse
   subsequent recovery logic that checks
   `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY`.

2. **Rechecks state after timeout** (lines 1553-1560): After the
   `do...while` timeout loop exits, the code does one final check to see
   if the IOC became READY right at the boundary. Previously, if the IOC
   transitioned to READY during the last `msleep(100)` (after the
   elapsed time already exceeded the timeout), the driver would fall
   through to `out_failed` and potentially retry unnecessarily or report
   failure.

### Bug assessment

**Race at timeout boundary (bug #2):** This is a real timing bug. The
loop checks `elapsed_time_sec < mrioc->ready_timeout` *after*
`msleep(100)`. If the IOC becomes READY during that last sleep,
`elapsed_time_sec` has already exceeded the timeout, so the loop exits.
Without the post-loop recheck, the driver goes to `out_failed`, which
triggers up to 2 retries (unnecessary soft resets) or reports
initialization failure. This can cause:
- Unnecessary controller resets during probe/resume
- Spurious initialization failures if retries are exhausted
- I/O failures during reset recovery (since `mpi3mr_reinit_ioc` fails)

**Stale reset history (bug #1):** If reset history is not cleared when
the IOC reaches READY, subsequent iterations of the loop in future calls
to this function will see `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` set and
jump to `out_failed` (line 1537-1540), causing spurious failure.

### Stable kernel criteria

- **Fixes a real bug:** Yes - timing race causes false initialization
  failures; stale state causes spurious failures in subsequent
  operations.
- **Obviously correct:** Yes - `mpi3mr_clear_reset_history()` is already
  called elsewhere in the same function (at line 1510 after soft reset).
  The post-timeout recheck follows the same pattern as the in-loop
  check.
- **Small and contained:** Yes - 10 lines added, single file, single
  function.
- **No new features:** Correct - pure bug fix.
- **Risk:** Very low. The added code only adds a success path where
  there was none (post-timeout READY check) and clears a status bit that
  should be cleared (reset history). No existing success/failure paths
  are altered.

### Impact

This function is called on three critical paths: PCI probe, controller
reset recovery, and system resume. A false failure here means the SCSI
controller doesn't initialize, which means loss of access to all
attached storage devices. This affects real users with Broadcom MPI3
(SAS/NVMe) storage controllers.

### Verification

- Confirmed `mpi3mr_clear_reset_history` is a trivial inline that clears
  the RESET_HISTORY bit in IOC status register (lines 1256-1263).
- Confirmed the function is called from probe (`mpi3mr_init_ioc`), reset
  (`mpi3mr_reinit_ioc`), and resume paths.
- Confirmed that `mpi3mr_clear_reset_history` is already used elsewhere
  in the same function, so the pattern is established.
- Confirmed the timeout race: the `do...while` loop exits when
  `elapsed_time_sec >= ready_timeout`, but the IOC could have become
  READY during the final `msleep(100)`, which is not checked without
  this patch.
- Confirmed that falling through to `out_failed` triggers retries (up to
  2) or returns failure (line 1564-1577).
- The commit is authored by Broadcom (the hardware vendor), accepted by
  the SCSI maintainer Martin Petersen, indicating domain expertise.
- Could NOT verify specific user reports of this race triggering in the
  field (commit message implies it was found via code
  review/diagnostics), but the race window is real and the consequences
  are severe.

**YES**

 drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 8c4bb7169a87c..6d36575997871 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -1530,6 +1530,7 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 			ioc_info(mrioc,
 			    "successfully transitioned to %s state\n",
 			    mpi3mr_iocstate_name(ioc_state));
+			mpi3mr_clear_reset_history(mrioc);
 			return 0;
 		}
 		ioc_status = readl(&mrioc->sysif_regs->ioc_status);
@@ -1549,6 +1550,15 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
 		elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	} while (elapsed_time_sec < mrioc->ready_timeout);
 
+	ioc_state = mpi3mr_get_iocstate(mrioc);
+	if (ioc_state == MRIOC_STATE_READY) {
+		ioc_info(mrioc,
+		    "successfully transitioned to %s state after %llu seconds\n",
+		    mpi3mr_iocstate_name(ioc_state), elapsed_time_sec);
+		mpi3mr_clear_reset_history(mrioc);
+		return 0;
+	}
+
 out_failed:
 	elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
 	if ((retry < 2) && (elapsed_time_sec < (mrioc->ready_timeout - 60))) {
-- 
2.51.0


      parent reply	other threads:[~2026-03-10  9:02 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260310090145.2709021-1-sashal@kernel.org>
2026-03-10  9:01 ` [PATCH AUTOSEL 6.19-6.12] scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP Sasha Levin
2026-03-10  9:01 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260310090145.2709021-19-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mpi3mr-linuxdrv.pdl@broadcom.com \
    --cc=patches@lists.linux.dev \
    --cc=ranjan.kumar@broadcom.com \
    --cc=sathya.prakash@broadcom.com \
    --cc=sreekanth.reddy@broadcom.com \
    --cc=stable@vger.kernel.org \
    --cc=sumit.saxena@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox