[PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
@ 2025-12-15  8:55 Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Sasha Levin
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Sasha Levin @ 2025-12-15  8:55 UTC (permalink / raw)
  To: patches, stable
  Cc: Suganath Prabu S, Ranjan Kumar, Martin K. Petersen, Sasha Levin,
	sathya.prakash, kashyap.desai, sumit.saxena, sreekanth.reddy,
	mpi3mr-linuxdrv.pdl, linux-scsi

From: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>

[ Upstream commit 4588e65cfd66fc8bbd9969ea730db39b60a36a30 ]

Avoid scanning SAS/SATA devices in channel 1 when SAS transport is
enabled, as the SAS/SATA devices are exposed through channel 0.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/stable/20251120071955.463475-1-suganath-prabu.subramani%40broadcom.com
Link: https://patch.msgid.link/20251120071955.463475-1-suganath-prabu.subramani@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

### DEPENDENCY VERIFICATION

The `non_stl` field and `sas_transport_enabled` were both introduced in
commit c4723e68a0d81 ("scsi: mpi3mr: Enable STL on HBAs where multipath
is disabled") from August 2022 (Linux 6.0 cycle).

This means the patch is only applicable to stable kernels 6.1.y and
later (not 5.15.y).

### SUMMARY

**Bug being fixed**: Duplicate SAS/SATA device entries appearing in both
channel 0 (SAS transport) and channel 1 (generic SCSI) when SAS
transport is enabled, causing user confusion and management issues.

**Stable kernel criteria assessment**:
- ✅ **Obviously correct**: Simple conditional checks with clear logic
- ✅ **Fixes a real bug**: Duplicate device enumeration is a real user-
  visible issue
- ✅ **Small and contained**: Only 4 lines of actual code change
- ✅ **No new features**: Just corrects existing device enumeration logic
- ✅ **Tested**: Merged through maintainer tree with proper sign-offs
- ✅ **Intentional stable submission**: Link to stable mailing list
  present

**Risk vs Benefit**:
- **Risk**: Very low - simple conditional check, worst case is device
  visibility issue
- **Benefit**: Fixes confusing duplicate device entries for MPI3MR users
  with SAS transport

**Concerns**:
1. Requires commit c4723e68a0d81 to be present (6.1.y and later only)
2. Version bump in header should be stripped for stable backport

The explicit submission to the stable mailing list, the small surgical
nature of the fix, and the clear bug it addresses make this a valid
stable backport candidate for kernels 6.1.y and newer.

**YES**

 drivers/scsi/mpi3mr/mpi3mr.h    | 4 ++--
 drivers/scsi/mpi3mr/mpi3mr_os.c | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr.h b/drivers/scsi/mpi3mr/mpi3mr.h
index 6742684e2990a..31d68c151b207 100644
--- a/drivers/scsi/mpi3mr/mpi3mr.h
+++ b/drivers/scsi/mpi3mr/mpi3mr.h
@@ -56,8 +56,8 @@ extern struct list_head mrioc_list;
 extern int prot_mask;
 extern atomic64_t event_counter;
 
-#define MPI3MR_DRIVER_VERSION	"8.15.0.5.50"
-#define MPI3MR_DRIVER_RELDATE	"12-August-2025"
+#define MPI3MR_DRIVER_VERSION	"8.15.0.5.51"
+#define MPI3MR_DRIVER_RELDATE	"18-November-2025"
 
 #define MPI3MR_DRIVER_NAME	"mpi3mr"
 #define MPI3MR_DRIVER_LICENSE	"GPL"
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index b88633e1efe27..d4ca878d08869 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -1184,6 +1184,8 @@ static void mpi3mr_update_tgtdev(struct mpi3mr_ioc *mrioc,
 	if (is_added == true)
 		tgtdev->io_throttle_enabled =
 		    (flags & MPI3_DEVICE0_FLAGS_IO_THROTTLING_REQUIRED) ? 1 : 0;
+	if (!mrioc->sas_transport_enabled)
+		tgtdev->non_stl = 1;
 
 	switch (flags & MPI3_DEVICE0_FLAGS_MAX_WRITE_SAME_MASK) {
 	case MPI3_DEVICE0_FLAGS_MAX_WRITE_SAME_256_LB:
@@ -4844,7 +4846,7 @@ static int mpi3mr_target_alloc(struct scsi_target *starget)
 	spin_lock_irqsave(&mrioc->tgtdev_lock, flags);
 	if (starget->channel == mrioc->scsi_device_channel) {
 		tgt_dev = __mpi3mr_get_tgtdev_by_perst_id(mrioc, starget->id);
-		if (tgt_dev && !tgt_dev->is_hidden) {
+		if (tgt_dev && !tgt_dev->is_hidden && tgt_dev->non_stl) {
 			scsi_tgt_priv_data->starget = starget;
 			scsi_tgt_priv_data->dev_handle = tgt_dev->dev_handle;
 			scsi_tgt_priv_data->perst_id = tgt_dev->perst_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18-5.10] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  2025-12-15  8:55 [PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 Sasha Levin
@ 2025-12-15  8:55 ` Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-6.1] scsi: ufs: core: Fix EH failure after W-LUN resume error Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-12-15  8:55 UTC (permalink / raw)
  To: patches, stable
  Cc: Xingui Yang, Jason Yan, John Garry, Martin K. Petersen,
	Sasha Levin, dlemoal, johannes.thumshirn, wdhh6, linux-scsi

From: Xingui Yang <yangxingui@huawei.com>

[ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ]

This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d.

When probing the exp-attached sata device, libsas/libata will issue a
hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a
broadcast event will be received after the disk probe fails, and this
commit causes the probe will be re-executed on the disk, and a faulty
disk may get into an indefinite loop of probe.

Therefore, revert this commit, although it can fix some temporary issues
with disk probe failure.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis Summary

### 1. Bug Description
This revert fixes an **infinite probe loop** regression introduced by
commit `ab2068a6fb84`. The original commit cleared the expander phy
attached address when device probe failed, intending to prevent
"flutter" detection. However, this creates an infinite loop:

1. Expander-attached SATA device probe fails
2. Hard reset issued in `sas_probe_sata()` → `ata_sas_async_probe()`
3. Broadcast event received after probe failure
4. Because address was cleared, device appears "new"
5. Probe re-executed → goto step 1

**Result**: Faulty disks get stuck in indefinite probe loops.

### 2. Stable Kernel Rules Assessment
| Criterion | Assessment |
|-----------|------------|
| Fixes real bug | ✅ YES - infinite loop (severe) |
| Obviously correct | ✅ YES - pure revert, restores known-working
behavior |
| Small and contained | ✅ YES - removes ~14 lines from one function in
one file |
| No new features | ✅ YES - removes code, adds nothing |
| Tested | ✅ YES - Reviewed-by from Jason Yan and John Garry |

### 3. Affected Stable Trees
- Original problematic commit first appeared in **v6.10-rc7**
- Present in: v6.10.x, v6.11.x, v6.12.x, v6.13.x, v6.14.x, v6.15.x,
  v6.16.x stable trees
- **NOT** in v6.1.x or v6.6.x LTS trees (they don't need this fix)

### 4. Risk Assessment
**Very Low Risk**:
- Pure revert restoring pre-v6.10 behavior that worked for years
- No complex logic changes - simply removes the problematic code
- Isolated to `sas_fail_probe()` function in `sas_internal.h`
- Same author (Xingui Yang) who wrote the original is now reverting it
- Multiple reviewers approved (John Garry from Oracle is a SCSI
  maintainer)

### 5. User Impact
- **High impact** for users with SAS expanders and faulty/failing SATA
  drives
- Enterprise storage systems commonly use SAS expanders
- Infinite loops during device probe can render systems unusable
- The trade-off (accepting "flutter" detection vs. infinite loops)
  clearly favors the revert

### 6. Dependencies
None - this is a self-contained revert that requires no other commits.

## Conclusion

This is a textbook case for stable backporting:
- Fixes a **severe regression** (infinite loop)
- **Pure revert** with minimal risk
- **Well-reviewed** by subsystem experts
- Clearly affects real users with SAS/SATA storage configurations
- Original author acknowledges the bug is worse than the original issue
  it tried to fix

The commit should be backported to all stable trees v6.10.x and newer
that contain the original problematic commit.

**YES**

 drivers/scsi/libsas/sas_internal.h | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 6706f2be8d274..da5408c701cdd 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -145,20 +145,6 @@ static inline void sas_fail_probe(struct domain_device *dev, const char *func, i
 		func, dev->parent ? "exp-attached" :
 		"direct-attached",
 		SAS_ADDR(dev->sas_addr), err);
-
-	/*
-	 * If the device probe failed, the expander phy attached address
-	 * needs to be reset so that the phy will not be treated as flutter
-	 * in the next revalidation
-	 */
-	if (dev->parent && !dev_is_expander(dev->dev_type)) {
-		struct sas_phy *phy = dev->phy;
-		struct domain_device *parent = dev->parent;
-		struct ex_phy *ex_phy = &parent->ex_dev.ex_phy[phy->number];
-
-		memset(ex_phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
-	}
-
 	sas_unregister_dev(dev->port, dev);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18-6.1] scsi: ufs: core: Fix EH failure after W-LUN resume error
  2025-12-15  8:55 [PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Sasha Levin
@ 2025-12-15  8:55 ` Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-12-15  8:55 UTC (permalink / raw)
  To: patches, stable
  Cc: Brian Kao, Bart Van Assche, Martin K. Petersen, Sasha Levin,
	peter.wang, avri.altman, beanhuo, adrian.hunter, quic_nguyenb,
	linux-scsi

From: Brian Kao <powenkao@google.com>

[ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ]

When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active.  This results in
the following errors:

  google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
  ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
  ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
  ...
  ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active

Address this by:

 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
    attempting to set the W-LUN to active within
    ufshcd_recover_pm_error().

 2. Explicitly checking for power.runtime_error on the HBA and W-LUN
    devices before calling pm_runtime_set_active() to clear the error
    state.

 3. Adding pm_runtime_get_sync(hba->dev) in
    ufshcd_err_handling_prepare() to ensure the HBA itself is active
    during error recovery, even if a child device resume failed.

These changes ensure the device power states are managed correctly
during error recovery.

Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis Summary

### Problem Description
This commit fixes a bug where UFS error recovery fails after a W-LUN
(Well Known Logical Unit Number) resume error. When a W-LUN resume
fails:

1. The parent devices in the SCSI hierarchy (including `scsi_target`)
   remain runtime suspended
2. The error handler's `ufshcd_recover_pm_error()` attempts to set the
   W-LUN device active
3. The PM core rejects this because the parent target is not active,
   producing the error:
  ```
  runtime PM trying to activate child device 0:0:0:49488 but parent
  (target0:0:0) is not active
  ```

### Technical Analysis of the Fix

**Original (buggy) code:**
```c
ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
if (ret)
    ret = pm_runtime_set_active(hba->dev);
```

This incorrectly tries to activate a child device while its parent is
suspended - violating the PM hierarchy requirement.

**The fix does three things:**
1. **Adds `pm_runtime_get_sync(hba->dev)` in
   `ufshcd_err_handling_prepare()`** - Ensures HBA is active during
   error recovery even if a child device resume failed
2. **Fixes PM hierarchy in `ufshcd_recover_pm_error()`** - Explicitly
   resumes the parent `scsi_target` before trying to activate the child
   W-LUN
3. **Checks `power.runtime_error`** - Only clears error state on devices
   that actually have an error, rather than relying on return values

### Stable Tree Criteria Assessment

| Criterion | Assessment |
|-----------|------------|
| Fixes real bug | ✅ YES - Error handling completely fails without this
|
| Obviously correct | ✅ YES - Follows standard PM parent-before-child
rules |
| Small and contained | ✅ YES - ~35 lines in one file, error handling
path only |
| No new features | ✅ YES - Pure bugfix |
| Tested | ✅ YES - Has Tested-by and Reviewed-by: Bart Van Assche |
| Exists in stable | ✅ YES - `ufshcd_recover_pm_error()` introduced in
v5.10 |
| Dependencies | ✅ NONE - Uses standard, long-existing PM APIs |

### User Impact
- **Affected systems**: Android devices, embedded systems with UFS
  storage
- **Severity**: Error handler failure can lead to I/O failures and
  potential data loss
- **Frequency**: Occurs when W-LUN resume fails and error handler tries
  to recover

### Risk Assessment
- **Low regression risk**: Changes are in error handling code path only
- **Well-understood fix**: Standard PM hierarchy handling pattern
- **Good review**: Bart Van Assche (experienced SCSI/block maintainer)
  reviewed it

### Conclusion

This is a clear-cut bug fix for UFS error handling. The bug causes the
error handler to fail completely when a W-LUN resume error occurs, which
can leave the storage subsystem in an unrecoverable state. The fix
correctly implements PM hierarchy requirements (parent must be active
before child), is well-tested, has expert review, and uses standard APIs
that exist in all stable trees since v5.10. The changes are contained to
the error handling path with minimal regression risk.

**YES**

 drivers/ufs/core/ufshcd.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index d6a060a724618..ce52c3bafbe8f 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6498,6 +6498,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
 
 static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 {
+	/*
+	 * A WLUN resume failure could potentially lead to the HBA being
+	 * runtime suspended, so take an extra reference on hba->dev.
+	 */
+	pm_runtime_get_sync(hba->dev);
 	ufshcd_rpm_get_sync(hba);
 	if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
 	    hba->is_sys_suspended) {
@@ -6537,6 +6542,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
 	ufshcd_rpm_put(hba);
+	pm_runtime_put(hba->dev);
 }
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6551,28 +6557,42 @@ static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
 #ifdef CONFIG_PM
 static void ufshcd_recover_pm_error(struct ufs_hba *hba)
 {
+	struct scsi_target *starget = hba->ufs_device_wlun->sdev_target;
 	struct Scsi_Host *shost = hba->host;
 	struct scsi_device *sdev;
 	struct request_queue *q;
-	int ret;
+	bool resume_sdev_queues = false;
 
 	hba->is_sys_suspended = false;
+
 	/*
-	 * Set RPM status of wlun device to RPM_ACTIVE,
-	 * this also clears its runtime error.
+	 * Ensure the parent's error status is cleared before proceeding
+	 * to the child, as the parent must be active to activate the child.
 	 */
-	ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+	if (hba->dev->power.runtime_error) {
+		/* hba->dev has no functional parent thus simplily set RPM_ACTIVE */
+		pm_runtime_set_active(hba->dev);
+		resume_sdev_queues = true;
+	}
+
+	if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) {
+		/*
+		 * starget, parent of wlun, might be suspended if wlun resume failed.
+		 * Make sure parent is resumed before set child (wlun) active.
+		 */
+		pm_runtime_get_sync(&starget->dev);
+		pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+		pm_runtime_put_sync(&starget->dev);
+		resume_sdev_queues = true;
+	}
 
-	/* hba device might have a runtime error otherwise */
-	if (ret)
-		ret = pm_runtime_set_active(hba->dev);
 	/*
 	 * If wlun device had runtime error, we also need to resume those
 	 * consumer scsi devices in case any of them has failed to be
 	 * resumed due to supplier runtime resume failure. This is to unblock
 	 * blk_queue_enter in case there are bios waiting inside it.
 	 */
-	if (!ret) {
+	if (resume_sdev_queues) {
 		shost_for_each_device(sdev, shost) {
 			q = sdev->request_queue;
 			if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.18-5.10] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  2025-12-15  8:55 [PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Sasha Levin
  2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-6.1] scsi: ufs: core: Fix EH failure after W-LUN resume error Sasha Levin
@ 2025-12-15  8:55 ` Sasha Levin
  2 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-12-15  8:55 UTC (permalink / raw)
  To: patches, stable
  Cc: Wen Xiong, Kyle Mahlkuch, Martin K. Petersen, Sasha Levin,
	James.Bottomley, linux-scsi

From: Wen Xiong <wenxiong@linux.ibm.com>

[ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ]

A dynamic remove/add storage adapter test hits EEH on PowerPC:

  EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
  EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
  EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
  EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
  EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
  EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
  EEH: [c00000000017e558] process_one_work+0x298/0x590
  EEH: [c00000000017eef8] worker_thread+0xa8/0x620
  EEH: [c00000000018be34] kthread+0x124/0x130
  EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64

A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by
irqbalance daemon. If we disable irqbalance daemon, we won't see the
issue.

With debug enabled in ipr driver:

  [   44.103071] ipr: Entering __ipr_remove
  [   44.103083] ipr: Entering ipr_initiate_ioa_bringdown
  [   44.103091] ipr: Entering ipr_reset_shutdown_ioa
  [   44.103099] ipr: Leaving ipr_reset_shutdown_ioa
  [   44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
  [   44.149918] ipr: Entering ipr_reset_ucode_download
  [   44.149935] ipr: Entering ipr_reset_alert
  [   44.150032] ipr: Entering ipr_reset_start_timer
  [   44.150038] ipr: Leaving ipr_reset_alert
  [   44.244343] scsi 1:2:3:0: alua: Detached
  [   44.254300] ipr: Entering ipr_reset_start_bist
  [   44.254320] ipr: Entering ipr_reset_start_timer
  [   44.254325] ipr: Leaving ipr_reset_start_bist
  [   44.364329] scsi 1:2:4:0: alua: Detached
  [   45.134341] scsi 1:2:5:0: alua: Detached
  [   45.860949] ipr: Entering ipr_reset_shutdown_ioa
  [   45.860962] ipr: Leaving ipr_reset_shutdown_ioa
  [   45.860966] ipr: Entering ipr_reset_alert
  [   45.861028] ipr: Entering ipr_reset_start_timer
  [   45.861035] ipr: Leaving ipr_reset_alert
  [   45.964302] ipr: Entering ipr_reset_start_bist
  [   45.964309] ipr: Entering ipr_reset_start_timer
  [   45.964313] ipr: Leaving ipr_reset_start_bist
  [   46.264301] ipr: Entering ipr_reset_bist_done
  [   46.264309] ipr: Leaving ipr_reset_bist_done

During adapter reset, ipr device driver blocks config space access but
can't block MMIO access for MSI-X entries.  There is very small window:
irqbalance daemon kicks in during adapter reset before ipr driver calls
pci_restore_state(pdev) to restore MSI-X table.

irqbalance daemon reads back all 0 for that MSI-X vector in
__pci_read_msi_msg().

irqbalance daemon:

  msi_domain_set_affinity()
  ->irq_chip_set_affinity_patent()
  ->xive_irq_set_affinity()
  ->irq_chip_compose_msi_msg()
    ->pseries_msi_compose_msg()
    ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state
  ->irq_chip_write_msi_msg()
    -> pci_write_msg_msi(): write 0 to the msix vector entry

When ipr driver calls pci_restore_state(pdev) in
ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared
by irqbalance daemon in pci_write_msg_msix().

  pci_restore_state()
  ->__pci_restore_msix_state()

Below is the MSI-X table for ipr adapter after irqbalance daemon kicked
in during adapter reset:

  Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
  Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
  Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
  Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
  Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
  Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
  Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
  Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
  Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
  ---------> Hit EEH since msix vector of index=8 are 0
  Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
  Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
  Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0

  [   46.264312] ipr: Entering ipr_reset_restore_cfg_space
  [   46.267439] ipr: Entering ipr_fail_all_ops
  [   46.267447] ipr: Leaving ipr_fail_all_ops
  [   46.267451] ipr: Leaving ipr_reset_restore_cfg_space
  [   46.267454] ipr: Entering ipr_ioa_bringdown_done
  [   46.267458] ipr: Leaving ipr_ioa_bringdown_done
  [   46.267467] ipr: Entering ipr_worker_thread
  [   46.267470] ipr: Leaving ipr_worker_thread

IRQ balancing is not required during adapter reset.

Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable
it after calling pci_restore_state(). The irqbalance daemon is disabled
for this short period of time (~2s).

Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The APIs are well-established with 124 callers for
`irq_set_status_flags` - this approach is standard practice in the
kernel.

---

## Summary

**Why this should be backported:**

1. **Fixes a real, serious bug**: EEH events on PowerPC are significant
   hardware errors that can cause system instability and require
   recovery
2. **Clear root cause and fix**: Race condition between irqbalance
   daemon and adapter reset is clearly identified and surgically fixed
3. **Small, self-contained change**: ~30 lines added to a single driver
   file
4. **Uses standard kernel APIs**: `irq_set_status_flags()` and
   `irq_clear_status_flags()` are used by 124+ other callers - well-
   tested mechanism
5. **Low regression risk**: Changes only affect the ipr driver during
   adapter reset, no impact on other subsystems
6. **Critical for affected users**: Systems running IBM Power with ipr
   storage adapters and irqbalance enabled will hit this during hot-plug
   operations
7. **No dependencies**: All required APIs have been in the kernel for
   many years

**Risk Assessment**: LOW
- The fix is narrowly scoped to the reset path
- Uses battle-tested IRQ infrastructure
- Only affects ipr driver users during specific operations

**Benefit**: HIGH for affected users - prevents EEH crashes during
adapter reset operations

This commit meets all stable kernel criteria: it fixes a real bug, is
obviously correct, is small and contained, and introduces no new
features. The mechanism used (temporarily disabling IRQ balancing during
critical operations) is a standard pattern used throughout the kernel.

**YES**

 drivers/scsi/ipr.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 44214884deaf5..d62bb7d0e4164 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -61,8 +61,8 @@
 #include <linux/hdreg.h>
 #include <linux/reboot.h>
 #include <linux/stringify.h>
+#include <linux/irq.h>
 #include <asm/io.h>
-#include <asm/irq.h>
 #include <asm/processor.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_host.h>
@@ -7843,6 +7843,30 @@ static int ipr_dump_mailbox_wait(struct ipr_cmnd *ipr_cmd)
 	return IPR_RC_JOB_RETURN;
 }
 
+/**
+ * ipr_set_affinity_nobalance
+ * @ioa_cfg:	ipr_ioa_cfg struct for an ipr device
+ * @flag:	bool
+ *	true: ensable "IRQ_NO_BALANCING" bit for msix interrupt
+ *	false: disable "IRQ_NO_BALANCING" bit for msix interrupt
+ * Description: This function will be called to disable/enable
+ *	"IRQ_NO_BALANCING" to avoid irqbalance daemon
+ *	kicking in during adapter reset.
+ **/
+static void ipr_set_affinity_nobalance(struct ipr_ioa_cfg *ioa_cfg, bool flag)
+{
+	int irq, i;
+
+	for (i = 0; i < ioa_cfg->nvectors; i++) {
+		irq = pci_irq_vector(ioa_cfg->pdev, i);
+
+		if (flag)
+			irq_set_status_flags(irq, IRQ_NO_BALANCING);
+		else
+			irq_clear_status_flags(irq, IRQ_NO_BALANCING);
+	}
+}
+
 /**
  * ipr_reset_restore_cfg_space - Restore PCI config space.
  * @ipr_cmd:	ipr command struct
@@ -7867,6 +7891,7 @@ static int ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd)
 		return IPR_RC_JOB_CONTINUE;
 	}
 
+	ipr_set_affinity_nobalance(ioa_cfg, false);
 	ipr_fail_all_ops(ioa_cfg);
 
 	if (ioa_cfg->sis64) {
@@ -7946,6 +7971,7 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
 		rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);
 
 	if (rc == PCIBIOS_SUCCESSFUL) {
+		ipr_set_affinity_nobalance(ioa_cfg, true);
 		ipr_cmd->job_step = ipr_reset_bist_done;
 		ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT);
 		rc = IPR_RC_JOB_RETURN;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-12-15  8:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-15  8:55 [PATCH AUTOSEL 6.18] scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 Sasha Levin
2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Sasha Levin
2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-6.1] scsi: ufs: core: Fix EH failure after W-LUN resume error Sasha Levin
2025-12-15  8:55 ` [PATCH AUTOSEL 6.18-5.10] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox