From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>,
Wen Xiong <wenxiong@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 27/72] scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
Date: Thu, 15 Jan 2026 17:48:37 +0100 [thread overview]
Message-ID: <20260115164144.479599502@linuxfoundation.org> (raw)
In-Reply-To: <20260115164143.482647486@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wen Xiong <wenxiong@linux.ibm.com>
[ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ]
A dynamic remove/add storage adapter test hits EEH on PowerPC:
EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
EEH: [c00000000017e558] process_one_work+0x298/0x590
EEH: [c00000000017eef8] worker_thread+0xa8/0x620
EEH: [c00000000018be34] kthread+0x124/0x130
EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by
irqbalance daemon. If we disable irqbalance daemon, we won't see the
issue.
With debug enabled in ipr driver:
[ 44.103071] ipr: Entering __ipr_remove
[ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown
[ 44.103091] ipr: Entering ipr_reset_shutdown_ioa
[ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa
[ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
[ 44.149918] ipr: Entering ipr_reset_ucode_download
[ 44.149935] ipr: Entering ipr_reset_alert
[ 44.150032] ipr: Entering ipr_reset_start_timer
[ 44.150038] ipr: Leaving ipr_reset_alert
[ 44.244343] scsi 1:2:3:0: alua: Detached
[ 44.254300] ipr: Entering ipr_reset_start_bist
[ 44.254320] ipr: Entering ipr_reset_start_timer
[ 44.254325] ipr: Leaving ipr_reset_start_bist
[ 44.364329] scsi 1:2:4:0: alua: Detached
[ 45.134341] scsi 1:2:5:0: alua: Detached
[ 45.860949] ipr: Entering ipr_reset_shutdown_ioa
[ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa
[ 45.860966] ipr: Entering ipr_reset_alert
[ 45.861028] ipr: Entering ipr_reset_start_timer
[ 45.861035] ipr: Leaving ipr_reset_alert
[ 45.964302] ipr: Entering ipr_reset_start_bist
[ 45.964309] ipr: Entering ipr_reset_start_timer
[ 45.964313] ipr: Leaving ipr_reset_start_bist
[ 46.264301] ipr: Entering ipr_reset_bist_done
[ 46.264309] ipr: Leaving ipr_reset_bist_done
During adapter reset, ipr device driver blocks config space access but
can't block MMIO access for MSI-X entries. There is very small window:
irqbalance daemon kicks in during adapter reset before ipr driver calls
pci_restore_state(pdev) to restore MSI-X table.
irqbalance daemon reads back all 0 for that MSI-X vector in
__pci_read_msi_msg().
irqbalance daemon:
msi_domain_set_affinity()
->irq_chip_set_affinity_patent()
->xive_irq_set_affinity()
->irq_chip_compose_msi_msg()
->pseries_msi_compose_msg()
->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state
->irq_chip_write_msi_msg()
-> pci_write_msg_msi(): write 0 to the msix vector entry
When ipr driver calls pci_restore_state(pdev) in
ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared
by irqbalance daemon in pci_write_msg_msix().
pci_restore_state()
->__pci_restore_msix_state()
Below is the MSI-X table for ipr adapter after irqbalance daemon kicked
in during adapter reset:
Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
---------> Hit EEH since msix vector of index=8 are 0
Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0
[ 46.264312] ipr: Entering ipr_reset_restore_cfg_space
[ 46.267439] ipr: Entering ipr_fail_all_ops
[ 46.267447] ipr: Leaving ipr_fail_all_ops
[ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space
[ 46.267454] ipr: Entering ipr_ioa_bringdown_done
[ 46.267458] ipr: Leaving ipr_ioa_bringdown_done
[ 46.267467] ipr: Entering ipr_worker_thread
[ 46.267470] ipr: Leaving ipr_worker_thread
IRQ balancing is not required during adapter reset.
Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable
it after calling pci_restore_state(). The irqbalance daemon is disabled
for this short period of time (~2s).
Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/scsi/ipr.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 8c062afb2918d..89b8990208506 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -62,8 +62,8 @@
#include <linux/hdreg.h>
#include <linux/reboot.h>
#include <linux/stringify.h>
+#include <linux/irq.h>
#include <asm/io.h>
-#include <asm/irq.h>
#include <asm/processor.h>
#include <scsi/scsi.h>
#include <scsi/scsi_host.h>
@@ -8669,6 +8669,30 @@ static int ipr_dump_mailbox_wait(struct ipr_cmnd *ipr_cmd)
return IPR_RC_JOB_RETURN;
}
+/**
+ * ipr_set_affinity_nobalance
+ * @ioa_cfg: ipr_ioa_cfg struct for an ipr device
+ * @flag: bool
+ * true: ensable "IRQ_NO_BALANCING" bit for msix interrupt
+ * false: disable "IRQ_NO_BALANCING" bit for msix interrupt
+ * Description: This function will be called to disable/enable
+ * "IRQ_NO_BALANCING" to avoid irqbalance daemon
+ * kicking in during adapter reset.
+ **/
+static void ipr_set_affinity_nobalance(struct ipr_ioa_cfg *ioa_cfg, bool flag)
+{
+ int irq, i;
+
+ for (i = 0; i < ioa_cfg->nvectors; i++) {
+ irq = pci_irq_vector(ioa_cfg->pdev, i);
+
+ if (flag)
+ irq_set_status_flags(irq, IRQ_NO_BALANCING);
+ else
+ irq_clear_status_flags(irq, IRQ_NO_BALANCING);
+ }
+}
+
/**
* ipr_reset_restore_cfg_space - Restore PCI config space.
* @ipr_cmd: ipr command struct
@@ -8693,6 +8717,7 @@ static int ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd)
return IPR_RC_JOB_CONTINUE;
}
+ ipr_set_affinity_nobalance(ioa_cfg, false);
ipr_fail_all_ops(ioa_cfg);
if (ioa_cfg->sis64) {
@@ -8772,6 +8797,7 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);
if (rc == PCIBIOS_SUCCESSFUL) {
+ ipr_set_affinity_nobalance(ioa_cfg, true);
ipr_cmd->job_step = ipr_reset_bist_done;
ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT);
rc = IPR_RC_JOB_RETURN;
--
2.51.0
next prev parent reply other threads:[~2026-01-15 17:11 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-15 16:48 [PATCH 6.1 00/72] 6.1.161-rc1 review Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 01/72] atm: Fix dma_free_coherent() size Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 02/72] net: 3com: 3c59x: fix possible null dereference in vortex_probe1() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 03/72] btrfs: always detect conflicting inodes when logging inode refs Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 04/72] mei: me: add nova lake point S DID Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 05/72] lib/crypto: aes: Fix missing MMU protection for AES S-box Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 06/72] counter: interrupt-cnt: Drop IRQF_NO_THREAD flag Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 07/72] drm/pl111: Fix error handling in pl111_amba_probe Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 08/72] gpio: rockchip: mark the GPIO controller as sleeping Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 09/72] wifi: avoid kernel-infoleak from struct iw_point Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 10/72] libceph: prevent potential out-of-bounds reads in handle_auth_done() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 11/72] libceph: replace overzealous BUG_ON in osdmap_apply_incremental() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 12/72] libceph: make free_choose_arg_map() resilient to partial allocation Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 13/72] libceph: return the handler error from mon_handle_auth_done() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 14/72] libceph: make calc_target() set t->paused, not just clear it Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 15/72] ext4: introduce ITAIL helper Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 16/72] ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 17/72] net: Add locking to protect skb->dev access in ip_output Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 18/72] tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 19/72] csky: fix csky_cmpxchg_fixup not working Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 20/72] ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 21/72] alpha: dont reference obsolete termio struct for TC* constants Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 22/72] NFSv4: ensure the open stateid seqid doesnt go backwards Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 23/72] NFS: Fix up the automount fs_context to use the correct cred Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 24/72] smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 25/72] smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 26/72] smb/client: fix NT_STATUS_NO_DATA_DETECTED value Greg Kroah-Hartman
2026-01-15 16:48 ` Greg Kroah-Hartman [this message]
2026-01-15 16:48 ` [PATCH 6.1 28/72] scsi: ufs: core: Fix EH failure after W-LUN resume error Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 29/72] scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 30/72] arm64: dts: add off-on-delay-us for usdhc2 regulator Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 31/72] ARM: dts: imx6q-ba16: fix RTC interrupt level Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 32/72] arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 33/72] netfilter: nft_synproxy: avoid possible data-race on update operation Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 34/72] netfilter: nf_tables: fix memory leak in nf_tables_newrule() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 35/72] netfilter: nf_conncount: update last_gc only when GC has been performed Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 36/72] net: marvell: prestera: fix NULL dereference on devlink_alloc() failure Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 37/72] bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 38/72] net: mscc: ocelot: Fix crash when adding interface under a lag Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 39/72] inet: ping: Fix icmp out counting Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 40/72] net: sock: fix hardened usercopy panic in sock_recv_errqueue Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 41/72] netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 42/72] net/mlx5e: Dont print error message due to invalid module Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 43/72] net: wwan: iosm: Fix memory leak in ipc_mux_deinit() Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 44/72] eth: bnxt: move and rename reset helpers Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 45/72] bnxt_en: Fix potential data corruption with HW GRO/LRO Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 46/72] net: fix memory leak in skb_segment_list for GRO packets Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 47/72] HID: quirks: work around VID/PID conflict for appledisplay Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 48/72] net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset Greg Kroah-Hartman
2026-01-15 16:48 ` [PATCH 6.1 49/72] net: usb: pegasus: fix memory leak in update_eth_regs_async() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 50/72] net: enetc: fix build warning when PAGE_SIZE is greater than 128K Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 51/72] arp: do not assume dev_hard_header() does not change skb->head Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 52/72] pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 53/72] mm/pagewalk: add walk_page_range_vma() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 54/72] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 55/72] ALSA: ac97bus: Use guard() for mutex locks Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 56/72] ALSA: ac97: fix a double free in snd_ac97_controller_register() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 57/72] nfsd: provide locking for v4_end_grace Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 58/72] NFS: trace: show TIMEDOUT instead of 0x6e Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 59/72] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 60/72] NFSD: Remove NFSERR_EAGAIN Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 61/72] bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 62/72] bpf: Make variables in bpf_prog_test_run_xdp less confusing Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 63/72] bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 64/72] bpf, test_run: Subtract size of xdp_frame from allowed metadata size Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 65/72] bpf: Fix reference count leak in bpf_prog_test_run_xdp() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 66/72] powercap: fix race condition in register_control_type() Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 67/72] powercap: fix sscanf() error return value handling Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 68/72] can: j1939: make j1939_session_activate() fail if device is no longer registered Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 69/72] ASoC: amd: yc: Add quirk for Honor MagicBook X16 2025 Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 70/72] ASoC: fsl_sai: Add missing registers to cache default Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 71/72] scsi: sg: Fix occasional bogus elapsed time that exceeds timeout Greg Kroah-Hartman
2026-01-15 16:49 ` [PATCH 6.1 72/72] bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path Greg Kroah-Hartman
2026-01-15 19:15 ` [PATCH 6.1 00/72] 6.1.161-rc1 review Brett A C Sheffield
2026-01-15 19:36 ` Slade Watkins
2026-01-15 22:14 ` Florian Fainelli
2026-01-16 7:55 ` Francesco Dolcini
2026-01-16 9:14 ` Peter Schneider
2026-01-16 10:22 ` Ron Economos
2026-01-16 10:33 ` Jon Hunter
2026-01-16 15:39 ` Mark Brown
2026-01-16 17:44 ` Hardik Garg
2026-01-16 19:28 ` Shuah Khan
2026-01-17 14:35 ` Miguel Ojeda
2026-01-19 11:10 ` Jeffrin Thalakkottoor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260115164144.479599502@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=Kyle.Mahlkuch@ibm.com \
--cc=martin.petersen@oracle.com \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=wenxiong@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox