* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH [not found] ` <90777da90abe02c87d30968bfedc9168@linux.ibm.com> @ 2025-03-20 8:23 ` Thomas Gleixner 2025-03-20 8:48 ` Thomas Gleixner 0 siblings, 1 reply; 6+ messages in thread From: Thomas Gleixner @ 2025-03-20 8:23 UTC (permalink / raw) To: Wen Xiong; +Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi On Wed, Mar 19 2025 at 21:58, Wen Xiong wrote: >> The real problem has nothing to do with a remove/add operation. The >> problem is solely in the probe function. > > I don't think we have problems in probe function since this driver has > been in productions for many many years. Seriously? It does not matter at all whether you had it many years in production or not. Fact is that the driver is operational and after that a device reset happens, which wipes the config space. That _IS_ the problem. > Also we didn't see the issue before the "MSI domain" patchset dropping > into linux interrupt code(no issue in rhel92 release). That's completely irrelevant. See above. > Device reset is not called in probe function. Right. The reset is part of PCI error handling, which happens _AFTER_ the driver has set up interrupts. > We don't see the issue without dynamically remove/add operation. > There is a small window which irqbalance daemon kicks in during device > reset. So it took about over 6 hours to recreate the issue when doing > remove/add loop operation. Sure. You need a loop to hit the window. And it does not matter whether it's the probe or the remove which triggers it. Fact is that the reset wipes out the config space, which means that any read from the config space between reset and restore will return garbage. That problem is not restricted to the interrupt code. It's a general problem. > We can't find the good way to fix the issue in both of device drivers. > So we look for some help in interrupt code. No. This is _NOT_ a interrupt specific problem. You are observing the symptom related to interrupts, but any other code which reads from config space during the reset window has exactly the same problem. The PCI error handling resets the device asynchronously to any other operation which might access the config space. Yes, set_affinity() is one possible way to hit that due to the implementation detail of pseries_msi_compose_msg(), which reads the MSI message composed by the underlying hypervisor back from config space. But even if it would not read back and compose the message itself then set_affinity() would create inconsistent state because: reset() compose() write() restore() I.e. the reset machinery overwrites the new message, which means this ends up with inconsistent state. So this is a general problem with PCI error handling and _not_ a problem of the interrupt subsystem. I have no idea what to do about that, but this needs to be looked at from the PCI error handling side and not papered over at the messenger. Thanks, tglx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH 2025-03-20 8:23 ` [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH Thomas Gleixner @ 2025-03-20 8:48 ` Thomas Gleixner 2025-03-27 21:36 ` Wen Xiong 0 siblings, 1 reply; 6+ messages in thread From: Thomas Gleixner @ 2025-03-20 8:48 UTC (permalink / raw) To: Wen Xiong; +Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi On Thu, Mar 20 2025 at 09:23, Thomas Gleixner wrote: > On Wed, Mar 19 2025 at 21:58, Wen Xiong wrote: >> We don't see the issue without dynamically remove/add operation. >> There is a small window which irqbalance daemon kicks in during device >> reset. So it took about over 6 hours to recreate the issue when doing >> remove/add loop operation. > > Sure. You need a loop to hit the window. And it does not matter whether > it's the probe or the remove which triggers it. Fact is that the reset > wipes out the config space, which means that any read from the config > space between reset and restore will return garbage. That problem is not > restricted to the interrupt code. It's a general problem. After looking at the code again, it's a problem in the remove() function: __ipr_remove() ipr_initiate_ioa_bringdown() // resets device restore_config_space() .... ipr_free_all_resources() free_irqs() So yes, it's not probe(). But the question is pretty much the same. Why is a reset issued while the driver is fully operational and resources are still in use? Don't even think about telling me that this is a problem of the MSI interrupt rework. It is not. It's been broken forever. You _cannot_ pull the rung under a fully operational driver and expect that all involved parts will just magically handle this gracefully. What about tearing down resources first and then issuing the reset? Thanks, tglx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH 2025-03-20 8:48 ` Thomas Gleixner @ 2025-03-27 21:36 ` Wen Xiong 2025-03-28 11:27 ` Thomas Gleixner 0 siblings, 1 reply; 6+ messages in thread From: Wen Xiong @ 2025-03-27 21:36 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi > What about tearing down resources first and then issuing the reset? This SAS adapter supports dual controller configuration. Normally we have two adapters in a system. We config one of them as Primary adapter and another one as Secondary adapter. When doing remove operation on primary adapter, the Secondary adapter is going to be failover and config as primary by adapter firmware. During failover process, adapter firmware requests the secondary adapter reset, then sets it as primary adapter. Secondary adapter failover triggers adapter reset(ipr_reset_get_unit_check_job()). [ 940.742698] ipr 0206:a0:00.0: 9070: IOA requested reset -> FW requested [ 940.742733] ipr 0206:a0:00.0: Adapter to Adapter Link Failed Due to SAS Fabric Change [PRC: 17101C25] [ 940.742768] ipr 0206:a0:00.0: Remote IOA VPID/SN: IBM 57B4001SISIOA 00458021 When secondary adapter doing a reset, we use the same code path as removing operation. We can’t free irqs for Secondary adapter since kernel has assigned the irqs for Secondary adapter. Actually we discussed about "calling pci_free_irq_vectors()" before doing bist reset when we trying to fix in device driver. That might cause other problems. It is also not what a user would expect. For example, if they disabled irq balance and manually setup irq binding and affinity, if we go and free and reallocate the interrupts across a reset, this would wipe out those changes, which would not be expected. Thanks, Wendy ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH 2025-03-27 21:36 ` Wen Xiong @ 2025-03-28 11:27 ` Thomas Gleixner 2025-04-01 20:14 ` Wen Xiong 0 siblings, 1 reply; 6+ messages in thread From: Thomas Gleixner @ 2025-03-28 11:27 UTC (permalink / raw) To: Wen Xiong; +Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi On Thu, Mar 27 2025 at 16:36, Wen Xiong wrote: > When secondary adapter doing a reset, we use the same code path as > removing operation. We can’t free irqs for Secondary adapter since > kernel has assigned the irqs for Secondary adapter. > > Actually we discussed about "calling pci_free_irq_vectors()" before > doing bist reset when we trying to fix in device driver. > That might cause other problems. It is also not what a user would > expect. For example, if they disabled irq balance and manually setup irq > binding and affinity, if we go and free and reallocate the interrupts > across a reset, this would wipe out those changes, which would not be > expected. You are completely missing the point. This is not a problem restricted to PCI/MSI interrupts. You have interrupts, work, queues and whatever in fully operational state. Then you tell the firmware to reset the adapter and swap the secondary adapter in. This reset operation brings the hardware temporary into an undefined state as you have observed. How is any part of the kernel, which can access and operate on this adapter, supposed to deal with that correctly? You are now focussing solely on papering over the symptom in PCI/MSI because that's where you actually observed the fallout. But as you know curing the symptom is not fixing anything because the underlying root cause still persists and will roar its ugly head at some other place sooner than later. So no, instead of sprinkling horrible band-aids all over the place, you have to sit down and think about a properly orchestrated mechanism for this failover scenario. I understand that you want to preserve the state of the secondary adapter including interrupt numbers and related settings, but for that you have to properly freeze _all_ related kernel resources first, then let the firmware do the swap and once that's complete unfreeze everything again. That will need support in other subsystems, like the interrupt layer, but that needs to be properly designed and integrated so that it works with all possible PCI/MSI backend implementations and under all circumstances. Setting affinity is not the only way to make this go wrong, there are other operations which access the underlying hardware and config space. You just haven't triggered them yet, but they exist. And it's not a problem restricted to the pSeries MSI backend implementation either. That's just one way to expose it. Even if I put the secondary swap problem aside and just look at a single instance, __ipr_remove() is already broken in itself. As I pointed out to you before: __ipr_remove() ipr_initiate_ioa_bringdown() // resets device -> undefined state restore_config_space() .... ipr_free_all_resources() free_irqs() Up to the point where interrupts are freed, they are fully operational and exposed to user space. The related functionality can hit the undefined state not only via set_affinity() independent of the swap problem. The adapter swap is just a worse variant of the above because it affects the secondary side behind its back. This needs quite some work to resolve properly, but that's the only way to fix it once and forever. Proliferating the 'Plug and Pray' approach with a lot of handwaving is just helping you to close your ticket, but it's not a solution. Thanks, tglx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH 2025-03-28 11:27 ` Thomas Gleixner @ 2025-04-01 20:14 ` Wen Xiong 2025-04-02 8:33 ` Thomas Gleixner 0 siblings, 1 reply; 6+ messages in thread From: Wen Xiong @ 2025-04-01 20:14 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi On 2025-03-28 06:27, Thomas Gleixner wrote: > You are completely missing the point. This is not a problem restricted > to PCI/MSI interrupts. > Thanks for all suggestions! I will investigate more and check if we can fix it in device driver. Looks several device drivers in kernel use two kernel APIs to set/clean IRQ_NO_BALACING status for a given irq. irq_set_status_flags(irq, IRQ_NO_BALANCING); irq_clear_status_flags(irq, IRQ_NO_BALANCING); With these two APIs, device driver can decide if using kernel irq balance or setting irq affinity by driver itself? So we can set/clear IRQ_NO_BALANCING during reset. Thanks, Wendy ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH 2025-04-01 20:14 ` Wen Xiong @ 2025-04-02 8:33 ` Thomas Gleixner 0 siblings, 0 replies; 6+ messages in thread From: Thomas Gleixner @ 2025-04-02 8:33 UTC (permalink / raw) To: Wen Xiong; +Cc: linux-kernel, gjoyce, linux-pci, Bjorn Helgaas, linux-scsi On Tue, Apr 01 2025 at 15:14, Wen Xiong wrote: > On 2025-03-28 06:27, Thomas Gleixner wrote: > >> You are completely missing the point. This is not a problem restricted >> to PCI/MSI interrupts. >> > Thanks for all suggestions! I will investigate more and check if we can > fix it in device driver. > > Looks several device drivers in kernel use two kernel APIs to set/clean > IRQ_NO_BALACING status for a given irq. > > irq_set_status_flags(irq, IRQ_NO_BALANCING); > irq_clear_status_flags(irq, IRQ_NO_BALANCING); > > With these two APIs, device driver can decide if using kernel irq > balance or setting irq affinity by driver itself? > > So we can set/clear IRQ_NO_BALANCING during reset. That's not sufficient. You are papering over the symptom. I explained you that affinity setting is not the only way to wreckage this. But feel free to ignore me and apply your bandaid fix as you see fit. Thanks, tglx ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-04-02 8:33 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1742386474-13717-1-git-send-email-wenxiong@linux.ibm.com>
[not found] ` <87a59h3sk9.ffs@tglx>
[not found] ` <90777da90abe02c87d30968bfedc9168@linux.ibm.com>
2025-03-20 8:23 ` [PATCH 1/1] genirq/msi: Dynamic remove/add stroage adapter hits EEH Thomas Gleixner
2025-03-20 8:48 ` Thomas Gleixner
2025-03-27 21:36 ` Wen Xiong
2025-03-28 11:27 ` Thomas Gleixner
2025-04-01 20:14 ` Wen Xiong
2025-04-02 8:33 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).