* [RFC PATCH] riscv: Fix PCI warning by enabling PCI_MSI_ARCH_FALLBACKS @ 2024-12-13 11:57 Alexandre Ghiti 2024-12-13 13:12 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Alexandre Ghiti @ 2024-12-13 11:57 UTC (permalink / raw) To: Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, Thomas Gleixner, linux-riscv, linux-kernel Cc: Alexandre Ghiti When the interrupt controller is not using the IMSIC and ACPI is enabled, the following warning appears: [ 0.866401] WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 [ 0.867071] Modules linked in: [ 0.867389] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2-00001-g795582ce7e24-dirty #44 [ 0.867538] Hardware name: QEMU QEMU Virtual Machine, BIOS [ 0.867672] epc : pci_msi_setup_msi_irqs+0x2c/0x32 [ 0.867738] ra : __pci_enable_msix_range+0x30c/0x596 [ 0.867783] epc : ffffffff8050af80 ra : ffffffff8050a66e sp : ff20000000023750 [ 0.867809] gp : ffffffff815153b0 tp : ff60000080108000 t0 : ff60000081109600 [ 0.867833] t1 : 0000000000000228 t2 : 0000000000000004 s0 : ff20000000023860 [ 0.867857] s1 : ff60000080de1000 a0 : ff60000080de1000 a1 : 0000000000000005 [ 0.867880] a2 : 0000000000000011 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.867902] a5 : 0000000000000000 a6 : ff600000806368f0 a7 : fffffffffffffff0 [ 0.867925] s2 : 0000000000000005 s3 : ffffffffffffffff s4 : 0000000000000000 [ 0.867948] s5 : ff60000080de10c0 s6 : 0000000000000005 s7 : 0000000000000005 [ 0.867970] s8 : ff20000000023a08 s9 : ff600000811093c0 s10: 000000000000002c [ 0.867993] s11: ff60000081109410 t3 : 0000000000000001 t4 : ff600000803a2878 [ 0.868014] t5 : 0000000000000004 t6 : ff60000080357450 [ 0.868036] status: 0000000200000120 badaddr: ffffffff8050af80 cause: 0000000000000003 [ 0.868186] [<ffffffff8050af80>] pci_msi_setup_msi_irqs+0x2c/0x32 [ 0.868339] [<ffffffff80509172>] pci_alloc_irq_vectors_affinity+0xb8/0xe2 [ 0.868362] [<ffffffff8059d62c>] vp_find_vqs_msix+0x12a/0x370 [ 
0.868385] [<ffffffff8059d8a0>] vp_find_vqs+0x2e/0x1de [ 0.868402] [<ffffffff8059bd80>] vp_modern_find_vqs+0x12/0x4e [ 0.868425] [<ffffffff80624a50>] init_vq+0x2b4/0x336 [ 0.868448] [<ffffffff80624c36>] virtblk_probe+0xd4/0x90e [ 0.868469] [<ffffffff80594e02>] virtio_dev_probe+0x14a/0x1e6 [ 0.868488] [<ffffffff805fe04c>] really_probe+0x86/0x234 [ 0.868509] [<ffffffff805fe256>] __driver_probe_device+0x5c/0xda [ 0.868529] [<ffffffff805fe392>] driver_probe_device+0x2c/0xb2 [ 0.868549] [<ffffffff805fe512>] __driver_attach+0x6c/0x11a [ 0.868569] [<ffffffff805fc17e>] bus_for_each_dev+0x60/0xae [ 0.868588] [<ffffffff805fda7c>] driver_attach+0x1a/0x22 [ 0.868607] [<ffffffff805fd398>] bus_add_driver+0xce/0x1d6 [ 0.868627] [<ffffffff805ff0b2>] driver_register+0x3e/0xd8 [ 0.868647] [<ffffffff80594614>] __register_virtio_driver+0x1e/0x2c [ 0.868694] [<ffffffff80a31b82>] virtio_blk_init+0x6a/0x9e [ 0.868733] [<ffffffff8000f128>] do_one_initcall+0x58/0x194 [ 0.868755] [<ffffffff80a011b0>] kernel_init_freeable+0x224/0x28e [ 0.868775] [<ffffffff809e4e48>] kernel_init+0x1e/0x13a [ 0.868795] [<ffffffff809ed952>] ret_from_fork+0xe/0x18 So enable PCI_MSI_ARCH_FALLBACKS to get rid of this. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> --- This is an RFC as I'm really not sure this is the right fix, Anup/Sunil/Thomas if you have any idea, please step in! Thanks arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index d4a7ca0388c0..40d51feac2bb 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -199,6 +199,7 @@ config RISCV select PCI_DOMAINS_GENERIC if PCI select PCI_ECAM if (ACPI && PCI) select PCI_MSI if PCI + select PCI_MSI_ARCH_FALLBACKS if PCI select RISCV_ALTERNATIVE if !XIP_KERNEL select RISCV_APLIC select RISCV_IMSIC -- 2.39.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] riscv: Fix PCI warning by enabling PCI_MSI_ARCH_FALLBACKS 2024-12-13 11:57 [RFC PATCH] riscv: Fix PCI warning by enabling PCI_MSI_ARCH_FALLBACKS Alexandre Ghiti @ 2024-12-13 13:12 ` Thomas Gleixner 2024-12-13 13:51 ` Alexandre Ghiti 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2024-12-13 13:12 UTC (permalink / raw) To: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel Cc: Alexandre Ghiti On Fri, Dec 13 2024 at 12:57, Alexandre Ghiti wrote: > When the interrupt controller is not using the IMSIC and ACPI is enabled, > the following warning appears: > > [ 0.866401] WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 > [ 0.867071] Modules linked in: > [ 0.867389] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2-00001-g795582ce7e24-dirty #44 > [ 0.867538] Hardware name: QEMU QEMU Virtual Machine, BIOS > [ 0.867672] epc : pci_msi_setup_msi_irqs+0x2c/0x32 > [ 0.867738] ra : __pci_enable_msix_range+0x30c/0x596 Removing a ton of badly formatted stack trace: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages > > So enable PCI_MSI_ARCH_FALLBACKS to get rid of this. No. PCI_MSI_ARCH_FALLBACKS is really only meant for architectures which implement the legacy fallbacks and not to paper over the underlying logic bug in the pci/msi code. Of course the loongson folks ran into the same problem two years ago and went for the sloppy fix without talking to anyone... Thanks for bringing it up instead of silently slapping it into the RISCV tree ! The uncompiled patch below should fix this for real. 
Thanks, tglx --- --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -185,7 +185,6 @@ config LOONGARCH select PCI_DOMAINS_GENERIC select PCI_ECAM if ACPI select PCI_LOONGSON - select PCI_MSI_ARCH_FALLBACKS select PCI_QUIRKS select PERF_USE_VMALLOC select RTC_LIB --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -350,8 +350,11 @@ bool pci_msi_domain_supports(struct pci_ domain = dev_get_msi_domain(&pdev->dev); - if (!domain || !irq_domain_is_hierarchy(domain)) - return mode == ALLOW_LEGACY; + if (!domain || !irq_domain_is_hierarchy(domain)) { + if (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS)) + return mode == ALLOW_LEGACY; + return false; + } if (!irq_domain_is_msi_parent(domain)) { /* --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -442,6 +442,10 @@ int __pci_enable_msi_range(struct pci_de if (nvec > maxvec) nvec = maxvec; + /* Test for the availability of MSI support */ + if (!pci_msi_domain_supports(dev, 0, ALLOW_LEGACY)) + return -ENOTSUPP; + rc = pci_setup_msi_context(dev); if (rc) return rc; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] riscv: Fix PCI warning by enabling PCI_MSI_ARCH_FALLBACKS 2024-12-13 13:12 ` Thomas Gleixner @ 2024-12-13 13:51 ` Alexandre Ghiti 2024-12-14 11:50 ` [Patch] PCI/MSI: Handle lack of irqdomain gracefully Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Alexandre Ghiti @ 2024-12-13 13:51 UTC (permalink / raw) To: Thomas Gleixner Cc: Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel Hi Thomas, On Fri, Dec 13, 2024 at 2:12 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Fri, Dec 13 2024 at 12:57, Alexandre Ghiti wrote: > > When the interrupt controller is not using the IMSIC and ACPI is enabled, > > the following warning appears: > > > > [ 0.866401] WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 > > [ 0.867071] Modules linked in: > > [ 0.867389] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2-00001-g795582ce7e24-dirty #44 > > [ 0.867538] Hardware name: QEMU QEMU Virtual Machine, BIOS > > [ 0.867672] epc : pci_msi_setup_msi_irqs+0x2c/0x32 > > [ 0.867738] ra : __pci_enable_msix_range+0x30c/0x596 > > Removing a ton of badly formatted stack trace: > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages Thanks for the pointer. > > > > > So enable PCI_MSI_ARCH_FALLBACKS to get rid of this. > > No. PCI_MSI_ARCH_FALLBACKS is really only meant for architectures which > implement the legacy fallbacks and not to paper over the underlying > logic bug in the pci/msi code. Of course the loongson folks ran into the > same problem two years ago and went for the sloppy fix without talking > to anyone... > > Thanks for bringing it up instead of silently slapping it into the RISCV > tree ! > > The uncompiled patch below should fix this for real. It does, when applied the warning disappears (on riscv at least). You can add: Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Thanks for your quick answer! 
Alex > > Thanks, > > tglx > --- > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -185,7 +185,6 @@ config LOONGARCH > select PCI_DOMAINS_GENERIC > select PCI_ECAM if ACPI > select PCI_LOONGSON > - select PCI_MSI_ARCH_FALLBACKS > select PCI_QUIRKS > select PERF_USE_VMALLOC > select RTC_LIB > --- a/drivers/pci/msi/irqdomain.c > +++ b/drivers/pci/msi/irqdomain.c > @@ -350,8 +350,11 @@ bool pci_msi_domain_supports(struct pci_ > > domain = dev_get_msi_domain(&pdev->dev); > > - if (!domain || !irq_domain_is_hierarchy(domain)) > - return mode == ALLOW_LEGACY; > + if (!domain || !irq_domain_is_hierarchy(domain)) { > + if (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS)) > + return mode == ALLOW_LEGACY; > + return false; > + } > > if (!irq_domain_is_msi_parent(domain)) { > /* > --- a/drivers/pci/msi/msi.c > +++ b/drivers/pci/msi/msi.c > @@ -442,6 +442,10 @@ int __pci_enable_msi_range(struct pci_de > if (nvec > maxvec) > nvec = maxvec; > > + /* Test for the availability of MSI support */ > + if (!pci_msi_domain_supports(dev, 0, ALLOW_LEGACY)) > + return -ENOTSUPP; > + > rc = pci_setup_msi_context(dev); > if (rc) > return rc; ^ permalink raw reply [flat|nested] 12+ messages in thread
* [Patch] PCI/MSI: Handle lack of irqdomain gracefully 2024-12-13 13:51 ` Alexandre Ghiti @ 2024-12-14 11:50 ` Thomas Gleixner 2024-12-17 13:08 ` [tip: irq/urgent] " tip-bot2 for Thomas Gleixner ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Thomas Gleixner @ 2024-12-14 11:50 UTC (permalink / raw) To: Alexandre Ghiti Cc: Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a RISCV platform which does not provide PCI/MSI support: WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 __pci_enable_msix_range+0x30c/0x596 pci_msi_setup_msi_irqs+0x2c/0x32 pci_alloc_irq_vectors_affinity+0xb8/0xe2 RISCV uses hierarchical interrupt domains and correctly does not implement the legacy fallback. The warning triggers from the legacy fallback stub. That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent domain is associated with the device or not. There is a check for MSI-X, which has a legacy assumption. But that legacy fallback assumption is only valid when legacy support is enabled, but otherwise the check should simply return -ENOTSUPP. Loongarch tripped over the same problem and blindly enabled legacy support without implementing the legacy fallbacks. There are weak implementations which return an error, so the problem was papered over. Correct pci_msi_domain_supports() to evaluate the legacy mode and add the missing supported check into the MSI enable path to complete it. 
Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early") Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org --- drivers/pci/msi/irqdomain.c | 7 +++++-- drivers/pci/msi/msi.c | 4 ++++ 2 files changed, 9 insertions(+), 2 deletions(-) --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -350,8 +350,11 @@ bool pci_msi_domain_supports(struct pci_ domain = dev_get_msi_domain(&pdev->dev); - if (!domain || !irq_domain_is_hierarchy(domain)) - return mode == ALLOW_LEGACY; + if (!domain || !irq_domain_is_hierarchy(domain)) { + if (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS)) + return mode == ALLOW_LEGACY; + return false; + } if (!irq_domain_is_msi_parent(domain)) { /* --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -433,6 +433,10 @@ int __pci_enable_msi_range(struct pci_de if (WARN_ON_ONCE(dev->msi_enabled)) return -EINVAL; + /* Test for the availability of MSI support */ + if (!pci_msi_domain_supports(dev, 0, ALLOW_LEGACY)) + return -ENOTSUPP; + nvec = pci_msi_vec_count(dev); if (nvec < 0) return nvec; ^ permalink raw reply [flat|nested] 12+ messages in thread
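The behavioural change in the first hunk above can be modelled in user space. This is a minimal sketch, not kernel code: `struct mock_domain`, `legacy_path_allowed_old()` and `legacy_path_allowed_new()` are hypothetical names, the `arch_fallbacks` parameter stands in for IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS), and only the "no hierarchical domain" branch of pci_msi_domain_supports() is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the mode argument; values are illustrative only. */
enum support_mode { ALLOW_LEGACY, DENY_LEGACY };

/* Hypothetical stand-in for an irq domain; not the kernel's struct. */
struct mock_domain {
	bool hierarchy;	/* models irq_domain_is_hierarchy() */
};

/*
 * Before the patch: without a hierarchical domain, the legacy path is
 * reported as usable whenever the caller allows legacy, regardless of
 * whether the architecture implements the legacy fallbacks.
 */
bool legacy_path_allowed_old(const struct mock_domain *domain,
			     enum support_mode mode)
{
	if (!domain || !domain->hierarchy)
		return mode == ALLOW_LEGACY;
	return false;	/* hierarchical domain: not the legacy path */
}

/*
 * After the patch: the same answer is given only when the architecture
 * selected the legacy fallbacks; otherwise the request is rejected.
 */
bool legacy_path_allowed_new(const struct mock_domain *domain,
			     enum support_mode mode, bool arch_fallbacks)
{
	if (!domain || !domain->hierarchy) {
		if (arch_fallbacks)
			return mode == ALLOW_LEGACY;
		return false;
	}
	return false;	/* hierarchical domain: not the legacy path */
}

/* Small self-check covering the case reported on RISCV (no domain, no fallbacks). */
void self_check(void)
{
	assert(legacy_path_allowed_old(NULL, ALLOW_LEGACY));
	assert(!legacy_path_allowed_new(NULL, ALLOW_LEGACY, false));
}
```

In this model the old check lets a platform without fallbacks proceed into the legacy stub, which is where the warning was emitted. The new check rejects the request up front, so the caller can return -ENOTSUPP as the second hunk does.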
* [tip: irq/urgent] PCI/MSI: Handle lack of irqdomain gracefully 2024-12-14 11:50 ` [Patch] PCI/MSI: Handle lack of irqdomain gracefully Thomas Gleixner @ 2024-12-17 13:08 ` tip-bot2 for Thomas Gleixner 2025-02-03 19:16 ` [Patch] " patchwork-bot+linux-riscv 2026-03-11 11:22 ` Uwe Kleine-König 2 siblings, 0 replies; 12+ messages in thread From: tip-bot2 for Thomas Gleixner @ 2024-12-17 13:08 UTC (permalink / raw) To: linux-tip-commits Cc: Alexandre Ghiti, Thomas Gleixner, stable, x86, linux-kernel, maz The following commit has been merged into the irq/urgent branch of tip: Commit-ID: a60b990798eb17433d0283788280422b1bd94b18 Gitweb: https://git.kernel.org/tip/a60b990798eb17433d0283788280422b1bd94b18 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Sat, 14 Dec 2024 12:50:18 +01:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Mon, 16 Dec 2024 10:59:47 +01:00 PCI/MSI: Handle lack of irqdomain gracefully Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a RISCV platform which does not provide PCI/MSI support: WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 __pci_enable_msix_range+0x30c/0x596 pci_msi_setup_msi_irqs+0x2c/0x32 pci_alloc_irq_vectors_affinity+0xb8/0xe2 RISCV uses hierarchical interrupt domains and correctly does not implement the legacy fallback. The warning triggers from the legacy fallback stub. That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent domain is associated with the device or not. There is a check for MSI-X, which has a legacy assumption. But that legacy fallback assumption is only valid when legacy support is enabled, but otherwise the check should simply return -ENOTSUPP. Loongarch tripped over the same problem and blindly enabled legacy support without implementing the legacy fallbacks. There are weak implementations which return an error, so the problem was papered over. 
Correct pci_msi_domain_supports() to evaluate the legacy mode and add the missing supported check into the MSI enable path to complete it. Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early") Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ed2a8ow5.ffs@tglx --- drivers/pci/msi/irqdomain.c | 7 +++++-- drivers/pci/msi/msi.c | 4 ++++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c index 5691257..d7ba879 100644 --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -350,8 +350,11 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask, domain = dev_get_msi_domain(&pdev->dev); - if (!domain || !irq_domain_is_hierarchy(domain)) - return mode == ALLOW_LEGACY; + if (!domain || !irq_domain_is_hierarchy(domain)) { + if (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS)) + return mode == ALLOW_LEGACY; + return false; + } if (!irq_domain_is_msi_parent(domain)) { /* diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 3a45879..2f647ca 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -433,6 +433,10 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, if (WARN_ON_ONCE(dev->msi_enabled)) return -EINVAL; + /* Test for the availability of MSI support */ + if (!pci_msi_domain_supports(dev, 0, ALLOW_LEGACY)) + return -ENOTSUPP; + nvec = pci_msi_vec_count(dev); if (nvec < 0) return nvec; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully 2024-12-14 11:50 ` [Patch] PCI/MSI: Handle lack of irqdomain gracefully Thomas Gleixner 2024-12-17 13:08 ` [tip: irq/urgent] " tip-bot2 for Thomas Gleixner @ 2025-02-03 19:16 ` patchwork-bot+linux-riscv 2026-03-11 11:22 ` Uwe Kleine-König 2 siblings, 0 replies; 12+ messages in thread From: patchwork-bot+linux-riscv @ 2025-02-03 19:16 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-riscv, alexghiti, anup, linux-kernel, palmer, paul.walmsley, helgaas Hello: This patch was applied to riscv/linux.git (fixes) by Thomas Gleixner <tglx@linutronix.de>: On Sat, 14 Dec 2024 12:50:18 +0100 you wrote: > Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a > RISCV platform which does not provide PCI/MSI support: > > WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 > __pci_enable_msix_range+0x30c/0x596 > pci_msi_setup_msi_irqs+0x2c/0x32 > pci_alloc_irq_vectors_affinity+0xb8/0xe2 > > [...] Here is the summary with links: - PCI/MSI: Handle lack of irqdomain gracefully https://git.kernel.org/riscv/c/a60b990798eb You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully 2024-12-14 11:50 ` [Patch] PCI/MSI: Handle lack of irqdomain gracefully Thomas Gleixner 2024-12-17 13:08 ` [tip: irq/urgent] " tip-bot2 for Thomas Gleixner 2025-02-03 19:16 ` [Patch] " patchwork-bot+linux-riscv @ 2026-03-11 11:22 ` Uwe Kleine-König 2026-04-22 8:50 ` Thorsten Leemhuis 2026-04-22 21:22 ` Thomas Gleixner 2 siblings, 2 replies; 12+ messages in thread From: Uwe Kleine-König @ 2026-03-11 11:22 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas, 1127635, Aaron D. Johnson, regressions [-- Attachment #1: Type: text/plain, Size: 12425 bytes --] Control: forwarded -1 https://lore.kernel.org/lkml/abE_QoS5DM-ZltaV@monoceros #regzbot introduced: a60b990798eb17433d0283788280422b1bd94b18 #regzbot from: "Aaron D. Johnson" <debbugreporter@fnord.greeley.co.us> #regzbot monitor: https://bugs.debian.org/1127635 Hello, On Sat, Dec 14, 2024 at 12:50:18PM +0100, Thomas Gleixner wrote: > Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a > RISCV platform which does not provide PCI/MSI support: > > WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 > __pci_enable_msix_range+0x30c/0x596 > pci_msi_setup_msi_irqs+0x2c/0x32 > pci_alloc_irq_vectors_affinity+0xb8/0xe2 > > RISCV uses hierarchical interrupt domains and correctly does not implement > the legacy fallback. The warning triggers from the legacy fallback stub. > > That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent > domain is associated with the device or not. There is a check for MSI-X, > which has a legacy assumption. But that legacy fallback assumption is only > valid when legacy support is enabled, but otherwise the check should simply > return -ENOTSUPP. > > Loongarch tripped over the same problem and blindly enabled legacy support > without implementing the legacy fallbacks. 
There are weak implementations > which return an error, so the problem was papered over. > > Correct pci_msi_domain_supports() to evaluate the legacy mode and add > the missing supported check into the MSI enable path to complete it. > > Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early") > Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> > Cc: stable@vger.kernel.org this patch became a60b990798eb17433d0283788280422b1bd94b18 in v6.13-rc5 and was backported to 6.12.y and 6.6.y (aed157301c65 and b1f7476e07b9 respectively). A Debian user (Aaron, on Cc:) on powerpc has boot problems and bisected them to this commit. The relevant boot log of the failure is: [ 2.643879] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 2.643891] Faulting instruction address: 0xc000000000a39514 [ 2.643902] Oops: Kernel access of bad area, sig: 11 [#1] [ 2.643909] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 2.643920] Modules linked in: ohci_pci(+) ehci_hcd nvme_fabrics ohci_hcd nvme_keyring nvme_core usbcore nvme_auth scsi_transport_fc ipr configfs ehea(+) usb_common [ 2.643965] CPU: 5 UID: 0 PID: 250 Comm: (udev-worker) Not tainted 6.12.17-powerpc64 #1 Debian 6.12.17-1 [ 2.643976] Hardware name: IBM,8204-E8A POWER6 (architected) 0x3e0302 0xf000002 of:IBM,EL350_118 hv:phyp pSeries [ 2.643986] NIP: c000000000a39514 LR: c000000000a36ed8 CTR: c000000000a35820 [ 2.643995] REGS: c0000000351f6f60 TRAP: 0300 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1) [ 2.644004] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24222288 XER: 00000000 [ 2.644031] CFAR: c00000000000cfc4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 2.644031] GPR00: c000000000a36ed8 c0000000351f7200 c00000000182e200 c0000003df294000 [ 2.644031] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 
0000000044222288 [ 2.644031] GPR12: c000000000a35820 c00000000eeacb00 0000000000000020 0000010037fcab20 [ 2.644031] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80 [ 2.644031] GPR20: 0000000000000000 c00000000204db60 c00000000204dd60 c00000000b1ae780 [ 2.644031] GPR24: 0000000000000000 00003fff8c9ac758 0000000000000000 c0000003df294000 [ 2.644031] GPR28: 0000000000000001 0000000000000000 c0000003df294000 0000000000000001 [ 2.644164] NIP [c000000000a39514] pci_msi_domain_supports (drivers/pci/msi/irqdomain.c:366) [ 2.644181] LR [c000000000a36ed8] __pci_enable_msi_range (drivers/pci/msi/msi.c:437) [ 2.644192] Call Trace: [ 2.644197] [c0000000351f7200] [c0000000351f7304] 0xc0000000351f7304 (unreliable) [ 2.644211] [c0000000351f7340] [c000000000a3578c] pci_alloc_irq_vectors_affinity (drivers/pci/msi/api.c:277) [ 2.644225] [c0000000351f73d0] [c0003d0007d2f4d4] usb_hcd_pci_probe (drivers/usb/core/hcd-pci.c:192) usbcore [ 2.644246] [c0000000351f7470] [c0003d00084e6030] ohci_pci_probe (drivers/usb/host/ohci-pci.c:285) ohci_pci [ 2.644260] [c0000000351f7490] [c000000000a260e8] local_pci_probe (drivers/pci/pci-driver.c:324) [ 2.644274] [c0000000351f7510] [c000000000a26218] pci_call_probe (drivers/pci/pci-driver.c:392 (discriminator 1)) [ 2.644287] [c0000000351f7670] [c000000000a27348] pci_device_probe (drivers/pci/pci-driver.c:452) [ 2.644300] [c0000000351f76b0] [c000000000b2e658] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658) [ 2.644314] [c0000000351f7740] [c000000000b2eb24] __driver_probe_device (drivers/base/dd.c:800) [ 2.644327] [c0000000351f77c0] [c000000000b2edc4] driver_probe_device (drivers/base/dd.c:831) [ 2.644340] [c0000000351f7800] [c000000000b2f188] __driver_attach (drivers/base/dd.c:1217) [ 2.644352] [c0000000351f7880] [c000000000b2ac64] bus_for_each_dev (drivers/base/bus.c:370) [ 2.644365] [c0000000351f78e0] [c000000000b2dac4] driver_attach (drivers/base/dd.c:1234) [ 2.644377] [c0000000351f7900] [c000000000b2cd98] 
bus_add_driver (drivers/base/bus.c:675) [ 2.644389] [c0000000351f7990] [c000000000b30ae4] driver_register (drivers/base/driver.c:246) [ 2.644402] [c0000000351f7a00] [c000000000a24f88] __pci_register_driver (drivers/pci/pci-driver.c:1450) [ 2.644415] [c0000000351f7a20] [c0003d00084e6800] ohci_pci_init (drivers/usb/host/ohci-pci.c:308) ohci_pci [ 2.644429] [c0000000351f7a50] [c00000000000fd60] do_one_initcall (init/main.c:1269) [ 2.644444] [c0000000351f7b30] [c0000000002760f8] do_init_module (kernel/module/main.c:2543) [ 2.644460] [c0000000351f7bb0] [c000000000278fe4] init_module_from_file (kernel/module/main.c:3199) [ 2.644473] [c0000000351f7c90] [c0000000002793e0] sys_finit_module (kernel/module/main.c:3211 kernel/module/main.c:3238 kernel/module/main.c:3221) [ 2.644487] [c0000000351f7da0] [c00000000002c084] system_call_exception (arch/powerpc/kernel/syscall.c:171) [ 2.644500] [c0000000351f7e50] [c00000000000cb54] system_call_common (arch/powerpc/kernel/interrupt_64.S:292) [ 2.644515] --- interrupt: c00 at 0x3fff8d653d8c [ 2.644522] NIP: 00003fff8d653d8c LR: 00003fff8c9a4680 CTR: 0000000000000000 [ 2.644531] REGS: c0000000351f7e80 TRAP: 0c00 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1) [ 2.644541] MSR: 800000000200f032 <SF,VEC,EE,PR,FP,ME,IR,DR,RI> CR: 22222222 XER: 00000000 [ 2.644573] IRQMASK: 0 [ 2.644573] GPR00: 0000000000000161 00003fffebe8b640 00003fff8d757100 0000000000000052 [ 2.644573] GPR04: 00003fff8c9ac758 0000000000000004 0000000000000058 000000000000005a [ 2.644573] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 2.644573] GPR12: 0000000000000000 00003fff8de947c0 0000000000000020 0000010037fcab20 [ 2.644573] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80 [ 2.644573] GPR20: 0000000000000000 00003fffebe8bb70 0000000000000007 0000010037fca210 [ 2.644573] GPR24: 0000000000000000 0000000000000000 0000010037f6be40 0000000000000004 [ 2.644573] GPR28: 00003fff8c9ac758 0000000000020000 
0000000000000004 0000010037fca210 [ 2.644698] NIP [00003fff8d653d8c] 0x3fff8d653d8c [ 2.644705] LR [00003fff8c9a4680] 0x3fff8c9a4680 [ 2.644713] --- interrupt: c00 [ 2.644719] Code: 4182002c e92a0088 80690000 7c632038 7c632278 7c630034 5463d97e 786307e0 4e800020 60000000 60000000 e92a0020 <80690000> 4bffffd8 60000000 7ca50034 All code ======== 0:* 41 82 00 2c beq 0x2c <-- trapping instruction 4: e9 2a 00 88 ld r9,136(r10) 8: 80 69 00 00 lwz r3,0(r9) c: 7c 63 20 38 and r3,r3,r4 10: 7c 63 22 78 xor r3,r3,r4 14: 7c 63 00 34 cntlzw r3,r3 18: 54 63 d9 7e srwi r3,r3,5 1c: 78 63 07 e0 clrldi r3,r3,63 20: 4e 80 00 20 blr 24: 60 00 00 00 nop 28: 60 00 00 00 nop 2c: e9 2a 00 20 ld r9,32(r10) 30: 80 69 00 00 lwz r3,0(r9) 34: 4b ff ff d8 b 0xc 38: 60 00 00 00 nop 3c: 7c a5 00 34 cntlzw r5,r5 Code starting with the faulting instruction =========================================== 0: 80 69 00 00 lwz r3,0(r9) 4: 4b ff ff d8 b 0xffffffffffffffdc 8: 60 00 00 00 nop c: 7c a5 00 34 cntlzw r5,r5 [ 2.644769] ---[ end trace 0000000000000000 ]--- (That's the bug splat from the bug report piped through scripts/decode_stacktrace.sh) The kernel has CONFIG_PCI_MSI_ARCH_FALLBACKS=y, so the first hunk shouldn't change anything. 
The disassembly of pci_msi_domain_supports in the kernel looks as follows: c000000000a394c0 <pci_msi_domain_supports>: pci_msi_domain_supports(): debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:334 c000000000a394c0: 60 00 00 00 nop c000000000a394c4: 60 00 00 00 nop debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 c000000000a394c8: e9 43 02 e8 ld r10,744(r3) c000000000a394cc: 2c 2a 00 00 cmpdi r10,0 c000000000a394d0: 41 82 00 50 beq c000000000a39520 <pci_msi_domain_supports+0x60> irq_domain_is_hierarchy(): debian/build/build_powerpc_none_powerpc64/include/linux/irqdomain.h:661 c000000000a394d4: 81 2a 00 28 lwz r9,40(r10) pci_msi_domain_supports(): debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 (discriminator 1) c000000000a394d8: 71 28 00 01 andi. r8,r9,1 c000000000a394dc: 41 82 00 44 beq c000000000a39520 <pci_msi_domain_supports+0x60> debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:359 (discriminator 1) c000000000a394e0: 71 29 01 00 andi. 
r9,r9,256 c000000000a394e4: 41 82 00 2c beq c000000000a39510 <pci_msi_domain_supports+0x50> debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:375 c000000000a394e8: e9 2a 00 88 ld r9,136(r10) c000000000a394ec: 80 69 00 00 lwz r3,0(r9) debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:378 c000000000a394f0: 7c 63 20 38 and r3,r3,r4 c000000000a394f4: 7c 63 22 78 xor r3,r3,r4 c000000000a394f8: 7c 63 00 34 cntlzw r3,r3 c000000000a394fc: 54 63 d9 7e srwi r3,r3,5 debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379 c000000000a39500: 78 63 07 e0 clrldi r3,r3,63 c000000000a39504: 4e 80 00 20 blr c000000000a39508: 60 00 00 00 nop c000000000a3950c: 60 00 00 00 nop debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:366 c000000000a39510: e9 2a 00 20 ld r9,32(r10) c000000000a39514: 80 69 00 00 lwz r3,0(r9) c000000000a39518: 4b ff ff d8 b c000000000a394f0 <pci_msi_domain_supports+0x30> c000000000a3951c: 60 00 00 00 nop debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:355 c000000000a39520: 7c a5 00 34 cntlzw r5,r5 c000000000a39524: 54 a3 d9 7e srwi r3,r5,5 debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379 c000000000a39528: 78 63 07 e0 clrldi r3,r3,63 c000000000a3952c: 4e 80 00 20 blr so the trapping happens in drivers/pci/msi/irqdomain.c:366 which is: 365 info = domain->host_data; 366 supported = info->flags; According to the register dump domain == r10 == NULL, but then this code would not have been reached and the faulting instruction would be at c000000000a39510. So maybe it's only .host_data = NULL and the register dump is unreliable?? The offsets match: .host_data is at offset 32 of struct irq_domain and .flags is at offset 0 of struct msi_domain_info. For more details see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1127635 . Does someone spot the issue? 
Best regards Uwe [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
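The offset argument in the report above can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `struct mock_irq_domain` and `struct mock_msi_domain_info` are hypothetical mirrors of only the leading fields, laid out so that on an LP64 target they reproduce the offsets quoted in the report (host_data at 32, flags at 0) and the `lwz r9,40(r10)` seen in the disassembly. They are not the kernel's definitions. The NULL check in `read_info_flags()` is a marker for where the quoted code would fault, not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of a doubly linked list head (two pointers). */
struct mock_list_head {
	void *next, *prev;
};

/* Hypothetical mirror of the leading fields only; LP64 offsets in comments. */
struct mock_irq_domain {
	struct mock_list_head link;	/* offset 0 */
	const char *name;		/* offset 16 */
	const void *ops;		/* offset 24 */
	void *host_data;		/* offset 32: "ld r9,32(r10)" */
	unsigned int flags;		/* offset 40: "lwz r9,40(r10)" */
};

/* Hypothetical mirror of the first field only. */
struct mock_msi_domain_info {
	unsigned int flags;		/* offset 0: "lwz r3,0(r9)" */
};

/*
 * Models the two loads at irqdomain.c:365-366 in the report:
 *     info = domain->host_data;
 *     supported = info->flags;
 * Returns -1 at the point where the quoted code, which has no such
 * check, would fault at address 0 when host_data is NULL.
 */
long read_info_flags(const struct mock_irq_domain *domain)
{
	const struct mock_msi_domain_info *info = domain->host_data;

	if (!info)
		return -1;
	return (long)info->flags;
}

/* Checks the mock layout against the offsets quoted in the report. */
void check_layout(void)
{
	if (sizeof(void *) == 8) {
		assert(offsetof(struct mock_irq_domain, host_data) == 32);
		assert(offsetof(struct mock_irq_domain, flags) == 40);
	}
	assert(offsetof(struct mock_msi_domain_info, flags) == 0);
}
```

If the GPR08 row of the dump is read as GPR08 through GPR11, r9 is 0 and r10 is c00000000228fcc0. That reading would be consistent with a valid domain pointer whose host_data is NULL, and with the reported DAR of 0, but it is an interpretation of the log and not something the thread confirms.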
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully 2026-03-11 11:22 ` Uwe Kleine-König @ 2026-04-22 8:50 ` Thorsten Leemhuis 2026-04-22 15:07 ` Aaron D. Johnson 2026-04-22 19:52 ` Thomas Gleixner 2026-04-22 21:22 ` Thomas Gleixner 1 sibling, 2 replies; 12+ messages in thread From: Thorsten Leemhuis @ 2026-04-22 8:50 UTC (permalink / raw) To: Uwe Kleine-König, Thomas Gleixner Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas, 1127635, Aaron D. Johnson, regressions On 3/11/26 12:22, Uwe Kleine-König wrote: > Control: forwarded -1 https://lore.kernel.org/lkml/abE_QoS5DM-ZltaV@monoceros > > #regzbot introduced: a60b990798eb17433d0283788280422b1bd94b18 Thomas, in case you missed it: this is a change of yours: a60b990798eb17 ("PCI/MSI: Handle lack of irqdomain gracefully") [v6.13-rc5] > #regzbot from: "Aaron D. Johnson" <debbugreporter@fnord.greeley.co.us> > #regzbot monitor: https://bugs.debian.org/1127635 Thx for forwarding the regression. Nothing happened since then -- or am I missing something? If so: is that okay for everybody, or should we do anything about this? BTW, did anyone check if this happens with mainline (6.13/7.0) as well to rule out that this is something that only happens in the stable series it was backported to? If it's the latter I wonder if reverting it there might be an easy way to resolve this. Ciao, Thorsten On 3/4/26 09:47, Alexander Stein wrote: > On Sat, Dec 14, 2024 at 12:50:18PM +0100, Thomas Gleixner wrote: >> Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a >> RISCV platform which does not provide PCI/MSI support: >> >> WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 >> __pci_enable_msix_range+0x30c/0x596 >> pci_msi_setup_msi_irqs+0x2c/0x32 >> pci_alloc_irq_vectors_affinity+0xb8/0xe2 >> >> RISCV uses hierarchical interrupt domains and correctly does not implement >> the legacy fallback. 
The warning triggers from the legacy fallback stub. >> >> That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent >> domain is associated with the device or not. There is a check for MSI-X, >> which has a legacy assumption. But that legacy fallback assumption is only >> valid when legacy support is enabled, but otherwise the check should simply >> return -ENOTSUPP. >> >> Loongarch tripped over the same problem and blindly enabled legacy support >> without implementing the legacy fallbacks. There are weak implementations >> which return an error, so the problem was papered over. >> >> Correct pci_msi_domain_supports() to evaluate the legacy mode and add >> the missing supported check into the MSI enable path to complete it. >> >> Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early") >> Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> >> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> >> Cc: stable@vger.kernel.org > > this patch became a60b990798eb17433d0283788280422b1bd94b18 in v6.13-rc5 > and was backported to 6.12.y and 6.6.y (aed157301c65 and b1f7476e07b9 > respectively). > > A Debian user (Aaron, on Cc:) on powerpc has boot problems and bisected > them to this commit. 
The relevant boot log of the failure is: > > [ 2.643879] BUG: Kernel NULL pointer dereference on read at 0x00000000 > [ 2.643891] Faulting instruction address: 0xc000000000a39514 > [ 2.643902] Oops: Kernel access of bad area, sig: 11 [#1] > [ 2.643909] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > [ 2.643920] Modules linked in: ohci_pci(+) ehci_hcd nvme_fabrics ohci_hcd nvme_keyring nvme_core usbcore nvme_auth scsi_transport_fc ipr configfs ehea(+) usb_common > [ 2.643965] CPU: 5 UID: 0 PID: 250 Comm: (udev-worker) Not tainted 6.12.17-powerpc64 #1 Debian 6.12.17-1 > [ 2.643976] Hardware name: IBM,8204-E8A POWER6 (architected) 0x3e0302 0xf000002 of:IBM,EL350_118 hv:phyp pSeries > [ 2.643986] NIP: c000000000a39514 LR: c000000000a36ed8 CTR: c000000000a35820 > [ 2.643995] REGS: c0000000351f6f60 TRAP: 0300 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1) > [ 2.644004] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24222288 XER: 00000000 > [ 2.644031] CFAR: c00000000000cfc4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 > [ 2.644031] GPR00: c000000000a36ed8 c0000000351f7200 c00000000182e200 c0000003df294000 > [ 2.644031] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 0000000044222288 > [ 2.644031] GPR12: c000000000a35820 c00000000eeacb00 0000000000000020 0000010037fcab20 > [ 2.644031] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80 > [ 2.644031] GPR20: 0000000000000000 c00000000204db60 c00000000204dd60 c00000000b1ae780 > [ 2.644031] GPR24: 0000000000000000 00003fff8c9ac758 0000000000000000 c0000003df294000 > [ 2.644031] GPR28: 0000000000000001 0000000000000000 c0000003df294000 0000000000000001 > [ 2.644164] NIP [c000000000a39514] pci_msi_domain_supports (drivers/pci/msi/irqdomain.c:366) > [ 2.644181] LR [c000000000a36ed8] __pci_enable_msi_range (drivers/pci/msi/msi.c:437) > [ 2.644192] Call Trace: > [ 2.644197] [c0000000351f7200] 
[c0000000351f7304] 0xc0000000351f7304 (unreliable) > [ 2.644211] [c0000000351f7340] [c000000000a3578c] pci_alloc_irq_vectors_affinity (drivers/pci/msi/api.c:277) > [ 2.644225] [c0000000351f73d0] [c0003d0007d2f4d4] usb_hcd_pci_probe (drivers/usb/core/hcd-pci.c:192) usbcore > [ 2.644246] [c0000000351f7470] [c0003d00084e6030] ohci_pci_probe (drivers/usb/host/ohci-pci.c:285) ohci_pci > [ 2.644260] [c0000000351f7490] [c000000000a260e8] local_pci_probe (drivers/pci/pci-driver.c:324) > [ 2.644274] [c0000000351f7510] [c000000000a26218] pci_call_probe (drivers/pci/pci-driver.c:392 (discriminator 1)) > [ 2.644287] [c0000000351f7670] [c000000000a27348] pci_device_probe (drivers/pci/pci-driver.c:452) > [ 2.644300] [c0000000351f76b0] [c000000000b2e658] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658) > [ 2.644314] [c0000000351f7740] [c000000000b2eb24] __driver_probe_device (drivers/base/dd.c:800) > [ 2.644327] [c0000000351f77c0] [c000000000b2edc4] driver_probe_device (drivers/base/dd.c:831) > [ 2.644340] [c0000000351f7800] [c000000000b2f188] __driver_attach (drivers/base/dd.c:1217) > [ 2.644352] [c0000000351f7880] [c000000000b2ac64] bus_for_each_dev (drivers/base/bus.c:370) > [ 2.644365] [c0000000351f78e0] [c000000000b2dac4] driver_attach (drivers/base/dd.c:1234) > [ 2.644377] [c0000000351f7900] [c000000000b2cd98] bus_add_driver (drivers/base/bus.c:675) > [ 2.644389] [c0000000351f7990] [c000000000b30ae4] driver_register (drivers/base/driver.c:246) > [ 2.644402] [c0000000351f7a00] [c000000000a24f88] __pci_register_driver (drivers/pci/pci-driver.c:1450) > [ 2.644415] [c0000000351f7a20] [c0003d00084e6800] ohci_pci_init (drivers/usb/host/ohci-pci.c:308) ohci_pci > [ 2.644429] [c0000000351f7a50] [c00000000000fd60] do_one_initcall (init/main.c:1269) > [ 2.644444] [c0000000351f7b30] [c0000000002760f8] do_init_module (kernel/module/main.c:2543) > [ 2.644460] [c0000000351f7bb0] [c000000000278fe4] init_module_from_file (kernel/module/main.c:3199) > [ 2.644473] 
[c0000000351f7c90] [c0000000002793e0] sys_finit_module (kernel/module/main.c:3211 kernel/module/main.c:3238 kernel/module/main.c:3221) > [ 2.644487] [c0000000351f7da0] [c00000000002c084] system_call_exception (arch/powerpc/kernel/syscall.c:171) > [ 2.644500] [c0000000351f7e50] [c00000000000cb54] system_call_common (arch/powerpc/kernel/interrupt_64.S:292) > [ 2.644515] --- interrupt: c00 at 0x3fff8d653d8c > [ 2.644522] NIP: 00003fff8d653d8c LR: 00003fff8c9a4680 CTR: 0000000000000000 > [ 2.644531] REGS: c0000000351f7e80 TRAP: 0c00 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1) > [ 2.644541] MSR: 800000000200f032 <SF,VEC,EE,PR,FP,ME,IR,DR,RI> CR: 22222222 XER: 00000000 > [ 2.644573] IRQMASK: 0 > [ 2.644573] GPR00: 0000000000000161 00003fffebe8b640 00003fff8d757100 0000000000000052 > [ 2.644573] GPR04: 00003fff8c9ac758 0000000000000004 0000000000000058 000000000000005a > [ 2.644573] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 2.644573] GPR12: 0000000000000000 00003fff8de947c0 0000000000000020 0000010037fcab20 > [ 2.644573] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80 > [ 2.644573] GPR20: 0000000000000000 00003fffebe8bb70 0000000000000007 0000010037fca210 > [ 2.644573] GPR24: 0000000000000000 0000000000000000 0000010037f6be40 0000000000000004 > [ 2.644573] GPR28: 00003fff8c9ac758 0000000000020000 0000000000000004 0000010037fca210 > [ 2.644698] NIP [00003fff8d653d8c] 0x3fff8d653d8c > [ 2.644705] LR [00003fff8c9a4680] 0x3fff8c9a4680 > [ 2.644713] --- interrupt: c00 > [ 2.644719] Code: 4182002c e92a0088 80690000 7c632038 7c632278 7c630034 5463d97e 786307e0 4e800020 60000000 60000000 e92a0020 <80690000> 4bffffd8 60000000 7ca50034 > All code > ======== > 0:* 41 82 00 2c beq 0x2c <-- trapping instruction > 4: e9 2a 00 88 ld r9,136(r10) > 8: 80 69 00 00 lwz r3,0(r9) > c: 7c 63 20 38 and r3,r3,r4 > 10: 7c 63 22 78 xor r3,r3,r4 > 14: 7c 63 00 34 cntlzw r3,r3 > 18: 54 63 d9 7e srwi r3,r3,5 > 1c: 78 63 07 e0 clrldi 
r3,r3,63 > 20: 4e 80 00 20 blr > 24: 60 00 00 00 nop > 28: 60 00 00 00 nop > 2c: e9 2a 00 20 ld r9,32(r10) > 30: 80 69 00 00 lwz r3,0(r9) > 34: 4b ff ff d8 b 0xc > 38: 60 00 00 00 nop > 3c: 7c a5 00 34 cntlzw r5,r5 > > Code starting with the faulting instruction > =========================================== > 0: 80 69 00 00 lwz r3,0(r9) > 4: 4b ff ff d8 b 0xffffffffffffffdc > 8: 60 00 00 00 nop > c: 7c a5 00 34 cntlzw r5,r5 > [ 2.644769] ---[ end trace 0000000000000000 ]--- > > > (That's the bug splat from the bug report piped through > scripts/decode_stacktrace.sh) > > The kernel has CONFIG_PCI_MSI_ARCH_FALLBACKS=y, so the first hunk > shouldn't change anything. > > The disassembly of pci_msi_domain_supports in the kernel looks as > follows: > > c000000000a394c0 <pci_msi_domain_supports>: > pci_msi_domain_supports(): > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:334 > c000000000a394c0: 60 00 00 00 nop > c000000000a394c4: 60 00 00 00 nop > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 > c000000000a394c8: e9 43 02 e8 ld r10,744(r3) > c000000000a394cc: 2c 2a 00 00 cmpdi r10,0 > c000000000a394d0: 41 82 00 50 beq c000000000a39520 <pci_msi_domain_supports+0x60> > irq_domain_is_hierarchy(): > debian/build/build_powerpc_none_powerpc64/include/linux/irqdomain.h:661 > c000000000a394d4: 81 2a 00 28 lwz r9,40(r10) > pci_msi_domain_supports(): > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 (discriminator 1) > c000000000a394d8: 71 28 00 01 andi. r8,r9,1 > c000000000a394dc: 41 82 00 44 beq c000000000a39520 <pci_msi_domain_supports+0x60> > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:359 (discriminator 1) > c000000000a394e0: 71 29 01 00 andi. 
r9,r9,256 > c000000000a394e4: 41 82 00 2c beq c000000000a39510 <pci_msi_domain_supports+0x50> > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:375 > c000000000a394e8: e9 2a 00 88 ld r9,136(r10) > c000000000a394ec: 80 69 00 00 lwz r3,0(r9) > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:378 > c000000000a394f0: 7c 63 20 38 and r3,r3,r4 > c000000000a394f4: 7c 63 22 78 xor r3,r3,r4 > c000000000a394f8: 7c 63 00 34 cntlzw r3,r3 > c000000000a394fc: 54 63 d9 7e srwi r3,r3,5 > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379 > c000000000a39500: 78 63 07 e0 clrldi r3,r3,63 > c000000000a39504: 4e 80 00 20 blr > c000000000a39508: 60 00 00 00 nop > c000000000a3950c: 60 00 00 00 nop > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:366 > c000000000a39510: e9 2a 00 20 ld r9,32(r10) > c000000000a39514: 80 69 00 00 lwz r3,0(r9) > c000000000a39518: 4b ff ff d8 b c000000000a394f0 <pci_msi_domain_supports+0x30> > c000000000a3951c: 60 00 00 00 nop > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:355 > c000000000a39520: 7c a5 00 34 cntlzw r5,r5 > c000000000a39524: 54 a3 d9 7e srwi r3,r5,5 > debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379 > c000000000a39528: 78 63 07 e0 clrldi r3,r3,63 > c000000000a3952c: 4e 80 00 20 blr > > > so the trapping happens in drivers/pci/msi/irqdomain.c:366 which is: > > 365 info = domain->host_data; > 366 supported = info->flags; > > According to the register dump domain == r10 == NULL, but then this code > would not have been reached and the faulting instruction would be at > c000000000a39510. So maybe it's only .host_data = NULL and the register > dump is unreliable?? > > The offsets match: .host_data is at offset 32 of struct > irq_domain and .flags is at offset 0 of struct msi_domain_info. > > For more details see > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1127635 . > > Does someone spot the issue? 
> > Best regards > Uwe ^ permalink raw reply [flat|nested] 12+ messages in thread
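The addressing arithmetic behind the two loads at irqdomain.c:366 can be checked outside the kernel. What follows is a minimal userspace C sketch, not kernel code: the model_* struct names, the padding and effective_address() are hypothetical, and only the two offsets quoted in the report (host_data at 32, flags at 0) are taken from it. With the kernel-mode register values from the splat (GPR09 = 0, GPR10 = c00000000228fcc0), the first load reads from a plausible kernel address and the second reads from address 0, which agrees with the reported DAR of 0000000000000000.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel structures. Only the two
 * offsets quoted in the report are modeled; the rest is padding.
 */
struct model_msi_domain_info {
	uint32_t flags;		/* offset 0, read by "lwz r3,0(r9)" */
};

struct model_irq_domain {
	uint64_t pad[4];	/* 32 bytes standing in for other members */
	struct model_msi_domain_info *host_data; /* read by "ld r9,32(r10)" */
};

/* Effective address of a load: base register value plus signed displacement. */
static inline uint64_t effective_address(uint64_t base, int16_t displacement)
{
	return base + (uint64_t)(int64_t)displacement;
}
```

Calling effective_address(0xc00000000228fcc0, 32) gives 0xc00000000228fce0 for the host_data load, and effective_address(0, 0) gives 0 for the flags load.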
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully
  2026-04-22 8:50 ` Thorsten Leemhuis
@ 2026-04-22 15:07 ` Aaron D. Johnson
  2026-04-22 19:52 ` Thomas Gleixner
  1 sibling, 0 replies; 12+ messages in thread
From: Aaron D. Johnson @ 2026-04-22 15:07 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Uwe Kleine-König, Thomas Gleixner, Alexandre Ghiti, Paul Walmsley,
      Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel,
      Bjorn Helgaas, 1127635, regressions

Thorsten Leemhuis writes:
> Thx for forwarding the regression. Nothing happened since then --
> or am I missing something? If so: is that okay for everybody, or
> should we do anything about this?

I have not seen any discussion. As the initial reporter, I can accept
that it is a very niche hardware platform and may well go unfixed
forever. The machine in question has left my possession. It may come
back in the future. Or maybe not.

The next-generation ppc64 machine I have (an IBM 8205-E6C p740) does
not exhibit this behavior.

> BTW, did anyone check if this happens with mainline (6.13/7.0) as
> well to rule out that this is something that only happens in the
> stable series it was backported to? If it's the latter, I wonder if
> reverting it there might be an easy way to resolve this.

I didn't check mainline specifically, no. All bisection was done
against the stable git repo.

Thanks!

- Aaron

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully
  2026-04-22 8:50 ` Thorsten Leemhuis
  2026-04-22 15:07 ` Aaron D. Johnson
@ 2026-04-22 19:52 ` Thomas Gleixner
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2026-04-22 19:52 UTC (permalink / raw)
  To: Thorsten Leemhuis, Uwe Kleine-König
  Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Anup Patel,
      Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas, 1127635,
      Aaron D. Johnson, regressions

On Wed, Apr 22 2026 at 10:50, Thorsten Leemhuis wrote:
> On 3/11/26 12:22, Uwe Kleine-König wrote:
>> Control: forwarded -1 https://lore.kernel.org/lkml/abE_QoS5DM-ZltaV@monoceros
>>
>> #regzbot introduced: a60b990798eb17433d0283788280422b1bd94b18
>
> Thomas, in case you missed it: this is a change of yours: a60b990798eb17
> ("PCI/MSI: Handle lack of irqdomain gracefully") [v6.13-rc5]

Yes. That fell through the cracks.

>> #regzbot from: "Aaron D. Johnson" <debbugreporter@fnord.greeley.co.us>
>> #regzbot monitor: https://bugs.debian.org/1127635
>
> Thx for forwarding the regression. Nothing happened since then -- or am
> I missing something? If so: is that okay for everybody, or should we do
> anything about this?
>
> BTW, did anyone check if this happens with mainline (6.13/7.0) as well
> to rule out that this is something that only happens in the stable
> series it was backported to? If it's the latter, I wonder if reverting
> it there might be an easy way to resolve this.

Let me have a look.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully 2026-03-11 11:22 ` Uwe Kleine-König 2026-04-22 8:50 ` Thorsten Leemhuis @ 2026-04-22 21:22 ` Thomas Gleixner 2026-04-22 22:34 ` Aaron D. Johnson 1 sibling, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2026-04-22 21:22 UTC (permalink / raw) To: Uwe Kleine-König Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Anup Patel, Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas, 1127635, Aaron D. Johnson, regressions On Wed, Mar 11 2026 at 12:22, Uwe Kleine-König wrote: > this patch became a60b990798eb17433d0283788280422b1bd94b18 in v6.13-rc5 > and was backported to 6.12.y and 6.6.y (aed157301c65 and b1f7476e07b9 > respectively). > > A Debian user (Aaron, on Cc:) on powerpc has boot problems and bisected > them to this commit. The relevant boot log of the failure is: > [ 2.643879] BUG: Kernel NULL pointer dereference on read at 0x00000000 > [ 2.643891] Faulting instruction address: 0xc000000000a39514 > [ 2.643902] Oops: Kernel access of bad area, sig: 11 [#1] > [ 2.643909] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > [ 2.643920] Modules linked in: ohci_pci(+) ehci_hcd nvme_fabrics ohci_hcd nvme_keyring nvme_core usbcore nvme_auth scsi_transport_fc ipr configfs ehea(+) usb_common > [ 2.643965] CPU: 5 UID: 0 PID: 250 Comm: (udev-worker) Not tainted 6.12.17-powerpc64 #1 Debian 6.12.17-1 > [ 2.643976] Hardware name: IBM,8204-E8A POWER6 (architected) 0x3e0302 0xf000002 of:IBM,EL350_118 hv:phyp pSeries > [ 2.643986] NIP: c000000000a39514 LR: c000000000a36ed8 CTR: c000000000a35820 > [ 2.643995] REGS: c0000000351f6f60 TRAP: 0300 Not tainted (6.12.17-powerpc64 Debian 6.12.17-1) > [ 2.644004] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24222288 XER: 00000000 > [ 2.644031] CFAR: c00000000000cfc4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 > [ 2.644031] GPR00: c000000000a36ed8 c0000000351f7200 c00000000182e200 c0000003df294000 > [ 2.644031] GPR04: 0000000000000000 0000000000000000 
0000000000000000 0000000000000000 > [ 2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 0000000044222288 > [ 2.644031] GPR12: c000000000a35820 c00000000eeacb00 0000000000000020 0000010037fcab20 > [ 2.644031] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80 > [ 2.644031] GPR20: 0000000000000000 c00000000204db60 c00000000204dd60 c00000000b1ae780 > [ 2.644031] GPR24: 0000000000000000 00003fff8c9ac758 0000000000000000 c0000003df294000 > [ 2.644031] GPR28: 0000000000000001 0000000000000000 c0000003df294000 0000000000000001 > [ 2.644164] NIP [c000000000a39514] pci_msi_domain_supports (drivers/pci/msi/irqdomain.c:366) > [ 2.644181] LR [c000000000a36ed8] __pci_enable_msi_range (drivers/pci/msi/msi.c:437) > [ 2.644192] Call Trace: > [ 2.644197] [c0000000351f7200] [c0000000351f7304] 0xc0000000351f7304 (unreliable) > [ 2.644211] [c0000000351f7340] [c000000000a3578c] pci_alloc_irq_vectors_affinity (drivers/pci/msi/api.c:277) > ======== > 0:* 41 82 00 2c beq 0x2c <-- trapping instruction > 4: e9 2a 00 88 ld r9,136(r10) > 8: 80 69 00 00 lwz r3,0(r9) > c: 7c 63 20 38 and r3,r3,r4 > 10: 7c 63 22 78 xor r3,r3,r4 > 14: 7c 63 00 34 cntlzw r3,r3 > 18: 54 63 d9 7e srwi r3,r3,5 > 1c: 78 63 07 e0 clrldi r3,r3,63 > 20: 4e 80 00 20 blr > 24: 60 00 00 00 nop > 28: 60 00 00 00 nop > 2c: e9 2a 00 20 ld r9,32(r10) > 30: 80 69 00 00 lwz r3,0(r9) > 34: 4b ff ff d8 b 0xc > 38: 60 00 00 00 nop > 3c: 7c a5 00 34 cntlzw r5,r5 > > Code starting with the faulting instruction > =========================================== > 0: 80 69 00 00 lwz r3,0(r9) > [ 2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 0000000044222288 So R9 is NULL, R10 is the domain pointer. 
> 4: 4b ff ff d8 b 0xffffffffffffffdc
> 8: 60 00 00 00 nop
> c: 7c a5 00 34 cntlzw r5,r5
> [ 2.644769] ---[ end trace 0000000000000000 ]---
>
>
> (That's the bug splat from the bug report piped through
> scripts/decode_stacktrace.sh)
>
> The kernel has CONFIG_PCI_MSI_ARCH_FALLBACKS=y, so the first hunk
> shouldn't change anything.

Correct. But the Oops is in the unchanged code part of
pci_msi_domain_supports().

> so the trapping happens in drivers/pci/msi/irqdomain.c:366 which is:
>
> 365 info = domain->host_data;
> 366 supported = info->flags;
>
> According to the register dump domain == r10 == NULL, but then this code

No. You are looking at the wrong register set. The second one is the
user space register set from the syscall entry.

R10 contains the domain pointer and R9 is NULL, which does not make any
sense.

On 6.12 power64 still uses the global PCI/MSI domain model. According to
the splat this is pseries so the global PCI/MSI domain is created in
__pseries_msi_allocate_domains() via pci_msi_create_irq_domain().

The latter takes a pointer to

    static struct msi_domain_info pseries_msi_domain_info;

which is assigned to the global PCI/MSI domain::host_data.

Upstream got rid of that and uses per-device domains, so it might have
been magically fixed by now, but I doubt it:

That new check in __pci_enable_msi_range() is benign as the actual
allocation code further down relies on domain::host_data being a valid
pointer as well. It might not reach that point due to the subsequent
checks, but if the PCI device has pdev::dev::msi::domain populated, then
this has to be either a global PCI/MSI domain or an MSI parent domain.
Both have domain::host_data populated with a msi_domain_info pointer.

Something is mighty fishy here.

Aaron, can you please apply the patch below and see whether it fixes the
issue and provide the dmesg with the output of those pr_warn()'s?

The other information which would be useful:

When you boot a kernel with the commit reverted and look at that OHCI
controller with lspci -vvv then you should see whether it has MSI
enabled or not. If it has MSI enabled, then please provide the output of

    /sys/kernel/debug/irq/irqs/$IRQNR

You need to enable CONFIG_GENERIC_IRQ_DEBUGFS for that. And that's
actually useful for the debug patch below too because you can then look
at the domain name output and gather more information from

    /sys/kernel/debug/irq/domains/$NAME

Thanks,

        tglx
---
 drivers/pci/msi/irqdomain.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -115,6 +115,8 @@ struct irq_domain *pci_msi_create_irq_do
 					     struct msi_domain_info *info,
 					     struct irq_domain *parent)
 {
+	struct irq_domain *domain;
+
 	if (WARN_ON(info->flags & MSI_FLAG_LEVEL_CAPABLE))
 		info->flags &= ~MSI_FLAG_LEVEL_CAPABLE;
 
@@ -135,7 +137,12 @@ struct irq_domain *pci_msi_create_irq_do
 	/* Let the core update the bus token */
 	info->bus_token = DOMAIN_BUS_PCI_MSI;
 
-	return msi_create_irq_domain(fwnode, info, parent);
+	domain = msi_create_irq_domain(fwnode, info, parent);
+	if (domain) {
+		pr_warn("Created global PCI/MSI domain %lx %s flags: %x\n",
+			(unsigned long)domain, domain->name, domain->flags);
+	}
+	return domain;
 }
 EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
 
@@ -356,6 +363,12 @@ bool pci_msi_domain_supports(struct pci_
 		return false;
 	}
 
+	if (!domain->host_data) {
+		pr_warn("Device MSI domain %lx %s %x lacks host data\n",
+			(unsigned long)domain, domain->name, domain->flags);
+		return false;
+	}
+
 	if (!irq_domain_is_msi_parent(domain)) {
 		/*
 		 * For "global" PCI/MSI interrupt domains the associated

^ permalink raw reply [flat|nested] 12+ messages in thread
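The effect of the guard in the last hunk can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel implementation: the sketch_* types and sketch_domain_supports() are hypothetical reductions, the caller must pass a non-NULL domain, and the legacy (no domain) and MSI parent paths are left out. It only shows the global-domain comparison, (supported & feature_mask) == feature_mask, with the NULL host_data test placed in front of it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel types. */
struct sketch_msi_domain_info {
	unsigned int flags;	/* feature bits the domain supports */
};

struct sketch_irq_domain {
	const char *name;
	unsigned int flags;
	struct sketch_msi_domain_info *host_data;
};

/*
 * Global-domain branch only. The domain pointer must be non-NULL.
 * Without the host_data test, a domain whose host_data is NULL would be
 * dereferenced when reading info->flags.
 */
static inline bool sketch_domain_supports(const struct sketch_irq_domain *domain,
					  unsigned int feature_mask)
{
	const struct sketch_msi_domain_info *info = domain->host_data;

	if (!info)
		return false;	/* the debug patch also emits a pr_warn() here */

	/* Every requested feature bit has to be set in the supported flags. */
	return (info->flags & feature_mask) == feature_mask;
}
```

A domain with host_data set and flags 0x3 accepts a request for 0x1 and rejects one for 0x4; a domain with host_data == NULL rejects every request instead of faulting.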
* Re: [Patch] PCI/MSI: Handle lack of irqdomain gracefully
  2026-04-22 21:22 ` Thomas Gleixner
@ 2026-04-22 22:34 ` Aaron D. Johnson
  0 siblings, 0 replies; 12+ messages in thread
From: Aaron D. Johnson @ 2026-04-22 22:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Uwe Kleine-König, Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt,
      Anup Patel, Sunil V L, linux-riscv, linux-kernel, Bjorn Helgaas,
      1127635, regressions

Thomas Gleixner writes:
> Aaron, can you please apply the patch below and see whether it fixes
> the issue and provide the dmesg with the output of those
> pr_warn()'s?

I can build it. But as stated earlier, the machine is no longer in my
possession. If it comes back (and it might -- its new owner is having
problems supplying sufficient input power in his apartment), I will be
happy to test patches.

> The other information which would be useful: When you boot a kernel
> with the commit reverted and look at that OHCI controller with lspci
> -vvv then you should see whether it has MSI enabled or not. If it
> has MSI enabled, then please provide the output of
>
> /sys/kernel/debug/irq/irqs/$IRQNR
>
> You need to enable CONFIG_GENERIC_IRQ_DEBUGFS for that.

Noted. And thanks for the time looking into it.

- Aaron

^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2024-12-13 11:57 [RFC PATCH] riscv: Fix PCI warning by enabling PCI_MSI_ARCH_FALLBACKS Alexandre Ghiti
2024-12-13 13:12 ` Thomas Gleixner
2024-12-13 13:51 ` Alexandre Ghiti
2024-12-14 11:50 ` [Patch] PCI/MSI: Handle lack of irqdomain gracefully Thomas Gleixner
2024-12-17 13:08 ` [tip: irq/urgent] " tip-bot2 for Thomas Gleixner
2025-02-03 19:16 ` [Patch] " patchwork-bot+linux-riscv
2026-03-11 11:22 ` Uwe Kleine-König
2026-04-22 8:50 ` Thorsten Leemhuis
2026-04-22 15:07 ` Aaron D. Johnson
2026-04-22 19:52 ` Thomas Gleixner
2026-04-22 21:22 ` Thomas Gleixner
2026-04-22 22:34 ` Aaron D. Johnson