From: Marc Zyngier <maz@kernel.org>
To: Johan Hovold <johan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
anna-maria@linutronix.de, shawnguo@kernel.org,
s.hauer@pengutronix.de, festevam@gmail.com, bhelgaas@google.com,
rdunlap@infradead.org, vidyas@nvidia.com,
ilpo.jarvinen@linux.intel.com, apatel@ventanamicro.com,
kevin.tian@intel.com, nipun.gupta@amd.com, den@valinux.co.jp,
andrew@lunn.ch, gregory.clement@bootlin.com,
sebastian.hesselbarth@gmail.com, gregkh@linuxfoundation.org,
rafael@kernel.org, alex.williamson@redhat.com, will@kernel.org,
lorenzo.pieralisi@arm.com, jgg@mellanox.com,
ammarfaizi2@gnuweeb.org, robin.murphy@arm.com,
lpieralisi@kernel.org, nm@ti.com, kristo@kernel.org,
vkoul@kernel.org, okaya@kernel.org, agross@kernel.org,
andersson@kernel.org, mark.rutland@arm.com,
shameerali.kolothum.thodi@huawei.com, yuzenghui@huawei.com
Subject: Re: [patch V4 00/21] genirq, irqchip: Convert ARM MSI handling to per device MSI domains
Date: Wed, 17 Jul 2024 13:54:40 +0100
Message-ID: <86msmg2n73.wl-maz@kernel.org>
In-Reply-To: <Zpdxe4ce-XwDEods@hovoldconsulting.com>
On Wed, 17 Jul 2024 08:23:39 +0100,
Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Jul 16, 2024 at 07:21:39PM +0100, Marc Zyngier wrote:
> > On Tue, 16 Jul 2024 15:53:28 +0100,
> > Johan Hovold <johan@kernel.org> wrote:
> > > On Tue, Jul 16, 2024 at 11:30:05AM +0100, Marc Zyngier wrote:
> > > > On Mon, 15 Jul 2024 15:10:01 +0100,
> > > > Johan Hovold <johan@kernel.org> wrote:
> > > > > On Mon, Jul 15, 2024 at 01:58:13PM +0100, Marc Zyngier wrote:
> > > > > > On Mon, 15 Jul 2024 12:18:47 +0100,
> > > > > > Johan Hovold <johan@kernel.org> wrote:
>
> > > > > > > This series only showed up in linux-next last Friday and broke interrupt
> > > > > > > handling on Qualcomm platforms like sc8280xp (e.g. Lenovo ThinkPad X13s)
> > > > > > > and x1e80100 that use the GIC ITS for PCIe MSIs.
> > > > > > >
> > > > > > > I've applied the series (21 commits from linux-next) on top of 6.10 and
> > > > > > > can confirm that the breakage is caused by commits:
> > > > > > >
> > > > > > > 3d1c927c08fc ("irqchip/gic-v3-its: Switch platform MSI to MSI parent")
> > > > > > > 233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]")
> > > > > > >
> > > > > > > Applying the series up until the change before 3d1c927c08fc unbreaks the
> > > > > > > wifi on one machine:
> > > > > > >
> > > > > > > ath11k_pci 0006:01:00.0: failed to enable msi: -22
> > > > > > > ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -22
> > >
> > > Correction, this doesn't fix the wifi, but I'm not seeing these errors
> > > with the commit before cc23d1dfc959 as the ath11k driver doesn't get
>
> [ This was supposed to say 3d1c927c08fc, which is the mainline hash,
> sorry. ]
>
> > > this far (or doesn't probe at all).
> >
> > I think we need to track one thing at a time. The wifi and nvme
> > problems seem subtly different... Which is the exact commit that
> > breaks nvme on your machine?
>
> Yeah, forget about 3d1c927c08fc for now, which may have been a red
> herring since we also appear to be dealing with some sort of race and
> (some) symptoms keep changing from boot to boot. The only thing that is
> certain is that the series breaks MSI and that the NVMe breaks with
> commit 233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for
> PCI/MSI[-X]").
>
> > > > So is this issue actually tied to the async probing? Does it always
> > > > work if you disable it?
> > >
> > > There seem to be multiple issues here.
> > >
> > > With the full series applied and normal async (i.e. parallel) probing of
> > > the PCIe controllers I sometimes see allocation failing with -ENOSPC
> > > (e.g. the above ath11k errors). This seems to indicate broken locking
> > > somewhere.
> >
> > Your log doesn't support this theory. At least not from an ITS
> > perspective, as it keeps dishing out INTIDs (and it is very hard to
> > run out of IRQs with the ITS).
>
> The log I shared was with synchronous probing which takes parallel
> allocation out of the equation (and gives more readable logs) so that is
> expected. See below for a log with normal async probing that may give
> some more insight into the race as well (i.e. when ath11k allocation
> fails with -ENOSPC.)
Huh, this log is actually pointing at something very ugly. Not a race,
but some horrible ID confusion. See below.
>
> > > With synchronous probing, allocation always seems to succeed but the
> > > ath11k (and modem) drivers time out as no interrupts are received.
> > >
> > > The NVMe driver sometimes falls back to INTx signalling and can access
> > > the drive, but often ends up with an MSIX (?!) allocation and then fails
> > > to probe:
> > >
> > > [ 132.084740] nvme nvme0: I/O tag 17 (1011) QID 0 timeout, completion polled
> >
> > So one of my test boxes (ThunderX) fails this exact way, while another
> > (Synquacer) is pretty happy. Still trying to understand the difference
> > in behaviour.
> >
> > How do you enforce synchronous probing?
>
> I believe there is a kernel parameter for this (e.g.
> module.async_probe), but I just disable async probing for the Qualcomm
> PCIe driver I'm using:
I had tried this module parameter, but it didn't change anything on my
end.
>
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -1684,7 +1684,7 @@ static struct platform_driver qcom_pcie_driver = {
> .name = "qcom-pcie",
> .of_match_table = qcom_pcie_match,
> .pm = &qcom_pcie_pm_ops,
> - .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> + //.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> },
> };
I'll have a look whether the TX1 PCIe driver uses this. It's
positively ancient, so I wouldn't bet that it has been touched
significantly in the past 5 years.
[...]
> [ 8.692011] Reusing ITT for devID 0
> [ 8.693668] Reusing ITT for devID 0
This is really odd. It indicates that you have several devices sharing
the same DeviceID, which I seriously doubt is the case in a
laptop. Do you have any non-transparent bridge here? lspci would help.
> [ 8.693871] pcieport 0006:00:00.0: PME: Signaling with IRQ 228
> [ 8.694116] pcieport 0006:00:00.0: AER: enabled with IRQ 228
> [ 8.696453] pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> [ 8.703760] IRQ206 -> 0-7 CPU2
> [ 8.710986] pci 0004:00:00.0: bridge window [mem 0x34300000-0x343fffff]
> [ 8.711136] Reusing ITT for devID 0
Where has the bus number gone?
> [ 8.717093] IRQ207 -> 0-7 CPU3
> [ 8.723889] Reusing ITT for devID 0
> [ 8.729600] IRQ208 -> 0-7 CPU4
> [ 8.736507] pcieport 0004:00:00.0: PME: Signaling with IRQ 229
> [ 8.744261] IRQ209 -> 0-7 CPU5
> [ 8.750757] pcieport 0004:00:00.0: AER: enabled with IRQ 229
> [ 8.758038] IRQ210 -> 0-7 CPU6
> [ 9.071793] IRQ211 -> 0-7 CPU7
> [ 9.071807] IRQ212 -> 0-7 CPU0
> [ 9.071819] IRQ213 -> 0-7 CPU1
> [ 9.071831] IRQ214 -> 0-7 CPU2
> [ 9.071842] IRQ215 -> 0-7 CPU3
> [ 9.071852] IRQ216 -> 0-7 CPU4
> [ 9.071863] IRQ217 -> 0-7 CPU5
> [ 9.071875] IRQ218 -> 0-7 CPU6
> [ 9.071886] IRQ219 -> 0-7 CPU7
> [ 9.071897] IRQ220 -> 0-7 CPU0
> [ 9.071907] IRQ221 -> 0-7 CPU1
> [ 9.071920] IRQ222 -> 0-7 CPU2
> [ 9.071930] IRQ223 -> 0-7 CPU3
> [ 9.071941] IRQ224 -> 0-7 CPU4
> [ 9.071952] IRQ225 -> 0-7 CPU5
> [ 9.071962] IRQ226 -> 0-7 CPU6
> [ 9.071973] IRQ227 -> 0-7 CPU7
> [ 9.073568] Reusing ITT for devID 0
> [ 9.073607] ID:0 pID:8192 vID:196
> [ 9.073618] IRQ196 -> 0-7 CPU0
> [ 9.073717] IRQ196 -> 0-7 CPU0
> [ 9.073737] pcieport 0002:00:00.0: PME: Signaling with IRQ 196
> [ 9.086532] pcieport 0002:00:00.0: AER: enabled with IRQ 196
> [ 9.102057] mhi-pci-generic 0004:01:00.0: MHI PCI device found: foxconn-sdx55
> [ 9.109830] mhi-pci-generic 0004:01:00.0: BAR 0 [mem 0x34300000-0x34300fff 64bit]: assigned
> [ 9.119027] mhi-pci-generic 0004:01:00.0: enabling device (0000 -> 0002)
> [ 9.127271] ITS: alloc 8224:8
> [ 9.141500] ITT 8 entries, 3 bits
> [ 9.144502] ID:0 pID:8224 vID:198
> [ 9.144597] ID:1 pID:8225 vID:199
> [ 9.144605] ID:2 pID:8226 vID:200
> [ 9.144612] ID:3 pID:8227 vID:201
> [ 9.144619] ID:4 pID:8228 vID:202
> [ 9.144689] IRQ198 -> 0-7 CPU1
> [ 9.144888] IRQ199 -> 0-7 CPU2
> [ 9.144901] IRQ200 -> 0-7 CPU3
> [ 9.144914] IRQ201 -> 0-7 CPU4
> [ 9.144927] IRQ202 -> 0-7 CPU5
> [ 9.151264] IRQ198 -> 0-7 CPU1
> [ 9.151479] IRQ199 -> 0-7 CPU2
> [ 9.151673] IRQ200 -> 0-7 CPU3
> [ 9.151849] IRQ201 -> 0-7 CPU4
> [ 9.152056] IRQ202 -> 0-7 CPU5
> [ 9.159972] mhi mhi0: Requested to power ON
> [ 9.165275] mhi mhi0: Power on setup success
> [ 9.279951] ath11k_pci 0006:01:00.0: BAR 0 [mem 0x30400000-0x305fffff 64bit]: assigned
> [ 9.288208] ath11k_pci 0006:01:00.0: enabling device (0000 -> 0002)
> [ 9.301708] nvme nvme0: pci function 0002:01:00.0
> [ 9.307052] Reusing ITT for devID 100
> [ 9.315457] nvme 0002:01:00.0: enabling device (0000 -> 0002)
This is device 0002:01:00.0...
> [ 9.326554] Reusing ITT for devID 100
... seen as device 0000:01:00.0. WTF???
> [ 9.336332] ath11k_pci 0006:01:00.0: ath11k_pci_alloc_msi - requesting one vector failed: -28
I'm starting to suspect that the new code doesn't carry all the
required bits for the DevID, and that we end up trying to allocate
interrupts from the pool allocated to another device, which can never
be a good thing, and would explain why everything dies a painful
death.
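
To make the suspicion concrete, here is a minimal sketch (not the
kernel code) of how two devices on different PCI segments can end up
with the same DevID if only the bus/device/function bits are used. The
helper names and the per-controller base values are made up for
illustration; the real translation comes from firmware tables and may
look quite different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI requester ID layout: bus[15:8], device[7:3], function[2:0]. */
static uint32_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return ((uint32_t)bus << 8) | ((uint32_t)(dev & 0x1f) << 3) | (fn & 0x7);
}

/*
 * Hypothetical translation from requester ID to ITS DeviceID: each
 * host controller contributes a distinct base so that identical RIDs
 * behind different controllers stay distinct. The base values passed
 * in by callers are illustrative assumptions, not real hardware values.
 */
static uint32_t its_dev_id(uint32_t ctrl_base, uint32_t rid)
{
	return ctrl_base + rid;
}

/* Two devices collide at the ITS if they present the same DeviceID. */
static bool dev_ids_collide(uint32_t a, uint32_t b)
{
	return a == b;
}
```

With this sketch, 0002:01:00.0 and 0006:01:00.0 both give
pci_rid(1, 0, 0) == 0x100, which matches the "devID 100" in the trace
above. They only become distinct once a per-controller base is added
by its_dev_id().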
Can you run the same trace with the whole thing reverted? I think
we're onto something here.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.