From: Marc Zyngier <maz@kernel.org>
To: Johan Hovold <johan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, LKML <linux-kernel@vger.kernel.org>,
    linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
    anna-maria@linutronix.de, shawnguo@kernel.org, s.hauer@pengutronix.de,
    festevam@gmail.com, bhelgaas@google.com, rdunlap@infradead.org,
    vidyas@nvidia.com, ilpo.jarvinen@linux.intel.com, apatel@ventanamicro.com,
    kevin.tian@intel.com, nipun.gupta@amd.com, den@valinux.co.jp,
    andrew@lunn.ch, gregory.clement@bootlin.com,
    sebastian.hesselbarth@gmail.com, gregkh@linuxfoundation.org,
    rafael@kernel.org, alex.williamson@redhat.com, will@kernel.org,
    lorenzo.pieralisi@arm.com, jgg@mellanox.com, ammarfaizi2@gnuweeb.org,
    robin.murphy@arm.com, lpieralisi@kernel.org, nm@ti.com, kristo@kernel.org,
    vkoul@kernel.org, okaya@kernel.org, agross@kernel.org,
    andersson@kernel.org, mark.rutland@arm.com,
    shameerali.kolothum.thodi@huawei.com, yuzenghui@huawei.com
Subject: Re: [patch V4 00/21] genirq, irqchip: Convert ARM MSI handling to per device MSI domains
Date: Tue, 16 Jul 2024 19:21:39 +0100
Message-ID: <86plrd2o5o.wl-maz@kernel.org>
References: <20240623142137.448898081@linutronix.de> <878qy26cd6.wl-maz@kernel.org> <86r0bt39zm.wl-maz@kernel.org>

[Dropping shivamurthy.shastri@linutronix.de who is now bouncing...]

On Tue, 16 Jul 2024 15:53:28 +0100,
Johan Hovold wrote:
>
> On Tue, Jul 16, 2024 at 11:30:05AM +0100, Marc Zyngier wrote:
> > On Mon, 15 Jul 2024 15:10:01 +0100,
> > Johan Hovold wrote:
> > > On Mon, Jul 15, 2024 at 01:58:13PM +0100, Marc Zyngier wrote:
> > > > On Mon, 15 Jul 2024 12:18:47 +0100,
> > > > Johan Hovold wrote:
> > > > > On Sun, Jun 23, 2024 at 05:18:31PM +0200, Thomas Gleixner wrote:
> > > > > > This is version 4 of the series to convert ARM MSI handling over to
> > > > > > per device MSI domains.
> > > > >
> > > > > This series only showed up in linux-next last Friday and broke interrupt
> > > > > handling on Qualcomm platforms like sc8280xp (e.g. Lenovo ThinkPad X13s)
> > > > > and x1e80100 that use the GIC ITS for PCIe MSIs.
> > > > >
> > > > > I've applied the series (21 commits from linux-next) on top of 6.10 and
> > > > > can confirm that the breakage is caused by commits:
> > > > >
> > > > >     3d1c927c08fc ("irqchip/gic-v3-its: Switch platform MSI to MSI parent")
> > > > >     233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]")
> > > > >
> > > > > Applying the series up until the change before 3d1c927c08fc unbreaks the
> > > > > wifi on one machine:
> > > > >
> > > > >     ath11k_pci 0006:01:00.0: failed to enable msi: -22
> > > > >     ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -22
>
> Correction, this doesn't fix the wifi, but I'm not seeing these errors
> with the commit before cc23d1dfc959 as the ath11k driver doesn't get
> this far (or doesn't probe at all).

I think we need to track one thing at a time. The wifi and nvme
problems seem subtly different...

Which is the exact commit that breaks nvme on your machine?

[...]

> > So is this issue actually tied to the async probing? Does it always
> > work if you disable it?
>
> There seem to be multiple issues here.
>
> With the full series applied and normal async (i.e. parallel) probing of
> the PCIe controllers I sometimes see allocation failing with -ENOSPC
> (e.g. the above ath11k errors). This seems to indicate broken locking
> somewhere.

Your log doesn't support this theory. At least not from an ITS
perspective, as it keeps dishing out INTIDs (and it is very hard to
run out of IRQs with the ITS).

>
> With synchronous probing, allocation always seems to succeed but the
> ath11k (and modem) drivers time out as no interrupts are received.
>
> The NVMe driver sometimes falls back to INTx signalling and can access
> the drive, but often ends up with an MSIX (?!) allocation and then fails
> to probe:
>
>     [  132.084740] nvme nvme0: I/O tag 17 (1011) QID 0 timeout, completion polled

So one of my test boxes (ThunderX) fails this exact way, while another
(Synquacer) is pretty happy. Still trying to understand the difference
in behaviour.

How do you enforce synchronous probing?

	M.

-- 
Without deviation from the norm, progress is not possible.