Date: Wed, 17 Jul 2024 19:07:15 +0100
Message-ID: <86le1z3nak.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Johan Hovold <johan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
	anna-maria@linutronix.de, shawnguo@kernel.org,
	s.hauer@pengutronix.de, festevam@gmail.com, bhelgaas@google.com,
	rdunlap@infradead.org, vidyas@nvidia.com,
	ilpo.jarvinen@linux.intel.com, apatel@ventanamicro.com,
	kevin.tian@intel.com, nipun.gupta@amd.com, den@valinux.co.jp,
	andrew@lunn.ch, gregory.clement@bootlin.com,
	sebastian.hesselbarth@gmail.com, gregkh@linuxfoundation.org,
	rafael@kernel.org, alex.williamson@redhat.com, will@kernel.org,
	lorenzo.pieralisi@arm.com, jgg@mellanox.com,
	ammarfaizi2@gnuweeb.org, robin.murphy@arm.com,
	lpieralisi@kernel.org, nm@ti.com, kristo@kernel.org,
	vkoul@kernel.org, okaya@kernel.org, agross@kernel.org,
	andersson@kernel.org, mark.rutland@arm.com,
	shameerali.kolothum.thodi@huawei.com, yuzenghui@huawei.com
Subject: Re: [patch V4 00/21] genirq, irqchip: Convert ARM MSI handling to per device MSI domains
References: <20240623142137.448898081@linutronix.de>
	<878qy26cd6.wl-maz@kernel.org>
	<86r0bt39zm.wl-maz@kernel.org>
	<86plrd2o5o.wl-maz@kernel.org>
	<86msmg2n73.wl-maz@kernel.org>

On Wed, 17 Jul 2024 14:38:59 +0100,
Johan Hovold wrote:
>
> On Wed, Jul 17, 2024 at 01:54:40PM +0100, Marc Zyngier wrote:
> > On Wed, 17 Jul 2024 08:23:39 +0100,
> > Johan Hovold wrote:
>
> > > I believe there is a kernel parameter for this (e.g.
> > > module.async_probe), but I just disable async probing for the Qualcomm
> > > PCIe driver I'm using:
> >
> > I had tried this module parameter, but it didn't change anything on my
> > end.
>
> > I'll have a look whether the TX1 PCIe driver uses this. It's
> > positively ancient, so I wouldn't bet that it has been touched
> > significantly in the past 5 years.
>
> Perhaps async probing just changes the symptoms, the NVMe and wifi
> doesn't work in either case.

Yeah, my impression is that this changes the order in which LPIs get
allocated, but the core symptom is the same.

>
> > > [    8.692011] Reusing ITT for devID 0
> > > [    8.693668] Reusing ITT for devID 0
> >
> > This is really odd. It indicates that you have several devices sharing
> > the same DeviceID, which I seriously doubt it is the case in a
> > laptop. Do you have any non-transparent bridge here? lspci would help.
>
> Yeah, and these messages do not show up without the series (see log
> below). They are there in the previous synchronous log however.
>
> 0002:00:00.0 PCI bridge: Qualcomm Technologies, Inc SC8280XP PCI Express Root Port
> 0002:01:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller BG4 (DRAM-less)
> 0004:00:00.0 PCI bridge: Qualcomm Technologies, Inc SC8280XP PCI Express Root Port
> 0004:01:00.0 Unassigned class [ff00]: Qualcomm Technologies, Inc SDX55 [Snapdragon X55 5G]
> 0006:00:00.0 PCI bridge: Qualcomm Technologies, Inc SC8280XP PCI Express Root Port
> 0006:01:00.0 Network controller: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)

Right, this is a very straightforward setup, Design-crap-ware-style.
Nothing that would alias any device.

>
> > I'm starting to suspect that the new code doesn't carry all the
> > required bits for the DevID, and that we end-up trying to allocated
> > interrupts from the pool allocated to another device, which can never
> > be a good thing, and would explain why everything dies a painful
> > death.
> >
> > Can you run the same trace with the whole thing reverted? I think
> > we're on something here.
>
> See below, using normal asynchronous probing like the previous log.

And as expected, no aliasing showing up in this log. Somehow, we're
not able to distinguish between the different PCI domains anymore,
leading to all sorts of funnies.

For the record, I've added some extra debug in the its driver and ran
the result on TX1, old and new kernels.

Before this series:

[   10.139806] nvme nvme0: pci function 0006:58:00.0
[   10.158599] nvme 0006:58:00.0: devid = 35800

With this series:

[   10.143729] nvme nvme0: pci function 0006:58:00.0
[   10.181775] nvme 0006:58:00.0: devid = 5800

Clearly, we've lost something in the battle. I'll keep digging.

	M.

-- 
Without deviation from the norm, progress is not possible.
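
A note on the two devid values logged above: 5800 is what you get by packing the bus/device/function of 58:00.0 into a 16-bit PCI requester ID, and the older value 35800 is that same number with 0x30000 added on top. The sketch below only reproduces that arithmetic. It assumes the logged values are hexadecimal, and it assumes the extra upper bits come from a fixed per-domain offset of the kind an msi-map style translation window applies. The helper names and the 0x30000 window are hypothetical, inferred from the two log lines alone, and are not taken from the ITS driver.

```python
def pci_requester_id(bus: int, device: int, function: int) -> int:
    """Pack bus[15:8], device[7:3], function[2:0] into a 16-bit requester ID."""
    return (bus << 8) | (device << 3) | function


def translate_rid(rid: int, rid_base: int, out_base: int, length: int) -> int:
    """Apply one hypothetical translation window.

    Requester IDs in [rid_base, rid_base + length) map to
    out_base + (rid - rid_base), which is how a single msi-map style
    entry is usually described.
    """
    if not rid_base <= rid < rid_base + length:
        raise ValueError(f"RID {rid:#x} is outside the translation window")
    return out_base + (rid - rid_base)


# 0006:58:00.0 from the log: bus 0x58, device 0, function 0.
rid = pci_requester_id(0x58, 0x00, 0x0)
print(hex(rid))  # 0x5800, the value logged with the series applied

# ASSUMPTION: a window adding 0x30000, chosen only because it is the
# difference between the two logged values.
print(hex(translate_rid(rid, 0x0, 0x30000, 0x10000)))  # 0x35800
```

If that reading holds, the value seen with the series applied would be the bare requester ID with the per-domain part missing, which fits the observation that devices in different PCI domains can no longer be told apart.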