Date: Wed, 24 Jun 2026 08:07:28 +0100
Message-ID: <86o6h0quvj.wl-maz@kernel.org>
From: Marc Zyngier
To: Jinqian Yang
Subject: Re: [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation
In-Reply-To: <20260624025345.458387-1-yangjinqian1@huawei.com>
References: <20260624025345.458387-1-yangjinqian1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 24 Jun 2026 03:53:45 +0100, Jinqian Yang wrote:
>
> On ARM64 platforms with GICv3 ITS, VFIO PCI passthrough currently
> cannot dynamically allocate MSI-X vectors after MSI-X has been
> enabled. When QEMU needs to extend the vector range, it must
> disable MSI-X, free all interrupts, then re-enable with a larger
> allocation. This creates an interrupt loss window for already-active
> vectors.
>
> Consider HNS3 with RoCE: NIC and RDMA share one PCI device and
> ITS DeviceID, with MSI-X vectors partitioned as NIC (lower range)
> then RoCE (starting at base_vector = num_nic_msi). In VFIO
> passthrough, loading hns_roce after hns3 forces QEMU to tear down
> all interrupts before re-allocating the larger range. During this
> process, NIC interrupts may be lost. Testing confirmed that this
> occasionally occurs, causing the network port reset to fail.

Well, that's what you get for not exposing differentiated
functions. Eventually, you face the reality that this is a poor
design.

>
> ITS_MSI_FLAGS_SUPPORTED lacks MSI_FLAG_PCI_MSIX_ALLOC_DYN, causing
> pci_msix_can_alloc_dyn() to return false. VFIO then sets
> has_dyn_msix=false and never clears VFIO_IRQ_INFO_NORESIZE for
> MSI-X, keeping the old "disable and reallocate" behavior.
>
> The essential prerequisite for enabling this flag is the fix to
> msi_prepare() call timing (commit 1396e89e09f0 ("genirq/msi: Move
> prepare() call to per-device allocation")): msi_prepare() is
> now called once at per-device domain creation with hwsize, so ITS
> creates an ITT with sufficient capacity for all MSI-X vectors.
> Without this fix, msi_prepare() was called per-allocation with
> semi-random nvec, maybe resulting in an ITT too small for dynamic
> vector addition.

How is this paragraph relevant? The kernel has had this fix for over a
year, and backporting this series is not something I plan to ever do.

>
> With this in place, dynamic MSI-X allocation works correctly:
> msi_domain_alloc_irq_at() uses populate_alloc_info() to copy the
> pre-prepared alloc_data without re-invoking msi_prepare(), so each
> new vector simply gets a LPI entry in the already-allocated ITT,
> without affecting existing vectors.
>
> Signed-off-by: Jinqian Yang
> ---
>  drivers/irqchip/irq-gic-its-msi-parent.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
> index b9257103a999..b2b9d2068bb1 100644
> --- a/drivers/irqchip/irq-gic-its-msi-parent.c
> +++ b/drivers/irqchip/irq-gic-its-msi-parent.c
> @@ -18,7 +18,8 @@
>
>  #define ITS_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK | \
>                                   MSI_FLAG_PCI_MSIX | \
> -                                 MSI_FLAG_MULTI_PCI_MSI)
> +                                 MSI_FLAG_MULTI_PCI_MSI | \
> +                                 MSI_FLAG_PCI_MSIX_ALLOC_DYN)
>
>  static int its_translate_frame_address(struct fwnode_handle *msi_node, phys_addr_t *pa)
>  {

What has this been tested with? In which conditions?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.