* [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation
@ 2026-06-24 2:53 Jinqian Yang
2026-06-24 7:07 ` Marc Zyngier
0 siblings, 1 reply; 3+ messages in thread
From: Jinqian Yang @ 2026-06-24 2:53 UTC (permalink / raw)
To: lpieralisi, maz, tglx, alex
Cc: linux-kernel, linux-arm-kernel, liuyonglong, wangzhou1, linuxarm,
Jinqian Yang

On ARM64 platforms with GICv3 ITS, VFIO PCI passthrough currently
cannot dynamically allocate MSI-X vectors after MSI-X has been
enabled. When QEMU needs to extend the vector range, it must
disable MSI-X, free all interrupts, then re-enable with a larger
allocation. This creates an interrupt loss window for already-active
vectors.

Consider HNS3 with RoCE: NIC and RDMA share one PCI device and
ITS DeviceID, with MSI-X vectors partitioned as NIC (lower range)
then RoCE (starting at base_vector = num_nic_msi). In VFIO
passthrough, loading hns_roce after hns3 forces QEMU to tear down
all interrupts before re-allocating the larger range. During this
process, NIC interrupts may be lost. Testing confirmed that this
occasionally occurs, causing the network port reset to fail.

ITS_MSI_FLAGS_SUPPORTED lacks MSI_FLAG_PCI_MSIX_ALLOC_DYN, causing
pci_msix_can_alloc_dyn() to return false. VFIO then sets
has_dyn_msix=false and never clears VFIO_IRQ_INFO_NORESIZE for
MSI-X, keeping the old "disable and reallocate" behavior.

The essential prerequisite for enabling this flag is the fix to
msi_prepare() call timing (commit 1396e89e09f0 ("genirq/msi: Move
prepare() call to per-device allocation")): msi_prepare() is
now called once at per-device domain creation with hwsize, so ITS
creates an ITT with sufficient capacity for all MSI-X vectors.
Without this fix, msi_prepare() was called per-allocation with
semi-random nvec, possibly resulting in an ITT too small for dynamic
vector addition.

With this in place, dynamic MSI-X allocation works correctly:
msi_domain_alloc_irq_at() uses populate_alloc_info() to copy the
pre-prepared alloc_data without re-invoking msi_prepare(), so each
new vector simply gets an LPI entry in the already-allocated ITT,
without affecting existing vectors.

Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
---
drivers/irqchip/irq-gic-its-msi-parent.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
index b9257103a999..b2b9d2068bb1 100644
--- a/drivers/irqchip/irq-gic-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-its-msi-parent.c
@@ -18,7 +18,8 @@
 
 #define ITS_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK | \
                                  MSI_FLAG_PCI_MSIX | \
-                                 MSI_FLAG_MULTI_PCI_MSI)
+                                 MSI_FLAG_MULTI_PCI_MSI | \
+                                 MSI_FLAG_PCI_MSIX_ALLOC_DYN)
 
 static int its_translate_frame_address(struct fwnode_handle *msi_node, phys_addr_t *pa)
 {
--
2.33.0
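
For readers following the thread, the gating described in the commit message can be sketched as a small standalone C model. This is not kernel code: the DEMO_* names and bit positions are invented for illustration (the real flags are defined in include/linux/msi.h with different values), and the masking step reflects the assumption that a per-device MSI domain only keeps the feature flags its parent domain advertises as supported.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit positions, for illustration only. The real
 * MSI_FLAG_* values are defined in include/linux/msi.h.
 */
#define DEMO_FLAG_PCI_MSIX		(UINT32_C(1) << 0)
#define DEMO_FLAG_MULTI_PCI_MSI		(UINT32_C(1) << 1)
#define DEMO_FLAG_PCI_MSIX_ALLOC_DYN	(UINT32_C(1) << 2)

/* Model of the parent's supported mask before and after the patch. */
#define DEMO_ITS_SUPPORTED_OLD \
	(DEMO_FLAG_PCI_MSIX | DEMO_FLAG_MULTI_PCI_MSI)
#define DEMO_ITS_SUPPORTED_NEW \
	(DEMO_ITS_SUPPORTED_OLD | DEMO_FLAG_PCI_MSIX_ALLOC_DYN)

/*
 * Assumption: flags requested by the per-device domain are filtered
 * by what the parent domain supports.
 */
static uint32_t demo_effective_flags(uint32_t requested, uint32_t supported)
{
	return requested & supported;
}

/* Stand-in for the decision pci_msix_can_alloc_dyn() reports. */
static bool demo_can_alloc_dyn(uint32_t effective)
{
	return (effective & DEMO_FLAG_PCI_MSIX_ALLOC_DYN) != 0;
}
```

With the old mask, a request for dynamic allocation is filtered out and demo_can_alloc_dyn() returns false. With the new mask it returns true, which, per the commit message, is the condition under which VFIO stops reporting VFIO_IRQ_INFO_NORESIZE for MSI-X.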
* Re: [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation
2026-06-24 2:53 [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation Jinqian Yang
@ 2026-06-24 7:07 ` Marc Zyngier
2026-06-24 9:29 ` Jinqian Yang
0 siblings, 1 reply; 3+ messages in thread
From: Marc Zyngier @ 2026-06-24 7:07 UTC (permalink / raw)
To: Jinqian Yang
Cc: lpieralisi, tglx, alex, linux-kernel, linux-arm-kernel,
liuyonglong, wangzhou1, linuxarm
On Wed, 24 Jun 2026 03:53:45 +0100,
Jinqian Yang <yangjinqian1@huawei.com> wrote:
>
> On ARM64 platforms with GICv3 ITS, VFIO PCI passthrough currently
> cannot dynamically allocate MSI-X vectors after MSI-X has been
> enabled. When QEMU needs to extend the vector range, it must
> disable MSI-X, free all interrupts, then re-enable with a larger
> allocation. This creates an interrupt loss window for already-active
> vectors.
>
> Consider HNS3 with RoCE: NIC and RDMA share one PCI device and
> ITS DeviceID, with MSI-X vectors partitioned as NIC (lower range)
> then RoCE (starting at base_vector = num_nic_msi). In VFIO
> passthrough, loading hns_roce after hns3 forces QEMU to tear down
> all interrupts before re-allocating the larger range. During this
> process, NIC interrupts may be lost. Testing confirmed that this
> occasionally occurs, causing the network port reset to fail.
Well, that's what you get for not exposing differentiated functions.
Eventually, you face the reality that this is a poor design.
>
> ITS_MSI_FLAGS_SUPPORTED lacks MSI_FLAG_PCI_MSIX_ALLOC_DYN, causing
> pci_msix_can_alloc_dyn() to return false. VFIO then sets
> has_dyn_msix=false and never clears VFIO_IRQ_INFO_NORESIZE for
> MSI-X, keeping the old "disable and reallocate" behavior.
>
> The essential prerequisite for enabling this flag is the fix to
> msi_prepare() call timing (commit 1396e89e09f0 ("genirq/msi: Move
> prepare() call to per-device allocation")): msi_prepare() is
> now called once at per-device domain creation with hwsize, so ITS
> creates an ITT with sufficient capacity for all MSI-X vectors.
> Without this fix, msi_prepare() was called per-allocation with
> semi-random nvec, possibly resulting in an ITT too small for dynamic
> vector addition.
How is this paragraph relevant? The kernel has had this fix for over a
year, and backporting this series is not something I plan to ever do.
>
> With this in place, dynamic MSI-X allocation works correctly:
> msi_domain_alloc_irq_at() uses populate_alloc_info() to copy the
> pre-prepared alloc_data without re-invoking msi_prepare(), so each
> new vector simply gets an LPI entry in the already-allocated ITT,
> without affecting existing vectors.
>
> Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
> ---
> drivers/irqchip/irq-gic-its-msi-parent.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
> index b9257103a999..b2b9d2068bb1 100644
> --- a/drivers/irqchip/irq-gic-its-msi-parent.c
> +++ b/drivers/irqchip/irq-gic-its-msi-parent.c
> @@ -18,7 +18,8 @@
>
> #define ITS_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK | \
> MSI_FLAG_PCI_MSIX | \
> - MSI_FLAG_MULTI_PCI_MSI)
> + MSI_FLAG_MULTI_PCI_MSI | \
> + MSI_FLAG_PCI_MSIX_ALLOC_DYN)
>
> static int its_translate_frame_address(struct fwnode_handle *msi_node, phys_addr_t *pa)
> {
What has this been tested with? In which conditions?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation
2026-06-24 7:07 ` Marc Zyngier
@ 2026-06-24 9:29 ` Jinqian Yang
0 siblings, 0 replies; 3+ messages in thread
From: Jinqian Yang @ 2026-06-24 9:29 UTC (permalink / raw)
To: Marc Zyngier
Cc: lpieralisi, tglx, alex, linux-kernel, linux-arm-kernel,
liuyonglong, wangzhou1, linuxarm
On 2026/6/24 15:07, Marc Zyngier wrote:
> On Wed, 24 Jun 2026 03:53:45 +0100,
> Jinqian Yang <yangjinqian1@huawei.com> wrote:
>>
>> On ARM64 platforms with GICv3 ITS, VFIO PCI passthrough currently
>> cannot dynamically allocate MSI-X vectors after MSI-X has been
>> enabled. When QEMU needs to extend the vector range, it must
>> disable MSI-X, free all interrupts, then re-enable with a larger
>> allocation. This creates an interrupt loss window for already-active
>> vectors.
>>
>> Consider HNS3 with RoCE: NIC and RDMA share one PCI device and
>> ITS DeviceID, with MSI-X vectors partitioned as NIC (lower range)
>> then RoCE (starting at base_vector = num_nic_msi). In VFIO
>> passthrough, loading hns_roce after hns3 forces QEMU to tear down
>> all interrupts before re-allocating the larger range. During this
>> process, NIC interrupts may be lost. Testing confirmed that this
>> occasionally occurs, causing the network port reset to fail.
>
> Well, that's what you get for not exposing differentiated functions.
> Eventually, you face the reality that this is a poor design.
>
Fair point, though this is not unique to HNS3. All major NIC+RDMA
vendors expose NIC and RDMA through the same PCI function.
>>
>> ITS_MSI_FLAGS_SUPPORTED lacks MSI_FLAG_PCI_MSIX_ALLOC_DYN, causing
>> pci_msix_can_alloc_dyn() to return false. VFIO then sets
>> has_dyn_msix=false and never clears VFIO_IRQ_INFO_NORESIZE for
>> MSI-X, keeping the old "disable and reallocate" behavior.
>>
>> The essential prerequisite for enabling this flag is the fix to
>> msi_prepare() call timing (commit 1396e89e09f0 ("genirq/msi: Move
>> prepare() call to per-device allocation")): msi_prepare() is
>> now called once at per-device domain creation with hwsize, so ITS
>> creates an ITT with sufficient capacity for all MSI-X vectors.
>> Without this fix, msi_prepare() was called per-allocation with
>> semi-random nvec, possibly resulting in an ITT too small for dynamic
>> vector addition.
>
> How is this paragraph relevant? The kernel has had this fix for over a
> year, and backporting this series is not something I plan to ever do.
>
Will remove it from the commit message.
>>
>> With this in place, dynamic MSI-X allocation works correctly:
>> msi_domain_alloc_irq_at() uses populate_alloc_info() to copy the
>> pre-prepared alloc_data without re-invoking msi_prepare(), so each
>> new vector simply gets an LPI entry in the already-allocated ITT,
>> without affecting existing vectors.
>>
>> Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
>> ---
>> drivers/irqchip/irq-gic-its-msi-parent.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
>> index b9257103a999..b2b9d2068bb1 100644
>> --- a/drivers/irqchip/irq-gic-its-msi-parent.c
>> +++ b/drivers/irqchip/irq-gic-its-msi-parent.c
>> @@ -18,7 +18,8 @@
>>
>> #define ITS_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK | \
>> MSI_FLAG_PCI_MSIX | \
>> - MSI_FLAG_MULTI_PCI_MSI)
>> + MSI_FLAG_MULTI_PCI_MSI | \
>> + MSI_FLAG_PCI_MSIX_ALLOC_DYN)
>>
>> static int its_translate_frame_address(struct fwnode_handle *msi_node, phys_addr_t *pa)
>> {
>
> What has this been tested with? In which conditions?
>
Tested on HiSilicon HIP09 (ARM64, GICv3/GICv4.1) with the latest
upstream kernel and QEMU 8.2.

VFIO passthrough of an HNS3 NIC to a VM: load both the hns3 and
hns_roce_hw_v2 drivers, then trigger FLR. Without the flag,
QEMU disables/re-enables MSI-X around FLR, causing occasional
link-up failures due to interrupt loss.

Thanks,
Jinqian
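
The userspace consequence described in the test report above can be sketched as a small C model. It is an illustration under stated assumptions, not QEMU code: the DEMO_* names are hypothetical, the flag value mirrors VFIO_IRQ_INFO_NORESIZE from <linux/vfio.h> so that the sketch builds without kernel headers, and the "disturbed vectors" count simply encodes the claim made in this thread that a teardown touches every active vector while in-place growth touches none.

```c
#include <stdint.h>

/* Mirrors VFIO_IRQ_INFO_NORESIZE; redefined so no kernel headers are needed. */
#define DEMO_VFIO_IRQ_INFO_NORESIZE	(UINT32_C(1) << 3)

/* Hypothetical names for the two ways userspace can grow the MSI-X range. */
enum demo_grow_plan {
	DEMO_GROW_IN_PLACE,		/* add vectors, keep existing ones live */
	DEMO_TEARDOWN_AND_REALLOC,	/* disable MSI-X, free all, re-enable */
};

/* Pick a plan from the flags reported for the MSI-X IRQ index. */
static enum demo_grow_plan demo_plan_msix_growth(uint32_t irq_info_flags)
{
	if (irq_info_flags & DEMO_VFIO_IRQ_INFO_NORESIZE)
		return DEMO_TEARDOWN_AND_REALLOC;
	return DEMO_GROW_IN_PLACE;
}

/* Number of already-active vectors that pass through a disabled window. */
static unsigned int demo_vectors_disturbed(enum demo_grow_plan plan,
					   unsigned int active_vectors)
{
	return plan == DEMO_TEARDOWN_AND_REALLOC ? active_vectors : 0;
}
```

In this model, a device whose MSI-X index still reports NORESIZE exposes all of its active NIC vectors to the loss window when the RoCE vectors are added, whereas a device without NORESIZE exposes none.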