[PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore
@ 2026-03-09 12:31 Julian Vetter
  2026-03-09 12:31 ` [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind Julian Vetter
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

Register a check callback for the IOAPIC HVM save/restore entry,
following the pattern established by vpic_check() for the virtual PIC.
The function first verifies the target domain actually has a virtual
IO-APIC, returning -ENODEV otherwise. It then iterates all redirection
table entries and rejects any saved state where reserved bit fields are
non-zero, ensuring the state being loaded, represents actually reachable
hardware state before the restore is committed.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- As suggested by Jan, added new patch that does a generic check of the
  vIOAPIC state before migration
---
 xen/arch/x86/hvm/vioapic.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 7c725f9e47..b9acdd8af6 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -594,6 +594,35 @@ int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
     return vioapic->redirtbl[pin].fields.trig_mode;
 }
 
+static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h)
+{
+    const HVM_SAVE_TYPE(IOAPIC) *s;
+    unsigned int i;
+
+    if ( !has_vioapic(d) )
+        return -ENODEV;
+
+    s = hvm_get_entry(IOAPIC, h);
+    if ( !s )
+        return -ENODATA;
+
+    for ( i = 0; i < ARRAY_SIZE(s->redirtbl); i++ )
+    {
+        const union vioapic_redir_entry *e = &s->redirtbl[i];
+
+        /*
+         * Check to-be-loaded values are within valid range, for them to
+         * represent actually reachable state.
+         */
+        if ( e->fields.reserve ||
+             e->fields.reserved[0] || e->fields.reserved[1] ||
+             e->fields.reserved[2] || e->fields.reserved[3] )
+            return -EINVAL;
+    }
+
+    return 0;
+}
+
 static int cf_check ioapic_save(struct vcpu *v, hvm_domain_context_t *h)
 {
     const struct domain *d = v->domain;
-- 
2.51.0



--
 | Vates

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-11 15:30   ` Jan Beulich
  2026-03-09 12:31 ` [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields Julian Vetter
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

In pt_irq_create_bind() and _hvm_dpci_msi_eoi() replace the bare
MASK_EXTR(..., XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) calls with
XEN_DOMCTL_VMSI_X86_FULL_DEST() so that the high 7 destination ID bits
stored in XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK are included when
computing the target vCPU for MSI passthrough IRQs. Increase the local
dest and delivery_mode variables to uint32_t to match.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- New patch, but no changes to previous patchset
---
 xen/drivers/passthrough/x86/hvm.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index b73bb55055..9c3c8d28d6 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -281,7 +281,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, delivery_mode;
+        uint32_t dest, delivery_mode;
         bool dest_mode;
         int dest_vcpu_id;
         const struct vcpu *vcpu;
@@ -357,8 +357,7 @@ int pt_irq_create_bind(
             }
         }
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
-        dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
-                         XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
+        dest = XEN_DOMCTL_VMSI_X86_FULL_DEST(pirq_dpci->gmsi.gflags);
         dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK;
         delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
                                   XEN_DOMCTL_VMSI_X86_DELIV_MASK);
@@ -807,8 +806,7 @@ static int cf_check _hvm_dpci_msi_eoi(
     if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
          (pirq_dpci->gmsi.gvec == vector) )
     {
-        unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
-                                      XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
+        unsigned int dest = XEN_DOMCTL_VMSI_X86_FULL_DEST(pirq_dpci->gmsi.gflags);
         bool dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK;
 
         if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest,
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind
  2026-03-09 12:31 ` [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind Julian Vetter
@ 2026-03-11 15:30   ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-11 15:30 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> In pt_irq_create_bind() and _hvm_dpci_msi_eoi() replace the bare
> MASK_EXTR(..., XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) calls with
> XEN_DOMCTL_VMSI_X86_FULL_DEST() so that the high 7 destination ID bits
> stored in XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK are included when
> computing the target vCPU for MSI passthrough IRQs. Increase the local
> dest and delivery_mode variables to uint32_t to match.

Why exactly would the latter also need widening?

> --- a/xen/drivers/passthrough/x86/hvm.c
> +++ b/xen/drivers/passthrough/x86/hvm.c
> @@ -281,7 +281,7 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, delivery_mode;
> +        uint32_t dest, delivery_mode;

Please prefer here (for dest) ...

> @@ -807,8 +806,7 @@ static int cf_check _hvm_dpci_msi_eoi(
>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>           (pirq_dpci->gmsi.gvec == vector) )
>      {
> -        unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
> -                                      XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
> +        unsigned int dest = XEN_DOMCTL_VMSI_X86_FULL_DEST(pirq_dpci->gmsi.gflags);

... the type you found already being in use here. See ./CODING_STYLE as to
the use of fixed-width types.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
  2026-03-09 12:31 ` [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-10 17:09   ` Jan Beulich
  2026-03-11 10:15   ` Roger Pau Monné
  2026-03-09 12:31 ` [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC Julian Vetter
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

x2APIC guests with more than 128 vCPUs need destination IDs beyond the
8-bit range provided by the standard MSI address and IO-APIC RTE fields.
The Intel spec allows bits 11:5 of the MSI address and bits 55:49 of the
IO-APIC RTE to carry the high 7 bits of the destination ID when the
platform advertises support, expanding the range to 15 bits total.
Add MSI_ADDR_EXT_DEST_ID_MASK for the extended MSI address bits, and
IO_APIC_REDIR_{DEST,EXT_DEST}_MASK with a VIOAPIC_RTE_DEST() helper to
extract the combined 15-bit destination from an IO-APIC RTE. Extend the
IO-APIC RTE save/restore struct with an ext_dest_id field, and add
XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK plus
XEN_DOMCTL_VMSI_X86_FULL_DEST() to the domctl MSI gflags interface so
callers can pass and extract the full destination ID through the
existing pt_irq_bind path.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- Addressed comments from Jan
  - Replaced reserved field by generically named reserved2
  - Added proper constants for the shift of the upper bits of the
    MSI/RTE IDs
  - Added reference for the MSI_ADDR_EXT_DEST_ID_MASK
---
 xen/arch/x86/hvm/vioapic.c             |  2 +-
 xen/arch/x86/include/asm/hvm/vioapic.h | 13 +++++++++++++
 xen/arch/x86/include/asm/msi.h         | 10 +++++++++-
 xen/include/public/arch-x86/hvm/save.h |  4 +++-
 xen/include/public/domctl.h            | 18 ++++++++++++------
 5 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index b9acdd8af6..d7a4105a57 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -616,7 +616,7 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
          */
         if ( e->fields.reserve ||
              e->fields.reserved[0] || e->fields.reserved[1] ||
-             e->fields.reserved[2] || e->fields.reserved[3] )
+             e->fields.reserved[2] || e->fields.reserved2 )
             return -EINVAL;
     }
 
diff --git a/xen/arch/x86/include/asm/hvm/vioapic.h b/xen/arch/x86/include/asm/hvm/vioapic.h
index 68af6dce79..5d7da5a092 100644
--- a/xen/arch/x86/include/asm/hvm/vioapic.h
+++ b/xen/arch/x86/include/asm/hvm/vioapic.h
@@ -32,6 +32,19 @@
 #define VIOAPIC_EDGE_TRIG  0
 #define VIOAPIC_LEVEL_TRIG 1
 
+/*
+ * Extract the destination ID from a 64-bit IO-APIC RTE, including the
+ * extended bits (55:49) used when XEN_HVM_CPUID_EXT_DEST_ID is advertised.
+ */
+#define IO_APIC_REDIR_DEST_MASK         (0xffULL << 56)
+#define IO_APIC_REDIR_EXT_DEST_MASK     (0x7fULL << 49)
+
+#define VIOAPIC_RTE_DEST_ID_UPPER_BITS  8
+#define VIOAPIC_RTE_DEST(rte) \
+    (MASK_EXTR((rte), IO_APIC_REDIR_DEST_MASK) | \
+     (MASK_EXTR((rte), IO_APIC_REDIR_EXT_DEST_MASK) << \
+      VIOAPIC_RTE_DEST_ID_UPPER_BITS))
+
 #define VIOAPIC_DEFAULT_BASE_ADDRESS  0xfec00000U
 #define VIOAPIC_MEM_LENGTH            0x100
 
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index 00059d4a3a..8d87d0c10d 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -51,9 +51,17 @@
 #define MSI_ADDR_REDIRECTION_MASK   (1 << MSI_ADDR_REDIRECTION_SHIFT)
 
 #define MSI_ADDR_DEST_ID_SHIFT		12
-#define	 MSI_ADDR_DEST_ID_MASK		0x00ff000
+#define MSI_ADDR_DEST_ID_UPPER_BITS	8
+#define MSI_ADDR_DEST_ID_MASK		0x00ff000
 #define  MSI_ADDR_DEST_ID(dest)		(((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
 
+/*
+ * Per the Intel x2APIC specification, in physical destination mode bits 11:5
+ * of the MSI address carry APIC ID bits [14:8] (the "Extended Destination ID"),
+ * extending the addressable range from 8 to 15 bits.
+ */
+#define MSI_ADDR_EXT_DEST_ID_MASK	0x0000fe0
+
 /* MAX fixed pages reserved for mapping MSIX tables. */
 #define FIX_MSIX_MAX_PAGES              512
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 9c4bfc7ebd..483097d940 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -359,7 +359,9 @@ union vioapic_redir_entry
         uint8_t trig_mode:1;
         uint8_t mask:1;
         uint8_t reserve:7;
-        uint8_t reserved[4];
+        uint8_t reserved[3];
+        uint8_t reserved2:1;
+        uint8_t ext_dest_id:7;
         uint8_t dest_id;
     } fields;
 };
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 8f6708c0a7..6d425e34ac 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -615,12 +615,14 @@ struct xen_domctl_bind_pt_irq {
         struct {
             uint8_t gvec;
             uint32_t gflags;
-#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK 0x0000ff
-#define XEN_DOMCTL_VMSI_X86_RH_MASK      0x000100
-#define XEN_DOMCTL_VMSI_X86_DM_MASK      0x000200
-#define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
-#define XEN_DOMCTL_VMSI_X86_TRIG_MASK    0x008000
-#define XEN_DOMCTL_VMSI_X86_UNMASKED     0x010000
+#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK        0x0000ff
+#define XEN_DOMCTL_VMSI_X86_DEST_ID_BITS        8
+#define XEN_DOMCTL_VMSI_X86_RH_MASK             0x000100
+#define XEN_DOMCTL_VMSI_X86_DM_MASK             0x000200
+#define XEN_DOMCTL_VMSI_X86_DELIV_MASK          0x007000
+#define XEN_DOMCTL_VMSI_X86_TRIG_MASK           0x008000
+#define XEN_DOMCTL_VMSI_X86_UNMASKED            0x010000
+#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK    0xfe0000
 
             uint64_aligned_t gtable;
         } msi;
@@ -630,6 +632,10 @@ struct xen_domctl_bind_pt_irq {
     } u;
 };
 
+#define XEN_DOMCTL_VMSI_X86_FULL_DEST(gflags) \
+        (MASK_EXTR((gflags), XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | \
+        (MASK_EXTR((gflags), XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) << \
+         XEN_DOMCTL_VMSI_X86_DEST_ID_BITS))
 
 /* Bind machine I/O address range -> HVM address range. */
 /* XEN_DOMCTL_memory_mapping */
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields
  2026-03-09 12:31 ` [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields Julian Vetter
@ 2026-03-10 17:09   ` Jan Beulich
  2026-03-11 10:15   ` Roger Pau Monné
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-10 17:09 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> --- a/xen/arch/x86/include/asm/hvm/vioapic.h
> +++ b/xen/arch/x86/include/asm/hvm/vioapic.h
> @@ -32,6 +32,19 @@
>  #define VIOAPIC_EDGE_TRIG  0
>  #define VIOAPIC_LEVEL_TRIG 1
>  
> +/*
> + * Extract the destination ID from a 64-bit IO-APIC RTE, including the
> + * extended bits (55:49) used when XEN_HVM_CPUID_EXT_DEST_ID is advertised.
> + */
> +#define IO_APIC_REDIR_DEST_MASK         (0xffULL << 56)
> +#define IO_APIC_REDIR_EXT_DEST_MASK     (0x7fULL << 49)
> +
> +#define VIOAPIC_RTE_DEST_ID_UPPER_BITS  8

The name suggests this is the number of upper bits, which it isn't. You shouldn't
need this constant anyway, see below.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -615,12 +615,14 @@ struct xen_domctl_bind_pt_irq {
>          struct {
>              uint8_t gvec;
>              uint32_t gflags;
> -#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK 0x0000ff
> -#define XEN_DOMCTL_VMSI_X86_RH_MASK      0x000100
> -#define XEN_DOMCTL_VMSI_X86_DM_MASK      0x000200
> -#define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
> -#define XEN_DOMCTL_VMSI_X86_TRIG_MASK    0x008000
> -#define XEN_DOMCTL_VMSI_X86_UNMASKED     0x010000
> +#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK        0x0000ff
> +#define XEN_DOMCTL_VMSI_X86_DEST_ID_BITS        8

This constant is redundant with _MASK. It is generally helpful to avoid
such redundancies (especially in the public interface), and derive the wanted
value from the main (most generally usable) constant. IOW here maybe

#define XEN_DOMCTL_VMSI_X86_DEST_ID_BITS        8
#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK        ((1U << XEN_DOMCTL_VMSI_X86_DEST_ID_BITS) - 1)

But of course it could also be done the other way around, albeit in a public
header this may end up more difficult. (Generally the mask wants to be the
main definition, as everything else can be derived from them.)

> @@ -630,6 +632,10 @@ struct xen_domctl_bind_pt_irq {
>      } u;
>  };
>  
> +#define XEN_DOMCTL_VMSI_X86_FULL_DEST(gflags) \
> +        (MASK_EXTR((gflags), XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) | \
> +        (MASK_EXTR((gflags), XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) << \
> +         XEN_DOMCTL_VMSI_X86_DEST_ID_BITS))

There's no MASK_EXTR() in the public interface.

Also, nit: No need to parenthesize gflags when used in macro invocations like
this one. Iirc there was a similar pattern elsewhere in the patch.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields
  2026-03-09 12:31 ` [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields Julian Vetter
  2026-03-10 17:09   ` Jan Beulich
@ 2026-03-11 10:15   ` Roger Pau Monné
  1 sibling, 0 replies; 23+ messages in thread
From: Roger Pau Monné @ 2026-03-11 10:15 UTC (permalink / raw)
  To: Julian Vetter
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall

On Mon, Mar 09, 2026 at 12:31:02PM +0000, Julian Vetter wrote:
> x2APIC guests with more than 128 vCPUs need destination IDs beyond the

I think this needs to be re-worded:

"HVM guests with APIC IDs greater than 254 would be unable to use
those IDs as the target of external interrupts due to the lack of
emulated IOMMU.  However there's an unofficial extension to the MSI
messages format that re-use some reserved bits to expand the
destination field from 8 to 15 bits."

Or similar.

> 8-bit range provided by the standard MSI address and IO-APIC RTE fields.
> The Intel spec allows bits 11:5 of the MSI address and bits 55:49 of the
> IO-APIC RTE to carry the high 7 bits of the destination ID when the
> platform advertises support, expanding the range to 15 bits total.

Hm, I'm really unsure the Intel spec allows for any of this.  This is
something that has been done on the side, re-using bits marked as
reserved in the spec.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
  2026-03-09 12:31 ` [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind Julian Vetter
  2026-03-09 12:31 ` [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-11 15:27   ` Jan Beulich
  2026-03-09 12:31 ` [PATCH v3 7/7] x86/cpuid: Advertise XEN_HVM_CPUID_EXT_DEST_ID when device model opts in Julian Vetter
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

Use the newly defined masks to extract the full 15-bit destination ID
from guest MSI addresses and IO-APIC RTEs. In hvm_inject_msi() combine
the standard bits [19:12] with the extended bits [11:5] of the MSI
address into a 15-bit destination ID for LAPIC delivery. Increase the
dest parameter of vmsi_deliver() and hvm_girq_dest_2_vcpu_id() from
uint8_t to uint32_t. In vmsi_deliver_pirq() extract the full destination
from gflags via XEN_DOMCTL_VMSI_X86_FULL_DEST(). In msi_gflags() pack
the extended bits from the MSI address into the new
XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK field of gflags. In
vioapic_deliver() read the combined 15-bit destination using the
VIOAPIC_RTE_DEST() macro. Extend ioapic_check() to check for extended
destination bits set in a domain that does not advertise
XEN_HVM_CPUID_EXT_DEST_ID and refuse to restore the IO-APIC state,
preventing silent interrupt misrouting after live migration.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- Added additional check to the vioapic_check that makes sure that the
  extended bits are not set for domains that don't support it
- Addressed comments from Jan -> Replaced all constants by a proper define
---
 xen/arch/x86/hvm/irq.c             | 11 ++++++++++-
 xen/arch/x86/hvm/vioapic.c         | 21 +++++++++++++++++++--
 xen/arch/x86/hvm/vmsi.c            |  8 +++++---
 xen/arch/x86/include/asm/hvm/hvm.h |  4 ++--
 4 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 5f64361113..b520fc1150 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -374,7 +374,16 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq)
 int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
 {
     uint32_t tmp = (uint32_t) addr;
-    uint8_t  dest = (tmp & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    /*
+     * Standard MSI destination address bits 19:12 (8 bits).
+     * Extended MSI destination address bits 11:5 (7 more bits).
+     *
+     * As XEN_HVM_CPUID_EXT_DEST_ID is advertised, the guest may use bits 11:5
+     * for high destination ID bits, expanding to 15 bits total. Guests unaware
+     * of this feature set these bits to 0, so this is backwards-compatible.
+     */
+    uint32_t dest = (MASK_EXTR(tmp, MSI_ADDR_EXT_DEST_ID_MASK) << MSI_ADDR_DEST_ID_BITS) |
+                     MASK_EXTR(tmp, MSI_ADDR_DEST_ID_MASK);
     uint8_t  dest_mode = !!(tmp & MSI_ADDR_DESTMODE_MASK);
     uint8_t  delivery_mode = (data & MSI_DATA_DELIVERY_MODE_MASK)
         >> MSI_DATA_DELIVERY_MODE_SHIFT;
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index d7a4105a57..9602572dc4 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -411,7 +411,9 @@ static void ioapic_inj_irq(
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
 {
-    uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
+    uint32_t dest = ((uint32_t)vioapic->redirtbl[pin].fields.ext_dest_id <<
+                     VIOAPIC_RTE_DEST_ID_UPPER_BITS) |
+                    vioapic->redirtbl[pin].fields.dest_id;
     uint8_t dest_mode = vioapic->redirtbl[pin].fields.dest_mode;
     uint8_t delivery_mode = vioapic->redirtbl[pin].fields.delivery_mode;
     uint8_t vector = vioapic->redirtbl[pin].fields.vector;
@@ -618,6 +620,21 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
              e->fields.reserved[0] || e->fields.reserved[1] ||
              e->fields.reserved[2] || e->fields.reserved2 )
             return -EINVAL;
+
+        /*
+         * An RTE in the saved state has ext_dest_id bits set. Check that
+         * the destination Xen has extended destination ID support enabled,
+         * otherwise interrupt routing to APIC IDs > 255 would be broken
+         * after restore.
+         */
+        if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled )
+        {
+            printk(XENLOG_G_ERR "HVM restore: %pd IO-APIC RTE %u has "
+                                "extended destination ID bits set but "
+                                "XEN_HVM_CPUID_EXT_DEST_ID is not enabled\n",
+                                d, i);
+            return -EINVAL;
+        }
     }
 
     return 0;
@@ -659,7 +676,7 @@ static int cf_check ioapic_load(struct domain *d, hvm_domain_context_t *h)
     return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, NULL, ioapic_load, 1,
+HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, ioapic_check, ioapic_load, 1,
                           HVMSR_PER_DOM);
 
 void vioapic_reset(struct domain *d)
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 27b1f089e2..36ea898ac7 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -66,7 +66,7 @@ static void vmsi_inj_irq(
 
 int vmsi_deliver(
     struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
+    uint32_t dest, uint8_t dest_mode,
     uint8_t delivery_mode, uint8_t trig_mode)
 {
     struct vlapic *target;
@@ -109,7 +109,7 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
     uint32_t flags = pirq_dpci->gmsi.gflags;
     int vector = pirq_dpci->gmsi.gvec;
-    uint8_t dest = (uint8_t)flags;
+    uint32_t dest = XEN_DOMCTL_VMSI_X86_FULL_DEST(flags);
     bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK;
     uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK);
     bool trig_mode = flags & XEN_DOMCTL_VMSI_X86_TRIG_MASK;
@@ -125,7 +125,7 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */
-int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode)
+int hvm_girq_dest_2_vcpu_id(struct domain *d, uint32_t dest, uint8_t dest_mode)
 {
     int dest_vcpu_id = -1, w = 0;
     struct vcpu *v;
@@ -802,6 +802,8 @@ static unsigned int msi_gflags(uint16_t data, uint64_t addr, bool masked)
      */
     return MASK_INSR(MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
                      XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) |
+           MASK_INSR(MASK_EXTR(addr, MSI_ADDR_EXT_DEST_ID_MASK),
+                     XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) |
            MASK_INSR(MASK_EXTR(addr, MSI_ADDR_REDIRECTION_MASK),
                      XEN_DOMCTL_VMSI_X86_RH_MASK) |
            MASK_INSR(MASK_EXTR(addr, MSI_ADDR_DESTMODE_MASK),
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 7d9774df59..11256d5e67 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -295,11 +295,11 @@ uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc);
 
 int vmsi_deliver(
     struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
+    uint32_t dest, uint8_t dest_mode,
     uint8_t delivery_mode, uint8_t trig_mode);
 struct hvm_pirq_dpci;
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci);
-int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, uint8_t dest_mode);
+int hvm_girq_dest_2_vcpu_id(struct domain *d, uint32_t dest, uint8_t dest_mode);
 
 enum hvm_intblk
 hvm_interrupt_blocked(struct vcpu *v, struct hvm_intack intack);
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC
  2026-03-09 12:31 ` [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC Julian Vetter
@ 2026-03-11 15:27   ` Jan Beulich
  2026-03-12 11:15     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2026-03-11 15:27 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> Use the newly defined masks to extract the full 15-bit destination ID
> from guest MSI addresses and IO-APIC RTEs. In hvm_inject_msi() combine
> the standard bits [19:12] with the extended bits [11:5] of the MSI
> address into a 15-bit destination ID for LAPIC delivery. Increase the
> dest parameter of vmsi_deliver() and hvm_girq_dest_2_vcpu_id() from
> uint8_t to uint32_t. In vmsi_deliver_pirq() extract the full destination
> from gflags via XEN_DOMCTL_VMSI_X86_FULL_DEST(). In msi_gflags() pack
> the extended bits from the MSI address into the new
> XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK field of gflags. In
> vioapic_deliver() read the combined 15-bit destination using the
> VIOAPIC_RTE_DEST() macro. Extend ioapic_check() to check for extended
> destination bits set in a domain that does not advertise
> XEN_HVM_CPUID_EXT_DEST_ID and refuse to restore the IO-APIC state,
> preventing silent interrupt misrouting after live migration.

This is pretty hard to read without being split in a few paragraphs.

> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -374,7 +374,16 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq)
>  int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
>  {
>      uint32_t tmp = (uint32_t) addr;
> -    uint8_t  dest = (tmp & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> +    /*
> +     * Standard MSI destination address bits 19:12 (8 bits).
> +     * Extended MSI destination address bits 11:5 (7 more bits).
> +     *
> +     * As XEN_HVM_CPUID_EXT_DEST_ID is advertised, the guest may use bits 11:5
> +     * for high destination ID bits, expanding to 15 bits total. Guests unaware
> +     * of this feature set these bits to 0, so this is backwards-compatible.

How do you know? Like for the IO-APIC RTE bits, there is (and cannot be)
anything enforcing this. Hence for a guest to use this feature, it needs
to have a way to opt in.

> +     */
> +    uint32_t dest = (MASK_EXTR(tmp, MSI_ADDR_EXT_DEST_ID_MASK) << MSI_ADDR_DEST_ID_BITS) |

Nit: This line looks too long now.

Here as well as ...

> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -411,7 +411,9 @@ static void ioapic_inj_irq(
>  
>  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>  {
> -    uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
> +    uint32_t dest = ((uint32_t)vioapic->redirtbl[pin].fields.ext_dest_id <<
> +                     VIOAPIC_RTE_DEST_ID_UPPER_BITS) |
> +                    vioapic->redirtbl[pin].fields.dest_id;

... e.g. here a macro or inline function doing the conversion would likely
help readability quite a bit.

> @@ -618,6 +620,21 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
>               e->fields.reserved[0] || e->fields.reserved[1] ||
>               e->fields.reserved[2] || e->fields.reserved2 )
>              return -EINVAL;
> +
> +        /*
> +         * An RTE in the saved state has ext_dest_id bits set. Check that
> +         * the destination Xen has extended destination ID support enabled,
> +         * otherwise interrupt routing to APIC IDs > 255 would be broken
> +         * after restore.
> +         */
> +        if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled )

This won't build, as the ext_dest_id_enabled field appears only in patch 6.
But yes, that looks to be the opt-in mechanism I mentioned above.

> @@ -659,7 +676,7 @@ static int cf_check ioapic_load(struct domain *d, hvm_domain_context_t *h)
>      return 0;
>  }
>  
> -HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, NULL, ioapic_load, 1,
> +HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, ioapic_check, ioapic_load, 1,
>                            HVMSR_PER_DOM);

As per the comment there, this belongs in the earlier patch.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC
  2026-03-11 15:27   ` Jan Beulich
@ 2026-03-12 11:15     ` Jan Beulich
  2026-03-16 15:06       ` Julian Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2026-03-12 11:15 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 11.03.2026 16:27, Jan Beulich wrote:
> On 09.03.2026 13:31, Julian Vetter wrote:
>> @@ -618,6 +620,21 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
>>               e->fields.reserved[0] || e->fields.reserved[1] ||
>>               e->fields.reserved[2] || e->fields.reserved2 )
>>              return -EINVAL;
>> +
>> +        /*
>> +         * An RTE in the saved state has ext_dest_id bits set. Check that
>> +         * the destination Xen has extended destination ID support enabled,
>> +         * otherwise interrupt routing to APIC IDs > 255 would be broken
>> +         * after restore.
>> +         */
>> +        if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled )
> 
> This won't build, as the ext_dest_id_enabled field appears only in patch 6.
> But yes, that looks to be the opt-in mechanism I mentioned above.

Actually no, how could it be. That's for the DM to invoke.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC
  2026-03-12 11:15     ` Jan Beulich
@ 2026-03-16 15:06       ` Julian Vetter
  2026-03-19  9:02         ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Julian Vetter @ 2026-03-16 15:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 3/12/26 12:15 PM, Jan Beulich wrote:
> On 11.03.2026 16:27, Jan Beulich wrote:
>> On 09.03.2026 13:31, Julian Vetter wrote:
>>> @@ -618,6 +620,21 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
>>>                e->fields.reserved[0] || e->fields.reserved[1] ||
>>>                e->fields.reserved[2] || e->fields.reserved2 )
>>>               return -EINVAL;
>>> +
>>> +        /*
>>> +         * An RTE in the saved state has ext_dest_id bits set. Check that
>>> +         * the destination Xen has extended destination ID support enabled,
>>> +         * otherwise interrupt routing to APIC IDs > 255 would be broken
>>> +         * after restore.
>>> +         */
>>> +        if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled )

Thank you for your feedback Jan! Yes, right the 'ext_dest_id_enabled' 
must be defined before it can be checked. I have rearranged this in my 
patch set.

>>
>> This won't build, as the ext_dest_id_enabled field appears only in patch 6.
>> But yes, that looks to be the opt-in mechanism I mentioned above.
> 
> Actually no, how could it be. That's for the DM to invoke.

But this comment here I'm not sure I fully understand. You mean that 
checking 'if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled 
)' is not enough? This check only verifies that if the target domain 
doesn't support ext_dest_id_enabled, no RTE in the source domain is 
allowed to have a ext_dest_id set. But now we also have to check that if 
the source domain has ext_dest_id_enabled, the target also have to 
announce it, right? So,

if ( s->ext_dest_id_enabled && !d->arch.hvm.ext_dest_id_enabled )
{
     //ERROR
}

Is this what you meant?

Julian

> 
> Jan



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC
  2026-03-16 15:06       ` Julian Vetter
@ 2026-03-19  9:02         ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-19  9:02 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 16.03.2026 16:06, Julian Vetter wrote:
> On 3/12/26 12:15 PM, Jan Beulich wrote:
>> On 11.03.2026 16:27, Jan Beulich wrote:
>>> On 09.03.2026 13:31, Julian Vetter wrote:
>>>> @@ -618,6 +620,21 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
>>>>                e->fields.reserved[0] || e->fields.reserved[1] ||
>>>>                e->fields.reserved[2] || e->fields.reserved2 )
>>>>               return -EINVAL;
>>>> +
>>>> +        /*
>>>> +         * An RTE in the saved state has ext_dest_id bits set. Check that
>>>> +         * the destination Xen has extended destination ID support enabled,
>>>> +         * otherwise interrupt routing to APIC IDs > 255 would be broken
>>>> +         * after restore.
>>>> +         */
>>>> +        if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled )
> 
> Thank you for your feedback Jan! Yes, right the 'ext_dest_id_enabled' 
> must be defined before it can be checked. I have rearranged this in my 
> patch set.
> 
>>>
>>> This won't build, as the ext_dest_id_enabled field appears only in patch 6.
>>> But yes, that looks to be the opt-in mechanism I mentioned above.
>>
>> Actually no, how could it be. That's for the DM to invoke.
> 
> But this comment here I'm not sure I fully understand. You mean that 
> checking 'if ( e->fields.ext_dest_id && !d->arch.hvm.ext_dest_id_enabled 
> )' is not enough? This check only verifies that if the target domain 
> doesn't support ext_dest_id_enabled, no RTE in the source domain is 
> allowed to have a ext_dest_id set. But now we also have to check that if 
> the source domain has ext_dest_id_enabled, the target also have to 
> announce it, right? So,
> 
> if ( s->ext_dest_id_enabled && !d->arch.hvm.ext_dest_id_enabled )
> {
>      //ERROR
> }
> 
> Is this what you meant?

No. What I tried to convey is that besides Xen <-> DM negotiation, there
also needs to be Xen <-> DomU negotiation. While they shouldn't, existing
domains can write non-zero values, and the behavior of such domains
shouldn't change with your extension.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 7/7] x86/cpuid: Advertise XEN_HVM_CPUID_EXT_DEST_ID when device model opts in
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
                   ` (2 preceding siblings ...)
  2026-03-09 12:31 ` [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-09 12:31 ` [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op Julian Vetter
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

Set the XEN_HVM_CPUID_EXT_DEST_ID bit in the HVM hypervisor CPUID leaf
only when the device model has called XEN_DMOP_enable_ext_dest_id,
signalling it will use XEN_DMOP_bind_pt_msi_irq for passthrough MSIs.
This guarantees that Xen will decode the extended destination bits from
the raw MSI address internally, so the device model does not need to
understand the encoding itself. This ensures that an old device model
that still uses XEN_DOMCTL_bind_pt_irq will not trigger extended ID
usage in the guest, preserving backwards compatibility. However a device
model that calls XEN_DMOP_enable_ext_dest_id then has to use
XEN_DMOP_{bind,unbind}_pt_msi_irq, and enables support for x2APIC
destination IDs above 255.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- Addressed feedback from Jan and Roger -> Don't advertise the
  XEN_HVM_CPUID_EXT_DEST_ID unconditionally
---
 xen/arch/x86/cpuid.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 5decfad8cd..c1f82f83ea 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -148,6 +148,18 @@ static void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf,
         res->a |= XEN_HVM_CPUID_DOMID_PRESENT;
         res->c = d->domain_id;
 
+        /*
+         * Advertise extended destination ID support only when the device model
+         * has opted in via XEN_DMOP_enable_ext_dest_id, making sure it will
+         * use XEN_DMOP_bind_pt_msi_irq for passthrough MSI (passing raw
+         * addr/data so Xen can decode extended bits). This allows guests to
+         * use bits 11:5 of the MSI address and bits 55:49 of the IO-APIC RTE
+         * for additional destination ID bits, expanding the addressable APIC
+         * ID range from 8 to 15 bits.
+         */
+        if ( d->arch.hvm.ext_dest_id_enabled )
+            res->a |= XEN_HVM_CPUID_EXT_DEST_ID;
+
         /*
          * Per-vCPU event channel upcalls are implemented and work
          * correctly with PIRQs routed over event channels.
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
                   ` (3 preceding siblings ...)
  2026-03-09 12:31 ` [PATCH v3 7/7] x86/cpuid: Advertise XEN_HVM_CPUID_EXT_DEST_ID when device model opts in Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-09 13:26   ` Roger Pau Monné
  2026-03-12 11:18   ` Jan Beulich
  2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
  2026-03-10 16:55 ` [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Jan Beulich
  6 siblings, 2 replies; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

Xen cannot simply advertise XEN_HVM_CPUID_EXT_DEST_ID to the guest
without knowing that the device model will handle extended destination
IDs correctly for passthrough MSIs. A device model that still uses
XEN_DOMCTL_bind_pt_irq would pass only the low 8 bits of the destination
ID, misrouting interrupts to vCPUs with APIC IDs greater than 255. So,
add a DM op XEN_DMOP_enable_ext_dest_id that the device model can call
during domain setup (before vCPUs are started) to signal that it will
use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings. When
called, Xen sets ext_dest_id_enabled in struct hvm_domain, so it's
visible to the guest via CPUID.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- New patch addressing feedback from Roger
---
 tools/include/xendevicemodel.h        | 14 ++++++++++++++
 tools/libs/devicemodel/core.c         | 10 ++++++++++
 xen/arch/x86/hvm/dm.c                 |  5 +++++
 xen/arch/x86/include/asm/hvm/domain.h |  7 +++++++
 xen/include/public/hvm/dm_op.h        |  9 +++++++++
 5 files changed, 45 insertions(+)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index 0d5d7b0ff1..270d76fe9c 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -412,6 +412,20 @@ int xendevicemodel_unbind_pt_msi_irq(
     xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
     uint8_t gvec, uint64_t msi_addr, uint32_t msi_data);
 
+/**
+ * This function signals to Xen that this device model will use
+ * xendevicemodel_bind_pt_msi_irq() for all passthrough MSI bindings.
+ * After this call, Xen will advertise XEN_HVM_CPUID_EXT_DEST_ID to the
+ * guest, enabling 15-bit destination IDs. Must be called before the
+ * guest vCPUs are started!
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced.
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_enable_ext_dest_id(
+    xendevicemodel_handle *dmod, domid_t domid);
+
 #endif /* XENDEVICEMODEL_H */
 
 /*
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 4a52fe4750..03838aa37b 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -689,6 +689,16 @@ int xendevicemodel_unbind_pt_msi_irq(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_enable_ext_dest_id(
+    xendevicemodel_handle *dmod, domid_t domid)
+{
+    struct xen_dm_op op = {
+        .op = XEN_DMOP_enable_ext_dest_id,
+    };
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_restrict(xendevicemodel_handle *dmod, domid_t domid)
 {
     return osdep_xendevicemodel_restrict(dmod, domid);
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 3d530d948f..7738400540 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -374,6 +374,7 @@ int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_nr_vcpus]                         = sizeof(struct xen_dm_op_nr_vcpus),
         [XEN_DMOP_bind_pt_msi_irq]                  = sizeof(struct xen_dm_op_bind_pt_msi_irq),
         [XEN_DMOP_unbind_pt_msi_irq]                = sizeof(struct xen_dm_op_bind_pt_msi_irq),
+        [XEN_DMOP_enable_ext_dest_id]               = 0,
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
@@ -708,6 +709,10 @@ int dm_op(const struct dmop_args *op_args)
         break;
     }
 
+    case XEN_DMOP_enable_ext_dest_id:
+        d->arch.hvm.ext_dest_id_enabled = true;
+        break;
+
     default:
         rc = ioreq_server_dm_op(&op, d, &const_op);
         break;
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index abf9bc448d..770bc96970 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -105,6 +105,13 @@ struct hvm_domain {
     /* Compatibility setting for a bug in x2APIC LDR */
     bool bug_x2apic_ldr_vcpu_id;
 
+    /*
+     * Set by the device model via XEN_DMOP_enable_ext_dest_id to indicate
+     * it uses XEN_DMOP_bind_pt_msi_irq (raw MSI addr/data) for passthrough.
+     * Controls advertisement of XEN_HVM_CPUID_EXT_DEST_ID to the guest.
+     */
+    bool ext_dest_id_enabled;
+
     /* hypervisor intercepted msix table */
     struct list_head       msixtbl_list;
 
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index fd0f3a6a99..2814fe1c3d 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -447,6 +447,15 @@ typedef struct xen_dm_op_nr_vcpus xen_dm_op_nr_vcpus_t;
 #define XEN_DMOP_bind_pt_msi_irq   21
 #define XEN_DMOP_unbind_pt_msi_irq 22
 
+/*
+ * XEN_DMOP_enable_ext_dest_id: Signal to Xen that this device model will use
+ * XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings, passing raw MSI
+ * address/data fields. Once called, Xen will advertise
+ * XEN_HVM_CPUID_EXT_DEST_ID to the guest. Must be called before the guest
+ * starts.
+ */
+#define XEN_DMOP_enable_ext_dest_id 23
+
 struct xen_dm_op_bind_pt_msi_irq {
     /* IN - physical IRQ (pirq) */
     uint32_t machine_irq;
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op
  2026-03-09 12:31 ` [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op Julian Vetter
@ 2026-03-09 13:26   ` Roger Pau Monné
  2026-03-17 10:22     ` Julian Vetter
  2026-03-12 11:18   ` Jan Beulich
  1 sibling, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2026-03-09 13:26 UTC (permalink / raw)
  To: Julian Vetter
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall

On Mon, Mar 09, 2026 at 12:31:03PM +0000, Julian Vetter wrote:
> Xen cannot simply advertise XEN_HVM_CPUID_EXT_DEST_ID to the guest
> without knowing that the device model will handle extended destination
> IDs correctly for passthrough MSIs. A device model that still uses
> XEN_DOMCTL_bind_pt_irq would pass only the low 8 bits of the destination
> ID, misrouting interrupts to vCPUs with APIC IDs greater than 255. So,
> add a DM op XEN_DMOP_enable_ext_dest_id that the device model can call
> during domain setup (before vCPUs are started) to signal that it will
> use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings. When
> called, Xen sets ext_dest_id_enabled in struct hvm_domain, so it's
> visible to the guest via CPUID.

Have you considered whether you could re-use the padding in
XEN_DMOP_create_ioreq_server to signal whether the device model
supports Extended ID parsing?

Also, you might want some negotiation between multiple ioreq servers
on the same domain.  IOW: is multiple ioreq servers are registered
ahead of the domain having finished creation you could level whether
extended ID should be announced.  For ioreqs that are registered after
the domain have started you need to enforce the currently set Extended
ID support.  If the domain is running, and Extended ID is advertised
you must prevent registering any new ioreq that doesn't support
Extended ID.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op
  2026-03-09 13:26   ` Roger Pau Monné
@ 2026-03-17 10:22     ` Julian Vetter
  2026-03-25 10:58       ` Roger Pau Monné
  0 siblings, 1 reply; 23+ messages in thread
From: Julian Vetter @ 2026-03-17 10:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall

On 3/9/26 14:29, Roger Pau Monné wrote:
> On Mon, Mar 09, 2026 at 12:31:03PM +0000, Julian Vetter wrote:
>> Xen cannot simply advertise XEN_HVM_CPUID_EXT_DEST_ID to the guest
>> without knowing that the device model will handle extended destination
>> IDs correctly for passthrough MSIs. A device model that still uses
>> XEN_DOMCTL_bind_pt_irq would pass only the low 8 bits of the destination
>> ID, misrouting interrupts to vCPUs with APIC IDs greater than 255. So,
>> add a DM op XEN_DMOP_enable_ext_dest_id that the device model can call
>> during domain setup (before vCPUs are started) to signal that it will
>> use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings. When
>> called, Xen sets ext_dest_id_enabled in struct hvm_domain, so it's
>> visible to the guest via CPUID.
>
> Have you considered whether you could re-use the padding in
> XEN_DMOP_create_ioreq_server to signal whether the device model
> supports Extended ID parsing?
>
> Also, you might want some negotiation between multiple ioreq servers
> on the same domain.  IOW: is multiple ioreq servers are registered
> ahead of the domain having finished creation you could level whether
> extended ID should be announced.  For ioreqs that are registered after
> the domain have started you need to enforce the currently set Extended
> ID support.  If the domain is running, and Extended ID is advertised
> you must prevent registering any new ioreq that doesn't support
> Extended ID.
>

Thank you Roger for your feedback! It's very appreciated! This was a
good idea. I have implemented this now. I have used one of the reserved
bytes and use it as a flag field.

But I have a remaining question/concern, which maybe you can clarify.
If server A (e.g., a secondary emulator) registers WITH ext_dest_id
before the domain starts, hvm_ext_dest_id_enabled() returns true and the
guest will be advertised XEN_HVM_CPUID_EXT_DEST_ID. If QEMU's primary
ioreq server then registers WITHOUT the flag and goes on to use
XEN_DOMCTL_bind_pt_irq, pass-through MSIs will be misrouted (or rejected
if I then refuse calls to XEN_DOMCTL_bind_pt_irq for that domain). My
implementation allows this combination before 'd->creation_finished'.

I have added a check in ioreq_server_dm_op() for the
XEN_DMOP_create_ioreq_server case for the "runtime" case as you
suggested. If one of the existing ioreq servers announces it, and a new
server wants to join, that doesn't have this flag set, it will be
rejected with -EINVAL:

if ( d->creation_finished && hvm_ext_dest_id_enabled(d) &&
      !(data->flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) )
     break;

But what happens to the first case I described, how can I reasonably
negotiate between different ioreq servers or determine who is the "main"
server? In practice, usually if QEMU ("the only" server that matters for
pass-through) opted in, the feature is enabled. If it didn't, no other
server would have opted in either, no? The only "sort of" issue would be
if someone wrote a secondary emulator that sets ext_dest_id to basically
"lock out" a QEMU that doesn't support the new XEN_DMs?

Thank you!

Julian

> Thanks, Roger.
>

--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op
  2026-03-17 10:22     ` Julian Vetter
@ 2026-03-25 10:58       ` Roger Pau Monné
  0 siblings, 0 replies; 23+ messages in thread
From: Roger Pau Monné @ 2026-03-25 10:58 UTC (permalink / raw)
  To: Julian Vetter
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall

On Tue, Mar 17, 2026 at 10:22:20AM +0000, Julian Vetter wrote:
> On 3/9/26 14:29, Roger Pau Monné wrote:
> > On Mon, Mar 09, 2026 at 12:31:03PM +0000, Julian Vetter wrote:
> >> Xen cannot simply advertise XEN_HVM_CPUID_EXT_DEST_ID to the guest
> >> without knowing that the device model will handle extended destination
> >> IDs correctly for passthrough MSIs. A device model that still uses
> >> XEN_DOMCTL_bind_pt_irq would pass only the low 8 bits of the destination
> >> ID, misrouting interrupts to vCPUs with APIC IDs greater than 255. So,
> >> add a DM op XEN_DMOP_enable_ext_dest_id that the device model can call
> >> during domain setup (before vCPUs are started) to signal that it will
> >> use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings. When
> >> called, Xen sets ext_dest_id_enabled in struct hvm_domain, so it's
> >> visible to the guest via CPUID.
> > 
> > Have you considered whether you could re-use the padding in
> > XEN_DMOP_create_ioreq_server to signal whether the device model
> > supports Extended ID parsing?
> > 
> > Also, you might want some negotiation between multiple ioreq servers
> > on the same domain.  IOW: is multiple ioreq servers are registered
> > ahead of the domain having finished creation you could level whether
> > extended ID should be announced.  For ioreqs that are registered after
> > the domain have started you need to enforce the currently set Extended
> > ID support.  If the domain is running, and Extended ID is advertised
> > you must prevent registering any new ioreq that doesn't support
> > Extended ID.
> > 
> 
> Thank you Roger for your feedback! It's very appreciated! This was a 
> good idea. I have implemented this now. I have used one of the reserved 
> bytes and use it as a flag field.
> 
> But I have a remaining question/concern, which maybe you can clarify.
> If server A (e.g., a secondary emulator) registers WITH ext_dest_id 
> before the domain starts, hvm_ext_dest_id_enabled() returns true and the 
> guest will be advertised XEN_HVM_CPUID_EXT_DEST_ID. If QEMU's primary 
> ioreq server then registers WITHOUT the flag

But QEMU would also register ahead of domain build having finished,
and hence you need to take the minimum of all supported emulators.  In
this case, if there's a single emulator registered ahead of the domain
build having finished that doesn't support the Extended Destination ID
format you cannot advertise it.

> and goes on to use 
> XEN_DOMCTL_bind_pt_irq, pass-through MSIs will be misrouted (or rejected 
> if I then refuse calls to XEN_DOMCTL_bind_pt_irq for that domain). My 
> implementation allows this combination before 'd->creation_finished'.

Yes, the combination is fine as the feature is leveled across all
registered emulators before domain creation finished.  Once domain
creation has finished newly registered emulators must have Extended
Destination support if the feature has been advertised by Xen in
CPUID.

> I have added a check in ioreq_server_dm_op() for the 
> XEN_DMOP_create_ioreq_server case for the "runtime" case as you 
> suggested. If one of the existing ioreq servers announces it, and a new 
> server wants to join, that doesn't have this flag set, it will be 
> rejected with -EINVAL:

That seems fine, as long as the feature is levelled across all
registered ioreq server prior to the domain being started.  IOW: you
shouldn't take the value of the first registered ioreq server, it has
to be levelled across all registered ioreqs ahead of domain start.

> if ( d->creation_finished && hvm_ext_dest_id_enabled(d) &&
>       !(data->flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) )
>      break;
> 
> But what happens to the first case I described, how can I reasonably 
> negotiate between different ioreq servers or determine who is the "main" 
> server? 

Why do you care how is the "main" ioreq server?  All ioreq servers
must support the feature for it to be enabled, it doesn't matter
whether it's a primary or a secondary server.

> In practice, usually if QEMU ("the only" server that matters for 
> pass-through) opted in, the feature is enabled.

Hm, not really, On XenServer we use a separate ioreq server (not QEMU)
just for passthrough of certain devices.  So I can tell you there are
other users of ioreq outside of QEMU that do passthrough.

> If it didn't, no other 
> server would have opted in either, no? The only "sort of" issue would be 
> if someone wrote a secondary emulator that sets ext_dest_id to basically 
> "lock out" a QEMU that doesn't support the new XEN_DMs?

Ahead of the domain being created you need to figure out whether it's
safe to expose the feature across all registered ioreq server at that
point.  After that, if Extended Destination ID is exposed, only server
supporting it can be added for the lifetime of the domain.

You will also need to migrate this setting AFAICT, so the guest
doesn't see it disappearing when it resumes on a different host.

Hope this makes sense, let me know if you think some of the above is
wrong or undoable.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op
  2026-03-09 12:31 ` [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op Julian Vetter
  2026-03-09 13:26   ` Roger Pau Monné
@ 2026-03-12 11:18   ` Jan Beulich
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-12 11:18 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -447,6 +447,15 @@ typedef struct xen_dm_op_nr_vcpus xen_dm_op_nr_vcpus_t;
>  #define XEN_DMOP_bind_pt_msi_irq   21
>  #define XEN_DMOP_unbind_pt_msi_irq 22
>  
> +/*
> + * XEN_DMOP_enable_ext_dest_id: Signal to Xen that this device model will use
> + * XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings, passing raw MSI
> + * address/data fields. Once called, Xen will advertise
> + * XEN_HVM_CPUID_EXT_DEST_ID to the guest. Must be called before the guest
> + * starts.
> + */
> +#define XEN_DMOP_enable_ext_dest_id 23
> +
>  struct xen_dm_op_bind_pt_msi_irq {
>      /* IN - physical IRQ (pirq) */
>      uint32_t machine_irq;

How come this is put between #define-s and their corresponding struct?

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
                   ` (4 preceding siblings ...)
  2026-03-09 12:31 ` [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op Julian Vetter
@ 2026-03-09 12:31 ` Julian Vetter
  2026-03-09 13:38   ` Roger Pau Monné
                     ` (2 more replies)
  2026-03-10 16:55 ` [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Jan Beulich
  6 siblings, 3 replies; 23+ messages in thread
From: Julian Vetter @ 2026-03-09 12:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall,
	Julian Vetter

Add two DM ops for MSI passthrough IRQs. These new DM ops take the raw
MSI address and data fields rather than a pre-decoded gflags values. Xen
decodes the destination ID via msi_addr_to_gflags(), including any
extended destination bits in address[11:5]. This means the device model
does not need to understand the extended destination ID encoding, and
simply forwards the MSI address it observes from the guest.

Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
---
Changes in V3:
- New patch addressing feedback from Roger
---
 tools/include/xendevicemodel.h |  37 ++++++++++++
 tools/libs/devicemodel/core.c  |  44 ++++++++++++++
 xen/arch/x86/hvm/dm.c          | 102 +++++++++++++++++++++++++++++++++
 xen/include/public/hvm/dm_op.h |  23 ++++++++
 xen/include/xlat.lst           |   1 +
 5 files changed, 207 insertions(+)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index 227e7fd810..0d5d7b0ff1 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -375,6 +375,43 @@ int xendevicemodel_nr_vcpus(
  */
 int xendevicemodel_restrict(xendevicemodel_handle *dmod, domid_t domid);
 
+/**
+ * This function binds a passthrough physical IRQ to a guest MSI vector
+ * using raw MSI address/data fields. Unlike XEN_DOMCTL_bind_pt_irq,
+ * this interface supports extended (15-bit) destination IDs by having
+ * Xen decode the MSI address internally.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced.
+ * @parm machine_irq the physical IRQ number (pirq).
+ * @parm gvec the guest interrupt vector.
+ * @parm msi_addr the MSI address (0xfeexxxxx, includes extended dest ID).
+ * @parm msi_data the MSI data (vector, delivery mode, trigger mode).
+ * @parm gtable the MSI-X table base GFN, or 0 for plain MSI.
+ * @parm unmasked if non-zero, leave the IRQ unmasked after binding.
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_bind_pt_msi_irq(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
+    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data, uint64_t gtable,
+    int unmasked);
+
+/**
+ * This function unbinds a passthrough physical IRQ previously bound
+ * with xendevicemodel_bind_pt_msi_irq.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced.
+ * @parm machine_irq the physical IRQ number (pirq).
+ * @parm gvec the guest interrupt vector.
+ * @parm msi_addr the MSI address.
+ * @parm msi_data the MSI data.
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_unbind_pt_msi_irq(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
+    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data);
+
 #endif /* XENDEVICEMODEL_H */
 
 /*
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 8e619eeb0a..4a52fe4750 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -645,6 +645,50 @@ int xendevicemodel_nr_vcpus(
     return 0;
 }
 
+int xendevicemodel_bind_pt_msi_irq(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
+    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data, uint64_t gtable,
+    int unmasked)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_bind_pt_msi_irq *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_bind_pt_msi_irq;
+    data = &op.u.bind_pt_msi_irq;
+
+    data->machine_irq = machine_irq;
+    data->gvec = gvec;
+    data->data = msi_data;
+    data->addr = msi_addr;
+    data->gtable = gtable;
+    if ( unmasked )
+        data->flags |= XEN_DMOP_MSI_FLAG_UNMASKED;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
+int xendevicemodel_unbind_pt_msi_irq(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
+    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_bind_pt_msi_irq *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_unbind_pt_msi_irq;
+    data = &op.u.bind_pt_msi_irq;
+
+    data->machine_irq = machine_irq;
+    data->gvec = gvec;
+    data->data = msi_data;
+    data->addr = msi_addr;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_restrict(xendevicemodel_handle *dmod, domid_t domid)
 {
     return osdep_xendevicemodel_restrict(dmod, domid);
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 3b53471af0..3d530d948f 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -7,12 +7,15 @@
 #include <xen/guest_access.h>
 #include <xen/dm.h>
 #include <xen/hypercall.h>
+#include <xen/iocap.h>
+#include <xen/iommu.h>
 #include <xen/ioreq.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
 
 #include <asm/hap.h>
 #include <asm/hvm/cacheattr.h>
+#include <asm/msi.h>
 #include <asm/shadow.h>
 
 #include <xsm/xsm.h>
@@ -322,6 +325,25 @@ static int inject_event(struct domain *d,
     return 0;
 }
 
+static uint32_t msi_addr_to_gflags(uint64_t addr, uint32_t data, bool masked)
+{
+    uint32_t tmp = (uint32_t)addr;
+
+    return MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_DEST_ID_MASK),
+                     XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) |
+           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_EXT_DEST_ID_MASK),
+                     XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) |
+           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_REDIRECTION_MASK),
+                     XEN_DOMCTL_VMSI_X86_RH_MASK) |
+           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_DESTMODE_MASK),
+                     XEN_DOMCTL_VMSI_X86_DM_MASK) |
+           MASK_INSR(MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK),
+                     XEN_DOMCTL_VMSI_X86_DELIV_MASK) |
+           MASK_INSR(MASK_EXTR(data, MSI_DATA_TRIGGER_MASK),
+                     XEN_DOMCTL_VMSI_X86_TRIG_MASK) |
+           (masked ? 0 : XEN_DOMCTL_VMSI_X86_UNMASKED);
+}
+
 int dm_op(const struct dmop_args *op_args)
 {
     struct domain *d;
@@ -350,6 +372,8 @@ int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
         [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
         [XEN_DMOP_nr_vcpus]                         = sizeof(struct xen_dm_op_nr_vcpus),
+        [XEN_DMOP_bind_pt_msi_irq]                  = sizeof(struct xen_dm_op_bind_pt_msi_irq),
+        [XEN_DMOP_unbind_pt_msi_irq]                = sizeof(struct xen_dm_op_bind_pt_msi_irq),
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
@@ -607,6 +631,83 @@ int dm_op(const struct dmop_args *op_args)
         break;
     }
 
+    case XEN_DMOP_bind_pt_msi_irq:
+    {
+        const struct xen_dm_op_bind_pt_msi_irq *data =
+            &op.u.bind_pt_msi_irq;
+        struct xen_domctl_bind_pt_irq bind = {
+            .machine_irq = data->machine_irq,
+            .irq_type = PT_IRQ_TYPE_MSI,
+        };
+        int irq;
+
+        rc = -EINVAL;
+        if ( data->pad0 || data->pad1 )
+            break;
+
+        if ( data->flags & ~XEN_DMOP_MSI_FLAG_UNMASKED )
+            break;
+
+        irq = domain_pirq_to_irq(d, bind.machine_irq);
+
+        rc = -EPERM;
+        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
+            break;
+
+        bind.u.msi.gvec = data->gvec;
+        bind.u.msi.gflags =
+            msi_addr_to_gflags(data->addr, data->data,
+                               !(data->flags & XEN_DMOP_MSI_FLAG_UNMASKED));
+        bind.u.msi.gtable = data->gtable;
+
+        rc = -ESRCH;
+        if ( is_iommu_enabled(d) )
+        {
+            pcidevs_lock();
+            rc = pt_irq_create_bind(d, &bind);
+            pcidevs_unlock();
+        }
+        if ( rc < 0 )
+            printk(XENLOG_G_ERR
+                   "pt_irq_create_bind failed (%ld) for dom%d\n",
+                   rc, d->domain_id);
+        break;
+    }
+
+    case XEN_DMOP_unbind_pt_msi_irq:
+    {
+        const struct xen_dm_op_bind_pt_msi_irq *data =
+            &op.u.bind_pt_msi_irq;
+        struct xen_domctl_bind_pt_irq bind = {
+            .machine_irq = data->machine_irq,
+            .irq_type = PT_IRQ_TYPE_MSI,
+        };
+        int irq;
+
+        rc = -EINVAL;
+        if ( data->pad0 || data->pad1 )
+            break;
+
+        irq = domain_pirq_to_irq(d, bind.machine_irq);
+
+        rc = -EPERM;
+        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
+            break;
+
+        rc = -ESRCH;
+        if ( is_iommu_enabled(d) )
+        {
+            pcidevs_lock();
+            rc = pt_irq_destroy_bind(d, &bind);
+            pcidevs_unlock();
+        }
+        if ( rc < 0 )
+            printk(XENLOG_G_ERR
+                   "pt_irq_destroy_bind failed (%ld) for dom%d\n",
+                   rc, d->domain_id);
+        break;
+    }
+
     default:
         rc = ioreq_server_dm_op(&op, d, &const_op);
         break;
@@ -643,6 +744,7 @@ CHECK_dm_op_remote_shutdown;
 CHECK_dm_op_relocate_memory;
 CHECK_dm_op_pin_memory_cacheattr;
 CHECK_dm_op_nr_vcpus;
+CHECK_dm_op_bind_pt_msi_irq;
 
 int compat_dm_op(
     domid_t domid, unsigned int nr_bufs, XEN_GUEST_HANDLE_PARAM(void) bufs)
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index 2bf0fdc1ae..fd0f3a6a99 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -444,6 +444,28 @@ struct xen_dm_op_nr_vcpus {
 };
 typedef struct xen_dm_op_nr_vcpus xen_dm_op_nr_vcpus_t;
 
+#define XEN_DMOP_bind_pt_msi_irq   21
+#define XEN_DMOP_unbind_pt_msi_irq 22
+
+struct xen_dm_op_bind_pt_msi_irq {
+    /* IN - physical IRQ (pirq) */
+    uint32_t machine_irq;
+    /* IN - guest vector */
+    uint8_t  gvec;
+    uint8_t  pad0;
+    uint16_t pad1;
+    /* IN - MSI data (vector, delivery, trigger) */
+    uint32_t data;
+    /* IN - flags */
+    uint32_t flags;
+#define XEN_DMOP_MSI_FLAG_UNMASKED (1u << 0)
+    /* IN - MSI address (0xfeexxxxx, includes ext dest) */
+    uint64_aligned_t addr;
+    /* IN - MSI-X table base GFN, 0 for MSI */
+    uint64_aligned_t gtable;
+};
+typedef struct xen_dm_op_bind_pt_msi_irq xen_dm_op_bind_pt_msi_irq_t;
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -468,6 +490,7 @@ struct xen_dm_op {
         xen_dm_op_relocate_memory_t relocate_memory;
         xen_dm_op_pin_memory_cacheattr_t pin_memory_cacheattr;
         xen_dm_op_nr_vcpus_t nr_vcpus;
+        xen_dm_op_bind_pt_msi_irq_t bind_pt_msi_irq;
     } u;
 };
 
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9d08dcc4bb..9236394786 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -107,6 +107,7 @@
 ?	dm_op_map_mem_type_to_ioreq_server hvm/dm_op.h
 ?	dm_op_modified_memory		hvm/dm_op.h
 ?	dm_op_nr_vcpus			hvm/dm_op.h
+?	dm_op_bind_pt_msi_irq		hvm/dm_op.h
 ?	dm_op_pin_memory_cacheattr	hvm/dm_op.h
 ?	dm_op_relocate_memory		hvm/dm_op.h
 ?	dm_op_remote_shutdown		hvm/dm_op.h
-- 
2.51.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops
  2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
@ 2026-03-09 13:38   ` Roger Pau Monné
  2026-03-11 16:04   ` Jan Beulich
  2026-03-12 11:30   ` Jan Beulich
  2 siblings, 0 replies; 23+ messages in thread
From: Roger Pau Monné @ 2026-03-09 13:38 UTC (permalink / raw)
  To: Julian Vetter
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
	Michal Orzel, Stefano Stabellini, Juergen Gross, Julien Grall

On Mon, Mar 09, 2026 at 12:31:03PM +0000, Julian Vetter wrote:
> Add two DM ops for MSI passthrough IRQs. These new DM ops take the raw
> MSI address and data fields rather than a pre-decoded gflags values. Xen
> decodes the destination ID via msi_addr_to_gflags(), including any
> extended destination bits in address[11:5]. This means the device model
> does not need to understand the extended destination ID encoding, and
> simply forwards the MSI address it observes from the guest.
> 
> Signed-off-by: Julian Vetter <julian.vetter@vates.tech>
> ---
> Changes in V3:
> - New patch addressing feedback from Roger
> ---
>  tools/include/xendevicemodel.h |  37 ++++++++++++
>  tools/libs/devicemodel/core.c  |  44 ++++++++++++++
>  xen/arch/x86/hvm/dm.c          | 102 +++++++++++++++++++++++++++++++++
>  xen/include/public/hvm/dm_op.h |  23 ++++++++
>  xen/include/xlat.lst           |   1 +
>  5 files changed, 207 insertions(+)
> 
> diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
> index 227e7fd810..0d5d7b0ff1 100644
> --- a/tools/include/xendevicemodel.h
> +++ b/tools/include/xendevicemodel.h
> @@ -375,6 +375,43 @@ int xendevicemodel_nr_vcpus(
>   */
>  int xendevicemodel_restrict(xendevicemodel_handle *dmod, domid_t domid);
>  
> +/**
> + * This function binds a passthrough physical IRQ to a guest MSI vector
> + * using raw MSI address/data fields. Unlike XEN_DOMCTL_bind_pt_irq,
> + * this interface supports extended (15-bit) destination IDs by having
> + * Xen decode the MSI address internally.
> + *
> + * @parm dmod a handle to an open devicemodel interface.
> + * @parm domid the domain id to be serviced.
> + * @parm machine_irq the physical IRQ number (pirq).
> + * @parm gvec the guest interrupt vector.
> + * @parm msi_addr the MSI address (0xfeexxxxx, includes extended dest ID).
> + * @parm msi_data the MSI data (vector, delivery mode, trigger mode).
> + * @parm gtable the MSI-X table base GFN, or 0 for plain MSI.
> + * @parm unmasked if non-zero, leave the IRQ unmasked after binding.
> + * @return 0 on success, -1 on failure.
> + */
> +int xendevicemodel_bind_pt_msi_irq(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
> +    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data, uint64_t gtable,
> +    int unmasked);
> +
> +/**
> + * This function unbinds a passthrough physical IRQ previously bound
> + * with xendevicemodel_bind_pt_msi_irq.
> + *
> + * @parm dmod a handle to an open devicemodel interface.
> + * @parm domid the domain id to be serviced.
> + * @parm machine_irq the physical IRQ number (pirq).
> + * @parm gvec the guest interrupt vector.
> + * @parm msi_addr the MSI address.
> + * @parm msi_data the MSI data.
> + * @return 0 on success, -1 on failure.
> + */
> +int xendevicemodel_unbind_pt_msi_irq(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
> +    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data);
> +
>  #endif /* XENDEVICEMODEL_H */
>  
>  /*
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index 8e619eeb0a..4a52fe4750 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -645,6 +645,50 @@ int xendevicemodel_nr_vcpus(
>      return 0;
>  }
>  
> +int xendevicemodel_bind_pt_msi_irq(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
> +    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data, uint64_t gtable,
> +    int unmasked)
> +{
> +    struct xen_dm_op op;
> +    struct xen_dm_op_bind_pt_msi_irq *data;
> +
> +    memset(&op, 0, sizeof(op));
> +
> +    op.op = XEN_DMOP_bind_pt_msi_irq;
> +    data = &op.u.bind_pt_msi_irq;
> +
> +    data->machine_irq = machine_irq;
> +    data->gvec = gvec;
> +    data->data = msi_data;
> +    data->addr = msi_addr;
> +    data->gtable = gtable;
> +    if ( unmasked )
> +        data->flags |= XEN_DMOP_MSI_FLAG_UNMASKED;
> +
> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> +}
> +
> +int xendevicemodel_unbind_pt_msi_irq(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t machine_irq,
> +    uint8_t gvec, uint64_t msi_addr, uint32_t msi_data)
> +{
> +    struct xen_dm_op op;
> +    struct xen_dm_op_bind_pt_msi_irq *data;
> +
> +    memset(&op, 0, sizeof(op));
> +
> +    op.op = XEN_DMOP_unbind_pt_msi_irq;
> +    data = &op.u.bind_pt_msi_irq;
> +
> +    data->machine_irq = machine_irq;
> +    data->gvec = gvec;
> +    data->data = msi_data;
> +    data->addr = msi_addr;
> +
> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> +}
> +
>  int xendevicemodel_restrict(xendevicemodel_handle *dmod, domid_t domid)
>  {
>      return osdep_xendevicemodel_restrict(dmod, domid);
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index 3b53471af0..3d530d948f 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -7,12 +7,15 @@
>  #include <xen/guest_access.h>
>  #include <xen/dm.h>
>  #include <xen/hypercall.h>
> +#include <xen/iocap.h>
> +#include <xen/iommu.h>
>  #include <xen/ioreq.h>
>  #include <xen/nospec.h>
>  #include <xen/sched.h>
>  
>  #include <asm/hap.h>
>  #include <asm/hvm/cacheattr.h>
> +#include <asm/msi.h>
>  #include <asm/shadow.h>
>  
>  #include <xsm/xsm.h>
> @@ -322,6 +325,25 @@ static int inject_event(struct domain *d,
>      return 0;
>  }
>  
> +static uint32_t msi_addr_to_gflags(uint64_t addr, uint32_t data, bool masked)
> +{
> +    uint32_t tmp = (uint32_t)addr;
> +
> +    return MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_DEST_ID_MASK),
> +                     XEN_DOMCTL_VMSI_X86_DEST_ID_MASK) |
> +           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_EXT_DEST_ID_MASK),
> +                     XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK) |
> +           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_REDIRECTION_MASK),
> +                     XEN_DOMCTL_VMSI_X86_RH_MASK) |
> +           MASK_INSR(MASK_EXTR(tmp, MSI_ADDR_DESTMODE_MASK),
> +                     XEN_DOMCTL_VMSI_X86_DM_MASK) |
> +           MASK_INSR(MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK),
> +                     XEN_DOMCTL_VMSI_X86_DELIV_MASK) |
> +           MASK_INSR(MASK_EXTR(data, MSI_DATA_TRIGGER_MASK),
> +                     XEN_DOMCTL_VMSI_X86_TRIG_MASK) |
> +           (masked ? 0 : XEN_DOMCTL_VMSI_X86_UNMASKED);
> +}
> +
>  int dm_op(const struct dmop_args *op_args)
>  {
>      struct domain *d;
> @@ -350,6 +372,8 @@ int dm_op(const struct dmop_args *op_args)
>          [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
>          [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
>          [XEN_DMOP_nr_vcpus]                         = sizeof(struct xen_dm_op_nr_vcpus),
> +        [XEN_DMOP_bind_pt_msi_irq]                  = sizeof(struct xen_dm_op_bind_pt_msi_irq),
> +        [XEN_DMOP_unbind_pt_msi_irq]                = sizeof(struct xen_dm_op_bind_pt_msi_irq),
>      };
>  
>      rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> @@ -607,6 +631,83 @@ int dm_op(const struct dmop_args *op_args)
>          break;
>      }
>  
> +    case XEN_DMOP_bind_pt_msi_irq:
> +    {
> +        const struct xen_dm_op_bind_pt_msi_irq *data =
> +            &op.u.bind_pt_msi_irq;
> +        struct xen_domctl_bind_pt_irq bind = {
> +            .machine_irq = data->machine_irq,
> +            .irq_type = PT_IRQ_TYPE_MSI,
> +        };
> +        int irq;
> +
> +        rc = -EINVAL;
> +        if ( data->pad0 || data->pad1 )
> +            break;
> +
> +        if ( data->flags & ~XEN_DMOP_MSI_FLAG_UNMASKED )
> +            break;
> +
> +        irq = domain_pirq_to_irq(d, bind.machine_irq);
> +
> +        rc = -EPERM;
> +        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
> +            break;
> +
> +        bind.u.msi.gvec = data->gvec;
> +        bind.u.msi.gflags =
> +            msi_addr_to_gflags(data->addr, data->data,
> +                               !(data->flags & XEN_DMOP_MSI_FLAG_UNMASKED));
> +        bind.u.msi.gtable = data->gtable;

Retrofitting the new interface into the old one seems weird.  I would
do it the other way around - implement the old bind domctl on top of
an interface that's more suited for the new DM op.

That way you avoid having to expand gflags with extended destination
field.

> +
> +        rc = -ESRCH;
> +        if ( is_iommu_enabled(d) )
> +        {
> +            pcidevs_lock();
> +            rc = pt_irq_create_bind(d, &bind);
> +            pcidevs_unlock();
> +        }
> +        if ( rc < 0 )
> +            printk(XENLOG_G_ERR
> +                   "pt_irq_create_bind failed (%ld) for dom%d\n",
> +                   rc, d->domain_id);
> +        break;
> +    }
> +
> +    case XEN_DMOP_unbind_pt_msi_irq:
> +    {
> +        const struct xen_dm_op_bind_pt_msi_irq *data =
> +            &op.u.bind_pt_msi_irq;
> +        struct xen_domctl_bind_pt_irq bind = {
> +            .machine_irq = data->machine_irq,
> +            .irq_type = PT_IRQ_TYPE_MSI,
> +        };
> +        int irq;
> +
> +        rc = -EINVAL;
> +        if ( data->pad0 || data->pad1 )
> +            break;
> +
> +        irq = domain_pirq_to_irq(d, bind.machine_irq);
> +
> +        rc = -EPERM;
> +        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
> +            break;
> +
> +        rc = -ESRCH;
> +        if ( is_iommu_enabled(d) )
> +        {
> +            pcidevs_lock();
> +            rc = pt_irq_destroy_bind(d, &bind);
> +            pcidevs_unlock();
> +        }
> +        if ( rc < 0 )
> +            printk(XENLOG_G_ERR
> +                   "pt_irq_destroy_bind failed (%ld) for dom%d\n",
> +                   rc, d->domain_id);
> +        break;
> +    }
> +
>      default:
>          rc = ioreq_server_dm_op(&op, d, &const_op);
>          break;
> @@ -643,6 +744,7 @@ CHECK_dm_op_remote_shutdown;
>  CHECK_dm_op_relocate_memory;
>  CHECK_dm_op_pin_memory_cacheattr;
>  CHECK_dm_op_nr_vcpus;
> +CHECK_dm_op_bind_pt_msi_irq;
>  
>  int compat_dm_op(
>      domid_t domid, unsigned int nr_bufs, XEN_GUEST_HANDLE_PARAM(void) bufs)
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index 2bf0fdc1ae..fd0f3a6a99 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -444,6 +444,28 @@ struct xen_dm_op_nr_vcpus {
>  };
>  typedef struct xen_dm_op_nr_vcpus xen_dm_op_nr_vcpus_t;
>  
> +#define XEN_DMOP_bind_pt_msi_irq   21
> +#define XEN_DMOP_unbind_pt_msi_irq 22
> +
> +struct xen_dm_op_bind_pt_msi_irq {
> +    /* IN - physical IRQ (pirq) */
> +    uint32_t machine_irq;
> +    /* IN - guest vector */
> +    uint8_t  gvec;

If you pass the address and data MSI fields there's no need to also
pass the vector, this can be obtained by Xen from the MSI fields?

> +    uint8_t  pad0;
> +    uint16_t pad1;
> +    /* IN - MSI data (vector, delivery, trigger) */
> +    uint32_t data;
> +    /* IN - flags */
> +    uint32_t flags;
> +#define XEN_DMOP_MSI_FLAG_UNMASKED (1u << 0)
> +    /* IN - MSI address (0xfeexxxxx, includes ext dest) */
> +    uint64_aligned_t addr;
> +    /* IN - MSI-X table base GFN, 0 for MSI */
> +    uint64_aligned_t gtable;
> +};
> +typedef struct xen_dm_op_bind_pt_msi_irq xen_dm_op_bind_pt_msi_irq_t;
> +
>  struct xen_dm_op {
>      uint32_t op;
>      uint32_t pad;
> @@ -468,6 +490,7 @@ struct xen_dm_op {
>          xen_dm_op_relocate_memory_t relocate_memory;
>          xen_dm_op_pin_memory_cacheattr_t pin_memory_cacheattr;
>          xen_dm_op_nr_vcpus_t nr_vcpus;
> +        xen_dm_op_bind_pt_msi_irq_t bind_pt_msi_irq;
>      } u;
>  };
>  
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 9d08dcc4bb..9236394786 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -107,6 +107,7 @@
>  ?	dm_op_map_mem_type_to_ioreq_server hvm/dm_op.h
>  ?	dm_op_modified_memory		hvm/dm_op.h
>  ?	dm_op_nr_vcpus			hvm/dm_op.h
> +?	dm_op_bind_pt_msi_irq		hvm/dm_op.h
>  ?	dm_op_pin_memory_cacheattr	hvm/dm_op.h
>  ?	dm_op_relocate_memory		hvm/dm_op.h
>  ?	dm_op_remote_shutdown		hvm/dm_op.h
> -- 
> 2.51.0
> 
> 
> 
> --
> Julian Vetter | Vates Hypervisor & Kernel Developer
> 
> XCP-ng & Xen Orchestra - Vates solutions
> 
> web: https://vates.tech
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops
  2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
  2026-03-09 13:38   ` Roger Pau Monné
@ 2026-03-11 16:04   ` Jan Beulich
  2026-03-12 11:53     ` Jan Beulich
  2026-03-12 11:30   ` Jan Beulich
  2 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2026-03-11 16:04 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> @@ -607,6 +631,83 @@ int dm_op(const struct dmop_args *op_args)
>          break;
>      }
>  
> +    case XEN_DMOP_bind_pt_msi_irq:
> +    {
> +        const struct xen_dm_op_bind_pt_msi_irq *data =
> +            &op.u.bind_pt_msi_irq;
> +        struct xen_domctl_bind_pt_irq bind = {
> +            .machine_irq = data->machine_irq,
> +            .irq_type = PT_IRQ_TYPE_MSI,
> +        };
> +        int irq;
> +
> +        rc = -EINVAL;
> +        if ( data->pad0 || data->pad1 )
> +            break;
> +
> +        if ( data->flags & ~XEN_DMOP_MSI_FLAG_UNMASKED )
> +            break;
> +
> +        irq = domain_pirq_to_irq(d, bind.machine_irq);
> +
> +        rc = -EPERM;
> +        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
> +            break;
> +
> +        bind.u.msi.gvec = data->gvec;
> +        bind.u.msi.gflags =
> +            msi_addr_to_gflags(data->addr, data->data,
> +                               !(data->flags & XEN_DMOP_MSI_FLAG_UNMASKED));
> +        bind.u.msi.gtable = data->gtable;
> +
> +        rc = -ESRCH;
> +        if ( is_iommu_enabled(d) )
> +        {
> +            pcidevs_lock();
> +            rc = pt_irq_create_bind(d, &bind);
> +            pcidevs_unlock();

I understand the same locking is used at the other call site, but it's as
questionable there as it is here. We should try hard to avoid use of this
global lock when something lighter-weight would do.

> +        }
> +        if ( rc < 0 )
> +            printk(XENLOG_G_ERR
> +                   "pt_irq_create_bind failed (%ld) for dom%d\n",
> +                   rc, d->domain_id);

%pd please in new code. Also this message needs to be distinguishable
from that in arch_do_domctl().

> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -444,6 +444,28 @@ struct xen_dm_op_nr_vcpus {
>  };
>  typedef struct xen_dm_op_nr_vcpus xen_dm_op_nr_vcpus_t;
>  
> +#define XEN_DMOP_bind_pt_msi_irq   21
> +#define XEN_DMOP_unbind_pt_msi_irq 22
> +
> +struct xen_dm_op_bind_pt_msi_irq {
> +    /* IN - physical IRQ (pirq) */
> +    uint32_t machine_irq;
> +    /* IN - guest vector */
> +    uint8_t  gvec;
> +    uint8_t  pad0;
> +    uint16_t pad1;
> +    /* IN - MSI data (vector, delivery, trigger) */
> +    uint32_t data;
> +    /* IN - flags */
> +    uint32_t flags;
> +#define XEN_DMOP_MSI_FLAG_UNMASKED (1u << 0)
> +    /* IN - MSI address (0xfeexxxxx, includes ext dest) */

The address given is x86-specific, while the header is arch-generic. (If
this was to be an x86-only interface, ...

> +    uint64_aligned_t addr;

... this also wouldn't need to be a 64-bit address, for example.)

> +    /* IN - MSI-X table base GFN, 0 for MSI */
> +    uint64_aligned_t gtable;
> +};

Commentary throughout also leaves unclear which of the fields are actually
meaningful for XEN_DMOP_unbind_pt_msi_irq. In a new interface, the unused
fields would want checking to be zero. Yet better may be to have unbind
have its own, much smaller structure.

> +typedef struct xen_dm_op_bind_pt_msi_irq xen_dm_op_bind_pt_msi_irq_t;

Is this typedef actually needed anywhere? Clearly ...

> @@ -468,6 +490,7 @@ struct xen_dm_op {
>          xen_dm_op_relocate_memory_t relocate_memory;
>          xen_dm_op_pin_memory_cacheattr_t pin_memory_cacheattr;
>          xen_dm_op_nr_vcpus_t nr_vcpus;
> +        xen_dm_op_bind_pt_msi_irq_t bind_pt_msi_irq;

... the plain struct can be used e.g. here. Imo we should stop cluttering
the namespace with typedef-s which aren't actually needed. (A case where
they are needed is when a guest handle needs defining for the type.)

> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -107,6 +107,7 @@
>  ?	dm_op_map_mem_type_to_ioreq_server hvm/dm_op.h
>  ?	dm_op_modified_memory		hvm/dm_op.h
>  ?	dm_op_nr_vcpus			hvm/dm_op.h
> +?	dm_op_bind_pt_msi_irq		hvm/dm_op.h
>  ?	dm_op_pin_memory_cacheattr	hvm/dm_op.h
>  ?	dm_op_relocate_memory		hvm/dm_op.h
>  ?	dm_op_remote_shutdown		hvm/dm_op.h

I'm curious: How did you determine the insertion point?

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops
  2026-03-11 16:04   ` Jan Beulich
@ 2026-03-12 11:53     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-12 11:53 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 11.03.2026 17:04, Jan Beulich wrote:
> On 09.03.2026 13:31, Julian Vetter wrote:
>> @@ -607,6 +631,83 @@ int dm_op(const struct dmop_args *op_args)
>>          break;
>>      }
>>  
>> +    case XEN_DMOP_bind_pt_msi_irq:
>> +    {
>> +        const struct xen_dm_op_bind_pt_msi_irq *data =
>> +            &op.u.bind_pt_msi_irq;
>> +        struct xen_domctl_bind_pt_irq bind = {
>> +            .machine_irq = data->machine_irq,
>> +            .irq_type = PT_IRQ_TYPE_MSI,
>> +        };
>> +        int irq;
>> +
>> +        rc = -EINVAL;
>> +        if ( data->pad0 || data->pad1 )
>> +            break;
>> +
>> +        if ( data->flags & ~XEN_DMOP_MSI_FLAG_UNMASKED )
>> +            break;
>> +
>> +        irq = domain_pirq_to_irq(d, bind.machine_irq);
>> +
>> +        rc = -EPERM;
>> +        if ( irq <= 0 || !irq_access_permitted(current->domain, irq) )
>> +            break;
>> +
>> +        bind.u.msi.gvec = data->gvec;
>> +        bind.u.msi.gflags =
>> +            msi_addr_to_gflags(data->addr, data->data,
>> +                               !(data->flags & XEN_DMOP_MSI_FLAG_UNMASKED));
>> +        bind.u.msi.gtable = data->gtable;
>> +
>> +        rc = -ESRCH;
>> +        if ( is_iommu_enabled(d) )
>> +        {
>> +            pcidevs_lock();
>> +            rc = pt_irq_create_bind(d, &bind);
>> +            pcidevs_unlock();
> 
> I understand the same locking is used at the other call site, but it's as
> questionable there as it is here. We should try hard to avoid use of this
> global lock when something lighter-weight would do.

I should perhaps have added that I have this on my own list of things to look
into, just that I didn't get to it yet.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops
  2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
  2026-03-09 13:38   ` Roger Pau Monné
  2026-03-11 16:04   ` Jan Beulich
@ 2026-03-12 11:30   ` Jan Beulich
  2 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-12 11:30 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> Add two DM ops for MSI passthrough IRQs. These new DM ops take the raw
> MSI address and data fields rather than a pre-decoded gflags values. Xen
> decodes the destination ID via msi_addr_to_gflags(), including any
> extended destination bits in address[11:5]. This means the device model
> does not need to understand the extended destination ID encoding, and
> simply forwards the MSI address it observes from the guest.
> 
> Signed-off-by: Julian Vetter <julian.vetter@vates.tech>

Since there's no cover letter to reply to, this one will need to do. What
I'm missing in the series is leveraging of the (limited) cleanup potential.
You're obsoleting the PT_IRQ_TYPE_MSI sub-case of
XEN_DOMCTL_{,un}bind_pt_irq. Respective libxc functions could now call the
libdevicemodel ones, rather than invoking those domctl-s. The domctl-s
could then reject attempts to use that sub-case.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore
  2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
                   ` (5 preceding siblings ...)
  2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
@ 2026-03-10 16:55 ` Jan Beulich
  6 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2026-03-10 16:55 UTC (permalink / raw)
  To: Julian Vetter
  Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
	Stefano Stabellini, Juergen Gross, Julien Grall, xen-devel

On 09.03.2026 13:31, Julian Vetter wrote:
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -594,6 +594,35 @@ int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
>      return vioapic->redirtbl[pin].fields.trig_mode;
>  }
>  
> +static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h)
> +{
> +    const HVM_SAVE_TYPE(IOAPIC) *s;
> +    unsigned int i;
> +
> +    if ( !has_vioapic(d) )
> +        return -ENODEV;
> +
> +    s = hvm_get_entry(IOAPIC, h);
> +    if ( !s )
> +        return -ENODATA;
> +
> +    for ( i = 0; i < ARRAY_SIZE(s->redirtbl); i++ )
> +    {
> +        const union vioapic_redir_entry *e = &s->redirtbl[i];
> +
> +        /*
> +         * Check to-be-loaded values are within valid range, for them to
> +         * represent actually reachable state.
> +         */
> +        if ( e->fields.reserve ||
> +             e->fields.reserved[0] || e->fields.reserved[1] ||
> +             e->fields.reserved[2] || e->fields.reserved[3] )
> +            return -EINVAL;

Are comment and code actually in sync? I can't spot anything preventing the
reserved fields to be set by a guest. Such setting would simply be ignored.
(And this is actually why I was asking you to add such a function: By adding
the checks you should have noticed that the fields can be non-zero if a guest
writes them this way. Which in turn may pose a problem for your extid
intentions.)

> +    }
> +
> +    return 0;
> +}

If it wasn't for the above, something like this may do for starters. Would
be nice if base_address, ioregsel, and id also had some sanity checking
applied.

However, does this build at all with the function unused? You lack ...

>  static int cf_check ioapic_save(struct vcpu *v, hvm_domain_context_t *h)
>  {
>      const struct domain *d = v->domain;

... a hunk altering the HVM_REGISTER_SAVE_RESTORE() further down.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2026-03-25 10:59 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-09 12:31 [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Julian Vetter
2026-03-09 12:31 ` [PATCH v3 4/7] x86/passthrough: Use extended destination ID in PT MSI bind/unbind Julian Vetter
2026-03-11 15:30   ` Jan Beulich
2026-03-09 12:31 ` [PATCH v3 2/7] x86/msi: Define extended destination ID masks and IO-APIC RTE fields Julian Vetter
2026-03-10 17:09   ` Jan Beulich
2026-03-11 10:15   ` Roger Pau Monné
2026-03-09 12:31 ` [PATCH v3 3/7] x86/hvm: Support extended destination IDs in virtual MSI and IO-APIC Julian Vetter
2026-03-11 15:27   ` Jan Beulich
2026-03-12 11:15     ` Jan Beulich
2026-03-16 15:06       ` Julian Vetter
2026-03-19  9:02         ` Jan Beulich
2026-03-09 12:31 ` [PATCH v3 7/7] x86/cpuid: Advertise XEN_HVM_CPUID_EXT_DEST_ID when device model opts in Julian Vetter
2026-03-09 12:31 ` [PATCH v3 6/7] x86/dmop: Add XEN_DMOP_enable_ext_dest_id DM op Julian Vetter
2026-03-09 13:26   ` Roger Pau Monné
2026-03-17 10:22     ` Julian Vetter
2026-03-25 10:58       ` Roger Pau Monné
2026-03-12 11:18   ` Jan Beulich
2026-03-09 12:31 ` [PATCH v3 5/7] x86/dmop: Add XEN_DMOP_{bind,unbind}_pt_msi_irq DM ops Julian Vetter
2026-03-09 13:38   ` Roger Pau Monné
2026-03-11 16:04   ` Jan Beulich
2026-03-12 11:53     ` Jan Beulich
2026-03-12 11:30   ` Jan Beulich
2026-03-10 16:55 ` [PATCH v3 1/7] x86/vioapic: Add ioapic_check() to validate IO-APIC state before restore Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.