qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: Sairaj Kodilkar <sarunkod@amd.com>,
	qemu-devel@nongnu.org, Vasant Hegde <vasant.hegde@amd.com>
Cc: pbonzini@redhat.com, richard.henderson@linaro.org,
	eduardo@habkost.net, mst@redhat.com, marcel.apfelbaum@gmail.com
Subject: Re: [PATCH 3/3] amd_iommu: Generate XT interrupts when xt support is enabled
Date: Wed, 3 Dec 2025 13:09:44 -0500	[thread overview]
Message-ID: <9bc52a98-972a-4376-aec6-69e438971801@oracle.com> (raw)
In-Reply-To: <20251118082403.3455-4-sarunkod@amd.com>

Hi Sairaj,

On 11/18/25 3:24 AM, Sairaj Kodilkar wrote:
> AMD IOMMU uses MMIO registers 0x170-0x180 to generate the interrupts
> when guest enables xt support through control register [IntCapXTEn]. The

"the interrupts" is a bit vague, considering that it is only a specific 
type of interrupts that originate from the IOMMU itself.
I think the quote from the spec is pretty clear and we might want to use 
just that:

"
When MMIO 0x18[IntCapXTEn]=1, interrupts originating from the IOMMU 
itself are sent based on the programming in XT IOMMU Interrupt Control 
Registers in MMIO 0x170-0x180 instead of the programming in the IOMMU's 
MSI capability registers.
"

I'd prefer if this was documented in the code comments, but I would go a 
step further and mention the specific interrupts, e.g.:

The mapping to MMIO offsets and interrupts controlled by each is as follows:

XT IOMMU General Interrupt Control Register (0x170): Event Log exception 
interrupts and Completion wait interrupts

XT IOMMU PPR Interrupt Control Register (0x178): PPR Log exception 
interrupts

XT IOMMU GA Log Interrupt Control Register (0x180): GA Log exception 
interrupts (irrelevant in vIOMMU case)


> guest programs these registers with appropriate vector and destination
> ID instead of writing to PCI MSI capability.
> 
> Current AMD vIOMMU is capable of generating interrupts only through PCI
> MSI capability and does not care about xt mode. Because of this AMD
> vIOMMU cannot generate event log interrupts when the guest has enabled
> xt mode.
> 

At first I thought that statement was not correct. If that were the 
case, I was wondering why we don't currently have issues with Completion 
Wait, since they are also controlled by MMIO 0x170 offset. But the Linux 
driver doesn't rely on the completion interrupt AFAICT, it just sets the 
completion store bit and monitors the semaphore to detect the 
completion. We might not be so lucky with other OSs, so good catch.

> Introduce a new flag "intcapxten" which is set when guest writes control

And similarly to the xten case in PATCH 2, we also need to migrate this 
new field. It can be added to the your proposed vmstate_xt.

> register [IntCapXTEn] (bit 51) and use vector and destination field in
> the XT MMIO register (0x170) to support XT mode.
> 
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> ---
>   hw/i386/amd_iommu.c  | 51 ++++++++++++++++++++++++++++++++++++++------
>   hw/i386/amd_iommu.h  |  3 +++
>   hw/i386/trace-events |  1 +
>   3 files changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 7f08fc31111a..c6bc13c93679 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -200,18 +200,52 @@ static void amdvi_assign_andq(AMDVIState *s, hwaddr addr, uint64_t val)
>      amdvi_writeq_raw(s, addr, amdvi_readq(s, addr) & val);
>   }
>   
> +union mmio_xt_intr {
> +    uint64_t val;
> +    struct {
> +        uint64_t rsvd_1:2,
> +                 destination_mode:1,
> +                 rsvd_2:5,
> +                 destination_lo:24,
> +                 vector:8,
> +                 delivery_mode:1,
> +                 rsvd_3:15,
> +                 destination_hi:8;
> +    };
> +};

We should define this union mmio_xt_intr in amd_iommu.h header, right 
after struct irte_ga.

> +
> +static void amdvi_build_xt_msi_msg(AMDVIState *s, MSIMessage *msg)
> +{
> +    union mmio_xt_intr xt_reg;
> +    struct X86IOMMUIrq irq;
> +
> +    xt_reg.val = amdvi_readq(s, AMDVI_MMIO_XT_GEN_INTR);
> +
> +    irq.vector = xt_reg.vector;
> +    irq.delivery_mode = xt_reg.delivery_mode;
> +    irq.dest_mode = xt_reg.destination_mode;
> +    irq.dest = (xt_reg.destination_hi << 24) | xt_reg.destination_lo;
> +    irq.trigger_mode = 0;
> +    irq.redir_hint = 0;
> +

Based on my reading from the MSI details, hardcoding redir_hint=0 
results in the dest_mode field essentially being a nop, since the 
messages will be delivered in physical mode i.e. only to the APIC ID in 
the dest field.

 From Intel's SDM Vol3, 12.11.1 Message Address Register Format:

• If RH is 0, then the DM bit is ignored and the message is sent ahead 
independent of whether the physical or logical destination mode is used.

I am not sure if the current implementations of AMD IOMMU drivers ever 
use the logical mode, but I am thinking we should at least catch that 
case with a warning e.g.
if (xt_reg.destination_mode) {
     error_report_once(...
}

A follow up question, since you chose to explicitly set redir_hint to 0, 
and something that bothers me about the remap functions is that they set:

irq->redir_hint = irte.lo.fields_remap.rq_eoi;

where AFAICT, redir_hint and rq_eoi are semantically different and 
control unrelated behaviors. So I've been wondering why these 
assignments are done. No need to answer this specifically, but if you 
have a better understanding of it please let me know.


> +    x86_iommu_irq_to_msi_message(&irq, msg);
> +}
> +
>   static void amdvi_generate_msi_interrupt(AMDVIState *s)
>   {
>       MSIMessage msg = {};
> -    MemTxAttrs attrs = {
> -        .requester_id = pci_requester_id(&s->pci->dev)
> -    };
>   
> -    if (msi_enabled(&s->pci->dev)) {
> +    if (s->intcapxten) {
> +        trace_amdvi_generate_msi_interrupt("XT GEN");
> +        amdvi_build_xt_msi_msg(s, &msg);
> +    } else if (msi_enabled(&s->pci->dev)) {
> +        trace_amdvi_generate_msi_interrupt("MSI");
>           msg = msi_get_message(&s->pci->dev, 0);
> -        address_space_stl_le(&address_space_memory, msg.address, msg.data,
> -                             attrs, NULL);
> +    } else {
> +        trace_amdvi_generate_msi_interrupt("NO MSI");
> +        return;
>       }
> +    apic_get_class(NULL)->send_msi(&msg);

This is great. The method of writing to the address space directly still 
confuses me, using the APIC helper for MSI delivery seems to be the 
appropriate way.

>   }
>   
>   static uint32_t get_next_eventlog_entry(AMDVIState *s)
> @@ -1490,6 +1524,7 @@ static inline void amdvi_mmio_get_name(hwaddr addr,
>       MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_BASE, name)
>       MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_HEAD, name)
>       MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_TAIL, name)
> +    MMIO_REG_TO_STRING(AMDVI_MMIO_XT_GEN_INTR, name)
>       default:
>           name = "UNHANDLED";
>       }
> @@ -1549,6 +1584,7 @@ static void amdvi_handle_control_write(AMDVIState *s)
>                           AMDVI_MMIO_CONTROL_CMDBUFLEN);
>       s->ga_enabled = !!(control & AMDVI_MMIO_CONTROL_GAEN);
>       s->xten = !!(control & AMDVI_MMIO_CONTROL_XTEN) && s->xtsup;
> +    s->intcapxten = !!(control & AMDVI_MMIO_CONTROL_INTCAPXTEN) && s->xtsup;
>   
I think that s->intcapxten should check for s->xten instead of s->xtsup, 
that way we create only one dependency on the input parameter, and the 
rest flows logically from what the guest driver configures.

On that topic (I should have mentioned this on patch 2), I think that it 
might be reasonable to make s->xten also conditional on whether 
s->ga_enabled is true. The spec just says they should both be set:

"In systems with x2APIC enabled, software must set MMIO 0x18[XTEn]=1 and 
MMIO 0x18[GAEn]=1.This enables the use of the 128-bit IRTE format with 
32-bit destination field. Even if Guest Virtual APIC
will not be used, software must set MMIO 0x18[GAEn]=1.
"

my understanding is that we can't support X2APIC mode without 128-bit 
IRTE format (which is controlled by ga_enabled), so it makes sense to 
block X2APIC (i.e. xten) unless ga_enabled is already set.

So in the end we'd have something like:

s->ga_enabled = !!(control & AMDVI_MMIO_CONTROL_GAEN);
s->xten = !!(control & AMDVI_MMIO_CONTROL_XTEN) &&
	  s->xtsup && s->ga_enabled;  // this should go in PATCH 2
s->intcapxten = !!(control & AMDVI_MMIO_CONTROL_INTCAPXTEN) && s->xten;

What do you think?

Thank you,
Alejandro

>       /* update the flags depending on the control register */
>       if (s->cmdbuf_enabled) {
> @@ -1755,6 +1791,9 @@ static void amdvi_mmio_write(void *opaque, hwaddr addr, uint64_t val,
>       case AMDVI_MMIO_STATUS:
>           amdvi_mmio_reg_write(s, size, val, addr);
>           break;
> +    case AMDVI_MMIO_XT_GEN_INTR:
> +        amdvi_mmio_reg_write(s, size, val, addr);
> +        break;
>       }
>   }
>   
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 32467d0bc241..399a4fb748e5 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -57,6 +57,7 @@
>   #define AMDVI_MMIO_EXCL_BASE          0x0020
>   #define AMDVI_MMIO_EXCL_LIMIT         0x0028
>   #define AMDVI_MMIO_EXT_FEATURES       0x0030
> +#define AMDVI_MMIO_XT_GEN_INTR        0x0170
>   #define AMDVI_MMIO_COMMAND_HEAD       0x2000
>   #define AMDVI_MMIO_COMMAND_TAIL       0x2008
>   #define AMDVI_MMIO_EVENT_HEAD         0x2010
> @@ -107,6 +108,7 @@
>   #define AMDVI_MMIO_CONTROL_CMDBUFLEN      (1ULL << 12)
>   #define AMDVI_MMIO_CONTROL_GAEN           (1ULL << 17)
>   #define AMDVI_MMIO_CONTROL_XTEN           (1ULL << 50)
> +#define AMDVI_MMIO_CONTROL_INTCAPXTEN     (1ULL << 51)
>   
>   /* MMIO status register bits */
>   #define AMDVI_MMIO_STATUS_CMDBUF_RUN  (1 << 4)
> @@ -421,6 +423,7 @@ struct AMDVIState {
>       bool ga_enabled;
>       bool xtsup;     /* xtsup=on command line */
>       bool xten;      /* Enable x2apic */
> +    bool intcapxten; /* Enable IOMMU x2apic interrupt generation */
>   
>       /* DMA address translation */
>       bool dma_remap;
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> index b704f4f90c3d..fe7aea4507ae 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -114,6 +114,7 @@ amdvi_ir_intctl(uint8_t val) "int_ctl 0x%"PRIx8
>   amdvi_ir_target_abort(const char *str) "%s"
>   amdvi_ir_delivery_mode(const char *str) "%s"
>   amdvi_ir_irte_ga_val(uint64_t hi, uint64_t lo) "hi 0x%"PRIx64" lo 0x%"PRIx64
> +amdvi_generate_msi_interrupt(const char *str) "Mode: %s"
>   
>   # vmport.c
>   vmport_register(unsigned char command, void *func, void *opaque) "command: 0x%02x func: %p opaque: %p"



  reply	other threads:[~2025-12-03 18:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-18  8:24 [PATCH 0/3] amd_iommu: Support Generation of IOMMU XT interrupts Sairaj Kodilkar
2025-11-18  8:24 ` [PATCH 1/3] amd_iommu: Use switch case to determine mmio register name Sairaj Kodilkar
2025-11-20  1:36   ` Alejandro Jimenez
2025-11-20  4:43     ` Sairaj Kodilkar
2025-11-20 13:31       ` Alejandro Jimenez
2025-11-21  5:20         ` Sairaj Kodilkar
2025-11-21 16:36           ` Alejandro Jimenez
2025-11-18  8:24 ` [PATCH 2/3] amd_iommu: Turn on XT support only when guest has enabled it Sairaj Kodilkar
2025-11-25 23:04   ` Alejandro Jimenez
2025-11-26  5:12     ` Sairaj Kodilkar
2025-11-26 12:13     ` Sairaj Kodilkar
2025-11-26 22:16       ` Alejandro Jimenez
2025-11-27  6:18         ` Sairaj Kodilkar
2025-11-18  8:24 ` [PATCH 3/3] amd_iommu: Generate XT interrupts when xt support is enabled Sairaj Kodilkar
2025-12-03 18:09   ` Alejandro Jimenez [this message]
2025-12-04  8:17     ` Sairaj Kodilkar
2025-11-19 10:38 ` [PATCH 0/3] amd_iommu: Support Generation of IOMMU XT interrupts Vasant Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9bc52a98-972a-4376-aec6-69e438971801@oracle.com \
    --to=alejandro.j.jimenez@oracle.com \
    --cc=eduardo@habkost.net \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=sarunkod@amd.com \
    --cc=vasant.hegde@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).