From: Sairaj Kodilkar <sarunkod@amd.com>
To: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>,
<vasant.hegde@amd.com>, <suravee.suthikulpanit@amd.com>
Cc: <mst@redhat.com>, <imammedo@redhat.com>, <anisinha@redhat.com>,
<marcel.apfelbaum@gmail.com>, <pbonzini@redhat.com>,
<richard.henderson@linaro.org>, <eduardo@habkost.net>,
<yi.l.liu@intel.com>, <eric.auger@redhat.com>,
<zhenzhong.duan@intel.com>, <cohuck@redhat.com>,
<seanjc@google.com>, <iommu@lists.linux.dev>,
<kevin.tian@intel.com>, <joro@8bytes.org>
Subject: Re: [RFC PATCH RESEND 5/5] amd_iommu: Add support for upto 2048 interrupts per IRT
Date: Wed, 28 Jan 2026 16:53:48 +0530 [thread overview]
Message-ID: <4bfbff72-aff8-417b-9672-b7a1740e476e@amd.com> (raw)
In-Reply-To: <c22ec7f4-4f49-4c39-89cf-be20429bb387@oracle.com>
On 1/28/2026 7:29 AM, Alejandro Jimenez wrote:
>
> On 11/18/25 5:15 AM, Sairaj Kodilkar wrote:
>> AMD IOMMU supports upto 2048 MSIs for a single device function
>> when NUM_INT_REMAP_SUP Extended-Feature-Register-2 bit is set to one.
>> Software can enable this feature by writing one to NUM_INT_REMAP_MODE
>> in the control register. MSI address destination mode (DM) bit decides
>> how many MSI data bits are used by IOMMU to index into IRT. When DM = 0,
>> IOMMU uses bits 8:0 (max 512) for the index, otherwise (DM = 1)
>> IOMMU uses bits 10:0 (max 2048) for IRT index.
>>
>> This feature can be enabled with flag `numint2k=on`. In case of
>> passhthrough devices viommu uses control register provided by vendor
>> capabilites to determine if host IOMMU has enabled 2048 MSIs. If host
>> IOMMU has not enabled it then the guest feature is disabled.
>>
>> example command line
>> '''
>> -object iommufd,id=fd0 \
>> -device amd_iommu,dma-remap=on,numint2k=on \
>> -device vfio-host,host=<DEVID>,iommufd=fd0 \
>> '''
>>
>> NOTE: In case of legacy VFIO container the guest will always fall back
>> to 512 MSIs.
>>
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>> hw/i386/amd_iommu.c | 74 ++++++++++++++++++++++++++++++++++++++++-----
>> hw/i386/amd_iommu.h | 12 ++++++++
>> 2 files changed, 79 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 3221bf5a0303..4f62c4ee3671 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -116,7 +116,12 @@ uint64_t amdvi_extended_feature_register(AMDVIState *s)
>>
>> uint64_t amdvi_extended_feature_register2(AMDVIState *s)
>> {
>> - return AMDVI_DEFAULT_EXT_FEATURES2;
>> + uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES2;
>> + if (s->num_int_sup_2k) {
>> + feature |= AMDVI_FEATURE_NUM_INT_REMAP_SUP;
>> + }
>> +
>> + return feature;
>> }
>>
>> /* configure MMIO registers at startup/reset */
>> @@ -1538,6 +1543,9 @@ static void amdvi_handle_control_write(AMDVIState *s)
>> AMDVI_MMIO_CONTROL_CMDBUFLEN);
>> s->ga_enabled = !!(control & AMDVI_MMIO_CONTROL_GAEN);
>>
>> + s->num_int_enabled = (control >> AMDVI_MMIO_CONTROL_NUM_INT_REMAP_SHIFT) &
>> + AMDVI_MMIO_CONTROL_NUM_INT_REMAP_MASK;
>> +
>> /* update the flags depending on the control register */
>> if (s->cmdbuf_enabled) {
>> amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_CMDBUF_RUN);
>> @@ -2119,6 +2127,25 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>> * (page 5)
>> */
>> delivery_mode = (origin->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 7;
>> + /*
>> + * The MSI address register bit[2] is used to get the destination
>> + * mode. The dest_mode 1 is valid for fixed and arbitrated interrupts
>> + * and when IOMMU supports upto 2048 interrupts.
>> + */
>> + dest_mode = (origin->address >> MSI_ADDR_DEST_MODE_SHIFT) & 1;
>> +
>> + if (dest_mode &&
>> + iommu->num_int_enabled == AMDVI_MMIO_CONTROL_NUM_INT_REMAP_2K) {
>> +
>> + trace_amdvi_ir_delivery_mode("2K interrupt mode");
>> + ret = __amdvi_int_remap_msi(iommu, origin, translated, dte, &irq, sid);
>> + if (ret < 0) {
>> + goto remap_fail;
>> + }
>> + /* Translate IRQ to MSI messages */
>> + x86_iommu_irq_to_msi_message(&irq, translated);
>> + goto out;
>> + }
>>
>> switch (delivery_mode) {
>> case AMDVI_IOAPIC_INT_TYPE_FIXED:
>> @@ -2159,12 +2186,6 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>> goto remap_fail;
>> }
>>
>> - /*
>> - * The MSI address register bit[2] is used to get the destination
>> - * mode. The dest_mode 1 is valid for fixed and arbitrated interrupts
>> - * only.
>> - */
>> - dest_mode = (origin->address >> MSI_ADDR_DEST_MODE_SHIFT) & 1;
>> if (dest_mode) {
>> trace_amdvi_ir_err("invalid dest_mode");
>> ret = -AMDVI_IR_ERR;
>> @@ -2322,6 +2343,30 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>> return &iommu_as[devfn]->as;
>> }
>>
>> +static void amdvi_refresh_efrs_hwinfo(struct AMDVIState *s,
>> + struct iommu_hw_info_amd *hwinfo)
>> +{
>> + /* Check if host OS has enabled 2K interrupts */
>> + bool hwinfo_ctrl_2k;
>> +
>> + if (s->num_int_sup_2k && !hwinfo) {
>> + warn_report("AMDVI: Disabling 2048 MSI for guest, "
>> + "use IOMMUFD for device passthrough to support it");
>> + s->num_int_sup_2k = 0;
>> + }
>> +
>> + hwinfo_ctrl_2k = ((hwinfo->control_register
> We need to check that hwinfo is a valid pointer before attempting to access
> any of its fields. The code in the line above causes a segfault in the
> common case where we are just using the default VFIO legacy backend and no
> new options.
> Even when trying to use the new feature (numint2k=on) and iommufd backend
> in QEMU, if the host kernel was built with CONFIG_AMD_IOMMU_IOMMUFD=n
> (which is currently the default), the ioctl IOMMU_GET_HW_INFO will always
> return NULL data and hwinfo is also NULL at this point, so we crash and burn.
>
>> + >> AMDVI_MMIO_CONTROL_NUM_INT_REMAP_SHIFT)
>> + & AMDVI_MMIO_CONTROL_NUM_INT_REMAP_2K);
>> +
>> + if (s->num_int_sup_2k && !hwinfo_ctrl_2k) {
>> + warn_report("AMDVI: Disabling 2048 MSIs for guest, "
>> + "as host kernel does not support this feature");
>> + s->num_int_sup_2k = 0;
>> + }
>> +
>> + amdvi_refresh_efrs(s);
>> +}
>>
>> static bool amdvi_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>> HostIOMMUDevice *hiod, Error **errp)
>> @@ -2354,6 +2399,20 @@ static bool amdvi_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>> object_ref(hiod);
>> g_hash_table_insert(s->hiod_hash, new_key, hiod);
>>
>> + if (hiod->caps.type == IOMMU_HW_INFO_TYPE_AMD) {
>> + /*
>> + * Refresh the MMIO efr registers so that changes are visible to the
>> + * guest.
>> + */
>> + amdvi_refresh_efrs_hwinfo(s, &hiod->caps.vendor_caps.amd);
>> + } else {
>> + /*
>> + * Pass NULL hardware registers when we have non-IOMMUFD
>> + * passthrough device
>> + */
>> + amdvi_refresh_efrs_hwinfo(s, NULL);
> This call with hwinfo = NULL causes a segfault as I mentioned above. The
> code in amdvi_refresh_efrs_hwinfo() needs to be hardened.
>
> Thank you,
> Alejandro
Hi Alejandro
Good catch, I caught this but somehow forgot to fix.
Thanks for pointing it out
Thanks
Sairaj
>> + }
>> +
>> return true;
>> }
>>
>> @@ -2641,6 +2700,7 @@ static const Property amdvi_properties[] = {
>> DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
>> DEFINE_PROP_STRING("pci-id", AMDVIState, pci_id),
>> DEFINE_PROP_BOOL("dma-remap", AMDVIState, dma_remap, false),
>> + DEFINE_PROP_BOOL("numint2k", AMDVIState, num_int_sup_2k, false),
>> };
>>
>> static const VMStateDescription vmstate_amdvi_sysbus = {
>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>> index c8eaf229b50e..588725fe0c25 100644
>> --- a/hw/i386/amd_iommu.h
>> +++ b/hw/i386/amd_iommu.h
>> @@ -107,6 +107,9 @@
>> #define AMDVI_MMIO_CONTROL_COMWAITINTEN (1ULL << 4)
>> #define AMDVI_MMIO_CONTROL_CMDBUFLEN (1ULL << 12)
>> #define AMDVI_MMIO_CONTROL_GAEN (1ULL << 17)
>> +#define AMDVI_MMIO_CONTROL_NUM_INT_REMAP_MASK (0x3)
>> +#define AMDVI_MMIO_CONTROL_NUM_INT_REMAP_SHIFT (43)
>> +#define AMDVI_MMIO_CONTROL_NUM_INT_REMAP_2K (0x1)
>>
>> /* MMIO status register bits */
>> #define AMDVI_MMIO_STATUS_CMDBUF_RUN (1 << 4)
>> @@ -160,6 +163,7 @@
>> #define AMDVI_PERM_READ (1 << 0)
>> #define AMDVI_PERM_WRITE (1 << 1)
>>
>> +/* EFR */
>> #define AMDVI_FEATURE_PREFETCH (1ULL << 0) /* page prefetch */
>> #define AMDVI_FEATURE_PPR (1ULL << 1) /* PPR Support */
>> #define AMDVI_FEATURE_XT (1ULL << 2) /* x2APIC Support */
>> @@ -169,6 +173,9 @@
>> #define AMDVI_FEATURE_HE (1ULL << 8) /* hardware error regs */
>> #define AMDVI_FEATURE_PC (1ULL << 9) /* Perf counters */
>>
>> +/* EFR2 */
>> +#define AMDVI_FEATURE_NUM_INT_REMAP_SUP (1ULL << 8) /* 2K int support */
>> +
>> /* reserved DTE bits */
>> #define AMDVI_DTE_QUAD0_RESERVED (GENMASK64(6, 2) | GENMASK64(63, 63))
>> #define AMDVI_DTE_QUAD1_RESERVED 0
>> @@ -380,6 +387,8 @@ struct AMDVIState {
>> bool evtlog_enabled; /* event log enabled */
>> bool excl_enabled;
>>
>> + uint8_t num_int_enabled;
>> +
>> hwaddr devtab; /* base address device table */
>> uint64_t devtab_len; /* device table length */
>>
>> @@ -433,6 +442,9 @@ struct AMDVIState {
>>
>> /* DMA address translation */
>> bool dma_remap;
>> +
>> + /* upto 2048 interrupt support */
>> + bool num_int_sup_2k;
>> };
>>
>> uint64_t amdvi_extended_feature_register(AMDVIState *s);
next prev parent reply other threads:[~2026-01-28 11:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-18 10:15 [RFC PATCH RESEND 0/5] amd_iommu: support up to 2048 MSI vectors per IRT Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 1/5] [DO NOT MERGE] linux-headers: Introduce struct iommu_hw_info_amd Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 2/5] vfio/iommufd: Add amd specific hardware info struct to vendor capability Sairaj Kodilkar
2026-01-28 1:25 ` Alejandro Jimenez
2025-11-18 10:15 ` [RFC PATCH RESEND 3/5] amd-iommu: Add support for set/unset IOMMU for VFIO PCI devices Sairaj Kodilkar
2026-01-28 1:40 ` Alejandro Jimenez
2026-01-28 11:19 ` Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 4/5] amd_iommu: Add support for extended feature register 2 Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 5/5] amd_iommu: Add support for upto 2048 interrupts per IRT Sairaj Kodilkar
2026-01-28 1:59 ` Alejandro Jimenez
2026-01-28 11:23 ` Sairaj Kodilkar [this message]
2026-01-07 6:09 ` [RFC PATCH RESEND 0/5] amd_iommu: support up to 2048 MSI vectors " Sairaj Kodilkar
2026-01-28 1:23 ` Alejandro Jimenez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4bfbff72-aff8-417b-9672-b7a1740e476e@amd.com \
--to=sarunkod@amd.com \
--cc=alejandro.j.jimenez@oracle.com \
--cc=anisinha@redhat.com \
--cc=cohuck@redhat.com \
--cc=eduardo@habkost.net \
--cc=eric.auger@redhat.com \
--cc=imammedo@redhat.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=seanjc@google.com \
--cc=suravee.suthikulpanit@amd.com \
--cc=vasant.hegde@amd.com \
--cc=yi.l.liu@intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox