Kernel KVM virtualization development
From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: Sairaj Kodilkar <sarunkod@amd.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org, vasant.hegde@amd.com,
	suravee.suthikulpanit@amd.com
Cc: mst@redhat.com, imammedo@redhat.com, anisinha@redhat.com,
	marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	richard.henderson@linaro.org, eduardo@habkost.net,
	yi.l.liu@intel.com, eric.auger@redhat.com,
	zhenzhong.duan@intel.com, cohuck@redhat.com, seanjc@google.com,
	iommu@lists.linux.dev, kevin.tian@intel.com, joro@8bytes.org
Subject: Re: [RFC PATCH RESEND 0/5] amd_iommu: support up to 2048 MSI vectors per IRT
Date: Tue, 27 Jan 2026 20:23:01 -0500
Message-ID: <2c9396a5-e967-4a8d-8e22-85afae4e8f24@oracle.com>
In-Reply-To: <2440cf13-e4d4-4894-b41a-fbdf7cd9b3b5@amd.com>

Hi Sairaj,

On 1/7/26 1:09 AM, Sairaj Kodilkar wrote:
> Hello all,
> 
> Gentle ping,
> 

I mentioned privately that I am investigating the main
limitations/alternatives that are listed below, but I don't yet have any
suggestions as to the best way to move forward. So while I am still hoping
for feedback from others on the list, I'll reply to this series as if it
will be merged using the current approach.

Thank you,
Alejandro

> On 11/18/2025 3:45 PM, Sairaj Kodilkar wrote:
>> Resending this series with KVM and IOMMU maintainers in CC.
>>
>> AMD IOMMU can route up to 2048 MSI vectors through a single
>> Interrupt Remapping Table (IRT). This series brings the same
>> capability to the emulated AMD IOMMU in QEMU.
>>
>> Highlights
>> ----------
>> * Sets bits [9:8] in Extended-Feature-Register-2 to advertise 2K MSI
>>    support to the guest.
>> * Uses bits [10:0] of the MSI data to select the IRTE when the guest
>>    programs MSIs in logical-destination mode.
>> * Introduces a new IOMMU device property:
>>          -device amd-iommu,...,numint2k=on
>>
>>    The feature is **opt-in**; guests keep the 512-MSI behaviour unless
>>    `numint2k=on` is supplied.
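
For illustration, the index selection described in the highlights can be sketched as below. This is a minimal sketch and not code from the series: the helper names are hypothetical, and the bounds check assumes the table holds 512 entries by default and 2048 with `numint2k=on`, as the cover letter implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Table sizes taken from the cover letter; macro names are hypothetical. */
#define IRT_ENTRIES_DEFAULT 512u
#define IRT_ENTRIES_2K      2048u

/* MSI data bits [10:0] select the IRTE (2048 = 2^11 entries). */
#define MSI_DATA_IRTE_MASK  (IRT_ENTRIES_2K - 1u)

static_assert(MSI_DATA_IRTE_MASK == 0x7ffu, "mask must cover bits [10:0]");

/* Hypothetical helper: extract the IRTE index from the MSI data word. */
static inline uint32_t msi_data_to_irte_index(uint32_t msi_data)
{
    return msi_data & MSI_DATA_IRTE_MASK;
}

/*
 * Hypothetical helper: check the index against the table size.
 * Assumption: without numint2k the guest is limited to 512 entries.
 */
static inline bool irte_index_valid(uint32_t index, bool numint2k)
{
    return index < (numint2k ? IRT_ENTRIES_2K : IRT_ENTRIES_DEFAULT);
}
```

With this sketch, an index such as 0x205 (517) is accepted only when the 2K feature is enabled.
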
>>
>> Passthrough devices
>> -------------------
>> When a PCI function is passed through via iommufd the code checks the
>> host’s vendor capabilities.  If the host IOMMU has not enabled
>> 2K-MSI support (bits [44:43] set in the control register) the guest
>> feature is disabled even if `numint2k=on` was requested.
>>
>> The detection logic relies on the iommufd interface; with the legacy
>> VFIO container the guest always falls back to 512 MSIs.
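
The passthrough decision described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the series' implementation: the function and macro names are hypothetical, and it assumes that any non-zero value in control register bits [44:43] means the host has enabled 2K MSI support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control register field named in the cover letter: bits [44:43]. */
#define CTRL_NUMINT_SHIFT 43
#define CTRL_NUMINT_MASK  (UINT64_C(0x3) << CTRL_NUMINT_SHIFT)

static_assert(CTRL_NUMINT_MASK == UINT64_C(0x180000000000),
              "mask must cover bits [44:43]");

/*
 * Hypothetical helper: decide whether the guest keeps 2K MSI support
 * when a device is passed through.
 *   requested           - numint2k=on was supplied
 *   have_iommufd_hwinfo - host info came from iommufd (false for the
 *                         legacy VFIO container)
 *   host_ctrl           - host IOMMU control register value
 */
static inline bool guest_numint2k_effective(bool requested,
                                            bool have_iommufd_hwinfo,
                                            uint64_t host_ctrl)
{
    if (!requested) {
        return false;            /* feature is opt-in */
    }
    if (!have_iommufd_hwinfo) {
        return false;            /* legacy container: fall back to 512 */
    }
    /* Assumption: a non-zero field means the host enabled 2K MSIs. */
    return (host_ctrl & CTRL_NUMINT_MASK) != 0;
}
```
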
>>
>> Example
>> -------
>> qemu-system-x86_64 \
>> -enable-kvm -m 10G -smp cpus=8 \
>> -kernel /boot/vmlinuz \
>> -initrd /boot/initrd.img \
>> -append "console=ttyS0 earlyprintk=serial root=<DEVICE>" \
>> -device amd-iommu,dma-remap=on,numint2k=on \
>> -object iommufd,id=iommufd0 \
>> -device vfio-pci,host=<DEVID>,iommufd=iommufd0 \
>> -global kvm-pit.lost_tick_policy=discard \
>> -cpu host \
>> -machine q35,kernel_irqchip=split \
>> -nographic \
>> -smbios type=0,version=2.8 \
>> -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=<IMAGE> \
>> -device virtio-blk-pci,drive=drive0
>>
>> Limitations
>> -----------
>> This approach works well for features queried after IOMMUFD
>> initialization but cannot handle features needed during early QEMU
>> setup, before IOMMUFD is available.
>>
>> A key example is EFR2[HTRangeIgnore]. When this bit is set, the physical
>> IOMMU treats HyperTransport (HT) address ranges as regular memory
>> accesses rather than reserved regions. This has important implications
>> for memory layout:
>>
>> * Without HTRangeIgnore: QEMU must relocate RAM above 4G to above 1T on
>>    AMD platforms to avoid HT conflicts
>> * With HTRangeIgnore: QEMU can safely place RAM immediately above 4G,
>>    improving memory utilization
>>
>> Since the RAM layout must be determined before IOMMUFD initialization,
>> QEMU cannot use hwinfo to query the EFR2[HTRangeIgnore] feature bit.
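
To make the ordering problem concrete, the layout choice in the two bullets above reduces to something like the following. This is a deliberately simplified sketch with a hypothetical helper name; it only mirrors the two cases listed in the cover letter and ignores any other conditions QEMU applies when placing RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

static_assert(4 * GiB == UINT64_C(0x100000000), "4G boundary");

/*
 * Hypothetical helper: base address for RAM that does not fit below 4G.
 * The ht_range_ignore input must be known when the layout is computed,
 * which per the cover letter happens before IOMMUFD is initialized.
 */
static inline uint64_t above_4g_ram_base(bool ht_range_ignore)
{
    return ht_range_ignore ? 4 * GiB : 1 * TiB;
}
```
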
>>
>> Another limitation of using the control register is that, if the BIOS
>> enables a particular feature (e.g. ControlRegister[GCR3TRPMode]) without
>> kernel support, QEMU incorrectly assumes that the host kernel supports
>> that feature, potentially causing guest failures.
>>
>> Alternative considered
>> ----------------------
>> We also explored an alternative approach that uses a KVM capability,
>> "KVM_CAP_AMD_NUM_INT_2K_SUP", which the user can query to learn whether
>> the host kernel supports 2K MSIs. Similarly, this approach enables QEMU
>> to detect the presence of EFR2[HTRangeIgnore] during RAM initialization.
>>
>> Although the current implementation allows 2K MSI support only with
>> iommufd, it keeps the logic inside vfio/iommufd and avoids modifying
>> the KVM ABI. I am happy to discuss the advantages and drawbacks of
>> both approaches.
>>
>> ------------------------------------------------------------------------
>>
>> The patches are based on top of bc831f37398b (qemu master). Additionally,
>> they require a Linux kernel with the patches [1] that expose the control
>> register via the IOMMU_GET_HW_INFO ioctl.
>>
>> [1] https://lore.kernel.org/linux-iommu/20251029095846.4486-1-sarunkod@amd.com/
>>
>> ------------------------------------------------------------------------
>>
>> Sairaj Kodilkar (3):
>>    vfio/iommufd: Add amd specific hardware info struct to vendor
>>      capability
>>    amd_iommu: Add support for extended feature register 2
>>    amd_iommu: Add support for upto 2048 interrupts per IRT
>>
>> Suravee Suthikulpanit (2):
>>    [DO NOT MERGE] linux-headers: Introduce struct iommu_hw_info_amd
>>    amd-iommu: Add support for set/unset IOMMU for VFIO PCI devices
>>
>>   hw/i386/acpi-build.c               |   4 +-
>>   hw/i386/amd_iommu-stub.c           |   5 +
>>   hw/i386/amd_iommu.c                | 163 +++++++++++++++++++++++++++--
>>   hw/i386/amd_iommu.h                |  24 +++++
>>   include/system/host_iommu_device.h |   1 +
>>   linux-headers/linux/iommufd.h      |  20 ++++
>>   6 files changed, 207 insertions(+), 10 deletions(-)
>>
> 


Thread overview: 13+ messages
2025-11-18 10:15 [RFC PATCH RESEND 0/5] amd_iommu: support up to 2048 MSI vectors per IRT Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 1/5] [DO NOT MERGE] linux-headers: Introduce struct iommu_hw_info_amd Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 2/5] vfio/iommufd: Add amd specific hardware info struct to vendor capability Sairaj Kodilkar
2026-01-28  1:25   ` Alejandro Jimenez
2025-11-18 10:15 ` [RFC PATCH RESEND 3/5] amd-iommu: Add support for set/unset IOMMU for VFIO PCI devices Sairaj Kodilkar
2026-01-28  1:40   ` Alejandro Jimenez
2026-01-28 11:19     ` Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 4/5] amd_iommu: Add support for extended feature register 2 Sairaj Kodilkar
2025-11-18 10:15 ` [RFC PATCH RESEND 5/5] amd_iommu: Add support for upto 2048 interrupts per IRT Sairaj Kodilkar
2026-01-28  1:59   ` Alejandro Jimenez
2026-01-28 11:23     ` Sairaj Kodilkar
2026-01-07  6:09 ` [RFC PATCH RESEND 0/5] amd_iommu: support up to 2048 MSI vectors " Sairaj Kodilkar
2026-01-28  1:23   ` Alejandro Jimenez [this message]
