qemu-devel.nongnu.org archive mirror
From: Sairaj Kodilkar <sarunkod@amd.com>
To: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	<qemu-devel@nongnu.org>
Cc: <pbonzini@redhat.com>, <richard.henderson@linaro.org>,
	<eduardo@habkost.net>, <peterx@redhat.com>, <david@redhat.com>,
	<philmd@linaro.org>, <mst@redhat.com>,
	<marcel.apfelbaum@gmail.com>, <alex.williamson@redhat.com>,
	<vasant.hegde@amd.com>, <suravee.suthikulpanit@amd.com>,
	<santosh.shukla@amd.com>, <Wei.Huang2@amd.com>,
	<joao.m.martins@oracle.com>, <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH 00/18] AMD vIOMMU: DMA remapping support for VFIO devices
Date: Wed, 23 Apr 2025 16:15:10 +0530
Message-ID: <cf8587fa-f6e7-44c0-a33f-fa118e0d806d@amd.com>
In-Reply-To: <20250414020253.443831-1-alejandro.j.jimenez@oracle.com>



On 4/14/2025 7:32 AM, Alejandro Jimenez wrote:
> This series adds support for guests using the AMD vIOMMU to enable DMA
> remapping for VFIO devices. In addition to the currently supported
> passthrough (PT) mode, guest kernels are now able to provide DMA
> address translation and access permission checking to VFs attached to
> paging domains, using the AMD v1 I/O page table format.
> 
> These changes provide the essential emulation required to boot and
> support regular operation for a Linux guest enabling DMA remapping e.g.
> via kernel parameters "iommu=nopt" or "iommu.passthrough=0".
> 
> A new amd-iommu device property "dma-remap" (default: off) is introduced
> to control whether the feature is available. See below for a full
> example of QEMU cmdline parameters used in testing.
> 
> The patchset has been tested on an AMD EPYC Genoa host, with Linux 6.14
> host and guest kernels, launching guests with up to 256 vCPUs, 512G
> memory, and 16 CX6 VFs. Testing with IOMMU x2apic support enabled (i.e.
> xtsup=on) requires fix:
> https://lore.kernel.org/all/20250410064447.29583-3-sarunkod@amd.com/
> 
> Although there is more work to do, I am sending this series as a patch
> and not an RFC since it provides a working implementation of the
> feature. With this basic infrastructure in place it becomes easier to
> add/verify enhancements and new functionality. Here are some items I am
> working to address in follow up patches:
> 
> - Page Fault and error reporting
> - Add QEMU tracing and tests
> - Provide control over VA Size advertised to guests
> - Support hotplug/unplug of devices and other advanced features
>    (suggestions welcomed)
> 
> Thank you,
> Alejandro
> 
> ---
> Example QEMU command line:
> 
> $QEMU \
> -nodefaults \
> -snapshot \
> -no-user-config \
> -display none \
> -serial mon:stdio -nographic \
> -machine q35,accel=kvm,kernel_irqchip=split \
> -cpu host,+topoext,+x2apic,-svm,-vmx,-kvm-msi-ext-dest-id \
> -smp 32 \
> -m 128G \
> -kernel $KERNEL \
> -initrd $INITRD \
> -append "console=tty0 console=ttyS0 root=/dev/mapper/ol-root ro rd.lvm.lv=ol/root rd.lvm.lv=ol/swap iommu.passthrough=0" \
> -device amd-iommu,intremap=on,xtsup=on,dma-remap=on \
> -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=./OracleLinux-uefi-x86_64.qcow2 \
> -device virtio-blk-pci,drive=drive0,id=virtio-disk0 \
> -drive if=pflash,format=raw,unit=0,file=/usr/share/edk2/ovmf/OVMF_CODE.fd,readonly=on \
> -drive if=pflash,format=raw,unit=1,file=./OVMF_VARS.fd \
> -device vfio-pci,host=0000:a1:00.1,id=net0
> ---
> 
> Alejandro Jimenez (18):
>    memory: Adjust event ranges to fit within notifier boundaries
>    amd_iommu: Add helper function to extract the DTE
>    amd_iommu: Add support for IOMMU notifier
>    amd_iommu: Unmap all address spaces under the AMD IOMMU on reset
>    amd_iommu: Toggle memory regions based on address translation mode
>    amd_iommu: Set all address spaces to default translation mode on reset
>    amd_iommu: Return an error when unable to read PTE from guest memory
>    amd_iommu: Helper to decode size of page invalidation command
>    amd_iommu: Add helpers to walk AMD v1 Page Table format
>    amd_iommu: Add a page walker to sync shadow page tables on
>      invalidation
>    amd_iommu: Sync shadow page tables on page invalidation
>    amd_iommu: Add replay callback
>    amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL
>    amd_iommu: Toggle address translation on device table entry
>      invalidation
>    amd_iommu: Use iova_tree records to determine large page size on UNMAP
>    amd_iommu: Do not assume passthrough translation when DTE[TV]=0
>    amd_iommu: Refactor amdvi_page_walk() to use common code for page walk
>    amd_iommu: Do not emit I/O page fault events during replay()
> 
>   hw/i386/amd_iommu.c | 856 ++++++++++++++++++++++++++++++++++++++++----
>   hw/i386/amd_iommu.h |  52 +++
>   system/memory.c     |  10 +-
>   3 files changed, 843 insertions(+), 75 deletions(-)
> 
> 
> base-commit: 56c6e249b6988c1b6edc2dd34ebb0f1e570a1365

Hi Alejandro,
I tested the patches by running fio and VFIO tests (the latter using
the guest's /dev/vfio/vfio) inside the guest. Everything looks good to me.

I also compared fio performance on a passthrough NVMe device inside a
guest with 16 vCPUs, using the following parameters.

[FIO PARAMETERS]
NVMEs     = 1
JOBS/NVME = 16
MODE      = RANDREAD
IOENGINE  = LIBAIO
IODEPTH   = 32
BLOCKSIZE = 4K
SIZE      = 100%
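
For anyone who wants to reproduce the comparison, the parameters above
map roughly onto the fio invocation below. This is only a sketch, not
the exact command that was run: the device path /dev/nvme0n1, the job
name, --direct=1 and --group_reporting are assumptions that are not
part of the parameter list.

```shell
#!/bin/sh
# Hypothetical target: the passthrough NVMe namespace as seen in the guest.
DEV="${DEV:-/dev/nvme0n1}"

# Assemble the fio command line from the listed parameters.
# The job name, --direct=1 and --group_reporting are assumptions.
set -- fio \
    --name=randread-test \
    --filename="$DEV" \
    --rw=randread \
    --ioengine=libaio \
    --iodepth=32 \
    --bs=4k \
    --numjobs=16 \
    --size=100% \
    --direct=1 \
    --group_reporting

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    echo "$@"
fi
```

Running the same command once in a guest booted with iommu.passthrough=0
and once in a guest booted in pt mode gives the two rows of the table.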

[RESULTS]
==========================
Guest IOMMU mode    IOPS
                   (kilo)
==========================
nopt                 13.7
pt                 1191.0
==========================

I see that nopt (emulated IOMMU) has a huge performance penalty:
13.7K IOPS versus 1191.0K IOPS in pt mode, roughly 87x lower.
I wonder if DMA remapping is really useful with such a performance
penalty.

Regards
Sairaj Kodilkar


