public inbox for kvm@vger.kernel.org
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Suthikulpanit, Suravee" <suravee.suthikulpanit@amd.com>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	kvm@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com,
	yi.l.liu@intel.com, alex.williamson@redhat.com,
	nicolinc@nvidia.com, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, pandoh@google.com, kumaranand@google.com,
	jon.grimm@amd.com, santosh.shukla@amd.com, vasant.hegde@amd.com,
	jay.chen@amd.com, joseph.chung@amd.com
Subject: Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
Date: Fri, 23 Jun 2023 08:45:34 -0300	[thread overview]
Message-ID: <ZJWF3oVvdj4OQmTf@nvidia.com> (raw)
In-Reply-To: <7364628b-54b6-b7e3-b272-8f91198679f9@amd.com>

On Thu, Jun 22, 2023 at 06:15:17PM -0700, Suthikulpanit, Suravee wrote:
> Jason,
> 
> On 6/22/2023 6:46 AM, Jason Gunthorpe wrote:
> > On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:
> > 
> > > Since the IOMMU hardware virtualizes the guest command buffer, this allows
> > > IOMMU operations to be accelerated such as invalidation of guest pages
> > > (i.e. stage1) when the command is issued by the guest kernel without
> > > intervention from the hypervisor.
> > 
> > This is similar to what we are doing on ARM as well.
> 
> Ok
> 
> > > This series is implemented on top of the IOMMUFD framework. It leverages
> > > the existing APIs and ioctls for providing guest iommu information
> > > (i.e. struct iommu_hw_info_amd), and allowing the guest to provide guest
> > > page table information (i.e. struct iommu_hwpt_amd_v2) for setting up a
> > > user domain.
> > > 
> > > Please see the [4],[5], and [6] for more detail on the AMD HW-vIOMMU.
> > > 
> > > NOTES
> > > -----
> > > This series is organized into two parts:
> > >    * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
> > > 
> > >    * Part2: Introducing HW-vIOMMU support (Patch 9-21).
> > > 
> > >    * Patch 12 and 21 extend the existing IOMMUFD ioctls to support
> > >      additional operations, which can be categorized into:
> > >      - Ioctls to init/destroy AMD HW-vIOMMU instance
> > >      - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
> > >      - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
> > >      - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
> > >      - Ioctls to trap AMD HW-vIOMMU command buffer initialization.
> > 
> > No one else seems to need this kind of stuff, why is AMD different?
> > 
> > Emulation and mediation to create the vIOMMU is supposed to be in the
> > VMM side, not in the kernel. I don't want to see different models by
> > vendor.
> 
> These ioctls are not necessary for emulation, which I would agree
> should be done on the VMM side (e.g. QEMU). These ioctls provide the
> information necessary for programming the AMD IOMMU hardware to provide a
> hardware-assisted virtualized IOMMU.

You have ioctls called 'trap', it shouldn't be like this. It seems like
this is trying to parse the command buffer in the kernel; it should be
done in the VMM.

> In this series, the AMD IOMMU GCR3 table is actually set up when
> IOMMUFD_CMD_HWPT_ALLOC is called, for which the driver provides a hook
> via struct iommu_ops.domain_alloc_user().

That isn't entirely right either, the GCR3 should be programmed into
HW during iommu_domain attach.

> The AMD-specific information is communicated from QEMU via
> iommu_domain_user_data.iommu_hwpt_amd_v2. This is similar to INTEL
> and ARM.

This is only for requesting the iommu_domain and supplying the gcr3 VA
for later use.

> 
> Please also note that for the AMD HW-vIOMMU device model in QEMU, the guest
> memory used for the IOMMU device table is trapped when the guest IOMMU driver
> programs the guest Device Table Entry (gDTE). QEMU then reads the content of
> the gDTE to extract the information necessary for setting up the guest
> (stage-1) page table, and calls iommufd_backend_alloc_hwpt().

This is the same as ARM. It is a two step operation, you de-duplicate
the gDTE entries (eg to share vDIDs), allocating a HWPT if it doesn't
already exist, then you attach the HWPT to the physical device the
gDTE's vRID implies.
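A rough sketch of that two-step flow on the VMM side (all struct, field, and function names here are hypothetical illustrations, not the iommufd uAPI):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical VMM-side bookkeeping: one HWPT per distinct
 * (gDomID, gCR3) pair, shared by all vRIDs whose gDTEs point at it. */
struct vmm_hwpt {
	uint32_t gdom_id;
	uint64_t gcr3;
	int      hwpt_id;	/* stand-in for the id HWPT_ALLOC returns */
	int      in_use;
};

#define MAX_HWPT 64
static struct vmm_hwpt hwpt_table[MAX_HWPT];
static int next_hwpt_id = 1;

/* Step 1: de-duplicate on (gDomID, gCR3); allocate only if missing. */
static struct vmm_hwpt *vmm_get_hwpt(uint32_t gdom_id, uint64_t gcr3)
{
	for (size_t i = 0; i < MAX_HWPT; i++)
		if (hwpt_table[i].in_use &&
		    hwpt_table[i].gdom_id == gdom_id &&
		    hwpt_table[i].gcr3 == gcr3)
			return &hwpt_table[i];

	for (size_t i = 0; i < MAX_HWPT; i++) {
		if (!hwpt_table[i].in_use) {
			hwpt_table[i] = (struct vmm_hwpt){
				.gdom_id = gdom_id,
				.gcr3 = gcr3,
				.hwpt_id = next_hwpt_id++, /* would call HWPT_ALLOC */
				.in_use = 1,
			};
			return &hwpt_table[i];
		}
	}
	return NULL;
}

/* Step 2: attach that HWPT to the physical device behind the vRID.
 * Returns the hwpt id the VMM would hand to the device-attach ioctl. */
static int vmm_handle_gdte_write(uint32_t vrid, uint32_t gdom_id, uint64_t gcr3)
{
	struct vmm_hwpt *hwpt = vmm_get_hwpt(gdom_id, gcr3);

	(void)vrid;	/* would be resolved to a VFIO device fd here */
	return hwpt ? hwpt->hwpt_id : -1;
}
```

Two vRIDs whose gDTEs share a (gDomID, gCR3) pair end up attached to the same HWPT, which is the de-duplication described above.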

> There is still work to be done in this to fully support PASID. I'll
> take a look at this next.

I would expect PASID work is only about invalidation?
 
> > To start focus only on user space page tables and kernel mediated
> > invalidation and fit into the same model as everyone else. This is
> > approx the same patches and uAPI you see for ARM and Intel. AFAICT
> > AMD's HW is very similar to ARM's, so you should be aligning to the
> > ARM design.
> 
> I think the user space page table is covered as described above.

I'm not sure, it doesn't look like it is what I would expect.

> It seems that user space is supposed to call the ioctl
> IOMMUFD_CMD_HWPT_INVALIDATE for both INTEL and ARM to issue invalidation for
> the stage-1 page table. Please let me know if I misunderstand the purpose of
> this ioctl.

Yes, the VMM traps the invalidation and issues it like this.
 
> However, for AMD, since the HW-vIOMMU virtualizes the guest command buffer,
> when it sees a page table invalidation command in the guest command buffer
> it takes care of the invalidation using information in the DomIDMap. The
> DomIDMap maps the guest domain ID (gDomID) of a particular guest to the
> corresponding host domain ID (hDomID) of the device, and the hardware
> invalidates the nested translation according to the specified PASID, DomID,
> and GVA.

The VMM should do all of this stuff. The VMM parses the command buffer
and the VMM converts the commands to invalidation ioctls.
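The VMM-side equivalent of the kernel DomIDMap described above could be as simple as the lookup below, performed while converting a parsed invalidation command into an ioctl (names are illustrative, not an existing interface):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical VMM-side DomID map: translate the gDomID carried in a
 * parsed guest invalidation command into the HWPT to invalidate. */
struct domid_map_entry {
	uint32_t gdom_id;
	uint32_t hwpt_id;	/* target of the invalidation ioctl */
};

static int vmm_gdomid_to_hwpt(const struct domid_map_entry *map, size_t n,
			      uint32_t gdom_id)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].gdom_id == gdom_id)
			return (int)map[i].hwpt_id;
	return -1;	/* guest used a DomID no gDTE ever referenced */
}
```

With this, the gDomID-to-hDomID translation lives entirely in user space and the kernel only ever sees invalidations addressed to a HWPT it already knows.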

I'm unclear whether AMD supports a mode where the HW can directly
operate a command/invalidation queue in the VM without virtualization,
e.g. DMA from guest memory and deliver completion interrupts directly
to the guest.

If it always needs SW then the SW part should be in the VMM, not the
kernel. Then you don't need to load all these tables into the kernel.

> The DomIDMap is set up by the host IOMMU driver during VM initialization.
> When the guest IOMMU driver attaches the VFIO pass-through device to a guest
> iommu_group (i.e. domain), it programs the gDTE with the gDomID. This action
> is trapped into QEMU, and the gDomID is read from the gDTE and communicated
> to the hypervisor via the newly proposed ioctl VIOMMU_DOMAIN_ATTACH. Now the
> DomIDMap is created for the VFIO device.

The gDomID should be supplied when the HWPT is allocated, not via new
ioctls.

> Are you referring to the IOMMU API for SVA/PASID stuff:
>   * struct iommu_domain_ops.set_dev_pasid()
>   * struct iommu_ops.remove_dev_pasid()
>   * ...

Yes
 
> If so, we are working on it separately in parallel, and will be sending out
> RFC soon.

Great

Jason

Thread overview: 29+ messages
2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 02/21] iommu/amd: Clean up spacing in amd_iommu_ops declaration Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 03/21] iommu/amd: Update PASID, GATS, and GLX feature related macros Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 04/21] iommu/amd: Modify domain_enable_v2() to add giov parameter Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 05/21] iommu/amd: Refactor set_dte_entry() helper function Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 06/21] iommu/amd: Modify set_dte_entry() to add gcr3 input parameter Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 07/21] iommu/amd: Modify set_dte_entry() to add user domain " Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 08/21] iommu/amd: Allow nested IOMMU page tables Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 09/21] iommu/amd: Add support for hw_info for iommu capability query Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 10/21] iommu/amd: Introduce vIOMMU-specific events and event info Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 11/21] iommu/amd: Introduce Reset vMMIO Command Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 12/21] iommu/amd: Introduce AMD vIOMMU-specific UAPI Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 13/21] iommu/amd: Introduce vIOMMU command-line option Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 14/21] iommu/amd: Initialize vIOMMU private address space regions Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 15/21] iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 16/21] iommu/amd: Introduce vIOMMU ioctl for updating device mapping table Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 17/21] iommu/amd: Introduce vIOMMU ioctl for updating domain mapping Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 18/21] iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 19/21] iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 20/21] iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3 Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 21/21] iommufd: Introduce AMD HW-vIOMMU IOCTL Suravee Suthikulpanit
2023-06-22 13:46 ` [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Jason Gunthorpe
2023-06-23  1:15   ` Suthikulpanit, Suravee
2023-06-23 11:45     ` Jason Gunthorpe [this message]
2023-06-23 22:05       ` Suthikulpanit, Suravee
2023-06-23 22:56         ` Jason Gunthorpe
2023-06-24  2:08           ` Suthikulpanit, Suravee
2023-06-26 13:20             ` Jason Gunthorpe
