From: Jacob Pan <jacob.jun.pan@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	David Woodhouse <dwmw2@infradead.org>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	jacob.jun.pan@intel.com
Subject: Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Date: Thu, 20 Sep 2018 08:53:39 -0700
Message-ID: <20180920085339.2ea67e72@jacob-builder>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19130ED5B@SHSMSX101.ccr.corp.intel.com>

On Wed, 19 Sep 2018 02:22:03 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> > Sent: Tuesday, September 18, 2018 11:47 PM
> > 
> > On 14/09/2018 22:04, Jacob Pan wrote:  
> > >> This example only needs to modify first-level translation, and
> > >> works with SMMUv3. The kernel here could be the host, in which
> > >> case second-level translation is disabled in the SMMU, or it
> > >> could be the guest, in which case second-level mappings are
> > >> created by QEMU and first-level translation is managed by
> > >> assigning PASID tables to the guest.  
> > > There is a difference in case of guest SVA. VT-d v3 will bind
> > > guest PASID and guest CR3 instead of the guest PASID table. Then
> > > turn on nesting. In case of mdev, the second level is obtained
> > > from the aux domain which was setup for the default PASID. Or in
> > > case of PCI device, second level is harvested from RID2PASID.  
> > 
> > Right, though I wasn't talking about the host managing guest SVA
> > here, but a kernel binding the address space of one of its
> > userspace drivers to the mdev.
> >   
> > >> So (2) would use iommu_sva_bind_device(),  
> > > > We would need something different than that for guest bind, just
> > > > to show the two cases:
> > > >
> > > > int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> > > >                           int *pasid, unsigned long flags,
> > > >                           void *drvdata)
> > > >
> > > > (WIP)
> > > > int sva_bind_gpasid(struct device *dev,
> > > >                     struct gpasid_bind_data *data)
> > > > where:
> > > > /**
> > > >  * struct gpasid_bind_data - Information about device and guest
> > > >  *                           PASID binding
> > > >  * @pasid:       Process address space ID used for the guest mm
> > > >  * @addr_width:  Guest address width. Paging mode can also be
> > > >  *               derived.
> > > >  * @gcr3:        Guest CR3 value from guest mm
> > > >  */
> > > struct gpasid_bind_data {
> > >         __u32 pasid;
> > >         __u64 gcr3;
> > >         __u32 addr_width;
> > >         __u32 flags;
> > > #define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
> > > };
> > > > Perhaps there is room to merge with io_mm, but the life cycle
> > > > management of guest PASID and host PASID will be different if you
> > > > rely on the mm release callback rather than the FD.
> 
> Let's not call it gpasid here - that makes sense only in the
> bind_pasid_table proposal, where the pasid table, and thus the pasid
> space, is managed by the guest. In the above context it is always about
> a host pasid (allocated system-wide), which could point to a host cr3
> (user process) or a guest cr3 (vm case).
> 
I agree that the gpasid name is confusing, since we have a system-wide
PASID name space. It was just a way to differentiate the two kinds of
bind; perhaps a flag indicating that the PASID is used for a guest is
enough, i.e.
struct pasid_bind_data {
         __u32 pasid;
         __u64 gcr3;
         __u32 addr_width;
         __u32 flags;
#define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
#define IOMMU_SVA_PASID_GUEST   BIT(1) /* host pasid used by guest */
};
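
To make the flag idea concrete, here is a rough user-space sketch (not
code from any patch) of how a consumer might check the structure. It
assumes the guest flag gets its own bit, BIT(1) here; the helpers
pasid_bind_data_valid() and pasid_bind_is_guest() are hypothetical
names used only for illustration, and uintN_t stands in for __uN.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

#define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
#define IOMMU_SVA_PASID_GUEST   BIT(1) /* host pasid used by guest (distinct bit assumed) */
#define PASID_BIND_KNOWN_FLAGS  (IOMMU_SVA_GPASID_SRE | IOMMU_SVA_PASID_GUEST)

/* User-space stand-in for the proposed structure. */
struct pasid_bind_data {
	uint32_t pasid;
	uint64_t gcr3;
	uint32_t addr_width;
	uint32_t flags;
};

/* Hypothetical check: reject unknown flag bits and out-of-range PASIDs. */
static bool pasid_bind_data_valid(const struct pasid_bind_data *d)
{
	if (d->flags & ~PASID_BIND_KNOWN_FLAGS)
		return false;
	if (d->pasid >= (1U << 20)) /* a PCIe PASID is at most 20 bits wide */
		return false;
	return true;
}

/* Hypothetical helper: true when the host PASID is bound on behalf of a guest. */
static bool pasid_bind_is_guest(const struct pasid_bind_data *d)
{
	return (d->flags & IOMMU_SVA_PASID_GUEST) != 0;
}
```

With distinct bits, a supervisor request and a guest bind can be told
apart, which would not be possible if both flags shared BIT(0).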

> > I think gpasid management should stay separate from io_mm, since in
> > your case VFIO mechanisms are used for life cycle management of the
> > VM, similarly to the former bind_pasid_table proposal. For example
> > closing the container fd would unbind all guest page tables. The
> > QEMU process' address space lifetime seems like the wrong thing to
> > track for gpasid.  
> 
> I sort of agree (though I have not thought through the whole flow
> carefully). PASIDs are allocated per iommu domain, thus release also
> happens when the domain is detached (along with container fd close).
> 
I also prefer to keep gpasid separate.

But I don't think we need a separate IOMMU domain for each PASID in the
guest SVA case, assuming you are talking about the host IOMMU domain.
The PASID bind call results from a guest PASID cache flush on a
previously allocated PASID. The host just needs to put gcr3 into the
PASID entry, then harvest the second level from the existing domain.
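
A toy model of that bind step, only to show the idea. This is not the
VT-d driver code or the real PASID entry layout; every toy_* name and
field below is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PASID 16

/* Toy stand-in for an IOMMU domain: only the second-level root matters here. */
struct toy_domain {
	uint64_t slpt_root;
};

/* Toy PASID entry: a first-level pointer, a second-level pointer, mode bits. */
struct toy_pasid_entry {
	uint64_t flpt_root; /* guest CR3 for a guest bind */
	uint64_t slpt_root; /* copied ("harvested") from the existing domain */
	bool nested;
	bool present;
};

struct toy_pasid_table {
	struct toy_pasid_entry entry[TOY_MAX_PASID];
};

/* Hypothetical guest bind: program gcr3 and reuse the domain's second level. */
static int toy_bind_guest_pasid(struct toy_pasid_table *tbl,
				const struct toy_domain *dom,
				uint32_t pasid, uint64_t gcr3)
{
	struct toy_pasid_entry *e;

	if (!tbl || !dom || pasid >= TOY_MAX_PASID)
		return -1; /* bad argument */

	e = &tbl->entry[pasid];
	if (e->present)
		return -2; /* PASID already bound */

	e->flpt_root = gcr3;           /* first level comes from the guest */
	e->slpt_root = dom->slpt_root; /* second level comes from the domain */
	e->nested = true;
	e->present = true;
	return 0;
}
```

No new domain is allocated in this model: binding a second PASID only
fills another entry that points at the same second-level root.
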
> >   
> > >> but (1) needs something
> > >> else. Aren't auxiliary domains suitable for (1)? Why limit
> > >> auxiliary domain to second-level or nested translation? It seems
> > >> silly to use a different API for first-level, since the flow in
> > >> userspace and VFIO is the same as your second-level case as far
> > >> as MAP_DMA ioctl goes. The difference is that in your case the
> > >> auxiliary domain supports an additional operation which binds
> > >> first-level page tables. An auxiliary domain that only supports
> > >> first-level wouldn't support this operation, but it can still
> > >> implement iommu_map/unmap/etc. 
> > > > I think the intention is that when a mdev is created, we don't
> > > know whether it will be used for SVA or IOVA. So aux domain is
> > > here to "hold a spot" for the default PASID such that MAP_DMA
> > > calls can work as usual, which is second level only. Later, if
> > > SVA is used on the mdev there will be another PASID allocated for
> > > that purpose. Do we need to create an aux domain for each PASID?
> > > the translation can be looked up by the combination of parent dev
> > > and pasid.  
> > 
> > When allocating a new PASID for the guest, I suppose you need to
> > clone the second-level translation config? In which case a single
> > aux domain for the mdev might be easier to implement in the IOMMU
> > driver. Entirely up to you since we don't have this case on SMMUv3
> >   
> 
> One thing to highlight in related discussions (also mentioned in the
> other thread): there is no new iommu domain type called 'aux'. 'aux'
> matters only to a specific device when a domain is attached to that
> device which has aux capability enabled. The same domain can be
> attached to another device as a normal domain. In that case multiple
> PASIDs allocated on the same mdev are tied to the same aux domain,
> same as the bare metal SVA case, i.e. any domain (normal or aux) can
> include a 2nd level structure and multiple 1st level structures. Jean
> is correct - all PASIDs in the same domain then share the 2nd level
> translation, and there are io_mm or similar tracking structures to
> associate each PASID with a 1st level translation structure.
> 
I think we are all talking about the same thing :)
Yes, the 2nd level is cloned from the aux domain/default PASID for an
mdev, and similarly from the DMA_MAP domain for a pdev.
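
To illustrate the sharing, a small sketch of a nested walk with one
second-level table per domain and one first-level table per PASID. It
is a deliberately simplified model with single-level, page-granular
tables, and all nest_* names and sizes are hypothetical.

```c
#include <stdint.h>

#define NEST_PAGES    8
#define NEST_NR_PASID 4
#define NEST_UNMAPPED UINT32_MAX

/* One second-level table per domain, one first-level table per PASID. */
struct nest_domain {
	uint32_t second_level[NEST_PAGES];               /* GPA page -> HPA page */
	uint32_t first_level[NEST_NR_PASID][NEST_PAGES]; /* GVA page -> GPA page */
};

/* Start with nothing mapped at either level. */
static void nest_domain_init(struct nest_domain *d)
{
	for (int i = 0; i < NEST_PAGES; i++) {
		d->second_level[i] = NEST_UNMAPPED;
		for (int p = 0; p < NEST_NR_PASID; p++)
			d->first_level[p][i] = NEST_UNMAPPED;
	}
}

/* Nested walk: the first level is chosen by PASID, the second is shared. */
static uint32_t nest_translate(const struct nest_domain *d,
			       uint32_t pasid, uint32_t gva_page)
{
	uint32_t gpa_page;

	if (pasid >= NEST_NR_PASID || gva_page >= NEST_PAGES)
		return NEST_UNMAPPED;

	gpa_page = d->first_level[pasid][gva_page];
	if (gpa_page >= NEST_PAGES) /* also catches NEST_UNMAPPED */
		return NEST_UNMAPPED;

	return d->second_level[gpa_page];
}
```

In this model a change to second_level[], such as a MAP_DMA-style
update, is seen by every PASID in the domain, while each PASID keeps
its own first-level mappings.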

> Thanks
> Kevin

[Jacob Pan]
