Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: tianyu.lan@intel.com, kevin.tian@intel.com, "Liu,
	Yi L" <yi.l.liu@intel.com>,
	ashok.raj@intel.com, kvm@vger.kernel.org, jasowang@redhat.com,
	Will Deacon <Will.Deacon@arm.com>,
	alex.williamson@redhat.com, peterx@redhat.com,
	qemu-devel@nongnu.org, iommu@lists.linux-foundation.org,
	jacob.jun.pan@intel.com
Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
Date: Mon, 3 Jul 2017 18:31:15 +0800	[thread overview]
Message-ID: <20170703103115.GB22053@gmail.com> (raw)
In-Reply-To: <0e4f2dd4-d553-b1b7-7bec-fe0ff5242c54@arm.com>

Hi Jean,

On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote:
> Hi Yi,
> 
> On 02/07/17 11:06, Liu, Yi L wrote:
> > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> > 
> > Hi Jean,
> > 
> > As we've got a few discussions on it. I'd like to have a conclusion and
> > make it as a reference for future discussion.
> > 
> > Currently, we are inclined to have a hybrid format for the iommu tlb
> > invalidation from userspace(vIOMMU or userspace driver).
> > 
> > Based on the previous discussion, may the below work?
> > 
> > 1. Add a IOCTL for iommu tlb invalidation.
> > 
> > VFIO_IOMMU_TLB_INVALIDATE
> > 
> > struct vfio_iommu_tlb_invalidate {
> >    __u32   argsz;
> >    __u32   length;
> 
> Wouldn't argsz be exactly length + 8? Might be redundant in this case.

yes, it is. we may not use it in future version. but yes, if we still use it.
I think we can make it easier.
 
> >    __u8    data[];
> > };
> > 
> > comments from Alex William: is it more suitable to add a new flag bit on
> > vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> > structure is so similar.
> > 
> > Personally, I'm ok with it. Pls let me know your thoughts. However, the
> > precondition is we accept the whole definition in this email. If not, the
> > vfio_iommu_tlb_invalidate would be defined differently.
> 
> With this proposal sharing the structure makes sense. As I understand it
> we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
> flag bit would be redundant.

yes, it seems to be strange if we share vfio_device_svm structure but use
a separate IOCTL cmd. Maybe it's more reasonable to share IOCTL cmd and just
add a new flag. Then all the svm related operations share the IOCTL. However,
need to check if there would be any non-svm related iommu tlb invalidation.
Then vfio_device_svm should be renamed to be non-svm specific.

> 
> > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> > 
> > struct iommu_tlb_invalidate {
> > 	__u32	scope;
> > /* pasid-selective invalidation described by @pasid */
> > #define IOMMU_INVALIDATE_PASID	(1 << 0)
> > /* address-selevtive invalidation described by (@vaddr, @size) */
> > #define IOMMU_INVALIDATE_VADDR	(1 << 1)
> > 	__u32	flags;
> > /*  targets non-pasid mappings, @pasid is not valid */
> > #define IOMMU_INVALIDATE_NO_PASID	(1 << 0)
> 
> Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> using a special mode where PASID 0 is reserved and any traffic without
> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> to invalidate that special context explicitly. But this means that
> invalidation packet targeted at that context will have "scope = PASID" and
> "flags = NO_PASID", which is utterly confusing.
> 
> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
> and just use PASID 0 to invalidate this context on ARM. I don't think
> other architectures would use the NO_PASID flag anyway, but might be mistaken.

I may suggest to keep it so far. On VT-d, we may pass some data in opaque, so
we may work without it. But if other vendor want to issue non-PASID tagged
cache, then may encounter problem.

> > /* indicating that the pIOMMU doesn't need to invalidate
> >    all intermediate tables cached as part of the PTE for
> >    vaddr, only the last-level entry (pte). This is a hint. */
> > #define IOMMU_INVALIDATE_VADDR_LEAF	(1 << 1)
> > 	__u32	pasid;
> > 	__u64	vaddr;
> > 	__u64	size;
> > 	__u8	data[];
> > };
> > 
> > For this part, the scope and flags are basically aligned with your previous
> > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags
> > would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> > it is much more suitable to be defined in a uapi header file.
> 
> I tend to agree, defining a single structure in a new IOMMU UAPI file is
> better than having identical structures both in uapi/linux/vfio.h and
> linux/iommu.h. This way we avoid VFIO having to copy the same structure
> field by field. Arch-specific structures that go in
> iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

yes, it is.

> > Besides the reason above, I don't want VFIO engae too much on the data parsing.
> > If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate,
> > then both kernel space vfio and user space vfio needs to do much parsing. So I
> > may prefer the way above.
> 
> Would the entire validation of struct iommu_tlb_invalidate be offloaded to
> the IOMMU subsystem? Checking the structure sizes, copying from user, and
> validating the flags?

no, the copying from user and flag validation is still in VFIO. Basic idea is
still passing the iommu_tlb_invalidate.data to iommu sub-system.

> I guess it's mostly an implementation detail, but it might be better to
> keep this code in VFIO as well, even for the validation of
> iommu_tlb_invalidate.data (which would require VFIO to keep track of the
> model used during bind). This way VFIO sanitizes what comes from
> userspace, whilst the IOMMU subsystem only deals with valid kernel
> structures, and another subsystem could easily reuse the
> iommu_tlb_invalidate API.

it's a good idae. may think about it. VFIO should also be able to parse the
generic part of the iommu_tlb_invalidate.data.


Thanks,
Yi L

> The IOMMU subsystem would still validate the meaning of the fields, for
> instance whether a given combination of flag is allowed or if the PASID
> exists.
> 
> Thanks,
> Jean
> 
> > If you've got any other idea, pls feel free to post it. It's welcomed.
> > 
> > Thanks,
> > Yi L
> > 
> >> Hi Yi,
> >>
> >> On 26/04/17 11:12, Liu, Yi L wrote:
> >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com>
> >>>
> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
> >>> invalidate request from guest to host.
> >>>
> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has
> >>> no knowledge of caching structure updates unless the guest
> >>> invalidation activities are passed down to the host. So a new
> >>> IOCTL is needed to propagate the guest cache invalidation through
> >>> VFIO.
> >>>
> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >>> ---
> >>>  include/uapi/linux/vfio.h | 9 +++++++++
> >>>  1 file changed, 9 insertions(+)
> >>>
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 6b97987..50c51f8 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
> >>>  
> >>>  #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >>>  
> >>> +/* For IOMMU TLB Invalidation Propagation */
> >>> +struct vfio_iommu_tlb_invalidate {
> >>> +	__u32	argsz;
> >>> +	__u32	length;
> >>> +	__u8	data[];
> >>> +};
> >>
> >> We initially discussed something a little more generic than this, with
> >> most info explicitly described and only pIOMMU-specific quirks and hints
> >> in an opaque structure. Out of curiosity, why the change? I'm not against
> >> a fully opaque structure, but there seem to be a large overlap between TLB
> >> invalidations across architectures.
> >>
> >>
> >> For what it's worth, when prototyping the paravirtualized IOMMU I came up
> >> with the following.
> >>
> >> (From the paravirtualized POV, the SMMU also has to swizzle endianess
> >> after unpacking an opaque structure, since userspace doesn't know what's
> >> in it and guest might use a different endianess. So we need to force all
> >> opaque data to be e.g. little-endian.)
> >>
> >> struct vfio_iommu_tlb_invalidate {
> >> 	__u32	argsz;
> >> 	__u32	scope;
> >> 	__u32	flags;
> >> 	__u32	pasid;
> >> 	__u64	vaddr;
> >> 	__u64	size;
> >> 	__u8	data[];
> >> };
> >>
> >> Scope is a bitfields restricting the invalidation scope. By default
> >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
> >> and @size are unused.
> >>
> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
> >> scope to the pasid described by @pasid.
> >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
> >> scope to the address range described by (@vaddr, @size).
> >>
> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
> >> range for *all* pasids (as well as no_pasid). Setting scope =
> >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
> >> the VA range only for @pasid.
> >>
> >> Flags depend on the selected scope:
> >>
> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
> >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
> >> exclusively (some architectures, e.g. SMMU, allow this)>>
> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
> >> to invalidate all intermediate tables cached as part of the PTW for vaddr,
> >> only the last-level entry (pte). This is a hint.
> >>
> >> I guess what's missing for Intel IOMMU and would go in @data is the
> >> "global" hint (which we don't have in SMMU invalidations). Do you see
> >> anything else, that the pIOMMU cannot deduce from this structure?
> >>
> >> Thanks,
> >> Jean
> >>
> >>
> >>> +#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> >>> +
> >>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >>>  
> >>>  /*
> >>>
> >>
> >>
> 
>

next prev parent reply	other threads:[~2017-07-04 10:45 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-26 10:11 [Qemu-devel] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d Liu, Yi L
2017-04-26 10:11 ` [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function Liu, Yi L
2017-04-26 16:56   ` Jean-Philippe Brucker
2017-04-26 18:29     ` jacob pan
2017-04-26 18:59       ` Jean-Philippe Brucker
2017-04-27  6:36     ` Liu, Yi L
2017-04-27 10:12       ` Jean-Philippe Brucker
2017-04-28  7:59         ` Liu, Yi L
2017-04-28  9:04     ` Liu, Yi L
2017-04-28 12:51       ` Jean-Philippe Brucker
2017-05-23  7:50         ` Liu, Yi L
2017-05-25 12:33           ` Jean-Philippe Brucker
2017-05-12 21:59   ` Alex Williamson
2017-05-14 10:56     ` Liu, Yi L
2017-04-26 10:11 ` [Qemu-devel] [RFC PATCH 2/8] iommu/vt-d: add bind_pasid_table function Liu, Yi L
2017-05-12 21:59   ` Alex Williamson
2017-05-15 13:14     ` jacob pan
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function Liu, Yi L
2017-05-12 21:59   ` Alex Williamson
2017-05-17 10:23     ` Liu, Yi L
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function Liu, Yi L
2017-05-12 21:59   ` Alex Williamson
2017-05-17 10:24     ` Liu, Yi L
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation Liu, Yi L
2017-04-26 16:56   ` Jean-Philippe Brucker
2017-04-27  5:43     ` Liu, Yi L
2017-05-11 10:29   ` Liu, Yi L
2017-05-12 21:58   ` Alex Williamson
2017-05-17 10:27     ` Liu, Yi L
2017-05-18 11:29       ` Jean-Philippe Brucker
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 6/8] VFIO: do pasid table binding Liu, Yi L
2017-05-09  7:55   ` Xiao Guangrong
2017-05-11 10:29     ` Liu, Yi L
2017-05-12 21:59   ` Alex Williamson
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation Liu, Yi L
2017-05-12 12:11   ` Jean-Philippe Brucker
2017-05-14 10:12     ` Liu, Yi L
2017-05-15 12:14       ` Jean-Philippe Brucker
2017-07-02 10:06     ` Liu, Yi L
2017-07-03 11:52       ` Jean-Philippe Brucker
2017-07-03 10:31         ` Liu, Yi L [this message]
2017-07-05  6:45           ` Tian, Kevin
2017-07-05 12:42             ` Jean-Philippe Brucker
2017-07-05 17:28               ` Alex Williamson
2017-07-05 22:26                 ` Tian, Kevin
2017-07-14  8:58                 ` Liu, Yi L
2017-07-14 18:15                   ` Alex Williamson
2017-07-17 10:58                     ` Liu, Yi L
2017-07-17 22:45                       ` Alex Williamson
2017-07-18  9:38                         ` Jean-Philippe Brucker
2017-07-18 14:29                           ` Alex Williamson
2017-07-18 15:03                             ` Jean-Philippe Brucker
2017-07-19 10:45                         ` Liu, Yi L
2017-07-19 21:50                           ` Jacob Pan
2017-07-05 22:31               ` Tian, Kevin
2017-05-12 21:58   ` Alex Williamson
2017-05-14 10:55     ` Liu, Yi L
2017-07-05  5:32       ` Tian, Kevin
2017-04-26 10:12 ` [Qemu-devel] [RFC PATCH 8/8] VFIO: do IOMMU TLB invalidation from guest Liu, Yi L
2017-05-08  4:09 ` [Qemu-devel] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d Xiao Guangrong
2017-05-07  7:33   ` Liu, Yi L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170703103115.GB22053@gmail.com \
    --to=yi.l.liu@linux.intel.com \
    --cc=Will.Deacon@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).