qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jason Wang <jasowang@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges
Date: Fri, 27 Apr 2018 15:28:10 +0800	[thread overview]
Message-ID: <20180427072810.GB13269@xz-mi> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19113B5D2@SHSMSX101.ccr.corp.intel.com>

On Fri, Apr 27, 2018 at 07:02:14AM +0000, Tian, Kevin wrote:
> > From: Jason Wang [mailto:jasowang@redhat.com]
> > Sent: Friday, April 27, 2018 2:08 PM
> > 
> > On 2018年04月25日 12:51, Peter Xu wrote:
> > > For each VTDAddressSpace, now we maintain what IOVA ranges we have
> > > mapped and what we have not.  With that information, now we only
> > send
> > > MAP or UNMAP when necessary.  Say, we don't send MAP notifies if we
> > know
> > > we have already mapped the range, meanwhile we don't send UNMAP
> > notifies
> > > if we know we never mapped the range at all.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >   include/hw/i386/intel_iommu.h |  2 ++
> > >   hw/i386/intel_iommu.c         | 28 ++++++++++++++++++++++++++++
> > >   hw/i386/trace-events          |  2 ++
> > >   3 files changed, 32 insertions(+)
> > >
> > > diff --git a/include/hw/i386/intel_iommu.h
> > b/include/hw/i386/intel_iommu.h
> > > index 486e205e79..09a2e94404 100644
> > > --- a/include/hw/i386/intel_iommu.h
> > > +++ b/include/hw/i386/intel_iommu.h
> > > @@ -27,6 +27,7 @@
> > >   #include "hw/i386/ioapic.h"
> > >   #include "hw/pci/msi.h"
> > >   #include "hw/sysbus.h"
> > > +#include "qemu/interval-tree.h"
> > >
> > >   #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
> > >   #define INTEL_IOMMU_DEVICE(obj) \
> > > @@ -95,6 +96,7 @@ struct VTDAddressSpace {
> > >       QLIST_ENTRY(VTDAddressSpace) next;
> > >       /* Superset of notifier flags that this address space has */
> > >       IOMMUNotifierFlag notifier_flags;
> > > +    ITTree *iova_tree;          /* Traces mapped IOVA ranges */
> > >   };
> > >
> > >   struct VTDBus {
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index a19c18b8d4..8f396a5d13 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -768,12 +768,37 @@ typedef struct {
> > >   static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > >                                vtd_page_walk_info *info)
> > >   {
> > > +    VTDAddressSpace *as = info->as;
> > >       vtd_page_walk_hook hook_fn = info->hook_fn;
> > >       void *private = info->private;
> > > +    ITRange *mapped = it_tree_find(as->iova_tree, entry->iova,
> > > +                                   entry->iova + entry->addr_mask);
> > >
> > >       assert(hook_fn);
> > > +
> > > +    /* Update local IOVA mapped ranges */
> > > +    if (entry->perm) {
> > > +        if (mapped) {
> > > +            /* Skip since we have already mapped this range */
> > > +            trace_vtd_page_walk_one_skip_map(entry->iova, entry-
> > >addr_mask,
> > > +                                             mapped->start, mapped->end);
> > > +            return 0;
> > > +        }
> > > +        it_tree_insert(as->iova_tree, entry->iova,
> > > +                       entry->iova + entry->addr_mask);
> > 
> > I was consider a case e.g:
> > 
> > 1) map A (iova) to B (pa)
> > 2) invalidate A
> > 3) map A (iova) to C (pa)
> > 4) invalidate A
> > 
> > In this case, we will probably miss a walk here. But I'm not sure it was
> > allowed by the spec (though I think so).
> > 

Hi, Kevin,

Thanks for joining the discussion.

> 
> I thought it was wrong in a glimpse, but then changed my mind after
> another thinking. As long as device driver can quiescent the device
> to not access A (iova) within above window, then above sequence
> has no problem since any stale mappings (A->B) added before step 4)
> won't be used and then will get flushed after step 4). Regarding to
> that actually the 1st invalidation can be skipped:
> 
> 1) map A (iova) to B (pa)
> 2) driver programs device to use A
> 3) driver programs device to not use A
> 4) map A (iova) to C (pa)
> 	A->B may be still valid in IOTLB
> 5) invalidate A
> 6) driver programs device to use A

Note that IMHO this is a bit different from Jason's example, and it'll
be fine.  Current code should work well with this scenario since the
emulation code will not aware of the map A until step (5).  Then we'll
have the correct mapping.

While for Jason's example it's exactly the extra PSI that might cause
stale mappings (though again I think it's still problematic...).

Actually I think I can just fix up the code even if Jason's case
happens by unmapping that first then remap:

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 31e9b52452..2a9584f9d8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -778,13 +778,21 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
     /* Update local IOVA mapped ranges */
     if (entry->perm) {
         if (mapped) {
-            /* Skip since we have already mapped this range */
-            trace_vtd_page_walk_one_skip_map(entry->iova, entry->addr_mask,
-                                             mapped->start, mapped->end);
-            return 0;
+            int ret;
+            /* Cache the write permission */
+            IOMMUAccessFlags flags = entry->perm;
+
+            /* UNMAP the old first then remap.  No need to touch IOVA tree */
+            entry->perm = IOMMU_NONE;
+            ret = hook_fn(entry, private);
+            if (ret) {
+                return ret;
+            }
+            entry->perm = flags;
+        } else {
+            it_tree_insert(as->iova_tree, entry->iova,
+                           entry->iova + entry->addr_mask);
         }
-        it_tree_insert(as->iova_tree, entry->iova,
-                       entry->iova + entry->addr_mask);
     } else {
         if (!mapped) {
             /* Skip since we didn't map this range at all */

If we really think it necessary, I can squash this in, though this is
a bit ugly.  But I just want to confirm whether this would be anything
we want...

Thanks,

-- 
Peter Xu

  reply	other threads:[~2018-04-27  7:28 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25  4:51 [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 01/10] intel-iommu: send PSI always even if across PDEs Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 02/10] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 03/10] intel-iommu: add iommu lock Peter Xu
2018-04-25 16:26   ` Emilio G. Cota
2018-04-26  5:45     ` Peter Xu
2018-04-27  5:13   ` Jason Wang
2018-04-27  6:26     ` Peter Xu
2018-04-27  7:19       ` Tian, Kevin
2018-04-27  9:53         ` Peter Xu
2018-04-28  1:54           ` Tian, Kevin
2018-04-28  1:43       ` Jason Wang
2018-04-28  2:24         ` Peter Xu
2018-04-28  2:42           ` Jason Wang
2018-04-28  3:06             ` Peter Xu
2018-04-28  3:11               ` Jason Wang
2018-04-28  3:14             ` Peter Xu
2018-04-28  3:16               ` Jason Wang
2018-04-30  7:22               ` Paolo Bonzini
2018-04-30  7:20           ` Paolo Bonzini
2018-05-03  5:39             ` Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 04/10] intel-iommu: only do page walk for MAP notifiers Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 05/10] intel-iommu: introduce vtd_page_walk_info Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 06/10] intel-iommu: pass in address space when page walk Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 07/10] util: implement simple interval tree logic Peter Xu
2018-04-27  5:53   ` Jason Wang
2018-04-27  6:27     ` Peter Xu
2018-05-03  7:10     ` Peter Xu
2018-05-03  7:21       ` Jason Wang
2018-05-03  7:30         ` Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges Peter Xu
2018-04-27  6:07   ` Jason Wang
2018-04-27  6:34     ` Peter Xu
2018-04-27  7:02     ` Tian, Kevin
2018-04-27  7:28       ` Peter Xu [this message]
2018-04-27  7:44         ` Tian, Kevin
2018-04-27  9:55           ` Peter Xu
2018-04-27 11:40             ` Peter Xu
2018-04-27 23:37               ` Tian, Kevin
2018-05-03  6:04                 ` Peter Xu
2018-05-03  7:20                   ` Jason Wang
2018-05-03  7:28                     ` Peter Xu
2018-05-03  7:43                       ` Jason Wang
2018-05-03  7:53                         ` Peter Xu
2018-05-03  9:22                           ` Jason Wang
2018-05-03  9:53                             ` Peter Xu
2018-05-03 12:01                               ` Peter Xu
2018-04-28  1:49               ` Jason Wang
2018-04-25  4:51 ` [Qemu-devel] [PATCH 09/10] intel-iommu: don't unmap all for shadow page table Peter Xu
2018-04-25  4:51 ` [Qemu-devel] [PATCH 10/10] intel-iommu: remove notify_unmap for page walk Peter Xu
2018-04-25  5:05 ` [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes no-reply
2018-04-25  5:34   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180427072810.GB13269@xz-mi \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jintack@cs.columbia.edu \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).