From: Peter Xu <peterx@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jason Wang <jasowang@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Michael S . Tsirkin" <mst@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges
Date: Fri, 27 Apr 2018 17:55:27 +0800 [thread overview]
Message-ID: <20180427095527.GE13269@xz-mi> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19113C1C9@SHSMSX101.ccr.corp.intel.com>
On Fri, Apr 27, 2018 at 07:44:07AM +0000, Tian, Kevin wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Friday, April 27, 2018 3:28 PM
> >
> > On Fri, Apr 27, 2018 at 07:02:14AM +0000, Tian, Kevin wrote:
> > > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > > Sent: Friday, April 27, 2018 2:08 PM
> > > >
> > > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > > For each VTDAddressSpace, now we maintain what IOVA ranges we
> > have
> > > > > mapped and what we have not. With that information, now we only
> > > > send
> > > > > MAP or UNMAP when necessary. Say, we don't send MAP notifies if
> > we
> > > > know
> > > > > we have already mapped the range, meanwhile we don't send
> > UNMAP
> > > > notifies
> > > > > if we know we never mapped the range at all.
> > > > >
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > > include/hw/i386/intel_iommu.h | 2 ++
> > > > > hw/i386/intel_iommu.c | 28 ++++++++++++++++++++++++++++
> > > > > hw/i386/trace-events | 2 ++
> > > > > 3 files changed, 32 insertions(+)
> > > > >
> > > > > diff --git a/include/hw/i386/intel_iommu.h
> > > > b/include/hw/i386/intel_iommu.h
> > > > > index 486e205e79..09a2e94404 100644
> > > > > --- a/include/hw/i386/intel_iommu.h
> > > > > +++ b/include/hw/i386/intel_iommu.h
> > > > > @@ -27,6 +27,7 @@
> > > > > #include "hw/i386/ioapic.h"
> > > > > #include "hw/pci/msi.h"
> > > > > #include "hw/sysbus.h"
> > > > > +#include "qemu/interval-tree.h"
> > > > >
> > > > > #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
> > > > > #define INTEL_IOMMU_DEVICE(obj) \
> > > > > @@ -95,6 +96,7 @@ struct VTDAddressSpace {
> > > > > QLIST_ENTRY(VTDAddressSpace) next;
> > > > > /* Superset of notifier flags that this address space has */
> > > > > IOMMUNotifierFlag notifier_flags;
> > > > > + ITTree *iova_tree; /* Traces mapped IOVA ranges */
> > > > > };
> > > > >
> > > > > struct VTDBus {
> > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > index a19c18b8d4..8f396a5d13 100644
> > > > > --- a/hw/i386/intel_iommu.c
> > > > > +++ b/hw/i386/intel_iommu.c
> > > > > @@ -768,12 +768,37 @@ typedef struct {
> > > > > static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > > > > vtd_page_walk_info *info)
> > > > > {
> > > > > + VTDAddressSpace *as = info->as;
> > > > > vtd_page_walk_hook hook_fn = info->hook_fn;
> > > > > void *private = info->private;
> > > > > + ITRange *mapped = it_tree_find(as->iova_tree, entry->iova,
> > > > > + entry->iova + entry->addr_mask);
> > > > >
> > > > > assert(hook_fn);
> > > > > +
> > > > > + /* Update local IOVA mapped ranges */
> > > > > + if (entry->perm) {
> > > > > + if (mapped) {
> > > > > + /* Skip since we have already mapped this range */
> > > > > + trace_vtd_page_walk_one_skip_map(entry->iova, entry-
> > > > >addr_mask,
> > > > > + mapped->start, mapped->end);
> > > > > + return 0;
> > > > > + }
> > > > > + it_tree_insert(as->iova_tree, entry->iova,
> > > > > + entry->iova + entry->addr_mask);
> > > >
> > > > I was consider a case e.g:
> > > >
> > > > 1) map A (iova) to B (pa)
> > > > 2) invalidate A
> > > > 3) map A (iova) to C (pa)
> > > > 4) invalidate A
> > > >
> > > > In this case, we will probably miss a walk here. But I'm not sure it was
> > > > allowed by the spec (though I think so).
> > > >
> >
> > Hi, Kevin,
> >
> > Thanks for joining the discussion.
> >
> > >
> > > I thought it was wrong in a glimpse, but then changed my mind after
> > > another thinking. As long as device driver can quiescent the device
> > > to not access A (iova) within above window, then above sequence
> > > has no problem since any stale mappings (A->B) added before step 4)
> > > won't be used and then will get flushed after step 4). Regarding to
> > > that actually the 1st invalidation can be skipped:
> > >
> > > 1) map A (iova) to B (pa)
> > > 2) driver programs device to use A
> > > 3) driver programs device to not use A
> > > 4) map A (iova) to C (pa)
> > > A->B may be still valid in IOTLB
> > > 5) invalidate A
> > > 6) driver programs device to use A
> >
> > Note that IMHO this is a bit different from Jason's example, and it'll
> > be fine. Current code should work well with this scenario since the
> > emulation code will not aware of the map A until step (5). Then we'll
> > have the correct mapping.
>
> you are right. we still need the extra PSI otherwise the 1st mapping
> is problematic for use. So back to Jason's example.
>
> >
> > While for Jason's example it's exactly the extra PSI that might cause
> > stale mappings (though again I think it's still problematic...).
>
> problematic in software side (e.g. that way IOMMU core relies on
> device drivers which conflict with the value of using IOMMU) but
> it is OK from hardware p.o.v. btw the extra PSI itself doesn't cause
> stale mapping. Instead it is device activity after that PSI may cause it.
>
> >
> > Actually I think I can just fix up the code even if Jason's case
> > happens by unmapping that first then remap:
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 31e9b52452..2a9584f9d8 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -778,13 +778,21 @@ static int vtd_page_walk_one(IOMMUTLBEntry
> > *entry, int level,
> > /* Update local IOVA mapped ranges */
> > if (entry->perm) {
> > if (mapped) {
> > - /* Skip since we have already mapped this range */
> > - trace_vtd_page_walk_one_skip_map(entry->iova, entry-
> > >addr_mask,
> > - mapped->start, mapped->end);
> > - return 0;
> > + int ret;
> > + /* Cache the write permission */
> > + IOMMUAccessFlags flags = entry->perm;
> > +
> > + /* UNMAP the old first then remap. No need to touch IOVA tree */
> > + entry->perm = IOMMU_NONE;
> > + ret = hook_fn(entry, private);
> > + if (ret) {
> > + return ret;
> > + }
> > + entry->perm = flags;
> > + } else {
> > + it_tree_insert(as->iova_tree, entry->iova,
> > + entry->iova + entry->addr_mask);
> > }
> > - it_tree_insert(as->iova_tree, entry->iova,
> > - entry->iova + entry->addr_mask);
> > } else {
> > if (!mapped) {
> > /* Skip since we didn't map this range at all */
> >
> > If we really think it necessary, I can squash this in, though this is
> > a bit ugly. But I just want to confirm whether this would be anything
> > we want...
> >
>
> I didn’t look into your actual code yet. If others think above
> change is OK then it's definitely good as we conform to hardware
> behavior here. Otherwise if there is a way to detect such unusual
> usage and then adopt some action (say kill the VM), it's also fine
> since user knows he runs a bad OS which is not supported by
> Qemu. It's just not good if such situation is not handled, which
> leads to some undefined behavior which nobody knows the reason
> w/o hard debug into. :-)
Yeah, then let me do this. :)
Jason, would you be good with above change squashed?
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2018-04-27 9:55 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-25 4:51 [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 01/10] intel-iommu: send PSI always even if across PDEs Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 02/10] intel-iommu: remove IntelIOMMUNotifierNode Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 03/10] intel-iommu: add iommu lock Peter Xu
2018-04-25 16:26 ` Emilio G. Cota
2018-04-26 5:45 ` Peter Xu
2018-04-27 5:13 ` Jason Wang
2018-04-27 6:26 ` Peter Xu
2018-04-27 7:19 ` Tian, Kevin
2018-04-27 9:53 ` Peter Xu
2018-04-28 1:54 ` Tian, Kevin
2018-04-28 1:43 ` Jason Wang
2018-04-28 2:24 ` Peter Xu
2018-04-28 2:42 ` Jason Wang
2018-04-28 3:06 ` Peter Xu
2018-04-28 3:11 ` Jason Wang
2018-04-28 3:14 ` Peter Xu
2018-04-28 3:16 ` Jason Wang
2018-04-30 7:22 ` Paolo Bonzini
2018-04-30 7:20 ` Paolo Bonzini
2018-05-03 5:39 ` Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 04/10] intel-iommu: only do page walk for MAP notifiers Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 05/10] intel-iommu: introduce vtd_page_walk_info Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 06/10] intel-iommu: pass in address space when page walk Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 07/10] util: implement simple interval tree logic Peter Xu
2018-04-27 5:53 ` Jason Wang
2018-04-27 6:27 ` Peter Xu
2018-05-03 7:10 ` Peter Xu
2018-05-03 7:21 ` Jason Wang
2018-05-03 7:30 ` Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges Peter Xu
2018-04-27 6:07 ` Jason Wang
2018-04-27 6:34 ` Peter Xu
2018-04-27 7:02 ` Tian, Kevin
2018-04-27 7:28 ` Peter Xu
2018-04-27 7:44 ` Tian, Kevin
2018-04-27 9:55 ` Peter Xu [this message]
2018-04-27 11:40 ` Peter Xu
2018-04-27 23:37 ` Tian, Kevin
2018-05-03 6:04 ` Peter Xu
2018-05-03 7:20 ` Jason Wang
2018-05-03 7:28 ` Peter Xu
2018-05-03 7:43 ` Jason Wang
2018-05-03 7:53 ` Peter Xu
2018-05-03 9:22 ` Jason Wang
2018-05-03 9:53 ` Peter Xu
2018-05-03 12:01 ` Peter Xu
2018-04-28 1:49 ` Jason Wang
2018-04-25 4:51 ` [Qemu-devel] [PATCH 09/10] intel-iommu: don't unmap all for shadow page table Peter Xu
2018-04-25 4:51 ` [Qemu-devel] [PATCH 10/10] intel-iommu: remove notify_unmap for page walk Peter Xu
2018-04-25 5:05 ` [Qemu-devel] [PATCH 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes no-reply
2018-04-25 5:34 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180427095527.GE13269@xz-mi \
--to=peterx@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=jasowang@redhat.com \
--cc=jintack@cs.columbia.edu \
--cc=kevin.tian@intel.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).