From: Peter Xu <peterx@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, Tian Kevin <kevin.tian@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Jintack Lim <jintack@cs.columbia.edu>,
Jason Wang <jasowang@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 00/12] intel-iommu: nested vIOMMU, cleanups, bug fixes
Date: Fri, 18 May 2018 14:34:17 +0800
Message-ID: <20180518063417.GL2569@xz-mi>
In-Reply-To: <20180518000204-mutt-send-email-mst@kernel.org>
On Fri, May 18, 2018 at 12:04:04AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2018 at 04:59:15PM +0800, Peter Xu wrote:
> > (Hello, Jintack, feel free to test this branch again against your scp
> > error case when you have free time)
> >
> > I rewrote some of the patches in V3. Major changes:
> >
> > - Dropped mergeable interval tree, instead introduced IOVA tree, which
> > is even simpler.
> >
> > - Fix the scp error issue that Jintack reported. Please see patches
> > for detailed information. That's the major reason to rewrite a few
> > of the patches. Our use of replay for domain flushes was possibly
> > incorrect in the past. The thing is that IOMMU replay has a
> > "definition" that "we should only send MAP when new page detected",
> > while for shadow page syncing we actually need something else than
> > that. So in this version I started to use a new
> > vtd_sync_shadow_page_table() helper to do the page sync.
> >
> > - Some other refinements after the refactoring.
> >
> > I'll add a unit test for the IOVA tree after this series is merged to
> > make sure we won't switch to another new tree implementation...
> >
> > The element size in the new IOVA tree should be around
> > sizeof(GTreeNode + IOMMUTLBEntry) ~= (5*8+4*8) = 72 bytes. So the
> > worst case usage ratio would be 72/4K=2%, which still seems acceptable
> > (it means 8G L2 guest will use 8G*2%=160MB as metadata to maintain the
> > mapping in QEMU).
> >
> > I did an explicit test with scp this time, copying a 1G file >10
> > times in each of the following cases:
> >
> > - L1 guest, with vIOMMU and with assigned device
> > - L2 guest, without vIOMMU and with assigned device
> > - L2 guest, with vIOMMU (so 3-layer nested IOMMU) and with assigned device
> >
> > Please review. Thanks,
> >
> > (Below is old content from the previous cover letter)
> >
> > ==========================
> >
> > v2:
> > - fix patchew code style warnings
> > - interval tree: postpone malloc when inserting; simplify node remove
> > a bit where proper [Jason]
> > - fix up comment and commit message for iommu lock patch [Kevin]
> > - protect context cache too using the iommu lock [Kevin, Jason]
> > - add vast comment in patch 8 to explain the modify-PTE problem
> > [Jason, Kevin]
> >
> > Online repo:
> >
> > https://github.com/xzpeter/qemu/tree/fix-vtd-dma
> >
> > This series fixes several major problems that current code has:
> >
> > - Issue 1: when getting very big PSI UNMAP invalidations, the current
> > code is buggy in that we might skip the notification while actually
> > we should always send that notification.
>
> security issue
>
> > - Issue 2: IOTLB is not thread safe, while block dataplane can be
> > accessing and updating it in parallel.
>
> security issue
>
> > - Issue 3: For devices that only registered with UNMAP-only notifiers,
> > we don't really need to do page walking for PSIs, we can directly
> > deliver the notification down. For example, vhost.
>
> optimization
>
> > - Issue 4: unsafe window for MAP notified devices like vfio-pci (and
> > in the future, vDPA as well). The problem is that, now for domain
> > invalidations we do this to make sure the shadow page tables are
> > correctly synced:
> >
> > 1. unmap the whole address space
> > 2. replay the whole address space, map existing pages
> >
> > However, between steps 1 and 2 there will be a very tiny window (it can
> > be as big as 3ms) in which the shadow page table is either invalid or
> > incomplete (since we're rebuilding it). That's a fatal error, since
> > devices never know that it is happening and they may still DMA to
> > memory.
>
> correctness but not a security issue
>
> > Patch 1 fixes issue 1. I put it first since it's picked from
> > an old post.
> >
> > Patch 2 is a cleanup to remove useless IntelIOMMUNotifierNode struct.
> >
> > Patch 3 fixes issue 2.
> >
> > Patch 4 fixes issue 3.
> >
> > Patches 5-9 fix issue 4. Here a very simple interval tree is
> > implemented based on GTree. It differs from a general interval tree
> > in that it does not allow the user to pass in private data (e.g.,
> > translated addresses). The benefit is that we can merge adjacent
> > interval leaves, so that hopefully we won't consume much memory even
> > if there are a lot of mappings (that happens for nested virt: when
> > mapping the whole L2 guest RAM range, it can be at least in GBs).
> >
> > Patch 10 is another big cleanup that can only work after patch 9.
>
>
> So 1-2 are needed on stable. 1-9 would be nice to have
> there too, even though they are big and it looks risky.
Yes. Although issue 4 is not a security issue, it might cause DMA
errors and make devices unusable. I don't know much about how the
stable tree should treat patches like this, but considering that this
whole series only touches VT-d code, and that, as you mentioned,
nearly all the patches would be nice to have even for stable, I'll
just CC stable for all the patches, refine the messages and repost.
Thanks,
--
Peter Xu
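
For readers following issue 4, here is a minimal sketch of why a
difference-based sync avoids the unsafe window that "unmap everything,
then replay" opens. It is a toy model under stated assumptions, not the
code from this series: the names sync_shadow and sync_stats are
hypothetical, pages are modelled as plain booleans, and the MAP/UNMAP
notifications are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for MAP/UNMAP notifications. */
struct sync_stats {
    size_t maps;
    size_t unmaps;
};

/*
 * Toy model (not QEMU code): guest[i] says whether page i is present in
 * the guest page table, shadow[i] whether it is currently mapped in the
 * shadow table.  Only pages whose state differs are touched, so a page
 * that stays mapped is never unmapped in between.
 */
static void sync_shadow(const bool *guest, bool *shadow, size_t n,
                        struct sync_stats *st)
{
    assert(guest && shadow && st);

    for (size_t i = 0; i < n; i++) {
        if (guest[i] && !shadow[i]) {
            shadow[i] = true;   /* newly present page: notify MAP */
            st->maps++;
        } else if (!guest[i] && shadow[i]) {
            shadow[i] = false;  /* stale page: notify UNMAP */
            st->unmaps++;
        }
        /* otherwise already in sync: leave the mapping alone */
    }
}
```

With this shape, a page that is valid both before and after the flush
generates no notification at all, so a device doing DMA to it never
sees it disappear. Tracking what is currently mapped (the shadow array
here) is what makes the comparison possible.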