[Qemu-devel] [PATCH v3 01/12] intel-iommu: send PSI always even if across PDEs

From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Tian Kevin <kevin.tian@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jintack Lim <jintack@cs.columbia.edu>,
	peterx@redhat.com, Jason Wang <jasowang@redhat.com>
Subject: [Qemu-devel] [PATCH v3 01/12] intel-iommu: send PSI always even if across PDEs
Date: Thu, 17 May 2018 16:59:16 +0800
Message-ID: <20180517085927.24925-2-peterx@redhat.com>
In-Reply-To: <20180517085927.24925-1-peterx@redhat.com>

During IOVA page table walking, there is a special case when the PSI
(Page Selective Invalidation) covers one whole PDE (Page Directory
Entry, which contains 512 Page Table Entries) or more.  Previously, we
skipped such an entry without notifying the IOMMU notifiers.  This is
not correct: we should send an UNMAP notification to the registered
UNMAP notifiers in this case.
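
To make the sizes concrete, here is a minimal sketch of the arithmetic,
assuming 4 KiB base pages, 512 entries per table, and a PSI that covers
2^am pages for an address mask `am`.  All names are hypothetical and do
not come from intel_iommu.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB base page */
#define SKETCH_LEVEL_BITS 9  /* 512 entries per table */

/* Bytes covered by one entry at a level (1 = PTE, 2 = PDE, ...). */
static uint64_t sketch_level_entry_size(int level)
{
    assert(level >= 1 && level <= 4);
    return 1ULL << (SKETCH_PAGE_SHIFT + (level - 1) * SKETCH_LEVEL_BITS);
}

/* Bytes covered by a PSI whose address mask is `am` (2^am pages). */
static uint64_t sketch_psi_size(unsigned am)
{
    assert(am < 52);
    return 1ULL << (SKETCH_PAGE_SHIFT + am);
}

/* True if a PSI with this mask is large enough to span a whole PDE. */
static bool sketch_psi_covers_whole_pde(unsigned am)
{
    return sketch_psi_size(am) >= sketch_level_entry_size(2);
}
```

Under these assumptions one PDE spans 512 * 4 KiB = 2 MiB, so any PSI
with `am >= 9` can hit the special case described above.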

For UNMAP-only notifiers, this might leave IOTLB entries cached in the
devices even though they are already invalid.  For MAP/UNMAP notifiers
like vfio-pci, this will cause stale page mappings.
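
The intended behaviour can be sketched with a toy two-level walk.  This
is a simplified model, not the QEMU implementation: the types and
function names are hypothetical, and a NULL `ptes` pointer stands in
for a PDE with neither read nor write permission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL
#define TOY_ENTRIES   512
#define TOY_PDE_SIZE  (TOY_PAGE_SIZE * TOY_ENTRIES)

/* A PDE is either invalid (ptes == NULL) or points to 512 valid flags. */
typedef struct ToyPde {
    const bool *ptes;
} ToyPde;

typedef struct ToyStats {
    unsigned maps;
    unsigned unmaps;
    uint64_t unmapped_bytes;
} ToyStats;

typedef void (*toy_hook)(uint64_t iova, uint64_t size, bool valid,
                         void *opaque);

/* Example hook: count MAP and UNMAP notifications. */
static void toy_count(uint64_t iova, uint64_t size, bool valid, void *opaque)
{
    ToyStats *s = opaque;

    (void)iova;
    if (valid) {
        s->maps++;
    } else {
        s->unmaps++;
        s->unmapped_bytes += size;
    }
}

static void toy_walk(const ToyPde *pdes, size_t n, bool notify_unmap,
                     toy_hook hook, void *opaque)
{
    assert(hook);
    for (size_t i = 0; i < n; i++) {
        uint64_t base = i * TOY_PDE_SIZE;

        if (!pdes[i].ptes) {
            /* Invalid PDE: notify one UNMAP for the whole 2 MiB range. */
            if (notify_unmap) {
                hook(base, TOY_PDE_SIZE, false, opaque);
            }
            continue;
        }
        for (size_t j = 0; j < TOY_ENTRIES; j++) {
            bool valid = pdes[i].ptes[j];

            if (!valid && !notify_unmap) {
                continue; /* nothing to report for this page */
            }
            hook(base + j * TOY_PAGE_SIZE, TOY_PAGE_SIZE, valid, opaque);
        }
    }
}
```

In this model, an invalid PDE produces a single UNMAP notification for
its whole range when `notify_unmap` is set, instead of being skipped.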

This special case is rarely triggered, but nested device assignment
triggers it very easily, since in that case we may map the whole L2
guest RAM region into the device's IOVA address space (several GBs at
least), which is far bigger than what a normal kernel driver maps for
the device (normally tens of MBs).

Without this patch applied to the L1 QEMU, nested device assignment to
L2 guests reports errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)
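
A toy model of how a missed UNMAP can lead to this error, assuming the
backend refuses to map an IOVA range that overlaps an existing mapping
(which is what the "File exists" result suggests).  The container type
and functions are hypothetical and are not the VFIO API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 16

typedef struct ToyRange {
    uint64_t iova;
    uint64_t size;
} ToyRange;

typedef struct ToyContainer {
    ToyRange maps[TOY_MAX_MAPPINGS];
    size_t count;
} ToyContainer;

static bool toy_overlaps(const ToyRange *r, uint64_t iova, uint64_t size)
{
    return iova < r->iova + r->size && r->iova < iova + size;
}

/* Map [iova, iova + size); fail with -EEXIST on any overlap. */
static int toy_dma_map(ToyContainer *c, uint64_t iova, uint64_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < c->count; i++) {
        if (toy_overlaps(&c->maps[i], iova, size)) {
            return -EEXIST;
        }
    }
    if (c->count == TOY_MAX_MAPPINGS) {
        return -ENOSPC;
    }
    c->maps[c->count++] = (ToyRange){ iova, size };
    return 0;
}

/* Drop every mapping that lies fully inside [iova, iova + size). */
static void toy_dma_unmap(ToyContainer *c, uint64_t iova, uint64_t size)
{
    size_t kept = 0;

    for (size_t i = 0; i < c->count; i++) {
        const ToyRange *r = &c->maps[i];
        bool inside = r->iova >= iova && r->iova + r->size <= iova + size;

        if (!inside) {
            c->maps[kept++] = *r;
        }
    }
    c->count = kept;
}
```

In this model, mapping 0xad000 a second time fails with -EEXIST unless
an UNMAP covering that page was delivered in between.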

Acked-by: Jason Wang <jasowang@redhat.com>
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+                             vtd_page_walk_hook hook_fn, void *private)
+{
+    assert(hook_fn);
+    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+                            entry->addr_mask, entry->perm);
+    return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
+        entry.target_as = &address_space_memory;
+        entry.iova = iova & subpage_mask;
+        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+        entry.addr_mask = ~subpage_mask;
+
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            entry.addr_mask = ~subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             if (!entry_valid && !notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-                                    entry.addr_mask, entry.perm);
-            if (hook_fn) {
-                ret = hook_fn(&entry, private);
-                if (ret < 0) {
-                    return ret;
-                }
+            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            if (ret < 0) {
+                return ret;
             }
         } else {
             if (!entry_valid) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                if (notify_unmap) {
+                    /*
+                     * The whole entry is invalid; unmap it all.
+                     * Translated address is meaningless, zero it.
+                     */
+                    entry.translated_addr = 0x0;
+                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                } else {
+                    trace_vtd_page_walk_skip_perm(iova, iova_next);
+                }
                 goto next;
             }
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.17.0
