* [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry()
@ 2017-10-10 9:42 Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 1/3] exec: add page_mask for flatview_do_translate Maxime Coquelin
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Maxime Coquelin @ 2017-10-10 9:42 UTC (permalink / raw)
To: peterx, pbonzini, mst, jasowang, qemu-devel; +Cc: qemu-stable, Maxime Coquelin
This series is a rebase of the first two patches of Peter's series
improving address_space_get_iotlb_entry():
Message-Id: <1496404254-17429-1-git-send-email-peterx@redhat.com>
This third version sets the initial page mask to ~0. In case of multiple IOMMUs
chained on top of each other, the minimum page mask of the IOMMUs is selected.
If there is no IOMMU, the target's default page size is used (4KB on x86_64).
This new revision also fixes an off-by-one error in the memory notifier code,
spotted during code review, that could lead the notifiee to receive
unexpected notifications for ranges it isn't registered to.
This series does not include Michael's suggestion to replace the use of page
masks with page lengths for IOTLB entries, which would make it possible to
support non-power-of-two page sizes. The idea is that it could be used for
para-virtualized IOMMU devices, but the only para-virtualized device I'm aware
of is the upcoming virtio-iommu, which also uses page masks. Moreover, these
fixes are quite urgent as they address a regression which has a big impact on
vhost performance.
As mentioned, the series is not only an improvement: it fixes a regression
in the way IOTLB updates sent to the backends are generated.
The regression is introduced by patch:
a764040cc8 ("exec: abstract address_space_do_translate()")
Prior to patch a764040cc8, IOTLB entries sent to the backend were aligned on
the guest page boundaries (both addresses and size).
For example, with the guest using 2MB pages:
* Backend sends an IOTLB miss request for iova = 0x112378fb4
* QEMU replies with an IOTLB update with iova = 0x112200000, size = 0x200000
* Backend inserts the above entry in its cache and computes the translation
In this case, if the backend later needs to translate 0x112378004, it will
result in a cache hit, and there is no need to send another IOTLB miss.
With patch a764040cc8, the address of the IOTLB entry is the address requested
via the IOTLB miss, and the size is computed to cover the remainder of the
guest page.
The same example gives:
* Backend sends an IOTLB miss request for iova = 0x112378fb4
* QEMU replies with an IOTLB update with iova = 0x112378fb4, size = 0x8704c
* Backend inserts the above entry in its cache and computes the translation
In this case, if the backend later needs to translate 0x112378004, it will
result in another cache miss:
* Backend sends an IOTLB miss request for iova = 0x112378004
* QEMU replies with an IOTLB update with iova = 0x112378004, size = 0x87ffc
* Backend inserts the above entry in its cache and computes the translation
This results in many more IOTLB misses, and more importantly, it pollutes the
device IOTLB cache by multiplying the number of entries, which moreover
overlap.
Note that the current kernel and userspace backend implementations do not merge
contiguous or overlapping IOTLB entries on device IOTLB cache insertion.
This series fixes this regression, so that IOTLB updates are aligned on
guest's page boundaries.
Changes since v2:
=================
- Init the page mask to ~0UL, and select the smallest mask in case of multiple
IOMMUs chained. If there is no IOMMU, use the target's page mask. (Paolo)
- Add patch 3 to fix off-by-one error in notifier.
Changes since rebase:
=====================
- Fix page_mask initial value
- Apply Michael's ack on the second patch
Maxime Coquelin (1):
memory: fix off-by-one error in memory_region_notify_one()
Peter Xu (2):
exec: add page_mask for flatview_do_translate
exec: simplify address_space_get_iotlb_entry
exec.c | 80 +++++++++++++++++++++++++++++++++++++++++++---------------------
memory.c | 2 +-
2 files changed, 55 insertions(+), 27 deletions(-)
--
2.13.6
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 1/3] exec: add page_mask for flatview_do_translate
2017-10-10 9:42 [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Maxime Coquelin
@ 2017-10-10 9:42 ` Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 2/3] exec: simplify address_space_get_iotlb_entry Maxime Coquelin
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2017-10-10 9:42 UTC (permalink / raw)
To: peterx, pbonzini, mst, jasowang, qemu-devel; +Cc: qemu-stable, Maxime Coquelin
From: Peter Xu <peterx@redhat.com>
The function was originally used for flatview_translate(), where what we
care about most is the (xlat, plen) range. However, for IOTLB requests we
don't really care about "plen" but about the size of the page that "xlat"
is located on, and "plen" cannot really convey this information.
A simple example to show why "plen" is not good for IOTLB translations:
E.g., for huge pages, it is possible that the guest mapped a 1GB huge page
on the device side, using this GPA range:
0x100000000 - 0x13fffffff
Then let's say we want to translate one IOVA that is finally mapped to GPA
0x13ffffe00 (which is located on this 1GB huge page). Here we'll
get:
(xlat, plen) = (0x13ffffe00, 0x200)
So the IOTLB would only cover a very small range, since from
"plen" (which is 0x200 bytes) we cannot tell the size of the page.
We actually do know that this is a huge page; we just throw that
information away in flatview_do_translate().
This patch introduces an optional "page_mask" parameter to capture that
page mask info. "plen" is made optional as well, and comments are added
for the whole function.
No functional change yet.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
exec.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 45 insertions(+), 6 deletions(-)
diff --git a/exec.c b/exec.c
index 7a80460725..7697fc1bcc 100644
--- a/exec.c
+++ b/exec.c
@@ -467,11 +467,29 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
return section;
}
-/* Called from RCU critical section */
+/**
+ * flatview_do_translate - translate an address in FlatView
+ *
+ * @fv: the flat view that we want to translate on
+ * @addr: the address to be translated in above address space
+ * @xlat: the translated address offset within memory region. It
+ * cannot be @NULL.
+ * @plen_out: valid read/write length of the translated address. It
+ * can be @NULL when we don't care about it.
+ * @page_mask_out: page mask for the translated address. This
+ * is only meaningful for IOMMU-translated addresses,
+ * since it is how huge pages are reported. It can be
+ * @NULL if we don't care about it.
+ * @is_write: whether the translation operation is for write
+ * @is_mmio: whether this can be MMIO, set true if it can
+ *
+ * This function is called from an RCU critical section
+ */
static MemoryRegionSection flatview_do_translate(FlatView *fv,
hwaddr addr,
hwaddr *xlat,
- hwaddr *plen,
+ hwaddr *plen_out,
+ hwaddr *page_mask_out,
bool is_write,
bool is_mmio,
AddressSpace **target_as)
@@ -480,11 +498,17 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
MemoryRegionSection *section;
IOMMUMemoryRegion *iommu_mr;
IOMMUMemoryRegionClass *imrc;
+ hwaddr page_mask = (hwaddr)(-1);
+ hwaddr plen = (hwaddr)(-1);
+
+ if (plen_out) {
+ plen = *plen_out;
+ }
for (;;) {
section = address_space_translate_internal(
flatview_to_dispatch(fv), addr, &addr,
- plen, is_mmio);
+ &plen, is_mmio);
iommu_mr = memory_region_get_iommu(section->mr);
if (!iommu_mr) {
@@ -496,7 +520,8 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
IOMMU_WO : IOMMU_RO);
addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
| (addr & iotlb.addr_mask));
- *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
+ page_mask &= iotlb.addr_mask;
+ plen = MIN(plen, (addr | iotlb.addr_mask) - addr + 1);
if (!(iotlb.perm & (1 << is_write))) {
goto translate_fail;
}
@@ -507,6 +532,19 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
*xlat = addr;
+ if (page_mask == (hwaddr)(-1)) {
+ /* Not behind an IOMMU, use default page size. */
+ page_mask = ~TARGET_PAGE_MASK;
+ }
+
+ if (page_mask_out) {
+ *page_mask_out = page_mask;
+ }
+
+ if (plen_out) {
+ *plen_out = plen;
+ }
+
return *section;
translate_fail:
@@ -525,7 +563,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
/* This can never be MMIO. */
section = flatview_do_translate(address_space_to_flatview(as), addr,
- &xlat, &plen, is_write, false, &as);
+ &xlat, &plen, NULL, is_write, false, &as);
/* Illegal translation */
if (section.mr == &io_mem_unassigned) {
@@ -569,7 +607,8 @@ MemoryRegion *flatview_translate(FlatView *fv, hwaddr addr, hwaddr *xlat,
AddressSpace *as = NULL;
/* This can be MMIO, so setup MMIO bit. */
- section = flatview_do_translate(fv, addr, xlat, plen, is_write, true, &as);
+ section = flatview_do_translate(fv, addr, xlat, plen, NULL,
+ is_write, true, &as);
mr = section.mr;
if (xen_enabled() && memory_access_is_direct(mr, is_write)) {
--
2.13.6
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 2/3] exec: simplify address_space_get_iotlb_entry
2017-10-10 9:42 [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 1/3] exec: add page_mask for flatview_do_translate Maxime Coquelin
@ 2017-10-10 9:42 ` Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one() Maxime Coquelin
2017-10-10 10:25 ` [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Paolo Bonzini
3 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2017-10-10 9:42 UTC (permalink / raw)
To: peterx, pbonzini, mst, jasowang, qemu-devel; +Cc: qemu-stable, Maxime Coquelin
From: Peter Xu <peterx@redhat.com>
This patch lets address_space_get_iotlb_entry() use the newly
introduced page_mask parameter of flatview_do_translate(). Then we
can be sure the IOTLB entry is aligned to the page mask, and huge
pages are nicely supported again, as they were before a764040.
Fixes: a764040 ("exec: abstract address_space_do_translate()")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
exec.c | 31 ++++++++++---------------------
1 file changed, 10 insertions(+), 21 deletions(-)
diff --git a/exec.c b/exec.c
index 7697fc1bcc..890851a96f 100644
--- a/exec.c
+++ b/exec.c
@@ -556,14 +556,14 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
bool is_write)
{
MemoryRegionSection section;
- hwaddr xlat, plen;
+ hwaddr xlat, page_mask;
- /* Try to get maximum page mask during translation. */
- plen = (hwaddr)-1;
-
- /* This can never be MMIO. */
- section = flatview_do_translate(address_space_to_flatview(as), addr,
- &xlat, &plen, NULL, is_write, false, &as);
+ /*
+ * This can never be MMIO, and we don't really care about plen
+ * here, only about the page mask.
+ */
+ section = flatview_do_translate(address_space_to_flatview(as), addr, &xlat,
+ NULL, &page_mask, is_write, false, &as);
/* Illegal translation */
if (section.mr == &io_mem_unassigned) {
@@ -574,22 +574,11 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
xlat += section.offset_within_address_space -
section.offset_within_region;
- if (plen == (hwaddr)-1) {
- /*
- * We use default page size here. Logically it only happens
- * for identity mappings.
- */
- plen = TARGET_PAGE_SIZE;
- }
-
- /* Convert to address mask */
- plen -= 1;
-
return (IOMMUTLBEntry) {
.target_as = as,
- .iova = addr & ~plen,
- .translated_addr = xlat & ~plen,
- .addr_mask = plen,
+ .iova = addr & ~page_mask,
+ .translated_addr = xlat & ~page_mask,
+ .addr_mask = page_mask,
/* IOTLBs are for DMAs, and DMA only allows on RAMs. */
.perm = IOMMU_RW,
};
--
2.13.6
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one()
2017-10-10 9:42 [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 1/3] exec: add page_mask for flatview_do_translate Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 2/3] exec: simplify address_space_get_iotlb_entry Maxime Coquelin
@ 2017-10-10 9:42 ` Maxime Coquelin
2017-10-10 10:13 ` Peter Xu
2017-10-10 10:25 ` [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Paolo Bonzini
3 siblings, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2017-10-10 9:42 UTC (permalink / raw)
To: peterx, pbonzini, mst, jasowang, qemu-devel; +Cc: qemu-stable, Maxime Coquelin
This patch fixes an off-by-one error that could lead the
notifiee to receive notifications for ranges it is not
registered to.
The bug was spotted during code review.
Fixes: bd2bfa4c52e5 ("memory: introduce memory_region_notify_one()")
Cc: qemu-stable@nongnu.org
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/memory.c b/memory.c
index 5e6351a6c1..b637c12bad 100644
--- a/memory.c
+++ b/memory.c
@@ -1892,7 +1892,7 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
* Skip the notification if the notification does not overlap
* with registered range.
*/
- if (notifier->start > entry->iova + entry->addr_mask + 1 ||
+ if (notifier->start > entry->iova + entry->addr_mask ||
notifier->end < entry->iova) {
return;
}
--
2.13.6
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one()
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one() Maxime Coquelin
@ 2017-10-10 10:13 ` Peter Xu
0 siblings, 0 replies; 6+ messages in thread
From: Peter Xu @ 2017-10-10 10:13 UTC (permalink / raw)
To: Maxime Coquelin; +Cc: pbonzini, mst, jasowang, qemu-devel, qemu-stable
On Tue, Oct 10, 2017 at 11:42:47AM +0200, Maxime Coquelin wrote:
> This patch fixes an off-by-one error that could lead the
> notifiee to receive notifications for ranges it is not
> registered to.
>
> The bug was spotted during code review.
>
> Fixes: bd2bfa4c52e5 ("memory: introduce memory_region_notify_one()")
> Cc: qemu-stable@nongnu.org
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks for fixing it!
Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
> memory.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/memory.c b/memory.c
> index 5e6351a6c1..b637c12bad 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1892,7 +1892,7 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
> * Skip the notification if the notification does not overlap
> * with registered range.
> */
> - if (notifier->start > entry->iova + entry->addr_mask + 1 ||
> + if (notifier->start > entry->iova + entry->addr_mask ||
> notifier->end < entry->iova) {
> return;
> }
> --
> 2.13.6
>
--
Peter Xu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry()
2017-10-10 9:42 [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Maxime Coquelin
` (2 preceding siblings ...)
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one() Maxime Coquelin
@ 2017-10-10 10:25 ` Paolo Bonzini
3 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2017-10-10 10:25 UTC (permalink / raw)
To: Maxime Coquelin, peterx, mst, jasowang, qemu-devel; +Cc: qemu-stable
On 10/10/2017 11:42, Maxime Coquelin wrote:
> This series is a rebase of the first two patches of Peter's series
> improving address_space_get_iotlb_entry():
> Message-Id: <1496404254-17429-1-git-send-email-peterx@redhat.com>
>
> This third version sets the initial page mask to ~0. In case of multiple IOMMUs
> chained on top of each other, the minimum page mask of the IOMMUs is selected.
> If there is no IOMMU, the target's default page size is used (4KB on x86_64).
>
> This new revision also fixes an off-by-one error in the memory notifier code,
> spotted during code review, that could lead the notifiee to receive
> unexpected notifications for ranges it isn't registered to.
>
> This series does not include Michael's suggestion to replace the use of page
> masks with page lengths for IOTLB entries, which would make it possible to
> support non-power-of-two page sizes. The idea is that it could be used for
> para-virtualized IOMMU devices, but the only para-virtualized device I'm aware
> of is the upcoming virtio-iommu, which also uses page masks. Moreover, these
> fixes are quite urgent as they address a regression which has a big impact on
> vhost performance.
>
> As mentioned, the series is not only an improvement: it fixes a regression
> in the way IOTLB updates sent to the backends are generated.
> The regression is introduced by patch:
> a764040cc8 ("exec: abstract address_space_do_translate()")
>
> Prior to patch a764040cc8, IOTLB entries sent to the backend were aligned on
> the guest page boundaries (both addresses and size).
> For example, with the guest using 2MB pages:
> * Backend sends an IOTLB miss request for iova = 0x112378fb4
> * QEMU replies with an IOTLB update with iova = 0x112200000, size = 0x200000
> * Backend inserts the above entry in its cache and computes the translation
> In this case, if the backend later needs to translate 0x112378004, it will
> result in a cache hit, and there is no need to send another IOTLB miss.
>
> With patch a764040cc8, the address of the IOTLB entry is the address requested
> via the IOTLB miss, and the size is computed to cover the remainder of the
> guest page.
> The same example gives:
> * Backend sends an IOTLB miss request for iova = 0x112378fb4
> * QEMU replies with an IOTLB update with iova = 0x112378fb4, size = 0x8704c
> * Backend inserts the above entry in its cache and computes the translation
> In this case, if the backend later needs to translate 0x112378004, it will
> result in another cache miss:
> * Backend sends an IOTLB miss request for iova = 0x112378004
> * QEMU replies with an IOTLB update with iova = 0x112378004, size = 0x87ffc
> * Backend inserts the above entry in its cache and computes the translation
> This results in many more IOTLB misses, and more importantly, it pollutes
> the device IOTLB cache by multiplying the number of entries, which moreover
> overlap.
>
> Note that the current kernel and userspace backend implementations do not merge
> contiguous or overlapping IOTLB entries on device IOTLB cache insertion.
>
> This series fixes this regression, so that IOTLB updates are aligned on
> guest's page boundaries.
>
> Changes since v2:
> =================
> - Init the page mask to ~0UL, and select the smallest mask in case of multiple
> IOMMUs chained. If there is no IOMMU, use the target's page mask. (Paolo)
> - Add patch 3 to fix off-by-one error in notifier.
>
> Changes since rebase:
> =====================
> - Fix page_mask initial value
> - Apply Michael's ack on the second patch
>
> Maxime Coquelin (1):
> memory: fix off-by-one error in memory_region_notify_one()
>
> Peter Xu (2):
> exec: add page_mask for flatview_do_translate
> exec: simplify address_space_get_iotlb_entry
>
> exec.c | 80 +++++++++++++++++++++++++++++++++++++++++++---------------------
> memory.c | 2 +-
> 2 files changed, 55 insertions(+), 27 deletions(-)
>
Queued, thanks.
Paolo
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-10-10 10:26 UTC | newest]
Thread overview: 6+ messages
-- links below jump to the message on this page --
2017-10-10 9:42 [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 1/3] exec: add page_mask for flatview_do_translate Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 2/3] exec: simplify address_space_get_iotlb_entry Maxime Coquelin
2017-10-10 9:42 ` [Qemu-devel] [PATCH v3 3/3] memory: fix off-by-one error in memory_region_notify_one() Maxime Coquelin
2017-10-10 10:13 ` Peter Xu
2017-10-10 10:25 ` [Qemu-devel] [PATCH v3 0/3] exec: further refine address_space_get_iotlb_entry() Paolo Bonzini