* [PATCH v3 1/5] util: Add iova_tree_foreach_range_data
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
@ 2023-06-08 9:52 ` Zhenzhong Duan
2023-06-08 9:52 ` [PATCH v3 2/5] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
` (4 subsequent siblings)
5 siblings, 0 replies; 30+ messages in thread
From: Zhenzhong Duan @ 2023-06-08 9:52 UTC (permalink / raw)
To: qemu-devel
Cc: mst, peterx, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
This function is a variant of iova_tree_foreach and supports traversing
a range, triggering a callback with private data.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/qemu/iova-tree.h | 17 +++++++++++++++--
util/iova-tree.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8528e5c98fbc..df3dba79e671 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -39,6 +39,7 @@ typedef struct DMAMap {
IOMMUAccessFlags perm;
} QEMU_PACKED DMAMap;
typedef gboolean (*iova_tree_iterator)(DMAMap *map);
+typedef gboolean (*iova_tree_iterator_2)(DMAMap *map, gpointer *private);
/**
* iova_tree_new:
@@ -131,11 +132,23 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
* @iterator: the iterator for the mappings, return true to stop
*
* Iterate over the iova tree.
- *
- * Return: 1 if found any overlap, 0 if not, <0 if error.
*/
void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
+/**
+ * iova_tree_foreach_range_data:
+ *
+ * @tree: the iova tree to iterate on
+ * @range: the iova range to iterate in
+ * @func: the iterator for the mappings, return true to stop
+ * @private: parameter passed to @func
+ *
+ * Iterate over an iova range in iova tree.
+ */
+void iova_tree_foreach_range_data(IOVATree *tree, DMAMap *range,
+ iova_tree_iterator_2 func,
+ gpointer *private);
+
/**
* iova_tree_alloc_map:
*
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 536789797e47..a3cbd5198410 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -42,6 +42,12 @@ typedef struct IOVATreeFindIOVAArgs {
const DMAMap *result;
} IOVATreeFindIOVAArgs;
+typedef struct IOVATreeIterator {
+ DMAMap *range;
+ iova_tree_iterator_2 func;
+ gpointer *private;
+} IOVATreeIterator;
+
/**
* Iterate args to the next hole
*
@@ -164,6 +170,31 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
}
+static gboolean iova_tree_traverse_range(gpointer key, gpointer value,
+ gpointer data)
+{
+ DMAMap *map = key;
+ IOVATreeIterator *iterator = data;
+ DMAMap *range = iterator->range;
+
+ g_assert(key == value);
+
+ if (iova_tree_compare(map, range, NULL)) {
+ return false;
+ }
+
+ return iterator->func(map, iterator->private);
+}
+
+void iova_tree_foreach_range_data(IOVATree *tree, DMAMap *range,
+ iova_tree_iterator_2 func,
+ gpointer *private)
+{
+ IOVATreeIterator iterator = {range, func, private};
+
+ g_tree_foreach(tree->tree, iova_tree_traverse_range, &iterator);
+}
+
void iova_tree_remove(IOVATree *tree, DMAMap map)
{
const DMAMap *overlap;
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
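As a reader's aid, here is a hedged, self-contained sketch of the semantics the new
helper adds. Names and types are simplified stand-ins (a sorted array replaces the
GTree, and private is a plain void *); the real code walks the tree via
g_tree_foreach() and filters with iova_tree_compare():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's DMAMap and the iova tree. DMAMap.size
 * is inclusive, i.e. an entry covers [iova, iova + size]. */
typedef struct DMAMap {
    uint64_t iova;
    uint64_t size;
} DMAMap;

typedef bool (*iova_tree_iterator_2)(DMAMap *map, void *private);

static bool maps_overlap(const DMAMap *a, const DMAMap *b)
{
    return a->iova + a->size >= b->iova && b->iova + b->size >= a->iova;
}

/* Visit every entry overlapping @range, pass @private through, and stop
 * early once the callback returns true, mirroring
 * iova_tree_traverse_range() above. */
static void foreach_range_data(DMAMap *maps, size_t n, DMAMap *range,
                               iova_tree_iterator_2 func, void *private)
{
    for (size_t i = 0; i < n; i++) {
        if (!maps_overlap(&maps[i], range)) {
            continue; /* what iova_tree_compare() filtering achieves */
        }
        if (func(&maps[i], private)) {
            return; /* callback asked to stop */
        }
    }
}

/* Example callback: count the visited entries via @private. */
static bool count_cb(DMAMap *map, void *private)
{
    (void)map;
    ++*(int *)private;
    return false; /* keep going */
}
```

The design point is that the range filter lives inside the traversal callback, so
the underlying GTree needs no new range-query primitive; entries outside @range
are simply skipped while the user callback keeps its early-stop semantics.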
* [PATCH v3 2/5] intel_iommu: Fix a potential issue in VFIO dirty page sync
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-08 9:52 ` [PATCH v3 1/5] util: Add iova_tree_foreach_range_data Zhenzhong Duan
@ 2023-06-08 9:52 ` Zhenzhong Duan
2023-06-08 13:42 ` Peter Xu
2023-06-08 9:52 ` [PATCH v3 3/5] intel_iommu: Fix flag check in replay Zhenzhong Duan
` (3 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Zhenzhong Duan @ 2023-06-08 9:52 UTC (permalink / raw)
To: qemu-devel
Cc: mst, peterx, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
Peter Xu found a potential issue:
"The other thing is when I am looking at the new code I found that we
actually extended the replay() to be used also in dirty tracking of vfio,
in vfio_sync_dirty_bitmap(). For that maybe it's already broken if
unmap_all() because afaiu log_sync() can be called in migration thread
anytime during DMA so I think it means the device is prone to DMA with the
IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
fail unexpectedly. Copy Alex, Kirti and Neo."
Fix it by replacing the unmap_all() with an operation that only evacuates
the iova tree (keeping all host mappings untouched, IOW, don't notify
UNMAP), and doing a full resync in the page walk, which will notify all
existing mappings as MAP. This way we don't interrupt any existing mapping
(e.g. for the dirty sync case), while still keeping everything in sync with
the latest state (for moving a vfio device into an existing iommu group).
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 94d52f4205d2..34af12f392f5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3825,13 +3825,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
IntelIOMMUState *s = vtd_as->iommu_state;
uint8_t bus_n = pci_bus_num(vtd_as->bus);
VTDContextEntry ce;
+ DMAMap map = { .iova = 0, .size = HWADDR_MAX };
- /*
- * The replay can be triggered by either a invalidation or a newly
- * created entry. No matter what, we release existing mappings
- * (it means flushing caches for UNMAP-only registers).
- */
- vtd_address_space_unmap(vtd_as, n);
+ /* replay is protected by BQL, page walk will re-setup it safely */
+ iova_tree_remove(vtd_as->iova_tree, map);
if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
--
2.34.1
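To make the fix concrete, here is a hedged toy model of the fixed replay path
(all names are hypothetical; the real code paths are iova_tree_remove() and the
vtd page walk feeding MAP notifications). The iova tree acts as a shadow cache
of what has already been notified as MAP; the fix empties only that cache, so
live host mappings are never torn down mid-DMA:

```c
#include <stdbool.h>

#define N_PAGES 8

/* Heavily simplified model, hypothetical names throughout. */
typedef struct ReplayState {
    bool guest_mapped[N_PAGES]; /* what the guest IOMMU page table says */
    bool host_mapped[N_PAGES];  /* what has been programmed on the host */
    bool shadow[N_PAGES];       /* QEMU's iova tree cache */
} ReplayState;

/* MAP notification, suppressed on a shadow-cache hit. */
static void notify_map(ReplayState *s, int page)
{
    if (!s->shadow[page]) {
        s->shadow[page] = true;
        s->host_mapped[page] = true;
    }
}

/* The fixed replay: evacuate only the shadow, then page-walk and re-MAP.
 * host_mapped is never cleared, so concurrent DMA keeps working. */
static void replay(ReplayState *s)
{
    for (int i = 0; i < N_PAGES; i++) {
        s->shadow[i] = false; /* iova_tree_remove(tree, {0, HWADDR_MAX}) */
    }
    for (int i = 0; i < N_PAGES; i++) {
        if (s->guest_mapped[i]) { /* the page walk */
            notify_map(s, i);
        }
    }
}
```

The old code additionally cleared the host side (notified UNMAP for everything)
before rebuilding, which is exactly the window the quoted report worries about.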
* Re: [PATCH v3 2/5] intel_iommu: Fix a potential issue in VFIO dirty page sync
2023-06-08 9:52 ` [PATCH v3 2/5] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
@ 2023-06-08 13:42 ` Peter Xu
0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-08 13:42 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:28PM +0800, Zhenzhong Duan wrote:
> Peter Xu found a potential issue:
>
> "The other thing is when I am looking at the new code I found that we
> actually extended the replay() to be used also in dirty tracking of vfio,
> in vfio_sync_dirty_bitmap(). For that maybe it's already broken if
> unmap_all() because afaiu log_sync() can be called in migration thread
> anytime during DMA so I think it means the device is prone to DMA with the
> IOMMU pgtable quickly erased and rebuilt here, which means the DMA could
> fail unexpectedly. Copy Alex, Kirti and Neo."
>
> Fix it by replacing the unmap_all() to only evacuate the iova tree
> (keeping all host mappings untouched, IOW, don't notify UNMAP), and
> do a full resync in page walk which will notify all existing mappings
> as MAP. This way we don't interrupt with any existing mapping if there
> is (e.g. for the dirty sync case), meanwhile we keep sync too to latest
> (for moving a vfio device into an existing iommu group).
>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* [PATCH v3 3/5] intel_iommu: Fix flag check in replay
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
2023-06-08 9:52 ` [PATCH v3 1/5] util: Add iova_tree_foreach_range_data Zhenzhong Duan
2023-06-08 9:52 ` [PATCH v3 2/5] intel_iommu: Fix a potential issue in VFIO dirty page sync Zhenzhong Duan
@ 2023-06-08 9:52 ` Zhenzhong Duan
2023-06-08 13:43 ` Peter Xu
2023-06-08 9:52 ` [PATCH v3 4/5] intel_iommu: Fix address space unmap Zhenzhong Duan
` (2 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Zhenzhong Duan @ 2023-06-08 9:52 UTC (permalink / raw)
To: qemu-devel
Cc: mst, peterx, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
Replay doesn't notify the registered notifiers, only the one passed to
it, so it's meaningless to check the registered notifiers' synthetic
flag.
There is no issue currently as all replay use cases have the MAP flag
set, but let's be robust.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 34af12f392f5..f046f8591335 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3837,7 +3837,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
PCI_FUNC(vtd_as->devfn),
vtd_get_domain_id(s, &ce, vtd_as->pasid),
ce.hi, ce.lo);
- if (vtd_as_has_map_notifier(vtd_as)) {
+ if (n->notifier_flags & IOMMU_NOTIFIER_MAP) {
/* This is required only for MAP typed notifiers */
vtd_page_walk_info info = {
.hook_fn = vtd_replay_hook,
--
2.34.1
* Re: [PATCH v3 3/5] intel_iommu: Fix flag check in replay
2023-06-08 9:52 ` [PATCH v3 3/5] intel_iommu: Fix flag check in replay Zhenzhong Duan
@ 2023-06-08 13:43 ` Peter Xu
0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-08 13:43 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:29PM +0800, Zhenzhong Duan wrote:
> Replay doesn't notify registered notifiers but the one passed
> to it. So it's meaningless to check the registered notifier's
> synthetic flag.
>
> There is no issue currently as all replay use cases have MAP
> flag set, but let's be robust.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
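A hedged sketch of the distinction the patch makes (hypothetical, simplified
types): replay feeds events to exactly one notifier, so whether to do the
MAP-style page walk must be decided from that notifier's own flags, not from the
per-address-space summary bit that vtd_as_has_map_notifier() reports:

```c
#include <stdbool.h>

enum {
    NOTIFIER_MAP   = 1 << 0,
    NOTIFIER_UNMAP = 1 << 1,
};

typedef struct Notifier {
    int flags;
    int maps_replayed; /* MAP events this notifier has received */
} Notifier;

/* Buggy check: trusts the per-address-space summary flag, which is true
 * if *any* registered notifier wants MAP events. */
static void replay_buggy(Notifier *n, bool as_has_map_notifier)
{
    if (as_has_map_notifier) {
        n->maps_replayed++; /* can replay into an UNMAP-only notifier */
    }
}

/* Fixed check: consult only the notifier passed to replay, as the patch
 * does with (n->notifier_flags & IOMMU_NOTIFIER_MAP). */
static void replay_fixed(Notifier *n)
{
    if (n->flags & NOTIFIER_MAP) {
        n->maps_replayed++;
    }
}
```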
* [PATCH v3 4/5] intel_iommu: Fix address space unmap
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
` (2 preceding siblings ...)
2023-06-08 9:52 ` [PATCH v3 3/5] intel_iommu: Fix flag check in replay Zhenzhong Duan
@ 2023-06-08 9:52 ` Zhenzhong Duan
2023-06-08 13:48 ` Peter Xu
2023-06-08 9:52 ` [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
2023-06-08 15:53 ` [PATCH v3 0/5] Optimize UNMAP call and bug fix Peter Xu
5 siblings, 1 reply; 30+ messages in thread
From: Zhenzhong Duan @ 2023-06-08 9:52 UTC (permalink / raw)
To: qemu-devel
Cc: mst, peterx, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
During address space unmap, the corresponding IOVA tree entries are
also removed. But the DMAMap range is set one byte beyond the notifier's
scope, so in theory it is possible to remove a contiguous entry that lies
above the notifier's scope but falls within an adjacent notifier's scope.
There is no issue currently as no use case allocates notifiers
contiguously, but let's be robust.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f046f8591335..dcc334060cd6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3791,7 +3791,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
n->start, size);
map.iova = n->start;
- map.size = size;
+ map.size = size - 1; /* Inclusive */
iova_tree_remove(as->iova_tree, map);
}
--
2.34.1
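The off-by-one is easy to see once DMAMap's inclusive-size convention is spelled
out: an entry covers [iova, iova + size], so size is (length - 1). A hedged,
self-contained sketch (simplified types, illustrative ranges):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct DMAMap {
    uint64_t iova;
    uint64_t size; /* inclusive: the entry covers [iova, iova + size] */
} DMAMap;

static bool overlaps(const DMAMap *a, const DMAMap *b)
{
    return a->iova + a->size >= b->iova && b->iova + b->size >= a->iova;
}

/* Would a removal over the first notifier's scope [0, 0xfff] touch an
 * entry belonging to a hypothetical adjacent notifier at [0x1000, 0x1fff]?
 * map_size_field is what gets stored into DMAMap.size before
 * iova_tree_remove(). */
static bool removal_touches_neighbor(uint64_t map_size_field)
{
    DMAMap removal = { .iova = 0, .size = map_size_field };
    DMAMap neighbor_entry = { .iova = 0x1000, .size = 0xfff };
    return overlaps(&removal, &neighbor_entry);
}
```

Passing the plain length (the pre-patch `map.size = size`) makes the removal
range [0, 0x1000], which reaches one byte into the neighbor's scope; the patched
`map.size = size - 1` keeps it at [0, 0xfff].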
* Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
2023-06-08 9:52 ` [PATCH v3 4/5] intel_iommu: Fix address space unmap Zhenzhong Duan
@ 2023-06-08 13:48 ` Peter Xu
2023-06-09 3:31 ` Duan, Zhenzhong
0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-06-08 13:48 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:30PM +0800, Zhenzhong Duan wrote:
> During address space unmap, corresponding IOVA tree entries are
> also removed. But DMAMap is set beyond notifier's scope by 1, so
> in theory there is possibility to remove a continuous entry above
> the notifier's scope but falling in adjacent notifier's scope.
This function is only called in "loop over all notifiers" case (or replay()
that just got removed, but even so there'll be only 1 notifier normally
iiuc at least for vt-d), hopefully it means no bug exist (no Fixes needed,
no backport needed either), but still worth fixing it up.
>
> There is no issue currently as no use cases allocate notifiers
> continuously, but let's be robust.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* RE: [PATCH v3 4/5] intel_iommu: Fix address space unmap
2023-06-08 13:48 ` Peter Xu
@ 2023-06-09 3:31 ` Duan, Zhenzhong
2023-06-09 13:36 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 3:31 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Thursday, June 8, 2023 9:48 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
>
>On Thu, Jun 08, 2023 at 05:52:30PM +0800, Zhenzhong Duan wrote:
>> During address space unmap, corresponding IOVA tree entries are also
>> removed. But DMAMap is set beyond notifier's scope by 1, so in theory
>> there is possibility to remove a continuous entry above the notifier's
>> scope but falling in adjacent notifier's scope.
>
>This function is only called in "loop over all notifiers" case (or replay() that just
>got removed, but even so there'll be only 1 notifier normally iiuc at least for
>vt-d), hopefully it means no bug exist (no Fixes needed, no backport needed
>either), but still worth fixing it up.
Not two notifiers as vtd-ir splits for vt-d?
Thanks
Zhenzhong
>
>>
>> There is no issue currently as no use cases allocate notifiers
>> continuously, but let's be robust.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>Reviewed-by: Peter Xu <peterx@redhat.com>
>
>--
>Peter Xu
* Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
2023-06-09 3:31 ` Duan, Zhenzhong
@ 2023-06-09 13:36 ` Peter Xu
2023-06-13 2:32 ` Duan, Zhenzhong
0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-06-09 13:36 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
On Fri, Jun 09, 2023 at 03:31:46AM +0000, Duan, Zhenzhong wrote:
>
>
> >-----Original Message-----
> >From: Peter Xu <peterx@redhat.com>
> >Sent: Thursday, June 8, 2023 9:48 PM
> >To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> >Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
> >pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
> >marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
> >clg@redhat.com; david@redhat.com; philmd@linaro.org;
> >kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
> >Chao P <chao.p.peng@intel.com>
> >Subject: Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
> >
> >On Thu, Jun 08, 2023 at 05:52:30PM +0800, Zhenzhong Duan wrote:
> >> During address space unmap, corresponding IOVA tree entries are also
> >> removed. But DMAMap is set beyond notifier's scope by 1, so in theory
> >> there is possibility to remove a continuous entry above the notifier's
> >> scope but falling in adjacent notifier's scope.
> >
> >This function is only called in "loop over all notifiers" case (or replay() that just
> >got removed, but even so there'll be only 1 notifier normally iiuc at least for
> >vt-d), hopefully it means no bug exist (no Fixes needed, no backport needed
> >either), but still worth fixing it up.
>
> Not two notifiers as vtd-ir splits for vt-d?
The two notifiers will all be attached to the same IOMMU mr, so
IOMMU_NOTIFIER_FOREACH() will loop over them all always?
And this actually shouldn't matter, IMHO, as the IR split has the
0xfeeXXXXX hole only, so when notifying with end=0xfee00000 (comparing to
end=0xfedfffff) it shouldn't make a difference iiuc because there should
have no iova entry at 0xfee00000 anyway in the tree.
--
Peter Xu
* RE: [PATCH v3 4/5] intel_iommu: Fix address space unmap
2023-06-09 13:36 ` Peter Xu
@ 2023-06-13 2:32 ` Duan, Zhenzhong
0 siblings, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-13 2:32 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Friday, June 9, 2023 9:37 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
>
>On Fri, Jun 09, 2023 at 03:31:46AM +0000, Duan, Zhenzhong wrote:
>>
>>
>> >-----Original Message-----
>> >From: Peter Xu <peterx@redhat.com>
>> >Sent: Thursday, June 8, 2023 9:48 PM
>> >To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> >Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>> >pbonzini@redhat.com; richard.henderson@linaro.org;
>> >eduardo@habkost.net; marcel.apfelbaum@gmail.com;
>> >alex.williamson@redhat.com; clg@redhat.com; david@redhat.com;
>> >philmd@linaro.org; kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L
>> ><yi.l.liu@intel.com>; Peng, Chao P <chao.p.peng@intel.com>
>> >Subject: Re: [PATCH v3 4/5] intel_iommu: Fix address space unmap
>> >
>> >On Thu, Jun 08, 2023 at 05:52:30PM +0800, Zhenzhong Duan wrote:
>> >> During address space unmap, corresponding IOVA tree entries are
>> >> also removed. But DMAMap is set beyond notifier's scope by 1, so in
>> >> theory there is possibility to remove a continuous entry above the
>> >> notifier's scope but falling in adjacent notifier's scope.
>> >
>> >This function is only called in "loop over all notifiers" case (or
>> >replay() that just got removed, but even so there'll be only 1
>> >notifier normally iiuc at least for vt-d), hopefully it means no bug
>> >exist (no Fixes needed, no backport needed either), but still worth fixing it
>up.
>>
>> Not two notifiers as vtd-ir splits for vt-d?
>
>The two notifiers will all be attached to the same IOMMU mr, so
>IOMMU_NOTIFIER_FOREACH() will loop over them all always?
Yes.
>
>And this actually shouldn't matter, IMHO, as the IR split has the 0xfeeXXXXX
>hole only, so when notifying with end=0xfee00000 (comparing to
>end=0xfedfffff) it shouldn't make a difference iiuc because there should have
>no iova entry at 0xfee00000 anyway in the tree.
Clear.
Thanks
Zhenzhong
* [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
` (3 preceding siblings ...)
2023-06-08 9:52 ` [PATCH v3 4/5] intel_iommu: Fix address space unmap Zhenzhong Duan
@ 2023-06-08 9:52 ` Zhenzhong Duan
2023-06-08 14:05 ` Peter Xu
2023-06-08 20:34 ` Peter Xu
2023-06-08 15:53 ` [PATCH v3 0/5] Optimize UNMAP call and bug fix Peter Xu
5 siblings, 2 replies; 30+ messages in thread
From: Zhenzhong Duan @ 2023-06-08 9:52 UTC (permalink / raw)
To: qemu-devel
Cc: mst, peterx, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
Commit 63b88968f1 ("intel-iommu: rework the page walk logic") adds logic
to record mapped IOVA ranges so we only need to send MAP or UNMAP when
necessary. But there is still a corner case of unnecessary UNMAP.
During invalidation, either domain or device selective, we only need to
unmap when there are recorded mapped IOVA ranges, presuming most OSes
allocate IOVA ranges contiguously, e.g. on x86, Linux sets up mappings
from 0xffffffff downwards.
Strace shows each UNMAP ioctl taking about 14us (0.000014s), and we have
28 such ioctl()s in one invalidation, as the two notifiers on x86 are
split into power-of-2 pieces.
ioctl(48, VFIO_IOMMU_UNMAP_DMA, 0x7ffffd5c42f0) = 0 <0.000014>
The other purpose of this patch is to eliminate a noisy error log when we
work with IOMMUFD. It looks like the duplicate UNMAP call fails with IOMMUFD
while it always succeeds with the legacy container. This behavior difference
leads to the error log below for IOMMUFD:
IOMMU_IOAS_UNMAP failed: No such file or directory
vfio_container_dma_unmap(0x562012d6b6d0, 0x0, 0x80000000) = -2 (No such file or directory)
IOMMU_IOAS_UNMAP failed: No such file or directory
vfio_container_dma_unmap(0x562012d6b6d0, 0x80000000, 0x40000000) = -2 (No such file or directory)
...
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060cd6..9e5ba81c89e2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3743,6 +3743,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
hwaddr start = n->start;
hwaddr end = n->end;
IntelIOMMUState *s = as->iommu_state;
+ IOMMUTLBEvent event;
DMAMap map;
/*
@@ -3762,22 +3763,25 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
assert(start <= end);
size = remain = end - start + 1;
+ event.type = IOMMU_NOTIFIER_UNMAP;
+ event.entry.target_as = &address_space_memory;
+ event.entry.perm = IOMMU_NONE;
+ /* This field is meaningless for unmap */
+ event.entry.translated_addr = 0;
+
while (remain >= VTD_PAGE_SIZE) {
- IOMMUTLBEvent event;
uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
uint64_t size = mask + 1;
assert(size);
- event.type = IOMMU_NOTIFIER_UNMAP;
- event.entry.iova = start;
- event.entry.addr_mask = mask;
- event.entry.target_as = &address_space_memory;
- event.entry.perm = IOMMU_NONE;
- /* This field is meaningless for unmap */
- event.entry.translated_addr = 0;
-
- memory_region_notify_iommu_one(n, &event);
+ map.iova = start;
+ map.size = mask;
+ if (iova_tree_find(as->iova_tree, &map)) {
+ event.entry.iova = start;
+ event.entry.addr_mask = mask;
+ memory_region_notify_iommu_one(n, &event);
+ }
start += size;
remain -= size;
--
2.34.1
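The "power of 2 pieces" split comes from dma_aligned_pow2_mask(): each chunk is
the largest power-of-2-sized block that is aligned at the current start and
still fits in the remaining range. Below is a hedged, simplified analogue
(ignoring the aw_bits cap; the exact piece count therefore depends on aw_bits
and the notifier scope boundaries, e.g. the lower vt-d scope ending before the
0xfeeXXXXX IR hole splits into 10 pieces by itself):

```c
#include <stdint.h>

/* Simplified analogue of QEMU's dma_aligned_pow2_mask(), hypothetical
 * in its details. Returns size - 1 of the next chunk. */
static uint64_t aligned_pow2_mask(uint64_t start, uint64_t end)
{
    /* largest power of two @start is aligned to */
    int align_order = start ? __builtin_ctzll(start) : 63;
    /* largest power of two not exceeding the remaining length */
    uint64_t len = end - start + 1;
    int len_order = 63 - __builtin_clzll(len);
    int order = align_order < len_order ? align_order : len_order;
    return (UINT64_C(1) << order) - 1;
}

/* Count how many UNMAP pieces one [start, end] notifier scope becomes,
 * mirroring the while (remain >= VTD_PAGE_SIZE) loop in the patch. */
static int count_unmap_pieces(uint64_t start, uint64_t end)
{
    int pieces = 0;
    uint64_t remain = end - start + 1;

    while (remain) {
        uint64_t size = aligned_pow2_mask(start, end) + 1;
        start += size;
        remain -= size;
        pieces++;
    }
    return pieces;
}
```

The patch's point is that each of these pieces used to become an UNMAP
notification unconditionally; with the iova_tree_find() check, pieces with no
recorded mapping are skipped.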
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 9:52 ` [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
@ 2023-06-08 14:05 ` Peter Xu
2023-06-08 14:11 ` Jason Gunthorpe
2023-06-09 3:41 ` Duan, Zhenzhong
2023-06-08 20:34 ` Peter Xu
1 sibling, 2 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-08 14:05 UTC (permalink / raw)
To: Zhenzhong Duan, Jason Gunthorpe
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:31PM +0800, Zhenzhong Duan wrote:
> Commit 63b88968f1 ("intel-iommu: rework the page walk logic") adds logic
> to record mapped IOVA ranges so we only need to send MAP or UNMAP when
> necessary. But there is still a corner case of unnecessary UNMAP.
>
> During invalidation, either domain or device selective, we only need to
> unmap when there are recorded mapped IOVA ranges, presuming most of OSes
> allocating IOVA range continuously, e.g. on x86, linux sets up mapping
> from 0xffffffff downwards.
>
> Strace shows UNMAP ioctl taking 0.000014us and we have 28 such ioctl()
> in one invalidation, as two notifiers in x86 are split into power of 2
> pieces.
>
> ioctl(48, VFIO_IOMMU_UNMAP_DMA, 0x7ffffd5c42f0) = 0 <0.000014>
Thanks for the numbers, but for a fair comparison IMHO it needs to be a
comparison of before/after on the whole time used for unmapping the AS.
It'll be great to have finer-granule measurements like each ioctl, but the
total time used should be more important (especially to contain "after").
Side note: I don't think every UNMAP ioctl will take the same time; it
should depend on whether a mapping exists.
Actually it's hard to tell because this also depends on what's in the iova
tree.. but still at least we know how it works in some cases.
>
> The other purpose of this patch is to eliminate noisy error log when we
> work with IOMMUFD. It looks the duplicate UNMAP call will fail with IOMMUFD
> while always succeed with legacy container. This behavior difference leads
> to below error log for IOMMUFD:
>
> IOMMU_IOAS_UNMAP failed: No such file or directory
> vfio_container_dma_unmap(0x562012d6b6d0, 0x0, 0x80000000) = -2 (No such file or directory)
> IOMMU_IOAS_UNMAP failed: No such file or directory
> vfio_container_dma_unmap(0x562012d6b6d0, 0x80000000, 0x40000000) = -2 (No such file or directory)
> ...
My gut feeling is the major motivation is actually this (not the perf).
tens of some 14us ioctls is really nothing on a rare event..
Jason Wang raised a question in previous version and I think JasonG's reply
is here:
https://lore.kernel.org/r/ZHTaQXd3ZybmhCLb@nvidia.com
JasonG: sorry I know zero on iommufd api yet, but you said:
The VFIO emulation functions should do whatever VFIO does, is there
a mistake there?
IIUC what VFIO does here is it returns succeed if unmap over nothing rather
than failing like iommufd. Curious (like JasonW) on why that retval? I'd
assume for returning "how much unmapped" we can at least still return 0 for
nothing.
Are you probably suggesting that we can probably handle that in QEMU side
on -ENOENT here for iommufd only (a question to Yi?).
If that's already a kernel abi, not sure whether it's even discussable, but
just to raise this up.
--
Peter Xu
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 14:05 ` Peter Xu
@ 2023-06-08 14:11 ` Jason Gunthorpe
2023-06-08 15:40 ` Peter Xu
2023-06-09 3:41 ` Duan, Zhenzhong
1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2023-06-08 14:11 UTC (permalink / raw)
To: Peter Xu
Cc: Zhenzhong Duan, qemu-devel, mst, jasowang, pbonzini,
richard.henderson, eduardo, marcel.apfelbaum, alex.williamson,
clg, david, philmd, kwankhede, cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 10:05:08AM -0400, Peter Xu wrote:
> IIUC what VFIO does here is it returns succeed if unmap over nothing rather
> than failing like iommufd. Curious (like JasonW) on why that retval? I'd
> assume for returning "how much unmapped" we can at least still return 0 for
> nothing.
In iommufd maps are objects, you can only map or unmap entire
objects. The ability to batch-unmap objects by specifying a range
that spans many is something that was easy to do and that VFIO had,
but I'm not sure it is actually useful..
So asking to unmap an object that is already known not to be mapped is
actually possibly racy, especially if you consider iommufd's support
for kernel-side IOVA allocation. It should not be done, or if it is
done, with user space locking to protect it.
For VFIO, long long ago, VFIO could unmap IOVA page at a time - ie it
wasn't objects. In this world it made some sense that the unmap would
'succeed' as the end result was unmapped.
> Are you probably suggesting that we can probably handle that in QEMU side
> on -ENOENT here for iommufd only (a question to Yi?).
Yes, this can be done, ENOENT is reliably returned and qemu doesn't
use the kernel-side IOVA allocator.
But if there is the proper locks to prevent a map/unmap race, then
there should also be the proper locks to check that there is no map in
the first place and avoid the kernel call..
Jason
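If QEMU were to follow the -ENOENT suggestion above, the backend shim could be
as small as the hedged sketch below (illustrative names only, not the real
vfio/iommufd code): translate ENOENT, which iommufd reliably returns for an
unmapped range, into the 0 that legacy VFIO type1 reports in the same case.

```c
#include <errno.h>

/* Hypothetical compatibility wrapper: given the raw return value and
 * errno of an iommufd unmap ioctl, report legacy-VFIO-style results. */
static int unmap_dma_compat(int ioctl_ret, int ioctl_errno)
{
    if (ioctl_ret < 0 && ioctl_errno == ENOENT) {
        return 0; /* nothing was mapped there: legacy VFIO reports success */
    }
    return ioctl_ret < 0 ? -ioctl_errno : ioctl_ret;
}
```

Per Jason's caveat, this only makes sense alongside locking that rules out
map/unmap races; with such locking in place, QEMU could equally skip the
kernel call entirely, which is what the iova-tree check in patch 5 does.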
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 14:11 ` Jason Gunthorpe
@ 2023-06-08 15:40 ` Peter Xu
2023-06-08 16:27 ` Jason Gunthorpe
2023-06-09 4:03 ` Duan, Zhenzhong
0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-08 15:40 UTC (permalink / raw)
To: Jason Gunthorpe, Yi Liu, Zhenzhong Duan
Cc: Zhenzhong Duan, qemu-devel, mst, jasowang, pbonzini,
richard.henderson, eduardo, marcel.apfelbaum, alex.williamson,
clg, david, philmd, kwankhede, cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 11:11:15AM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 08, 2023 at 10:05:08AM -0400, Peter Xu wrote:
>
> > IIUC what VFIO does here is it returns succeed if unmap over nothing rather
> > than failing like iommufd. Curious (like JasonW) on why that retval? I'd
> > assume for returning "how much unmapped" we can at least still return 0 for
> > nothing.
>
> In iommufd maps are objects, you can only map or unmap entire
> objects. The ability to batch unmap objects by specifying an range
> that spans many is something that was easy to do and that VFIO had,
> but I'm not sure it is actually usefull..
>
> So asking to unmap an object that is already known not to be mapped is
> actually possibly racy, especially if you consider iommufd's support
> for kernel-side IOVA allocation. It should not be done, or if it is
> done, with user space locking to protect it.
>
> For VFIO, long long ago, VFIO could unmap IOVA page at a time - ie it
> wasn't objects. In this world it made some sense that the unmap would
> 'succeed' as the end result was unmapped.
>
> > Are you probably suggesting that we can probably handle that in QEMU side
> > on -ENOENT here for iommufd only (a question to Yi?).
>
> Yes, this can be done, ENOENT is reliably returned and qemu doesn't
> use the kernel-side IOVA allocator.
>
> But if there is the proper locks to prevent a map/unmap race, then
> there should also be the proper locks to check that there is no map in
> the first place and avoid the kernel call..
The problem is IIRC guest iommu driver can do smart things like batching
invalidations; it means when QEMU gets it from the guest OS it may already
not match one mapped object.
We can definitely lookup every single object and explicitly unmap, but it
loses part of the point of the batching that the guest OS does. Logically QEMU
can redirect that batched invalidation into one ioctl() to the host, rather
than a lot of smaller ones.
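The trade-off described above can be sketched with a toy model (illustrative C only, not QEMU code; all type and function names here are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the two strategies being discussed.  The guest batches an
 * invalidation covering [start, end]; QEMU can either unmap each recorded
 * mapping with its own host call, or redirect the whole batch into a
 * single ranged call.  Names are made up, not the QEMU/iommufd API. */
typedef struct {
    uint64_t iova;
    uint64_t size;
} ToyMap;

/* One host call per mapped object overlapping [start, end] (inclusive). */
static int unmap_per_object(const ToyMap *maps, size_t n,
                            uint64_t start, uint64_t end)
{
    int calls = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t last = maps[i].iova + maps[i].size - 1;

        if (last >= start && maps[i].iova <= end) {
            calls++;            /* would be one unmap ioctl */
        }
    }
    return calls;
}

/* A single ranged host call covering the whole batched invalidation. */
static int unmap_ranged(uint64_t start, uint64_t end)
{
    (void)start;
    (void)end;
    return 1;
}
```

With, say, three mappings inside the invalidated range, the per-object strategy costs three host calls where the ranged strategy costs one; that is the batching benefit referred to above.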
While for this specific patch - Zhenzhong/Yi, do you agree that we should
just handle -ENOENT in the iommufd series (I assume it's still under work),
then for this specific patch it's only about performance difference?
Thanks,
--
Peter Xu
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 15:40 ` Peter Xu
@ 2023-06-08 16:27 ` Jason Gunthorpe
2023-06-08 19:53 ` Peter Xu
2023-06-09 4:03 ` Duan, Zhenzhong
1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2023-06-08 16:27 UTC (permalink / raw)
To: Peter Xu
Cc: Yi Liu, Zhenzhong Duan, qemu-devel, mst, jasowang, pbonzini,
richard.henderson, eduardo, marcel.apfelbaum, alex.williamson,
clg, david, philmd, kwankhede, cjia, chao.p.peng
On Thu, Jun 08, 2023 at 11:40:55AM -0400, Peter Xu wrote:
> > But if there is the proper locks to prevent a map/unmap race, then
> > there should also be the proper locks to check that there is no map in
> > the first place and avoid the kernel call..
>
> The problem is IIRC guest iommu driver can do smart things like batching
> invalidations, it means when QEMU gets it from the guest OS it may already
> not matching one mapped objects.
qemu has to fix it. The kernel API is object based, not page
based. You cannot unmap portions of a prior mapping.
I assume for this kind of emulation it is doing 4k objects because
it has no idea what size of mapping the client will use?
> We can definitely lookup every single object and explicitly unmap, but it
> loses partial of the point of batching that guest OS does.
You don't need every single object, but it would be faster to check
where things are mapped and then call the kernel correctly instead of
trying to iterate with the unmapped results.
Jason
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 16:27 ` Jason Gunthorpe
@ 2023-06-08 19:53 ` Peter Xu
2023-06-09 1:00 ` Jason Gunthorpe
2023-06-09 5:49 ` Duan, Zhenzhong
0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-08 19:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Yi Liu, Zhenzhong Duan, qemu-devel, mst, jasowang, pbonzini,
richard.henderson, eduardo, marcel.apfelbaum, alex.williamson,
clg, david, philmd, kwankhede, cjia, chao.p.peng
On Thu, Jun 08, 2023 at 01:27:50PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 08, 2023 at 11:40:55AM -0400, Peter Xu wrote:
>
> > > But if there is the proper locks to prevent a map/unmap race, then
> > > there should also be the proper locks to check that there is no map in
> > > the first place and avoid the kernel call..
> >
> > The problem is IIRC guest iommu driver can do smart things like batching
> > invalidations, it means when QEMU gets it from the guest OS it may already
> > not matching one mapped objects.
>
> qemu has to fix it. The kernel API is object based, not page
> based. You cannot unmap portions of a prior mapping.
>
> I assume for this kind of emulation it is doing 4k objects because
> it has no idea what size of mapping the client will use?
MAP is fine, before notify() to VFIO or anything, qemu scans the pgtable
and handles it in page size or huge page size, so it can be >4K but always
guest iommu pgsize aligned.
I think we rely on guest behaving right, so it should also always operate
on that size minimum when mapped huge. It shouldn't violate the
"per-object" protocol of iommufd.
IIUC the same to vfio type1v2 from that aspect.
It's more about UNMAP batching, but I assume iommufd is fine if it's fine
with holes inside for that case. The only difference of "not exist" of
-ENOENT seems to be just same as before as long as QEMU treats it as 0 like
before.
Though that does look slightly special, because the whole empty UNMAP
region can be seen as a hole too; not sure when that -ENOENT will be useful
if qemu should always bypass it anyway. Indeed not a problem to qemu.
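The "-ENOENT treated as 0" handling discussed here can be sketched as follows (illustrative C; backend_unmap() is a stand-in for the real ioctl, and all names are hypothetical rather than the actual QEMU/iommufd API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the host unmap call: iommufd returns -ENOENT when nothing
 * in the range is mapped, while legacy VFIO type1 returns 0. */
static int backend_unmap(bool anything_mapped, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    return anything_mapped ? 0 : -ENOENT;
}

/* Wrapper normalizing the two behaviors: unmapping an empty range is not
 * treated as an error, since the net result is "unmapped" either way. */
static int dma_unmap_normalized(bool anything_mapped, uint64_t iova,
                                uint64_t size)
{
    int ret = backend_unmap(anything_mapped, iova, size);

    if (ret == -ENOENT) {
        ret = 0;    /* nothing was mapped there: same net result */
    }
    return ret;
}
```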
>
> > We can definitely lookup every single object and explicitly unmap, but it
> > loses partial of the point of batching that guest OS does.
>
> You don't need every single object, but it would be faster to check
> where things are mapped and then call the kernel correctly instead of
> trying to iterate with the unmapped results.
Maybe yes. If so, it'll be great if Zhenzhong could just attach some proof
on that, meanwhile drop the "iommufd UNMAP warnings" section in the commit
message.
Thanks,
--
Peter Xu
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 19:53 ` Peter Xu
@ 2023-06-09 1:00 ` Jason Gunthorpe
2023-06-09 5:49 ` Duan, Zhenzhong
1 sibling, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2023-06-09 1:00 UTC (permalink / raw)
To: Peter Xu
Cc: Yi Liu, Zhenzhong Duan, qemu-devel, mst, jasowang, pbonzini,
richard.henderson, eduardo, marcel.apfelbaum, alex.williamson,
clg, david, philmd, kwankhede, cjia, chao.p.peng
On Thu, Jun 08, 2023 at 03:53:23PM -0400, Peter Xu wrote:
> Though that does look slightly special, because the whole empty UNMAP
> region can be seen as a hole too; not sure when that -ENOENT will be useful
> if qemu should always bypass it anyway. Indeed not a problem to qemu.
It sounds like it might be good to have a flag to unmap the whole
range regardless of contiguity
Jason
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 19:53 ` Peter Xu
2023-06-09 1:00 ` Jason Gunthorpe
@ 2023-06-09 5:49 ` Duan, Zhenzhong
2023-06-09 21:26 ` Peter Xu
1 sibling, 1 reply; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 5:49 UTC (permalink / raw)
To: Peter Xu, Jason Gunthorpe
Cc: Liu, Yi L, qemu-devel@nongnu.org, mst@redhat.com,
jasowang@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, alex.williamson@redhat.com,
clg@redhat.com, david@redhat.com, philmd@linaro.org,
kwankhede@nvidia.com, cjia@nvidia.com, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Friday, June 9, 2023 3:53 AM
>To: Jason Gunthorpe <jgg@nvidia.com>
>Cc: Liu, Yi L <yi.l.liu@intel.com>; Duan, Zhenzhong
><zhenzhong.duan@intel.com>; qemu-devel@nongnu.org; mst@redhat.com;
>jasowang@redhat.com; pbonzini@redhat.com;
>richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Peng, Chao P
><chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Thu, Jun 08, 2023 at 01:27:50PM -0300, Jason Gunthorpe wrote:
>> On Thu, Jun 08, 2023 at 11:40:55AM -0400, Peter Xu wrote:
>>
>> > > But if there is the proper locks to prevent a map/unmap race, then
>> > > there should also be the proper locks to check that there is no map in
>> > > the first place and avoid the kernel call..
>> >
>> > The problem is IIRC guest iommu driver can do smart things like batching
>> > invalidations, it means when QEMU gets it from the guest OS it may
>already
>> > not matching one mapped objects.
>>
>> qemu has to fix it. The kernel API is object based, not page
>> based. You cannot unmap portions of a prior mapping.
>>
>> I assume for this kind of emulation it is doing 4k objects because
>> it has no idea what size of mapping the client will use?
>
>MAP is fine, before notify() to VFIO or anything, qemu scans the pgtable
>and handles it in page size or huge page size, so it can be >4K but always
>guest iommu pgsize aligned.
>
>I think we rely on guest behaving right, so it should also always operate
>on that size minimum when mapped huge. It shouldn't violate the
>"per-object" protocol of iommufd.
>
>IIUC the same to vfio type1v2 from that aspect.
>
>It's more about UNMAP batching, but I assume iommufd is fine if it's fine
>with holes inside for that case. The only difference of "not exist" of
>-ENOENT seems to be just same as before as long as QEMU treats it as 0 like
>before.
>
>Though that does look slightly special, because the whole empty UNMAP
>region can be seen as a hole too; not sure when that -ENOENT will be useful
>if qemu should always bypass it anyway. Indeed not a problem to qemu.
>
>>
>> > We can definitely lookup every single object and explicitly unmap, but it
>> > loses partial of the point of batching that guest OS does.
>>
>> You don't need every single object, but it would be faster to check
>> where things are mapped and then call the kernel correctly instead of
>> trying to iterate with the unmapped results.
>
>Maybe yes. If so, it'll be great if Zhenzhong could just attach some proof
>on that, meanwhile drop the "iommufd UNMAP warnings" section in the commit
>message.
Seems vtd_page_walk_one() already works in above way, checking mapping
changes and calling kernel for changed entries?
Thanks
Zhenzhong
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-09 5:49 ` Duan, Zhenzhong
@ 2023-06-09 21:26 ` Peter Xu
2023-06-13 2:37 ` Duan, Zhenzhong
2023-06-14 9:47 ` Duan, Zhenzhong
0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-09 21:26 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: Jason Gunthorpe, Liu, Yi L, qemu-devel@nongnu.org, mst@redhat.com,
jasowang@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, alex.williamson@redhat.com,
clg@redhat.com, david@redhat.com, philmd@linaro.org,
kwankhede@nvidia.com, cjia@nvidia.com, Peng, Chao P
On Fri, Jun 09, 2023 at 05:49:06AM +0000, Duan, Zhenzhong wrote:
> Seems vtd_page_walk_one() already works in above way, checking mapping
> changes and calling kernel for changed entries?
Agreed in most cases, but the path this patch modified is not? E.g. it
happens in rare cases where we simply want to unmap everything (e.g. on a
system reset, or invalid context entry)?
That's also why I'm curious whether perf of this path matters at all (and
assuming now we all agree that's the only goal now..), because afaiu it
didn't really trigger in common paths.
--
Peter Xu
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-09 21:26 ` Peter Xu
@ 2023-06-13 2:37 ` Duan, Zhenzhong
2023-06-14 9:47 ` Duan, Zhenzhong
1 sibling, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-13 2:37 UTC (permalink / raw)
To: Peter Xu
Cc: Jason Gunthorpe, Liu, Yi L, qemu-devel@nongnu.org, mst@redhat.com,
jasowang@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, alex.williamson@redhat.com,
clg@redhat.com, david@redhat.com, philmd@linaro.org,
kwankhede@nvidia.com, cjia@nvidia.com, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Saturday, June 10, 2023 5:26 AM
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Fri, Jun 09, 2023 at 05:49:06AM +0000, Duan, Zhenzhong wrote:
>> Seems vtd_page_walk_one() already works in above way, checking mapping
>> changes and calling kernel for changed entries?
>
>Agreed in most cases, but the path this patch modified is not? E.g. it happens
>in rare cases where we simply want to unmap everything (e.g. on a system
>reset, or invalid context entry)?
Clear.
>
>That's also why I'm curious whether perf of this path matters at all (and
>assuming now we all agree that's the only goal now..), because afaiu it didn't
>really trigger in common paths.
I'll collect performance data and reply back.
Thanks
Zhenzhong
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-09 21:26 ` Peter Xu
2023-06-13 2:37 ` Duan, Zhenzhong
@ 2023-06-14 9:47 ` Duan, Zhenzhong
1 sibling, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-14 9:47 UTC (permalink / raw)
To: Peter Xu
Cc: Jason Gunthorpe, Liu, Yi L, qemu-devel@nongnu.org, mst@redhat.com,
jasowang@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, alex.williamson@redhat.com,
clg@redhat.com, david@redhat.com, philmd@linaro.org,
kwankhede@nvidia.com, cjia@nvidia.com, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Saturday, June 10, 2023 5:26 AM
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Fri, Jun 09, 2023 at 05:49:06AM +0000, Duan, Zhenzhong wrote:
>> Seems vtd_page_walk_one() already works in above way, checking mapping
>> changes and calling kernel for changed entries?
>
>Agreed in most cases, but the path this patch modified is not? E.g. it happens
>in rare cases where we simply want to unmap everything (e.g. on a system
>reset, or invalid context entry)?
>
>That's also why I'm curious whether perf of this path matters at all (and
>assuming now we all agree that's the only goal now..), because afaiu it didn't
>really trigger in common paths.
I used the below changes to collect the time spent with the iommufd backend during system reset.
Enable the macro TEST_UNMAP to test unmapping iova tree entries one by one.
Disable TEST_UNMAP to use unmap_ALL.
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3736,16 +3736,44 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
return vtd_dev_as;
}
+static gboolean iova_tree_iterator1(DMAMap *map)
+{
+ static int cnt;
+ printf("**********dump iova tree %d: iova %lx, size %lx\n", ++cnt, map->iova, map->size);
+ return false;
+}
+
+//#define TEST_UNMAP
+#ifdef TEST_UNMAP
+static gboolean vtd_unmap_single(DMAMap *map, gpointer *private)
+{
+ IOMMUTLBEvent event;
+
+ event.type = IOMMU_NOTIFIER_UNMAP;
+ event.entry.iova = map->iova;
+ event.entry.addr_mask = map->size;
+ event.entry.target_as = &address_space_memory;
+ event.entry.perm = IOMMU_NONE;
+ event.entry.translated_addr = 0;
+
+ memory_region_notify_iommu_one((IOMMUNotifier *)private, &event);
+ return false;
+}
+#endif
+
/* Unmap the whole range in the notifier's scope. */
static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
{
+ int64_t start_tv, delta_tv;
hwaddr size, remain;
hwaddr start = n->start;
hwaddr end = n->end;
IntelIOMMUState *s = as->iommu_state;
- IOMMUTLBEvent event;
DMAMap map;
+ iova_tree_foreach(as->iova_tree, iova_tree_iterator1);
+
+ start_tv = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
/*
* Note: all the codes in this function has a assumption that IOVA
* bits are no more than VTD_MGAW bits (which is restricted by
@@ -3763,6 +3791,13 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
assert(start <= end);
size = remain = end - start + 1;
+#ifdef TEST_UNMAP
+ map.iova = n->start;
+ map.size = size - 1;
+ iova_tree_foreach_range_data(as->iova_tree, &map, vtd_unmap_single,
+ (gpointer *)n);
+#else
+ IOMMUTLBEvent event;
event.type = IOMMU_NOTIFIER_UNMAP;
event.entry.target_as = &address_space_memory;
event.entry.perm = IOMMU_NONE;
@@ -3788,6 +3823,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
}
assert(!remain);
+#endif
trace_vtd_as_unmap_whole(pci_bus_num(as->bus),
VTD_PCI_SLOT(as->devfn),
@@ -3797,6 +3833,9 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
map.iova = n->start;
map.size = size - 1; /* Inclusive */
iova_tree_remove(as->iova_tree, map);
+
+ delta_tv = qemu_clock_get_us(QEMU_CLOCK_REALTIME) - start_tv;
+ printf("************ delta_tv %lu us **************\n", delta_tv);
}
RESULT:
[ 14.825015] reboot: Power down
**********dump iova tree 1: iova fffbe000, size fff
...
**********dump iova tree 66: iova fffff000, size fff
...
With TEST_UNMAP:
************ delta_tv 393 us **************
************ delta_tv 0 us **************
Without TEST_UNMAP:
************ delta_tv 364 us **************
************ delta_tv 2 us **************
It looks like there is no explicit difference; unmap_ALL is a little better.
I also tried the legacy container; the result is similar to the above:
With TEST_UNMAP:
************ delta_tv 325 us **************
************ delta_tv 0 us **************
Without TEST_UNMAP:
************ delta_tv 317 us **************
************ delta_tv 1 us **************
Thanks
Zhenzhong
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 15:40 ` Peter Xu
2023-06-08 16:27 ` Jason Gunthorpe
@ 2023-06-09 4:03 ` Duan, Zhenzhong
1 sibling, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 4:03 UTC (permalink / raw)
To: Peter Xu, Jason Gunthorpe, Liu, Yi L
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Thursday, June 8, 2023 11:41 PM
>To: Jason Gunthorpe <jgg@nvidia.com>; Liu, Yi L <yi.l.liu@intel.com>; Duan,
>Zhenzhong <zhenzhong.duan@intel.com>
>Cc: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu-
>devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Thu, Jun 08, 2023 at 11:11:15AM -0300, Jason Gunthorpe wrote:
>> On Thu, Jun 08, 2023 at 10:05:08AM -0400, Peter Xu wrote:
>>
>> > IIUC what VFIO does here is it returns succeed if unmap over nothing
>rather
>> > than failing like iommufd. Curious (like JasonW) on why that retval? I'd
>> > assume for returning "how much unmapped" we can at least still return 0
>for
>> > nothing.
>>
>> In iommufd maps are objects, you can only map or unmap entire
>> objects. The ability to batch unmap objects by specifying a range
>> that spans many is something that was easy to do and that VFIO had,
>> but I'm not sure it is actually useful..
>>
>> So asking to unmap an object that is already known not to be mapped is
>> actually possibly racy, especially if you consider iommufd's support
>> for kernel-side IOVA allocation. It should not be done, or if it is
>> done, with user space locking to protect it.
>>
>> For VFIO, long long ago, VFIO could unmap IOVA page at a time - ie it
>> wasn't objects. In this world it made some sense that the unmap would
>> 'succeed' as the end result was unmapped.
>>
>> > Are you probably suggesting that we can probably handle that in QEMU
>side
>> > on -ENOENT here for iommufd only (a question to Yi?).
>>
>> Yes, this can be done, ENOENT is reliably returned and qemu doesn't
>> use the kernel-side IOVA allocator.
>>
>> But if there is the proper locks to prevent a map/unmap race, then
>> there should also be the proper locks to check that there is no map in
>> the first place and avoid the kernel call..
>
>The problem is IIRC guest iommu driver can do smart things like batching
>invalidations, it means when QEMU gets it from the guest OS it may already
>not matching one mapped objects.
>
>We can definitely lookup every single object and explicitly unmap, but it
>loses partial of the point of batching that guest OS does. Logically QEMU
>can redirect that batched invalidation into one ioctl() to the host, rather
>than a lot of smaller ones.
>
>While for this specific patch - Zhenzhong/Yi, do you agree that we should
>just handle -ENOENT in the iommufd series (I assume it's still under work),
>then for this specific patch it's only about performance difference?
Yes, that makes sense.
Thanks
Zhenzhong
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 14:05 ` Peter Xu
2023-06-08 14:11 ` Jason Gunthorpe
@ 2023-06-09 3:41 ` Duan, Zhenzhong
1 sibling, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 3:41 UTC (permalink / raw)
To: Peter Xu, Jason Gunthorpe
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Thursday, June 8, 2023 10:05 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; Jason Gunthorpe
><jgg@nvidia.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Thu, Jun 08, 2023 at 05:52:31PM +0800, Zhenzhong Duan wrote:
>> Commit 63b88968f1 ("intel-iommu: rework the page walk logic") adds
>> logic to record mapped IOVA ranges so we only need to send MAP or
>> UNMAP when necessary. But there is still a corner case of unnecessary
>UNMAP.
>>
>> During invalidation, either domain or device selective, we only need
>> to unmap when there are recorded mapped IOVA ranges, presuming most
>of
>> OSes allocating IOVA range continuously, e.g. on x86, linux sets up
>> mapping from 0xffffffff downwards.
>>
>> Strace shows UNMAP ioctl taking 0.000014us and we have 28 such ioctl()
>> in one invalidation, as two notifiers in x86 are split into power of 2
>> pieces.
>>
>> ioctl(48, VFIO_IOMMU_UNMAP_DMA, 0x7ffffd5c42f0) = 0 <0.000014>
>
>Thanks for the numbers, but for a fair comparison IMHO it needs to be a
>comparison of before/after on the whole time used for unmap AS. It'll be
>great to have finer granule measurements like each ioctl, but the total time
>used should be more important (especially to contain "after"). Side
>note: I don't think the UNMAP ioctl will take the same time; it should matter
>on whether there's mapping exist).
Yes, but what we want to optimize out is the case of unmapping a non-existent range.
Will show the time diff spent in unmap AS.
>
>Actually it's hard to tell because this also depends on what's in the iova tree..
>but still at least we know how it works in some cases.
>
>>
>> The other purpose of this patch is to eliminate noisy error log when
>> we work with IOMMUFD. It looks the duplicate UNMAP call will fail with
>> IOMMUFD while always succeed with legacy container. This behavior
>> difference leads to below error log for IOMMUFD:
>>
>> IOMMU_IOAS_UNMAP failed: No such file or directory
>> vfio_container_dma_unmap(0x562012d6b6d0, 0x0, 0x80000000) = -2 (No
>> such file or directory) IOMMU_IOAS_UNMAP failed: No such file or
>> directory vfio_container_dma_unmap(0x562012d6b6d0, 0x80000000,
>> 0x40000000) = -2 (No such file or directory) ...
>
>My gut feeling is the major motivation is actually this (not the perf).
>tens of some 14us ioctls is really nothing on a rare event..
To be honest, yes.
Thanks
Zhenzhong
>
>Jason Wang raised a question in previous version and I think JasonG's reply is
>here:
>
>https://lore.kernel.org/r/ZHTaQXd3ZybmhCLb@nvidia.com
>
>JasonG: sorry I know zero on iommufd api yet, but you said:
>
> The VFIO emulation functions should do whatever VFIO does, is there
> a mistake there?
>
>IIUC what VFIO does here is it returns succeed if unmap over nothing rather
>than failing like iommufd. Curious (like JasonW) on why that retval? I'd
>assume for returning "how much unmapped" we can at least still return 0 for
>nothing.
>
>Are you probably suggesting that we can probably handle that in QEMU side
>on -ENOENT here for iommufd only (a question to Yi?).
>
>If that's already a kernel abi, not sure whether it's even discussable, but just to
>raise this up.
>
>--
>Peter Xu
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 9:52 ` [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
2023-06-08 14:05 ` Peter Xu
@ 2023-06-08 20:34 ` Peter Xu
2023-06-09 4:01 ` Duan, Zhenzhong
1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-06-08 20:34 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:31PM +0800, Zhenzhong Duan wrote:
> while (remain >= VTD_PAGE_SIZE) {
> - IOMMUTLBEvent event;
> uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
> uint64_t size = mask + 1;
>
> assert(size);
>
> - event.type = IOMMU_NOTIFIER_UNMAP;
> - event.entry.iova = start;
> - event.entry.addr_mask = mask;
> - event.entry.target_as = &address_space_memory;
> - event.entry.perm = IOMMU_NONE;
> - /* This field is meaningless for unmap */
> - event.entry.translated_addr = 0;
> -
> - memory_region_notify_iommu_one(n, &event);
> + map.iova = start;
> + map.size = mask;
> + if (iova_tree_find(as->iova_tree, &map)) {
> + event.entry.iova = start;
> + event.entry.addr_mask = mask;
> + memory_region_notify_iommu_one(n, &event);
> + }
Ah one more thing: I think this path can also be triggered by notifiers
without MAP event registered, whose iova tree will always be empty. So we
may only do this for MAP, then I'm not sure whether it'll be worthwhile..
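The concern above could be addressed by keying the skip on the notifier's registered events, roughly like this (illustrative C; the flag names are made up here, not QEMU's actual notifier flag values):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the notifier's registered events. */
enum {
    NOTIFY_UNMAP = 1 << 0,
    NOTIFY_MAP   = 1 << 1,
};

/* Only skip the UNMAP notification based on the iova tree for notifiers
 * that also registered MAP events; UNMAP-only notifiers (e.g. vhost with
 * device-tlb disabled) have an empty tree and must always be notified. */
static bool should_notify_unmap(int notifier_flags, bool range_in_iova_tree)
{
    if (!(notifier_flags & NOTIFY_MAP)) {
        return true;
    }
    return range_in_iova_tree;
}
```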
--
Peter Xu
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-08 20:34 ` Peter Xu
@ 2023-06-09 4:01 ` Duan, Zhenzhong
2023-06-14 9:38 ` Duan, Zhenzhong
0 siblings, 1 reply; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 4:01 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Friday, June 9, 2023 4:34 AM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Thu, Jun 08, 2023 at 05:52:31PM +0800, Zhenzhong Duan wrote:
>> while (remain >= VTD_PAGE_SIZE) {
>> - IOMMUTLBEvent event;
>> uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
>> uint64_t size = mask + 1;
>>
>> assert(size);
>>
>> - event.type = IOMMU_NOTIFIER_UNMAP;
>> - event.entry.iova = start;
>> - event.entry.addr_mask = mask;
>> - event.entry.target_as = &address_space_memory;
>> - event.entry.perm = IOMMU_NONE;
>> - /* This field is meaningless for unmap */
>> - event.entry.translated_addr = 0;
>> -
>> - memory_region_notify_iommu_one(n, &event);
>> + map.iova = start;
>> + map.size = mask;
>> + if (iova_tree_find(as->iova_tree, &map)) {
>> + event.entry.iova = start;
>> + event.entry.addr_mask = mask;
>> + memory_region_notify_iommu_one(n, &event);
>> + }
>
>Ah one more thing: I think this path can also be triggered by notifiers without
>MAP event registered, whose iova tree will always be empty. So we may only
>do this for MAP, then I'm not sure whether it'll be worthwhile..
Hmm, yes, my change will lead to vhost missing some invalidation requests in the device-tlb disabled case as the iova tree is empty. Thanks for pointing that out.
Let me collect the time diff spent in unmap AS for you to decide if it's still worth it or not.
Thanks
Zhenzhong
* RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-09 4:01 ` Duan, Zhenzhong
@ 2023-06-14 9:38 ` Duan, Zhenzhong
2023-06-14 12:51 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-14 9:38 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Duan, Zhenzhong
>Sent: Friday, June 9, 2023 12:02 PM
>To: Peter Xu <peterx@redhat.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: RE: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>
>
>>-----Original Message-----
>>From: Peter Xu <peterx@redhat.com>
>>Sent: Friday, June 9, 2023 4:34 AM
>>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>>pbonzini@redhat.com; richard.henderson@linaro.org;
>eduardo@habkost.net;
>>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com;
>>david@redhat.com; philmd@linaro.org; kwankhede@nvidia.com;
>>cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng, Chao P
>><chao.p.peng@intel.com>
>>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>>UNMAP calls
>>
>>On Thu, Jun 08, 2023 at 05:52:31PM +0800, Zhenzhong Duan wrote:
>>> while (remain >= VTD_PAGE_SIZE) {
>>> - IOMMUTLBEvent event;
>>> uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
>>> uint64_t size = mask + 1;
>>>
>>> assert(size);
>>>
>>> - event.type = IOMMU_NOTIFIER_UNMAP;
>>> - event.entry.iova = start;
>>> - event.entry.addr_mask = mask;
>>> - event.entry.target_as = &address_space_memory;
>>> - event.entry.perm = IOMMU_NONE;
>>> - /* This field is meaningless for unmap */
>>> - event.entry.translated_addr = 0;
>>> -
>>> - memory_region_notify_iommu_one(n, &event);
>>> + map.iova = start;
>>> + map.size = mask;
>>> + if (iova_tree_find(as->iova_tree, &map)) {
>>> + event.entry.iova = start;
>>> + event.entry.addr_mask = mask;
>>> + memory_region_notify_iommu_one(n, &event);
>>> + }
>>
>>Ah, one more thing: I think this path can also be triggered by notifiers
>>without a MAP event registered, whose iova tree will always be empty. So
>>we may only do this for MAP, and then I'm not sure whether it'll be worthwhile..
>
>Hmm, yes, my change would cause vhost to miss some invalidation requests
>in the device-tlb disabled case, as its iova tree is empty. Thanks for pointing that out.
>
>Let me collect the time spent in unmapping the AS so you can decide whether
>it's still worthwhile.
I used the changes below to collect the time spent:
@@ -3739,12 +3739,14 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
/* Unmap the whole range in the notifier's scope. */
static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
{
+ int64_t start_tv, delta_tv;
hwaddr size, remain;
hwaddr start = n->start;
hwaddr end = n->end;
IntelIOMMUState *s = as->iommu_state;
DMAMap map;
+ start_tv = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
/*
* Note: all the codes in this function has a assumption that IOVA
* bits are no more than VTD_MGAW bits (which is restricted by
@@ -3793,6 +3795,9 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
map.iova = n->start;
map.size = size;
iova_tree_remove(as->iova_tree, map);
+
+ delta_tv = qemu_clock_get_us(QEMU_CLOCK_REALTIME) - start_tv;
+ printf("************ delta_tv %" PRId64 " us **************\n", delta_tv);
}
With legacy container(vfio-pci,host=81:11.1,id=vfio1,bus=root1)
Hotplug:
************ delta_tv 12 us **************
************ delta_tv 8 us **************
Unplug:
************ delta_tv 12 us **************
************ delta_tv 3 us **************
After fix:
Hotplug: empty
Unplug:
************ delta_tv 2 us **************
************ delta_tv 1 us **************
With iommufd container(vfio-pci,host=81:11.1,id=vfio1,bus=root1,iommufd=iommufd0)
Hotplug:
************ delta_tv 25 us **************
************ delta_tv 23 us **************
Unplug:
************ delta_tv 15 us **************
************ delta_tv 5 us **************
After fix:
Hotplug: empty
Unplug:
************ delta_tv 2 us **************
************ delta_tv 1 us **************
It looks like the benefit of this patch is negligible for both the legacy container and iommufd.
I'd like to drop this patch as it makes no difference. What's your opinion?
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls
2023-06-14 9:38 ` Duan, Zhenzhong
@ 2023-06-14 12:51 ` Peter Xu
0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-06-14 12:51 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
On Wed, Jun 14, 2023 at 09:38:41AM +0000, Duan, Zhenzhong wrote:
> It looks like the benefit of this patch is negligible for both the legacy container and iommufd.
> I'd like to drop this patch as it makes no difference. What's your opinion?
Thanks for the test results. Sounds good here.
--
Peter Xu
* Re: [PATCH v3 0/5] Optimize UNMAP call and bug fix
2023-06-08 9:52 [PATCH v3 0/5] Optimize UNMAP call and bug fix Zhenzhong Duan
` (4 preceding siblings ...)
2023-06-08 9:52 ` [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary UNMAP calls Zhenzhong Duan
@ 2023-06-08 15:53 ` Peter Xu
2023-06-09 3:32 ` Duan, Zhenzhong
5 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-06-08 15:53 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jasowang, pbonzini, richard.henderson, eduardo,
marcel.apfelbaum, alex.williamson, clg, david, philmd, kwankhede,
cjia, yi.l.liu, chao.p.peng
On Thu, Jun 08, 2023 at 05:52:26PM +0800, Zhenzhong Duan wrote:
> Hi All,
>
> This patchset includes some fixes on VFIO dirty sync and vIOMMU.
> PATCH1 isn't needed now, as the dependent changes in PATCH2 were removed,
> but since Peter has given his Reviewed-by, I'll leave it to the maintainer
> to decide whether to pick it or not.
Let's drop patch 1 until it's really used. Thanks,
--
Peter Xu
* RE: [PATCH v3 0/5] Optimize UNMAP call and bug fix
2023-06-08 15:53 ` [PATCH v3 0/5] Optimize UNMAP call and bug fix Peter Xu
@ 2023-06-09 3:32 ` Duan, Zhenzhong
0 siblings, 0 replies; 30+ messages in thread
From: Duan, Zhenzhong @ 2023-06-09 3:32 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, clg@redhat.com, david@redhat.com,
philmd@linaro.org, kwankhede@nvidia.com, cjia@nvidia.com,
Liu, Yi L, Peng, Chao P
>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Thursday, June 8, 2023 11:54 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jasowang@redhat.com;
>pbonzini@redhat.com; richard.henderson@linaro.org; eduardo@habkost.net;
>marcel.apfelbaum@gmail.com; alex.williamson@redhat.com;
>clg@redhat.com; david@redhat.com; philmd@linaro.org;
>kwankhede@nvidia.com; cjia@nvidia.com; Liu, Yi L <yi.l.liu@intel.com>; Peng,
>Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 0/5] Optimize UNMAP call and bug fix
>
>On Thu, Jun 08, 2023 at 05:52:26PM +0800, Zhenzhong Duan wrote:
>> Hi All,
>>
>> This patchset includes some fixes on VFIO dirty sync and vIOMMU.
>> PATCH1 isn't needed now, as the dependent changes in PATCH2 were removed,
>> but since Peter has given his Reviewed-by, I'll leave it to the maintainer
>> to decide whether to pick it or not.
>
>Let's drop patch 1 until it's really used. Thanks,
Will drop it in next version.
Thanks
Zhenzhong