qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Peter Xu" <peterx@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>
Subject: [PULL 09/56] intel-iommu: Document iova_tree
Date: Mon, 30 Jan 2023 15:19:11 -0500	[thread overview]
Message-ID: <20230130201810.11518-10-mst@redhat.com> (raw)
In-Reply-To: <20230130201810.11518-1-mst@redhat.com>

From: Peter Xu <peterx@redhat.com>

It seems not super clear on when iova_tree is used, and why.  Add a rich
comment above iova_tree to track why we needed the iova_tree, and when we
need it.

Also comment for the map/unmap messages, on how they're used and
implications (e.g. unmap can be larger than the mapped ranges).

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230109193727.1360190-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/exec/memory.h         | 26 ++++++++++++++++++++++++
 include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c37ffdbcd1..2e602a2fad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,32 @@ struct IOMMUTLBEntry {
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there're two use cases for the notifiers:
+ *
+ *   (1) When the device needs accurate synchronizations of the vIOMMU page
+ *       tables, it needs to register with both MAP|UNMAP notifies (which
+ *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
+ *
+ *       Regarding to accurate synchronization, it's when the notified
+ *       device maintains a shadow page table and must be notified on each
+ *       guest MAP (page table entry creation) and UNMAP (invalidation)
+ *       events (e.g. VFIO). Both notifications must be accurate so that
+ *       the shadow page table is fully in sync with the guest view.
+ *
+ *   (2) When the device doesn't need accurate synchronizations of the
+ *       vIOMMU page tables, it needs to register only with UNMAP or
+ *       DEVIOTLB_UNMAP notifies.
+ *
+ *       It's when the device maintains a cache of IOMMU translations
+ *       (IOTLB) and is able to fill that cache by requesting translations
+ *       from the vIOMMU through a protocol similar to ATS (Address
+ *       Translation Service).
+ *
+ *       Note that in this mode the vIOMMU will not maintain a shadowed
+ *       page table for the address space, and the UNMAP messages can cover
+ *       more than the pages that used to get mapped.  The IOMMU notifiee
+ *       should be able to take care of over-sized invalidations.
  */
 typedef enum {
     IOMMU_NOTIFIER_NONE = 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..89dcbc5e1e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,43 @@ struct VTDAddressSpace {
     QLIST_ENTRY(VTDAddressSpace) next;
     /* Superset of notifier flags that this address space has */
     IOMMUNotifierFlag notifier_flags;
-    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
+    /*
+     * @iova_tree traces mapped IOVA ranges.
+     *
+     * The tree is not needed if no MAP notifier is registered with current
+     * VTD address space, because all guest invalidate commands can be
+     * directly passed to the IOMMU UNMAP notifiers without any further
+     * reshuffling.
+     *
+     * The tree OTOH is required for MAP typed iommu notifiers for a few
+     * reasons.
+     *
+     * Firstly, there's no way to identify whether an PSI (Page Selective
+     * Invalidations) or DSI (Domain Selective Invalidations) event is an
+     * MAP or UNMAP event within the message itself.  Without having prior
+     * knowledge of existing state vIOMMU doesn't know whether it should
+     * notify MAP or UNMAP for a PSI message it received when caching mode
+     * is enabled (for MAP notifiers).
+     *
+     * Secondly, PSI messages received from guest driver can be enlarged in
+     * range, covers but not limited to what the guest driver wanted to
+     * invalidate.  When the range to invalidates gets bigger than the
+     * limit of a PSI message, it can even become a DSI which will
+     * invalidate the whole domain.  If the vIOMMU directly notifies the
+     * registered device with the unmodified range, it may confuse the
+     * registered drivers (e.g. vfio-pci) on either:
+     *
+     *   (1) Trying to map the same region more than once (for
+     *       VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
+     *
+     *   (2) Trying to UNMAP a range that is still partially mapped.
+     *
+     * That accuracy is not required for UNMAP-only notifiers, but it is a
+     * must-to-have for notifiers registered with MAP events, because the
+     * vIOMMU needs to make sure the shadow page table is always in sync
+     * with the guest IOMMU pgtables for a device.
+     */
+    IOVATree *iova_tree;
 };
 
 struct VTDIOTLBEntry {
-- 
MST



  parent reply	other threads:[~2023-01-30 20:27 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-30 20:18 [PULL 00/56] virtio,pc,pci: features, cleanups, fixes Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 01/56] shpc: disallow unplug when power indicator is blinking Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 02/56] hw/i386/acpi-build: Remove unused attributes Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 03/56] hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml() Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 04/56] hw/acpi/piix4: No need to #include "hw/southbridge/piix.h" Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 05/56] hw/acpi/acpi_dev_interface: Remove unused parameter from AcpiDeviceIfClass::madt_cpu Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 06/56] vhost-user: Correct a reference of TARGET_AARCH64 Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 07/56] hw/pci-host: Use register definitions from PCI standard Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 08/56] virtio-rng-pci: fix migration compat for vectors Michael S. Tsirkin
2023-01-30 20:19 ` Michael S. Tsirkin [this message]
2023-01-30 20:19 ` [PULL 12/56] tests: acpi: cleanup arguments to make them more readable Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 13/56] tests: acpi: whitelist DSDT blobs for tests that use pci-bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 14/56] tests: acpi: extend pcihp with nested bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 15/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 16/56] tests: acpi: cleanup use_uefi argument usage Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 17/56] pci_bridge: remove whitespace Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 18/56] x86: acpi: pcihp: clean up duplicate bridge_in_acpi assignment Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 19/56] pci: acpi hotplug: rename x-native-hotplug to x-do-not-expose-native-hotplug-cap Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 20/56] pcihp: piix4: do not call acpi_pcihp_reset() when ACPI PCI hotplug is disabled Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 21/56] pci: acpihp: assign BSEL only to coldplugged bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 22/56] x86: pcihp: fix invalid AML PCNT calls to hotplugged bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 23/56] tests: boot_sector_test: avoid crashing if status is not available yet Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data Michael S. Tsirkin
2023-01-30 20:19   ` Michael S. Tsirkin
2023-01-31 19:39   ` Jason A. Donenfeld
2023-01-31 21:27     ` Michael S. Tsirkin
2023-01-31 20:54   ` H. Peter Anvin
2023-01-31 21:22     ` Jason A. Donenfeld
2023-02-01  5:40       ` H. Peter Anvin
2023-01-31 23:32   ` Jason A. Donenfeld
2023-01-30 20:20 ` [PULL 11/56] tests: qtest: print device_add error before failing test Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 26/56] tests: acpi: add reboot cycle to bridge test Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 27/56] tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug machinery Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 28/56] pcihp: drop pcihp_bridge_en dependency when composing PCNT method Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 29/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 30/56] tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug machinery Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 24/56] tests: acpi: extend bridge tests with hotplugged bridges Michael S. Tsirkin
2023-01-30 20:20   ` Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 31/56] pcihp: compose PCNT callchain right before its user _GPE._E01 Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 32/56] pcihp: do not put empty PCNT in DSDT Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 25/56] tests: boot_sector_test(): make it multi-shot Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 33/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 34/56] whitelist DSDT before adding endpoint devices to bridge testcases Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 35/56] tests: acpi: add endpoint devices to bridges Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 36/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 37/56] x86: pcihp: acpi: prepare slot ignore rule to work with self describing bridges Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 38/56] pci: acpi: wire up AcpiDevAmlIf interface to generic bridge Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 39/56] pcihp: make bridge describe itself using AcpiDevAmlIfClass:build_dev_aml Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 40/56] pci: make sure pci_bus_is_express() won't error out with "discards ‘const’ qualifier" Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 41/56] pcihp: isolate rule whether slot should be described in DSDT Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 42/56] tests: acpi: whitelist DSDT before decoupling PCI hotplug code from basic slots description Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 43/56] pcihp: acpi: decouple hotplug and generic " Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 44/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 45/56] tests: acpi: whitelist DSDT blobs before removing dynamic _DSM on coldplugged bridges Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 46/56] pcihp: acpi: ignore coldplugged bridges when composing hotpluggable slots Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 47/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 48/56] tests: acpi: whitelist DSDT before moving non-hotpluggble slots description from hotplug path Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 49/56] pcihp: generate populated non-hotpluggble slot descriptions on non-hotplug path Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 50/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 51/56] vhost-user: Skip unnecessary duplicated VHOST_USER_ADD/REM_MEM_REG requests Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 52/56] hw: Use TYPE_PCI_BUS definition where appropriate Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 53/56] tests/qtest/bios-tables-test: Make the test less verbose by default Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 54/56] Revert "vhost-user: Monitor slave channel in vhost_user_read()" Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 55/56] Revert "vhost-user: Introduce nested event loop " Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 56/56] docs/pcie.txt: Replace ioh3420 with pcie-root-port Michael S. Tsirkin
2023-02-02 13:42 ` [PULL 00/56] virtio,pc,pci: features, cleanups, fixes Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230130201810.11518-10-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=david@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).