From: "Cédric Le Goater" <clg@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Eric Auger" <eric.auger@redhat.com>,
	"Zhenzhong Duan" <zhenzhong.duan@intel.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Harsh Prateek Bora" <harshpb@linux.ibm.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Eric Farman" <farman@linux.ibm.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Matthew Rosato" <mjrosato@linux.ibm.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Yi Liu" <yi.l.liu@intel.com>,
	"Nicolin Chen" <nicolinc@nvidia.com>
Subject: [PULL 46/47] docs/devel: Add VFIO iommufd backend documentation
Date: Tue, 19 Dec 2023 19:56:42 +0100
Message-ID: <20231219185643.725448-47-clg@redhat.com>
In-Reply-To: <20231219185643.725448-1-clg@redhat.com>

From: Zhenzhong Duan <zhenzhong.duan@intel.com>

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
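
For readers scripting the command lines documented in the "Step 2: configure
QEMU" section of docs/devel/vfio-iommufd.rst below, here is a minimal Python
sketch that assembles the three documented forms: iommufd with QEMU opening
the fds, iommufd with fds passed in by a management layer, and the legacy
container. The helper name vfio_pci_args and its parameters are hypothetical
and not part of QEMU. Passing only one of the two fds is an assumption based
on the statement that an unset fd property means QEMU opens the fd itself.

```python
from typing import List, Optional


def vfio_pci_args(host: Optional[str] = None,
                  use_iommufd: bool = True,
                  iommufd_fd: Optional[int] = None,
                  device_fd: Optional[int] = None,
                  iommufd_id: str = "iommufd0") -> List[str]:
    """Build QEMU arguments for one vfio-pci device (hypothetical helper)."""
    if not use_iommufd:
        # Legacy VFIO container: no iommufd object is created and the
        # device is named by its host PCI address.
        if host is None:
            raise ValueError("legacy backend needs a host PCI address")
        if iommufd_fd is not None or device_fd is not None:
            raise ValueError("this sketch only models fd passing with iommufd")
        return ["-device", f"vfio-pci,host={host}"]

    # With iommufd, the device is identified either by host address
    # (QEMU opens the VFIO cdev) or by a pre-opened cdev fd, not both.
    if (host is None) == (device_fd is None):
        raise ValueError("pass exactly one of host or device_fd")

    obj = f"iommufd,id={iommufd_id}"
    if iommufd_fd is not None:
        # /dev/iommu was opened externally; hand the fd to the object.
        obj += f",fd={iommufd_fd}"

    if device_fd is not None:
        dev = f"vfio-pci,iommufd={iommufd_id},fd={device_fd}"
    else:
        dev = f"vfio-pci,host={host},iommufd={iommufd_id}"
    return ["-object", obj, "-device", dev]
```

For example, vfio_pci_args(iommufd_fd=22, device_fd=23) yields the arguments
of the fd-passing example in the document, and
vfio_pci_args(host="0000:02:00.0", use_iommufd=False) yields the legacy form.
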
 MAINTAINERS                    |   1 +
 docs/devel/index-internals.rst |   1 +
 docs/devel/vfio-iommufd.rst    | 166 +++++++++++++++++++++++++++++++++
 3 files changed, 168 insertions(+)
 create mode 100644 docs/devel/vfio-iommufd.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ca70bb4e6415fc3af110cc7fd37ac67be5ab8c9d..0ddb20a35f205dba3b437c33bf489a53ecfc36b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2176,6 +2176,7 @@ F: backends/iommufd.c
 F: include/sysemu/iommufd.h
 F: include/qemu/chardev_open.h
 F: util/chardev_open.c
+F: docs/devel/vfio-iommufd.rst
 
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 6f81df92bcaba790477aff1ccb51048409331950..3def4a138bae5eca5b564e0044c1c2e80b5bc07a 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -18,5 +18,6 @@ Details about QEMU's various subsystems including how to add features to them.
    s390-dasd-ipl
    tracing
    vfio-migration
+   vfio-iommufd
    writing-monitor-commands
    virtio-backends
diff --git a/docs/devel/vfio-iommufd.rst b/docs/devel/vfio-iommufd.rst
new file mode 100644
index 0000000000000000000000000000000000000000..3d1c11f175e5968e9f1519da70c9a0a6ced03995
--- /dev/null
+++ b/docs/devel/vfio-iommufd.rst
@@ -0,0 +1,166 @@
+===============================
+IOMMUFD BACKEND usage with VFIO
+===============================
+
+(The terms backend, container and BE are used interchangeably below.)
+
+With the introduction of iommufd, the Linux kernel provides a generic
+interface for user space drivers to propagate their DMA mappings to the
+kernel for assigned devices. The legacy kernel interface is group-centric;
+the new iommufd interface is device-centric, relying on device fd and iommufd.
+
+To support both interfaces in the QEMU VFIO device, a base container
+abstracts the common parts of the VFIO legacy and iommufd containers, so that
+the generic VFIO code can use either container.
+
+The base container implements generic functions such as memory_listener and
+address space management, whereas the derived containers implement callbacks
+specific to either legacy or iommufd. Each container has its own way to set up
+the secure context and its own DMA management interface. The diagram below
+shows how this looks with both containers.
+
+::
+
+                      VFIO                           AddressSpace/Memory
+      +-------+  +----------+  +-----+  +-----+
+      |  pci  |  | platform |  |  ap |  | ccw |
+      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
+          |           |           |        |        |   AddressSpace       |
+          |           |           |        |        +------------+---------+
+      +---V-----------V-----------V--------V----+               /
+      |           VFIOAddressSpace              | <------------+
+      |                  |                      |  MemoryListener
+      |        VFIOContainerBase list           |
+      +-------+----------------------------+----+
+              |                            |
+              |                            |
+      +-------V------+            +--------V----------+
+      |   iommufd    |            |    vfio legacy    |
+      |  container   |            |     container     |
+      +-------+------+            +--------+----------+
+              |                            |
+              | /dev/iommu                 | /dev/vfio/vfio
+              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
+  Userspace   |                            |
+  ============+============================+===========================
+  Kernel      |  device fd                 |
+              +---------------+            | group/container fd
+              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
+              |  ATTACH_IOAS) |            | device fd
+              |               |            |
+              |       +-------V------------V-----------------+
+      iommufd |       |                vfio                  |
+  (map/unmap  |       +---------+--------------------+-------+
+  ioas_copy)  |                 |                    | map/unmap
+              |                 |                    |
+       +------V------+    +-----V------+      +------V--------+
+       |iommufd core |    |  device    |      |  vfio iommu   |
+       +-------------+    +------------+      +---------------+
+
+* Secure Context setup
+
+  - iommufd BE: uses device fd and iommufd to set up the secure context
+    (bind_iommufd, attach_ioas)
+  - vfio legacy BE: uses group fd and container fd to set up the secure
+    context (set_container, set_iommu)
+
+* Device access
+
+  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
+  - vfio legacy BE: device fd is retrieved from group fd ioctl
+
+* DMA Mapping flow
+
+  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
+  2. VFIO populates DMA map/unmap via the container BEs
+     * iommufd BE: uses iommufd
+     * vfio legacy BE: uses container fd
+
+Example configuration
+=====================
+
+Step 1: configure the host device
+---------------------------------
+
+This is exactly the same as for a VFIO device with the legacy VFIO container.
+
+Step 2: configure QEMU
+----------------------
+
+Interactions with ``/dev/iommu`` are abstracted by a new iommufd
+object (compiled in with the ``CONFIG_IOMMUFD`` option).
+
+Any QEMU device (e.g. VFIO device) wishing to use ``/dev/iommu`` must
+be linked with an iommufd object. Such a device gets a new optional property
+named ``iommufd``, which is used to pass an iommufd object. Take the
+``vfio-pci`` device for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
+
+Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
+management layer. In that case the fd is passed to QEMU; the ``fd`` property
+accepts either a string naming the fd or a number, for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0,fd=22
+    -device vfio-pci,iommufd=iommufd0,fd=23
+
+If the ``fd`` property is not passed, the fd is opened by QEMU.
+
+If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
+is not used and the user gets the behavior based on the legacy VFIO
+container:
+
+.. code-block:: bash
+
+    -device vfio-pci,host=0000:02:00.0
+
+Supported platform
+==================
+
+x86, ARM and s390x are currently supported.
+
+Caveats
+=======
+
+Dirty page sync
+---------------
+
+Dirty page sync with the iommufd backend is not supported yet, so live
+migration is disabled by default. It can be force-enabled as shown below,
+although it is inefficient.
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on
+
+P2P DMA
+-------
+
+PCI p2p DMA is unsupported as IOMMUFD doesn't support mapping hardware PCI BAR
+regions yet. The warning below appears for assigned PCI devices; it is not a bug.
+
+.. code-block:: none
+
+    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
+    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
+
+FD passing with mdev
+--------------------
+
+The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
+backend is an mdev. If FD passing is used, there is no way to know that, and the
+mdev is treated like a real PCI device. The error below is reported if the user
+tries to enable RAM discarding for the mdev.
+
+.. code-block:: none
+
+    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
+
+``vfio-ap`` and ``vfio-ccw`` devices don't have the same issue, as their backend
+devices are always mdev and RAM discarding is force-enabled.
-- 
2.43.0



