[PATCH v3 00/23] iommufd: Add vIOMMU infrastructure (Part-4 vQUEUE)


From: Nicolin Chen <nicolinc@nvidia.com>
To: <jgg@nvidia.com>, <kevin.tian@intel.com>, <corbet@lwn.net>,
	<will@kernel.org>
Cc: <bagasdotme@gmail.com>, <robin.murphy@arm.com>, <joro@8bytes.org>,
	<thierry.reding@gmail.com>, <vdumpa@nvidia.com>,
	<jonathanh@nvidia.com>, <shuah@kernel.org>, <jsnitsel@redhat.com>,
	<nathan@kernel.org>, <peterz@infradead.org>, <yi.l.liu@intel.com>,
	<mshavit@google.com>, <praan@google.com>,
	<zhangzekun11@huawei.com>, <iommu@lists.linux.dev>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-tegra@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<patches@lists.linux.dev>, <mochs@nvidia.com>,
	<alok.a.tiwari@oracle.com>, <vasant.hegde@amd.com>
Subject: [PATCH v3 00/23] iommufd: Add vIOMMU infrastructure (Part-4 vQUEUE)
Date: Thu, 1 May 2025 16:01:06 -0700
Message-ID: <cover.1746139811.git.nicolinc@nvidia.com>

The vIOMMU object represents a slice of an IOMMU HW whose virtualization
features are shared with or passed to user space (mostly a VM) as a form
of HW acceleration. This extends the HWPT-based design to cover more
advanced virtualization features.

A vQUEUE, introduced by this series as a part of the vIOMMU infrastructure,
represents a HW-accelerated queue/buffer for a VM to use exclusively, e.g.
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which is an IOMMU HW feature that directly accesses the virtual
queue in the guest address space, avoiding VM Exits to improve performance.

As an initial use case, this series adds support for guest-owned HW virtual
queues that a VMM can allocate upon request from a guest OS writing the VM
register. Introduce IOMMUFD_OBJ_VQUEUE and its allocator
IOMMUFD_CMD_VQUEUE_ALLOC, allowing the VMM to forward IOMMU-specific queue
info, such as the queue base address, size, etc.

Meanwhile, a guest-owned virtual queue needs the guest kernel (a virtual
queue driver) to control the queue by reading/writing its consumer and
producer indexes, which means the virtual queue HW allows the guest kernel
direct R/W access to those registers. Introduce an mmap infrastructure in
the iommufd core to support passing through an MMIO region from the host
physical address space to the guest physical address space. The VMA info
(vm_pgoff/size) used by an mmap must be pre-allocated during
IOMMUFD_CMD_VQUEUE_ALLOC and returned to user space as output driver data
carried via IOMMUFD_CMD_VQUEUE_ALLOC. So, this requires driver-specific
user data support in the vIOMMU allocation flow.
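
From the VMM side, the mmap step described above might look roughly like
the sketch below. It is not taken from this series: map_vqueue_mmio() and
its parameters are hypothetical, and it only assumes that the kernel hands
back a byte offset and a length that user space passes unchanged to
mmap(2) on the iommufd file descriptor. Requiring a page-multiple length
is likewise an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this series): map a region described
 * by a byte offset and a length, both assumed to have been returned by
 * the kernel in the allocation output data.
 * Returns the mapping, or MAP_FAILED with errno set.
 */
static void *map_vqueue_mmio(int fd, uint64_t offset, uint64_t length)
{
	long page_size = sysconf(_SC_PAGESIZE);

	/* mmap() needs a page-aligned offset; reject anything else early */
	if (page_size <= 0 || !length ||
	    (offset % (uint64_t)page_size) ||
	    (length % (uint64_t)page_size)) {
		errno = EINVAL;
		return MAP_FAILED;
	}

	/* MAP_SHARED so that stores reach the backing object itself */
	return mmap(NULL, (size_t)length, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, (off_t)offset);
}
```

A caller would check the result against MAP_FAILED, then hand the mapped
range to the guest, and munmap() it when the queue is destroyed.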

As a real-world use case, this series implements vQUEUE support in the
tegra241-cmdqv driver for VCMDQs on the NVIDIA Grace CPU. In other words,
it is also the Tegra CMDQV series Part-2 (user-space support), reworked
from the previous RFCv1:
    https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW-accelerated feature for the NVIDIA Grace CPU. Compared
to the standard SMMUv3 operating in the nested translation mode and
trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance
improvement: 70% to 90% reductions in invalidation time were measured by
various DMA unmap tests running in a guest OS.

// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
    4KB        1          35.7 us          5.3 us
   16KB        1          41.8 us          6.8 us
   64KB        1          68.9 us          9.9 us
  128KB        1         109.0 us         12.6 us
  256KB        1         187.1 us         18.0 us
    4KB        2          96.9 us          6.8 us
   16KB        2          97.8 us          7.5 us
   64KB        2         151.5 us         10.7 us
  128KB        2         257.8 us         12.7 us
  256KB        2         443.0 us         17.9 us
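
For readers who want the per-row reduction rather than raw latencies, a
small helper such as the one below could derive it from the table. The
struct and function names are made up for this illustration; only the
numbers are copied from the table above.

```c
#include <stddef.h>

/* Unmap latencies in microseconds, copied from the table above */
struct unmap_sample {
	unsigned int granule_kb;
	unsigned int threads;
	double bypass_us;	/* bypass_vcmdq=1 */
	double vcmdq_us;	/* bypass_vcmdq=0 */
};

static const struct unmap_sample samples[] = {
	{   4, 1,  35.7,  5.3 },
	{  16, 1,  41.8,  6.8 },
	{  64, 1,  68.9,  9.9 },
	{ 128, 1, 109.0, 12.6 },
	{ 256, 1, 187.1, 18.0 },
	{   4, 2,  96.9,  6.8 },
	{  16, 2,  97.8,  7.5 },
	{  64, 2, 151.5, 10.7 },
	{ 128, 2, 257.8, 12.7 },
	{ 256, 2, 443.0, 17.9 },
};

/* Relative latency reduction in percent when VCMDQ is not bypassed */
static double reduction_pct(const struct unmap_sample *s)
{
	return 100.0 * (s->bypass_us - s->vcmdq_us) / s->bypass_us;
}
```

Iterating over samples[] and printing reduction_pct() for each entry gives
one percentage per table row.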

This series is on GitHub:
https://github.com/nicolinc/iommufd/commits/iommufd_vqueue-v3

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vqueue-v3

Changelog
v3
 * Add Reviewed-by from Baolu, Pranjal, and Alok
 * Revise kdocs, uAPI docs, and commit logs
 * Rename "vCMDQ" back to "vQUEUE" for AMD cases
 * [tegra] Add tegra241_vcmdq_hw_flush_timeout()
 * [tegra] Rename vsmmu_alloc to alloc_vintf_user
 * [tegra] Use writel for SID replacement registers
 * [tegra] Move mmap removal call to vsmmu_destroy op
 * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
 * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
 * [iommufd] Add an object-type "owner" to immap structure
 * [iommufd] Drop the ictx input in the new for-driver APIs
 * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
 * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
 * [iommufd] Rename iommufd_ctx_alloc/free_mmap to
             _iommufd_alloc/destroy_mmap
v2
 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Jason
 * [smmu] Fix vsmmu initial value
 * [smmu] Support impl for hw_info
 * [tegra] Rename "slot" to "vsid"
 * [tegra] Update kdocs and commit logs
 * [tegra] Map/unmap LVCMDQ dynamically
 * [tegra] Refcount the previous LVCMDQ
 * [tegra] Return -EEXIST if LVCMDQ exists
 * [tegra] Simplify VINTF cleanup routine
 * [tegra] Use vmid and s2_domain in vsmmu
 * [tegra] Rename "mmap_pgoff" to "immap_id"
 * [tegra] Add more addr and length validation
 * [iommufd] Add more narrative to mmap's kdoc
 * [iommufd] Add iommufd_struct_depend/undepend()
 * [iommufd] Rename vcmdq_free op to vcmdq_destroy
 * [iommufd] Fix bug in iommu_copy_struct_to_user()
 * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
 * [iommufd] Test the queue memory for its contiguity
 * [iommufd] Return -ENXIO if address or length fails
 * [iommufd] Do not change @min_last in mock_viommu_alloc()
 * [iommufd] Generalize TEGRA241_VCMDQ data in core structure
 * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
 * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/

Thanks
Nicolin

Nicolin Chen (23):
  iommufd/viommu: Add driver-allocated vDEVICE support
  iommu: Pass in a driver-level user data structure to viommu_alloc op
  iommufd/viommu: Allow driver-specific user data for a vIOMMU object
  iommu: Add iommu_copy_struct_to_user helper
  iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
    viommu->ictx
  iommufd/driver: Add iommufd_struct_destroy to revert
    iommufd_viommu_alloc
  iommufd/selftest: Support user_data in mock_viommu_alloc
  iommufd/selftest: Add covearge for viommu data
  iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
  iommufd/viommu: Introduce IOMMUFD_OBJ_VQUEUE and its related struct
  iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl
  iommufd/driver: Add iommufd_vqueue_depend/undepend() helpers
  iommufd/selftest: Add coverage for IOMMUFD_CMD_VQUEUE_ALLOC
  iommufd: Add mmap interface
  iommufd/selftest: Add coverage for the new mmap interface
  Documentation: userspace-api: iommufd: Update vQUEUE
  iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
  iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/tegra241-cmdqv: Simplify deinit flow in
    tegra241_cmdqv_remove_vintf()
  iommu/tegra241-cmdqv: Do not statically map LVCMDQs
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  25 +-
 drivers/iommu/iommufd/io_pagetable.h          |   8 +
 drivers/iommu/iommufd/iommufd_private.h       |  28 +-
 drivers/iommu/iommufd/iommufd_test.h          |  20 +
 include/linux/iommu.h                         |  43 +-
 include/linux/iommufd.h                       | 184 ++++++-
 include/uapi/linux/iommufd.h                  | 117 ++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  52 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     |  42 +-
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 481 +++++++++++++++++-
 drivers/iommu/iommufd/device.c                | 117 +----
 drivers/iommu/iommufd/driver.c                |  88 ++++
 drivers/iommu/iommufd/io_pagetable.c          |  95 ++++
 drivers/iommu/iommufd/main.c                  |  84 ++-
 drivers/iommu/iommufd/selftest.c              | 126 ++++-
 drivers/iommu/iommufd/viommu.c                | 116 ++++-
 tools/testing/selftests/iommu/iommufd.c       |  96 +++-
 .../selftests/iommu/iommufd_fail_nth.c        |  11 +-
 Documentation/userspace-api/iommufd.rst       |  15 +
 19 files changed, 1555 insertions(+), 193 deletions(-)

-- 
2.43.0

Thread overview: 41+ messages
2025-05-01 23:01 Nicolin Chen [this message]
2025-05-01 23:01 ` [PATCH v3 01/23] iommufd/viommu: Add driver-allocated vDEVICE support Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 02/23] iommu: Pass in a driver-level user data structure to viommu_alloc op Nicolin Chen
2025-05-06  5:43   ` Vasant Hegde
2025-05-01 23:01 ` [PATCH v3 03/23] iommufd/viommu: Allow driver-specific user data for a vIOMMU object Nicolin Chen
2025-05-06  9:32   ` Vasant Hegde
2025-05-01 23:01 ` [PATCH v3 04/23] iommu: Add iommu_copy_struct_to_user helper Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 05/23] iommufd/driver: Let iommufd_viommu_alloc helper save ictx to viommu->ictx Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 06/23] iommufd/driver: Add iommufd_struct_destroy to revert iommufd_viommu_alloc Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 07/23] iommufd/selftest: Support user_data in mock_viommu_alloc Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 08/23] iommufd/selftest: Add covearge for viommu data Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 09/23] iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 10/23] iommufd/viommu: Introduce IOMMUFD_OBJ_VQUEUE and its related struct Nicolin Chen
2025-05-06  9:17   ` Vasant Hegde
2025-05-01 23:01 ` [PATCH v3 11/23] iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl Nicolin Chen
2025-05-06  9:15   ` Vasant Hegde
2025-05-06 12:01     ` Jason Gunthorpe
2025-05-07  7:41       ` Vasant Hegde
2025-05-07  8:00         ` Tian, Kevin
2025-05-07 12:31         ` Jason Gunthorpe
2025-05-08  4:46           ` Vasant Hegde
2025-05-08  5:56             ` Nicolin Chen
2025-05-08 12:14               ` Jason Gunthorpe
2025-05-08 17:12                 ` Nicolin Chen
2025-05-09 11:52                   ` Vasant Hegde
2025-05-01 23:01 ` [PATCH v3 12/23] iommufd/driver: Add iommufd_vqueue_depend/undepend() helpers Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 13/23] iommufd/selftest: Add coverage for IOMMUFD_CMD_VQUEUE_ALLOC Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 14/23] iommufd: Add mmap interface Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 15/23] iommufd/selftest: Add coverage for the new mmap interface Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 16/23] Documentation: userspace-api: iommufd: Update vQUEUE Nicolin Chen
2025-05-02  3:50   ` Bagas Sanjaya
2025-05-02  5:29     ` Nicolin Chen
2025-05-02  7:31       ` Bagas Sanjaya
2025-05-01 23:01 ` [PATCH v3 17/23] iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 18/23] iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 19/23] iommu/tegra241-cmdqv: Use request_threaded_irq Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 20/23] iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 21/23] iommu/tegra241-cmdqv: Do not statically map LVCMDQs Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 22/23] iommu/tegra241-cmdqv: Add user-space use support Nicolin Chen
2025-05-01 23:09   ` Nicolin Chen
2025-05-01 23:01 ` [PATCH v3 23/23] iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support Nicolin Chen
