From: Nicolin Chen <nicolinc@nvidia.com>
To: <jgg@nvidia.com>, <kevin.tian@intel.com>, <corbet@lwn.net>,
	<will@kernel.org>
Cc: <bagasdotme@gmail.com>, <robin.murphy@arm.com>, <joro@8bytes.org>,
	<thierry.reding@gmail.com>, <vdumpa@nvidia.com>,
	<jonathanh@nvidia.com>, <shuah@kernel.org>, <jsnitsel@redhat.com>,
	<nathan@kernel.org>, <peterz@infradead.org>, <yi.l.liu@intel.com>,
	<mshavit@google.com>, <praan@google.com>,
	<zhangzekun11@huawei.com>, <iommu@lists.linux.dev>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-tegra@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<patches@lists.linux.dev>, <mochs@nvidia.com>,
	<alok.a.tiwari@oracle.com>, <vasant.hegde@amd.com>,
	<dwmw2@infradead.org>, <baolu.lu@linux.intel.com>
Subject: [PATCH v5 00/29] iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)
Date: Sat, 17 May 2025 20:21:17 -0700
Message-ID: <cover.1747537752.git.nicolinc@nvidia.com>

The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (mostly a VM)
as a form of HW acceleration. This extends the HWPT-based design to support
more advanced virtualization features.

HW QUEUE, introduced by this series as a part of the vIOMMU infrastructure,
represents a HW-accelerated queue/buffer for a VM to use exclusively, e.g.
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which allows its IOMMU HW to directly access queue memory owned by
a guest VM and allows the guest OS to control the HW queue directly, which
avoids VM Exit overheads and improves performance.

Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC,
allowing a VMM to forward the IOMMU-specific queue info, such as the queue
base address and size.
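
As a rough illustration of the kind of queue info a VMM would forward, here
is a minimal sketch of a base/size pair with basic sanity checks. The struct
and function names are hypothetical and are NOT the uAPI added by this
series; the page-alignment requirement is also an assumption made only for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for illustration; not the uAPI struct of this series */
struct example_hw_queue_info {
	uint64_t base_addr;	/* guest address of the queue memory */
	uint64_t length;	/* queue size in bytes */
};

/*
 * Return true if the range is non-empty, aligned to @page_size (an assumed
 * requirement), and does not wrap around the 64-bit address space.
 */
static bool example_hw_queue_info_valid(const struct example_hw_queue_info *info,
					uint64_t page_size)
{
	/* page_size must be a non-zero power of two */
	assert(page_size && !(page_size & (page_size - 1)));

	if (!info->length)
		return false;
	if ((info->base_addr | info->length) & (page_size - 1))
		return false;
	/* The last byte, base_addr + length - 1, must not overflow */
	if (info->length - 1 > UINT64_MAX - info->base_addr)
		return false;
	return true;
}
```

A VMM-side caller could run such a check before issuing the allocation
ioctl, so that obviously malformed ranges are rejected early.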

Meanwhile, a guest-owned queue needs the guest kernel to control the queue
by reading/writing its consumer and producer indexes via MMIO accesses to
the hardware MMIO registers. Introduce an mmap infrastructure for iommufd
to support passing through a piece of MMIO region from the host physical
address space to the guest physical address space. The mmap info (offset/
length) used by an mmap syscall must be pre-allocated and returned to user
space via output driver data during an IOMMUFD_CMD_HW_QUEUE_ALLOC call.
Thus, it requires driver-specific user data support in the vIOMMU
allocation flow.
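
To make the user-space side of this flow concrete, below is a minimal sketch
of mapping a region with a kernel-provided offset/length pair and then
touching producer/consumer indexes through the mapping. Only mmap() and
munmap() are real interfaces here. The example_* names and the register
offsets are hypothetical, and a temporary file stands in for the iommufd
file descriptor so that the sketch can run without the hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical register offsets inside the mapped page, illustration only */
#define EXAMPLE_REG_PROD 0x0
#define EXAMPLE_REG_CONS 0x4

/*
 * Map @length bytes at @offset of @fd. With iommufd, @fd would be the iommufd
 * file descriptor and @offset/@length the values handed back by the kernel.
 * Returns NULL on failure.
 */
static volatile uint32_t *example_map_queue_page(int fd, uint64_t offset,
						 uint64_t length)
{
	void *p;

	assert(length > 0);
	p = mmap(NULL, (size_t)length, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fd, (off_t)offset);
	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

static void example_write_prod(volatile uint32_t *regs, uint32_t prod)
{
	regs[EXAMPLE_REG_PROD / sizeof(uint32_t)] = prod;
}

static uint32_t example_read_cons(volatile uint32_t *regs)
{
	return regs[EXAMPLE_REG_CONS / sizeof(uint32_t)];
}

/* Stand-in demo: a one-page temporary file plays the role of the MMIO page */
static int example_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	volatile uint32_t *regs;
	int ret = -1;

	if (!f || page <= 0)
		goto out;
	if (ftruncate(fileno(f), (off_t)page))
		goto out;
	regs = example_map_queue_page(fileno(f), 0, (uint64_t)page);
	if (!regs)
		goto out;

	example_write_prod(regs, 5);
	/* Real HW would advance CONS; the file-backed page simply stays zero */
	if (regs[EXAMPLE_REG_PROD / sizeof(uint32_t)] == 5 &&
	    example_read_cons(regs) == 0)
		ret = 0;
	munmap((void *)regs, (size_t)page);
out:
	if (f)
		fclose(f);
	return ret;
}
```

In a VMM, the pointer returned by the mapping helper would typically be
handed to the guest as part of its physical address space rather than
accessed by the VMM itself.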

As a real-world use case, this series implements HW QUEUE support in the
tegra241-cmdqv driver for VCMDQs on the NVIDIA Grace CPU. In other words,
it is also the Tegra CMDQV series Part-2 (user-space support), reworked
from the previous RFCv1:
    https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW-accelerated feature for the NVIDIA Grace CPU. Compared
to the standard SMMUv3 operating in the nested translation mode, which
traps CMDQ for TLBI and ATC_INV commands, this gives a huge performance
improvement: 70% to 90% reductions in invalidation time were measured by
various DMA unmap tests running in a guest OS.

// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
    4KB        1          35.7 us          5.3 us
   16KB        1          41.8 us          6.8 us
   64KB        1          68.9 us          9.9 us
  128KB        1         109.0 us         12.6 us
  256KB        1         187.1 us         18.0 us
    4KB        2          96.9 us          6.8 us
   16KB        2          97.8 us          7.5 us
   64KB        2         151.5 us         10.7 us
  128KB        2         257.8 us         12.7 us
  256KB        2         443.0 us         17.9 us
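
For readers who want to turn the latencies above into relative numbers, a
small sketch of the arithmetic follows. The rows are copied from the table;
the example_* names are only illustrative, and bypass_vcmdq=1 is taken as
the baseline. For the first row (4KB, 1 thread) the reduction works out to
about 85%.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct example_row {
	double bypass_us;	/* latency with bypass_vcmdq=1 */
	double vcmdq_us;	/* latency with bypass_vcmdq=0 */
};

/* Latencies copied from the table above, in the same row order */
static const struct example_row example_rows[] = {
	{  35.7,  5.3 }, {  41.8,  6.8 }, {  68.9,  9.9 },
	{ 109.0, 12.6 }, { 187.1, 18.0 }, {  96.9,  6.8 },
	{  97.8,  7.5 }, { 151.5, 10.7 }, { 257.8, 12.7 },
	{ 443.0, 17.9 },
};

/* Percentage reduction in latency going from @before_us to @after_us */
static double example_reduction_pct(double before_us, double after_us)
{
	assert(before_us > 0.0);
	return (before_us - after_us) / before_us * 100.0;
}

/* Print the reduction for every row of the table */
static void example_print_reductions(void)
{
	size_t i;

	for (i = 0; i < sizeof(example_rows) / sizeof(example_rows[0]); i++)
		printf("row %zu: %.1f%% lower\n", i,
		       example_reduction_pct(example_rows[i].bypass_us,
					     example_rows[i].vcmdq_us));
}
```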

This is on GitHub:
https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v5

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v5

Changelog
v5
 * Rebase on v6.15-rc6
 * Add Reviewed-by from Jason and Kevin
 * Correct typos in kdoc and update commit logs
 * [iommufd] Add a cosmetic fix
 * [iommufd] Drop unused num_pfns
 * [iommufd] Drop unnecessary check
 * [iommufd] Reorder patch sequence
 * [iommufd] Use io_remap_pfn_range()
 * [iommufd] Use success oriented flow
 * [iommufd] Fix max_npages calculation
 * [iommufd] Add more selftest coverage
 * [iommufd] Drop redundant static_assert
 * [iommufd] Fix mmap pfn range validation
 * [iommufd] Reject unmap on pinned iovas
 * [iommufd] Drop redundant vm_flags_set()
 * [iommufd] Drop iommufd_struct_destroy()
 * [iommufd] Drop redundant queue iova test
 * [iommufd] Use "mmio_addr" and "mmio_pfn"
 * [iommufd] Rename to "nesting_parent_iova"
 * [iommufd] Make iopt_pin_pages call optional
 * [iommufd] Add ictx comparison in depend()
 * [iommufd] Add iommufd_object_alloc_ucmd()
 * [iommufd] Move kcalloc() after validations
 * [iommufd] Replace ictx setting with WARN_ON
 * [iommufd] Make hw_info's type bidirectional
 * [smmu] Add supported_vsmmu_type in impl_ops
 * [smmu] Drop impl report in smmu vendor struct
 * [tegra] Add IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV
 * [tegra] Replace "number of VINTFs" with a note
 * [tegra] Drop the redundant lvcmdq pointer setting
 * [tegra] Flag IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA
 * [tegra] Use "vintf_alloc_vsid" for vdevice_alloc op
v4
 https://lore.kernel.org/all/cover.1746757630.git.nicolinc@nvidia.com/
 * Rebase on v6.15-rc5
 * Add Reviewed-by from Vasant
 * Rename "vQUEUE" to "HW QUEUE"
 * Use "offset" and "length" for all mmap-related variables
 * [iommufd] Use u64 for guest PA
 * [iommufd] Fix typo in uAPI doc
 * [iommufd] Rename immap_id to offset
 * [iommufd] Drop the partial-size mmap support
 * [iommufd] Do not replace WARN_ON with WARN_ON_ONCE
 * [iommufd] Use "u64 base_addr" for queue base address
 * [iommufd] Use u64 base_pfn/num_pfns for immap structure
 * [iommufd] Correct the size passed in to mtree_alloc_range()
 * [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops
v3
 https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Baolu, Pranjal, and Alok
 * Revise kdocs, uAPI docs, and commit logs
 * Rename "vCMDQ" back to "vQUEUE" for AMD cases
 * [tegra] Add tegra241_vcmdq_hw_flush_timeout()
 * [tegra] Rename vsmmu_alloc to alloc_vintf_user
 * [tegra] Use writel for SID replacement registers
 * [tegra] Move mmap removal call to vsmmu_destroy op
 * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
 * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
 * [iommufd] Add an object-type "owner" to immap structure
 * [iommufd] Drop the ictx input in the new for-driver APIs
 * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
 * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
 * [iommufd] Rename iommufd_ctx_alloc/free_mmap to
             _iommufd_alloc/destroy_mmap
v2
 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Jason
 * [smmu] Fix vsmmu initial value
 * [smmu] Support impl for hw_info
 * [tegra] Rename "slot" to "vsid"
 * [tegra] Update kdocs and commit logs
 * [tegra] Map/unmap LVCMDQ dynamically
 * [tegra] Refcount the previous LVCMDQ
 * [tegra] Return -EEXIST if LVCMDQ exists
 * [tegra] Simplify VINTF cleanup routine
 * [tegra] Use vmid and s2_domain in vsmmu
 * [tegra] Rename "mmap_pgoff" to "immap_id"
 * [tegra] Add more addr and length validation
 * [iommufd] Add more narrative to mmap's kdoc
 * [iommufd] Add iommufd_struct_depend/undepend()
 * [iommufd] Rename vcmdq_free op to vcmdq_destroy
 * [iommufd] Fix bug in iommu_copy_struct_to_user()
 * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
 * [iommufd] Test the queue memory for its contiguity
 * [iommufd] Return -ENXIO if address or length fails
 * [iommufd] Do not change @min_last in mock_viommu_alloc()
 * [iommufd] Generalize TEGRA241_VCMDQ data in core structure
 * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
 * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/

Thanks
Nicolin

Nicolin Chen (29):
  iommufd: Apply obvious cosmetic fixes
  iommufd: Introduce iommufd_object_alloc_ucmd helper
  iommu: Apply the new iommufd_object_alloc_ucmd helper
  iommu: Add iommu_copy_struct_to_user helper
  iommu: Pass in a driver-level user data structure to viommu_alloc op
  iommufd/viommu: Allow driver-specific user data for a vIOMMU object
  iommufd/selftest: Support user_data in mock_viommu_alloc
  iommufd/selftest: Add coverage for viommu data
  iommufd: Do not unmap an owned iopt_area
  iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
  iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
    viommu->ictx
  iommufd/viommu: Add driver-allocated vDEVICE support
  iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct
  iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl
  iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers
  iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC
  iommufd: Add mmap interface
  iommufd/selftest: Add coverage for the new mmap interface
  Documentation: userspace-api: iommufd: Update HW QUEUE
  iommu: Allow an input type in hw_info op
  iommufd: Allow an input data_type via iommu_hw_info
  iommufd/selftest: Update hw_info coverage for an input data_type
  iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
  iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/tegra241-cmdqv: Simplify deinit flow in
    tegra241_cmdqv_remove_vintf()
  iommu/tegra241-cmdqv: Do not statically map LVCMDQs
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  28 +-
 drivers/iommu/iommufd/io_pagetable.h          |  15 +-
 drivers/iommu/iommufd/iommufd_private.h       |  41 +-
 drivers/iommu/iommufd/iommufd_test.h          |  20 +
 include/linux/iommu.h                         |  53 +-
 include/linux/iommufd.h                       | 221 +++++++-
 include/uapi/linux/iommufd.h                  | 150 +++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  91 +++-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     |  33 +-
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 496 +++++++++++++++++-
 drivers/iommu/intel/iommu.c                   |   4 +
 drivers/iommu/iommufd/device.c                | 137 +----
 drivers/iommu/iommufd/driver.c                |  97 ++++
 drivers/iommu/iommufd/eventq.c                |  14 +-
 drivers/iommu/iommufd/hw_pagetable.c          |   6 +-
 drivers/iommu/iommufd/io_pagetable.c          | 106 +++-
 drivers/iommu/iommufd/iova_bitmap.c           |   1 -
 drivers/iommu/iommufd/main.c                  |  80 ++-
 drivers/iommu/iommufd/pages.c                 |  19 +-
 drivers/iommu/iommufd/selftest.c              | 158 +++++-
 drivers/iommu/iommufd/viommu.c                | 146 +++++-
 tools/testing/selftests/iommu/iommufd.c       | 146 +++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  15 +-
 Documentation/userspace-api/iommufd.rst       |  12 +
 24 files changed, 1794 insertions(+), 295 deletions(-)

-- 
2.43.0



Thread overview: 79+ messages
2025-05-18  3:21 Nicolin Chen [this message]
2025-05-18  3:21 ` [PATCH v5 01/29] iommufd: Apply obvious cosmetic fixes Nicolin Chen
2025-05-23  7:43   ` Tian, Kevin
2025-05-28 16:55   ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 02/29] iommufd: Introduce iommufd_object_alloc_ucmd helper Nicolin Chen
2025-05-23  7:46   ` Tian, Kevin
2025-05-23 21:17     ` Nicolin Chen
2025-05-28 16:56   ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 03/29] iommu: Apply the new " Nicolin Chen
2025-05-23  7:49   ` Tian, Kevin
2025-05-23 21:34     ` Nicolin Chen
2025-05-28  8:11       ` Tian, Kevin
2025-05-28 16:57   ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 04/29] iommu: Add iommu_copy_struct_to_user helper Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 05/29] iommu: Pass in a driver-level user data structure to viommu_alloc op Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 06/29] iommufd/viommu: Allow driver-specific user data for a vIOMMU object Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 07/29] iommufd/selftest: Support user_data in mock_viommu_alloc Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 08/29] iommufd/selftest: Add coverage for viommu data Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 09/29] iommufd: Do not unmap an owned iopt_area Nicolin Chen
2025-05-23  7:53   ` Tian, Kevin
2025-05-23 21:38     ` Nicolin Chen
2025-05-24  3:30   ` Nicolin Chen
2025-05-28 17:08   ` Jason Gunthorpe
2025-05-28 18:07     ` Nicolin Chen
2025-06-05  4:30     ` Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 10/29] iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers Nicolin Chen
2025-05-28 17:17   ` Jason Gunthorpe
2025-06-05  4:11     ` Nicolin Chen
2025-06-05 15:16       ` Jason Gunthorpe
2025-06-05 17:04         ` Nicolin Chen
2025-06-05 19:40           ` Jason Gunthorpe
2025-06-06  4:46             ` Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 11/29] iommufd/driver: Let iommufd_viommu_alloc helper save ictx to viommu->ictx Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 12/29] iommufd/viommu: Add driver-allocated vDEVICE support Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 13/29] iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct Nicolin Chen
2025-05-23  7:55   ` Tian, Kevin
2025-05-23 21:45     ` Nicolin Chen
2025-05-28  8:12       ` Tian, Kevin
2025-05-28 18:01         ` Nicolin Chen
2025-05-30 16:07   ` Jason Gunthorpe
2025-05-30 16:33     ` Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 14/29] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl Nicolin Chen
2025-05-23  8:00   ` Tian, Kevin
2025-05-24  0:30     ` Nicolin Chen
2025-05-30 16:14   ` Jason Gunthorpe
2025-05-30 17:38     ` Nicolin Chen
2025-05-30 17:40       ` Jason Gunthorpe
2025-05-30 18:23         ` Nicolin Chen
2025-05-30 18:25           ` Jason Gunthorpe
2025-05-30 18:39             ` Nicolin Chen
2025-06-03  5:41     ` Nicolin Chen
2025-06-03 12:24       ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 15/29] iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 16/29] iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 17/29] iommufd: Add mmap interface Nicolin Chen
2025-05-30 16:29   ` Jason Gunthorpe
2025-05-30 16:59     ` Nicolin Chen
2025-05-30 17:12       ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 18/29] iommufd/selftest: Add coverage for the new " Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 19/29] Documentation: userspace-api: iommufd: Update HW QUEUE Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 20/29] iommu: Allow an input type in hw_info op Nicolin Chen
2025-05-23  8:04   ` Tian, Kevin
2025-05-30 16:52   ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 21/29] iommufd: Allow an input data_type via iommu_hw_info Nicolin Chen
2025-05-23  8:06   ` Tian, Kevin
2025-05-30 16:52   ` Jason Gunthorpe
2025-05-30 17:11     ` Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 22/29] iommufd/selftest: Update hw_info coverage for an input data_type Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 23/29] iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 24/29] iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops Nicolin Chen
2025-05-30 16:57   ` Jason Gunthorpe
2025-05-18  3:21 ` [PATCH v5 25/29] iommu/tegra241-cmdqv: Use request_threaded_irq Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 26/29] iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 27/29] iommu/tegra241-cmdqv: Do not statically map LVCMDQs Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 28/29] iommu/tegra241-cmdqv: Add user-space use support Nicolin Chen
2025-05-30 17:10   ` Jason Gunthorpe
2025-05-30 17:19     ` Nicolin Chen
2025-05-18  3:21 ` [PATCH v5 29/29] iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support Nicolin Chen
2025-05-30 17:09   ` Jason Gunthorpe
