From: Nicolin Chen <nicolinc@nvidia.com>
To: Pranjal Shrivastava <praan@google.com>
Cc: <jgg@nvidia.com>, <kevin.tian@intel.com>, <corbet@lwn.net>,
<will@kernel.org>, <bagasdotme@gmail.com>, <robin.murphy@arm.com>,
<joro@8bytes.org>, <thierry.reding@gmail.com>,
<vdumpa@nvidia.com>, <jonathanh@nvidia.com>, <shuah@kernel.org>,
<jsnitsel@redhat.com>, <nathan@kernel.org>,
<peterz@infradead.org>, <yi.l.liu@intel.com>,
<mshavit@google.com>, <zhangzekun11@huawei.com>,
<iommu@lists.linux.dev>, <linux-doc@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-tegra@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
<patches@lists.linux.dev>, <mochs@nvidia.com>,
<alok.a.tiwari@oracle.com>, <vasant.hegde@amd.com>
Subject: Re: [PATCH v2 21/22] iommu/tegra241-cmdqv: Add user-space use support
Date: Wed, 30 Apr 2025 15:39:15 -0700 [thread overview]
Message-ID: <aBKmk6PNFreeyfLh@Asurada-Nvidia> (raw)
In-Reply-To: <aBKdMaFLPFJYegIS@google.com>
On Wed, Apr 30, 2025 at 09:59:13PM +0000, Pranjal Shrivastava wrote:
> On Fri, Apr 25, 2025 at 10:58:16PM -0700, Nicolin Chen wrote:
> > The CMDQV HW supports a user-space use for virtualization cases. It allows
> > the VM to issue guest-level TLBI or ATC_INV commands directly to the queue
> > and executes them without a VMEXIT, as HW will replace the VMID field in a
> > TLBI command and the SID field in an ATC_INV command with the preset VMID
> > and SID.
> >
> > This is built upon the vIOMMU infrastructure by allowing VMM to allocate a
> > VINTF (as a vIOMMU object) and assign VCMDQs (vCMDQ objects) to the VINTF.
> >
> > So firstly, replace the standard vSMMU model with the VINTF implementation
> > but reuse the standard cache_invalidate op (for unsupported commands) and
> > the standard alloc_domain_nested op (for standard nested STE).
> >
> > Each VINTF has two 64KB MMIO pages (128B per logical vCMDQ):
> > - Page0 (directly accessed by guest) has all the control and status bits.
> > - Page1 (trapped by VMM) has guest-owned queue memory location/size info.
> >
> > VMM should trap the emulated VINTF0's page1 of the guest VM for the guest-
> > level VCMDQ location/size info and forward that to the kernel to translate
> > to a physical memory location to program the VCMDQ HW during an allocation
> > call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0
> > of the guest VM. This allows the guest OS to read and write the guest-own
> > VINTF's page0 for direct control of the VCMDQ HW.
> >
> > For ATC invalidation commands that hold an SID, it requires all devices to
> > register their virtual SIDs to the SID_MATCH registers and their physical
> > SIDs to the pairing SID_REPLACE registers, so that HW can use those as a
> > lookup table to replace those virtual SIDs with the correct physical SIDs.
> > Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid
> > structure to allocate SID_REPLACE and to program the SIDs accordingly.
> >
> > This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
> > the standard SMMUv3 operating in the nested translation mode trapping CMDQ
> > for TLBI and ATC_INV commands, this gives a huge performance improvement:
> > 70% to 90% reductions of invalidation time were measured by various DMA
> > unmap tests running in a guest OS.
> >
>
> The write-up is super helpful to understand how the HW works from a high
> level. Thanks for explaining this well! :)
>
> I'm curious to know the DMA unmap tests that were run for perf?
tools/testing/selftests/dma/dma_map_benchmark.c
> > /**
> > * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
> > * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
> > *
> > - * @flags: Must be set to 0
> > - * @impl: Must be 0
> > + * @flags: Combination of enum iommu_hw_info_arm_smmuv3_flags
> > + * @impl: Implementation-defined bits when the following flags are set:
> > + * - IOMMU_HW_INFO_ARM_SMMUV3_HAS_TEGRA241_CMDQV
> > + * Bits[15:12] - Log2 of the total number of SID replacements
> > + * Bits[07:04] - Log2 of the total number of vCMDQs per vIOMMU
> > + * Bits[03:00] - Version number for the CMDQ-V HW
>
> Nit: It seems that we deliberately chose not to reveal `NUM_VINTF_LOG2`
> to the user-space. If so, maybe we shall mark those bitfields as unused
> or reserved for clarity? Bits[11:08] - Reserved / Unused (even 31:16).
I think it should have been there, but kernel should just report 0.
Bits[11:08] - Log2 of the total number of virtual interface
> > * @idr: Implemented features for ARM SMMU Non-secure programming interface
> > * @iidr: Information about the implementation and implementer of ARM SMMU,
> > * and architecture version supported
> > @@ -952,10 +965,28 @@ struct iommu_fault_alloc {
> > * enum iommu_viommu_type - Virtual IOMMU Type
> > * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
> > * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
> > + * @IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
> > */
> > enum iommu_viommu_type {
> > IOMMU_VIOMMU_TYPE_DEFAULT = 0,
> > IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
> > + IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV = 2,
> > +};
>
> This is a little confusing.. I understand that we need a new viommu type
> to copy the new struct iommu_viommu_tegra241_cmdqv b/w the user & kernel
>
> But, in a previous patch (Add vsmmu_alloc impl op), we add a check to
> fallback to the standard type SMMUv3, if the impl_ops->vsmmu_alloc
> returns -EOPNOTSUPP:
>
> if (master->smmu->impl_ops && master->smmu->impl_ops->vsmmu_alloc)
> vsmmu = master->smmu->impl_ops->vsmmu_alloc(
> master->smmu, s2_parent, ictx, viommu_type, user_data);
> if (PTR_ERR(vsmmu) == -EOPNOTSUPP) {
> if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
> return ERR_PTR(-EOPNOTSUPP);
> /* Fallback to standard SMMUv3 type if viommu_type matches */
> vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
> &arm_vsmmu_ops);
>
> Now, if we'll ALWAYS try to allocate an impl-specified vsmmu first, even
> when the viommu_type == IOMMU_VIOMMU_TYPE_ARM_SMMUV3, we are anyways
> going to return back from the impl_ops->vsmmu_alloc with -EOPNOTSUPP.
That's not necessarily true. An impl_ops->vsmmu_alloc can support
IOMMU_VIOMMU_TYPE_ARM_SMMUV3 potentially, e.g. an impl could just
toggle a few special bits in a register and return a valid vsmmu
pointer.
It doesn't work like this with VCMDQ as it supports its own type,
but for the long run I think we should pass in the standard type
to impl_ops->vsmmu_alloc too.
> Then we'll again check if the retval was -EOPNOTSUPP and re-check the
> viommu_type requested.. which seems a little counter intuitive.
It's just prioritizing the impl_ops->vsmmu_alloc. Similar to the
probe, if VCMDQ is missing or encountering some initialization
problem, give it a chance to fallback to the standard SMMU.
> > + /*
> > + * @length must be a power of 2, in range of
> > + * [ 32, 1 ^ (idr[1].CMDQS + CMDQ_ENT_SZ_SHIFT) ]
> > + */
>
> Nit: 2 ^ (idr[1].CMDQS + CMDQ_ENT_SZ_SHIFT) to match the comment in uapi
Alok pointed it out too. Fixed.
> > + vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
> > + if (!vcmdq)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + /*
> > + * HW requires to unmap LVCMDQs in descending order, so destroy() must
> > + * follow this rule. Set a dependency on its previous LVCMDQ so iommufd
> > + * core will help enforce it.
> > + */
> > + if (prev) {
> > + ret = iommufd_vcmdq_depend(vcmdq, prev, core);
> > + if (ret)
> > + goto free_vcmdq;
> > + }
> > + vcmdq->prev = prev;
> > +
> > + ret = tegra241_vintf_init_lvcmdq(vintf, index, vcmdq);
> > + if (ret)
> > + goto free_vcmdq;
> > +
> > + dev_dbg(cmdqv->dev, "%sallocated\n",
> > + lvcmdq_error_header(vcmdq, header, 64));
> > +
> > + tegra241_vcmdq_map_lvcmdq(vcmdq);
> > +
> > + vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
> > + vcmdq->cmdq.q.q_base |= log2size;
> > +
> > + ret = tegra241_vcmdq_hw_init_user(vcmdq);
> > + if (ret)
> > + goto free_vcmdq;
> > + vintf->lvcmdqs[index] = vcmdq;
> > +
> > + return &vcmdq->core;
> > +free_vcmdq:
> > + iommufd_struct_destroy(viommu->ictx, vcmdq, core);
> > + return ERR_PTR(ret);
>
> Are we missing an undepend here?
Right. The iommufd_struct_destroy doesn't invoke obj->ops.abort().
The whole revert flow is wonky, missing all the unmap/deinit steps.
> > +static void tegra241_vintf_destroy_vdevice(struct iommufd_vdevice *vdev)
> > +{
> > + struct tegra241_vintf_sid *vsid =
> > + container_of(vdev, struct tegra241_vintf_sid, core);
> > + struct tegra241_vintf *vintf = vsid->vintf;
> > +
> > + writel_relaxed(0, REG_VINTF(vintf, SID_REPLACE(vsid->idx)));
> > + writel_relaxed(0, REG_VINTF(vintf, SID_MATCH(vsid->idx)));
>
> Just a thought: Should these be writel to avoid races?
> Although I believe all user-queues would be free-d by this point?
Yea. They should be. I will change them.
Thanks
Nicolin
next prev parent reply other threads:[~2025-04-30 22:42 UTC|newest]
Thread overview: 146+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-26 5:57 [PATCH v2 00/22] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ) Nicolin Chen
2025-04-26 5:57 ` [PATCH v2 01/22] iommufd/viommu: Add driver-allocated vDEVICE support Nicolin Chen
2025-04-27 6:23 ` Baolu Lu
2025-04-28 0:41 ` Tian, Kevin
2025-04-28 18:08 ` Nicolin Chen
2025-04-26 5:57 ` [PATCH v2 02/22] iommu: Pass in a driver-level user data structure to viommu_alloc op Nicolin Chen
2025-04-27 6:31 ` Baolu Lu
2025-04-28 17:19 ` Nicolin Chen
2025-04-28 17:28 ` Pranjal Shrivastava
2025-04-26 5:57 ` [PATCH v2 03/22] iommufd/viommu: Allow driver-specific user data for a vIOMMU object Nicolin Chen
2025-04-27 6:36 ` Baolu Lu
2025-04-28 17:52 ` Pranjal Shrivastava
2025-04-30 14:58 ` ALOK TIWARI
2025-04-26 5:57 ` [PATCH v2 04/22] iommu: Add iommu_copy_struct_to_user helper Nicolin Chen
2025-04-27 6:39 ` Baolu Lu
2025-04-28 17:50 ` Pranjal Shrivastava
2025-04-28 18:21 ` Nicolin Chen
2025-04-29 8:31 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 05/22] iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc Nicolin Chen
2025-04-27 6:55 ` Baolu Lu
2025-04-28 17:24 ` Nicolin Chen
2025-04-26 5:58 ` [PATCH v2 06/22] iommufd/selftest: Support user_data in mock_viommu_alloc Nicolin Chen
2025-04-28 18:56 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 07/22] iommufd/selftest: Add covearge for viommu data Nicolin Chen
2025-04-28 19:02 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 08/22] iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers Nicolin Chen
2025-04-27 7:22 ` Baolu Lu
2025-04-28 17:41 ` Nicolin Chen
2025-05-05 15:01 ` Jason Gunthorpe
2025-05-05 15:44 ` Nicolin Chen
2025-05-05 15:55 ` Jason Gunthorpe
2025-05-05 16:03 ` Nicolin Chen
2025-05-05 16:05 ` Jason Gunthorpe
2025-05-05 16:19 ` Nicolin Chen
2025-05-05 16:56 ` Jason Gunthorpe
2025-04-28 20:14 ` Pranjal Shrivastava
2025-04-28 22:12 ` Nicolin Chen
2025-04-28 23:34 ` Nicolin Chen
2025-04-29 18:03 ` Pranjal Shrivastava
2025-05-06 9:36 ` Tian, Kevin
2025-05-06 19:17 ` Nicolin Chen
2025-05-07 7:22 ` Tian, Kevin
2025-05-07 7:36 ` Nicolin Chen
2025-05-07 7:51 ` Tian, Kevin
2025-04-26 5:58 ` [PATCH v2 09/22] iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct Nicolin Chen
2025-04-28 1:09 ` Baolu Lu
2025-04-28 18:10 ` Nicolin Chen
2025-05-05 15:02 ` Jason Gunthorpe
2025-05-05 15:45 ` Nicolin Chen
2025-04-28 21:01 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 10/22] iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl Nicolin Chen
2025-04-28 1:32 ` Baolu Lu
2025-04-28 18:58 ` Nicolin Chen
2025-04-29 6:11 ` Baolu Lu
2025-04-28 12:12 ` Vasant Hegde
2025-04-28 20:02 ` Nicolin Chen
2025-04-29 5:34 ` Vasant Hegde
2025-04-29 6:45 ` Nicolin Chen
2025-04-29 10:22 ` Vasant Hegde
2025-04-29 17:14 ` Nicolin Chen
2025-04-30 4:22 ` Vasant Hegde
2025-04-30 8:01 ` Nicolin Chen
2025-04-30 10:21 ` Vasant Hegde
2025-05-06 9:25 ` Tian, Kevin
2025-05-06 20:12 ` Nicolin Chen
2025-05-07 7:25 ` Tian, Kevin
2025-05-07 7:37 ` Nicolin Chen
2025-05-07 12:33 ` Jason Gunthorpe
2025-05-07 20:51 ` Nicolin Chen
2025-04-28 21:34 ` Pranjal Shrivastava
2025-04-28 22:44 ` Nicolin Chen
2025-04-29 8:28 ` Pranjal Shrivastava
2025-04-29 18:10 ` Pranjal Shrivastava
2025-04-29 18:15 ` Nicolin Chen
2025-04-29 18:57 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 11/22] iommufd: Add for-driver helpers iommufd_vcmdq_depend/undepend() Nicolin Chen
2025-04-28 2:22 ` Baolu Lu
2025-04-28 18:17 ` Nicolin Chen
2025-04-29 12:40 ` Pranjal Shrivastava
2025-04-29 17:10 ` Nicolin Chen
2025-04-29 17:59 ` Pranjal Shrivastava
2025-04-29 18:07 ` Nicolin Chen
2025-04-29 18:44 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 12/22] iommufd/selftest: Add coverage for IOMMUFD_CMD_VCMDQ_ALLOC Nicolin Chen
2025-04-26 5:58 ` [PATCH v2 13/22] iommufd: Add mmap interface Nicolin Chen
2025-04-28 2:50 ` Baolu Lu
2025-04-28 18:54 ` Nicolin Chen
2025-05-05 16:50 ` Jason Gunthorpe
2025-05-05 17:21 ` Nicolin Chen
2025-05-05 17:28 ` Jason Gunthorpe
2025-05-05 20:07 ` Nicolin Chen
2025-05-06 9:22 ` Tian, Kevin
2025-05-06 12:55 ` Jason Gunthorpe
2025-05-06 12:54 ` Jason Gunthorpe
2025-05-06 20:54 ` Nicolin Chen
2025-05-07 12:36 ` Jason Gunthorpe
2025-05-07 20:49 ` Nicolin Chen
2025-04-29 20:24 ` Pranjal Shrivastava
2025-04-29 20:34 ` Pranjal Shrivastava
2025-04-29 20:39 ` Nicolin Chen
2025-04-29 20:55 ` Pranjal Shrivastava
2025-04-29 21:05 ` Nicolin Chen
2025-04-29 21:35 ` Pranjal Shrivastava
2025-04-29 21:46 ` Nicolin Chen
2025-04-29 21:57 ` Pranjal Shrivastava
2025-05-05 16:55 ` Jason Gunthorpe
2025-05-05 17:27 ` Nicolin Chen
2025-05-05 17:31 ` Jason Gunthorpe
2025-05-05 19:50 ` Nicolin Chen
2025-05-06 12:52 ` Jason Gunthorpe
2025-05-06 19:30 ` Nicolin Chen
2025-05-07 12:39 ` Jason Gunthorpe
2025-05-07 21:09 ` Nicolin Chen
2025-05-07 22:08 ` Jason Gunthorpe
2025-05-08 3:49 ` Nicolin Chen
2025-05-08 9:15 ` Tian, Kevin
2025-05-08 12:12 ` Jason Gunthorpe
2025-05-08 17:14 ` Nicolin Chen
2025-05-05 18:47 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 14/22] iommufd/selftest: Add coverage for the new " Nicolin Chen
2025-04-26 5:58 ` [PATCH v2 15/22] Documentation: userspace-api: iommufd: Update vCMDQ Nicolin Chen
2025-04-28 14:31 ` Bagas Sanjaya
2025-04-28 19:00 ` Nicolin Chen
2025-04-26 5:58 ` [PATCH v2 16/22] iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op Nicolin Chen
2025-04-29 21:36 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 17/22] iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info Nicolin Chen
2025-04-29 21:44 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 18/22] iommu/tegra241-cmdqv: Use request_threaded_irq Nicolin Chen
2025-04-29 21:47 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 19/22] iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() Nicolin Chen
2025-04-29 22:05 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 20/22] iommu/tegra241-cmdqv: Do not statically map LVCMDQs Nicolin Chen
2025-04-29 20:43 ` ALOK TIWARI
2025-04-29 22:32 ` Pranjal Shrivastava
2025-04-29 22:37 ` Nicolin Chen
2025-04-26 5:58 ` [PATCH v2 21/22] iommu/tegra241-cmdqv: Add user-space use support Nicolin Chen
2025-04-29 19:47 ` ALOK TIWARI
2025-04-29 21:12 ` Nicolin Chen
2025-04-30 21:59 ` Pranjal Shrivastava
2025-04-30 22:39 ` Nicolin Chen [this message]
2025-05-01 0:54 ` Nicolin Chen
2025-05-01 21:46 ` Pranjal Shrivastava
2025-05-01 21:45 ` Pranjal Shrivastava
2025-04-26 5:58 ` [PATCH v2 22/22] iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support Nicolin Chen
2025-04-30 15:07 ` ALOK TIWARI
2025-04-30 22:03 ` Pranjal Shrivastava
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBKmk6PNFreeyfLh@Asurada-Nvidia \
--to=nicolinc@nvidia.com \
--cc=alok.a.tiwari@oracle.com \
--cc=bagasdotme@gmail.com \
--cc=corbet@lwn.net \
--cc=iommu@lists.linux.dev \
--cc=jgg@nvidia.com \
--cc=jonathanh@nvidia.com \
--cc=joro@8bytes.org \
--cc=jsnitsel@redhat.com \
--cc=kevin.tian@intel.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=mochs@nvidia.com \
--cc=mshavit@google.com \
--cc=nathan@kernel.org \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=praan@google.com \
--cc=robin.murphy@arm.com \
--cc=shuah@kernel.org \
--cc=thierry.reding@gmail.com \
--cc=vasant.hegde@amd.com \
--cc=vdumpa@nvidia.com \
--cc=will@kernel.org \
--cc=yi.l.liu@intel.com \
--cc=zhangzekun11@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.