From: Jason Gunthorpe <jgg@ziepe.ca>
To: Zong Li <zong.li@sifive.com>
Cc: joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
tjeznach@rivosinc.com, paul.walmsley@sifive.com,
palmer@dabbelt.com, aou@eecs.berkeley.edu, kevin.tian@intel.com,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-riscv@lists.infradead.org
Subject: Re: [RFC PATCH v2 07/10] iommu/riscv: support nested iommu for creating domains owned by userspace
Date: Wed, 19 Jun 2024 13:02:11 -0300
Message-ID: <20240619160211.GO1091770@ziepe.ca>
In-Reply-To: <20240614142156.29420-8-zong.li@sifive.com>
On Fri, Jun 14, 2024 at 10:21:53PM +0800, Zong Li wrote:
> This patch implements .domain_alloc_user operation for creating domains
> owend by userspace, e.g. through IOMMUFD. Add s2 domain for parent
> domain for second stage, s1 domain will be the first stage.
>
> Don't remove IOMMU private data of dev in blocked domain, because it
> holds the user data of device, which is used when attaching device into
> s1 domain.
>
> Signed-off-by: Zong Li <zong.li@sifive.com>
> ---
> drivers/iommu/riscv/iommu.c | 236 ++++++++++++++++++++++++++++++++++-
> include/uapi/linux/iommufd.h | 17 +++
> 2 files changed, 252 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 2130106e421f..410b236e9b24 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -846,6 +846,8 @@ static int riscv_iommu_iodir_set_mode(struct riscv_iommu_device *iommu,
>
> /* This struct contains protection domain specific IOMMU driver data. */
> struct riscv_iommu_domain {
> + struct riscv_iommu_domain *s2;
> + struct riscv_iommu_device *iommu;
IMHO you should create a separate riscv_iommu_domain_nested, as the ARM
driver did, rather than putting these fields here.
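For illustration, a minimal sketch of what a dedicated nested-domain type could look like, in the spirit of arm_smmu_nested_domain. All names and fields here are hypothetical, with stand-in typedefs so the fragment compiles on its own:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the core and driver types, just to keep this compilable. */
struct iommu_domain { int type; };

struct riscv_iommu_domain {
	struct iommu_domain domain;
	void *pgd_root;			/* paging state lives only here */
	int pscid;
};

/*
 * A nested (stage-1) domain owns no page tables of its own: it carries
 * only the user-supplied device-context fields plus a pointer to its
 * stage-2 parent. The kernel never modifies the user-owned parts.
 */
struct riscv_iommu_domain_nested {
	struct iommu_domain domain;
	struct riscv_iommu_domain *s2;	/* stage-2 parent domain */
	uint64_t dc_fsc;		/* user stage-1 page-table pointer */
	uint64_t dc_ta;			/* user translation attributes */
};
```

The point of the split is that the paging struct keeps its pgd/pscid state while the nested struct stays a thin record of what userspace supplied.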
The kernel can't change the nested domain, so it can't receive and
distribute invalidations.
> +/**
> + * Nested IOMMU operations
> + */
> +
> +static int riscv_iommu_attach_dev_nested(struct iommu_domain *domain, struct device *dev)
> +{
> + struct riscv_iommu_domain *riscv_domain = iommu_domain_to_riscv(domain);
> + struct riscv_iommu_device *iommu = dev_to_iommu(dev);
> + struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
> +
> + /*
> + * Add bond to the new domain's list, but don't unlink in current domain.
> + * We need to flush entries in stage-2 page table by iterating the list.
> + */
> + if (riscv_iommu_bond_link(riscv_domain, dev))
> + return -ENOMEM;
> +
> + riscv_iommu_iotlb_inval(riscv_domain, 0, ULONG_MAX);
> + info->dc_user.ta |= RISCV_IOMMU_PC_TA_V;
Unconditionally setting TA_V here seems odd; why is it needed?
> + riscv_iommu_iodir_update(iommu, dev, &info->dc_user);
This will need some updating to allow changes that don't toggle V=0,
as the ARM driver does.
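For illustration, a rough userspace-style sketch of the publish-last ordering such a V-preserving update revolves around. This is a simplified model with stand-in types, not the spec's actual rules; a real driver must also issue the required IODIR.INVAL_DDT and IOFENCE afterwards:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Simplified device-context words; the real layout is riscv_iommu_dc. */
struct dc_words {
	_Atomic uint64_t tc;	/* holds the V bit */
	uint64_t iohgatp;
	uint64_t ta;
	uint64_t fsc;
};

#define DC_TC_V (1ULL << 0)

/*
 * Rewrite the dependent words first, then publish through tc with
 * release ordering, so a walker never observes V=1 paired with a
 * half-written context. V is never dropped during the update.
 */
static void dc_update_live(struct dc_words *dc, uint64_t iohgatp,
			   uint64_t ta, uint64_t fsc, uint64_t tc)
{
	dc->iohgatp = iohgatp;
	dc->ta = ta;
	dc->fsc = fsc;
	atomic_store_explicit(&dc->tc, tc | DC_TC_V, memory_order_release);
}
```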
> + info->domain = riscv_domain;
> +
> + return 0;
> +}
> +
> +static void riscv_iommu_domain_free_nested(struct iommu_domain *domain)
> +{
> + struct riscv_iommu_domain *riscv_domain = iommu_domain_to_riscv(domain);
> + struct riscv_iommu_bond *bond;
> +
> + /* Unlink bond in s2 domain, because we link bond both on s1 and s2 domain */
> + list_for_each_entry_rcu(bond, &riscv_domain->s2->bonds, list)
> + riscv_iommu_bond_unlink(riscv_domain->s2, bond->dev);
> +
> + if ((int)riscv_domain->pscid > 0)
> + ida_free(&riscv_iommu_pscids, riscv_domain->pscid);
> +
> + kfree(riscv_domain);
> +}
> +
> +static const struct iommu_domain_ops riscv_iommu_nested_domain_ops = {
> + .attach_dev = riscv_iommu_attach_dev_nested,
> + .free = riscv_iommu_domain_free_nested,
> +};
> +
> +static int
> +riscv_iommu_get_dc_user(struct device *dev, struct iommu_hwpt_riscv_iommu *user_arg)
> +{
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> + struct riscv_iommu_device *iommu = dev_to_iommu(dev);
> + struct riscv_iommu_info *info = dev_iommu_priv_get(dev);
> + struct riscv_iommu_dc dc;
> + struct riscv_iommu_fq_record event;
> + u64 dc_len = sizeof(struct riscv_iommu_dc) >>
> + (!(iommu->caps & RISCV_IOMMU_CAPABILITIES_MSI_FLAT));
> + u64 event_len = sizeof(struct riscv_iommu_fq_record);
> + void __user *event_user = NULL;
> +
> + for (int i = 0; i < fwspec->num_ids; i++) {
> + event.hdr =
> + FIELD_PREP(RISCV_IOMMU_FQ_HDR_CAUSE, RISCV_IOMMU_FQ_CAUSE_DDT_INVALID) |
> + FIELD_PREP(RISCV_IOMMU_FQ_HDR_DID, fwspec->ids[i]);
> +
> + /* Sanity check DC of stage-1 from user data */
> + if (!user_arg->out_event_uptr || user_arg->event_len != event_len)
> + return -EINVAL;
This is not extensible; see below about just inlining it.
> + event_user = u64_to_user_ptr(user_arg->out_event_uptr);
> +
> + if (!user_arg->dc_uptr || user_arg->dc_len != dc_len)
> + return -EINVAL;
> +
> + if (copy_from_user(&dc, u64_to_user_ptr(user_arg->dc_uptr), dc_len))
> + return -EFAULT;
> +
> + if (!(dc.tc & RISCV_IOMMU_DDTE_V)) {
> + dev_dbg(dev, "Invalid DDT from user data\n");
> + if (copy_to_user(event_user, &event, event_len))
> + return -EFAULT;
> + }
On ARM we are going to support non-valid STEs. A non-valid entry should
put the translation into blocking and ideally emulate translation-failure
events.
> +
> + if (!dc.fsc || dc.iohgatp) {
> + dev_dbg(dev, "Wrong page table from user data\n");
> + if (copy_to_user(event_user, &event, event_len))
> + return -EFAULT;
> + }
> +
> + /* Save DC of stage-1 from user data */
> + memcpy(&info->dc_user,
> + riscv_iommu_get_dc(iommu, fwspec->ids[i]),
This does not seem right: the fwspec shouldn't be part of domain
allocation, even arguably for nesting. The nesting domain should
represent the user_dc only, and any customization of kernel-controlled
bits should be done during attach only. I also don't understand why this
loops over all the fwspec IDs but only memcpys the last one.
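A sketch of the split suggested here: the nested domain records only what userspace supplied, and the kernel-owned bits are merged per device at attach time. Names and the reduced field set are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified device context; the real layout is struct riscv_iommu_dc. */
struct dc { uint64_t tc, iohgatp, ta, fsc; };

/* The nested domain keeps only the user-supplied stage-1 fields. */
struct nested_domain { uint64_t user_fsc; uint64_t user_ta; };

/*
 * Attach-time composition: start from the kernel-owned DC for this
 * device and overlay only the user-controlled stage-1 fields. The
 * stage-2 pointer (iohgatp) and tc stay kernel-owned.
 */
static struct dc compose_dc(struct dc kernel_dc, const struct nested_domain *nd)
{
	kernel_dc.fsc = nd->user_fsc;	/* stage-1 page table from userspace */
	kernel_dc.ta  = nd->user_ta;	/* stage-1 attributes from userspace */
	return kernel_dc;
}
```

With this shape there is nothing per-fwspec-ID to snapshot at allocation time; the composition runs once per device in attach_dev.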
> + sizeof(struct riscv_iommu_dc));
> + info->dc_user.fsc = dc.fsc;
> + }
> +
> + return 0;
> +}
> +
> +static struct iommu_domain *
> +riscv_iommu_domain_alloc_nested(struct device *dev,
> + struct iommu_domain *parent,
> + const struct iommu_user_data *user_data)
> +{
> + struct riscv_iommu_domain *s2_domain = iommu_domain_to_riscv(parent);
> + struct riscv_iommu_domain *s1_domain;
> + struct riscv_iommu_device *iommu = dev_to_iommu(dev);
> + struct iommu_hwpt_riscv_iommu arg;
> + int ret, va_bits;
> +
> + if (user_data->type != IOMMU_HWPT_DATA_RISCV_IOMMU)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (parent->type != IOMMU_DOMAIN_UNMANAGED)
> + return ERR_PTR(-EINVAL);
> +
> + ret = iommu_copy_struct_from_user(&arg,
> + user_data,
> + IOMMU_HWPT_DATA_RISCV_IOMMU,
> + out_event_uptr);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + s1_domain = kzalloc(sizeof(*s1_domain), GFP_KERNEL);
> + if (!s1_domain)
> + return ERR_PTR(-ENOMEM);
> +
> + spin_lock_init(&s1_domain->lock);
> + INIT_LIST_HEAD_RCU(&s1_domain->bonds);
> +
> + s1_domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
> + RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
> + if (s1_domain->pscid < 0) {
> + iommu_free_page(s1_domain->pgd_root);
> + kfree(s1_domain);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + /* Get device context of stage-1 from user*/
> + ret = riscv_iommu_get_dc_user(dev, &arg);
> + if (ret) {
> + kfree(s1_domain);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (!iommu) {
> + va_bits = VA_BITS;
> + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
> + va_bits = 57;
> + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) {
> + va_bits = 48;
> + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) {
> + va_bits = 39;
> + } else {
> + dev_err(dev, "cannot find supported page table mode\n");
> + return ERR_PTR(-ENODEV);
> + }
> +
> + /*
> + * The ops->domain_alloc_user could be directly called by the iommufd core,
> + * instead of iommu core. So, this function need to set the default value of
> + * following data member:
> + * - domain->pgsize_bitmap
> + * - domain->geometry
> + * - domain->type
> + * - domain->ops
> + */
> + s1_domain->s2 = s2_domain;
> + s1_domain->iommu = iommu;
> + s1_domain->domain.type = IOMMU_DOMAIN_NESTED;
> + s1_domain->domain.ops = &riscv_iommu_nested_domain_ops;
> + s1_domain->domain.pgsize_bitmap = SZ_4K;
> + s1_domain->domain.geometry.aperture_start = 0;
> + s1_domain->domain.geometry.aperture_end = DMA_BIT_MASK(va_bits - 1);
> + s1_domain->domain.geometry.force_aperture = true;
There is no geometry or page size of nesting domains.
> +
> + return &s1_domain->domain;
> +}
> +
> +static struct iommu_domain *
> +riscv_iommu_domain_alloc_user(struct device *dev, u32 flags,
> + struct iommu_domain *parent,
> + const struct iommu_user_data *user_data)
> +{
> + struct iommu_domain *domain;
> + struct riscv_iommu_domain *riscv_domain;
> +
> + /* Allocate stage-1 domain if it has stage-2 parent domain */
> + if (parent)
> + return riscv_iommu_domain_alloc_nested(dev, parent, user_data);
> +
> + if (flags & ~((IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (user_data)
> + return ERR_PTR(-EINVAL);
> +
> + /* domain_alloc_user op needs to be fully initialized */
> + domain = iommu_domain_alloc(dev->bus);
Please organize your driver to avoid calling this core function
through a pointer and return the correct type from the start so you
don't have to cast.
> + if (!domain)
> + return ERR_PTR(-ENOMEM);
> +
> + /*
> + * We assume that nest-parent or g-stage only will come here
> + * TODO: Shadow page table doesn't be supported now.
> + * We currently can't distinguish g-stage and shadow
> + * page table here. Shadow page table shouldn't be
> + * put at stage-2.
> + */
> + riscv_domain = iommu_domain_to_riscv(domain);
> +
> + /* pgd_root may be allocated in .domain_alloc_paging */
> + if (riscv_domain->pgd_root)
> + iommu_free_page(riscv_domain->pgd_root);
And don't do stuff like this: if domain_alloc didn't do the right thing,
then reorganize it so that it does. Most likely pass in a flag indicating
that you are allocating the nest parent so it can set up properly, if it
is only a small change like this.
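One possible shape for the reorganization suggested in the two comments above: an internal allocation helper that returns the driver type directly (no cast through iommu_domain) and takes the nest-parent flag up front, instead of allocating a pgd only to free it again in the caller. Whether a nest parent really skips the pgd is driver policy; the point is deciding at allocation time. Names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in driver type; calloc/free model iommu page allocation. */
struct riscv_iommu_domain {
	void *pgd_root;
	bool nest_parent;
};

/*
 * One allocation path that knows up front what it is building, and
 * returns the driver type so callers never need to cast. Here the flag
 * gates the page-table setup; in a real driver it would gate whatever
 * nest-parent-specific setup is required.
 */
static struct riscv_iommu_domain *domain_alloc_paging(bool nest_parent)
{
	struct riscv_iommu_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->nest_parent = nest_parent;
	if (!nest_parent) {
		d->pgd_root = calloc(1, 4096);	/* stand-in for iommu_alloc_page() */
		if (!d->pgd_root) {
			free(d);
			return NULL;
		}
	}
	return d;
}
```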
> +/**
> + * struct iommu_hwpt_riscv_iommu - RISCV IOMMU stage-1 device context table
> + * info (IOMMU_HWPT_TYPE_RISCV_IOMMU)
> + * @dc_len: Length of device context
> + * @dc_uptr: User pointer to the address of device context
> + * @event_len: Length of an event record
> + * @out_event_uptr: User pointer to the address of event record
> + */
> +struct iommu_hwpt_riscv_iommu {
> + __aligned_u64 dc_len;
> + __aligned_u64 dc_uptr;
Do we really want this to be a pointer? ARM just inlined it in the
struct, why not do that?
> + __aligned_u64 event_len;
> + __aligned_u64 out_event_uptr;
> +};
Similar here too, why not just inline the response memory?
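For comparison, a hypothetical inlined layout in the spirit of ARM's iommu_hwpt_arm_smmuv3, which embeds the STE words directly rather than passing pointers and lengths. The word counts below are illustrative only, not taken from the RISC-V IOMMU spec:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t __aligned_u64;	/* stand-in for the kernel uAPI type */

/*
 * Illustrative inlined variant: the device context is copied in as part
 * of the ioctl payload, and the fault record is written back in place,
 * so no separate user pointers or length fields are needed.
 */
struct iommu_hwpt_riscv_iommu_inline {
	__aligned_u64 dc[8];		/* stage-1 device context, inline */
	__aligned_u64 out_event[4];	/* fault record written back on error */
};
```

Inlining also sidesteps the extensibility problem noted earlier, since the struct grows through the usual uAPI size-versioning rather than through user-pointer length checks.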
Jason