From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 24 Apr 2026 08:50:10 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.54.0.545.g6539524ca2-goog
Message-ID:
 <20260424085011.3502295-1-joonwonkang@google.com>
Subject: [PATCH RFC] iommu: Enable per-device SSID space for SVA
From: Joonwon Kang
To: will@kernel.org, robin.murphy@arm.com, joro@8bytes.org
Cc: jgg@ziepe.ca, nicolinc@nvidia.com, praan@google.com, kees@kernel.org,
 amhetre@nvidia.com, Alexander.Grest@microsoft.com,
 baolu.lu@linux.intel.com, smostafa@google.com,
 linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, Joonwon Kang
Content-Type: text/plain; charset="UTF-8"

For SVA, the IOMMU core always allocates PASIDs from the global PASID
space. The global PASID space exists because of a limitation of the
ENQCMD instruction on Intel CPUs: it fetches its PASID operand from the
IA32_PASID MSR, which is per-task.

Because of this, SVA with ARM SMMU v3 does not work in our environment
when other modules/devices compete for PASIDs. The environment looks as
follows:

- The device is not a PCIe device.
- The device is to use SVA.
- The supported SSID/PASID space is very small for the device; only 1 to
  3 SSIDs are supported.
- There is a custom way of transmitting the SSID from the kernel to the
  device.

With this setup, when other modules have already allocated, from the
global PASID space, all the PASIDs that our device is able to use (via
APIs like iommu_alloc_global_pasid() or iommu_sva_bind_device()), SVA
binding to our device fails due to the lack of available PASIDs.

Since ARM SMMU v3 supports a separate SSID/PASID space per SID, this
commit takes advantage of that and drops the use of the global PASID
space where possible. It does the following:

- Introduce a new IOMMU capability IOMMU_CAP_PER_DEV_PASID_SPACE, which
  indicates that the IOMMU supports an independent PASID space per
  device, not shared across devices. ARM SMMU v3 is such a case.
- Add a new API iommu_attach_device_pasid_any() that allocates any
  available PASID and attaches an IOMMU domain to it.
- Opt out of the global PASID space for SVA if the IOMMU has that
  capability, and use the new API to allocate a PASID in that case.

Signed-off-by: Joonwon Kang
---
v1: Requesting comments on this approach, on other possible approaches,
    and on any other aspects to consider. The code has not been cleaned
    up and is not yet split into separate commits in this version.

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +
 drivers/iommu/iommu-sva.c                   | 44 +++++++----
 drivers/iommu/iommu.c                       | 85 ++++++++++++++++++++-
 include/linux/iommu.h                       |  5 ++
 4 files changed, 121 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4d00d796f078..3a700ab0b5c7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2494,6 +2494,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 		return true;
 	case IOMMU_CAP_DIRTY_TRACKING:
 		return arm_smmu_dbm_capable(master->smmu);
+	case IOMMU_CAP_PER_DEV_PASID_SPACE:
+		return true;
 	default:
 		return false;
 	}
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 07d64908a05f..637d8fd29cbf 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -21,6 +21,7 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 {
 	struct iommu_mm_data *iommu_mm;
 	ioasid_t pasid;
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
 
 	lockdep_assert_held(&iommu_sva_lock);
 
@@ -39,11 +40,18 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	if (!iommu_mm)
 		return ERR_PTR(-ENOMEM);
 
-	pasid = iommu_alloc_global_pasid(dev);
-	if (pasid == IOMMU_PASID_INVALID) {
-		kfree(iommu_mm);
-		return ERR_PTR(-ENOSPC);
+	if (ops->capable && ops->capable(dev, IOMMU_CAP_PER_DEV_PASID_SPACE)) {
+		pasid = IOMMU_NO_PASID;
+		iommu_mm->pasid_global = false;
+	} else {
+		pasid = iommu_alloc_global_pasid(dev);
+		if (pasid == IOMMU_PASID_INVALID) {
+			kfree(iommu_mm);
+			return ERR_PTR(-ENOSPC);
+		}
+		iommu_mm->pasid_global = true;
 	}
+
 	iommu_mm->pasid = pasid;
 	iommu_mm->mm = mm;
 	INIT_LIST_HEAD(&iommu_mm->sva_domains);
@@ -114,13 +122,15 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_unlock;
 	}
 
-	/* Search for an existing domain. */
-	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
-						&handle->handle);
-		if (!ret) {
-			domain->users++;
-			goto out;
+	if (iommu_mm->pasid != IOMMU_NO_PASID) {
+		/* Search for an existing domain. */
+		list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
+			ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+							&handle->handle);
+			if (!ret) {
+				domain->users++;
+				goto out;
+			}
 		}
 	}
 
@@ -131,8 +141,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_free_handle;
 	}
 
-	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
-					&handle->handle);
+	if (iommu_mm->pasid != IOMMU_NO_PASID) {
+		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+						&handle->handle);
+	} else {
+		ret = iommu_attach_device_pasid_any(domain, dev, &iommu_mm->pasid,
+						    &handle->handle);
+	}
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
@@ -211,7 +226,8 @@ void mm_pasid_drop(struct mm_struct *mm)
 	if (!iommu_mm)
 		return;
 
-	iommu_free_global_pasid(iommu_mm->pasid);
+	if (iommu_mm->pasid_global)
+		iommu_free_global_pasid(iommu_mm->pasid);
 	kfree(iommu_mm);
 }
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..b882ecad7f57 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,7 +1061,7 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
-	xa_init(&group->pasid_array);
+	xa_init_flags(&group->pasid_array, XA_FLAGS_ALLOC);
 
 	ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
 	if (ret < 0) {
@@ -3619,6 +3619,89 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
 
+/**
+ * iommu_attach_device_pasid_any() - Allocate a pasid for the device and
+ *                                   attach a domain to it
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: pointer to the pasid of the device to be allocated.
+ * @handle: the attach handle.
+ *
+ * Caller should always provide a new handle to avoid race with the paths
+ * that have lockless reference to handle if it intends to pass a valid handle.
+ *
+ * Return: 0 on success, or an error.
+ */
+int iommu_attach_device_pasid_any(struct iommu_domain *domain,
+				  struct device *dev,
+				  ioasid_t *pasid,
+				  struct iommu_attach_handle *handle)
+{
+	/* Caller must be a probed driver on dev */
+	struct iommu_group *group = dev->iommu_group;
+	const struct iommu_ops *ops;
+	void *entry;
+	u32 new_pasid;
+	int ret;
+
+	if (!group)
+		return -ENODEV;
+
+	ops = dev_iommu_ops(dev);
+
+	if (!domain->ops->set_dev_pasid ||
+	    !ops->blocked_domain ||
+	    !ops->blocked_domain->ops->set_dev_pasid)
+		return -EOPNOTSUPP;
+
+	if (!domain_iommu_ops_compatible(ops, domain) || !pasid)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+
+	/*
+	 * This is a concurrent attach during a device reset. Reject it until
+	 * pci_dev_reset_iommu_done() attaches the device to group->domain.
+	 */
+	if (group->resetting_domain) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	entry = iommu_make_pasid_array_entry(domain, handle);
+
+	struct xa_limit limit = {
+		.min = IOMMU_FIRST_GLOBAL_PASID,
+		.max = dev->iommu->max_pasids - 1,
+	};
+
+	ret = xa_alloc(&group->pasid_array, &new_pasid, XA_ZERO_ENTRY, limit, GFP_KERNEL);
+	if (ret)
+		goto out_unlock;
+
+	ret = __iommu_set_group_pasid(domain, group, new_pasid, NULL);
+	if (ret) {
+		xa_release(&group->pasid_array, new_pasid);
+		goto out_unlock;
+	}
+
+	/*
+	 * The xa_alloc() above reserved the memory, and the group->mutex is
+	 * held, this cannot fail. The new domain cannot be visible until the
+	 * operation succeeds as we cannot tolerate PRIs becoming concurrently
+	 * queued and then failing attach.
+	 */
+	WARN_ON(xa_is_err(xa_store(&group->pasid_array,
+				   new_pasid, entry, GFP_KERNEL)));
+
+	*pasid = new_pasid;
+
+out_unlock:
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_attach_device_pasid_any);
+
 /**
  * iommu_replace_device_pasid - Replace the domain that a specific pasid
  *                              of the device is attached to
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 54b8b48c762e..1665f9fe1d8a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -271,6 +271,7 @@ enum iommu_cap {
 	 */
 	IOMMU_CAP_DEFERRED_FLUSH,
 	IOMMU_CAP_DIRTY_TRACKING,	/* IOMMU supports dirty tracking */
+	IOMMU_CAP_PER_DEV_PASID_SPACE,	/* IOMMU supports per-device PASID space */
 };
 
 /* These are the possible reserved region types */
@@ -1136,6 +1137,7 @@ struct iommu_sva {
 
 struct iommu_mm_data {
 	u32			pasid;
+	bool			pasid_global;
 	struct mm_struct	*mm;
 	struct list_head	sva_domains;
 	struct list_head	mm_list_elm;
@@ -1184,6 +1186,9 @@ void iommu_device_release_dma_owner(struct device *dev);
 int iommu_attach_device_pasid(struct iommu_domain *domain,
 			      struct device *dev, ioasid_t pasid,
 			      struct iommu_attach_handle *handle);
+int iommu_attach_device_pasid_any(struct iommu_domain *domain,
+				  struct device *dev, ioasid_t *pasid,
+				  struct iommu_attach_handle *handle);
 void iommu_detach_device_pasid(struct iommu_domain *domain,
 			       struct device *dev, ioasid_t pasid);
 ioasid_t iommu_alloc_global_pasid(struct device *dev);
-- 
2.54.0.545.g6539524ca2-goog