public inbox for linux-kernel@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Heiko Stuebner <heiko@sntech.de>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Joerg Roedel <jroedel@suse.de>, Sasha Levin <sashal@kernel.org>,
	joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev
Subject: [PATCH AUTOSEL 6.4 14/58] iommu: Make __iommu_group_set_domain() handle error unwind
Date: Sun, 23 Jul 2023 21:12:42 -0400
Message-ID: <20230724011338.2298062-14-sashal@kernel.org>
In-Reply-To: <20230724011338.2298062-1-sashal@kernel.org>

From: Jason Gunthorpe <jgg@nvidia.com>

[ Upstream commit dcf40ed3a20d727be054c4a20db47b32cb5036d4 ]

Let's try to have a consistent and clear strategy for error handling
during domain attach failures.

There are two broad categories. The first is callers doing destruction and
trying to set the domain back to a previously good domain. These cases
cannot handle failure during destruction flows and must succeed, or at
least avoid a UAF on the current group->domain, which is likely about to be
freed.

Many drivers are well-behaved here and will not hit the WARN_ON()s or a
UAF, but some make hypercalls, etc., that can fail unpredictably and so
don't meet these expectations.

The second case is attaching a domain for the first time in a failable
context. Failure should restore the attachment back to group->domain using
the above unfailable operation.

Have __iommu_group_set_domain_internal() execute a common algorithm that
tries to achieve this and, in the worst case, would leave a device
"detached" or assigned to a global blocking domain. This relies on some
existing common driver behaviors: attach failure also does a detach, and
true IOMMU_DOMAIN_BLOCKED implementations are never allowed to fail.
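To make the per-device step of this algorithm concrete, here is a user-space model, not kernel code. `struct dmodel_domain`, `struct dmodel_dev`, `dmodel_attach()` and `dmodel_device_set_domain()` are hypothetical stand-ins for `struct iommu_domain`, `struct device`, `__iommu_attach_device()` and `__iommu_device_set_domain()`, and the model assumes the driver behavior described above, namely that a failed attach also detaches. It sketches how a must-succeed caller parks a device on the blocking domain after a failed attach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMODEL_MUST_SUCCEED (1u << 0)

/* Hypothetical stand-in for struct iommu_domain. */
struct dmodel_domain {
	bool attach_fails;	/* simulate a driver whose attach fails */
};

/* Hypothetical stand-in for struct device; NULL means "detached". */
struct dmodel_dev {
	struct dmodel_domain *attached;
};

/*
 * Stand-in for __iommu_attach_device(). Assumes the common driver
 * behavior the commit relies on: a failed attach also detaches, so the
 * device drops its reference to the old domain.
 */
static int dmodel_attach(struct dmodel_domain *dom, struct dmodel_dev *dev)
{
	if (dom->attach_fails) {
		dev->attached = NULL;
		return -1;
	}
	dev->attached = dom;
	return 0;
}

/* Same shape as __iommu_device_set_domain() in the patch. */
static int dmodel_device_set_domain(struct dmodel_domain *blocking,
				    struct dmodel_dev *dev,
				    struct dmodel_domain *new_dom,
				    unsigned int flags)
{
	int ret;

	assert(dev != NULL && new_dom != NULL);
	ret = dmodel_attach(new_dom, dev);
	if (ret) {
		/* Must-succeed callers park the device on the blocking domain. */
		if ((flags & DMODEL_MUST_SUCCEED) && blocking &&
		    blocking != new_dom)
			dmodel_attach(blocking, dev);
		return ret;
	}
	return 0;
}
```

With flags of 0 a failed device is simply left detached and the caller is expected to revert it; with DMODEL_MUST_SUCCEED it ends up on the blocking domain, which this model assumes can never fail.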

Name the first case with __iommu_group_set_domain_nofail() to make it
clear.
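A minimal sketch of the resulting API split, assuming a stubbed driver: `wmodel_warn_on()` is a hypothetical user-space stand-in for `WARN_ON()`, and `wmodel_driver_result` fakes whatever the driver's attach returns. Failable callers receive the error code, while the nofail variant returns void and turns any failure into a warning, in the same way `__iommu_group_set_domain_nofail()` wraps the internal helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define WMODEL_MUST_SUCCEED (1u << 0)

/* Hypothetical stub: what the driver's attach callback will return. */
static int wmodel_driver_result;
/* Number of warnings emitted so far. */
static unsigned int wmodel_warn_count;

/* Hypothetical stand-in for WARN_ON(): log, keep going, return cond. */
static bool wmodel_warn_on(bool cond, const char *what)
{
	if (cond) {
		wmodel_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Stand-in for __iommu_group_set_domain_internal(). */
static int wmodel_set_domain_internal(unsigned int flags)
{
	/* The real helper picks its unwind strategy from flags. */
	assert((flags & ~WMODEL_MUST_SUCCEED) == 0);
	return wmodel_driver_result;
}

/* Failable context: hand the error back to the caller. */
static int wmodel_set_domain(void)
{
	return wmodel_set_domain_internal(0);
}

/* Destruction context: cannot fail, so a failure becomes a warning. */
static void wmodel_set_domain_nofail(void)
{
	wmodel_warn_on(wmodel_set_domain_internal(WMODEL_MUST_SUCCEED) != 0,
		       "set_domain_nofail failed");
}
```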

Pull all the error handling and WARN_ON generation into
__iommu_group_set_domain_internal().

Avoid the obfuscating use of __iommu_group_for_each_dev() and be more
careful about what happens on failure: the unwind only touches devices
that were already touched.
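The group-level unwind can be modeled the same way. This is again a hypothetical user-space sketch, not the kernel implementation: `gmodel_group_set_domain()` follows the shape of `__iommu_group_set_domain_internal()`, but it assumes re-attaching the previous `group->domain` always succeeds, and it leaves a device detached where the kernel would call `set_platform_dma_ops`. A per-device `touches` counter shows that the revert loop stops at the device that failed.

```c
#include <assert.h>
#include <stddef.h>

#define GMODEL_MUST_SUCCEED (1u << 0)

/* Hypothetical stand-in for struct iommu_domain. */
struct gmodel_domain {
	int id;
};

/* Hypothetical stand-in for a group member device. */
struct gmodel_dev {
	struct gmodel_domain *attached;	/* NULL means "detached" */
	struct gmodel_domain *rejects;	/* attach to this domain fails */
	int touches;			/* attach calls made on this device */
};

struct gmodel_group {
	struct gmodel_domain *domain;	/* domain currently set on the group */
	struct gmodel_dev *devs;
	size_t ndevs;
};

/* A failed attach also detaches, as the commit message assumes. */
static int gmodel_attach(struct gmodel_domain *dom, struct gmodel_dev *dev)
{
	dev->touches++;
	if (dom == dev->rejects) {
		dev->attached = NULL;
		return -1;
	}
	dev->attached = dom;
	return 0;
}

/* Same shape as __iommu_group_set_domain_internal() in the patch. */
static int gmodel_group_set_domain(struct gmodel_group *group,
				   struct gmodel_domain *new_dom,
				   unsigned int flags)
{
	size_t i, last;
	int result = 0;

	assert(new_dom != NULL);
	if (group->domain == new_dom)
		return 0;

	for (i = 0; i < group->ndevs; i++) {
		int ret = gmodel_attach(new_dom, &group->devs[i]);

		if (ret) {
			result = ret;
			/* Must-succeed callers keep going past the failure. */
			if (flags & GMODEL_MUST_SUCCEED)
				continue;
			goto err_revert;
		}
	}
	group->domain = new_dom;
	return result;

err_revert:
	/* Revert only devices 0..last, i.e. those already touched. */
	last = i;
	for (i = 0; i <= last; i++) {
		/* Assumed to succeed for a previously good domain. */
		if (group->domain)
			gmodel_attach(group->domain, &group->devs[i]);
	}
	return result;
}
```

For a three-device group where the second device rejects the new domain, a failable call returns the error, restores the first two devices to the old domain, and never calls attach on the third. With GMODEL_MUST_SUCCEED the loop instead continues past the failure and records the new domain on the group.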

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/iommu/iommu.c | 137 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 112 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b4..873e66ab1e2d7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -97,8 +97,26 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
 				struct iommu_group *group);
+
+enum {
+	IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
+};
+
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+					     struct iommu_domain *new_domain,
+					     unsigned int flags);
 static int __iommu_group_set_domain(struct iommu_group *group,
-				    struct iommu_domain *new_domain);
+				    struct iommu_domain *new_domain)
+{
+	return __iommu_group_set_domain_internal(group, new_domain, 0);
+}
+static void __iommu_group_set_domain_nofail(struct iommu_group *group,
+					    struct iommu_domain *new_domain)
+{
+	WARN_ON(__iommu_group_set_domain_internal(
+		group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
+}
+
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
 					       struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
@@ -2028,15 +2046,13 @@ EXPORT_SYMBOL_GPL(iommu_domain_free);
 static void __iommu_group_set_core_domain(struct iommu_group *group)
 {
 	struct iommu_domain *new_domain;
-	int ret;
 
 	if (group->owner)
 		new_domain = group->blocking_domain;
 	else
 		new_domain = group->default_domain;
 
-	ret = __iommu_group_set_domain(group, new_domain);
-	WARN(ret, "iommu driver failed to attach the default/blocking domain");
+	__iommu_group_set_domain_nofail(group, new_domain);
 }
 
 static int __iommu_attach_device(struct iommu_domain *domain,
@@ -2221,21 +2237,55 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
 
-static int iommu_group_do_set_platform_dma(struct device *dev, void *data)
+static int __iommu_device_set_domain(struct iommu_group *group,
+				     struct device *dev,
+				     struct iommu_domain *new_domain,
+				     unsigned int flags)
 {
-	const struct iommu_ops *ops = dev_iommu_ops(dev);
-
-	if (!WARN_ON(!ops->set_platform_dma_ops))
-		ops->set_platform_dma_ops(dev);
+	int ret;
 
+	ret = __iommu_attach_device(new_domain, dev);
+	if (ret) {
+		/*
+		 * If we have a blocking domain then try to attach that in hopes
+		 * of avoiding a UAF. Modern drivers should implement blocking
+		 * domains as global statics that cannot fail.
+		 */
+		if ((flags & IOMMU_SET_DOMAIN_MUST_SUCCEED) &&
+		    group->blocking_domain &&
+		    group->blocking_domain != new_domain)
+			__iommu_attach_device(group->blocking_domain, dev);
+		return ret;
+	}
 	return 0;
 }
 
-static int __iommu_group_set_domain(struct iommu_group *group,
-				    struct iommu_domain *new_domain)
+/*
+ * If 0 is returned the group's domain is new_domain. If an error is returned
+ * then the group's domain will be set back to the existing domain unless
+ * IOMMU_SET_DOMAIN_MUST_SUCCEED is set, in which case the group's domain is
+ * left inconsistent. It is a driver bug to fail attach with a previously good
+ * domain. We try to avoid a kernel UAF because of this.
+ *
+ * IOMMU groups are really the natural working unit of the IOMMU, but the IOMMU
+ * API works on domains and devices.  Bridge that gap by iterating over the
+ * devices in a group.  Ideally we'd have a single device which represents the
+ * requestor ID of the group, but we also allow IOMMU drivers to create policy
+ * defined minimum sets, where the physical hardware may be able to distinguish
+ * members, but we wish to group them at a higher level (ex. untrusted
+ * multi-function PCI devices).  Thus we attach each device.
+ */
+static int __iommu_group_set_domain_internal(struct iommu_group *group,
+					     struct iommu_domain *new_domain,
+					     unsigned int flags)
 {
+	struct group_device *last_gdev;
+	struct group_device *gdev;
+	int result;
 	int ret;
 
+	lockdep_assert_held(&group->mutex);
+
 	if (group->domain == new_domain)
 		return 0;
 
@@ -2245,8 +2295,12 @@ static int __iommu_group_set_domain(struct iommu_group *group,
 	 * platform specific behavior.
 	 */
 	if (!new_domain) {
-		__iommu_group_for_each_dev(group, NULL,
-					   iommu_group_do_set_platform_dma);
+		for_each_group_device(group, gdev) {
+			const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+			if (!WARN_ON(!ops->set_platform_dma_ops))
+				ops->set_platform_dma_ops(gdev->dev);
+		}
 		group->domain = NULL;
 		return 0;
 	}
@@ -2256,16 +2310,52 @@ static int __iommu_group_set_domain(struct iommu_group *group,
 	 * domain. This switch does not have to be atomic and DMA can be
 	 * discarded during the transition. DMA must only be able to access
 	 * either new_domain or group->domain, never something else.
-	 *
-	 * Note that this is called in error unwind paths, attaching to a
-	 * domain that has already been attached cannot fail.
 	 */
-	ret = __iommu_group_for_each_dev(group, new_domain,
-					 iommu_group_do_attach_device);
-	if (ret)
-		return ret;
+	result = 0;
+	for_each_group_device(group, gdev) {
+		ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
+						flags);
+		if (ret) {
+			result = ret;
+			/*
+			 * Keep trying the other devices in the group. If a
+			 * driver fails attach to an otherwise good domain, and
+			 * does not support blocking domains, it should at least
+			 * drop its reference on the current domain so we don't
+			 * UAF.
+			 */
+			if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
+				continue;
+			goto err_revert;
+		}
+	}
 	group->domain = new_domain;
-	return 0;
+	return result;
+
+err_revert:
+	/*
+	 * This is called in error unwind paths. A well behaved driver should
+	 * always allow us to attach to a domain that was already attached.
+	 */
+	last_gdev = gdev;
+	for_each_group_device(group, gdev) {
+		const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
+
+		/*
+		 * If set_platform_dma_ops is not present a NULL domain can
+		 * happen only for first probe, in which case we leave
+		 * group->domain as NULL and let release clean everything up.
+		 */
+		if (group->domain)
+			WARN_ON(__iommu_device_set_domain(
+				group, gdev->dev, group->domain,
+				IOMMU_SET_DOMAIN_MUST_SUCCEED));
+		else if (ops->set_platform_dma_ops)
+			ops->set_platform_dma_ops(gdev->dev);
+		if (gdev == last_gdev)
+			break;
+	}
+	return ret;
 }
 
 void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
@@ -3182,16 +3272,13 @@ EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
 
 static void __iommu_release_dma_ownership(struct iommu_group *group)
 {
-	int ret;
-
 	if (WARN_ON(!group->owner_cnt || !group->owner ||
 		    !xa_empty(&group->pasid_array)))
 		return;
 
 	group->owner_cnt = 0;
 	group->owner = NULL;
-	ret = __iommu_group_set_domain(group, group->default_domain);
-	WARN(ret, "iommu driver failed to attach the default domain");
+	__iommu_group_set_domain_nofail(group, group->default_domain);
 }
 
 /**
-- 
2.39.2

