devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Add support for Translation Buffer Units
@ 2023-10-19  2:19 Georgi Djakov
  2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

The TCUs (Translation Control Units) and TBUs (Translation Buffer
Units) are key components of the ARM MMU-500. Multiple TBUs are
connected to a single TCU over an interconnect. Each TBU contains
a TLB that caches page tables. The MMU-500 implements a TBU for each
connected master, and the TBU is designed, so that it is local to the
master.

The Qualcomm sdm845 platform has an implementation of the ARM SMMU-500,
that consists of a single TCU and multiple TBUs. Support for the TCU
is already present and this patchset is adds support for TBUs that
allow us to use some of the debug features to include more info when
a context fault occurs.

Each TBU requires some resources like power-domains, interconnects and
clocks to be described in DT. These resources will be being controlled
by the TBU driver when they in use.

Georgi Djakov (6):
  dt-bindings: iommu: Add Translation Buffer Unit bindings
  iommu/arm-smmu-qcom: Add support for TBUs
  iommu/arm-smmu-qcom: Add Qualcomm TBU driver
  iommu/arm-smmu: Allow using a threaded handler for context interrupts
  iommu/arm-smmu-qcom: Use a custom context fault handler for sdm845
  arm64: dts: qcom: sdm845: Add DT nodes for the TBUs

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  13 +
 .../bindings/iommu/qcom,qsmmuv500-tbu.yaml    |  67 +++
 arch/arm64/boot/dts/qcom/sdm845.dtsi          |  85 +++
 drivers/iommu/Kconfig                         |   8 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c    | 544 ++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h    |   4 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |  12 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |   3 +
 8 files changed, 733 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  2023-10-19 11:12   ` Konrad Dybcio
  2023-10-24 18:42   ` Rob Herring
  2023-10-19  2:19 ` [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs Georgi Djakov
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

The "apps_smmu" on the Qualcomm sdm845 platform is an implementation
of the ARM SMMU-500, that consists of a single TCU (Translation Control
Unit) and multiple TBUs (Translation Buffer Units). The TCU is already
being described in the ARM SMMU DT schema. Add also bindings for the
TBUs so that we can describe their properties.

In this DT schema, the TBUs are modelled as a child devices of the TCU
and each of them is described with it's own resources such as clocks,
power domains, interconnects etc.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 .../devicetree/bindings/iommu/arm,smmu.yaml   | 13 ++++
 .../bindings/iommu/qcom,qsmmuv500-tbu.yaml    | 67 +++++++++++++++++++
 2 files changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index cf29ab10501c..afc323b4bbc5 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -230,6 +230,19 @@ properties:
       enabled for any given device.
     $ref: /schemas/types.yaml#/definitions/phandle
 
+  '#address-cells':
+    const: 2
+
+  '#size-cells':
+    const: 2
+
+  ranges: true
+
+patternProperties:
+  "^tbu@[0-9a-f]+$":
+    $ref: qcom,qsmmuv500-tbu.yaml
+    description: The SMMU may include Translation Buffer Units (TBU) as subnodes
+
 required:
   - compatible
   - reg
diff --git a/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
new file mode 100644
index 000000000000..4baba7397e90
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/qcom,qsmmuv500-tbu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm TBU (Translation Buffer Unit)
+
+maintainers:
+  - Georgi Djakov <quic_c_gdjako@quicinc.com>
+
+description:
+  TBU nodes represent Translation Buffer Units in an ARM SMMU. Each TBU node
+  should be a child node of the SMMU in the device tree.
+
+properties:
+  compatible:
+    enum:
+      - qcom,qsmmuv500-tbu
+
+  reg:
+    items:
+      - description: Address and size of the TBU's register space.
+
+  reg-names:
+    items:
+      - const: base
+
+  clocks:
+    maxItems: 1
+
+  interconnects:
+    maxItems: 1
+
+  power-domains:
+    maxItems: 1
+
+  qcom,stream-id-range:
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    description: Stream ID range (address and size) that is assigned by the TBU
+
+required:
+  - compatible
+  - reg
+  - interconnects
+  - qcom,stream-id-range
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/qcom,gcc-sdm845.h>
+    #include <dt-bindings/interconnect/qcom,sdm845.h>
+    #include <dt-bindings/power/qcom-rpmpd.h>
+
+
+    tbu@150e1000 {
+        compatible = "qcom,qsmmuv500-tbu";
+        reg = <0x150e1000 0x1000>;
+        reg-names = "base";
+        clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
+        power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
+        interconnects = <&system_noc MASTER_GNOC_SNOC 0 &config_noc SLAVE_IMEM_CFG 0>;
+        qcom,stream-id-range = <0x1c00 0x400>;
+    };
+
+...

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
  2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  2023-10-19 11:13   ` Konrad Dybcio
  2023-10-19  2:19 ` [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver Georgi Djakov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

The ARM MMU-500 implements a Translation Buffer Unit (TBU) for each
connected master besides a single TCU which controls and manages the
address translations.

Allow the Qualcomm SMMU driver to probe for any TBU devices that can
provide additional debug features like triggering transactions, logging
outstanding transactions, snapshot capture etc. The most basic use-case
would be to get information from the TBUs and print it during a context
fault.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 12 ++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h |  4 +++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7f52ac67495f..655c7f50ca84 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -1,12 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * Copyright (c) 2019, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved
  */
 
 #include <linux/acpi.h>
 #include <linux/adreno-smmu-priv.h>
 #include <linux/delay.h>
 #include <linux/of_device.h>
+#include <linux/of_platform.h>
 #include <linux/firmware/qcom/qcom_scm.h>
 
 #include "arm-smmu.h"
@@ -466,6 +468,16 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
 	qsmmu->smmu.impl = impl;
 	qsmmu->cfg = data->cfg;
 
+	/* Populate TBU devices if such are present in DT */
+	if (np && of_device_is_compatible(np, "arm,mmu-500")) {
+		int ret;
+
+		INIT_LIST_HEAD(&qsmmu->tbu_list);
+		ret = devm_of_platform_populate(smmu->dev);
+		if (ret)
+			return ERR_PTR(ret);
+	}
+
 	return &qsmmu->smmu;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
index 593910567b88..2164a9cf3dde 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (c) 2022, Qualcomm Innovation Center, Inc. All rights reserved.
+ * Copyright (c) 2022-2023, Qualcomm Innovation Center, Inc. All rights reserved.
  */
 
 #ifndef _ARM_SMMU_QCOM_H
@@ -12,6 +12,8 @@ struct qcom_smmu {
 	bool bypass_quirk;
 	u8 bypass_cbndx;
 	u32 stall_enabled;
+	struct mutex tbu_list_lock;  /* protects tbu_list */
+	struct list_head tbu_list;
 };
 
 enum qcom_smmu_impl_reg_offset {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
  2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
  2023-10-19  2:19 ` [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  2023-10-21 21:05   ` Bjorn Andersson
  2023-10-26 15:25   ` kernel test robot
  2023-10-19  2:19 ` [PATCH 4/6] iommu/arm-smmu: Allow using a threaded handler for context interrupts Georgi Djakov
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

Add driver for the Qualcomm implementation of the ARM MMU-500 TBU.
The driver will enable the resources needed by the TBU and will
configure the registers for some debug features like checking if
there are any pending transactions, capturing transactions and
running ATOS (Address Translation Operations). ATOS/eCATS are used
to manually trigger an address translation of IOVA to physical
address by the SMMU hardware.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 drivers/iommu/Kconfig                      |   8 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 384 +++++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |   2 +
 3 files changed, 394 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 2b12b583ef4b..2b024d65b921 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -377,6 +377,14 @@ config ARM_SMMU_QCOM
 	  When running on a Qualcomm platform that has the custom variant
 	  of the ARM SMMU, this needs to be built into the SMMU driver.
 
+config ARM_SMMU_QCOM_TBU
+	bool "Qualcomm TBU driver"
+	depends on ARM_SMMU_QCOM
+	help
+	  The SMMU on Qualcomm platform may include a Translation Buffer
+	  Units (TBUs) for each master. Enabling support for these will
+	  allow operating the TBUs to help debugging context faults.
+
 config ARM_SMMU_QCOM_DEBUG
 	bool "ARM SMMU QCOM implementation defined debug support"
 	depends on ARM_SMMU_QCOM
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 655c7f50ca84..d20263a1cd2c 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,17 +10,324 @@
 #include <linux/of_device.h>
 #include <linux/of_platform.h>
 #include <linux/firmware/qcom/qcom_scm.h>
+#include <linux/interconnect.h>
+#include <linux/iopoll.h>
 
 #include "arm-smmu.h"
 #include "arm-smmu-qcom.h"
 
 #define QCOM_DUMMY_VAL	-1
 
+#define TBU_DBG_TIMEOUT_US		100
+#define DEBUG_AXUSER_REG		0x30
+#define DEBUG_AXUSER_CDMID		GENMASK_ULL(43, 36)
+#define DEBUG_AXUSER_CDMID_VAL		0xff
+#define DEBUG_PAR_REG			0x28
+#define DEBUG_PAR_FAULT_VAL		BIT(0)
+#define DEBUG_PAR_PA			GENMASK_ULL(47, 12)
+#define DEBUG_SID_HALT_REG		0x0
+#define DEBUG_SID_HALT_VAL		BIT(16)
+#define DEBUG_SID_HALT_SID		GENMASK(9, 0)
+#define DEBUG_SR_HALT_ACK_REG		0x20
+#define DEBUG_SR_HALT_ACK_VAL		BIT(1)
+#define DEBUG_SR_ECATS_RUNNING_VAL	BIT(0)
+#define DEBUG_TXN_AXCACHE		GENMASK(5, 2)
+#define DEBUG_TXN_AXPROT		GENMASK(8, 6)
+#define DEBUG_TXN_AXPROT_PRIV		0x1
+#define DEBUG_TXN_AXPROT_NSEC		0x2
+#define DEBUG_TXN_TRIGG_REG		0x18
+#define DEBUG_TXN_TRIGGER		BIT(0)
+#define DEBUG_VA_ADDR_REG		0x8
+
 static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
 {
 	return container_of(smmu, struct qcom_smmu, smmu);
 }
 
+#ifdef CONFIG_ARM_SMMU_QCOM_TBU
+
+struct qsmmuv500_tbu {
+	struct device *dev;
+	struct arm_smmu_device *smmu;
+	u32 sid_range[2];
+	struct list_head list;
+	struct clk *clk;
+	struct icc_path	*path;
+	void __iomem *base;
+	spinlock_t halt_lock; /* protects halt count */
+	int halt_count;
+};
+
+static DEFINE_SPINLOCK(ecats_lock);
+
+static struct qsmmuv500_tbu *qsmmuv500_find_tbu(struct qcom_smmu *qsmmu, u32 sid)
+{
+	struct qsmmuv500_tbu *tbu = NULL;
+	u32 start, end;
+
+	mutex_lock(&qsmmu->tbu_list_lock);
+
+	list_for_each_entry(tbu, &qsmmu->tbu_list, list) {
+		start = tbu->sid_range[0];
+		end = start + tbu->sid_range[1];
+
+		if (start <= sid && sid < end)
+			break;
+	}
+
+	mutex_unlock(&qsmmu->tbu_list_lock);
+
+	return tbu;
+}
+
+static int qsmmuv500_tbu_halt(struct qsmmuv500_tbu *tbu, struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	int ret = 0, idx = smmu_domain->cfg.cbndx;
+	unsigned long flags;
+	u32 val, fsr, status;
+
+	spin_lock_irqsave(&tbu->halt_lock, flags);
+	if (tbu->halt_count) {
+		tbu->halt_count++;
+		goto out;
+	}
+
+	val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG);
+	val |= DEBUG_SID_HALT_VAL;
+	writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG);
+
+	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+	if ((fsr & ARM_SMMU_FSR_FAULT) && (fsr & ARM_SMMU_FSR_SS)) {
+		u32 sctlr_orig, sctlr;
+
+		/*
+		 * We are in a fault. Our request to halt the bus will not
+		 * complete until transactions in front of us (such as the fault
+		 * itself) have completed. Disable iommu faults and terminate
+		 * any existing transactions.
+		 */
+		sctlr_orig = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
+		sctlr = sctlr_orig & ~(ARM_SMMU_SCTLR_CFCFG | ARM_SMMU_SCTLR_CFIE);
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr);
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+
+		/* Ensure that the FSR is cleared */
+		wmb();
+
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, ARM_SMMU_RESUME_TERMINATE);
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig);
+	}
+
+	if (readl_poll_timeout_atomic(tbu->base + DEBUG_SR_HALT_ACK_REG, status,
+				      (status & DEBUG_SR_HALT_ACK_VAL),
+				      0, TBU_DBG_TIMEOUT_US)) {
+		dev_err(tbu->dev, "Timeout while trying to halt TBU!\n");
+		ret = -ETIMEDOUT;
+
+		val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG);
+		val &= ~DEBUG_SID_HALT_VAL;
+		writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG);
+
+		goto out;
+	}
+
+	tbu->halt_count = 1;
+
+out:
+	spin_unlock_irqrestore(&tbu->halt_lock, flags);
+	return ret;
+}
+
+static void qsmmuv500_tbu_resume(struct qsmmuv500_tbu *tbu)
+{
+	unsigned long flags;
+	u32 val;
+
+	spin_lock_irqsave(&tbu->halt_lock, flags);
+	if (!tbu->halt_count) {
+		WARN(1, "%s: halt_count is 0", dev_name(tbu->dev));
+		goto out;
+	}
+
+	if (tbu->halt_count > 1) {
+		tbu->halt_count--;
+		goto out;
+	}
+
+	val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG);
+	val &= ~DEBUG_SID_HALT_VAL;
+	writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG);
+
+	tbu->halt_count = 0;
+out:
+	spin_unlock_irqrestore(&tbu->halt_lock, flags);
+}
+
+static phys_addr_t qsmmuv500_tbu_trigger_atos(struct arm_smmu_domain *smmu_domain,
+					      struct qsmmuv500_tbu *tbu, dma_addr_t iova, u32 sid)
+{
+	bool ecats_timedout = false;
+	phys_addr_t phys = 0;
+	ktime_t timeout;
+	u64 val;
+
+	/* Set address and stream-id */
+	val = readq_relaxed(tbu->base + DEBUG_SID_HALT_REG);
+	val &= ~DEBUG_SID_HALT_SID;
+	val |= FIELD_PREP(DEBUG_SID_HALT_SID, sid);
+	writeq_relaxed(val, tbu->base + DEBUG_SID_HALT_REG);
+	writeq_relaxed(iova, tbu->base + DEBUG_VA_ADDR_REG);
+	val = FIELD_PREP(DEBUG_AXUSER_CDMID, DEBUG_AXUSER_CDMID_VAL);
+	writeq_relaxed(val, tbu->base + DEBUG_AXUSER_REG);
+
+	/* Write-back read and write-allocate */
+	val = FIELD_PREP(DEBUG_TXN_AXCACHE, 0xF);
+
+	/* Non-secure access */
+	val |= FIELD_PREP(DEBUG_TXN_AXPROT, DEBUG_TXN_AXPROT_NSEC);
+
+	/* Priviledged access */
+	val |= FIELD_PREP(DEBUG_TXN_AXPROT, DEBUG_TXN_AXPROT_PRIV);
+
+	val |= DEBUG_TXN_TRIGGER;
+	writeq_relaxed(val, tbu->base + DEBUG_TXN_TRIGG_REG);
+
+	timeout = ktime_add_us(ktime_get(), TBU_DBG_TIMEOUT_US);
+	for (;;) {
+		val = readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG);
+		if (!(val & DEBUG_SR_ECATS_RUNNING_VAL))
+			break;
+		val = readl_relaxed(tbu->base + DEBUG_PAR_REG);
+		if (val & DEBUG_PAR_FAULT_VAL)
+			break;
+		if (ktime_compare(ktime_get(), timeout) > 0) {
+			ecats_timedout = true;
+			break;
+		}
+	}
+
+	val = readq_relaxed(tbu->base + DEBUG_PAR_REG);
+	if (val & DEBUG_PAR_FAULT_VAL)
+		dev_err(tbu->dev, "ECATS generated a fault interrupt! PAR = %llx, SID=0x%x\n",
+			val, sid);
+	else if (ecats_timedout)
+		dev_err_ratelimited(tbu->dev, "ECATS translation timed out!\n");
+	else
+		phys = FIELD_GET(DEBUG_PAR_PA, val);
+
+	/* Reset hardware */
+	writeq_relaxed(0, tbu->base + DEBUG_TXN_TRIGG_REG);
+	writeq_relaxed(0, tbu->base + DEBUG_VA_ADDR_REG);
+	val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG);
+	val &= ~DEBUG_SID_HALT_SID;
+	writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG);
+
+	return phys;
+}
+
+static phys_addr_t qsmmuv500_iova_to_phys(struct arm_smmu_domain *smmu_domain,
+					  dma_addr_t iova, u32 sid)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+	int idx = smmu_domain->cfg.cbndx;
+	struct qsmmuv500_tbu *tbu;
+	u32 sctlr_orig, sctlr;
+	phys_addr_t phys = 0;
+	unsigned long flags;
+	int needs_redo = 0;
+	int ret;
+	u64 fsr;
+
+	tbu = qsmmuv500_find_tbu(qsmmu, sid);
+	if (!tbu)
+		return 0;
+
+	ret = icc_set_bw(tbu->path, 0, UINT_MAX);
+	if (ret)
+		return ret;
+
+	ret = clk_prepare_enable(tbu->clk);
+	if (ret)
+		goto disable_icc;
+
+	ret = qsmmuv500_tbu_halt(tbu, smmu_domain);
+	if (ret)
+		goto disable_clk;
+
+	/*
+	 * ECATS can trigger the fault interrupt, so disable it temporarily
+	 * and check for an interrupt manually.
+	 */
+	sctlr_orig = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
+	sctlr = sctlr_orig & ~(ARM_SMMU_SCTLR_CFCFG | ARM_SMMU_SCTLR_CFIE);
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr);
+
+	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+	if (fsr & ARM_SMMU_FSR_FAULT) {
+		/* Clear pending interrupts */
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+		/*
+		 * Barrier required to ensure that the FSR is cleared
+		 * before resuming SMMU operation.
+		 */
+		wmb();
+
+		/*
+		 * TBU halt takes care of resuming any stalled transcation.
+		 * Kept it here for completeness sake.
+		 */
+		if (fsr & ARM_SMMU_FSR_SS)
+			arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
+					  ARM_SMMU_RESUME_TERMINATE);
+	}
+
+	/* Only one concurrent atos operation */
+	spin_lock_irqsave(&ecats_lock, flags);
+
+	/*
+	 * After a failed translation, the next successful translation will
+	 * incorrectly be reported as a failure.
+	 */
+	do {
+		phys = qsmmuv500_tbu_trigger_atos(smmu_domain, tbu, iova, sid);
+
+		fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+		if (fsr & ARM_SMMU_FSR_FAULT) {
+			/* Clear pending interrupts */
+			arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+			/*
+			 * Barrier required to ensure that the FSR is cleared
+			 * before resuming SMMU operation.
+			 */
+			wmb();
+
+			if (fsr & ARM_SMMU_FSR_SS)
+				arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
+						  ARM_SMMU_RESUME_TERMINATE);
+		}
+	} while (!phys && needs_redo++ < 2);
+
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig);
+	spin_unlock_irqrestore(&ecats_lock, flags);
+	qsmmuv500_tbu_resume(tbu);
+
+	/* Read to complete prior write transcations */
+	readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG);
+
+	/* Wait for read to complete */
+	rmb();
+
+disable_clk:
+	clk_disable_unprepare(tbu->clk);
+disable_icc:
+	icc_set_bw(tbu->path, 0, 0);
+
+	return phys;
+}
+#endif
+
 static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
 				int sync, int status)
 {
@@ -588,3 +895,80 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 
 	return smmu;
 }
+
+#ifdef CONFIG_ARM_SMMU_QCOM_TBU
+
+static const struct of_device_id qsmmuv500_tbu_of_match[] = {
+	{ .compatible = "qcom,qsmmuv500-tbu" },
+	{ }
+};
+
+static int qsmmuv500_tbu_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct device *dev = &pdev->dev;
+	struct arm_smmu_device *smmu;
+	struct qsmmuv500_tbu *tbu;
+	struct qcom_smmu *qsmmu;
+	int ret;
+
+	smmu = dev_get_drvdata(dev->parent);
+	if (!smmu)
+		return -EPROBE_DEFER;
+
+	qsmmu = to_qcom_smmu(smmu);
+
+	tbu = devm_kzalloc(dev, sizeof(*tbu), GFP_KERNEL);
+	if (!tbu)
+		return -ENOMEM;
+
+	tbu->dev = dev;
+	INIT_LIST_HEAD(&tbu->list);
+	spin_lock_init(&tbu->halt_lock);
+
+	tbu->base = devm_of_iomap(dev, np, 0, NULL);
+	if (IS_ERR(tbu->base))
+		return PTR_ERR(tbu->base);
+
+	ret = of_property_read_u32_array(np, "qcom,stream-id-range", tbu->sid_range, 2);
+	if (ret) {
+		dev_err(dev, "The DT property 'qcom,stream-id-range' is mandatory\n");
+		return ret;
+	}
+
+	tbu->clk = devm_clk_get_optional(dev, NULL);
+	if (IS_ERR(tbu->clk))
+		return PTR_ERR(tbu->clk);
+
+	tbu->path = devm_of_icc_get(dev, NULL);
+	if (IS_ERR(tbu->path))
+		return PTR_ERR(tbu->path);
+
+	mutex_lock(&qsmmu->tbu_list_lock);
+	list_add_tail(&tbu->list, &qsmmu->tbu_list);
+	mutex_unlock(&qsmmu->tbu_list_lock);
+
+	dev_set_drvdata(dev, tbu);
+
+	return 0;
+}
+
+static void qsmmuv500_tbu_remove(struct platform_device *pdev)
+{
+	struct qsmmuv500_tbu *tbu = dev_get_drvdata(&pdev->dev);
+
+	clk_disable_unprepare(tbu->clk);
+	clk_put(tbu->clk);
+	icc_put(tbu->path);
+}
+
+static struct platform_driver qsmmuv500_tbu_driver = {
+	.driver = {
+		.name           = "qsmmuv500-tbu",
+		.of_match_table = of_match_ptr(qsmmuv500_tbu_of_match),
+	},
+	.probe  = qsmmuv500_tbu_probe,
+	.remove_new = qsmmuv500_tbu_remove,
+};
+module_platform_driver(qsmmuv500_tbu_driver);
+#endif
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 703fd5817ec1..e5df65c0f81a 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -136,6 +136,7 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CBAR_VMID		GENMASK(7, 0)
 
 #define ARM_SMMU_GR1_CBFRSYNRA(n)	(0x400 + ((n) << 2))
+#define ARM_SMMU_CBFRSYNRA_SID		GENMASK(15, 0)
 
 #define ARM_SMMU_GR1_CBA2R(n)		(0x800 + ((n) << 2))
 #define ARM_SMMU_CBA2R_VMID16		GENMASK(31, 16)
@@ -238,6 +239,7 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_ATSR		0x8f0
 #define ARM_SMMU_ATSR_ACTIVE		BIT(0)
 
+#define ARM_SMMU_RESUME_TERMINATE	BIT(0)
 
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS		128

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] iommu/arm-smmu: Allow using a threaded handler for context interrupts
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
                   ` (2 preceding siblings ...)
  2023-10-19  2:19 ` [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  2023-10-19  2:19 ` [PATCH 5/6] iommu/arm-smmu-qcom: Use a custom context fault handler for sdm845 Georgi Djakov
  2023-10-19  2:19 ` [PATCH 6/6] arm64: dts: qcom: sdm845: Add DT nodes for the TBUs Georgi Djakov
  5 siblings, 0 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

Threaded IRQ handlers run in a less critical context compared to normal IRQs,
so they can perform more complex and time-consuming operations without causing
significant delays in other parts of the kernel.
During a context fault, it might be needed to do more processing and gather
debug information from TBUs in the handler. These operations may sleep, so
add an option to use a threaded IRQ handler in these cases.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 12 ++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d6d1a2a55cc0..a6561bb245ad 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -796,8 +796,16 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	else
 		context_fault = arm_smmu_context_fault;
 
-	ret = devm_request_irq(smmu->dev, irq, context_fault,
-			       IRQF_SHARED, "arm-smmu-context-fault", domain);
+	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
+		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
+						context_fault,
+						IRQF_ONESHOT | IRQF_SHARED,
+						"arm-smmu-context-fault",
+						domain);
+	else
+		ret = devm_request_irq(smmu->dev, irq, context_fault,
+				       IRQF_SHARED, "arm-smmu-context-fault", domain);
+
 	if (ret < 0) {
 		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
 			cfg->irptndx, irq);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index e5df65c0f81a..09f11f942ff8 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -439,6 +439,7 @@ struct arm_smmu_impl {
 	int (*def_domain_type)(struct device *dev);
 	irqreturn_t (*global_fault)(int irq, void *dev);
 	irqreturn_t (*context_fault)(int irq, void *dev);
+	bool context_fault_needs_threaded_irq;
 	int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
 				  struct arm_smmu_device *smmu,
 				  struct device *dev, int start);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] iommu/arm-smmu-qcom: Use a custom context fault handler for sdm845
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
                   ` (3 preceding siblings ...)
  2023-10-19  2:19 ` [PATCH 4/6] iommu/arm-smmu: Allow using a threaded handler for context interrupts Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  2023-10-19  2:19 ` [PATCH 6/6] arm64: dts: qcom: sdm845: Add DT nodes for the TBUs Georgi Djakov
  5 siblings, 0 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

The sdm845 platform now supports TBUs, so let's get additional debug
info from the TBUs when a context fault occurs. Implement a custom
context fault handler that does both software + hardware page table
walks and TLB Invalidate All.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 148 +++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index d20263a1cd2c..bcb1e2edec4e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -60,6 +60,11 @@ struct qsmmuv500_tbu {
 
 static DEFINE_SPINLOCK(ecats_lock);
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct arm_smmu_domain, domain);
+}
+
 static struct qsmmuv500_tbu *qsmmuv500_find_tbu(struct qcom_smmu *qsmmu, u32 sid)
 {
 	struct qsmmuv500_tbu *tbu = NULL;
@@ -326,6 +331,145 @@ static phys_addr_t qsmmuv500_iova_to_phys(struct arm_smmu_domain *smmu_domain,
 
 	return phys;
 }
+
+static phys_addr_t qcom_smmu_iova_to_phys_hard(struct iommu_domain *domain, dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	int idx = smmu_domain->cfg.cbndx;
+	u32 frsynra;
+	u16 sid;
+
+	frsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
+	sid = FIELD_GET(ARM_SMMU_CBFRSYNRA_SID, frsynra);
+
+	return qsmmuv500_iova_to_phys(smmu_domain, iova, sid);
+}
+
+static phys_addr_t qcom_smmu_verify_fault(struct iommu_domain *domain, dma_addr_t iova, u32 fsr)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable *iop = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	phys_addr_t phys_post_tlbiall;
+	phys_addr_t phys;
+
+	phys = qcom_smmu_iova_to_phys_hard(domain, iova);
+	io_pgtable_tlb_flush_all(iop);
+	phys_post_tlbiall = qcom_smmu_iova_to_phys_hard(domain, iova);
+
+	if (phys != phys_post_tlbiall) {
+		dev_err(smmu->dev,
+			"ATOS results differed across TLBIALL... (before: %pa after: %pa)\n",
+			&phys, &phys_post_tlbiall);
+	}
+
+	return (phys == 0 ? phys_post_tlbiall : phys);
+}
+
+static irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
+{
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	u32 fsr, fsynr, cbfrsynra, resume = 0;
+	int idx = smmu_domain->cfg.cbndx;
+	phys_addr_t phys_soft;
+	unsigned long iova;
+	int ret, tmp;
+
+	static DEFINE_RATELIMIT_STATE(_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
+	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+	if (!(fsr & ARM_SMMU_FSR_FAULT))
+		return IRQ_NONE;
+
+	fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
+	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
+	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+	phys_soft = ops->iova_to_phys(ops, iova);
+
+	tmp = report_iommu_fault(domain, NULL, iova,
+				 fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+	if (!tmp || tmp == -EBUSY) {
+		dev_dbg(smmu->dev,
+			"Context fault handled by client: iova=0x%08lx, fsr=0x%x, fsynr=0x%x, cb=%d\n",
+			iova, fsr, fsynr, idx);
+		dev_dbg(smmu->dev, "soft iova-to-phys=%pa\n", &phys_soft);
+		ret = IRQ_HANDLED;
+		resume = ARM_SMMU_RESUME_TERMINATE;
+	} else {
+		phys_addr_t phys_atos = qcom_smmu_verify_fault(domain, iova, fsr);
+
+		if (__ratelimit(&_rs)) {
+			dev_err(smmu->dev,
+				"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+				fsr, iova, fsynr, cbfrsynra, idx);
+			dev_err(smmu->dev,
+				"FSR    = %08x [%s%s%s%s%s%s%s%s%s], SID=0x%x\n",
+				fsr,
+				(fsr & 0x02) ? "TF " : "",
+				(fsr & 0x04) ? "AFF " : "",
+				(fsr & 0x08) ? "PF " : "",
+				(fsr & 0x10) ? "EF " : "",
+				(fsr & 0x20) ? "TLBMCF " : "",
+				(fsr & 0x40) ? "TLBLKF " : "",
+				(fsr & 0x80) ? "MHF " : "",
+				(fsr & 0x40000000) ? "SS " : "",
+				(fsr & 0x80000000) ? "MULTI " : "",
+				cbfrsynra);
+
+			dev_err(smmu->dev,
+				"soft iova-to-phys=%pa\n", &phys_soft);
+			if (!phys_soft)
+				dev_err(smmu->dev,
+					"SOFTWARE TABLE WALK FAILED! Looks like %s accessed an unmapped address!\n",
+					dev_name(smmu->dev));
+			if (phys_atos)
+				dev_err(smmu->dev, "hard iova-to-phys (ATOS)=%pa\n",
+					&phys_atos);
+			else
+				dev_err(smmu->dev, "hard iova-to-phys (ATOS) failed\n");
+		}
+		ret = IRQ_NONE;
+		resume = ARM_SMMU_RESUME_TERMINATE;
+	}
+
+	/*
+	 * If the client returns -EBUSY, do not clear FSR and do not RESUME
+	 * if stalled. This is required to keep the IOMMU client stalled on
+	 * the outstanding fault. This gives the client a chance to take any
+	 * debug action and then terminate the stalled transaction.
+	 * So, the sequence in case of stall on fault should be:
+	 * 1) Do not clear FSR or write to RESUME here
+	 * 2) Client takes any debug action
+	 * 3) Client terminates the stalled transaction and resumes the IOMMU
+	 * 4) Client clears FSR. The FSR should only be cleared after 3) and
+	 *    not before so that the fault remains outstanding. This ensures
+	 *    SCTLR.HUPCF has the desired effect if subsequent transactions also
+	 *    need to be terminated.
+	 */
+	if (tmp != -EBUSY) {
+		/* Clear the faulting FSR */
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+
+		/*
+		 * Barrier required to ensure that the FSR is cleared
+		 * before resuming SMMU operation
+		 */
+		wmb();
+
+		/* Retry or terminate any stalled transactions */
+		if (fsr & ARM_SMMU_FSR_SS)
+			arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, resume);
+	}
+
+	return ret;
+}
 #endif
 
 static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
@@ -727,6 +871,10 @@ static const struct arm_smmu_impl sdm845_smmu_500_impl = {
 	.reset = qcom_sdm845_smmu500_reset,
 	.write_s2cr = qcom_smmu_write_s2cr,
 	.tlb_sync = qcom_smmu_tlb_sync,
+#ifdef CONFIG_ARM_SMMU_QCOM_TBU
+	.context_fault = qcom_smmu_context_fault,
+	.context_fault_needs_threaded_irq = true,
+#endif
 };
 
 static const struct arm_smmu_impl qcom_adreno_smmu_v2_impl = {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] arm64: dts: qcom: sdm845: Add DT nodes for the TBUs
  2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
                   ` (4 preceding siblings ...)
  2023-10-19  2:19 ` [PATCH 5/6] iommu/arm-smmu-qcom: Use a custom context fault handler for sdm845 Georgi Djakov
@ 2023-10-19  2:19 ` Georgi Djakov
  5 siblings, 0 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-19  2:19 UTC (permalink / raw)
  To: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro
  Cc: devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

Add the device-tree nodes for the TBUs (translation buffer units) that
are present on the sdm845 platforms. The TBUs can be used debug the
kernel and provide additional information when a context faults occur.

Describe the all registers, clocks, interconnects and power-domain
resources that are needed for each of the TBUs.

Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 85 ++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 055ca80c0075..984770be5e08 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -15,6 +15,7 @@
 #include <dt-bindings/dma/qcom-gpi.h>
 #include <dt-bindings/firmware/qcom,scm.h>
 #include <dt-bindings/gpio/gpio.h>
+#include <dt-bindings/interconnect/qcom,icc.h>
 #include <dt-bindings/interconnect/qcom,osm-l3.h>
 #include <dt-bindings/interconnect/qcom,sdm845.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
@@ -5030,6 +5031,7 @@ pil-reloc@94c {
 		apps_smmu: iommu@15000000 {
 			compatible = "qcom,sdm845-smmu-500", "arm,mmu-500";
 			reg = <0 0x15000000 0 0x80000>;
+			ranges;
 			#iommu-cells = <2>;
 			#global-interrupts = <1>;
 			interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>,
@@ -5097,6 +5099,89 @@ apps_smmu: iommu@15000000 {
 				     <GIC_SPI 341 IRQ_TYPE_LEVEL_HIGH>,
 				     <GIC_SPI 342 IRQ_TYPE_LEVEL_HIGH>,
 				     <GIC_SPI 343 IRQ_TYPE_LEVEL_HIGH>;
+
+			#address-cells = <2>;
+			#size-cells = <2>;
+
+			anoc_1_tbu: tbu@150c5000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150c5000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+						 &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_TBU1_GDSC>;
+				qcom,stream-id-range = <0x0 0x400>;
+			};
+
+			anoc_2_tbu: tbu@150c9000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150c9000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+						 &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_TBU2_GDSC>;
+				qcom,stream-id-range = <0x400 0x400>;
+			};
+
+			mnoc_hf_0_tbu: tbu@150cd000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150cd000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&mmss_noc MASTER_MDP0 QCOM_ICC_TAG_ACTIVE_ONLY
+						 &mmss_noc SLAVE_MNOC_HF_MEM_NOC QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_MMNOC_MMU_TBU_HF0_GDSC>;
+				qcom,stream-id-range = <0x800 0x400>;
+			};
+
+			mnoc_hf_1_tbu: tbu@150d1000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150d1000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&mmss_noc MASTER_MDP0 QCOM_ICC_TAG_ACTIVE_ONLY
+						 &mmss_noc SLAVE_MNOC_HF_MEM_NOC QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_MMNOC_MMU_TBU_HF1_GDSC>;
+				qcom,stream-id-range = <0xc00 0x400>;
+			};
+
+			mnoc_sf_0_tbu: tbu@150d5000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150d5000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&mmss_noc MASTER_CAMNOC_SF QCOM_ICC_TAG_ACTIVE_ONLY
+						 &mmss_noc SLAVE_MNOC_SF_MEM_NOC QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_MMNOC_MMU_TBU_SF_GDSC>;
+				qcom,stream-id-range = <0x1000 0x400>;
+			};
+
+			compute_dsp_tbu: tbu@150d9000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150d9000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+						 &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+				qcom,stream-id-range = <0x1400 0x400>;
+			};
+
+			adsp_tbu: tbu@150dd000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150dd000 0x0 0x1000>;
+				reg-names = "base";
+				interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+						 &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_AUDIO_TBU_GDSC>;
+				qcom,stream-id-range = <0x1800 0x400>;
+			};
+
+			anoc_1_pcie_tbu: tbu@150e1000 {
+				compatible = "qcom,qsmmuv500-tbu";
+				reg = <0x0 0x150e1000 0x0 0x1000>;
+				reg-names = "base";
+				clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
+				interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+						 &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+				power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
+				qcom,stream-id-range = <0x1c00 0x400>;
+			};
 		};
 
 		lpasscc: clock-controller@17014000 {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings
  2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
@ 2023-10-19 11:12   ` Konrad Dybcio
  2023-10-24 18:42   ` Rob Herring
  1 sibling, 0 replies; 15+ messages in thread
From: Konrad Dybcio @ 2023-10-19 11:12 UTC (permalink / raw)
  To: Georgi Djakov, robh+dt, krzysztof.kozlowski+dt, conor+dt, will,
	robin.murphy, joro
  Cc: devicetree, andersson, linux-arm-kernel, iommu, linux-kernel,
	linux-arm-msm, quic_cgoldswo, quic_sukadev, quic_pdaly,
	quic_sudaraja, djakov



On 10/19/23 04:19, Georgi Djakov wrote:
> The "apps_smmu" on the Qualcomm sdm845 platform is an implementation
> of the ARM SMMU-500, that consists of a single TCU (Translation Control
> Unit) and multiple TBUs (Translation Buffer Units). The TCU is already
> being described in the ARM SMMU DT schema. Add also bindings for the
> TBUs so that we can describe their properties.
> 
> In this DT schema, the TBUs are modelled as a child devices of the TCU
> and each of them is described with it's own resources such as clocks,
> power domains, interconnects etc.
> 
> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
> ---
>   .../devicetree/bindings/iommu/arm,smmu.yaml   | 13 ++++
>   .../bindings/iommu/qcom,qsmmuv500-tbu.yaml    | 67 +++++++++++++++++++
>   2 files changed, 80 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index cf29ab10501c..afc323b4bbc5 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -230,6 +230,19 @@ properties:
>         enabled for any given device.
>       $ref: /schemas/types.yaml#/definitions/phandle
>   
> +  '#address-cells':
> +    const: 2
> +
> +  '#size-cells':
> +    const: 2
> +
> +  ranges: true
> +
> +patternProperties:
> +  "^tbu@[0-9a-f]+$":
> +    $ref: qcom,qsmmuv500-tbu.yaml
> +    description: The SMMU may include Translation Buffer Units (TBU) as subnodes
> +
>   required:
>     - compatible
>     - reg
> diff --git a/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> new file mode 100644
> index 000000000000..4baba7397e90
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/qcom,qsmmuv500-tbu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm TBU (Translation Buffer Unit)
> +
> +maintainers:
> +  - Georgi Djakov <quic_c_gdjako@quicinc.com>
> +
> +description:
> +  TBU nodes represent Translation Buffer Units in an ARM SMMU. Each TBU node
> +  should be a child node of the SMMU in the device tree.
description: refers to the hardware, so it should say what this IP
is, what it does and things like that

> +
> +properties:
> +  compatible:
> +    enum:
> +      - qcom,qsmmuv500-tbu
Should we expect this list to grow?

> +
> +  reg:
> +    items:
> +      - description: Address and size of the TBU's register space.
> +
> +  reg-names:
> +    items:
> +      - const: base
> +
> +  clocks:
> +    maxItems: 1
> +
> +  interconnects:
> +    maxItems: 1
> +
> +  power-domains:
> +    maxItems: 1
> +
> +  qcom,stream-id-range:
> +    $ref: /schemas/types.yaml#/definitions/uint32-array
> +    description: Stream ID range (address and size) that is assigned by the TBU
I believe you need to size-limit this.

If it's only supposed to be a single tuple, perhaps it could be said
explicitly.

> +
> +required:
> +  - compatible
> +  - reg
> +  - interconnects
> +  - qcom,stream-id-range
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> +    #include <dt-bindings/interconnect/qcom,sdm845.h>
> +    #include <dt-bindings/power/qcom-rpmpd.h>
> +
> +
2 newlines seems excessive

> +    tbu@150e1000 {
> +        compatible = "qcom,qsmmuv500-tbu";
> +        reg = <0x150e1000 0x1000>;
> +        reg-names = "base";
> +        clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
> +        power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
> +        interconnects = <&system_noc MASTER_GNOC_SNOC 0 &config_noc SLAVE_IMEM_CFG 0>;
> +        qcom,stream-id-range = <0x1c00 0x400>;
> +    };
I think it would be beneficial if this tbu was a child of some smmu node
like it's intended to be.

Konrad

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs
  2023-10-19  2:19 ` [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs Georgi Djakov
@ 2023-10-19 11:13   ` Konrad Dybcio
  0 siblings, 0 replies; 15+ messages in thread
From: Konrad Dybcio @ 2023-10-19 11:13 UTC (permalink / raw)
  To: Georgi Djakov, robh+dt, krzysztof.kozlowski+dt, conor+dt, will,
	robin.murphy, joro
  Cc: devicetree, andersson, linux-arm-kernel, iommu, linux-kernel,
	linux-arm-msm, quic_cgoldswo, quic_sukadev, quic_pdaly,
	quic_sudaraja, djakov



On 10/19/23 04:19, Georgi Djakov wrote:
> The ARM MMU-500 implements a Translation Buffer Unit (TBU) for each
> connected master besides a single TCU which controls and manages the
> address translations.
> 
> Allow the Qualcomm SMMU driver to probe for any TBU devices that can
> provide additional debug features like triggering transactions, logging
> outstanding transactions, snapshot capture etc. The most basic use-case
> would be to get information from the TBUs and print it during a context
> fault.
> 
> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
> ---
>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 12 ++++++++++++
>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h |  4 +++-
>   2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 7f52ac67495f..655c7f50ca84 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -1,12 +1,14 @@
>   // SPDX-License-Identifier: GPL-2.0-only
>   /*
>    * Copyright (c) 2019, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved
>    */
>   
>   #include <linux/acpi.h>
>   #include <linux/adreno-smmu-priv.h>
>   #include <linux/delay.h>
>   #include <linux/of_device.h>
> +#include <linux/of_platform.h>
>   #include <linux/firmware/qcom/qcom_scm.h>
>   
>   #include "arm-smmu.h"
> @@ -466,6 +468,16 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
>   	qsmmu->smmu.impl = impl;
>   	qsmmu->cfg = data->cfg;
>   
> +	/* Populate TBU devices if such are present in DT */
> +	if (np && of_device_is_compatible(np, "arm,mmu-500")) {
I'd say this can be unconditional.

> +		int ret;
> +
> +		INIT_LIST_HEAD(&qsmmu->tbu_list);
This list is unused.

Konrad

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver
  2023-10-19  2:19 ` [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver Georgi Djakov
@ 2023-10-21 21:05   ` Bjorn Andersson
  2023-11-18  2:26     ` Georgi Djakov
  2023-10-26 15:25   ` kernel test robot
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Andersson @ 2023-10-21 21:05 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro, devicetree, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

On Wed, Oct 18, 2023 at 07:19:20PM -0700, Georgi Djakov wrote:
> Add driver for the Qualcomm implementation of the ARM MMU-500 TBU.
> The driver will enable the resources needed by the TBU and will
> configure the registers for some debug features like checking if
> there are any pending transactions, capturing transactions and
> running ATOS (Address Translation Operations). ATOS/eCATS are used
> to manually trigger an address translation of IOVA to physical
> address by the SMMU hardware.

I still don't think this commit message clearly enough describe the
problem you're trying to solve.

Not until I had read the Kconfig help text did I pay attention to the
significance of the words "some debug features" in the middle of the
paragraph.


Please describe your changes in accordance with [1], i.e. clearly
describe the problem you're trying to solve, then discuss the technical
solution in the patch.

[1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes

[..]
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
[..]
> +#ifdef CONFIG_ARM_SMMU_QCOM_TBU
> +
> +struct qsmmuv500_tbu {
> +	struct device *dev;
> +	struct arm_smmu_device *smmu;
> +	u32 sid_range[2];
> +	struct list_head list;
> +	struct clk *clk;
> +	struct icc_path	*path;
> +	void __iomem *base;
> +	spinlock_t halt_lock; /* protects halt count */

But in particular it makes sure that multiple halt or resume can't
execute concurrently.

> +	int halt_count;
> +};
> +
> +static DEFINE_SPINLOCK(ecats_lock);
> +
> +static struct qsmmuv500_tbu *qsmmuv500_find_tbu(struct qcom_smmu *qsmmu, u32 sid)
> +{
> +	struct qsmmuv500_tbu *tbu = NULL;
> +	u32 start, end;
> +
> +	mutex_lock(&qsmmu->tbu_list_lock);
> +
> +	list_for_each_entry(tbu, &qsmmu->tbu_list, list) {
> +		start = tbu->sid_range[0];
> +		end = start + tbu->sid_range[1];
> +
> +		if (start <= sid && sid < end)
> +			break;
> +	}
> +
> +	mutex_unlock(&qsmmu->tbu_list_lock);
> +
> +	return tbu;
> +}
> +
> +static int qsmmuv500_tbu_halt(struct qsmmuv500_tbu *tbu, struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	int ret = 0, idx = smmu_domain->cfg.cbndx;
> +	unsigned long flags;
> +	u32 val, fsr, status;
> +
> +	spin_lock_irqsave(&tbu->halt_lock, flags);

Does this really need to run with interrupts disabled?

> +	if (tbu->halt_count) {
> +		tbu->halt_count++;
> +		goto out;
> +	}
> +
[..]
> +static phys_addr_t qsmmuv500_iova_to_phys(struct arm_smmu_domain *smmu_domain,
> +					  dma_addr_t iova, u32 sid)
> +{
[..]
> +	/* Only one concurrent atos operation */
> +	spin_lock_irqsave(&ecats_lock, flags);

Does this require interrupts to be disabled?

> +
> +	/*
> +	 * After a failed translation, the next successful translation will
> +	 * incorrectly be reported as a failure.

"So if the ECATS translation fails, attempt the lookup more time."

> +	 */
> +	do {
> +		phys = qsmmuv500_tbu_trigger_atos(smmu_domain, tbu, iova, sid);
> +
> +		fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> +		if (fsr & ARM_SMMU_FSR_FAULT) {
> +			/* Clear pending interrupts */
> +			arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> +			/*
> +			 * Barrier required to ensure that the FSR is cleared
> +			 * before resuming SMMU operation.
> +			 */

Better be clear on what this actually does, for future readers' sake:

	 /* Ensure that FSR and RESUME operations aren't reordered. */

But is this really necessary, the two writes are for the same device,
can they still be reordered?

> +			wmb();
> +
> +			if (fsr & ARM_SMMU_FSR_SS)
> +				arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
> +						  ARM_SMMU_RESUME_TERMINATE);
> +		}
> +	} while (!phys && needs_redo++ < 2);

"needs_redo" sounds like a boolean to me. I think "attempt" would be a
better fit here.

> +
> +	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig);
> +	spin_unlock_irqrestore(&ecats_lock, flags);
> +	qsmmuv500_tbu_resume(tbu);
> +
> +	/* Read to complete prior write transcations */
> +	readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG);
> +
> +	/* Wait for read to complete */

That's not what rmb() does. You don't need to do anything here,
readl_relaxed() returns when the read is done.

> +	rmb();
> +
> +disable_clk:
> +	clk_disable_unprepare(tbu->clk);
> +disable_icc:
> +	icc_set_bw(tbu->path, 0, 0);
> +
> +	return phys;
> +}
> +#endif
> +
>  static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>  				int sync, int status)
>  {
> @@ -588,3 +895,80 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>  
>  	return smmu;
>  }
> +
> +#ifdef CONFIG_ARM_SMMU_QCOM_TBU
> +
> +static const struct of_device_id qsmmuv500_tbu_of_match[] = {
> +	{ .compatible = "qcom,qsmmuv500-tbu" },
> +	{ }
> +};

Place this below the remove function, as most other drivers do.

> +
> +static int qsmmuv500_tbu_probe(struct platform_device *pdev)
> +{
[..]
> +	mutex_lock(&qsmmu->tbu_list_lock);
> +	list_add_tail(&tbu->list, &qsmmu->tbu_list);

"tbu" is devres allocated, but you don't pull it off the list (or
synchronize) during remove.

> +	mutex_unlock(&qsmmu->tbu_list_lock);
> +
> +	dev_set_drvdata(dev, tbu);
> +
> +	return 0;
> +}
> +
> +static void qsmmuv500_tbu_remove(struct platform_device *pdev)
> +{
> +	struct qsmmuv500_tbu *tbu = dev_get_drvdata(&pdev->dev);
> +
> +	clk_disable_unprepare(tbu->clk);

This isn't balanced.

> +	clk_put(tbu->clk);
> +	icc_put(tbu->path);
> +}
> +
> +static struct platform_driver qsmmuv500_tbu_driver = {
> +	.driver = {
> +		.name           = "qsmmuv500-tbu",
> +		.of_match_table = of_match_ptr(qsmmuv500_tbu_of_match),

Won't of_match_ptr() result in a build warning if built without
CONFIG_OF?

> +	},
> +	.probe  = qsmmuv500_tbu_probe,
> +	.remove_new = qsmmuv500_tbu_remove,
> +};
> +module_platform_driver(qsmmuv500_tbu_driver);

This file acts as a library for the arm-smmu driver today, adding a
platform_driver here makes it look like this is a separate driver.

Which makes me wonder, why is this a separate driver? Why not just
loop over the subnodes and build the tbu_list in qcom_smmu_impl_init()?

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings
  2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
  2023-10-19 11:12   ` Konrad Dybcio
@ 2023-10-24 18:42   ` Rob Herring
  2023-10-24 22:26     ` Robin Murphy
  1 sibling, 1 reply; 15+ messages in thread
From: Rob Herring @ 2023-10-24 18:42 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: krzysztof.kozlowski+dt, conor+dt, will, robin.murphy, joro,
	devicetree, andersson, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

On Wed, Oct 18, 2023 at 07:19:18PM -0700, Georgi Djakov wrote:
> The "apps_smmu" on the Qualcomm sdm845 platform is an implementation
> of the ARM SMMU-500, that consists of a single TCU (Translation Control
> Unit) and multiple TBUs (Translation Buffer Units). The TCU is already
> being described in the ARM SMMU DT schema. Add also bindings for the
> TBUs so that we can describe their properties.

Arm SMMU-500 is an implementation, too. Is QCom's a modified 
implementation or you are just the first to want to control TBU 
resources?

You need to split this into what could be any SMMU-500 implementation 
and what is truly QCom specific (i.e. modified). Unlike some licensed IP 
that's a free-for-all on DT resources, Arm IP has public specs so we 
don't have to guess.

> In this DT schema, the TBUs are modelled as a child devices of the TCU
> and each of them is described with it's own resources such as clocks,
> power domains, interconnects etc.
> 
> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
> ---
>  .../devicetree/bindings/iommu/arm,smmu.yaml   | 13 ++++
>  .../bindings/iommu/qcom,qsmmuv500-tbu.yaml    | 67 +++++++++++++++++++
>  2 files changed, 80 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index cf29ab10501c..afc323b4bbc5 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -230,6 +230,19 @@ properties:
>        enabled for any given device.
>      $ref: /schemas/types.yaml#/definitions/phandle
>  
> +  '#address-cells':
> +    const: 2
> +
> +  '#size-cells':
> +    const: 2
> +
> +  ranges: true
> +
> +patternProperties:
> +  "^tbu@[0-9a-f]+$":
> +    $ref: qcom,qsmmuv500-tbu.yaml

Generic SMMU binding includes something QCom specific. That's not right.


> +    description: The SMMU may include Translation Buffer Units (TBU) as subnodes
> +
>  required:
>    - compatible
>    - reg
> diff --git a/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> new file mode 100644
> index 000000000000..4baba7397e90
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/qcom,qsmmuv500-tbu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm TBU (Translation Buffer Unit)
> +
> +maintainers:
> +  - Georgi Djakov <quic_c_gdjako@quicinc.com>
> +
> +description:
> +  TBU nodes represent Translation Buffer Units in an ARM SMMU. Each TBU node
> +  should be a child node of the SMMU in the device tree.
> +
> +properties:
> +  compatible:
> +    enum:
> +      - qcom,qsmmuv500-tbu
> +
> +  reg:
> +    items:
> +      - description: Address and size of the TBU's register space.
> +
> +  reg-names:
> +    items:
> +      - const: base
> +
> +  clocks:
> +    maxItems: 1
> +
> +  interconnects:
> +    maxItems: 1
> +
> +  power-domains:
> +    maxItems: 1
> +
> +  qcom,stream-id-range:
> +    $ref: /schemas/types.yaml#/definitions/uint32-array
> +    description: Stream ID range (address and size) that is assigned by the TBU
> +
> +required:
> +  - compatible
> +  - reg
> +  - interconnects
> +  - qcom,stream-id-range
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> +    #include <dt-bindings/interconnect/qcom,sdm845.h>
> +    #include <dt-bindings/power/qcom-rpmpd.h>
> +
> +
> +    tbu@150e1000 {
> +        compatible = "qcom,qsmmuv500-tbu";
> +        reg = <0x150e1000 0x1000>;
> +        reg-names = "base";
> +        clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
> +        power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
> +        interconnects = <&system_noc MASTER_GNOC_SNOC 0 &config_noc SLAVE_IMEM_CFG 0>;
> +        qcom,stream-id-range = <0x1c00 0x400>;
> +    };
> +
> +...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings
  2023-10-24 18:42   ` Rob Herring
@ 2023-10-24 22:26     ` Robin Murphy
  2023-10-26 17:58       ` Georgi Djakov
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Murphy @ 2023-10-24 22:26 UTC (permalink / raw)
  To: Rob Herring, Georgi Djakov
  Cc: krzysztof.kozlowski+dt, conor+dt, will, joro, devicetree,
	andersson, konrad.dybcio, linux-arm-kernel, iommu, linux-kernel,
	linux-arm-msm, quic_cgoldswo, quic_sukadev, quic_pdaly,
	quic_sudaraja, djakov

On 2023-10-24 19:42, Rob Herring wrote:
> On Wed, Oct 18, 2023 at 07:19:18PM -0700, Georgi Djakov wrote:
>> The "apps_smmu" on the Qualcomm sdm845 platform is an implementation
>> of the ARM SMMU-500, that consists of a single TCU (Translation Control
>> Unit) and multiple TBUs (Translation Buffer Units). The TCU is already
>> being described in the ARM SMMU DT schema. Add also bindings for the
>> TBUs so that we can describe their properties.
> 
> Arm SMMU-500 is an implementation, too. Is QCom's a modified
> implementation or you are just the first to want to control TBU
> resources?

It's very very modified. The stock MMU-500 has very few 
microarchitectural registers[1], they all live within the regular SMMU 
address space, are all Secure-only by default, and don't do anything 
like the shenanigans here.

That said, looking at patch #3, I don't really understand why we need 
any of this stuff upstream... AFAICS it's doing an insane amount of work 
to use complicated imp-def debug functionality to duplicate things that 
the main driver can already do far more efficiently. Sure, in general it 
seems like it could potentially be useful stuff for bringing up and 
debugging a new driver, but the Linux SMMUv2 driver is mature and 
frankly already closer to being obsolete than to being new...

[ digression since I can't be bothered to split this discussion by 
replying separately to patch #3: ]

I mean, just looking at qsmmuv500_iova_to_phys(), you do realise that 
that's going to be called potentially multiple times by iommu-dma for 
*every* dma_sync and dma_unmap call and really wants to be fast, right? 
This brings to mind all the work I did a couple of years back[2] because 
strict TLB invalidation on unmap was considered too slow for certain 
devices on QCom platforms by ChromeOS, yet what this achieves looks like 
it could easily be up to an order of magnitude slower again :(

> You need to split this into what could be any SMMU-500 implementation
> and what is truly QCom specific (i.e. modified). Unlike some licensed IP
> that's a free-for-all on DT resources, Arm IP has public specs so we
> don't have to guess.
> 
>> In this DT schema, the TBUs are modelled as a child devices of the TCU
>> and each of them is described with it's own resources such as clocks,
>> power domains, interconnects etc.
>>
>> Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
>> ---
>>   .../devicetree/bindings/iommu/arm,smmu.yaml   | 13 ++++
>>   .../bindings/iommu/qcom,qsmmuv500-tbu.yaml    | 67 +++++++++++++++++++
>>   2 files changed, 80 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> index cf29ab10501c..afc323b4bbc5 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> @@ -230,6 +230,19 @@ properties:
>>         enabled for any given device.
>>       $ref: /schemas/types.yaml#/definitions/phandle
>>   
>> +  '#address-cells':
>> +    const: 2
>> +
>> +  '#size-cells':
>> +    const: 2
>> +
>> +  ranges: true
>> +
>> +patternProperties:
>> +  "^tbu@[0-9a-f]+$":
>> +    $ref: qcom,qsmmuv500-tbu.yaml
> 
> Generic SMMU binding includes something QCom specific. That's not right.
> 
> 
>> +    description: The SMMU may include Translation Buffer Units (TBU) as subnodes
>> +
>>   required:
>>     - compatible
>>     - reg
>> diff --git a/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
>> new file mode 100644
>> index 000000000000..4baba7397e90
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/iommu/qcom,qsmmuv500-tbu.yaml
>> @@ -0,0 +1,67 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/iommu/qcom,qsmmuv500-tbu.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Qualcomm TBU (Translation Buffer Unit)
>> +
>> +maintainers:
>> +  - Georgi Djakov <quic_c_gdjako@quicinc.com>
>> +
>> +description:
>> +  TBU nodes represent Translation Buffer Units in an ARM SMMU. Each TBU node
>> +  should be a child node of the SMMU in the device tree.
>> +
>> +properties:
>> +  compatible:
>> +    enum:
>> +      - qcom,qsmmuv500-tbu
>> +
>> +  reg:
>> +    items:
>> +      - description: Address and size of the TBU's register space.
>> +
>> +  reg-names:
>> +    items:
>> +      - const: base
>> +
>> +  clocks:
>> +    maxItems: 1
>> +
>> +  interconnects:
>> +    maxItems: 1

What does this interconnect represent? MMU-500 TBUs don't access memory 
themselves[3], they only have an internal AXI Stream interface to the 
TCU to request translations.

Thanks,
Robin.

[1] 
https://developer.arm.com/documentation/ddi0517/f/programmers-model/memory-model
[2] 
https://lore.kernel.org/all/d652966348c78457c38bf18daf369272a4ebc2c9.1628682049.git.robin.murphy@arm.com/
[3] Yeah yeah, other than the special case of TBU0 issuing pagetable 
walks on behalf of the TCU when it's not configured with its own 
dedicated PTW interface, I know :P

>> +
>> +  power-domains:
>> +    maxItems: 1
>> +
>> +  qcom,stream-id-range:
>> +    $ref: /schemas/types.yaml#/definitions/uint32-array
>> +    description: Stream ID range (address and size) that is assigned by the TBU
>> +
>> +required:
>> +  - compatible
>> +  - reg
>> +  - interconnects
>> +  - qcom,stream-id-range
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +    #include <dt-bindings/clock/qcom,gcc-sdm845.h>
>> +    #include <dt-bindings/interconnect/qcom,sdm845.h>
>> +    #include <dt-bindings/power/qcom-rpmpd.h>
>> +
>> +
>> +    tbu@150e1000 {
>> +        compatible = "qcom,qsmmuv500-tbu";
>> +        reg = <0x150e1000 0x1000>;
>> +        reg-names = "base";
>> +        clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
>> +        power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
>> +        interconnects = <&system_noc MASTER_GNOC_SNOC 0 &config_noc SLAVE_IMEM_CFG 0>;
>> +        qcom,stream-id-range = <0x1c00 0x400>;
>> +    };
>> +
>> +...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver
  2023-10-19  2:19 ` [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver Georgi Djakov
  2023-10-21 21:05   ` Bjorn Andersson
@ 2023-10-26 15:25   ` kernel test robot
  1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2023-10-26 15:25 UTC (permalink / raw)
  To: Georgi Djakov, robh+dt, krzysztof.kozlowski+dt, conor+dt, will,
	robin.murphy, joro
  Cc: oe-kbuild-all, devicetree, andersson, konrad.dybcio,
	linux-arm-kernel, iommu, linux-kernel, linux-arm-msm,
	quic_cgoldswo, quic_sukadev, quic_pdaly, quic_sudaraja, djakov

Hi Georgi,

kernel test robot noticed the following build errors:

[auto build test ERROR on robh/for-next]
[also build test ERROR on arm/for-next arm/fixes arm64/for-next/core clk/clk-next kvmarm/next rockchip/for-next shawnguo/for-next soc/for-next linus/master v6.6-rc7 next-20231025]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Georgi-Djakov/dt-bindings-iommu-Add-Translation-Buffer-Unit-bindings/20231019-102257
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
patch link:    https://lore.kernel.org/r/20231019021923.13939-4-quic_c_gdjako%40quicinc.com
patch subject: [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20231026/202310262330.pfzI76DH-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231026/202310262330.pfzI76DH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310262330.pfzI76DH-lkp@intel.com/

All errors (new ones prefixed by >>):

   aarch64-linux-ld: drivers/iommu/arm/arm-smmu/arm-smmu-qcom.o: in function `qsmmuv500_tbu_driver_init':
>> arm-smmu-qcom.c:(.init.text+0x8): multiple definition of `init_module'; drivers/iommu/arm/arm-smmu/arm-smmu.o:arm-smmu.c:(.init.text+0x8): first defined here
   aarch64-linux-ld: drivers/iommu/arm/arm-smmu/arm-smmu-qcom.o: in function `qsmmuv500_tbu_driver_exit':
>> arm-smmu-qcom.c:(.exit.text+0x0): multiple definition of `cleanup_module'; drivers/iommu/arm/arm-smmu/arm-smmu.o:arm-smmu.c:(.exit.text+0x0): first defined here

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings
  2023-10-24 22:26     ` Robin Murphy
@ 2023-10-26 17:58       ` Georgi Djakov
  0 siblings, 0 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-10-26 17:58 UTC (permalink / raw)
  To: Robin Murphy, Rob Herring
  Cc: krzysztof.kozlowski+dt, conor+dt, will, joro, devicetree,
	andersson, konrad.dybcio, linux-arm-kernel, iommu, linux-kernel,
	linux-arm-msm, quic_cgoldswo, quic_sukadev, quic_pdaly,
	quic_sudaraja, djakov

Hi Robin,

Thanks for taking a look at this!

On 10/25/2023 1:26 AM, Robin Murphy wrote:
> On 2023-10-24 19:42, Rob Herring wrote:
>> On Wed, Oct 18, 2023 at 07:19:18PM -0700, Georgi Djakov wrote:
>>> The "apps_smmu" on the Qualcomm sdm845 platform is an implementation
>>> of the ARM SMMU-500, that consists of a single TCU (Translation Control
>>> Unit) and multiple TBUs (Translation Buffer Units). The TCU is already
>>> being described in the ARM SMMU DT schema. Add also bindings for the
>>> TBUs so that we can describe their properties.
>>
>> Arm SMMU-500 is an implementation, too. Is QCom's a modified
>> implementation or you are just the first to want to control TBU
>> resources?
> 
> It's very very modified. The stock MMU-500 has very few microarchitectural registers[1], they all live within the regular SMMU address space, are all Secure-only by default, and don't do anything like the shenanigans here.
> 
> That said, looking at patch #3, I don't really understand why we need any of this stuff upstream... AFAICS it's doing an insane amount of work to use complicated imp-def debug functionality to duplicate things that the main driver can already do far more efficiently. Sure, in general it seems like it could potentially be useful stuff for bringing up and debugging a new driver, but the Linux SMMUv2 driver is mature and frankly already closer to being obsolete than to being new...

Yes, the arm-smmu driver already does similar stuff with the ATS feature, but this unfortunately isn't available in Qualcomm's implementation. Instead of that, there is this eCATS thing for debugging various issues including hardware issues. It supports many features, but here we use it just for hardware page table walks. And in the majority of cases it's expected that the software and hardware page table walks give the same result, but if there is a difference, it's sign of a problem. For example, it helped in the past to trace some power management issues of the SMMU. This of course is a debug feature and can enabled when needed.

> [ digression since I can't be bothered to split this discussion by replying separately to patch #3: ]
> 
> I mean, just looking at qsmmuv500_iova_to_phys(), you do realise that that's going to be called potentially multiple times by iommu-dma for *every* dma_sync and dma_unmap call and really wants to be fast, right? This brings to mind all the work I did a couple of years back[2] because strict TLB invalidation on unmap was considered too slow for certain devices on QCom platforms by ChromeOS, yet what this achieves looks like it could easily be up to an order of magnitude slower again :(

No, this is not going to be called for every dma_sync and dma_unmap. In patch 5 we register a custom context_fault handler that uses this code to get information from the TBUs. So all of this is executed only when a context fault occurs. Does this sound acceptable?

[..]>>> +description:
>>> +  TBU nodes represent Translation Buffer Units in an ARM SMMU. Each TBU node
>>> +  should be a child node of the SMMU in the device tree.
>>> +
>>> +properties:
>>> +  compatible:
>>> +    enum:
>>> +      - qcom,qsmmuv500-tbu
>>> +
>>> +  reg:
>>> +    items:
>>> +      - description: Address and size of the TBU's register space.
>>> +
>>> +  reg-names:
>>> +    items:
>>> +      - const: base
>>> +
>>> +  clocks:
>>> +    maxItems: 1
>>> +
>>> +  interconnects:
>>> +    maxItems: 1
> 
> What does this interconnect represent? MMU-500 TBUs don't access memory themselves[3], they only have an internal AXI Stream interface to the TCU to request translations.

It's to enable access from the CPU to the register space of the TBUs.

Thanks,
Georgi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver
  2023-10-21 21:05   ` Bjorn Andersson
@ 2023-11-18  2:26     ` Georgi Djakov
  0 siblings, 0 replies; 15+ messages in thread
From: Georgi Djakov @ 2023-11-18  2:26 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: robh+dt, krzysztof.kozlowski+dt, conor+dt, will, robin.murphy,
	joro, devicetree, konrad.dybcio, linux-arm-kernel, iommu,
	linux-kernel, linux-arm-msm, quic_cgoldswo, quic_sukadev,
	quic_pdaly, quic_sudaraja, djakov

On 10/22/2023 12:05 AM, Bjorn Andersson wrote:
> On Wed, Oct 18, 2023 at 07:19:20PM -0700, Georgi Djakov wrote:
>> Add driver for the Qualcomm implementation of the ARM MMU-500 TBU.
>> The driver will enable the resources needed by the TBU and will
>> configure the registers for some debug features like checking if
>> there are any pending transactions, capturing transactions and
>> running ATOS (Address Translation Operations). ATOS/eCATS are used
>> to manually trigger an address translation of IOVA to physical
>> address by the SMMU hardware.
> 
> I still don't think this commit message clearly enough describe the
> problem you're trying to solve.
> 
> Not until I had read the Kconfig help text did I pay attention to the
> significance of the words "some debug features" in the middle of the
> paragraph.
> 
> 
> Please describe your changes in accordance with [1], i.e. clearly
> describe the problem you're trying to solve, then discuss the technical
> solution in the patch.

Thanks Bjorn, I'll try to improve it! 

> [1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
> 
> [..]
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> [..]
>> +#ifdef CONFIG_ARM_SMMU_QCOM_TBU
>> +
>> +struct qsmmuv500_tbu {
>> +	struct device *dev;
>> +	struct arm_smmu_device *smmu;
>> +	u32 sid_range[2];
>> +	struct list_head list;
>> +	struct clk *clk;
>> +	struct icc_path	*path;
>> +	void __iomem *base;
>> +	spinlock_t halt_lock; /* protects halt count */
> 
> But in particular it makes sure that multiple halt or resume can't
> execute concurrently.

Exactly. Will mention it. 

>> +	int halt_count;
>> +};
>> +
>> +static DEFINE_SPINLOCK(ecats_lock);
>> +
>> +static struct qsmmuv500_tbu *qsmmuv500_find_tbu(struct qcom_smmu *qsmmu, u32 sid)
>> +{
>> +	struct qsmmuv500_tbu *tbu = NULL;
>> +	u32 start, end;
>> +
>> +	mutex_lock(&qsmmu->tbu_list_lock);
>> +
>> +	list_for_each_entry(tbu, &qsmmu->tbu_list, list) {
>> +		start = tbu->sid_range[0];
>> +		end = start + tbu->sid_range[1];
>> +
>> +		if (start <= sid && sid < end)
>> +			break;
>> +	}
>> +
>> +	mutex_unlock(&qsmmu->tbu_list_lock);
>> +
>> +	return tbu;
>> +}
>> +
>> +static int qsmmuv500_tbu_halt(struct qsmmuv500_tbu *tbu, struct arm_smmu_domain *smmu_domain)
>> +{
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +	int ret = 0, idx = smmu_domain->cfg.cbndx;
>> +	unsigned long flags;
>> +	u32 val, fsr, status;
>> +
>> +	spin_lock_irqsave(&tbu->halt_lock, flags);
> 
> Does this really need to run with interrupts disabled?

This is being executed in threaded irq context. 

>> +	if (tbu->halt_count) {
>> +		tbu->halt_count++;
>> +		goto out;
>> +	}
>> +
> [..]
>> +static phys_addr_t qsmmuv500_iova_to_phys(struct arm_smmu_domain *smmu_domain,
>> +					  dma_addr_t iova, u32 sid)
>> +{
> [..]
>> +	/* Only one concurrent atos operation */
>> +	spin_lock_irqsave(&ecats_lock, flags);
> 
> Does this require interrupts to be disabled?

This also runs in the irq handler during context fault.

>> +
>> +	/*
>> +	 * After a failed translation, the next successful translation will
>> +	 * incorrectly be reported as a failure.
> 
> "So if the ECATS translation fails, attempt the lookup more time."
> 
>> +	 */
>> +	do {
>> +		phys = qsmmuv500_tbu_trigger_atos(smmu_domain, tbu, iova, sid);
>> +
>> +		fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>> +		if (fsr & ARM_SMMU_FSR_FAULT) {
>> +			/* Clear pending interrupts */
>> +			arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>> +			/*
>> +			 * Barrier required to ensure that the FSR is cleared
>> +			 * before resuming SMMU operation.
>> +			 */
> 
> Better be clear on what this actually does, for future readers' sake:
> 
> 	 /* Ensure that FSR and RESUME operations aren't reordered. */
> 
> But is this really necessary, the two writes are for the same device,
> can they still be reordered?

Right, these are to the same endpoint. It can be dropped. 

>> +			wmb();
>> +
>> +			if (fsr & ARM_SMMU_FSR_SS)
>> +				arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
>> +						  ARM_SMMU_RESUME_TERMINATE);
>> +		}
>> +	} while (!phys && needs_redo++ < 2);
> 
> "needs_redo" sounds like a boolean to me. I think "attempt" would be a
> better fit here.
> 

Ok.

>> +
>> +	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig);
>> +	spin_unlock_irqrestore(&ecats_lock, flags);
>> +	qsmmuv500_tbu_resume(tbu);
>> +
>> +	/* Read to complete prior write transcations */
>> +	readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG);
>> +
>> +	/* Wait for read to complete */
> 
> That's not what rmb() does. You don't need to do anything here,
> readl_relaxed() returns when the read is done.

Ack.

>> +	rmb();
>> +
>> +disable_clk:
>> +	clk_disable_unprepare(tbu->clk);
>> +disable_icc:
>> +	icc_set_bw(tbu->path, 0, 0);
>> +
>> +	return phys;
>> +}
>> +#endif
>> +
>>  static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>>  				int sync, int status)
>>  {
>> @@ -588,3 +895,80 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>>  
>>  	return smmu;
>>  }
>> +
>> +#ifdef CONFIG_ARM_SMMU_QCOM_TBU
>> +
>> +static const struct of_device_id qsmmuv500_tbu_of_match[] = {
>> +	{ .compatible = "qcom,qsmmuv500-tbu" },
>> +	{ }
>> +};
> 
> Place this below the remove function, as most other drivers do.

Ack.

>> +
>> +static int qsmmuv500_tbu_probe(struct platform_device *pdev)
>> +{
> [..]
>> +	mutex_lock(&qsmmu->tbu_list_lock);
>> +	list_add_tail(&tbu->list, &qsmmu->tbu_list);
> 
> "tbu" is devres allocated, but you don't pull it off the list (or
> synchronize) during remove.

Right, but I'll just make this a builtin.

>> +	mutex_unlock(&qsmmu->tbu_list_lock);
>> +
>> +	dev_set_drvdata(dev, tbu);
>> +
>> +	return 0;
>> +}
>> +
>> +static void qsmmuv500_tbu_remove(struct platform_device *pdev)
>> +{
>> +	struct qsmmuv500_tbu *tbu = dev_get_drvdata(&pdev->dev);
>> +
>> +	clk_disable_unprepare(tbu->clk);
> 
> This isn't balanced.
> 
>> +	clk_put(tbu->clk);
>> +	icc_put(tbu->path);
>> +}
>> +
>> +static struct platform_driver qsmmuv500_tbu_driver = {
>> +	.driver = {
>> +		.name           = "qsmmuv500-tbu",
>> +		.of_match_table = of_match_ptr(qsmmuv500_tbu_of_match),
> 
> Won't of_match_ptr() result in a build warning if built without
> CONFIG_OF?

Will drop.

>> +	},
>> +	.probe  = qsmmuv500_tbu_probe,
>> +	.remove_new = qsmmuv500_tbu_remove,
>> +};
>> +module_platform_driver(qsmmuv500_tbu_driver);
> 
> This file acts as a library for the arm-smmu driver today, adding a
> platform_driver here makes it look like this is a separate driver.
> 
> Which makes me wonder, why is this a separate driver? Why not just
> loop over the subnodes and build the tbu_list in qcom_smmu_impl_init()?
> 

I am using the platform framework in order to get a more compact code
for this optional driver, but it adds some overhead.. I'll start with
moving all the TBU stuff into a separate file and will try meanwhile
your suggestion..

Thanks,
Georgi 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-18  2:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-19  2:19 [PATCH 0/6] Add support for Translation Buffer Units Georgi Djakov
2023-10-19  2:19 ` [PATCH 1/6] dt-bindings: iommu: Add Translation Buffer Unit bindings Georgi Djakov
2023-10-19 11:12   ` Konrad Dybcio
2023-10-24 18:42   ` Rob Herring
2023-10-24 22:26     ` Robin Murphy
2023-10-26 17:58       ` Georgi Djakov
2023-10-19  2:19 ` [PATCH 2/6] iommu/arm-smmu-qcom: Add support for TBUs Georgi Djakov
2023-10-19 11:13   ` Konrad Dybcio
2023-10-19  2:19 ` [PATCH 3/6] iommu/arm-smmu-qcom: Add Qualcomm TBU driver Georgi Djakov
2023-10-21 21:05   ` Bjorn Andersson
2023-11-18  2:26     ` Georgi Djakov
2023-10-26 15:25   ` kernel test robot
2023-10-19  2:19 ` [PATCH 4/6] iommu/arm-smmu: Allow using a threaded handler for context interrupts Georgi Djakov
2023-10-19  2:19 ` [PATCH 5/6] iommu/arm-smmu-qcom: Use a custom context fault handler for sdm845 Georgi Djakov
2023-10-19  2:19 ` [PATCH 6/6] arm64: dts: qcom: sdm845: Add DT nodes for the TBUs Georgi Djakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).