Linux ARM-MSM sub-architecture
 help / color / mirror / Atom feed
* [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs
@ 2024-08-16 17:42 Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation Bibek Kumar Patro
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

This patch series consist of six parts and covers the following:

1. Re-enable context caching for Qualcomm SoCs to retain prefetcher
   settings during reset and runtime suspend.

2. Remove cfg inside qcom_smmu structure and replace it with single
   pointer to qcom_smmu_match_data avoiding replication of multiple
   members from same.

3. Introduce intital set of driver changes to implement ACTLR register
   for custom prefetcher settings in Qualcomm SoCs.

4. Add ACTLR data and implementation operations for SM8550.

5. Add ACTLR data and implementation operations for SC7280.

6. Add support for ACTLR PRR bit setup via adreno-smmu-priv interface.

Changes in v14 from v13:
 Patch 6/6:
 - As discussed incorprate changes to carry out PRR implementation only for
   targets based on MMU-500 by using compat string based SMMU version detection.
 - Split the set_actlr interface into two separate interface namely set_prr_bit
   and set_prr_addr to set the prr enable bit and prr page address resepectively.
 Patch 3/6:
  - Fix a bug in gfx actlr_config which is uncovered while testing the gfx actlr setting in sc7280
    during PRR experiment which prevented clients on certain sids of gfx smmmu to be skipped during
    setting up of the ACTLR values : Fix involves swapping the arguments passed in smr_is_subset to make
     device smr <from devicetree> a subset of actlr_config table entries < defined in the driver>.
 Patch 4/6, 5/6:
  - Sort the actlr table values in increasing order of the SIDs.
 Link to v13:
 https://lore.kernel.org/all/20240628140435.1652374-1-quic_bibekkum@quicinc.com/

Changes in v13 from v12:
 - Fix the compilation issues reported by kernel test robot [1].
 [1] https://lore.kernel.org/all/202406281241.xEX0TWjt-lkp@intel.com/#t
 Link to v12:
 https://lore.kernel.org/all/20240626143020.3682243-1-quic_bibekkum@quicinc.com/

Changes in v12 from v11:
 Changes to incorporate suggestion from Rob:
 - Fix the set and reset logic for prr bit as pointed out in v11-6/6.
 - Rename set_actlr_bit function name to set_prr.
 - Add extension for PRR name as Partially-Resident-Region in comments
   for set_prr function.
 - Add few missing sids for sc7280 in patch-5/6.
 Link to v11:
 https://lore.kernel.org/all/20240605121713.3596499-1-quic_bibekkum@quicinc.com/

Changes in v11 from v10:
 - Include a new patch 6/6 to add support for ACTLR PRR bit
   through adreno-smmu-priv interface as suggested by Rob and Dmitry.
 Link to v10:
 https://lore.kernel.org/all/20240524131800.2288259-1-quic_bibekkum@quicinc.com/

Changes in v10 from v9:
 - Added reviewed-by tags 1/5,2/5,3/5.
 Changes incorporated:
 - Remove redundant PRR bit setting from gfx actlr table(patch 4/5,5/5)
   as this bit needs special handling in the gfx driver along with
   the associated register settings.
 Link to discussion on PRR bit:
 https://lore.kernel.org/all/f2222714-1e00-424e-946d-c314d55541b8@quicinc.com/
 Link to v9:
 https://lore.kernel.org/all/20240123144543.9405-1-quic_bibekkum@quicinc.com/

Changes in v9 from v8:
 Changes to incorporate suggestions from Konrad as follows:
 - Re-wrap struct members of actlr_variant in patch 4/5,5/5
   in a cleaner way.
 - Move actlr_config members to the header.
 Link to v8:
 https://lore.kernel.org/all/20240116150411.23876-1-quic_bibekkum@quicinc.com/

Changes in v8 from v7:
 - Added reviewed-by tags on patch 1/5, 2/5.
 Changes to incorporate suggestions from Pavan and Konrad:
 - Remove non necessary extra lines.
 - Use num_smmu and num_actlrcfg to store the array size and use the
   same to traverse the table and save on sentinel space along with
   indentation levels.
 - Refactor blocks containing qcom_smmu_set_actlr to remove block
   repetition in patch 3/5.
 - Change copyright year from 2023 to 2022-2023 in patch 3/5.
 - Modify qcom_smmu_match_data.actlrvar and actlr_variant.actlrcfg to
   const pointer to a const resource.
 - use C99 designated initializers and put the address first.
 Link to v7:
 https://lore.kernel.org/all/20240109114220.30243-1-quic_bibekkum@quicinc.com/

Changes in v7 from v6:
 Changes to incorporate suggestions from Dmitry as follows:
 - Use io_start address instead of compatible string to identify the
   correct instance by comparing with smmu start address and check for
   which smmu the corresponding actlr table is to be picked.
Link to v6:
https://lore.kernel.org/all/20231220133808.5654-1-quic_bibekkum@quicinc.com/

Changes in v6 from v5:
 - Remove extra Suggested-by tags.
 - Add return check for arm_mmu500_reset in 1/5 as discussed.
Link to v5:
https://lore.kernel.org/all/20231219135947.1623-1-quic_bibekkum@quicinc.com/

Changes in v5 from v4:
 New addition:
 - Modify copyright year in arm-smmu-qcom.h to 2023 from 2022.
 Changes to incorporate suggestions from Dmitry as follows:
 - Modify the defines for prefetch in (foo << bar) format
   as suggested.(FIELD_PREP could not be used in defines
   is not inside any block/function)
 Changes to incorporate suggestions from Konrad as follows:
 - Shift context caching enablement patch as 1/5 instead of 5/5 to
   be picked up as independent patch.
 - Fix the codestyle to orient variables in reverse xmas tree format
   for patch 1/5.
 - Fix variable name in patch 1/5 as suggested.
 Link to v4:
https://lore.kernel.org/all/20231215101827.30549-1-quic_bibekkum@quicinc.com/

Changes in v4 from v3:
 New addition:
 - Remove actlrcfg_size and use NULL end element instead to traverse
   the actlr table, as this would be a cleaner approach by removing
   redundancy of actlrcfg_size.
 - Renaming of actlr set function to arm_smmu_qcom based proprietary
   convention.
 - break from loop once sid is found and ACTLR value is initialized
   in qcom_smmu_set_actlr.
 - Modify the GFX prefetch value separating into 2 sensible defines.
 - Modify comments for prefetch defines as per SMMU-500 TRM.
 Changes to incorporate suggestions from Konrad as follows:
 - Use Reverse-Christmas-tree sorting wherever applicable.
 - Pass arguments directly to arm_smmu_set_actlr instead of creating
   duplicate variables.
 - Use array indexing instead of direct pointer addressed by new
   addition of eliminating actlrcfg_size.
 - Switch the HEX value's case from upper to lower case in SC7280
   actlrcfg table.
 Changes to incorporate suggestions from Dmitry as follows:
 - Separate changes not related to ACTLR support to different commit
   with patch 5/5.
 - Using pointer to struct for arguments in smr_is_subset().
 Changes to incorporate suggestions from Bjorn as follows:
 - fix the commit message for patch 2/5 to properly document the
   value space to avoid confusion.
 Fixed build issues reported by kernel test robot [1] for
 arm64-allyesconfig [2].
 [1]: https://lore.kernel.org/all/202312011750.Pwca3TWE-lkp@intel.com/
 [2]:
https://download.01.org/0day-ci/archive/20231201/202312011750.Pwca3TWE-lkp@intel.com/config
 Link to v3:
https://lore.kernel.org/all/20231127145412.3981-1-quic_bibekkum@quicinc.com/

Changes in v3 from v2:
 New addition:
 - Include patch 3/4 for adding ACTLR support and data for SC7280.
 - Add driver changes for actlr support in gpu smmu.
 - Add target wise actlr data and implementation ops for gpu smmu.
 Changes to incorporate suggestions from Robin as follows:
 - Match the ACTLR values with individual corresponding SID instead
   of assuming that any SMR will be programmed to match a superset of
   the data.
 - Instead of replicating each elements from qcom_smmu_match_data to
   qcom_smmu structre during smmu device creation, replace the
   replicated members with qcom_smmu_match_data structure inside
   qcom_smmu structre and handle the dereference in places that
   requires them.
 Changes to incorporate suggestions from Dmitry and Konrad as follows:
 - Maintain actlr table inside a single structure instead of
   nested structure.
 - Rename prefetch defines to more appropriately describe their
   behavior.
 - Remove SM8550 specific implementation ops and roll back to default
   qcom_smmu_500_impl implementation ops.
 - Add back the removed comments which are NAK.
 - Fix commit description for patch 4/4.
 Link to v2:
https://lore.kernel.org/all/20231114135654.30475-1-quic_bibekkum@quicinc.com/

Changes in v2 from v1:
 - Incorporated suggestions on v1 from Dmitry,Konrad,Pratyush.
 - Added defines for ACTLR values.
 - Linked sm8550 implementation structure to corresponding
   compatible string.
 - Repackaged actlr value set implementation to separate function.
 - Fixed indentation errors.
 - Link to v1:
https://lore.kernel.org/all/20231103215124.1095-1-quic_bibekkum@quicinc.com/

Changes in v1 from RFC:
 - Incorporated suggestion form Robin on RFC
 - Moved the actlr data table into driver, instead of maintaining
   it inside soc specific DT and piggybacking on exisiting iommus
   property (iommu = <SID, MASK, ACTLR>) to set this value during
   smmu probe.
 - Link to RFC:
https://lore.kernel.org/all/a01e7e60-6ead-4a9e-ba90-22a8a6bbd03f@quicinc.com/

Bibek Kumar Patro (6):
  iommu/arm-smmu: re-enable context caching in smmu reset operation
  iommu/arm-smmu: refactor qcom_smmu structure to include single pointer
  iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings
  iommu/arm-smmu: add ACTLR data and support for SM8550
  iommu/arm-smmu: add ACTLR data and support for SC7280
  iommu/arm-smmu: add support for PRR bit setup

 .../iommu/arm/arm-smmu/arm-smmu-qcom-debug.c  |   2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c    | 284 +++++++++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h    |  18 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |   7 +
 include/linux/adreno-smmu-priv.h              |  10 +-
 6 files changed, 315 insertions(+), 11 deletions(-)

--
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-30 12:53   ` Robin Murphy
  2024-08-16 17:42 ` [PATCH v14 2/6] iommu/arm-smmu: refactor qcom_smmu structure to include single pointer Bibek Kumar Patro
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

Default MMU-500 reset operation disables context caching in
prefetch buffer. It is however expected for context banks using
the ACTLR register to retain their prefetch value during reset
and runtime suspend.

Replace default MMU-500 reset operation with Qualcomm specific reset
operation which envelope the default reset operation and re-enables
context caching in prefetch buffer for Qualcomm SoCs.

Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 36 ++++++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 36c6b36ad4ff..8ac1850b852f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -16,6 +16,16 @@

 #define QCOM_DUMMY_VAL	-1

+/*
+ * SMMU-500 TRM defines BIT(0) as CMTLB (Enable context caching in the
+ * macro TLB) and BIT(1) as CPRE (Enable context caching in the prefetch
+ * buffer). The remaining bits are implementation defined and vary across
+ * SoCs.
+ */
+
+#define CPRE			(1 << 1)
+#define CMTLB			(1 << 0)
+
 static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
 {
 	return container_of(smmu, struct qcom_smmu, smmu);
@@ -381,11 +391,31 @@ static int qcom_smmu_def_domain_type(struct device *dev)
 	return match ? IOMMU_DOMAIN_IDENTITY : 0;
 }

+static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
+{
+	int ret;
+	u32 val;
+	int i;
+
+	ret = arm_mmu500_reset(smmu);
+	if (ret)
+		return ret;
+
+	/* arm_mmu500_reset() disables CPRE which is re-enabled here */
+	for (i = 0; i < smmu->num_context_banks; ++i) {
+		val = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
+		val |= CPRE;
+		arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, val);
+	}
+
+	return 0;
+}
+
 static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
 {
 	int ret;

-	arm_mmu500_reset(smmu);
+	qcom_smmu500_reset(smmu);

 	/*
 	 * To address performance degradation in non-real time clients,
@@ -412,7 +442,7 @@ static const struct arm_smmu_impl qcom_smmu_500_impl = {
 	.init_context = qcom_smmu_init_context,
 	.cfg_probe = qcom_smmu_cfg_probe,
 	.def_domain_type = qcom_smmu_def_domain_type,
-	.reset = arm_mmu500_reset,
+	.reset = qcom_smmu500_reset,
 	.write_s2cr = qcom_smmu_write_s2cr,
 	.tlb_sync = qcom_smmu_tlb_sync,
 #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG
@@ -445,7 +475,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_v2_impl = {
 static const struct arm_smmu_impl qcom_adreno_smmu_500_impl = {
 	.init_context = qcom_adreno_smmu_init_context,
 	.def_domain_type = qcom_smmu_def_domain_type,
-	.reset = arm_mmu500_reset,
+	.reset = qcom_smmu500_reset,
 	.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
 	.write_sctlr = qcom_adreno_smmu_write_sctlr,
 	.tlb_sync = qcom_smmu_tlb_sync,
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v14 2/6] iommu/arm-smmu: refactor qcom_smmu structure to include single pointer
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 3/6] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings Bibek Kumar Patro
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

qcom_smmu_match_data is static and constant so refactor qcom_smmu
to store single pointer to qcom_smmu_match_data instead of
replicating multiple child members of the same and handle the further
dereferences in the places that want them.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c       | 2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
index 548783f3f8e8..d03b2239baad 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
@@ -73,7 +73,7 @@ void qcom_smmu_tlb_sync_debug(struct arm_smmu_device *smmu)
 	if (__ratelimit(&rs)) {
 		dev_err(smmu->dev, "TLB sync timed out -- SMMU may be deadlocked\n");

-		cfg = qsmmu->cfg;
+		cfg = qsmmu->data->cfg;
 		if (!cfg)
 			return;

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 8ac1850b852f..90e831f1c0ec 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -509,7 +509,7 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
 		return ERR_PTR(-ENOMEM);

 	qsmmu->smmu.impl = impl;
-	qsmmu->cfg = data->cfg;
+	qsmmu->data = data;

 	return &qsmmu->smmu;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
index 3c134d1a6277..b55cd3e3ae48 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
@@ -8,7 +8,7 @@

 struct qcom_smmu {
 	struct arm_smmu_device smmu;
-	const struct qcom_smmu_config *cfg;
+	const struct qcom_smmu_match_data *data;
 	bool bypass_quirk;
 	u8 bypass_cbndx;
 	u32 stall_enabled;
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v14 3/6] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 2/6] iommu/arm-smmu: refactor qcom_smmu structure to include single pointer Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 4/6] iommu/arm-smmu: add ACTLR data and support for SM8550 Bibek Kumar Patro
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

Currently in Qualcomm  SoCs the default prefetch is set to 1 which allows
the TLB to fetch just the next page table. MMU-500 features ACTLR
register which is implementation defined and is used for Qualcomm SoCs
to have a custom prefetch setting enabling TLB to prefetch the next set
of page tables accordingly allowing for faster translations.

ACTLR value is unique for each SMR (Stream matching register) and stored
in a pre-populated table. This value is set to the register during
context bank initialisation.

Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 62 ++++++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h | 16 +++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      |  5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  5 ++
 4 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 90e831f1c0ec..ebbbbfe4e0bd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -217,10 +217,43 @@ static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
 	return true;
 }

+static void qcom_smmu_set_actlr(struct device *dev, struct arm_smmu_device *smmu, int cbndx,
+		const struct actlr_config *actlrcfg, const size_t num_actlrcfg)
+{
+	struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct arm_smmu_smr actlrcfg_smr;
+	u16 mask;
+	int idx;
+	u16 id;
+	int i;
+	int j;
+
+	for (i = 0; i < num_actlrcfg; i++) {
+		actlrcfg_smr.id = actlrcfg[i].sid;
+		actlrcfg_smr.mask = actlrcfg[i].mask;
+
+		for_each_cfg_sme(cfg, fwspec, j, idx) {
+			id = smmu->smrs[idx].id;
+			mask = smmu->smrs[idx].mask;
+			if (smr_is_subset(&actlrcfg_smr, id, mask)) {
+				arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR,
+						actlrcfg[i].actlr);
+				break;
+			}
+		}
+	}
+}
+
 static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+	const struct actlr_variant *actlrvar;
+	int cbndx = smmu_domain->cfg.cbndx;
 	struct adreno_smmu_priv *priv;
+	int i;

 	smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;

@@ -250,6 +283,18 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->set_stall = qcom_adreno_smmu_set_stall;
 	priv->resume_translation = qcom_adreno_smmu_resume_translation;

+	actlrvar = qsmmu->data->actlrvar;
+	if (!actlrvar)
+		return 0;
+
+	for (i = 0; i < qsmmu->data->num_smmu ; i++) {
+		if (actlrvar[i].io_start == smmu->ioaddr) {
+			qcom_smmu_set_actlr(dev, smmu, cbndx, actlrvar[i].actlrcfg,
+				       actlrvar[i].num_actlrcfg);
+			break;
+		}
+	}
+
 	return 0;
 }

@@ -279,7 +324,24 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+	const struct actlr_variant *actlrvar;
+	int cbndx = smmu_domain->cfg.cbndx;
+	int i;
+
 	smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
+	actlrvar = qsmmu->data->actlrvar;
+	if (!actlrvar)
+		return 0;
+
+	for (i = 0; i < qsmmu->data->num_smmu ; i++) {
+		if (actlrvar[i].io_start == smmu->ioaddr) {
+			qcom_smmu_set_actlr(dev, smmu, cbndx, actlrvar[i].actlrcfg,
+				       actlrvar[i].num_actlrcfg);
+			break;
+		}
+	}

 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
index b55cd3e3ae48..5f2dc90ca879 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (c) 2022, Qualcomm Innovation Center, Inc. All rights reserved.
+ * Copyright (c) 2022-2023, Qualcomm Innovation Center, Inc. All rights reserved.
  */

 #ifndef _ARM_SMMU_QCOM_H
@@ -24,8 +24,22 @@ struct qcom_smmu_config {
 	const u32 *reg_offset;
 };

+struct actlr_config {
+	u16 sid;
+	u16 mask;
+	u32 actlr;
+};
+
+struct actlr_variant {
+	const resource_size_t io_start;
+	const struct actlr_config * const actlrcfg;
+	const size_t num_actlrcfg;
+};
+
 struct qcom_smmu_match_data {
+	const struct actlr_variant * const actlrvar;
 	const struct qcom_smmu_config *cfg;
+	const size_t num_smmu;
 	const struct arm_smmu_impl *impl;
 	const struct arm_smmu_impl *adreno_impl;
 };
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 723273440c21..82d2a95cc259 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1042,9 +1042,10 @@ static int arm_smmu_find_sme(struct arm_smmu_device *smmu, u16 id, u16 mask)
 		 * expect simply identical entries for this case, but there's
 		 * no harm in accommodating the generalisation.
 		 */
-		if ((mask & smrs[i].mask) == mask &&
-		    !((id ^ smrs[i].id) & ~smrs[i].mask))
+
+		if (smr_is_subset(&smrs[i], id, mask))
 			return i;
+
 		/*
 		 * If the new entry has any other overlap with an existing one,
 		 * though, then there always exists at least one stream ID
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index e2aeb511ae90..172f1b56b879 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -511,6 +511,11 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 		writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
 }

+static inline bool smr_is_subset(struct arm_smmu_smr *smrs, u16 id, u16 mask)
+{
+	return (mask & smrs->mask) == mask && !((id ^ smrs->id) & ~smrs->mask);
+}
+
 #define ARM_SMMU_GR0		0
 #define ARM_SMMU_GR1		1
 #define ARM_SMMU_CB(s, n)	((s)->numpage + (n))
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v14 4/6] iommu/arm-smmu: add ACTLR data and support for SM8550
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
                   ` (2 preceding siblings ...)
  2024-08-16 17:42 ` [PATCH v14 3/6] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280 Bibek Kumar Patro
  2024-08-16 17:42 ` [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup Bibek Kumar Patro
  5 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

Add ACTLR data table for SM8550 along with support for
same including SM8550 specific implementation operations.

Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 89 ++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index ebbbbfe4e0bd..dc143b250704 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -25,6 +25,85 @@

 #define CPRE			(1 << 1)
 #define CMTLB			(1 << 0)
+#define PREFETCH_SHIFT		8
+#define PREFETCH_DEFAULT	0
+#define PREFETCH_SHALLOW	(1 << PREFETCH_SHIFT)
+#define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
+#define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
+
+static const struct actlr_config sm8550_apps_actlr_cfg[] = {
+	{ 0x0800, 0x0020, PREFETCH_DEFAULT | CMTLB },
+	{ 0x1800, 0x00c0, PREFETCH_DEFAULT | CMTLB },
+	{ 0x1820, 0x0000, PREFETCH_DEFAULT | CMTLB },
+	{ 0x1860, 0x0000, PREFETCH_DEFAULT | CMTLB },
+	{ 0x18a0, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x18e0, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x0c01, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c02, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c03, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c04, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c05, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c06, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c07, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c08, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c09, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c0c, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c0d, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c0e, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x0c0f, 0x0020, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1920, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1923, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1924, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1940, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1941, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1943, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1944, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1947, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1961, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1962, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1963, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1964, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1965, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1966, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1967, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1968, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1969, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x196c, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x196d, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x196e, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x196f, 0x0000, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c1, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c2, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c3, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c4, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c5, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c6, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c7, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c8, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19c9, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19cc, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19cd, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19ce, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x19cf, 0x0010, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1c00, 0x0002, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1c01, 0x0000, PREFETCH_DEFAULT | CMTLB },
+};
+
+static const struct actlr_config sm8550_gfx_actlr_cfg[] = {
+	{ 0x0000, 0x03ff, PREFETCH_DEEP | CPRE | CMTLB },
+};
+
+static const struct actlr_variant sm8550_actlr[] = {
+	{
+		.io_start = 0x15000000,
+		.actlrcfg = sm8550_apps_actlr_cfg,
+		.num_actlrcfg = ARRAY_SIZE(sm8550_apps_actlr_cfg)
+	}, {
+		.io_start = 0x03da0000,
+		.actlrcfg = sm8550_gfx_actlr_cfg,
+		.num_actlrcfg = ARRAY_SIZE(sm8550_gfx_actlr_cfg)
+	},
+};

 static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
 {
@@ -610,6 +689,15 @@ static const struct qcom_smmu_match_data sdm845_smmu_500_data = {
 	/* Also no debug configuration. */
 };

+
+static const struct qcom_smmu_match_data sm8550_smmu_500_impl0_data = {
+	.impl = &qcom_smmu_500_impl,
+	.adreno_impl = &qcom_adreno_smmu_500_impl,
+	.cfg = &qcom_smmu_impl0_cfg,
+	.actlrvar = sm8550_actlr,
+	.num_smmu = ARRAY_SIZE(sm8550_actlr),
+};
+
 static const struct qcom_smmu_match_data qcom_smmu_500_impl0_data = {
 	.impl = &qcom_smmu_500_impl,
 	.adreno_impl = &qcom_adreno_smmu_500_impl,
@@ -644,6 +732,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
 	{ .compatible = "qcom,sm8250-smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ .compatible = "qcom,sm8350-smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ .compatible = "qcom,sm8450-smmu-500", .data = &qcom_smmu_500_impl0_data },
+	{ .compatible = "qcom,sm8550-smmu-500", .data = &sm8550_smmu_500_impl0_data },
 	{ .compatible = "qcom,smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ }
 };
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
                   ` (3 preceding siblings ...)
  2024-08-16 17:42 ` [PATCH v14 4/6] iommu/arm-smmu: add ACTLR data and support for SM8550 Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-23 15:59   ` Will Deacon
  2024-08-16 17:42 ` [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup Bibek Kumar Patro
  5 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

Add ACTLR data table for SC7280 along with support for
same including SC7280 specific implementation operations.

Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 +++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index dc143b250704..a776c7906c76 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -31,6 +31,55 @@
 #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
 #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)

+static const struct actlr_config sc7280_apps_actlr_cfg[] = {
+	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
+	{ 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
+	{ 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
+	{ 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
+	{ 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
+	{ 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
+};
+
+static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
+	{ 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
+};
+
+static const struct actlr_variant sc7280_actlr[] = {
+	{
+		.io_start = 0x15000000,
+		.actlrcfg = sc7280_apps_actlr_cfg,
+		.num_actlrcfg = ARRAY_SIZE(sc7280_apps_actlr_cfg)
+	}, {
+		.io_start = 0x03da0000,
+		.actlrcfg = sc7280_gfx_actlr_cfg,
+		.num_actlrcfg = ARRAY_SIZE(sc7280_gfx_actlr_cfg)
+	},
+};
+
 static const struct actlr_config sm8550_apps_actlr_cfg[] = {
 	{ 0x0800, 0x0020, PREFETCH_DEFAULT | CMTLB },
 	{ 0x1800, 0x00c0, PREFETCH_DEFAULT | CMTLB },
@@ -689,6 +738,13 @@ static const struct qcom_smmu_match_data sdm845_smmu_500_data = {
 	/* Also no debug configuration. */
 };

+static const struct qcom_smmu_match_data sc7280_smmu_500_impl0_data = {
+	.impl = &qcom_smmu_500_impl,
+	.adreno_impl = &qcom_adreno_smmu_500_impl,
+	.cfg = &qcom_smmu_impl0_cfg,
+	.actlrvar = sc7280_actlr,
+	.num_smmu = ARRAY_SIZE(sc7280_actlr),
+};

 static const struct qcom_smmu_match_data sm8550_smmu_500_impl0_data = {
 	.impl = &qcom_smmu_500_impl,
@@ -715,7 +771,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
 	{ .compatible = "qcom,qdu1000-smmu-500", .data = &qcom_smmu_500_impl0_data  },
 	{ .compatible = "qcom,sc7180-smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ .compatible = "qcom,sc7180-smmu-v2", .data = &qcom_smmu_v2_data },
-	{ .compatible = "qcom,sc7280-smmu-500", .data = &qcom_smmu_500_impl0_data },
+	{ .compatible = "qcom,sc7280-smmu-500", .data = &sc7280_smmu_500_impl0_data },
 	{ .compatible = "qcom,sc8180x-smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ .compatible = "qcom,sc8280xp-smmu-500", .data = &qcom_smmu_500_impl0_data },
 	{ .compatible = "qcom,sdm630-smmu-v2", .data = &qcom_smmu_v2_data },
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup
  2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
                   ` (4 preceding siblings ...)
  2024-08-16 17:42 ` [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280 Bibek Kumar Patro
@ 2024-08-16 17:42 ` Bibek Kumar Patro
  2024-08-22 22:37   ` kernel test robot
  5 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-16 17:42 UTC (permalink / raw)
  To: robdclark, will, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel,
	Bibek Kumar Patro

Add an adreno-smmu-priv interface for drm/msm to call
into arm-smmu-qcom and initiate the PRR bit setup or reset
sequence as per request.

This will be used by GPU to setup the PRR bit and related
configuration registers through adreno-smmu private
interface instead of directly poking the smmu hardware.

Suggested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 37 ++++++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 ++
 include/linux/adreno-smmu-priv.h           | 10 +++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index a776c7906c76..f62e20d472d3 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -30,6 +30,7 @@
 #define PREFETCH_SHALLOW	(1 << PREFETCH_SHIFT)
 #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
 #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
+#define GFX_ACTLR_PRR		(1 << 5)

 static const struct actlr_config sc7280_apps_actlr_cfg[] = {
 	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
@@ -237,6 +238,40 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
 	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
 }

+static void qcom_adreno_smmu_set_prr_bit(const void *cookie, bool set)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	const struct device_node *np = smmu->dev->of_node;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	u32 reg = 0;
+
+	if (of_device_is_compatible(np, "qcom,smmu-500") &&
+			of_device_is_compatible(np, "qcom,adreno-smmu")) {
+		reg =  arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_ACTLR);
+		reg &= ~GFX_ACTLR_PRR;
+		if (set)
+			reg |= FIELD_PREP(GFX_ACTLR_PRR, 1);
+		arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_ACTLR, reg);
+	}
+}
+
+static void qcom_adreno_smmu_set_prr_addr(const void *cookie, phys_addr_t page_addr)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	const struct device_node *np = smmu->dev->of_node;
+
+	if (of_device_is_compatible(np, "qcom,smmu-500") &&
+			of_device_is_compatible(np, "qcom,adreno-smmu")) {
+		writel_relaxed(lower_32_bits(page_addr),
+					(void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_LADDR);
+
+		writel_relaxed(upper_32_bits(page_addr),
+					(void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_UADDR);
+	}
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0

 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -410,6 +445,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 	priv->set_stall = qcom_adreno_smmu_set_stall;
 	priv->resume_translation = qcom_adreno_smmu_resume_translation;
+	priv->set_prr_bit = qcom_adreno_smmu_set_prr_bit;
+	priv->set_prr_addr = qcom_adreno_smmu_set_prr_addr;

 	actlrvar = qsmmu->data->actlrvar;
 	if (!actlrvar)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 172f1b56b879..b3f5ff1ebb8d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -154,6 +154,8 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_SCTLR_M		BIT(0)

 #define ARM_SMMU_CB_ACTLR		0x4
+#define ARM_SMMU_GFX_PRR_CFG_LADDR	0x6008
+#define ARM_SMMU_GFX_PRR_CFG_UADDR	0x600C

 #define ARM_SMMU_CB_RESUME		0x8
 #define ARM_SMMU_RESUME_TERMINATE	BIT(0)
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index c637e0997f6d..03466eb16933 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -49,7 +49,13 @@ struct adreno_smmu_fault_info {
  *                 before set_ttbr0_cfg().  If stalling on fault is enabled,
  *                 the GPU driver must call resume_translation()
  * @resume_translation: Resume translation after a fault
- *
+ * @set_prr_bit:   Extendible interface to be used by GPU to modify the
+ *		   ACTLR register bits, currently used to configure
+ *		   Partially-Resident-Region (PRR) bit for feature's
+ *		   setup and reset sequence as requested.
+ * @set_prr_addr:  Configure the PRR_CFG_*ADDR register with the
+ *		   physical address of PRR page passed from
+ *		   GPU driver.
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -67,6 +73,8 @@ struct adreno_smmu_priv {
     void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
     void (*set_stall)(const void *cookie, bool enabled);
     void (*resume_translation)(const void *cookie, bool terminate);
+    void (*set_prr_bit)(const void *cookie, bool set);
+    void (*set_prr_addr)(const void *cookie, phys_addr_t page_addr);
 };

 #endif /* __ADRENO_SMMU_PRIV_H */
--
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup
  2024-08-16 17:42 ` [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup Bibek Kumar Patro
@ 2024-08-22 22:37   ` kernel test robot
  2024-08-26  9:10     ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: kernel test robot @ 2024-08-22 22:37 UTC (permalink / raw)
  To: Bibek Kumar Patro, robdclark, will, robin.murphy, joro, jgg,
	jsnitsel, robh, krzysztof.kozlowski, quic_c_gdjako,
	dmitry.baryshkov, konrad.dybcio
  Cc: oe-kbuild-all, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel, Bibek Kumar Patro

Hi Bibek,

kernel test robot noticed the following build warnings:

[auto build test WARNING on joro-iommu/next]
[also build test WARNING on linus/master v6.11-rc4 next-20240822]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bibek-Kumar-Patro/iommu-arm-smmu-re-enable-context-caching-in-smmu-reset-operation/20240817-014609
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
patch link:    https://lore.kernel.org/r/20240816174259.2056829-7-quic_bibekkum%40quicinc.com
patch subject: [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup
config: arm-randconfig-r071-20240823 (https://download.01.org/0day-ci/archive/20240823/202408230612.1DU9cuSx-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240823/202408230612.1DU9cuSx-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408230612.1DU9cuSx-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/scatterlist.h:9,
                    from include/linux/iommu.h:10,
                    from include/linux/io-pgtable.h:6,
                    from include/linux/adreno-smmu-priv.h:9,
                    from drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:7:
   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c: In function 'qcom_adreno_smmu_set_prr_addr':
>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:266:41: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     266 |                                         (void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_LADDR);
         |                                         ^
   arch/arm/include/asm/io.h:282:75: note: in definition of macro 'writel_relaxed'
     282 | #define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)
         |                                                                           ^
   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:269:41: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     269 |                                         (void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_UADDR);
         |                                         ^
   arch/arm/include/asm/io.h:282:75: note: in definition of macro 'writel_relaxed'
     282 | #define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)
         |                                                                           ^


vim +266 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c

   256	
   257	static void qcom_adreno_smmu_set_prr_addr(const void *cookie, phys_addr_t page_addr)
   258	{
   259		struct arm_smmu_domain *smmu_domain = (void *)cookie;
   260		struct arm_smmu_device *smmu = smmu_domain->smmu;
   261		const struct device_node *np = smmu->dev->of_node;
   262	
   263		if (of_device_is_compatible(np, "qcom,smmu-500") &&
   264				of_device_is_compatible(np, "qcom,adreno-smmu")) {
   265			writel_relaxed(lower_32_bits(page_addr),
 > 266						(void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_LADDR);
   267	
   268			writel_relaxed(upper_32_bits(page_addr),
   269						(void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_UADDR);
   270		}
   271	}
   272	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-16 17:42 ` [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280 Bibek Kumar Patro
@ 2024-08-23 15:59   ` Will Deacon
  2024-08-26 11:03     ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2024-08-23 15:59 UTC (permalink / raw)
  To: Bibek Kumar Patro
  Cc: robdclark, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel

On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
> Add ACTLR data table for SC7280 along with support for
> same including SC7280 specific implementation operations.
> 
> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 +++++++++++++++++++++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index dc143b250704..a776c7906c76 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -31,6 +31,55 @@
>  #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
>  #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
> 
> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
> +	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> +	{ 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
> +	{ 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> +	{ 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> +};
> +
> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
> +	{ 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
> +};

It's Will "stuck record" Deacon here again to say that I don't think
this data belongs in the driver.

Have a great weekend!

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup
  2024-08-22 22:37   ` kernel test robot
@ 2024-08-26  9:10     ` Bibek Kumar Patro
  0 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-26  9:10 UTC (permalink / raw)
  To: kernel test robot, robdclark, will, robin.murphy, joro, jgg,
	jsnitsel, robh, krzysztof.kozlowski, quic_c_gdjako,
	dmitry.baryshkov, konrad.dybcio
  Cc: oe-kbuild-all, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel



On 8/23/2024 4:07 AM, kernel test robot wrote:
>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:266:41: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>       266 |                                         (void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_LADDR);
>           |                                         ^
>     arch/arm/include/asm/io.h:282:75: note: in definition of macro 'writel_relaxed'
>       282 | #define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)
>           |                                                                           ^
>     drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:269:41: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>       269 |                                         (void *)smmu->ioaddr + ARM_SMMU_GFX_PRR_CFG_UADDR);
>           |                                         ^
>     arch/arm/include/asm/io.h:282:75: note: in definition of macro 'writel_relaxed'
>       282 | #define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)

This is fixed already in v13, but got missed in v14.
I'll take care of this in the next version.

regards,
Bibek

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-23 15:59   ` Will Deacon
@ 2024-08-26 11:03     ` Bibek Kumar Patro
  2024-08-27 12:47       ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-26 11:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: robdclark, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel



On 8/23/2024 9:29 PM, Will Deacon wrote:
> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
>> Add ACTLR data table for SC7280 along with support for
>> same including SC7280 specific implementation operations.
>>
>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>> ---
>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 +++++++++++++++++++++-
>>   1 file changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> index dc143b250704..a776c7906c76 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> @@ -31,6 +31,55 @@
>>   #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
>>   #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
>>
>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
>> +	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>> +	{ 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
>> +	{ 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +	{ 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>> +};
>> +
>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
>> +	{ 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
>> +};
> 
> It's Will "stuck record" Deacon here again to say that I don't think
> this data belongs in the driver.
> 

Hi Will,

It will be difficult to reach a consensus here, with Robin and the DT 
folks okay to keep it in the driver, while you believe it doesn't belong 
there.

Robin, Rob, could you please share your thoughts on concluding the 
placement of this prefetch data?

As discussed earlier [1], the prefetch value for each client doesn’t 
define the hardware topology and is implementation-defined register 
writes used by the software driver.

We're okay with either approach, but these points [2] also raised in the
RFC led us to believe that switching from DT to the driver is the
right approach.


[1]:https://lore.kernel.org/all/2b0d8c5b-7e79-41ff-bc57-003d1b947c16@quicinc.com/
[2]:https://lore.kernel.org/all/a01e7e60-6ead-4a9e-ba90-22a8a6bbd03f@quicinc.com/


> Have a great weekend!
> 
> Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-26 11:03     ` Bibek Kumar Patro
@ 2024-08-27 12:47       ` Will Deacon
  2024-08-30 10:00         ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2024-08-27 12:47 UTC (permalink / raw)
  To: Bibek Kumar Patro
  Cc: robdclark, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel

On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
> 
> 
> On 8/23/2024 9:29 PM, Will Deacon wrote:
> > On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
> > > Add ACTLR data table for SC7280 along with support for
> > > same including SC7280 specific implementation operations.
> > > 
> > > Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
> > > ---
> > >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 +++++++++++++++++++++-
> > >   1 file changed, 57 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > index dc143b250704..a776c7906c76 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > @@ -31,6 +31,55 @@
> > >   #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
> > >   #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
> > > 
> > > +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
> > > +	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> > > +	{ 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
> > > +	{ 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +	{ 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> > > +};
> > > +
> > > +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
> > > +	{ 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
> > > +};
> > 
> > It's Will "stuck record" Deacon here again to say that I don't think
> > this data belongs in the driver.
> > 
> 
> Hi Will,
> 
> It will be difficult to reach a consensus here, with Robin and the DT folks
> okay to keep it in the driver, while you believe it doesn't belong there.
> 
> Robin, Rob, could you please share your thoughts on concluding the placement
> of this prefetch data?
> 
> As discussed earlier [1], the prefetch value for each client doesn’t define
> the hardware topology and is implementation-defined register writes used by
> the software driver.

It does reflect the hardware topology though, doesn't it? Those magic hex
masks above refer to stream ids, so the table is hard-coding the prefetch
values for particular matches. If I run on a different SoC configuration
with the same table, then the prefetch settings will be applied to the
wrong devices. How is that not hardware topology?

WIll

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-27 12:47       ` Will Deacon
@ 2024-08-30 10:00         ` Bibek Kumar Patro
  2024-08-30 12:31           ` Robin Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-08-30 10:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: robdclark, robin.murphy, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel



On 8/27/2024 6:17 PM, Will Deacon wrote:
> On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
>>
>>
>> On 8/23/2024 9:29 PM, Will Deacon wrote:
>>> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
>>>> Add ACTLR data table for SC7280 along with support for
>>>> same including SC7280 specific implementation operations.
>>>>
>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 +++++++++++++++++++++-
>>>>    1 file changed, 57 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> index dc143b250704..a776c7906c76 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> @@ -31,6 +31,55 @@
>>>>    #define PREFETCH_MODERATE	(2 << PREFETCH_SHIFT)
>>>>    #define PREFETCH_DEEP		(3 << PREFETCH_SHIFT)
>>>>
>>>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
>>>> +	{ 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +	{ 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>> +	{ 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +	{ 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>> +};
>>>> +
>>>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
>>>> +	{ 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
>>>> +};
>>>
>>> It's Will "stuck record" Deacon here again to say that I don't think
>>> this data belongs in the driver.
>>>
>>
>> Hi Will,
>>
>> It will be difficult to reach a consensus here, with Robin and the DT folks
>> okay to keep it in the driver, while you believe it doesn't belong there.
>>
>> Robin, Rob, could you please share your thoughts on concluding the placement
>> of this prefetch data?
>>
>> As discussed earlier [1], the prefetch value for each client doesn’t define
>> the hardware topology and is implementation-defined register writes used by
>> the software driver.
> 
> It does reflect the hardware topology though, doesn't it? Those magic hex
> masks above refer to stream ids, so the table is hard-coding the prefetch
> values for particular matches.

That is correct in the sense that stream id is mapped to context bank
where these configurations are applied.
However the other part of it is implementation-defined register/values
for which community opinion was register/value kind of data, should not
belong to device tree and are not generally approved of.

Would also like to point out that the prefetch values are recommended
settings and doesn’t mean these are the only configuration which would
work for the soc.
So the SID-to-prefetch isn't strictly SoC defined but is a software
configuration, IMO.

> If I run on a different SoC configuration > with the same table, then the prefetch settings will be applied to the
> wrong devices. How is that not hardware topology?
> 

The configuration table is tied to SoC compatible string however as I
mentioned above, its basically a s/w recommended setting.
(using prefetch settings other than the recommended values e.g 
PREFECH_DEFAULT instead of PREFETCH_DEEP would not render the device 
unusable unlike changing stream-ids which can make it unusable).

Since it is implementation specific we cannot have a generic DT binding,
tying stream ids to these recommended settings.
Even with qcom specific binding due to dependency on implementation, not
sure if we would be able to maintain consistency.

So from maintenance perspective carrying these in driver appear to be
simpler/flexible. And if it doesn’t violate existing precedence, we
would prefer to carry it that way.

This parallels how _"QoS settings"_ are handled within the driver 
(similar to this example [1]).

[1]. 
https://lore.kernel.org/linux-arm-msm/20231030-sc8280xp-dpu-safe-lut-v1-1-6d485d7b428f@quicinc.com/#t

Thanks & regards,
Bibek

> WIll

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-30 10:00         ` Bibek Kumar Patro
@ 2024-08-30 12:31           ` Robin Murphy
  2024-09-03 12:59             ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2024-08-30 12:31 UTC (permalink / raw)
  To: Bibek Kumar Patro, Will Deacon
  Cc: robdclark, joro, jgg, jsnitsel, robh, krzysztof.kozlowski,
	quic_c_gdjako, dmitry.baryshkov, konrad.dybcio, iommu,
	linux-arm-msm, linux-arm-kernel, linux-kernel

On 30/08/2024 11:00 am, Bibek Kumar Patro wrote:
> 
> 
> On 8/27/2024 6:17 PM, Will Deacon wrote:
>> On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
>>>
>>>
>>> On 8/23/2024 9:29 PM, Will Deacon wrote:
>>>> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
>>>>> Add ACTLR data table for SC7280 along with support for
>>>>> same including SC7280 specific implementation operations.
>>>>>
>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>>>>> ---
>>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 
>>>>> +++++++++++++++++++++-
>>>>>    1 file changed, 57 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> index dc143b250704..a776c7906c76 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> @@ -31,6 +31,55 @@
>>>>>    #define PREFETCH_MODERATE    (2 << PREFETCH_SHIFT)
>>>>>    #define PREFETCH_DEEP        (3 << PREFETCH_SHIFT)
>>>>>
>>>>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
>>>>> +    { 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +    { 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>> +    { 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +    { 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>> +};
>>>>> +
>>>>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
>>>>> +    { 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
>>>>> +};
>>>>
>>>> It's Will "stuck record" Deacon here again to say that I don't think
>>>> this data belongs in the driver.
>>>>
>>>
>>> Hi Will,
>>>
>>> It will be difficult to reach a consensus here, with Robin and the DT 
>>> folks
>>> okay to keep it in the driver, while you believe it doesn't belong 
>>> there.
>>>
>>> Robin, Rob, could you please share your thoughts on concluding the 
>>> placement
>>> of this prefetch data?
>>>
>>> As discussed earlier [1], the prefetch value for each client doesn’t 
>>> define
>>> the hardware topology and is implementation-defined register writes 
>>> used by
>>> the software driver.
>>
>> It does reflect the hardware topology though, doesn't it? Those magic hex
>> masks above refer to stream ids, so the table is hard-coding the prefetch
>> values for particular matches.
> 
> That is correct in the sense that stream id is mapped to context bank
> where these configurations are applied.
> However the other part of it is implementation-defined register/values
> for which community opinion was register/value kind of data, should not
> belong to device tree and are not generally approved of.
> 
> Would also like to point out that the prefetch values are recommended
> settings and doesn’t mean these are the only configuration which would
> work for the soc.
> So the SID-to-prefetch isn't strictly SoC defined but is a software
> configuration, IMO.

What's particularly confusing is that most of the IDs encoded here don't 
actually seem to line up with what's in the respective SoC DTSIs...

However by this point I'm wary of whether we've lost sight of *why* 
we're doing this, and that we're deep into begging the question of 
whether identifying devices by StreamID is the right thing to do in the 
first place. For example, as best I can tell from a quick skim, we have 
over 2 dozen lines of data here which all serve the exact same purpose 
of applying PREFETCH_DEEP | CPRE | CMTLB to instances of 
"qcom,fastrpc-compute-cb". In general it seems unlikely that the same 
device would want wildly different prefetch settings across different 
SoCs, or even between different instances in the same SoC, so I'm really 
coming round to the conclusion that this data would probably be best 
handled as an extension of the existing qcom_smmu_client_of_match mechanism.

Thanks,
Robin.

> 
>> If I run on a different SoC configuration > with the same table, then 
>> the prefetch settings will be applied to the
>> wrong devices. How is that not hardware topology?
>>
> 
> The configuration table is tied to SoC compatible string however as I
> mentioned above, its basically a s/w recommended setting.
> (using prefetch settings other than the recommended values e.g 
> PREFECH_DEFAULT instead of PREFETCH_DEEP would not render the device 
> unusable unlike changing stream-ids which can make it unusable).
> 
> Since it is implementation specific we cannot have a generic DT binding,
> tying stream ids to these recommended settings.
> Even with qcom specific binding due to dependency on implementation, not
> sure if we would be able to maintain consistency.
> 
> So from maintenance perspective carrying these in driver appear to be
> simpler/flexible. And if it doesn’t violate existing precedence, we
> would prefer to carry it that way.
> 
> This parallels how _"QoS settings"_ are handled within the driver 
> (similar to this example [1]).
> 
> [1]. 
> https://lore.kernel.org/linux-arm-msm/20231030-sc8280xp-dpu-safe-lut-v1-1-6d485d7b428f@quicinc.com/#t
> 
> Thanks & regards,
> Bibek
> 
>> WIll

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation
  2024-08-16 17:42 ` [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation Bibek Kumar Patro
@ 2024-08-30 12:53   ` Robin Murphy
  2024-09-03 12:59     ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2024-08-30 12:53 UTC (permalink / raw)
  To: Bibek Kumar Patro, robdclark, will, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel

On 16/08/2024 6:42 pm, Bibek Kumar Patro wrote:
> Default MMU-500 reset operation disables context caching in
> prefetch buffer. It is however expected for context banks using
> the ACTLR register to retain their prefetch value during reset
> and runtime suspend.
> 
> Replace default MMU-500 reset operation with Qualcomm specific reset
> operation which envelope the default reset operation and re-enables
> context caching in prefetch buffer for Qualcomm SoCs.
> 
> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
> ---
>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 36 ++++++++++++++++++++--
>   1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 36c6b36ad4ff..8ac1850b852f 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -16,6 +16,16 @@
> 
>   #define QCOM_DUMMY_VAL	-1
> 
> +/*
> + * SMMU-500 TRM defines BIT(0) as CMTLB (Enable context caching in the
> + * macro TLB) and BIT(1) as CPRE (Enable context caching in the prefetch
> + * buffer). The remaining bits are implementation defined and vary across
> + * SoCs.
> + */
> +
> +#define CPRE			(1 << 1)
> +#define CMTLB			(1 << 0)
> +
>   static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
>   {
>   	return container_of(smmu, struct qcom_smmu, smmu);
> @@ -381,11 +391,31 @@ static int qcom_smmu_def_domain_type(struct device *dev)
>   	return match ? IOMMU_DOMAIN_IDENTITY : 0;
>   }
> 
> +static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
> +{
> +	int ret;
> +	u32 val;
> +	int i;
> +
> +	ret = arm_mmu500_reset(smmu);
> +	if (ret)
> +		return ret;
> +
> +	/* arm_mmu500_reset() disables CPRE which is re-enabled here */

I still think it would be good to document why we think this is OK, 
given the reasons for disabling CPRE to begin with.

Thanks,
Robin.

> +	for (i = 0; i < smmu->num_context_banks; ++i) {
> +		val = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
> +		val |= CPRE;
> +		arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, val);
> +	}
> +
> +	return 0;
> +}
> +
>   static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
>   {
>   	int ret;
> 
> -	arm_mmu500_reset(smmu);
> +	qcom_smmu500_reset(smmu);
> 
>   	/*
>   	 * To address performance degradation in non-real time clients,
> @@ -412,7 +442,7 @@ static const struct arm_smmu_impl qcom_smmu_500_impl = {
>   	.init_context = qcom_smmu_init_context,
>   	.cfg_probe = qcom_smmu_cfg_probe,
>   	.def_domain_type = qcom_smmu_def_domain_type,
> -	.reset = arm_mmu500_reset,
> +	.reset = qcom_smmu500_reset,
>   	.write_s2cr = qcom_smmu_write_s2cr,
>   	.tlb_sync = qcom_smmu_tlb_sync,
>   #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG
> @@ -445,7 +475,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_v2_impl = {
>   static const struct arm_smmu_impl qcom_adreno_smmu_500_impl = {
>   	.init_context = qcom_adreno_smmu_init_context,
>   	.def_domain_type = qcom_smmu_def_domain_type,
> -	.reset = arm_mmu500_reset,
> +	.reset = qcom_smmu500_reset,
>   	.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
>   	.write_sctlr = qcom_adreno_smmu_write_sctlr,
>   	.tlb_sync = qcom_smmu_tlb_sync,
> --
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation
  2024-08-30 12:53   ` Robin Murphy
@ 2024-09-03 12:59     ` Bibek Kumar Patro
  0 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-09-03 12:59 UTC (permalink / raw)
  To: Robin Murphy, robdclark, will, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, dmitry.baryshkov,
	konrad.dybcio
  Cc: iommu, linux-arm-msm, linux-arm-kernel, linux-kernel



On 8/30/2024 6:23 PM, Robin Murphy wrote:
> On 16/08/2024 6:42 pm, Bibek Kumar Patro wrote:
>> Default MMU-500 reset operation disables context caching in
>> prefetch buffer. It is however expected for context banks using
>> the ACTLR register to retain their prefetch value during reset
>> and runtime suspend.
>>
>> Replace default MMU-500 reset operation with Qualcomm specific reset
>> operation which envelope the default reset operation and re-enables
>> context caching in prefetch buffer for Qualcomm SoCs.
>>
>> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>> ---
>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 36 ++++++++++++++++++++--
>>   1 file changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> index 36c6b36ad4ff..8ac1850b852f 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> @@ -16,6 +16,16 @@
>>
>>   #define QCOM_DUMMY_VAL    -1
>>
>> +/*
>> + * SMMU-500 TRM defines BIT(0) as CMTLB (Enable context caching in the
>> + * macro TLB) and BIT(1) as CPRE (Enable context caching in the prefetch
>> + * buffer). The remaining bits are implementation defined and vary 
>> across
>> + * SoCs.
>> + */
>> +
>> +#define CPRE            (1 << 1)
>> +#define CMTLB            (1 << 0)
>> +
>>   static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
>>   {
>>       return container_of(smmu, struct qcom_smmu, smmu);
>> @@ -381,11 +391,31 @@ static int qcom_smmu_def_domain_type(struct 
>> device *dev)
>>       return match ? IOMMU_DOMAIN_IDENTITY : 0;
>>   }
>>
>> +static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
>> +{
>> +    int ret;
>> +    u32 val;
>> +    int i;
>> +
>> +    ret = arm_mmu500_reset(smmu);
>> +    if (ret)
>> +        return ret;
>> +
>> +    /* arm_mmu500_reset() disables CPRE which is re-enabled here */
> 
> I still think it would be good to document why we think this is OK, 
> given the reasons for disabling CPRE to begin with.
> 

Hi Robin,

Sure, I agree with your point. I’ll elaborate on the comment here to 
document the reason for re-enabling CPRE.

/* The errata for MMU-500 before the r2p2 revision requires CPRE to be
disabled. The arm_mmu500_reset function disables CPRE to accommodate all
RTL revisions. Since all Qualcomm SoCs are on the r2p4 revision, where
the CPRE bit can be enabled, the qcom_smmu500_reset function re-enables
the CPRE bit for the next-page prefetcher to retain the prefetch value
during reset and runtime suspend operations. */

Does this sound good, let me know of your suggestions as well incase.

Thanks & regards,
Bibek

> Thanks,
> Robin.
> 
>> +    for (i = 0; i < smmu->num_context_banks; ++i) {
>> +        val = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
>> +        val |= CPRE;
>> +        arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, val);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
>>   {
>>       int ret;
>>
>> -    arm_mmu500_reset(smmu);
>> +    qcom_smmu500_reset(smmu);
>>
>>       /*
>>        * To address performance degradation in non-real time clients,
>> @@ -412,7 +442,7 @@ static const struct arm_smmu_impl 
>> qcom_smmu_500_impl = {
>>       .init_context = qcom_smmu_init_context,
>>       .cfg_probe = qcom_smmu_cfg_probe,
>>       .def_domain_type = qcom_smmu_def_domain_type,
>> -    .reset = arm_mmu500_reset,
>> +    .reset = qcom_smmu500_reset,
>>       .write_s2cr = qcom_smmu_write_s2cr,
>>       .tlb_sync = qcom_smmu_tlb_sync,
>>   #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG
>> @@ -445,7 +475,7 @@ static const struct arm_smmu_impl 
>> qcom_adreno_smmu_v2_impl = {
>>   static const struct arm_smmu_impl qcom_adreno_smmu_500_impl = {
>>       .init_context = qcom_adreno_smmu_init_context,
>>       .def_domain_type = qcom_smmu_def_domain_type,
>> -    .reset = arm_mmu500_reset,
>> +    .reset = qcom_smmu500_reset,
>>       .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
>>       .write_sctlr = qcom_adreno_smmu_write_sctlr,
>>       .tlb_sync = qcom_smmu_tlb_sync,
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-08-30 12:31           ` Robin Murphy
@ 2024-09-03 12:59             ` Bibek Kumar Patro
  2024-09-03 13:13               ` Dmitry Baryshkov
  0 siblings, 1 reply; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-09-03 12:59 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: robdclark, joro, jgg, jsnitsel, robh, krzysztof.kozlowski,
	quic_c_gdjako, dmitry.baryshkov, konrad.dybcio, iommu,
	linux-arm-msm, linux-arm-kernel, linux-kernel



On 8/30/2024 6:01 PM, Robin Murphy wrote:
> On 30/08/2024 11:00 am, Bibek Kumar Patro wrote:
>>
>>
>> On 8/27/2024 6:17 PM, Will Deacon wrote:
>>> On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
>>>>
>>>>
>>>> On 8/23/2024 9:29 PM, Will Deacon wrote:
>>>>> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
>>>>>> Add ACTLR data table for SC7280 along with support for
>>>>>> same including SC7280 specific implementation operations.
>>>>>>
>>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>>>>>> ---
>>>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58 
>>>>>> +++++++++++++++++++++-
>>>>>>    1 file changed, 57 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
>>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>> index dc143b250704..a776c7906c76 100644
>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>> @@ -31,6 +31,55 @@
>>>>>>    #define PREFETCH_MODERATE    (2 << PREFETCH_SHIFT)
>>>>>>    #define PREFETCH_DEEP        (3 << PREFETCH_SHIFT)
>>>>>>
>>>>>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
>>>>>> +    { 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +    { 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>> +    { 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +    { 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>> +};
>>>>>> +
>>>>>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
>>>>>> +    { 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>> +};
>>>>>
>>>>> It's Will "stuck record" Deacon here again to say that I don't think
>>>>> this data belongs in the driver.
>>>>>
>>>>
>>>> Hi Will,
>>>>
>>>> It will be difficult to reach a consensus here, with Robin and the 
>>>> DT folks
>>>> okay to keep it in the driver, while you believe it doesn't belong 
>>>> there.
>>>>
>>>> Robin, Rob, could you please share your thoughts on concluding the 
>>>> placement
>>>> of this prefetch data?
>>>>
>>>> As discussed earlier [1], the prefetch value for each client doesn’t 
>>>> define
>>>> the hardware topology and is implementation-defined register writes 
>>>> used by
>>>> the software driver.
>>>
>>> It does reflect the hardware topology though, doesn't it? Those magic 
>>> hex
>>> masks above refer to stream ids, so the table is hard-coding the 
>>> prefetch
>>> values for particular matches.
>>
>> That is correct in the sense that stream id is mapped to context bank
>> where these configurations are applied.
>> However the other part of it is implementation-defined register/values
>> for which community opinion was register/value kind of data, should not
>> belong to device tree and are not generally approved of.
>>
>> Would also like to point out that the prefetch values are recommended
>> settings and doesn’t mean these are the only configuration which would
>> work for the soc.
>> So the SID-to-prefetch isn't strictly SoC defined but is a software
>> configuration, IMO.
> 
> What's particularly confusing is that most of the IDs encoded here don't 
> actually seem to line up with what's in the respective SoC DTSIs...
> 
> However by this point I'm wary of whether we've lost sight of *why* 
> we're doing this, and that we're deep into begging the question of 
> whether identifying devices by StreamID is the right thing to do in the 
> first place. For example, as best I can tell from a quick skim, we have 
> over 2 dozen lines of data here which all serve the exact same purpose 
> of applying PREFETCH_DEEP | CPRE | CMTLB to instances of 
> "qcom,fastrpc-compute-cb". In general it seems unlikely that the same 
> device would want wildly different prefetch settings across different 
> SoCs, or even between different instances in the same SoC, so I'm really 
> coming round to the conclusion that this data would probably be best 
> handled as an extension of the existing qcom_smmu_client_of_match 
> mechanism.
> 

As per your design idea,do you mean to use qcom_smmu_client_of_match to 
identify the device using compatible string and apply the device 
specific settings for all the SoCs (instead of StreamID based device 
identification) ?

something like this rough snippet(?):

qcom_smmu_find_actlr_client(struct device *dev)
{

	if (of_match_device(qcom_smmu_client_of_match, dev) == 
qcom,fastrpc-compute-cb )
		qcom_smmu_set_actlr_value(dev, (PREFETCH_DEEP | CPRE | CMTLB)); 
/*where (PREFETCH_DEEP | CPRE | CMTLB) is used for compute-cb client.*/

	else if (of_match_device(qcom_smmu_client_of_match, dev) == qcom,adreno )
		qcom_smmu_set_actlr_value(dev, (PREFETCH_SHALLOW | CPRE | CMTLB)); 
/*Where (PREFETCH_SHALLOW | CPRE | CMTLB) is for adreno client. */

}

Let me know if my understanding is incorrect.
Then in this case if different SoC would have a different settings for 
same device, then everytime a new compatible would be necessary for same 
device on different SoC?

On similar lines there is another TBU based approach which I can think 
of. Identify the TBU -> Identify clients from TopoID derived from SID 
range specified in qcom,stream-id-range -> Apply the client
specific settings ?

Both approaches would be driver-based, as they are now.

Also I'd like to point out that in the current design, since we fixed 
the smr_is_subset arguments to make the stream IDs a subset of entries 
in the actlr_cfg table, we can reduce the number of entries in the 
table. This way, fewer SID-mask pairs can accommodate several stream IDs.

Thanks & regards,
Bibek

> Thanks,
> Robin.
> 
>>
>>> If I run on a different SoC configuration > with the same table, then 
>>> the prefetch settings will be applied to the
>>> wrong devices. How is that not hardware topology?
>>>
>>
>> The configuration table is tied to SoC compatible string however as I
>> mentioned above, its basically a s/w recommended setting.
>> (using prefetch settings other than the recommended values e.g 
>> PREFECH_DEFAULT instead of PREFETCH_DEEP would not render the device 
>> unusable unlike changing stream-ids which can make it unusable).
>>
>> Since it is implementation specific we cannot have a generic DT binding,
>> tying stream ids to these recommended settings.
>> Even with qcom specific binding due to dependency on implementation, not
>> sure if we would be able to maintain consistency.
>>
>> So from maintenance perspective carrying these in driver appear to be
>> simpler/flexible. And if it doesn’t violate existing precedence, we
>> would prefer to carry it that way.
>>
>> This parallels how _"QoS settings"_ are handled within the driver 
>> (similar to this example [1]).
>>
>> [1]. 
>> https://lore.kernel.org/linux-arm-msm/20231030-sc8280xp-dpu-safe-lut-v1-1-6d485d7b428f@quicinc.com/#t
>>
>> Thanks & regards,
>> Bibek
>>
>>> WIll

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-09-03 12:59             ` Bibek Kumar Patro
@ 2024-09-03 13:13               ` Dmitry Baryshkov
  2024-09-20 19:58                 ` Bibek Kumar Patro
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Baryshkov @ 2024-09-03 13:13 UTC (permalink / raw)
  To: Bibek Kumar Patro
  Cc: Robin Murphy, Will Deacon, robdclark, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, konrad.dybcio, iommu,
	linux-arm-msm, linux-arm-kernel, linux-kernel

On Tue, 3 Sept 2024 at 15:59, Bibek Kumar Patro
<quic_bibekkum@quicinc.com> wrote:
>
>
>
> On 8/30/2024 6:01 PM, Robin Murphy wrote:
> > On 30/08/2024 11:00 am, Bibek Kumar Patro wrote:
> >>
> >>
> >> On 8/27/2024 6:17 PM, Will Deacon wrote:
> >>> On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
> >>>>
> >>>>
> >>>> On 8/23/2024 9:29 PM, Will Deacon wrote:
> >>>>> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
> >>>>>> Add ACTLR data table for SC7280 along with support for
> >>>>>> same including SC7280 specific implementation operations.
> >>>>>>
> >>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
> >>>>>> ---
> >>>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58
> >>>>>> +++++++++++++++++++++-
> >>>>>>    1 file changed, 57 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>> index dc143b250704..a776c7906c76 100644
> >>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >>>>>> @@ -31,6 +31,55 @@
> >>>>>>    #define PREFETCH_MODERATE    (2 << PREFETCH_SHIFT)
> >>>>>>    #define PREFETCH_DEEP        (3 << PREFETCH_SHIFT)
> >>>>>>
> >>>>>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
> >>>>>> +    { 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +    { 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
> >>>>>> +    { 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +    { 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
> >>>>>> +};
> >>>>>> +
> >>>>>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
> >>>>>> +    { 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
> >>>>>> +};
> >>>>>
> >>>>> It's Will "stuck record" Deacon here again to say that I don't think
> >>>>> this data belongs in the driver.
> >>>>>
> >>>>
> >>>> Hi Will,
> >>>>
> >>>> It will be difficult to reach a consensus here, with Robin and the
> >>>> DT folks
> >>>> okay to keep it in the driver, while you believe it doesn't belong
> >>>> there.
> >>>>
> >>>> Robin, Rob, could you please share your thoughts on concluding the
> >>>> placement
> >>>> of this prefetch data?
> >>>>
> >>>> As discussed earlier [1], the prefetch value for each client doesn’t
> >>>> define
> >>>> the hardware topology and is implementation-defined register writes
> >>>> used by
> >>>> the software driver.
> >>>
> >>> It does reflect the hardware topology though, doesn't it? Those magic
> >>> hex
> >>> masks above refer to stream ids, so the table is hard-coding the
> >>> prefetch
> >>> values for particular matches.
> >>
> >> That is correct in the sense that stream id is mapped to context bank
> >> where these configurations are applied.
> >> However the other part of it is implementation-defined register/values
> >> for which community opinion was register/value kind of data, should not
> >> belong to device tree and are not generally approved of.
> >>
> >> Would also like to point out that the prefetch values are recommended
> >> settings and doesn’t mean these are the only configuration which would
> >> work for the soc.
> >> So the SID-to-prefetch isn't strictly SoC defined but is a software
> >> configuration, IMO.
> >
> > What's particularly confusing is that most of the IDs encoded here don't
> > actually seem to line up with what's in the respective SoC DTSIs...
> >
> > However by this point I'm wary of whether we've lost sight of *why*
> > we're doing this, and that we're deep into begging the question of
> > whether identifying devices by StreamID is the right thing to do in the
> > first place. For example, as best I can tell from a quick skim, we have
> > over 2 dozen lines of data here which all serve the exact same purpose
> > of applying PREFETCH_DEEP | CPRE | CMTLB to instances of
> > "qcom,fastrpc-compute-cb". In general it seems unlikely that the same
> > device would want wildly different prefetch settings across different
> > SoCs, or even between different instances in the same SoC, so I'm really
> > coming round to the conclusion that this data would probably be best
> > handled as an extension of the existing qcom_smmu_client_of_match
> > mechanism.
> >
>
> As per your design idea,do you mean to use qcom_smmu_client_of_match to
> identify the device using compatible string and apply the device
> specific settings for all the SoCs (instead of StreamID based device
> identification) ?
>
> something like this rough snippet(?):
>
> qcom_smmu_find_actlr_client(struct device *dev)
> {
>
>         if (of_match_device(qcom_smmu_client_of_match, dev) ==
> qcom,fastrpc-compute-cb )
>                 qcom_smmu_set_actlr_value(dev, (PREFETCH_DEEP | CPRE | CMTLB));
> /*where (PREFETCH_DEEP | CPRE | CMTLB) is used for compute-cb client.*/
>
>         else if (of_match_device(qcom_smmu_client_of_match, dev) == qcom,adreno )
>                 qcom_smmu_set_actlr_value(dev, (PREFETCH_SHALLOW | CPRE | CMTLB));
> /*Where (PREFETCH_SHALLOW | CPRE | CMTLB) is for adreno client. */

I like this idea, especially once it gets converted into a per-SoC
table of compatibles.

>
> }
>
> Let me know if my understanding is incorrect.
> Then in this case if different SoC would have a different settings for
> same device, then everytime a new compatible would be necessary for same
> device on different SoC?
>
> On similar lines there is another TBU based approach which I can think
> of. Identify the TBU -> Identify clients from TopoID derived from SID
> range specified in qcom,stream-id-range -> Apply the client
> specific settings ?
>
> Both approaches would be driver-based, as they are now.
>
> Also I'd like to point out that in the current design, since we fixed
> the smr_is_subset arguments to make the stream IDs a subset of entries
> in the actlr_cfg table, we can reduce the number of entries in the
> table. This way, fewer SID-mask pairs can accommodate several stream IDs.
>
> Thanks & regards,
> Bibek
>
> > Thanks,
> > Robin.
> >
> >>
> >>> If I run on a different SoC configuration > with the same table, then
> >>> the prefetch settings will be applied to the
> >>> wrong devices. How is that not hardware topology?
> >>>
> >>
> >> The configuration table is tied to SoC compatible string however as I
> >> mentioned above, its basically a s/w recommended setting.
> >> (using prefetch settings other than the recommended values e.g
> >> PREFECH_DEFAULT instead of PREFETCH_DEEP would not render the device
> >> unusable unlike changing stream-ids which can make it unusable).
> >>
> >> Since it is implementation specific we cannot have a generic DT binding,
> >> tying stream ids to these recommended settings.
> >> Even with qcom specific binding due to dependency on implementation, not
> >> sure if we would be able to maintain consistency.
> >>
> >> So from maintenance perspective carrying these in driver appear to be
> >> simpler/flexible. And if it doesn’t violate existing precedence, we
> >> would prefer to carry it that way.
> >>
> >> This parallels how _"QoS settings"_ are handled within the driver
> >> (similar to this example [1]).
> >>
> >> [1].
> >> https://lore.kernel.org/linux-arm-msm/20231030-sc8280xp-dpu-safe-lut-v1-1-6d485d7b428f@quicinc.com/#t
> >>
> >> Thanks & regards,
> >> Bibek
> >>
> >>> WIll



-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280
  2024-09-03 13:13               ` Dmitry Baryshkov
@ 2024-09-20 19:58                 ` Bibek Kumar Patro
  0 siblings, 0 replies; 19+ messages in thread
From: Bibek Kumar Patro @ 2024-09-20 19:58 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Robin Murphy, Will Deacon, robdclark, joro, jgg, jsnitsel, robh,
	krzysztof.kozlowski, quic_c_gdjako, konrad.dybcio, iommu,
	linux-arm-msm, linux-arm-kernel, linux-kernel



On 9/3/2024 6:43 PM, Dmitry Baryshkov wrote:
> On Tue, 3 Sept 2024 at 15:59, Bibek Kumar Patro
> <quic_bibekkum@quicinc.com> wrote:
>>
>>
>>
>> On 8/30/2024 6:01 PM, Robin Murphy wrote:
>>> On 30/08/2024 11:00 am, Bibek Kumar Patro wrote:
>>>>
>>>>
>>>> On 8/27/2024 6:17 PM, Will Deacon wrote:
>>>>> On Mon, Aug 26, 2024 at 04:33:24PM +0530, Bibek Kumar Patro wrote:
>>>>>>
>>>>>>
>>>>>> On 8/23/2024 9:29 PM, Will Deacon wrote:
>>>>>>> On Fri, Aug 16, 2024 at 11:12:58PM +0530, Bibek Kumar Patro wrote:
>>>>>>>> Add ACTLR data table for SC7280 along with support for
>>>>>>>> same including SC7280 specific implementation operations.
>>>>>>>>
>>>>>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com>
>>>>>>>> ---
>>>>>>>>     drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 58
>>>>>>>> +++++++++++++++++++++-
>>>>>>>>     1 file changed, 57 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>>>> index dc143b250704..a776c7906c76 100644
>>>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>>>>> @@ -31,6 +31,55 @@
>>>>>>>>     #define PREFETCH_MODERATE    (2 << PREFETCH_SHIFT)
>>>>>>>>     #define PREFETCH_DEEP        (3 << PREFETCH_SHIFT)
>>>>>>>>
>>>>>>>> +static const struct actlr_config sc7280_apps_actlr_cfg[] = {
>>>>>>>> +    { 0x0800, 0x04e0, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x0900, 0x0402, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x0901, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x0d01, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x1181, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1182, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1183, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1184, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1185, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1186, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1187, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1188, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x1189, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x118b, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x118c, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x118d, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x118e, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x118f, 0x0420, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +    { 0x2000, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2040, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2062, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2080, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x20c0, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2100, 0x0020, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2140, 0x0000, PREFETCH_DEFAULT | CMTLB },
>>>>>>>> +    { 0x2180, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x2181, 0x0004, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x2183, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x2184, 0x0020, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +    { 0x2187, 0x0000, PREFETCH_SHALLOW | CPRE | CMTLB },
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static const struct actlr_config sc7280_gfx_actlr_cfg[] = {
>>>>>>>> +    { 0x0000, 0x07ff, PREFETCH_DEEP | CPRE | CMTLB },
>>>>>>>> +};
>>>>>>>
>>>>>>> It's Will "stuck record" Deacon here again to say that I don't think
>>>>>>> this data belongs in the driver.
>>>>>>>
>>>>>>
>>>>>> Hi Will,
>>>>>>
>>>>>> It will be difficult to reach a consensus here, with Robin and the
>>>>>> DT folks
>>>>>> okay to keep it in the driver, while you believe it doesn't belong
>>>>>> there.
>>>>>>
>>>>>> Robin, Rob, could you please share your thoughts on concluding the
>>>>>> placement
>>>>>> of this prefetch data?
>>>>>>
>>>>>> As discussed earlier [1], the prefetch value for each client doesn’t
>>>>>> define
>>>>>> the hardware topology and is implementation-defined register writes
>>>>>> used by
>>>>>> the software driver.
>>>>>
>>>>> It does reflect the hardware topology though, doesn't it? Those magic
>>>>> hex
>>>>> masks above refer to stream ids, so the table is hard-coding the
>>>>> prefetch
>>>>> values for particular matches.
>>>>
>>>> That is correct in the sense that stream id is mapped to context bank
>>>> where these configurations are applied.
>>>> However the other part of it is implementation-defined register/values
>>>> for which community opinion was register/value kind of data, should not
>>>> belong to device tree and are not generally approved of.
>>>>
>>>> Would also like to point out that the prefetch values are recommended
>>>> settings and doesn’t mean these are the only configuration which would
>>>> work for the soc.
>>>> So the SID-to-prefetch isn't strictly SoC defined but is a software
>>>> configuration, IMO.
>>>
>>> What's particularly confusing is that most of the IDs encoded here don't
>>> actually seem to line up with what's in the respective SoC DTSIs...
>>>
>>> However by this point I'm wary of whether we've lost sight of *why*
>>> we're doing this, and that we're deep into begging the question of
>>> whether identifying devices by StreamID is the right thing to do in the
>>> first place. For example, as best I can tell from a quick skim, we have
>>> over 2 dozen lines of data here which all serve the exact same purpose
>>> of applying PREFETCH_DEEP | CPRE | CMTLB to instances of
>>> "qcom,fastrpc-compute-cb". In general it seems unlikely that the same
>>> device would want wildly different prefetch settings across different
>>> SoCs, or even between different instances in the same SoC, so I'm really
>>> coming round to the conclusion that this data would probably be best
>>> handled as an extension of the existing qcom_smmu_client_of_match
>>> mechanism.
>>>
>>
>> As per your design idea,do you mean to use qcom_smmu_client_of_match to
>> identify the device using compatible string and apply the device
>> specific settings for all the SoCs (instead of StreamID based device
>> identification) ?
>>
>> something like this rough snippet(?):
>>
>> qcom_smmu_find_actlr_client(struct device *dev)
>> {
>>
>>          if (of_match_device(qcom_smmu_client_of_match, dev) ==
>> qcom,fastrpc-compute-cb )
>>                  qcom_smmu_set_actlr_value(dev, (PREFETCH_DEEP | CPRE | CMTLB));
>> /*where (PREFETCH_DEEP | CPRE | CMTLB) is used for compute-cb client.*/
>>
>>          else if (of_match_device(qcom_smmu_client_of_match, dev) == qcom,adreno )
>>                  qcom_smmu_set_actlr_value(dev, (PREFETCH_SHALLOW | CPRE | CMTLB));
>> /*Where (PREFETCH_SHALLOW | CPRE | CMTLB) is for adreno client. */
> 
> I like this idea, especially once it gets converted into a per-SoC
> table of compatibles.

Ack, Just posted the latest version v15 for this series , based on the 
compatible string based design approach [1] as discussed.

While completion of patch, I came to know that it might sometimes
be possible on Qualcomm SoC's, same IP/device on 2 different SoC
can have different ACTLR settings as well.
[1] _can take care of that case_ as well through different compatible
strings but this can be handled through per-SoC solution as well.

In the latest patch [1], I am going forward with Robin's suggestion
of single table per smmu, and can later follow up there if any
conflict happens.

[1]: 
https://lore.kernel.org/all/20240920155813.3434021-6-quic_bibekkum@quicinc.com/>

Thanks & regards,
Bibek

> 
>>
>> }
>>
>> Let me know if my understanding is incorrect.
>> Then in this case if different SoC would have a different settings for
>> same device, then everytime a new compatible would be necessary for same
>> device on different SoC?
>>
>> On similar lines there is another TBU based approach which I can think
>> of. Identify the TBU -> Identify clients from TopoID derived from SID
>> range specified in qcom,stream-id-range -> Apply the client
>> specific settings ?
>>
>> Both approaches would be driver-based, as they are now.
>>
>> Also I'd like to point out that in the current design, since we fixed
>> the smr_is_subset arguments to make the stream IDs a subset of entries
>> in the actlr_cfg table, we can reduce the number of entries in the
>> table. This way, fewer SID-mask pairs can accommodate several stream IDs.
>>
>> Thanks & regards,
>> Bibek
>>
>>> Thanks,
>>> Robin.
>>>
>>>>
>>>>> If I run on a different SoC configuration > with the same table, then
>>>>> the prefetch settings will be applied to the
>>>>> wrong devices. How is that not hardware topology?
>>>>>
>>>>
>>>> The configuration table is tied to SoC compatible string however as I
>>>> mentioned above, its basically a s/w recommended setting.
>>>> (using prefetch settings other than the recommended values e.g
>>>> PREFECH_DEFAULT instead of PREFETCH_DEEP would not render the device
>>>> unusable unlike changing stream-ids which can make it unusable).
>>>>
>>>> Since it is implementation specific we cannot have a generic DT binding,
>>>> tying stream ids to these recommended settings.
>>>> Even with qcom specific binding due to dependency on implementation, not
>>>> sure if we would be able to maintain consistency.
>>>>
>>>> So from maintenance perspective carrying these in driver appear to be
>>>> simpler/flexible. And if it doesn’t violate existing precedence, we
>>>> would prefer to carry it that way.
>>>>
>>>> This parallels how _"QoS settings"_ are handled within the driver
>>>> (similar to this example [1]).
>>>>
>>>> [1].
>>>> https://lore.kernel.org/linux-arm-msm/20231030-sc8280xp-dpu-safe-lut-v1-1-6d485d7b428f@quicinc.com/#t
>>>>
>>>> Thanks & regards,
>>>> Bibek
>>>>
>>>>> WIll
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-09-20 19:59 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-16 17:42 [PATCH v14 0/6] iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 1/6] iommu/arm-smmu: re-enable context caching in smmu reset operation Bibek Kumar Patro
2024-08-30 12:53   ` Robin Murphy
2024-09-03 12:59     ` Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 2/6] iommu/arm-smmu: refactor qcom_smmu structure to include single pointer Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 3/6] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 4/6] iommu/arm-smmu: add ACTLR data and support for SM8550 Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 5/6] iommu/arm-smmu: add ACTLR data and support for SC7280 Bibek Kumar Patro
2024-08-23 15:59   ` Will Deacon
2024-08-26 11:03     ` Bibek Kumar Patro
2024-08-27 12:47       ` Will Deacon
2024-08-30 10:00         ` Bibek Kumar Patro
2024-08-30 12:31           ` Robin Murphy
2024-09-03 12:59             ` Bibek Kumar Patro
2024-09-03 13:13               ` Dmitry Baryshkov
2024-09-20 19:58                 ` Bibek Kumar Patro
2024-08-16 17:42 ` [PATCH v14 6/6] iommu/arm-smmu: add support for PRR bit setup Bibek Kumar Patro
2024-08-22 22:37   ` kernel test robot
2024-08-26  9:10     ` Bibek Kumar Patro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox