linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] Add Qualcomm extended CTI support
@ 2025-12-02  6:42 Yingchao Deng
  2025-12-02  6:42 ` [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays Yingchao Deng
  2025-12-02  6:42 ` [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support Yingchao Deng
  0 siblings, 2 replies; 26+ messages in thread
From: Yingchao Deng @ 2025-12-02  6:42 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin
  Cc: Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Yingchao Deng,
	Mao Jinlong

The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI.
It allows a debugger to send to trigger events to a processor or to send
a trigger event to one or more processors when a trigger event occurs on
another processor on the same SoC, or even between SoCs.

QCOM extended CTI supports up to 128 triggers. And some of the register
offsets are changed.

The commands to configure CTI triggers are the same as ARM's CTI.

Changes in v6:
1. Rename regs_idx to ext_reg_sel and add information in documentation
   file.
2. Reset CLAIMSET to zero for qcom-cti during probe.
3. Retrieve idx value under spinlock.
4. Use yearless copyright for qcom-cti.h.
Link to v5 - https://lore.kernel.org/all/20251020-extended_cti-v5-0-6f193da2d467@oss.qualcomm.com/

Changes in v5:
1. Move common part in qcom-cti.h to coresight-cti.h.
2. Convert trigger usage fields to dynamic bitmaps and arrays.
3. Fix holes in struct cti_config to save some space.
4. Revert the previous changes related to the claim tag in
   cti_enable/disable_hw.
Link to v4 - https://lore.kernel.org/linux-arm-msm/20250902-extended_cti-v4-1-7677de04b416@oss.qualcomm.com/

Changes in v4:
1. Read the DEVARCH registers to identify Qualcomm CTI.
2. Add a reg_idx node, and refactor the coresight_cti_reg_show() and
coresight_cti_reg_store() functions accordingly.
3. The register offsets specific to Qualcomm CTI are moved to qcom_cti.h.
Link to v3 - https://lore.kernel.org/linux-arm-msm/20250722081405.2947294-1-quic_jinlmao@quicinc.com/

Changes in v3:
1. Rename is_extended_cti() to of_is_extended_cti().
2. Add the missing 'i' when write the CTI trigger registers.
3. Convert the multi-line output in sysfs to single line.
4. Initialize offset arrays using designated initializer.
Link to V2 - https://lore.kernel.org/all/20250429071841.1158315-3-quic_jinlmao@quicinc.com/

Changes in V2:
1. Add enum for compatible items.
2. Move offset arrays to coresight-cti-core

Signed-off-by: Jinlong Mao <jinlong.mao@oss.qualcomm.com>
Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
---
Yingchao Deng (2):
      coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays
      coresight: cti: Add Qualcomm extended CTI support

 .../ABI/testing/sysfs-bus-coresight-devices-cti    |  11 ++
 drivers/hwtracing/coresight/coresight-cti-core.c   | 153 ++++++++++++++---
 .../hwtracing/coresight/coresight-cti-platform.c   |  16 +-
 drivers/hwtracing/coresight/coresight-cti-sysfs.c  | 187 ++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-cti.h        |  60 ++++++-
 drivers/hwtracing/coresight/qcom-cti.h             |  29 ++++
 6 files changed, 369 insertions(+), 87 deletions(-)
---
base-commit: 7d31f578f3230f3b7b33b0930b08f9afd8429817
change-id: 20251128-extended_cti-3ba8576629f7

Best regards,
-- 
Yingchao Deng <yingchao.deng@oss.qualcomm.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays
  2025-12-02  6:42 [PATCH v6 0/2] Add Qualcomm extended CTI support Yingchao Deng
@ 2025-12-02  6:42 ` Yingchao Deng
  2025-12-04  9:54   ` Mike Leach
  2025-12-02  6:42 ` [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support Yingchao Deng
  1 sibling, 1 reply; 26+ messages in thread
From: Yingchao Deng @ 2025-12-02  6:42 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin
  Cc: Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Yingchao Deng

Replace the fixed-size u32 fields in the cti_config and cti_trig_grp
structure with dynamically allocated bitmaps and arrays. This allows
memory to be allocated based on the actual number of triggers during probe
time, reducing memory footprint and improving scalability for platforms
with varying trigger counts.
Additionally, repack struct cti_config to reduce its size from 80 bytes to
72 bytes.

Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
---
 drivers/hwtracing/coresight/coresight-cti-core.c   | 58 ++++++++++++++++------
 .../hwtracing/coresight/coresight-cti-platform.c   | 16 +++---
 drivers/hwtracing/coresight/coresight-cti-sysfs.c  | 10 ++--
 drivers/hwtracing/coresight/coresight-cti.h        | 17 ++++---
 4 files changed, 65 insertions(+), 36 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index bfbc365bb2ef..f9970e40dd59 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -214,8 +214,8 @@ void cti_write_intack(struct device *dev, u32 ackval)
 /* DEVID[19:16] - number of CTM channels */
 #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))
 
-static void cti_set_default_config(struct device *dev,
-				   struct cti_drvdata *drvdata)
+static int cti_set_default_config(struct device *dev,
+				  struct cti_drvdata *drvdata)
 {
 	struct cti_config *config = &drvdata->config;
 	u32 devid;
@@ -234,12 +234,33 @@ static void cti_set_default_config(struct device *dev,
 		config->nr_trig_max = CTIINOUTEN_MAX;
 	}
 
+	config->trig_in_use = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
+	if (!config->trig_in_use)
+		return -ENOMEM;
+
+	config->trig_out_use = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
+	if (!config->trig_out_use)
+		return -ENOMEM;
+
+	config->trig_out_filter = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
+	if (!config->trig_out_filter)
+		return -ENOMEM;
+
+	config->ctiinen = devm_kcalloc(dev, config->nr_trig_max, sizeof(u32), GFP_KERNEL);
+	if (!config->ctiinen)
+		return -ENOMEM;
+
+	config->ctiouten = devm_kcalloc(dev, config->nr_trig_max, sizeof(u32), GFP_KERNEL);
+	if (!config->ctiouten)
+		return -ENOMEM;
+
 	config->nr_ctm_channels = CTI_DEVID_CTMCHANNELS(devid);
 
 	/* Most regs default to 0 as zalloc'ed except...*/
 	config->trig_filter_enable = true;
 	config->ctigate = GENMASK(config->nr_ctm_channels - 1, 0);
 	config->enable_req_count = 0;
+	return 0;
 }
 
 /*
@@ -270,8 +291,10 @@ int cti_add_connection_entry(struct device *dev, struct cti_drvdata *drvdata,
 	cti_dev->nr_trig_con++;
 
 	/* add connection usage bit info to overall info */
-	drvdata->config.trig_in_use |= tc->con_in->used_mask;
-	drvdata->config.trig_out_use |= tc->con_out->used_mask;
+	bitmap_or(drvdata->config.trig_in_use, drvdata->config.trig_in_use,
+		  tc->con_in->used_mask, drvdata->config.nr_trig_max);
+	bitmap_or(drvdata->config.trig_out_use, drvdata->config.trig_out_use,
+		  tc->con_out->used_mask, drvdata->config.nr_trig_max);
 
 	return 0;
 }
@@ -293,12 +316,20 @@ struct cti_trig_con *cti_allocate_trig_con(struct device *dev, int in_sigs,
 	if (!in)
 		return NULL;
 
+	in->used_mask = devm_bitmap_alloc(dev, in_sigs, GFP_KERNEL);
+	if (!in->used_mask)
+		return NULL;
+
 	out = devm_kzalloc(dev,
 			   offsetof(struct cti_trig_grp, sig_types[out_sigs]),
 			   GFP_KERNEL);
 	if (!out)
 		return NULL;
 
+	out->used_mask = devm_bitmap_alloc(dev, out_sigs, GFP_KERNEL);
+	if (!out->used_mask)
+		return NULL;
+
 	tc->con_in = in;
 	tc->con_out = out;
 	tc->con_in->nr_sigs = in_sigs;
@@ -314,7 +345,6 @@ int cti_add_default_connection(struct device *dev, struct cti_drvdata *drvdata)
 {
 	int ret = 0;
 	int n_trigs = drvdata->config.nr_trig_max;
-	u32 n_trig_mask = GENMASK(n_trigs - 1, 0);
 	struct cti_trig_con *tc = NULL;
 
 	/*
@@ -325,8 +355,9 @@ int cti_add_default_connection(struct device *dev, struct cti_drvdata *drvdata)
 	if (!tc)
 		return -ENOMEM;
 
-	tc->con_in->used_mask = n_trig_mask;
-	tc->con_out->used_mask = n_trig_mask;
+	bitmap_fill(tc->con_in->used_mask, n_trigs);
+	bitmap_fill(tc->con_out->used_mask, n_trigs);
+
 	ret = cti_add_connection_entry(dev, drvdata, tc, NULL, "default");
 	return ret;
 }
@@ -339,7 +370,6 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
 {
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *config = &drvdata->config;
-	u32 trig_bitmask;
 	u32 chan_bitmask;
 	u32 reg_value;
 	int reg_offset;
@@ -349,18 +379,16 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
 	   (trigger_idx >= config->nr_trig_max))
 		return -EINVAL;
 
-	trig_bitmask = BIT(trigger_idx);
-
 	/* ensure registered triggers and not out filtered */
 	if (direction == CTI_TRIG_IN)	{
-		if (!(trig_bitmask & config->trig_in_use))
+		if (!(test_bit(trigger_idx, config->trig_in_use)))
 			return -EINVAL;
 	} else {
-		if (!(trig_bitmask & config->trig_out_use))
+		if (!(test_bit(trigger_idx, config->trig_out_use)))
 			return -EINVAL;
 
 		if ((config->trig_filter_enable) &&
-		    (config->trig_out_filter & trig_bitmask))
+		    test_bit(trigger_idx, config->trig_out_filter))
 			return -EINVAL;
 	}
 
@@ -892,7 +920,9 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
 	raw_spin_lock_init(&drvdata->spinlock);
 
 	/* initialise CTI driver config values */
-	cti_set_default_config(dev, drvdata);
+	ret = cti_set_default_config(dev, drvdata);
+	if (ret)
+		return ret;
 
 	pdata = coresight_cti_get_platform_data(dev);
 	if (IS_ERR(pdata)) {
diff --git a/drivers/hwtracing/coresight/coresight-cti-platform.c b/drivers/hwtracing/coresight/coresight-cti-platform.c
index d0ae10bf6128..4bef860a0484 100644
--- a/drivers/hwtracing/coresight/coresight-cti-platform.c
+++ b/drivers/hwtracing/coresight/coresight-cti-platform.c
@@ -136,8 +136,8 @@ static int cti_plat_create_v8_etm_connection(struct device *dev,
 		goto create_v8_etm_out;
 
 	/* build connection data */
-	tc->con_in->used_mask = 0xF0; /* sigs <4,5,6,7> */
-	tc->con_out->used_mask = 0xF0; /* sigs <4,5,6,7> */
+	bitmap_set(tc->con_in->used_mask, 4, 4); /* sigs <4,5,6,7> */
+	bitmap_set(tc->con_out->used_mask, 4, 4); /* sigs <4,5,6,7> */
 
 	/*
 	 * The EXTOUT type signals from the ETM are connected to a set of input
@@ -194,10 +194,10 @@ static int cti_plat_create_v8_connections(struct device *dev,
 		goto of_create_v8_out;
 
 	/* Set the v8 PE CTI connection data */
-	tc->con_in->used_mask = 0x3; /* sigs <0 1> */
+	bitmap_set(tc->con_in->used_mask, 0, 2); /* sigs <0 1> */
 	tc->con_in->sig_types[0] = PE_DBGTRIGGER;
 	tc->con_in->sig_types[1] = PE_PMUIRQ;
-	tc->con_out->used_mask = 0x7; /* sigs <0 1 2 > */
+	bitmap_set(tc->con_out->used_mask, 0, 3); /* sigs <0 1 2 > */
 	tc->con_out->sig_types[0] = PE_EDBGREQ;
 	tc->con_out->sig_types[1] = PE_DBGRESTART;
 	tc->con_out->sig_types[2] = PE_CTIIRQ;
@@ -213,7 +213,7 @@ static int cti_plat_create_v8_connections(struct device *dev,
 		goto of_create_v8_out;
 
 	/* filter pe_edbgreq - PE trigout sig <0> */
-	drvdata->config.trig_out_filter |= 0x1;
+	set_bit(0, drvdata->config.trig_out_filter);
 
 of_create_v8_out:
 	return ret;
@@ -257,7 +257,7 @@ static int cti_plat_read_trig_group(struct cti_trig_grp *tgrp,
 	if (!err) {
 		/* set the signal usage mask */
 		for (idx = 0; idx < tgrp->nr_sigs; idx++)
-			tgrp->used_mask |= BIT(values[idx]);
+			set_bit(values[idx], tgrp->used_mask);
 	}
 
 	kfree(values);
@@ -331,7 +331,9 @@ static int cti_plat_process_filter_sigs(struct cti_drvdata *drvdata,
 
 	err = cti_plat_read_trig_group(tg, fwnode, CTI_DT_FILTER_OUT_SIGS);
 	if (!err)
-		drvdata->config.trig_out_filter |= tg->used_mask;
+		bitmap_or(drvdata->config.trig_out_filter,
+			  drvdata->config.trig_out_filter,
+			  tg->used_mask, drvdata->config.nr_trig_max);
 
 	kfree(tg);
 	return err;
diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
index 572b80ee96fb..a9df77215141 100644
--- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
@@ -711,10 +711,8 @@ static ssize_t trigout_filtered_show(struct device *dev,
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *cfg = &drvdata->config;
 	int size = 0, nr_trig_max = cfg->nr_trig_max;
-	unsigned long mask = cfg->trig_out_filter;
 
-	if (mask)
-		size = bitmap_print_to_pagebuf(true, buf, &mask, nr_trig_max);
+	size = bitmap_print_to_pagebuf(true, buf, cfg->trig_out_filter, nr_trig_max);
 	return size;
 }
 static DEVICE_ATTR_RO(trigout_filtered);
@@ -926,9 +924,8 @@ static ssize_t trigin_sig_show(struct device *dev,
 	struct cti_trig_con *con = (struct cti_trig_con *)ext_attr->var;
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *cfg = &drvdata->config;
-	unsigned long mask = con->con_in->used_mask;
 
-	return bitmap_print_to_pagebuf(true, buf, &mask, cfg->nr_trig_max);
+	return bitmap_print_to_pagebuf(true, buf, con->con_in->used_mask, cfg->nr_trig_max);
 }
 
 static ssize_t trigout_sig_show(struct device *dev,
@@ -940,9 +937,8 @@ static ssize_t trigout_sig_show(struct device *dev,
 	struct cti_trig_con *con = (struct cti_trig_con *)ext_attr->var;
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *cfg = &drvdata->config;
-	unsigned long mask = con->con_out->used_mask;
 
-	return bitmap_print_to_pagebuf(true, buf, &mask, cfg->nr_trig_max);
+	return bitmap_print_to_pagebuf(true, buf, con->con_out->used_mask, cfg->nr_trig_max);
 }
 
 /* convert a sig type id to a name */
diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
index 4f89091ee93f..e7b88b07cffe 100644
--- a/drivers/hwtracing/coresight/coresight-cti.h
+++ b/drivers/hwtracing/coresight/coresight-cti.h
@@ -68,7 +68,7 @@ struct fwnode_handle;
  */
 struct cti_trig_grp {
 	int nr_sigs;
-	u32 used_mask;
+	unsigned long *used_mask;
 	int sig_types[];
 };
 
@@ -146,20 +146,21 @@ struct cti_config {
 	bool hw_enabled;
 	bool hw_powered;
 
-	/* registered triggers and filtering */
-	u32 trig_in_use;
-	u32 trig_out_use;
-	u32 trig_out_filter;
 	bool trig_filter_enable;
 	u8 xtrig_rchan_sel;
 
 	/* cti cross trig programmable regs */
-	u32 ctiappset;
 	u8 ctiinout_sel;
-	u32 ctiinen[CTIINOUTEN_MAX];
-	u32 ctiouten[CTIINOUTEN_MAX];
+	u32 ctiappset;
 	u32 ctigate;
 	u32 asicctl;
+	u32 *ctiinen;
+	u32 *ctiouten;
+
+	/* registered triggers and filtering */
+	unsigned long *trig_in_use;
+	unsigned long *trig_out_use;
+	unsigned long *trig_out_filter;
 };
 
 /**

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-02  6:42 [PATCH v6 0/2] Add Qualcomm extended CTI support Yingchao Deng
  2025-12-02  6:42 ` [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays Yingchao Deng
@ 2025-12-02  6:42 ` Yingchao Deng
  2025-12-03 18:29   ` Leo Yan
  1 sibling, 1 reply; 26+ messages in thread
From: Yingchao Deng @ 2025-12-02  6:42 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin
  Cc: Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Yingchao Deng,
	Mao Jinlong

The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI.
It allows a debugger to send to trigger events to a processor or to send
a trigger event to one or more processors when a trigger event occurs
on another processor on the same SoC, or even between SoCs. Qualcomm CTI
implementation differs from the standard CTI in the following aspects:

1. The number of supported triggers is extended to 128.
2. Several register offsets differ from the CoreSight specification.

Co-developed-by: Jinlong Mao <jinlong.mao@oss.qualcomm.com>
Signed-off-by: Jinlong Mao <jinlong.mao@oss.qualcomm.com>
Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
---
 .../ABI/testing/sysfs-bus-coresight-devices-cti    |  11 ++
 drivers/hwtracing/coresight/coresight-cti-core.c   |  95 +++++++++--
 drivers/hwtracing/coresight/coresight-cti-sysfs.c  | 177 ++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-cti.h        |  43 ++++-
 drivers/hwtracing/coresight/qcom-cti.h             |  29 ++++
 5 files changed, 304 insertions(+), 51 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-cti b/Documentation/ABI/testing/sysfs-bus-coresight-devices-cti
index a2aef7f5a6d7..9478dfcee03b 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-cti
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-cti
@@ -245,3 +245,14 @@ Date:           Aug 2025
 KernelVersion   6.18
 Contact:        Mao Jinlong <quic_jinlmao@quicinc.com>
 Description:    (Read) Show hardware context information of device.
+
+What:           /sys/bus/coresight/devices/<cti-name>/regs/ext_reg_sel
+Date:           Dec 2025
+KernelVersion:  6.19
+Contact:        Mao Jinlong <jinlong.mao@oss.qualcomm.com>
+Description:    (RW) Select the index for extended registers.
+		QCOM CTI supports up to 128 triggers, there are 6 registers
+		need to be expanded to up to 4 instances:
+		CTITRIGINSTATUS, CTITRIGOUTSTATUS,
+		ITTRIGIN, ITTRIGOUT,
+		ITTRIGINACK, ITTRIGOUTACK.
diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index f9970e40dd59..51b3644e97c4 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -21,6 +21,55 @@
 
 #include "coresight-priv.h"
 #include "coresight-cti.h"
+#include "qcom-cti.h"
+
+static const u32 cti_normal_offset[] = {
+	[INDEX_CTIINTACK]		= CTIINTACK,
+	[INDEX_CTIAPPSET]		= CTIAPPSET,
+	[INDEX_CTIAPPCLEAR]		= CTIAPPCLEAR,
+	[INDEX_CTIAPPPULSE]		= CTIAPPPULSE,
+	[INDEX_CTIINEN]			= CTIINEN(0),
+	[INDEX_CTIOUTEN]		= CTIOUTEN(0),
+	[INDEX_CTITRIGINSTATUS]		= CTITRIGINSTATUS,
+	[INDEX_CTITRIGOUTSTATUS]	= CTITRIGOUTSTATUS,
+	[INDEX_CTICHINSTATUS]		= CTICHINSTATUS,
+	[INDEX_CTICHOUTSTATUS]		= CTICHOUTSTATUS,
+	[INDEX_CTIGATE]			= CTIGATE,
+	[INDEX_ASICCTL]			= ASICCTL,
+	[INDEX_ITCHINACK]		= ITCHINACK,
+	[INDEX_ITTRIGINACK]		= ITTRIGINACK,
+	[INDEX_ITCHOUT]			= ITCHOUT,
+	[INDEX_ITTRIGOUT]		= ITTRIGOUT,
+	[INDEX_ITCHOUTACK]		= ITCHOUTACK,
+	[INDEX_ITTRIGOUTACK]		= ITTRIGOUTACK,
+	[INDEX_ITCHIN]			= ITCHIN,
+	[INDEX_ITTRIGIN]		= ITTRIGIN,
+	[INDEX_ITCTRL]			= CORESIGHT_ITCTRL,
+};
+
+static const u32 cti_extended_offset[] = {
+	[INDEX_CTIINTACK]		= QCOM_CTIINTACK,
+	[INDEX_CTIAPPSET]		= QCOM_CTIAPPSET,
+	[INDEX_CTIAPPCLEAR]		= QCOM_CTIAPPCLEAR,
+	[INDEX_CTIAPPPULSE]		= QCOM_CTIAPPPULSE,
+	[INDEX_CTIINEN]			= QCOM_CTIINEN,
+	[INDEX_CTIOUTEN]		= QCOM_CTIOUTEN,
+	[INDEX_CTITRIGINSTATUS]		= QCOM_CTITRIGINSTATUS,
+	[INDEX_CTITRIGOUTSTATUS]	= QCOM_CTITRIGOUTSTATUS,
+	[INDEX_CTICHINSTATUS]		= QCOM_CTICHINSTATUS,
+	[INDEX_CTICHOUTSTATUS]		= QCOM_CTICHOUTSTATUS,
+	[INDEX_CTIGATE]			= QCOM_CTIGATE,
+	[INDEX_ASICCTL]			= QCOM_ASICCTL,
+	[INDEX_ITCHINACK]		= QCOM_ITCHINACK,
+	[INDEX_ITTRIGINACK]		= QCOM_ITTRIGINACK,
+	[INDEX_ITCHOUT]			= QCOM_ITCHOUT,
+	[INDEX_ITTRIGOUT]		= QCOM_ITTRIGOUT,
+	[INDEX_ITCHOUTACK]		= QCOM_ITCHOUTACK,
+	[INDEX_ITTRIGOUTACK]		= QCOM_ITTRIGOUTACK,
+	[INDEX_ITCHIN]			= QCOM_ITCHIN,
+	[INDEX_ITTRIGIN]		= QCOM_ITTRIGIN,
+	[INDEX_ITCTRL]			= CORESIGHT_ITCTRL,
+};
 
 /*
  * CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
 
 	/* write the CTI trigger registers */
 	for (i = 0; i < config->nr_trig_max; i++) {
-		writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
+		writel_relaxed(config->ctiinen[i],
+			       drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));
 		writel_relaxed(config->ctiouten[i],
-			       drvdata->base + CTIOUTEN(i));
+			       drvdata->base + cti_offset(drvdata, INDEX_CTIOUTEN, i));
 	}
 
 	/* other regs */
-	writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
-	writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
-	writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
+	writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));
+	writel_relaxed(config->asicctl, drvdata->base + cti_offset(drvdata, INDEX_ASICCTL, 0));
+	writel_relaxed(config->ctiappset, drvdata->base + cti_offset(drvdata, INDEX_CTIAPPSET, 0));
 
 	/* re-enable CTI */
 	writel_relaxed(1, drvdata->base + CTICONTROL);
@@ -214,6 +264,9 @@ void cti_write_intack(struct device *dev, u32 ackval)
 /* DEVID[19:16] - number of CTM channels */
 #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))
 
+/* DEVARCH[31:21] - ARCHITECT */
+#define CTI_DEVARCH_ARCHITECT(devarch_val) ((int)BMVAL(devarch_val, 21, 31))
+
 static int cti_set_default_config(struct device *dev,
 				  struct cti_drvdata *drvdata)
 {
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
 
 	/* update the local register values */
 	chan_bitmask = BIT(channel_idx);
-	reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
-		      CTIOUTEN(trigger_idx));
+	reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
+			cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));
 
 	raw_spin_lock(&drvdata->spinlock);
 
@@ -479,19 +532,19 @@ int cti_channel_setop(struct device *dev, enum cti_chan_set_op op,
 	case CTI_CHAN_SET:
 		config->ctiappset |= chan_bitmask;
 		reg_value  = config->ctiappset;
-		reg_offset = CTIAPPSET;
+		reg_offset = cti_offset(drvdata, INDEX_CTIAPPSET, 0);
 		break;
 
 	case CTI_CHAN_CLR:
 		config->ctiappset &= ~chan_bitmask;
 		reg_value = chan_bitmask;
-		reg_offset = CTIAPPCLEAR;
+		reg_offset = cti_offset(drvdata, INDEX_CTIAPPCLEAR, 0);
 		break;
 
 	case CTI_CHAN_PULSE:
 		config->ctiappset &= ~chan_bitmask;
 		reg_value = chan_bitmask;
-		reg_offset = CTIAPPPULSE;
+		reg_offset = cti_offset(drvdata, INDEX_CTIAPPPULSE, 0);
 		break;
 
 	default:
@@ -895,6 +948,7 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
 	struct coresight_desc cti_desc;
 	struct coresight_platform_data *pdata = NULL;
 	struct resource *res = &adev->res;
+	u32 devarch;
 
 	/* driver data*/
 	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
 	drvdata->csdev_release = drvdata->csdev->dev.release;
 	drvdata->csdev->dev.release = cti_device_release;
 
+	/* check architect value*/
+	devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
+	if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
+		drvdata->subtype = QCOM_CTI;
+		drvdata->offsets = cti_extended_offset;
+		/*
+		 * QCOM CTI does not implement Claimtag functionality as
+		 * per CoreSight specification, but its CLAIMSET register
+		 * is incorrectly initialized to 0xF. This can mislead
+		 * tools or drivers into thinking the component is claimed.
+		 *
+		 * Reset CLAIMSET to 0 to reflect that no claims are active.
+		 */
+		writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
+	} else {
+		drvdata->subtype = ARM_STD_CTI;
+		drvdata->offsets = cti_normal_offset;
+	}
+
 	/* all done - dec pm refcount */
 	pm_runtime_put(&adev->dev);
-	dev_info(&drvdata->csdev->dev, "CTI initialized\n");
+	dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);
 	return 0;
 
 pm_release:
diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
index a9df77215141..12a495382999 100644
--- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
@@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
 
 /* register based attributes */
 
-/* Read registers with power check only (no enable check). */
-static ssize_t coresight_cti_reg_show(struct device *dev,
-			   struct device_attribute *attr, char *buf)
+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
+					   struct device_attribute *attr, char *buf)
 {
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
@@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
 	return sysfs_emit(buf, "0x%x\n", val);
 }
 
+/* Read registers with power check only (no enable check). */
+static ssize_t coresight_cti_reg_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
+	u32 idx, val = 0;
+
+	pm_runtime_get_sync(dev->parent);
+	raw_spin_lock(&drvdata->spinlock);
+	idx = drvdata->config.ext_reg_sel;
+	if (drvdata->config.hw_powered) {
+		switch (cti_attr->off) {
+		case INDEX_CTITRIGINSTATUS:
+		case INDEX_CTITRIGOUTSTATUS:
+		case INDEX_ITTRIGINACK:
+		case INDEX_ITTRIGOUT:
+		case INDEX_ITTRIGOUTACK:
+		case INDEX_ITTRIGIN:
+			val = readl_relaxed(drvdata->base +
+					    cti_offset(drvdata, cti_attr->off, idx));
+			break;
+
+		default:
+			val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));
+			break;
+		}
+	}
+
+	raw_spin_unlock(&drvdata->spinlock);
+	pm_runtime_put_sync(dev->parent);
+	return sysfs_emit(buf, "0x%x\n", val);
+}
+
 /* Write registers with power check only (no enable check). */
 static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
 						      struct device_attribute *attr,
@@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
 	unsigned long val = 0;
+	u32 idx;
 
 	if (kstrtoul(buf, 0, &val))
 		return -EINVAL;
 
 	pm_runtime_get_sync(dev->parent);
 	raw_spin_lock(&drvdata->spinlock);
-	if (drvdata->config.hw_powered)
-		cti_write_single_reg(drvdata, cti_attr->off, val);
+	idx = drvdata->config.ext_reg_sel;
+	if (drvdata->config.hw_powered) {
+		switch (cti_attr->off) {
+		case INDEX_ITTRIGINACK:
+		case INDEX_ITTRIGOUT:
+			cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);
+			break;
+
+		default:
+			cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);
+			break;
+		}
+	}
 	raw_spin_unlock(&drvdata->spinlock);
 	pm_runtime_put_sync(dev->parent);
 	return size;
 }
 
+#define coresight_cti_mgmt_reg(name, offset)	                        \
+	(&((struct cs_off_attribute[]) {                                \
+	   {                                                            \
+		__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL),  \
+		offset							\
+	   }								\
+	})[0].attr.attr)
+
 #define coresight_cti_reg(name, offset)					\
 	(&((struct cs_off_attribute[]) {				\
 	   {								\
@@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
 
 /* coresight management registers */
 static struct attribute *coresight_cti_mgmt_attrs[] = {
-	coresight_cti_reg(devaff0, CTIDEVAFF0),
-	coresight_cti_reg(devaff1, CTIDEVAFF1),
-	coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
-	coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
-	coresight_cti_reg(devid, CORESIGHT_DEVID),
-	coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
-	coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
-	coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
-	coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
-	coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
-	coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
+	coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
+	coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
+	coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
+	coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
+	coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
+	coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
+	coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
+	coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
+	coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
+	coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
+	coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),
 	NULL,
 };
 
@@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
  * If inaccessible & pcached_val not NULL then show cached value.
  */
 static ssize_t cti_reg32_show(struct device *dev, char *buf,
-			      u32 *pcached_val, int reg_offset)
+			      u32 *pcached_val, int index)
 {
 	u32 val = 0;
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *config = &drvdata->config;
+	int reg_offset;
 
 	raw_spin_lock(&drvdata->spinlock);
+	reg_offset = cti_offset(drvdata, index, 0);
 	if ((reg_offset >= 0) && cti_active(config)) {
 		CS_UNLOCK(drvdata->base);
 		val = readl_relaxed(drvdata->base + reg_offset);
@@ -284,11 +339,12 @@ static ssize_t cti_reg32_show(struct device *dev, char *buf,
  * if reg_offset >= 0 then write through if enabled.
  */
 static ssize_t cti_reg32_store(struct device *dev, const char *buf,
-			       size_t size, u32 *pcached_val, int reg_offset)
+			       size_t size, u32 *pcached_val, int index)
 {
 	unsigned long val;
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
 	struct cti_config *config = &drvdata->config;
+	int reg_offset;
 
 	if (kstrtoul(buf, 0, &val))
 		return -EINVAL;
@@ -298,6 +354,7 @@ static ssize_t cti_reg32_store(struct device *dev, const char *buf,
 	if (pcached_val)
 		*pcached_val = (u32)val;
 
+	reg_offset = cti_offset(drvdata, index, 0);
 	/* write through if offset and enabled */
 	if ((reg_offset >= 0) && cti_active(config))
 		cti_write_single_reg(drvdata, reg_offset, val);
@@ -306,14 +363,14 @@ static ssize_t cti_reg32_store(struct device *dev, const char *buf,
 }
 
 /* Standard macro for simple rw cti config registers */
-#define cti_config_reg32_rw(name, cfgname, offset)			\
+#define cti_config_reg32_rw(name, cfgname, index)			\
 static ssize_t name##_show(struct device *dev,				\
 			   struct device_attribute *attr,		\
 			   char *buf)					\
 {									\
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);	\
 	return cti_reg32_show(dev, buf,					\
-			      &drvdata->config.cfgname, offset);	\
+			      &drvdata->config.cfgname, index);		\
 }									\
 									\
 static ssize_t name##_store(struct device *dev,				\
@@ -322,7 +379,7 @@ static ssize_t name##_store(struct device *dev,				\
 {									\
 	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);	\
 	return cti_reg32_store(dev, buf, size,				\
-			       &drvdata->config.cfgname, offset);	\
+			       &drvdata->config.cfgname, index);	\
 }									\
 static DEVICE_ATTR_RW(name)
 
@@ -356,6 +413,46 @@ static ssize_t inout_sel_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(inout_sel);
 
+/*
+ * QCOM CTI supports up to 128 triggers, there are 6 registers need to be
+ * expanded to up to 4 instances, and ext_reg_sel can be used to indicate
+ * which one is in use.
+ * CTITRIGINSTATUS, CTITRIGOUTSTATUS,
+ * ITTRIGIN, ITTRIGOUT,
+ * ITTRIGINACK, ITTRIGOUTACK.
+ */
+static ssize_t ext_reg_sel_show(struct device *dev,
+				struct device_attribute *attr,
+				char *buf)
+{
+	u32 val;
+	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	raw_spin_lock(&drvdata->spinlock);
+	val = drvdata->config.ext_reg_sel;
+	raw_spin_unlock(&drvdata->spinlock);
+	return sprintf(buf, "%d\n", val);
+}
+
+static ssize_t ext_reg_sel_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	unsigned long val;
+	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	if (kstrtoul(buf, 0, &val))
+		return -EINVAL;
+	if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
+		return -EINVAL;
+
+	raw_spin_lock(&drvdata->spinlock);
+	drvdata->config.ext_reg_sel = val;
+	raw_spin_unlock(&drvdata->spinlock);
+	return size;
+}
+static DEVICE_ATTR_RW(ext_reg_sel);
+
 static ssize_t inen_show(struct device *dev,
 			 struct device_attribute *attr,
 			 char *buf)
@@ -389,7 +486,7 @@ static ssize_t inen_store(struct device *dev,
 
 	/* write through if enabled */
 	if (cti_active(config))
-		cti_write_single_reg(drvdata, CTIINEN(index), val);
+		cti_write_single_reg(drvdata, cti_offset(drvdata, INDEX_CTIINEN, index), val);
 	raw_spin_unlock(&drvdata->spinlock);
 	return size;
 }
@@ -428,7 +525,7 @@ static ssize_t outen_store(struct device *dev,
 
 	/* write through if enabled */
 	if (cti_active(config))
-		cti_write_single_reg(drvdata, CTIOUTEN(index), val);
+		cti_write_single_reg(drvdata, cti_offset(drvdata, INDEX_CTIOUTEN, index), val);
 	raw_spin_unlock(&drvdata->spinlock);
 	return size;
 }
@@ -448,9 +545,9 @@ static ssize_t intack_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(intack);
 
-cti_config_reg32_rw(gate, ctigate, CTIGATE);
-cti_config_reg32_rw(asicctl, asicctl, ASICCTL);
-cti_config_reg32_rw(appset, ctiappset, CTIAPPSET);
+cti_config_reg32_rw(gate, ctigate, INDEX_CTIGATE);
+cti_config_reg32_rw(asicctl, asicctl, INDEX_ASICCTL);
+cti_config_reg32_rw(appset, ctiappset, INDEX_CTIAPPSET);
 
 static ssize_t appclear_store(struct device *dev,
 			      struct device_attribute *attr,
@@ -504,6 +601,7 @@ static DEVICE_ATTR_WO(apppulse);
  */
 static struct attribute *coresight_cti_regs_attrs[] = {
 	&dev_attr_inout_sel.attr,
+	&dev_attr_ext_reg_sel.attr,
 	&dev_attr_inen.attr,
 	&dev_attr_outen.attr,
 	&dev_attr_gate.attr,
@@ -512,20 +610,20 @@ static struct attribute *coresight_cti_regs_attrs[] = {
 	&dev_attr_appset.attr,
 	&dev_attr_appclear.attr,
 	&dev_attr_apppulse.attr,
-	coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
-	coresight_cti_reg(trigoutstatus, CTITRIGOUTSTATUS),
-	coresight_cti_reg(chinstatus, CTICHINSTATUS),
-	coresight_cti_reg(choutstatus, CTICHOUTSTATUS),
+	coresight_cti_reg(triginstatus, INDEX_CTITRIGINSTATUS),
+	coresight_cti_reg(trigoutstatus, INDEX_CTITRIGOUTSTATUS),
+	coresight_cti_reg(chinstatus, INDEX_CTICHINSTATUS),
+	coresight_cti_reg(choutstatus, INDEX_CTICHOUTSTATUS),
 #ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
-	coresight_cti_reg_rw(itctrl, CORESIGHT_ITCTRL),
-	coresight_cti_reg(ittrigin, ITTRIGIN),
-	coresight_cti_reg(itchin, ITCHIN),
-	coresight_cti_reg_rw(ittrigout, ITTRIGOUT),
-	coresight_cti_reg_rw(itchout, ITCHOUT),
-	coresight_cti_reg(itchoutack, ITCHOUTACK),
-	coresight_cti_reg(ittrigoutack, ITTRIGOUTACK),
-	coresight_cti_reg_wo(ittriginack, ITTRIGINACK),
-	coresight_cti_reg_wo(itchinack, ITCHINACK),
+	coresight_cti_reg_rw(itctrl, INDEX_ITCTRL),
+	coresight_cti_reg(ittrigin, INDEX_ITTRIGIN),
+	coresight_cti_reg(itchin, INDEX_ITCHIN),
+	coresight_cti_reg_rw(ittrigout, INDEX_ITTRIGOUT),
+	coresight_cti_reg_rw(itchout, INDEX_ITCHOUT),
+	coresight_cti_reg(itchoutack, INDEX_ITCHOUTACK),
+	coresight_cti_reg(ittrigoutack, INDEX_ITTRIGOUTACK),
+	coresight_cti_reg_wo(ittriginack, INDEX_ITTRIGINACK),
+	coresight_cti_reg_wo(itchinack, INDEX_ITCHINACK),
 #endif
 	NULL,
 };
@@ -740,6 +838,7 @@ static ssize_t chan_xtrigs_reset_store(struct device *dev,
 	config->ctiappset = 0;
 	config->ctiinout_sel = 0;
 	config->xtrig_rchan_sel = 0;
+	config->ext_reg_sel = 0;
 
 	/* if enabled then write through */
 	if (cti_active(config))
diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
index e7b88b07cffe..fdd6d8892a95 100644
--- a/drivers/hwtracing/coresight/coresight-cti.h
+++ b/drivers/hwtracing/coresight/coresight-cti.h
@@ -57,7 +57,38 @@ struct fwnode_handle;
  * Max of in and out defined in the DEVID register.
  * - pick up actual number used from .dts parameters if present.
  */
-#define CTIINOUTEN_MAX		32
+#define CTIINOUTEN_MAX		128
+
+/* Qcom CTI supports up to 128 triggers*/
+enum cti_subtype {
+	ARM_STD_CTI,
+	QCOM_CTI,
+};
+
+/* These registers are remapped in Qcom CTI*/
+enum cti_offset_index {
+	INDEX_CTIINTACK,
+	INDEX_CTIAPPSET,
+	INDEX_CTIAPPCLEAR,
+	INDEX_CTIAPPPULSE,
+	INDEX_CTIINEN,
+	INDEX_CTIOUTEN,
+	INDEX_CTITRIGINSTATUS,
+	INDEX_CTITRIGOUTSTATUS,
+	INDEX_CTICHINSTATUS,
+	INDEX_CTICHOUTSTATUS,
+	INDEX_CTIGATE,
+	INDEX_ASICCTL,
+	INDEX_ITCHINACK,
+	INDEX_ITTRIGINACK,
+	INDEX_ITCHOUT,
+	INDEX_ITTRIGOUT,
+	INDEX_ITCHOUTACK,
+	INDEX_ITTRIGOUTACK,
+	INDEX_ITCHIN,
+	INDEX_ITTRIGIN,
+	INDEX_ITCTRL,
+};
 
 /**
  * Group of related trigger signals
@@ -149,6 +180,9 @@ struct cti_config {
 	bool trig_filter_enable;
 	u8 xtrig_rchan_sel;
 
+	/* qcom_cti regs' index */
+	u8 ext_reg_sel;
+
 	/* cti cross trig programmable regs */
 	u8 ctiinout_sel;
 	u32 ctiappset;
@@ -181,6 +215,8 @@ struct cti_drvdata {
 	struct cti_config config;
 	struct list_head node;
 	void (*csdev_release)(struct device *dev);
+	enum cti_subtype subtype;
+	const u32 *offsets;
 };
 
 /*
@@ -235,6 +271,11 @@ struct coresight_platform_data *
 coresight_cti_get_platform_data(struct device *dev);
 const char *cti_plat_get_node_name(struct fwnode_handle *fwnode);
 
+static inline u32 cti_offset(struct cti_drvdata *drvdata, int index, int num)
+{
+	return drvdata->offsets[index] + 4 * num;
+}
+
 /* cti powered and enabled */
 static inline bool cti_active(struct cti_config *cfg)
 {
diff --git a/drivers/hwtracing/coresight/qcom-cti.h b/drivers/hwtracing/coresight/qcom-cti.h
new file mode 100644
index 000000000000..3c46317dd81f
--- /dev/null
+++ b/drivers/hwtracing/coresight/qcom-cti.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+#define ARCHITECT_QCOM 0x477
+
+/* CTI programming registers */
+#define	QCOM_CTIINTACK		0x020
+#define	QCOM_CTIAPPSET		0x004
+#define	QCOM_CTIAPPCLEAR	0x008
+#define	QCOM_CTIAPPPULSE	0x00C
+#define	QCOM_CTIINEN		0x400
+#define	QCOM_CTIOUTEN		0x800
+#define	QCOM_CTITRIGINSTATUS	0x040
+#define	QCOM_CTITRIGOUTSTATUS	0x060
+#define	QCOM_CTICHINSTATUS	0x080
+#define	QCOM_CTICHOUTSTATUS	0x084
+#define	QCOM_CTIGATE		0x088
+#define	QCOM_ASICCTL		0x08c
+/* Integration test registers */
+#define	QCOM_ITCHINACK		0xE70
+#define	QCOM_ITTRIGINACK	0xE80
+#define	QCOM_ITCHOUT		0xE74
+#define	QCOM_ITTRIGOUT		0xEA0
+#define	QCOM_ITCHOUTACK		0xE78
+#define	QCOM_ITTRIGOUTACK	0xEC0
+#define	QCOM_ITCHIN		0xE7C
+#define	QCOM_ITTRIGIN		0xEE0

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-02  6:42 ` [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support Yingchao Deng
@ 2025-12-03 18:29   ` Leo Yan
  2025-12-04  8:38     ` Leo Yan
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Leo Yan @ 2025-12-03 18:29 UTC (permalink / raw)
  To: Yingchao Deng
  Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
> The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI.
> It allows a debugger to send to trigger events to a processor or to send
> a trigger event to one or more processors when a trigger event occurs
> on another processor on the same SoC, or even between SoCs. Qualcomm CTI
> implementation differs from the standard CTI in the following aspects:
> 
> 1. The number of supported triggers is extended to 128.
> 2. Several register offsets differ from the CoreSight specification.

I apologize for my late review of this series.  For easier maintenance
later, I have several comments for register access.

[...]

> +static const u32 cti_normal_offset[] = {
> +	[INDEX_CTIINTACK]		= CTIINTACK,
> +	[INDEX_CTIAPPSET]		= CTIAPPSET,
> +	[INDEX_CTIAPPCLEAR]		= CTIAPPCLEAR,
> +	[INDEX_CTIAPPPULSE]		= CTIAPPPULSE,
> +	[INDEX_CTIINEN]			= CTIINEN(0),
> +	[INDEX_CTIOUTEN]		= CTIOUTEN(0),

I prefer to update the these two macros to CTIINENn and CTIOUTENn, as
later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.

> +	[INDEX_CTITRIGINSTATUS]		= CTITRIGINSTATUS,
> +	[INDEX_CTITRIGOUTSTATUS]	= CTITRIGOUTSTATUS,
> +	[INDEX_CTICHINSTATUS]		= CTICHINSTATUS,
> +	[INDEX_CTICHOUTSTATUS]		= CTICHOUTSTATUS,
> +	[INDEX_CTIGATE]			= CTIGATE,
> +	[INDEX_ASICCTL]			= ASICCTL,
> +	[INDEX_ITCHINACK]		= ITCHINACK,
> +	[INDEX_ITTRIGINACK]		= ITTRIGINACK,
> +	[INDEX_ITCHOUT]			= ITCHOUT,
> +	[INDEX_ITTRIGOUT]		= ITTRIGOUT,
> +	[INDEX_ITCHOUTACK]		= ITCHOUTACK,
> +	[INDEX_ITTRIGOUTACK]		= ITTRIGOUTACK,
> +	[INDEX_ITCHIN]			= ITCHIN,
> +	[INDEX_ITTRIGIN]		= ITTRIGIN,
> +	[INDEX_ITCTRL]			= CORESIGHT_ITCTRL,
> +};
> +
> +static const u32 cti_extended_offset[] = {
> +	[INDEX_CTIINTACK]		= QCOM_CTIINTACK,
> +	[INDEX_CTIAPPSET]		= QCOM_CTIAPPSET,
> +	[INDEX_CTIAPPCLEAR]		= QCOM_CTIAPPCLEAR,
> +	[INDEX_CTIAPPPULSE]		= QCOM_CTIAPPPULSE,
> +	[INDEX_CTIINEN]			= QCOM_CTIINEN,
> +	[INDEX_CTIOUTEN]		= QCOM_CTIOUTEN,
> +	[INDEX_CTITRIGINSTATUS]		= QCOM_CTITRIGINSTATUS,
> +	[INDEX_CTITRIGOUTSTATUS]	= QCOM_CTITRIGOUTSTATUS,
> +	[INDEX_CTICHINSTATUS]		= QCOM_CTICHINSTATUS,
> +	[INDEX_CTICHOUTSTATUS]		= QCOM_CTICHOUTSTATUS,
> +	[INDEX_CTIGATE]			= QCOM_CTIGATE,
> +	[INDEX_ASICCTL]			= QCOM_ASICCTL,
> +	[INDEX_ITCHINACK]		= QCOM_ITCHINACK,
> +	[INDEX_ITTRIGINACK]		= QCOM_ITTRIGINACK,
> +	[INDEX_ITCHOUT]			= QCOM_ITCHOUT,
> +	[INDEX_ITTRIGOUT]		= QCOM_ITTRIGOUT,
> +	[INDEX_ITCHOUTACK]		= QCOM_ITCHOUTACK,
> +	[INDEX_ITTRIGOUTACK]		= QCOM_ITTRIGOUTACK,
> +	[INDEX_ITCHIN]			= QCOM_ITCHIN,
> +	[INDEX_ITTRIGIN]		= QCOM_ITTRIGIN,
> +	[INDEX_ITCTRL]			= CORESIGHT_ITCTRL,
> +};

I saw CTI registers are within 4KiB (0x1000), we can don't convert
standard regiserts and only convert to QCOM register based on the
standard ones.  So you can drop the cti_normal_offset strucuture and
only have a cti_reg_qcom_offset[] struct:

  static const u32 cti_extended_offset[] = {
  	[CTIINTACK]		= QCOM_CTIINTACK,
  	[CTIAPPSET]		= QCOM_CTIAPPSET,
  	[CTIAPPCLEAR]		= QCOM_CTIAPPCLEAR,
  	[CTIAPPPULSE]		= QCOM_CTIAPPPULSE,
  	[CTIINEN]		= QCOM_CTIINEN,
        ...
  };

Then you could create two helpers for register address:

    static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
                                              u32 reg, u32 nr)
    {
        /* convert to qcom specific offset */
        if (unlikely(drvdata->is_qcom_cti))
            reg = cti_extended_offset[reg];

        return drvdata->base + reg + sizeof(u32) * nr;
    }

    static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
    {
        return cti_reg_addr_with_nr(drvdata, reg, 0);
    }

>  /*
>   * CTI devices can be associated with a PE, or be connected to CoreSight
> @@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
>  
>  	/* write the CTI trigger registers */
>  	for (i = 0; i < config->nr_trig_max; i++) {
> -		writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
> +		writel_relaxed(config->ctiinen[i],
> +			       drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));

writel_relaxed(config->ctiinen[i],
               cti_reg_addr_with_nr(drvdata, CTIINENn, i));

And apply for the same cases below.

>  	/* other regs */
> -	writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
> -	writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
> -	writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
> +	writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));

writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));

And apply for the same cases below.

[...]

> @@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
>  
>  	/* update the local register values */
>  	chan_bitmask = BIT(channel_idx);
> -	reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
> -		      CTIOUTEN(trigger_idx));
> +	reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
> +			cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));

For readable, we can improve a bit with code alignment:

  reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) :
                                            cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);

[...]

> @@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>  	drvdata->csdev_release = drvdata->csdev->dev.release;
>  	drvdata->csdev->dev.release = cti_device_release;
>  
> +	/* check architect value*/
> +	devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
> +	if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
> +		drvdata->subtype = QCOM_CTI;
> +		drvdata->offsets = cti_extended_offset;

As a result, we can only set the is_qcom_cti flag:

  drvdata->is_qcom_cti = true;

> +		/*
> +		 * QCOM CTI does not implement Claimtag functionality as
> +		 * per CoreSight specification, but its CLAIMSET register
> +		 * is incorrectly initialized to 0xF. This can mislead
> +		 * tools or drivers into thinking the component is claimed.
> +		 *
> +		 * Reset CLAIMSET to 0 to reflect that no claims are active.
> +		 */
> +		writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);

I am confused for this.  If QCOM CTI does not implement claim tag,
then what is the designed register at the offset CORESIGHT_CLAIMSET?

Should you bypass all claim tag related operations for QCOM CTI case?
(I don't see you touch anything for claim and declaim tags).

> +	} else {
> +		drvdata->subtype = ARM_STD_CTI;
> +		drvdata->offsets = cti_normal_offset;
> +	}
> +
>  	/* all done - dec pm refcount */
>  	pm_runtime_put(&adev->dev);
> -	dev_info(&drvdata->csdev->dev, "CTI initialized\n");
> +	dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);

dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
         drvdata->is_qcom_cti ? "QCOM" : "");

>  	return 0;
>  
>  pm_release:
> diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> index a9df77215141..12a495382999 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
>  
>  /* register based attributes */
>  
> -/* Read registers with power check only (no enable check). */
> -static ssize_t coresight_cti_reg_show(struct device *dev,
> -			   struct device_attribute *attr, char *buf)
> +static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
> +					   struct device_attribute *attr, char *buf)
>  {
>  	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>  	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
>  	return sysfs_emit(buf, "0x%x\n", val);
>  }
>  
> +/* Read registers with power check only (no enable check). */
> +static ssize_t coresight_cti_reg_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> +	u32 idx, val = 0;
> +
> +	pm_runtime_get_sync(dev->parent);
> +	raw_spin_lock(&drvdata->spinlock);
> +	idx = drvdata->config.ext_reg_sel;
> +	if (drvdata->config.hw_powered) {
> +		switch (cti_attr->off) {
> +		case INDEX_CTITRIGINSTATUS:
> +		case INDEX_CTITRIGOUTSTATUS:
> +		case INDEX_ITTRIGINACK:
> +		case INDEX_ITTRIGOUT:
> +		case INDEX_ITTRIGOUTACK:
> +		case INDEX_ITTRIGIN:
> +			val = readl_relaxed(drvdata->base +
> +					    cti_offset(drvdata, cti_attr->off, idx));
> +			break;
> +
> +		default:
> +			val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));
> +			break;
> +		}
> +	}
> +
> +	raw_spin_unlock(&drvdata->spinlock);
> +	pm_runtime_put_sync(dev->parent);
> +	return sysfs_emit(buf, "0x%x\n", val);
> +}
> +
>  /* Write registers with power check only (no enable check). */
>  static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
>  						      struct device_attribute *attr,
> @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
>  	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>  	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
>  	unsigned long val = 0;
> +	u32 idx;
>  
>  	if (kstrtoul(buf, 0, &val))
>  		return -EINVAL;
>  
>  	pm_runtime_get_sync(dev->parent);
>  	raw_spin_lock(&drvdata->spinlock);
> -	if (drvdata->config.hw_powered)
> -		cti_write_single_reg(drvdata, cti_attr->off, val);
> +	idx = drvdata->config.ext_reg_sel;
> +	if (drvdata->config.hw_powered) {
> +		switch (cti_attr->off) {
> +		case INDEX_ITTRIGINACK:
> +		case INDEX_ITTRIGOUT:
> +			cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);
> +			break;
> +
> +		default:
> +			cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);
> +			break;
> +		}
> +	}

For both coresight_cti_reg_show() and coresight_cti_reg_store(), can
we always use "cti_attr->off" as the offset for regitser access?  I
mean we don't need the extra config.ext_reg_sel, eventually any
register we can calculate a offset for it.

>  	raw_spin_unlock(&drvdata->spinlock);
>  	pm_runtime_put_sync(dev->parent);
>  	return size;
>  }
>  
> +#define coresight_cti_mgmt_reg(name, offset)	                        \
> +	(&((struct cs_off_attribute[]) {                                \
> +	   {                                                            \
> +		__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL),  \
> +		offset							\
> +	   }								\
> +	})[0].attr.attr)
> +
>  #define coresight_cti_reg(name, offset)					\
>  	(&((struct cs_off_attribute[]) {				\
>  	   {								\
> @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
>  
>  /* coresight management registers */
>  static struct attribute *coresight_cti_mgmt_attrs[] = {
> -	coresight_cti_reg(devaff0, CTIDEVAFF0),
> -	coresight_cti_reg(devaff1, CTIDEVAFF1),
> -	coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
> -	coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
> -	coresight_cti_reg(devid, CORESIGHT_DEVID),
> -	coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
> -	coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
> -	coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
> -	coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
> -	coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
> -	coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
> +	coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
> +	coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
> +	coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
> +	coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
> +	coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
> +	coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
> +	coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
> +	coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
> +	coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
> +	coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
> +	coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),

I don't see any benefit for updating from coresight_cti_reg() to
coresight_cti_mgmt_reg().  If really want to do this, should remove
the macro coresight_cti_reg()?

>  	NULL,
>  };
>  
> @@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
>   * If inaccessible & pcached_val not NULL then show cached value.
>   */
>  static ssize_t cti_reg32_show(struct device *dev, char *buf,
> -			      u32 *pcached_val, int reg_offset)
> +			      u32 *pcached_val, int index)

We don't need to change anything for this.  The passed "reg_offset"
should be always a final offset, no matter for standard CTI or QCOM
case, the driver directly uses the offset for register access.

[...]

> +/*
> + * QCOM CTI supports up to 128 triggers, there are 6 registers need to be
> + * expanded to up to 4 instances, and ext_reg_sel can be used to indicate
> + * which one is in use.
> + * CTITRIGINSTATUS, CTITRIGOUTSTATUS,
> + * ITTRIGIN, ITTRIGOUT,
> + * ITTRIGINACK, ITTRIGOUTACK.
> + */
> +static ssize_t ext_reg_sel_show(struct device *dev,
> +				struct device_attribute *attr,
> +				char *buf)
> +{
> +	u32 val;
> +	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	raw_spin_lock(&drvdata->spinlock);
> +	val = drvdata->config.ext_reg_sel;
> +	raw_spin_unlock(&drvdata->spinlock);
> +	return sprintf(buf, "%d\n", val);
> +}
> +
> +static ssize_t ext_reg_sel_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t size)
> +{
> +	unsigned long val;
> +	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	if (kstrtoul(buf, 0, &val))
> +		return -EINVAL;
> +	if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
> +		return -EINVAL;
> +
> +	raw_spin_lock(&drvdata->spinlock);
> +	drvdata->config.ext_reg_sel = val;
> +	raw_spin_unlock(&drvdata->spinlock);
> +	return size;
> +}

As said, I don't think the trigger register is any different from
other register access.  So the existed APIs would be sufficient.

As a result, we don't need to add two above functions.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-03 18:29   ` Leo Yan
@ 2025-12-04  8:38     ` Leo Yan
  2025-12-04  9:04       ` Mike Leach
  2025-12-04  9:07     ` Mike Leach
  2025-12-04  9:15     ` Mike Leach
  2 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-04  8:38 UTC (permalink / raw)
  To: Yingchao Deng, Mike Leach, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:

[...]

> > +/* Read registers with power check only (no enable check). */
> > +static ssize_t coresight_cti_reg_show(struct device *dev,
> > +				      struct device_attribute *attr, char *buf)
> > +{
> > +	struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +	struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > +	u32 idx, val = 0;
> > +
> > +	pm_runtime_get_sync(dev->parent);
> > +	raw_spin_lock(&drvdata->spinlock);
> > +	idx = drvdata->config.ext_reg_sel;
> > +	if (drvdata->config.hw_powered) {
> > +		switch (cti_attr->off) {
> > +		case INDEX_CTITRIGINSTATUS:
> > +		case INDEX_CTITRIGOUTSTATUS:
> > +		case INDEX_ITTRIGINACK:
> > +		case INDEX_ITTRIGOUT:
> > +		case INDEX_ITTRIGOUTACK:
> > +		case INDEX_ITTRIGIN:

I read again and now I understand why you need "config.ext_reg_sel"
as an index for these expending registers.

I think you should extend attrs for the new adding registers:

  static struct attribute *coresight_cti_regs_attrs[] = {
      ...
      coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
      /* Qcom CTI only for triginstatus1/2/3 */
      coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4),
      coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8),
      coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc),
      ...
  }

Then, you can add a is_visible() in coresight_cti_regs_group:

  static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
                  struct attribute *attr, int n)
  {
          struct device *dev = container_of(kobj, struct device, kobj);
          struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);

          /* Mute QCOM CTI registers for standard CTI module */
          if (!drvdata->is_qcom_cti) {
              if (attr == &triginstatus1.attr ||
                  attr == &triginstatus2.attr ||
                  attr == &triginstatus3.attr)
                  return 0;
          }

          return attr->mode;
  }

  static const struct attribute_group coresight_cti_regs_group = {
          .attrs = coresight_cti_regs_attrs,
          .name = "regs",
          .is_visible = coresight_cti_regs_is_visible,
  };

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04  8:38     ` Leo Yan
@ 2025-12-04  9:04       ` Mike Leach
  2025-12-04 10:02         ` Leo Yan
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Leach @ 2025-12-04  9:04 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, James Clark, Alexander Shishkin, Tingwei Zhang,
	quic_yingdeng, coresight, linux-arm-kernel, linux-kernel,
	linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi,

On Thu, 4 Dec 2025 at 08:38, Leo Yan <leo.yan@arm.com> wrote:
>
> On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
>
> [...]
>
> > > +/* Read registers with power check only (no enable check). */
> > > +static ssize_t coresight_cti_reg_show(struct device *dev,
> > > +                                 struct device_attribute *attr, char *buf)
> > > +{
> > > +   struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > > +   struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > > +   u32 idx, val = 0;
> > > +
> > > +   pm_runtime_get_sync(dev->parent);
> > > +   raw_spin_lock(&drvdata->spinlock);
> > > +   idx = drvdata->config.ext_reg_sel;
> > > +   if (drvdata->config.hw_powered) {
> > > +           switch (cti_attr->off) {
> > > +           case INDEX_CTITRIGINSTATUS:
> > > +           case INDEX_CTITRIGOUTSTATUS:
> > > +           case INDEX_ITTRIGINACK:
> > > +           case INDEX_ITTRIGOUT:
> > > +           case INDEX_ITTRIGOUTACK:
> > > +           case INDEX_ITTRIGIN:
>
> I read again and now I understand why you need "config.ext_reg_sel"
> as an index for these expending registers.
>

Having this index for these extended registers matches what we do for
the INEN and OUTEN registers. This gives the user a consistent
approach. We do not want the unnecessary attributes as  it will
increase the memory footprint for all cti instances, not just the qcom
ones.

The first patch in this series works to reduce the memory footprint by
only allocating resource based on the actual configuration. For
example for an ARM designed CTI with 8 trigger registers, we no longer
declare static 128 x 32 bit arrays for each of INEN and OUTEN which
were required by the original design.

Given that there can be 10s or 100s of CTIs in a large multicore
system, reducing the footprint to match the actual configuration, and
offering a level of compression by using an index + single file to
access a set of registers improves the efficiency of the driver.

Regards

Mike

> I think you should extend attrs for the new adding registers:
>
>   static struct attribute *coresight_cti_regs_attrs[] = {
>       ...
>       coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
>       /* Qcom CTI only for triginstatus1/2/3 */
>       coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4),
>       coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8),
>       coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc),
>       ...
>   }
>
> Then, you can add a is_visible() in coresight_cti_regs_group:
>
>   static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
>                   struct attribute *attr, int n)
>   {
>           struct device *dev = container_of(kobj, struct device, kobj);
>           struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>
>           /* Mute QCOM CTI registers for standard CTI module */
>           if (!drvdata->is_qcom_cti) {
>               if (attr == &triginstatus1.attr ||
>                   attr == &triginstatus2.attr ||
>                   attr == &triginstatus3.attr)
>                   return 0;
>           }
>
>           return attr->mode;
>   }
>
>   static const struct attribute_group coresight_cti_regs_group = {
>           .attrs = coresight_cti_regs_attrs,
>           .name = "regs",
>           .is_visible = coresight_cti_regs_is_visible,
>   };
>
> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-03 18:29   ` Leo Yan
  2025-12-04  8:38     ` Leo Yan
@ 2025-12-04  9:07     ` Mike Leach
  2025-12-04 10:31       ` Leo Yan
  2025-12-04  9:15     ` Mike Leach
  2 siblings, 1 reply; 26+ messages in thread
From: Mike Leach @ 2025-12-04  9:07 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi,

On Wed, 3 Dec 2025 at 18:29, Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
> > The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI.
> > It allows a debugger to send to trigger events to a processor or to send
> > a trigger event to one or more processors when a trigger event occurs
> > on another processor on the same SoC, or even between SoCs. Qualcomm CTI
> > implementation differs from the standard CTI in the following aspects:
> >
> > 1. The number of supported triggers is extended to 128.
> > 2. Several register offsets differ from the CoreSight specification.
>
> I apologize for my late review of this series.  For easier maintenance
> later, I have several comments for register access.
>
> [...]
>
> > +static const u32 cti_normal_offset[] = {
> > +     [INDEX_CTIINTACK]               = CTIINTACK,
> > +     [INDEX_CTIAPPSET]               = CTIAPPSET,
> > +     [INDEX_CTIAPPCLEAR]             = CTIAPPCLEAR,
> > +     [INDEX_CTIAPPPULSE]             = CTIAPPPULSE,
> > +     [INDEX_CTIINEN]                 = CTIINEN(0),
> > +     [INDEX_CTIOUTEN]                = CTIOUTEN(0),
>
> I prefer to update the these two macros to CTIINENn and CTIOUTENn, as
> later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
>
> > +     [INDEX_CTITRIGINSTATUS]         = CTITRIGINSTATUS,
> > +     [INDEX_CTITRIGOUTSTATUS]        = CTITRIGOUTSTATUS,
> > +     [INDEX_CTICHINSTATUS]           = CTICHINSTATUS,
> > +     [INDEX_CTICHOUTSTATUS]          = CTICHOUTSTATUS,
> > +     [INDEX_CTIGATE]                 = CTIGATE,
> > +     [INDEX_ASICCTL]                 = ASICCTL,
> > +     [INDEX_ITCHINACK]               = ITCHINACK,
> > +     [INDEX_ITTRIGINACK]             = ITTRIGINACK,
> > +     [INDEX_ITCHOUT]                 = ITCHOUT,
> > +     [INDEX_ITTRIGOUT]               = ITTRIGOUT,
> > +     [INDEX_ITCHOUTACK]              = ITCHOUTACK,
> > +     [INDEX_ITTRIGOUTACK]            = ITTRIGOUTACK,
> > +     [INDEX_ITCHIN]                  = ITCHIN,
> > +     [INDEX_ITTRIGIN]                = ITTRIGIN,
> > +     [INDEX_ITCTRL]                  = CORESIGHT_ITCTRL,
> > +};
> > +
> > +static const u32 cti_extended_offset[] = {
> > +     [INDEX_CTIINTACK]               = QCOM_CTIINTACK,
> > +     [INDEX_CTIAPPSET]               = QCOM_CTIAPPSET,
> > +     [INDEX_CTIAPPCLEAR]             = QCOM_CTIAPPCLEAR,
> > +     [INDEX_CTIAPPPULSE]             = QCOM_CTIAPPPULSE,
> > +     [INDEX_CTIINEN]                 = QCOM_CTIINEN,
> > +     [INDEX_CTIOUTEN]                = QCOM_CTIOUTEN,
> > +     [INDEX_CTITRIGINSTATUS]         = QCOM_CTITRIGINSTATUS,
> > +     [INDEX_CTITRIGOUTSTATUS]        = QCOM_CTITRIGOUTSTATUS,
> > +     [INDEX_CTICHINSTATUS]           = QCOM_CTICHINSTATUS,
> > +     [INDEX_CTICHOUTSTATUS]          = QCOM_CTICHOUTSTATUS,
> > +     [INDEX_CTIGATE]                 = QCOM_CTIGATE,
> > +     [INDEX_ASICCTL]                 = QCOM_ASICCTL,
> > +     [INDEX_ITCHINACK]               = QCOM_ITCHINACK,
> > +     [INDEX_ITTRIGINACK]             = QCOM_ITTRIGINACK,
> > +     [INDEX_ITCHOUT]                 = QCOM_ITCHOUT,
> > +     [INDEX_ITTRIGOUT]               = QCOM_ITTRIGOUT,
> > +     [INDEX_ITCHOUTACK]              = QCOM_ITCHOUTACK,
> > +     [INDEX_ITTRIGOUTACK]            = QCOM_ITTRIGOUTACK,
> > +     [INDEX_ITCHIN]                  = QCOM_ITCHIN,
> > +     [INDEX_ITTRIGIN]                = QCOM_ITTRIGIN,
> > +     [INDEX_ITCTRL]                  = CORESIGHT_ITCTRL,
> > +};
>
> I saw CTI registers are within 4KiB (0x1000), we can don't convert
> standard regiserts and only convert to QCOM register based on the
> standard ones.  So you can drop the cti_normal_offset strucuture and
> only have a cti_reg_qcom_offset[] struct:
>
>   static const u32 cti_extended_offset[] = {
>         [CTIINTACK]             = QCOM_CTIINTACK,
>         [CTIAPPSET]             = QCOM_CTIAPPSET,
>         [CTIAPPCLEAR]           = QCOM_CTIAPPCLEAR,
>         [CTIAPPPULSE]           = QCOM_CTIAPPPULSE,
>         [CTIINEN]               = QCOM_CTIINEN,
>         ...
>   };
>

I suggested the dual offset approach a couple of patchset revisions
ago as it actually simplifies the code & makes it more efficient. The
offset array in use is set during probe and the remaining code is then
common to both without lots of "if qcom else " occurences.

Regards

Mike

> Then you could create two helpers for register address:
>
>     static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
>                                               u32 reg, u32 nr)
>     {
>         /* convert to qcom specific offset */
>         if (unlikely(drvdata->is_qcom_cti))
>             reg = cti_extended_offset[reg];
>
>         return drvdata->base + reg + sizeof(u32) * nr;
>     }
>
>     static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
>     {
>         return cti_reg_addr_with_nr(drvdata, reg, 0);
>     }
>
> >  /*
> >   * CTI devices can be associated with a PE, or be connected to CoreSight
> > @@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
> >
> >       /* write the CTI trigger registers */
> >       for (i = 0; i < config->nr_trig_max; i++) {
> > -             writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
> > +             writel_relaxed(config->ctiinen[i],
> > +                            drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));
>
> writel_relaxed(config->ctiinen[i],
>                cti_reg_addr_with_nr(drvdata, CTIINENn, i));
>
> And apply for the same cases below.
>
> >       /* other regs */
> > -     writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
> > -     writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
> > -     writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
> > +     writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));
>
> writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
>
> And apply for the same cases below.
>
> [...]
>
> > @@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
> >
> >       /* update the local register values */
> >       chan_bitmask = BIT(channel_idx);
> > -     reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
> > -                   CTIOUTEN(trigger_idx));
> > +     reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
> > +                     cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));
>
> For readable, we can improve a bit with code alignment:
>
>   reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) :
>                                             cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
>
> [...]
>
> > @@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
> >       drvdata->csdev_release = drvdata->csdev->dev.release;
> >       drvdata->csdev->dev.release = cti_device_release;
> >
> > +     /* check architect value*/
> > +     devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
> > +     if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
> > +             drvdata->subtype = QCOM_CTI;
> > +             drvdata->offsets = cti_extended_offset;
>
> As a result, we can only set the is_qcom_cti flag:
>
>   drvdata->is_qcom_cti = true;
>
> > +             /*
> > +              * QCOM CTI does not implement Claimtag functionality as
> > +              * per CoreSight specification, but its CLAIMSET register
> > +              * is incorrectly initialized to 0xF. This can mislead
> > +              * tools or drivers into thinking the component is claimed.
> > +              *
> > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > +              */
> > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
>
> I am confused for this.  If QCOM CTI does not implement claim tag,
> then what is the designed register at the offset CORESIGHT_CLAIMSET?
>
> Should you bypass all claim tag related operations for QCOM CTI case?
> (I don't see you touch anything for claim and declaim tags).
>
> > +     } else {
> > +             drvdata->subtype = ARM_STD_CTI;
> > +             drvdata->offsets = cti_normal_offset;
> > +     }
> > +
> >       /* all done - dec pm refcount */
> >       pm_runtime_put(&adev->dev);
> > -     dev_info(&drvdata->csdev->dev, "CTI initialized\n");
> > +     dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);
>
> dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
>          drvdata->is_qcom_cti ? "QCOM" : "");
>
> >       return 0;
> >
> >  pm_release:
> > diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > index a9df77215141..12a495382999 100644
> > --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
> >
> >  /* register based attributes */
> >
> > -/* Read registers with power check only (no enable check). */
> > -static ssize_t coresight_cti_reg_show(struct device *dev,
> > -                        struct device_attribute *attr, char *buf)
> > +static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
> > +                                        struct device_attribute *attr, char *buf)
> >  {
> >       struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> >       struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
> >       return sysfs_emit(buf, "0x%x\n", val);
> >  }
> >
> > +/* Read registers with power check only (no enable check). */
> > +static ssize_t coresight_cti_reg_show(struct device *dev,
> > +                                   struct device_attribute *attr, char *buf)
> > +{
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +     struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > +     u32 idx, val = 0;
> > +
> > +     pm_runtime_get_sync(dev->parent);
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     idx = drvdata->config.ext_reg_sel;
> > +     if (drvdata->config.hw_powered) {
> > +             switch (cti_attr->off) {
> > +             case INDEX_CTITRIGINSTATUS:
> > +             case INDEX_CTITRIGOUTSTATUS:
> > +             case INDEX_ITTRIGINACK:
> > +             case INDEX_ITTRIGOUT:
> > +             case INDEX_ITTRIGOUTACK:
> > +             case INDEX_ITTRIGIN:
> > +                     val = readl_relaxed(drvdata->base +
> > +                                         cti_offset(drvdata, cti_attr->off, idx));
> > +                     break;
> > +
> > +             default:
> > +                     val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));
> > +                     break;
> > +             }
> > +     }
> > +
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     pm_runtime_put_sync(dev->parent);
> > +     return sysfs_emit(buf, "0x%x\n", val);
> > +}
> > +
> >  /* Write registers with power check only (no enable check). */
> >  static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >                                                     struct device_attribute *attr,
> > @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >       struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> >       struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> >       unsigned long val = 0;
> > +     u32 idx;
> >
> >       if (kstrtoul(buf, 0, &val))
> >               return -EINVAL;
> >
> >       pm_runtime_get_sync(dev->parent);
> >       raw_spin_lock(&drvdata->spinlock);
> > -     if (drvdata->config.hw_powered)
> > -             cti_write_single_reg(drvdata, cti_attr->off, val);
> > +     idx = drvdata->config.ext_reg_sel;
> > +     if (drvdata->config.hw_powered) {
> > +             switch (cti_attr->off) {
> > +             case INDEX_ITTRIGINACK:
> > +             case INDEX_ITTRIGOUT:
> > +                     cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);
> > +                     break;
> > +
> > +             default:
> > +                     cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);
> > +                     break;
> > +             }
> > +     }
>
> For both coresight_cti_reg_show() and coresight_cti_reg_store(), can
> we always use "cti_attr->off" as the offset for regitser access?  I
> mean we don't need the extra config.ext_reg_sel, eventually any
> register we can calculate a offset for it.
>
> >       raw_spin_unlock(&drvdata->spinlock);
> >       pm_runtime_put_sync(dev->parent);
> >       return size;
> >  }
> >
> > +#define coresight_cti_mgmt_reg(name, offset)                         \
> > +     (&((struct cs_off_attribute[]) {                                \
> > +        {                                                            \
> > +             __ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL),  \
> > +             offset                                                  \
> > +        }                                                            \
> > +     })[0].attr.attr)
> > +
> >  #define coresight_cti_reg(name, offset)                                      \
> >       (&((struct cs_off_attribute[]) {                                \
> >          {                                                            \
> > @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >
> >  /* coresight management registers */
> >  static struct attribute *coresight_cti_mgmt_attrs[] = {
> > -     coresight_cti_reg(devaff0, CTIDEVAFF0),
> > -     coresight_cti_reg(devaff1, CTIDEVAFF1),
> > -     coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
> > -     coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
> > -     coresight_cti_reg(devid, CORESIGHT_DEVID),
> > -     coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
> > -     coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
> > -     coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
> > -     coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
> > -     coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
> > -     coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
> > +     coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
> > +     coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
> > +     coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
> > +     coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
> > +     coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
> > +     coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
> > +     coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
> > +     coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
> > +     coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
> > +     coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
> > +     coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),
>
> I don't see any benefit for updating from coresight_cti_reg() to
> coresight_cti_mgmt_reg().  If really want to do this, should remove
> the macro coresight_cti_reg()?
>
> >       NULL,
> >  };
> >
> > @@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
> >   * If inaccessible & pcached_val not NULL then show cached value.
> >   */
> >  static ssize_t cti_reg32_show(struct device *dev, char *buf,
> > -                           u32 *pcached_val, int reg_offset)
> > +                           u32 *pcached_val, int index)
>
> We don't need to change anything for this.  The passed "reg_offset"
> should be always a final offset, no matter for standard CTI or QCOM
> case, the driver directly uses the offset for register access.
>
> [...]
>
> > +/*
> > + * QCOM CTI supports up to 128 triggers, there are 6 registers need to be
> > + * expanded to up to 4 instances, and ext_reg_sel can be used to indicate
> > + * which one is in use.
> > + * CTITRIGINSTATUS, CTITRIGOUTSTATUS,
> > + * ITTRIGIN, ITTRIGOUT,
> > + * ITTRIGINACK, ITTRIGOUTACK.
> > + */
> > +static ssize_t ext_reg_sel_show(struct device *dev,
> > +                             struct device_attribute *attr,
> > +                             char *buf)
> > +{
> > +     u32 val;
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     val = drvdata->config.ext_reg_sel;
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     return sprintf(buf, "%d\n", val);
> > +}
> > +
> > +static ssize_t ext_reg_sel_store(struct device *dev,
> > +                              struct device_attribute *attr,
> > +                              const char *buf, size_t size)
> > +{
> > +     unsigned long val;
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +
> > +     if (kstrtoul(buf, 0, &val))
> > +             return -EINVAL;
> > +     if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
> > +             return -EINVAL;
> > +
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     drvdata->config.ext_reg_sel = val;
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     return size;
> > +}
>
> As said, I don't think the trigger register is any different from
> other register access.  So the existed APIs would be sufficient.
>
> As a result, we don't need to add two above functions.
>
> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-03 18:29   ` Leo Yan
  2025-12-04  8:38     ` Leo Yan
  2025-12-04  9:07     ` Mike Leach
@ 2025-12-04  9:15     ` Mike Leach
  2025-12-04 10:47       ` Leo Yan
  2 siblings, 1 reply; 26+ messages in thread
From: Mike Leach @ 2025-12-04  9:15 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi

On Wed, 3 Dec 2025 at 18:29, Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
> > The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI.
> > It allows a debugger to send to trigger events to a processor or to send
> > a trigger event to one or more processors when a trigger event occurs
> > on another processor on the same SoC, or even between SoCs. Qualcomm CTI
> > implementation differs from the standard CTI in the following aspects:
> >
> > 1. The number of supported triggers is extended to 128.
> > 2. Several register offsets differ from the CoreSight specification.
>
> I apologize for my late review of this series.  For easier maintenance
> later, I have several comments for register access.
>
> [...]
>
> > +static const u32 cti_normal_offset[] = {
> > +     [INDEX_CTIINTACK]               = CTIINTACK,
> > +     [INDEX_CTIAPPSET]               = CTIAPPSET,
> > +     [INDEX_CTIAPPCLEAR]             = CTIAPPCLEAR,
> > +     [INDEX_CTIAPPPULSE]             = CTIAPPPULSE,
> > +     [INDEX_CTIINEN]                 = CTIINEN(0),
> > +     [INDEX_CTIOUTEN]                = CTIOUTEN(0),
>
> I prefer to update the these two macros to CTIINENn and CTIOUTENn, as
> later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
>
> > +     [INDEX_CTITRIGINSTATUS]         = CTITRIGINSTATUS,
> > +     [INDEX_CTITRIGOUTSTATUS]        = CTITRIGOUTSTATUS,
> > +     [INDEX_CTICHINSTATUS]           = CTICHINSTATUS,
> > +     [INDEX_CTICHOUTSTATUS]          = CTICHOUTSTATUS,
> > +     [INDEX_CTIGATE]                 = CTIGATE,
> > +     [INDEX_ASICCTL]                 = ASICCTL,
> > +     [INDEX_ITCHINACK]               = ITCHINACK,
> > +     [INDEX_ITTRIGINACK]             = ITTRIGINACK,
> > +     [INDEX_ITCHOUT]                 = ITCHOUT,
> > +     [INDEX_ITTRIGOUT]               = ITTRIGOUT,
> > +     [INDEX_ITCHOUTACK]              = ITCHOUTACK,
> > +     [INDEX_ITTRIGOUTACK]            = ITTRIGOUTACK,
> > +     [INDEX_ITCHIN]                  = ITCHIN,
> > +     [INDEX_ITTRIGIN]                = ITTRIGIN,
> > +     [INDEX_ITCTRL]                  = CORESIGHT_ITCTRL,
> > +};
> > +
> > +static const u32 cti_extended_offset[] = {
> > +     [INDEX_CTIINTACK]               = QCOM_CTIINTACK,
> > +     [INDEX_CTIAPPSET]               = QCOM_CTIAPPSET,
> > +     [INDEX_CTIAPPCLEAR]             = QCOM_CTIAPPCLEAR,
> > +     [INDEX_CTIAPPPULSE]             = QCOM_CTIAPPPULSE,
> > +     [INDEX_CTIINEN]                 = QCOM_CTIINEN,
> > +     [INDEX_CTIOUTEN]                = QCOM_CTIOUTEN,
> > +     [INDEX_CTITRIGINSTATUS]         = QCOM_CTITRIGINSTATUS,
> > +     [INDEX_CTITRIGOUTSTATUS]        = QCOM_CTITRIGOUTSTATUS,
> > +     [INDEX_CTICHINSTATUS]           = QCOM_CTICHINSTATUS,
> > +     [INDEX_CTICHOUTSTATUS]          = QCOM_CTICHOUTSTATUS,
> > +     [INDEX_CTIGATE]                 = QCOM_CTIGATE,
> > +     [INDEX_ASICCTL]                 = QCOM_ASICCTL,
> > +     [INDEX_ITCHINACK]               = QCOM_ITCHINACK,
> > +     [INDEX_ITTRIGINACK]             = QCOM_ITTRIGINACK,
> > +     [INDEX_ITCHOUT]                 = QCOM_ITCHOUT,
> > +     [INDEX_ITTRIGOUT]               = QCOM_ITTRIGOUT,
> > +     [INDEX_ITCHOUTACK]              = QCOM_ITCHOUTACK,
> > +     [INDEX_ITTRIGOUTACK]            = QCOM_ITTRIGOUTACK,
> > +     [INDEX_ITCHIN]                  = QCOM_ITCHIN,
> > +     [INDEX_ITTRIGIN]                = QCOM_ITTRIGIN,
> > +     [INDEX_ITCTRL]                  = CORESIGHT_ITCTRL,
> > +};
>
> I saw CTI registers are within 4KiB (0x1000), we can don't convert
> standard regiserts and only convert to QCOM register based on the
> standard ones.  So you can drop the cti_normal_offset strucuture and
> only have a cti_reg_qcom_offset[] struct:
>
>   static const u32 cti_extended_offset[] = {
>         [CTIINTACK]             = QCOM_CTIINTACK,
>         [CTIAPPSET]             = QCOM_CTIAPPSET,
>         [CTIAPPCLEAR]           = QCOM_CTIAPPCLEAR,
>         [CTIAPPPULSE]           = QCOM_CTIAPPPULSE,
>         [CTIINEN]               = QCOM_CTIINEN,
>         ...
>   };
>
> Then you could create two helpers for register address:
>
>     static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
>                                               u32 reg, u32 nr)
>     {
>         /* convert to qcom specific offset */
>         if (unlikely(drvdata->is_qcom_cti))
>             reg = cti_extended_offset[reg];
>
>         return drvdata->base + reg + sizeof(u32) * nr;
>     }
>
>     static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
>     {
>         return cti_reg_addr_with_nr(drvdata, reg, 0);
>     }
>
> >  /*
> >   * CTI devices can be associated with a PE, or be connected to CoreSight
> > @@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
> >
> >       /* write the CTI trigger registers */
> >       for (i = 0; i < config->nr_trig_max; i++) {
> > -             writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
> > +             writel_relaxed(config->ctiinen[i],
> > +                            drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));
>
> writel_relaxed(config->ctiinen[i],
>                cti_reg_addr_with_nr(drvdata, CTIINENn, i));
>
> And apply for the same cases below.
>
> >       /* other regs */
> > -     writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
> > -     writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
> > -     writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
> > +     writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));
>
> writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
>
> And apply for the same cases below.
>
> [...]
>
> > @@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
> >
> >       /* update the local register values */
> >       chan_bitmask = BIT(channel_idx);
> > -     reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
> > -                   CTIOUTEN(trigger_idx));
> > +     reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
> > +                     cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));
>
> For readable, we can improve a bit with code alignment:
>
>   reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) :
>                                             cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
>
> [...]
>
> > @@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
> >       drvdata->csdev_release = drvdata->csdev->dev.release;
> >       drvdata->csdev->dev.release = cti_device_release;
> >
> > +     /* check architect value*/
> > +     devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
> > +     if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
> > +             drvdata->subtype = QCOM_CTI;
> > +             drvdata->offsets = cti_extended_offset;
>
> As a result, we can only set the is_qcom_cti flag:
>
>   drvdata->is_qcom_cti = true;
>
> > +             /*
> > +              * QCOM CTI does not implement Claimtag functionality as
> > +              * per CoreSight specification, but its CLAIMSET register
> > +              * is incorrectly initialized to 0xF. This can mislead
> > +              * tools or drivers into thinking the component is claimed.
> > +              *
> > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > +              */
> > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
>
> I am confused for this.  If QCOM CTI does not implement claim tag,
> then what is the designed register at the offset CORESIGHT_CLAIMSET?
>
> Should you bypass all claim tag related operations for QCOM CTI case?
> (I don't see you touch anything for claim and declaim tags).
>

The patch I have created to handle systems without correct claim tag
operation is a dependency for this patch set. Thus no need for
override here as the core code will handle this correctly.

The only issue is ensuring the non-CTI spec implementation will result
in the correct detection of no claim tags present.

Regards

Miike


> > +     } else {
> > +             drvdata->subtype = ARM_STD_CTI;
> > +             drvdata->offsets = cti_normal_offset;
> > +     }
> > +
> >       /* all done - dec pm refcount */
> >       pm_runtime_put(&adev->dev);
> > -     dev_info(&drvdata->csdev->dev, "CTI initialized\n");
> > +     dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);
>
> dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
>          drvdata->is_qcom_cti ? "QCOM" : "");
>
> >       return 0;
> >
> >  pm_release:
> > diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > index a9df77215141..12a495382999 100644
> > --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> > @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
> >
> >  /* register based attributes */
> >
> > -/* Read registers with power check only (no enable check). */
> > -static ssize_t coresight_cti_reg_show(struct device *dev,
> > -                        struct device_attribute *attr, char *buf)
> > +static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
> > +                                        struct device_attribute *attr, char *buf)
> >  {
> >       struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> >       struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
> >       return sysfs_emit(buf, "0x%x\n", val);
> >  }
> >
> > +/* Read registers with power check only (no enable check). */
> > +static ssize_t coresight_cti_reg_show(struct device *dev,
> > +                                   struct device_attribute *attr, char *buf)
> > +{
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +     struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > +     u32 idx, val = 0;
> > +
> > +     pm_runtime_get_sync(dev->parent);
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     idx = drvdata->config.ext_reg_sel;
> > +     if (drvdata->config.hw_powered) {
> > +             switch (cti_attr->off) {
> > +             case INDEX_CTITRIGINSTATUS:
> > +             case INDEX_CTITRIGOUTSTATUS:
> > +             case INDEX_ITTRIGINACK:
> > +             case INDEX_ITTRIGOUT:
> > +             case INDEX_ITTRIGOUTACK:
> > +             case INDEX_ITTRIGIN:
> > +                     val = readl_relaxed(drvdata->base +
> > +                                         cti_offset(drvdata, cti_attr->off, idx));
> > +                     break;
> > +
> > +             default:
> > +                     val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));
> > +                     break;
> > +             }
> > +     }
> > +
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     pm_runtime_put_sync(dev->parent);
> > +     return sysfs_emit(buf, "0x%x\n", val);
> > +}
> > +
> >  /* Write registers with power check only (no enable check). */
> >  static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >                                                     struct device_attribute *attr,
> > @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >       struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> >       struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> >       unsigned long val = 0;
> > +     u32 idx;
> >
> >       if (kstrtoul(buf, 0, &val))
> >               return -EINVAL;
> >
> >       pm_runtime_get_sync(dev->parent);
> >       raw_spin_lock(&drvdata->spinlock);
> > -     if (drvdata->config.hw_powered)
> > -             cti_write_single_reg(drvdata, cti_attr->off, val);
> > +     idx = drvdata->config.ext_reg_sel;
> > +     if (drvdata->config.hw_powered) {
> > +             switch (cti_attr->off) {
> > +             case INDEX_ITTRIGINACK:
> > +             case INDEX_ITTRIGOUT:
> > +                     cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);
> > +                     break;
> > +
> > +             default:
> > +                     cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);
> > +                     break;
> > +             }
> > +     }
>
> For both coresight_cti_reg_show() and coresight_cti_reg_store(), can
> we always use "cti_attr->off" as the offset for regitser access?  I
> mean we don't need the extra config.ext_reg_sel, eventually any
> register we can calculate a offset for it.
>
> >       raw_spin_unlock(&drvdata->spinlock);
> >       pm_runtime_put_sync(dev->parent);
> >       return size;
> >  }
> >
> > +#define coresight_cti_mgmt_reg(name, offset)                         \
> > +     (&((struct cs_off_attribute[]) {                                \
> > +        {                                                            \
> > +             __ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL),  \
> > +             offset                                                  \
> > +        }                                                            \
> > +     })[0].attr.attr)
> > +
> >  #define coresight_cti_reg(name, offset)                                      \
> >       (&((struct cs_off_attribute[]) {                                \
> >          {                                                            \
> > @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
> >
> >  /* coresight management registers */
> >  static struct attribute *coresight_cti_mgmt_attrs[] = {
> > -     coresight_cti_reg(devaff0, CTIDEVAFF0),
> > -     coresight_cti_reg(devaff1, CTIDEVAFF1),
> > -     coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
> > -     coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
> > -     coresight_cti_reg(devid, CORESIGHT_DEVID),
> > -     coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
> > -     coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
> > -     coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
> > -     coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
> > -     coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
> > -     coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
> > +     coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
> > +     coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
> > +     coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
> > +     coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
> > +     coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
> > +     coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
> > +     coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
> > +     coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
> > +     coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
> > +     coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
> > +     coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),
>
> I don't see any benefit for updating from coresight_cti_reg() to
> coresight_cti_mgmt_reg().  If really want to do this, should remove
> the macro coresight_cti_reg()?
>
> >       NULL,
> >  };
> >
> > @@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
> >   * If inaccessible & pcached_val not NULL then show cached value.
> >   */
> >  static ssize_t cti_reg32_show(struct device *dev, char *buf,
> > -                           u32 *pcached_val, int reg_offset)
> > +                           u32 *pcached_val, int index)
>
> We don't need to change anything for this.  The passed "reg_offset"
> should be always a final offset, no matter for standard CTI or QCOM
> case, the driver directly uses the offset for register access.
>
> [...]
>
> > +/*
> > + * QCOM CTI supports up to 128 triggers, there are 6 registers need to be
> > + * expanded to up to 4 instances, and ext_reg_sel can be used to indicate
> > + * which one is in use.
> > + * CTITRIGINSTATUS, CTITRIGOUTSTATUS,
> > + * ITTRIGIN, ITTRIGOUT,
> > + * ITTRIGINACK, ITTRIGOUTACK.
> > + */
> > +static ssize_t ext_reg_sel_show(struct device *dev,
> > +                             struct device_attribute *attr,
> > +                             char *buf)
> > +{
> > +     u32 val;
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     val = drvdata->config.ext_reg_sel;
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     return sprintf(buf, "%d\n", val);
> > +}
> > +
> > +static ssize_t ext_reg_sel_store(struct device *dev,
> > +                              struct device_attribute *attr,
> > +                              const char *buf, size_t size)
> > +{
> > +     unsigned long val;
> > +     struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > +
> > +     if (kstrtoul(buf, 0, &val))
> > +             return -EINVAL;
> > +     if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
> > +             return -EINVAL;
> > +
> > +     raw_spin_lock(&drvdata->spinlock);
> > +     drvdata->config.ext_reg_sel = val;
> > +     raw_spin_unlock(&drvdata->spinlock);
> > +     return size;
> > +}
>
> As said, I don't think the trigger register is any different from
> other register access.  So the existed APIs would be sufficient.
>
> As a result, we don't need to add two above functions.
>
> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays
  2025-12-02  6:42 ` [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays Yingchao Deng
@ 2025-12-04  9:54   ` Mike Leach
  0 siblings, 0 replies; 26+ messages in thread
From: Mike Leach @ 2025-12-04  9:54 UTC (permalink / raw)
  To: Yingchao Deng
  Cc: Suzuki K Poulose, James Clark, Alexander Shishkin, Tingwei Zhang,
	quic_yingdeng, coresight, linux-arm-kernel, linux-kernel,
	linux-arm-msm, Jinlong Mao

You are missing the review-by on this patch I sent for v5

On Tue, 2 Dec 2025 at 06:43, Yingchao Deng
<yingchao.deng@oss.qualcomm.com> wrote:
>
> Replace the fixed-size u32 fields in the cti_config and cti_trig_grp
> structure with dynamically allocated bitmaps and arrays. This allows
> memory to be allocated based on the actual number of triggers during probe
> time, reducing memory footprint and improving scalability for platforms
> with varying trigger counts.
> Additionally, repack struct cti_config to reduce its size from 80 bytes to
> 72 bytes.
>
> Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
> ---
>  drivers/hwtracing/coresight/coresight-cti-core.c   | 58 ++++++++++++++++------
>  .../hwtracing/coresight/coresight-cti-platform.c   | 16 +++---
>  drivers/hwtracing/coresight/coresight-cti-sysfs.c  | 10 ++--
>  drivers/hwtracing/coresight/coresight-cti.h        | 17 ++++---
>  4 files changed, 65 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index bfbc365bb2ef..f9970e40dd59 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -214,8 +214,8 @@ void cti_write_intack(struct device *dev, u32 ackval)
>  /* DEVID[19:16] - number of CTM channels */
>  #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))
>
> -static void cti_set_default_config(struct device *dev,
> -                                  struct cti_drvdata *drvdata)
> +static int cti_set_default_config(struct device *dev,
> +                                 struct cti_drvdata *drvdata)
>  {
>         struct cti_config *config = &drvdata->config;
>         u32 devid;
> @@ -234,12 +234,33 @@ static void cti_set_default_config(struct device *dev,
>                 config->nr_trig_max = CTIINOUTEN_MAX;
>         }
>
> +       config->trig_in_use = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
> +       if (!config->trig_in_use)
> +               return -ENOMEM;
> +
> +       config->trig_out_use = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
> +       if (!config->trig_out_use)
> +               return -ENOMEM;
> +
> +       config->trig_out_filter = devm_bitmap_zalloc(dev, config->nr_trig_max, GFP_KERNEL);
> +       if (!config->trig_out_filter)
> +               return -ENOMEM;
> +
> +       config->ctiinen = devm_kcalloc(dev, config->nr_trig_max, sizeof(u32), GFP_KERNEL);
> +       if (!config->ctiinen)
> +               return -ENOMEM;
> +
> +       config->ctiouten = devm_kcalloc(dev, config->nr_trig_max, sizeof(u32), GFP_KERNEL);
> +       if (!config->ctiouten)
> +               return -ENOMEM;
> +
>         config->nr_ctm_channels = CTI_DEVID_CTMCHANNELS(devid);
>
>         /* Most regs default to 0 as zalloc'ed except...*/
>         config->trig_filter_enable = true;
>         config->ctigate = GENMASK(config->nr_ctm_channels - 1, 0);
>         config->enable_req_count = 0;
> +       return 0;
>  }
>
>  /*
> @@ -270,8 +291,10 @@ int cti_add_connection_entry(struct device *dev, struct cti_drvdata *drvdata,
>         cti_dev->nr_trig_con++;
>
>         /* add connection usage bit info to overall info */
> -       drvdata->config.trig_in_use |= tc->con_in->used_mask;
> -       drvdata->config.trig_out_use |= tc->con_out->used_mask;
> +       bitmap_or(drvdata->config.trig_in_use, drvdata->config.trig_in_use,
> +                 tc->con_in->used_mask, drvdata->config.nr_trig_max);
> +       bitmap_or(drvdata->config.trig_out_use, drvdata->config.trig_out_use,
> +                 tc->con_out->used_mask, drvdata->config.nr_trig_max);
>
>         return 0;
>  }
> @@ -293,12 +316,20 @@ struct cti_trig_con *cti_allocate_trig_con(struct device *dev, int in_sigs,
>         if (!in)
>                 return NULL;
>
> +       in->used_mask = devm_bitmap_alloc(dev, in_sigs, GFP_KERNEL);
> +       if (!in->used_mask)
> +               return NULL;
> +
>         out = devm_kzalloc(dev,
>                            offsetof(struct cti_trig_grp, sig_types[out_sigs]),
>                            GFP_KERNEL);
>         if (!out)
>                 return NULL;
>
> +       out->used_mask = devm_bitmap_alloc(dev, out_sigs, GFP_KERNEL);
> +       if (!out->used_mask)
> +               return NULL;
> +
>         tc->con_in = in;
>         tc->con_out = out;
>         tc->con_in->nr_sigs = in_sigs;
> @@ -314,7 +345,6 @@ int cti_add_default_connection(struct device *dev, struct cti_drvdata *drvdata)
>  {
>         int ret = 0;
>         int n_trigs = drvdata->config.nr_trig_max;
> -       u32 n_trig_mask = GENMASK(n_trigs - 1, 0);
>         struct cti_trig_con *tc = NULL;
>
>         /*
> @@ -325,8 +355,9 @@ int cti_add_default_connection(struct device *dev, struct cti_drvdata *drvdata)
>         if (!tc)
>                 return -ENOMEM;
>
> -       tc->con_in->used_mask = n_trig_mask;
> -       tc->con_out->used_mask = n_trig_mask;
> +       bitmap_fill(tc->con_in->used_mask, n_trigs);
> +       bitmap_fill(tc->con_out->used_mask, n_trigs);
> +
>         ret = cti_add_connection_entry(dev, drvdata, tc, NULL, "default");
>         return ret;
>  }
> @@ -339,7 +370,6 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
>  {
>         struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>         struct cti_config *config = &drvdata->config;
> -       u32 trig_bitmask;
>         u32 chan_bitmask;
>         u32 reg_value;
>         int reg_offset;
> @@ -349,18 +379,16 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
>            (trigger_idx >= config->nr_trig_max))
>                 return -EINVAL;
>
> -       trig_bitmask = BIT(trigger_idx);
> -
>         /* ensure registered triggers and not out filtered */
>         if (direction == CTI_TRIG_IN)   {
> -               if (!(trig_bitmask & config->trig_in_use))
> +               if (!(test_bit(trigger_idx, config->trig_in_use)))
>                         return -EINVAL;
>         } else {
> -               if (!(trig_bitmask & config->trig_out_use))
> +               if (!(test_bit(trigger_idx, config->trig_out_use)))
>                         return -EINVAL;
>
>                 if ((config->trig_filter_enable) &&
> -                   (config->trig_out_filter & trig_bitmask))
> +                   test_bit(trigger_idx, config->trig_out_filter))
>                         return -EINVAL;
>         }
>
> @@ -892,7 +920,9 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>         raw_spin_lock_init(&drvdata->spinlock);
>
>         /* initialise CTI driver config values */
> -       cti_set_default_config(dev, drvdata);
> +       ret = cti_set_default_config(dev, drvdata);
> +       if (ret)
> +               return ret;
>
>         pdata = coresight_cti_get_platform_data(dev);
>         if (IS_ERR(pdata)) {
> diff --git a/drivers/hwtracing/coresight/coresight-cti-platform.c b/drivers/hwtracing/coresight/coresight-cti-platform.c
> index d0ae10bf6128..4bef860a0484 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-platform.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-platform.c
> @@ -136,8 +136,8 @@ static int cti_plat_create_v8_etm_connection(struct device *dev,
>                 goto create_v8_etm_out;
>
>         /* build connection data */
> -       tc->con_in->used_mask = 0xF0; /* sigs <4,5,6,7> */
> -       tc->con_out->used_mask = 0xF0; /* sigs <4,5,6,7> */
> +       bitmap_set(tc->con_in->used_mask, 4, 4); /* sigs <4,5,6,7> */
> +       bitmap_set(tc->con_out->used_mask, 4, 4); /* sigs <4,5,6,7> */
>
>         /*
>          * The EXTOUT type signals from the ETM are connected to a set of input
> @@ -194,10 +194,10 @@ static int cti_plat_create_v8_connections(struct device *dev,
>                 goto of_create_v8_out;
>
>         /* Set the v8 PE CTI connection data */
> -       tc->con_in->used_mask = 0x3; /* sigs <0 1> */
> +       bitmap_set(tc->con_in->used_mask, 0, 2); /* sigs <0 1> */
>         tc->con_in->sig_types[0] = PE_DBGTRIGGER;
>         tc->con_in->sig_types[1] = PE_PMUIRQ;
> -       tc->con_out->used_mask = 0x7; /* sigs <0 1 2 > */
> +       bitmap_set(tc->con_out->used_mask, 0, 3); /* sigs <0 1 2 > */
>         tc->con_out->sig_types[0] = PE_EDBGREQ;
>         tc->con_out->sig_types[1] = PE_DBGRESTART;
>         tc->con_out->sig_types[2] = PE_CTIIRQ;
> @@ -213,7 +213,7 @@ static int cti_plat_create_v8_connections(struct device *dev,
>                 goto of_create_v8_out;
>
>         /* filter pe_edbgreq - PE trigout sig <0> */
> -       drvdata->config.trig_out_filter |= 0x1;
> +       set_bit(0, drvdata->config.trig_out_filter);
>
>  of_create_v8_out:
>         return ret;
> @@ -257,7 +257,7 @@ static int cti_plat_read_trig_group(struct cti_trig_grp *tgrp,
>         if (!err) {
>                 /* set the signal usage mask */
>                 for (idx = 0; idx < tgrp->nr_sigs; idx++)
> -                       tgrp->used_mask |= BIT(values[idx]);
> +                       set_bit(values[idx], tgrp->used_mask);
>         }
>
>         kfree(values);
> @@ -331,7 +331,9 @@ static int cti_plat_process_filter_sigs(struct cti_drvdata *drvdata,
>
>         err = cti_plat_read_trig_group(tg, fwnode, CTI_DT_FILTER_OUT_SIGS);
>         if (!err)
> -               drvdata->config.trig_out_filter |= tg->used_mask;
> +               bitmap_or(drvdata->config.trig_out_filter,
> +                         drvdata->config.trig_out_filter,
> +                         tg->used_mask, drvdata->config.nr_trig_max);
>
>         kfree(tg);
>         return err;
> diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> index 572b80ee96fb..a9df77215141 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> @@ -711,10 +711,8 @@ static ssize_t trigout_filtered_show(struct device *dev,
>         struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>         struct cti_config *cfg = &drvdata->config;
>         int size = 0, nr_trig_max = cfg->nr_trig_max;
> -       unsigned long mask = cfg->trig_out_filter;
>
> -       if (mask)
> -               size = bitmap_print_to_pagebuf(true, buf, &mask, nr_trig_max);
> +       size = bitmap_print_to_pagebuf(true, buf, cfg->trig_out_filter, nr_trig_max);
>         return size;
>  }
>  static DEVICE_ATTR_RO(trigout_filtered);
> @@ -926,9 +924,8 @@ static ssize_t trigin_sig_show(struct device *dev,
>         struct cti_trig_con *con = (struct cti_trig_con *)ext_attr->var;
>         struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>         struct cti_config *cfg = &drvdata->config;
> -       unsigned long mask = con->con_in->used_mask;
>
> -       return bitmap_print_to_pagebuf(true, buf, &mask, cfg->nr_trig_max);
> +       return bitmap_print_to_pagebuf(true, buf, con->con_in->used_mask, cfg->nr_trig_max);
>  }
>
>  static ssize_t trigout_sig_show(struct device *dev,
> @@ -940,9 +937,8 @@ static ssize_t trigout_sig_show(struct device *dev,
>         struct cti_trig_con *con = (struct cti_trig_con *)ext_attr->var;
>         struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>         struct cti_config *cfg = &drvdata->config;
> -       unsigned long mask = con->con_out->used_mask;
>
> -       return bitmap_print_to_pagebuf(true, buf, &mask, cfg->nr_trig_max);
> +       return bitmap_print_to_pagebuf(true, buf, con->con_out->used_mask, cfg->nr_trig_max);
>  }
>
>  /* convert a sig type id to a name */
> diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
> index 4f89091ee93f..e7b88b07cffe 100644
> --- a/drivers/hwtracing/coresight/coresight-cti.h
> +++ b/drivers/hwtracing/coresight/coresight-cti.h
> @@ -68,7 +68,7 @@ struct fwnode_handle;
>   */
>  struct cti_trig_grp {
>         int nr_sigs;
> -       u32 used_mask;
> +       unsigned long *used_mask;
>         int sig_types[];
>  };
>
> @@ -146,20 +146,21 @@ struct cti_config {
>         bool hw_enabled;
>         bool hw_powered;
>
> -       /* registered triggers and filtering */
> -       u32 trig_in_use;
> -       u32 trig_out_use;
> -       u32 trig_out_filter;
>         bool trig_filter_enable;
>         u8 xtrig_rchan_sel;
>
>         /* cti cross trig programmable regs */
> -       u32 ctiappset;
>         u8 ctiinout_sel;
> -       u32 ctiinen[CTIINOUTEN_MAX];
> -       u32 ctiouten[CTIINOUTEN_MAX];
> +       u32 ctiappset;
>         u32 ctigate;
>         u32 asicctl;
> +       u32 *ctiinen;
> +       u32 *ctiouten;
> +
> +       /* registered triggers and filtering */
> +       unsigned long *trig_in_use;
> +       unsigned long *trig_out_use;
> +       unsigned long *trig_out_filter;
>  };
>
>  /**
>
> --
> 2.43.0
>


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04  9:04       ` Mike Leach
@ 2025-12-04 10:02         ` Leo Yan
  0 siblings, 0 replies; 26+ messages in thread
From: Leo Yan @ 2025-12-04 10:02 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, James Clark, Alexander Shishkin, Tingwei Zhang,
	quic_yingdeng, coresight, linux-arm-kernel, linux-kernel,
	linux-arm-msm, Jinlong Mao, Mao Jinlong

On Thu, Dec 04, 2025 at 09:04:03AM +0000, Mike Leach wrote:
> Hi,
> 
> On Thu, 4 Dec 2025 at 08:38, Leo Yan <leo.yan@arm.com> wrote:
> >
> > On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
> >
> > [...]
> >
> > > > +/* Read registers with power check only (no enable check). */
> > > > +static ssize_t coresight_cti_reg_show(struct device *dev,
> > > > +                                 struct device_attribute *attr, char *buf)
> > > > +{
> > > > +   struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> > > > +   struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
> > > > +   u32 idx, val = 0;
> > > > +
> > > > +   pm_runtime_get_sync(dev->parent);
> > > > +   raw_spin_lock(&drvdata->spinlock);
> > > > +   idx = drvdata->config.ext_reg_sel;
> > > > +   if (drvdata->config.hw_powered) {
> > > > +           switch (cti_attr->off) {
> > > > +           case INDEX_CTITRIGINSTATUS:
> > > > +           case INDEX_CTITRIGOUTSTATUS:
> > > > +           case INDEX_ITTRIGINACK:
> > > > +           case INDEX_ITTRIGOUT:
> > > > +           case INDEX_ITTRIGOUTACK:
> > > > +           case INDEX_ITTRIGIN:
> >
> > I read again and now I understand why you need "config.ext_reg_sel"
> > as an index for these expending registers.
> >
> 
> Having this index for these extended registers matches what we do for
> the INEN and OUTEN registers. This gives the user a consistent
> approach. We do not want the unnecessary attributes as  it will
> increase the memory footprint for all cti instances, not just the qcom
> ones.

I agree with using index for CTI triggers, but it is not necessary to
add a new index for other registers (status, mode setting, ACK, etc).

It would be directive to present the status and mode setting
registers, given these registers are only from 0-3.  This will be easy
accessed from userspace, and avoid complexity in the driver.

> The first patch in this series works to reduce the memory footprint by
> only allocating resource based on the actual configuration. For
> example for an ARM designed CTI with 8 trigger registers, we no longer
> declare static 128 x 32 bit arrays for each of INEN and OUTEN which
> were required by the original design.
> 
> Given that there can be 10s or 100s of CTIs in a large multicore
> system, reducing the footprint to match the actual configuration, and
> offering a level of compression by using an index + single file to
> access a set of registers improves the efficiency of the driver.

It is good for reducing footprint, but I would give priority for a
neat implementation and easy use interfaces.

And the sysfs attr code and global structures (e.g. register conversion
struct) can shared by all instances, so I don't worry much the scale
issue if we extend them.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04  9:07     ` Mike Leach
@ 2025-12-04 10:31       ` Leo Yan
  2025-12-04 16:17         ` Mike Leach
  0 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-04 10:31 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:

[...]

> > I saw CTI registers are within 4KiB (0x1000), we can don't convert
> > standard regiserts and only convert to QCOM register based on the
> > standard ones.  So you can drop the cti_normal_offset strucuture and
> > only have a cti_reg_qcom_offset[] struct:
> >
> >   static const u32 cti_extended_offset[] = {
> >         [CTIINTACK]             = QCOM_CTIINTACK,
> >         [CTIAPPSET]             = QCOM_CTIAPPSET,
> >         [CTIAPPCLEAR]           = QCOM_CTIAPPCLEAR,
> >         [CTIAPPPULSE]           = QCOM_CTIAPPPULSE,
> >         [CTIINEN]               = QCOM_CTIINEN,
> >         ...
> >   };
> >
> 
> I suggested the dual offset approach a couple of patchset revisions
> ago as it actually simplifies the code & makes it more efficient. The
> offset array in use is set during probe and the remaining code is then
> common to both without lots of "if qcom else " occurences.

AFAICS, we will handle the QCOM CTI particularly in three cases:

  1) The register access;
  2) The claim tag;
  3) Sysfs attr is visible.

Now we are discussing the reigster access.  As suggested, the
"if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it
will not spread out.

I'd use standard registers by default and convert to non-standard ones
only when needed.  A new "neutral" index layer seems redundant, as the
existing standard register indexes already serve this purpose.

For the sysfs attrs, it makes sense to use a central place to decide
which knobs are only visible for QCOM CTI, otherwise, we also will not
spread the condition check.

I will reply separately for claim tag issue.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04  9:15     ` Mike Leach
@ 2025-12-04 10:47       ` Leo Yan
  2025-12-04 15:07         ` Mike Leach
  0 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-04 10:47 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:

[...]

> > > +             /*
> > > +              * QCOM CTI does not implement Claimtag functionality as
> > > +              * per CoreSight specification, but its CLAIMSET register
> > > +              * is incorrectly initialized to 0xF. This can mislead
> > > +              * tools or drivers into thinking the component is claimed.
> > > +              *
> > > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > > +              */
> > > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> >
> > I am confused for this.  If QCOM CTI does not implement claim tag,
> > then what is the designed register at the offset CORESIGHT_CLAIMSET?
> >
> > Should you bypass all claim tag related operations for QCOM CTI case?
> > (I don't see you touch anything for claim and declaim tags).
> >
> 
> The patch I have created to handle systems without correct claim tag
> operation is a dependency for this patch set. Thus no need for
> override here as the core code will handle this correctly.
> 
> The only issue is ensuring the non-CTI spec implementation will result
> in the correct detection of no claim tags present.

Your patch works only when a module has implemented claim registers.

This leads to two issues: we end up clearing an unknown register in the
CTI driver, and then the coresight core layer assumes it is reading a
claim register even though it is not.

For QCOM CTI, combined with your patch, I would suggest directly
setting csdev->access.claim_tag_impl to false (perhaps using a helper).
This would be much clearer than the "hacking" way.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04 10:47       ` Leo Yan
@ 2025-12-04 15:07         ` Mike Leach
  2025-12-05 10:27           ` Leo Yan
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Leach @ 2025-12-04 15:07 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Leo

On Thu, 4 Dec 2025 at 10:47, Leo Yan <leo.yan@arm.com> wrote:
>
> On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:
>
> [...]
>
> > > > +             /*
> > > > +              * QCOM CTI does not implement Claimtag functionality as
> > > > +              * per CoreSight specification, but its CLAIMSET register
> > > > +              * is incorrectly initialized to 0xF. This can mislead
> > > > +              * tools or drivers into thinking the component is claimed.
> > > > +              *
> > > > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > > > +              */
> > > > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> > >
> > > I am confused for this.  If QCOM CTI does not implement claim tag,
> > > then what is the designed register at the offset CORESIGHT_CLAIMSET?
> > >
> > > Should you bypass all claim tag related operations for QCOM CTI case?
> > > (I don't see you touch anything for claim and declaim tags).
> > >
> >
> > The patch I have created to handle systems without correct claim tag
> > operation is a dependency for this patch set. Thus no need for
> > override here as the core code will handle this correctly.
> >
> > The only issue is ensuring the non-CTI spec implementation will result
> > in the correct detection of no claim tags present.
>
> Your patch works only when a module has implemented claim registers.
>

Per the Coresight spec - unimplemented registers must be RAZ/WI- so
this still works for non implemented claim registers.

> This leads to two issues: we end up clearing an unknown register in the
> CTI driver, and then the coresight core layer assumes it is reading a
> claim register even though it is not.

Again RAZ will simply read 0x0 - which is an indication that there are
no claim bits implemented.

>
> For QCOM CTI, combined with your patch, I would suggest directly
> setting csdev->access.claim_tag_impl to false (perhaps using a helper).

That would be a better solution, though as Qcom appear to have
implemented a pair of standard RW registers rather than the claim tag
functionality, the write solution works for this particular
implementation.

Regards

Mike

> This would be much clearer than the "hacking" way.
>
> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04 10:31       ` Leo Yan
@ 2025-12-04 16:17         ` Mike Leach
  2025-12-05 10:04           ` Leo Yan
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Leach @ 2025-12-04 16:17 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Leo

On Thu, 4 Dec 2025 at 10:31, Leo Yan <leo.yan@arm.com> wrote:
>
> On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:
>
> [...]
>
> > > I saw CTI registers are within 4KiB (0x1000), we can don't convert
> > > standard regiserts and only convert to QCOM register based on the
> > > standard ones.  So you can drop the cti_normal_offset strucuture and
> > > only have a cti_reg_qcom_offset[] struct:
> > >
> > >   static const u32 cti_extended_offset[] = {
> > >         [CTIINTACK]             = QCOM_CTIINTACK,
> > >         [CTIAPPSET]             = QCOM_CTIAPPSET,
> > >         [CTIAPPCLEAR]           = QCOM_CTIAPPCLEAR,
> > >         [CTIAPPPULSE]           = QCOM_CTIAPPPULSE,
> > >         [CTIINEN]               = QCOM_CTIINEN,
> > >         ...
> > >   };
> > >
> >

The tables in the patch are

    [reg_type_array_index] = offset_address;

e.g.

  [INDEX_CTIINTACK]  = QCOM_CTIINTACK

which resolves to

 [1] = 0x020

where index is constant for a given register type,

As far as I can tell what you have suggested above is a table that is

  [std_addr_offset] = qcom_addr_offset;

e.g.

[CTIINTACK]             = QCOM_CTIINTACK,

which resolves to

[0x10]  = 0x020

which does not appear to work correctly?

The registers are sparsely spread across the memory map, so a simple
mapping does not work, even if we divide the original offset by 4 to
create a register number.

The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming
the compiler allows us to sparselly populate the array (which I think
it does, along with some padding), we end up with an array of at least
0xEF8 elements, rather then the indexed 21?

Regards

Mike

> > I suggested the dual offset approach a couple of patchset revisions
> > ago as it actually simplifies the code & makes it more efficient. The
> > offset array in use is set during probe and the remaining code is then
> > common to both without lots of "if qcom else " occurences.
>
> AFAICS, we will handle the QCOM CTI particularly in three cases:
>
>   1) The register access;
>   2) The claim tag;
>   3) Sysfs attr is visible.
>
> Now we are discussing the reigster access.  As suggested, the
> "if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it
> will not spread out.
>
> I'd use standard registers by default and convert to non-standard ones
> only when needed.  A new "neutral" index layer seems redundant, as the
> existing standard register indexes already serve this purpose.
>
> For the sysfs attrs, it makes sense to use a central place to decide
> which knobs are only visible for QCOM CTI, otherwise, we also will not
> spread the condition check.
>
> I will reply separately for claim tag issue.
>
> Thanks,
> Leo


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04 16:17         ` Mike Leach
@ 2025-12-05 10:04           ` Leo Yan
  2025-12-08 14:47             ` Mike Leach
  0 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-05 10:04 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Mike,

On Thu, Dec 04, 2025 at 04:17:35PM +0000, Mike Leach wrote:

[...]

> The tables in the patch are
> 
>     [reg_type_array_index] = offset_address;
> 
> e.g.
> 
>   [INDEX_CTIINTACK]  = QCOM_CTIINTACK
> 
> which resolves to
> 
>  [1] = 0x020
> 
> where index is constant for a given register type,
> 
> As far as I can tell what you have suggested above is a table that is
> 
>   [std_addr_offset] = qcom_addr_offset;
> 
> e.g.
> 
> [CTIINTACK]             = QCOM_CTIINTACK,
> 
> which resolves to
> 
> [0x10]  = 0x020
> 
> which does not appear to work correctly?
> 
> The registers are sparsely spread across the memory map, so a simple
> mapping does not work, even if we divide the original offset by 4 to
> create a register number.

This should work.  Though the array is not filled for each item, but
it will return back 0x20 when we access array[0x10], I don't see
problem here.

> The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming
> the compiler allows us to sparselly populate the array (which I think
> it does, along with some padding), we end up with an array of at least
> 0xEF8 elements, rather then the indexed 21?

I tested locally and did not see the GCC complaint for this approach.
And this is a global structure with about 16KiB (~4K items x
sizeof(u32)), we don't need to worry about scaling issue as it is
shared by device instances.

If you dislike this way, then a static function also can fulfill the
same task, something like:

    static noinline u32 cti_qcom_reg_off(u32 offset)
    {
            switch (offset) {
            CTIINTACK: return QCOM_CTIINTACK;
            CTIAPPSET: return QCOM_CTIAPPSET;
            ...
            default:
                WARN(1, "Unknown offset=%u\n", offset);
                return 0;
            }

            /* Should not run here, just for compiling */
	    return 0;
    }

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-04 15:07         ` Mike Leach
@ 2025-12-05 10:27           ` Leo Yan
  2025-12-08 14:25             ` Mike Leach
  0 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-05 10:27 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

On Thu, Dec 04, 2025 at 03:07:10PM +0000, Mike Leach wrote:

[...]

> > > > > +             /*
> > > > > +              * QCOM CTI does not implement Claimtag functionality as
> > > > > +              * per CoreSight specification, but its CLAIMSET register
> > > > > +              * is incorrectly initialized to 0xF. This can mislead
> > > > > +              * tools or drivers into thinking the component is claimed.
> > > > > +              *
> > > > > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > > > > +              */
> > > > > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> > > >
> > > > I am confused for this.  If QCOM CTI does not implement claim tag,
> > > > then what is the designed register at the offset CORESIGHT_CLAIMSET?
> > > >
> > > > Should you bypass all claim tag related operations for QCOM CTI case?
> > > > (I don't see you touch anything for claim and declaim tags).
> > > >
> > >
> > > The patch I have created to handle systems without correct claim tag
> > > operation is a dependency for this patch set. Thus no need for
> > > override here as the core code will handle this correctly.
> > >
> > > The only issue is ensuring the non-CTI spec implementation will result
> > > in the correct detection of no claim tags present.
> >
> > Your patch works only when a module has implemented claim registers.
> >
> 
> Per the Coresight spec - unimplemented registers must be RAZ/WI- so
> this still works for non implemented claim registers.

QCOM CTI does not follow the spec in two aspects:

- Given the patch's comment: "QCOM CTI does not implement Claim tag
  functionality as per CoreSight specification", I am suspect the CLAIM
  registers are not implemented at all in QCOM CTI.

- It neither follows up the "unimplemented registers must be RAZ/WI" -
  the patch says its reset value is 0xF and then even can write 0 to it.

> > This leads to two issues: we end up clearing an unknown register in the
> > CTI driver, and then the coresight core layer assumes it is reading a
> > claim register even though it is not.
> 
> Again RAZ will simply read 0x0 - which is an indication that there are
> no claim bits implemented.
> 
> >
> > For QCOM CTI, combined with your patch, I would suggest directly
> > setting csdev->access.claim_tag_impl to false (perhaps using a helper).
> 
> That would be a better solution, though as Qcom appear to have
> implemented a pair of standard RW registers rather than the claim tag
> functionality, the write solution works for this particular
> implementation.

If an IP violates both the rules for implemented claim registers and
the rules for non-implemented claim registers, how can we rely on
these registers to detect the claim feature?

My feeling is we are building a house on sand - these registers are not
used for claim tags purpose at all.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-05 10:27           ` Leo Yan
@ 2025-12-08 14:25             ` Mike Leach
  0 siblings, 0 replies; 26+ messages in thread
From: Mike Leach @ 2025-12-08 14:25 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Leo,

On Fri, 5 Dec 2025 at 10:27, Leo Yan <leo.yan@arm.com> wrote:
>
> On Thu, Dec 04, 2025 at 03:07:10PM +0000, Mike Leach wrote:
>
> [...]
>
> > > > > > +             /*
> > > > > > +              * QCOM CTI does not implement Claimtag functionality as
> > > > > > +              * per CoreSight specification, but its CLAIMSET register
> > > > > > +              * is incorrectly initialized to 0xF. This can mislead
> > > > > > +              * tools or drivers into thinking the component is claimed.
> > > > > > +              *
> > > > > > +              * Reset CLAIMSET to 0 to reflect that no claims are active.
> > > > > > +              */
> > > > > > +             writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> > > > >
> > > > > I am confused for this.  If QCOM CTI does not implement claim tag,
> > > > > then what is the designed register at the offset CORESIGHT_CLAIMSET?
> > > > >
> > > > > Should you bypass all claim tag related operations for QCOM CTI case?
> > > > > (I don't see you touch anything for claim and declaim tags).
> > > > >
> > > >
> > > > The patch I have created to handle systems without correct claim tag
> > > > operation is a dependency for this patch set. Thus no need for
> > > > override here as the core code will handle this correctly.
> > > >
> > > > The only issue is ensuring the non-CTI spec implementation will result
> > > > in the correct detection of no claim tags present.
> > >
> > > Your patch works only when a module has implemented claim registers.
> > >
> >
> > Per the Coresight spec - unimplemented registers must be RAZ/WI- so
> > this still works for non implemented claim registers.
>
> QCOM CTI does not follow the spec in two aspects:
>
> - Given the patch's comment: "QCOM CTI does not implement Claim tag
>   functionality as per CoreSight specification", I am suspect the CLAIM
>   registers are not implemented at all in QCOM CTI.
>
> - It neither follows up the "unimplemented registers must be RAZ/WI" -
>   the patch says its reset value is 0xF and then even can write 0 to it.
>

Correct - but my point is that the patch to handle lack of claim tags
does work for both existent claim tags and unimplemented ones, where
the spec is followed.

> > > This leads to two issues: we end up clearing an unknown register in the
> > > CTI driver, and then the coresight core layer assumes it is reading a
> > > claim register even though it is not.
> >
> > Again RAZ will simply read 0x0 - which is an indication that there are
> > no claim bits implemented.
> >
> > >
> > > For QCOM CTI, combined with your patch, I would suggest directly
> > > setting csdev->access.claim_tag_impl to false (perhaps using a helper).
> >
> > That would be a better solution, though as Qcom appear to have
> > implemented a pair of standard RW registers rather than the claim tag
> > functionality, the write solution works for this particular
> > implementation.
>
> If an IP violates both the rules for implemented claim registers and
> the rules for non-implemented claim registers, how can we rely on
> these registers to detect the claim feature?
>
> My feeling is we are building a house on sand - these registers are not
> used for claim tags purpose at all.
>

No they are not, but by either writing the claim tag register with 0,
or by setting the csdev->access.claim_tag_impl to false, this
particular incorrect implementation instance can be made to work with
the existing claim tag code.

There are effectively two issues here:-

1) The existing core code does not correctly handle insufficient claim
tags or none existent claim tag register - per the coresight spec.
This needs fixing for all potential components - not just this one. My
claim tag patch does that.

2) This particular QCom CTI does not implement claim tags correctly.
This particular instance can be made to operate correctly as detailed
above as long as the core code can handle none-existent claim tags.
The alternative is to bracket every claim tag access call with an "if
!qcom_cti" statement. (which is what was happening in earlier
patches).

We need to fix the core code anyway, and the specific QCom CTI can
then be fixed in a single location without changes elsewhere in the
code. This is the cleanest and least disruptive solution.

Regards

Mike


> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-05 10:04           ` Leo Yan
@ 2025-12-08 14:47             ` Mike Leach
  2025-12-09  8:16               ` Yingchao Deng
  2025-12-09 13:59               ` Leo Yan
  0 siblings, 2 replies; 26+ messages in thread
From: Mike Leach @ 2025-12-08 14:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Leo

On Fri, 5 Dec 2025 at 10:04, Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Mike,
>
> On Thu, Dec 04, 2025 at 04:17:35PM +0000, Mike Leach wrote:
>
> [...]
>
> > The tables in the patch are
> >
> >     [reg_type_array_index] = offset_address;
> >
> > e.g.
> >
> >   [INDEX_CTIINTACK]  = QCOM_CTIINTACK
> >
> > which resolves to
> >
> >  [1] = 0x020
> >
> > where index is constant for a given register type,
> >
> > As far as I can tell what you have suggested above is a table that is
> >
> >   [std_addr_offset] = qcom_addr_offset;
> >
> > e.g.
> >
> > [CTIINTACK]             = QCOM_CTIINTACK,
> >
> > which resolves to
> >
> > [0x10]  = 0x020
> >
> > which does not appear to work correctly?

Sorry - what I mean here is the contiguous array that appears to be in
the source, does not match the reality when compiled into memory - not
that it doesn't work programmatically.

> >
> > The registers are sparsely spread across the memory map, so a simple
> > mapping does not work, even if we divide the original offset by 4 to
> > create a register number.
>
> This should work.  Though the array is not filled for each item, but
> it will return back 0x20 when we access array[0x10], I don't see
> problem here.
>
> > The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming
> > the compiler allows us to sparselly populate the array (which I think
> > it does, along with some padding), we end up with an array of at least
> > 0xEF8 elements, rather then the indexed 21?
>
> I tested locally and did not see the GCC complaint for this approach.
> And this is a global structure with about 16KiB (~4K items x

Which is precisely the issue - why use 16k bytes of space when a pair
of indexed tables will use 21 x 32bit locations per table -> 168 bytes
- 100x smaller!

This space matters little to high end server systems but is much more
important in smaller embedded systems.

Moreover the table + inline helper is more efficient at extracting the
correct offset value. The helper is a simple de-reference - whereas
the helper functions you suggest require the code to make the
comparison at every register access.
The "if qcom ..." may be contained in one place in the source code,
but is called and executed for every access.

Why add inefficiencies, either in footprint or execution?

Regards

Mike


> sizeof(u32)), we don't need to worry about scaling issue as it is
> shared by device instances.
>
> If you dislike this way, then a static function also can fulfill the
> same task, something like:
>
>     static noinline u32 cti_qcom_reg_off(u32 offset)
>     {
>             switch (offset) {
>             CTIINTACK: return QCOM_CTIINTACK;
>             CTIAPPSET: return QCOM_CTIAPPSET;
>             ...
>             default:
>                 WARN(1, "Unknown offset=%u\n", offset);
>                 return 0;
>             }
>
>             /* Should not run here, just for compiling */
>             return 0;
>     }
>
> Thanks,
> Leo



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-08 14:47             ` Mike Leach
@ 2025-12-09  8:16               ` Yingchao Deng
  2025-12-09  9:40                 ` Jie Gan
                                   ` (2 more replies)
  2025-12-09 13:59               ` Leo Yan
  1 sibling, 3 replies; 26+ messages in thread
From: Yingchao Deng @ 2025-12-09  8:16 UTC (permalink / raw)
  To: mike.leach
  Cc: alexander.shishkin, coresight, james.clark, jinlong.mao, leo.yan,
	linux-arm-kernel, linux-arm-msm, linux-kernel, quic_jinlmao,
	quic_yingdeng, suzuki.poulose, tingwei.zhang, yingchao.deng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 20471 bytes --]

Hi Leo & Mike

Based on Leo’s suggestions, I created a new patch, but there are three points that do not fully align with his recommendations:

    1. The helper function for returning the register address now returns only the offset, because returning the full address would conflict with cti_write_single_reg.
    2. For registers such as triginstatus1...3, I defined additional macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, I chose numbers starting from 0x1000 for the new definitions.
    3. Regarding the visibility of attributes for triginstatus1...3, since coresight_cti_reg produces an anonymous variable that cannot be directly referenced, I used coresight_cti_regs_attrs[i] to obtain the attribute corresponding to triginstatus1.

I appreciate both suggestions. After reviewing them, I lean toward Mike's approach.

Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
---
 .../hwtracing/coresight/coresight-cti-core.c  |  52 +++++--
 .../hwtracing/coresight/coresight-cti-sysfs.c |  72 ++++++++--
 drivers/hwtracing/coresight/coresight-cti.h   |   3 +-
 drivers/hwtracing/coresight/qcom-cti.h        | 136 ++++++++++++++++++
 4 files changed, 238 insertions(+), 25 deletions(-)
 create mode 100644 drivers/hwtracing/coresight/qcom-cti.h

diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index f9970e40dd59..d2b0b46f2846 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -21,7 +21,7 @@

 #include "coresight-priv.h"
 #include "coresight-cti.h"
-
+#include "qcom-cti.h"
 /*
  * CTI devices can be associated with a PE, or be connected to CoreSight
  * hardware. We have a list of all CTIs irrespective of CPU bound or
@@ -70,15 +70,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)

     /* write the CTI trigger registers */
     for (i = 0; i < config->nr_trig_max; i++) {
-        writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
+        writel_relaxed(config->ctiinen[i],
+                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIINEN(0), i));
         writel_relaxed(config->ctiouten[i],
-                   drvdata->base + CTIOUTEN(i));
+                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), i));
     }

     /* other regs */
-    writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
-    writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
-    writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
+    writel_relaxed(config->ctigate, drvdata->base + cti_reg_addr(drvdata, CTIGATE));
+    writel_relaxed(config->asicctl, drvdata->base + cti_reg_addr(drvdata, ASICCTL));
+    writel_relaxed(config->ctiappset, drvdata->base + cti_reg_addr(drvdata, CTIAPPSET));

     /* re-enable CTI */
     writel_relaxed(1, drvdata->base + CTICONTROL);
@@ -201,7 +202,7 @@ void cti_write_intack(struct device *dev, u32 ackval)
     raw_spin_lock(&drvdata->spinlock);
     /* write if enabled */
     if (cti_active(config))
-        cti_write_single_reg(drvdata, CTIINTACK, ackval);
+        cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIINTACK), ackval);
     raw_spin_unlock(&drvdata->spinlock);
 }

@@ -214,6 +215,9 @@ void cti_write_intack(struct device *dev, u32 ackval)
 /* DEVID[19:16] - number of CTM channels */
 #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))

+/* DEVARCH[31:21] - ARCHITECT */
+#define CTI_DEVARCH_ARCHITECT(devarch_val) ((int)BMVAL(devarch_val, 21, 31))
+
 static int cti_set_default_config(struct device *dev,
                   struct cti_drvdata *drvdata)
 {
@@ -394,9 +398,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,

     /* update the local register values */
     chan_bitmask = BIT(channel_idx);
-    reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
-              CTIOUTEN(trigger_idx));
-
+    reg_offset = (direction == CTI_TRIG_IN ? cti_reg_addr_with_nr(drvdata, CTIINEN(0), trigger_idx):
+                         cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), trigger_idx));
     raw_spin_lock(&drvdata->spinlock);

     /* read - modify write - the trigger / channel enable value */
@@ -452,7 +455,7 @@ int cti_channel_gate_op(struct device *dev, enum cti_chan_gate_op op,
     if (err == 0) {
         config->ctigate = reg_value;
         if (cti_active(config))
-            cti_write_single_reg(drvdata, CTIGATE, reg_value);
+            cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIGATE), reg_value);
     }
     raw_spin_unlock(&drvdata->spinlock);
     return err;
@@ -479,19 +482,19 @@ int cti_channel_setop(struct device *dev, enum cti_chan_set_op op,
     case CTI_CHAN_SET:
         config->ctiappset |= chan_bitmask;
         reg_value  = config->ctiappset;
-        reg_offset = CTIAPPSET;
+        reg_offset = cti_reg_addr(drvdata, CTIAPPSET);
         break;

     case CTI_CHAN_CLR:
         config->ctiappset &= ~chan_bitmask;
         reg_value = chan_bitmask;
-        reg_offset = CTIAPPCLEAR;
+        reg_offset = cti_reg_addr(drvdata, CTIAPPCLEAR);
         break;

     case CTI_CHAN_PULSE:
         config->ctiappset &= ~chan_bitmask;
         reg_value = chan_bitmask;
-        reg_offset = CTIAPPPULSE;
+        reg_offset = cti_reg_addr(drvdata, CTIAPPPULSE);
         break;

     default:
@@ -895,6 +898,7 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
     struct coresight_desc cti_desc;
     struct coresight_platform_data *pdata = NULL;
     struct resource *res = &adev->res;
+    u32 devarch;

     /* driver data*/
     drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
@@ -981,9 +985,27 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
     drvdata->csdev_release = drvdata->csdev->dev.release;
     drvdata->csdev->dev.release = cti_device_release;

+    /* check architect value*/
+    devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
+    if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
+        drvdata->is_qcom_cti = 1;
+
+        /*
+         * QCOM CTI does not implement Claimtag functionality as
+         * per CoreSight specification, but its CLAIMSET register
+         * is incorrectly initialized to 0xF. This can mislead
+         * tools or drivers into thinking the component is claimed.
+         *
+         * Reset CLAIMSET to 0 to reflect that no claims are active.
+         */
+        drvdata->csdev->claim_tag_info = CS_CLAIM_TAG_NOT_IMPL;
+        //writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
+    }
+
     /* all done - dec pm refcount */
     pm_runtime_put(&adev->dev);
-    dev_info(&drvdata->csdev->dev, "CTI initialized\n");
+    dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
+                    drvdata->is_qcom_cti ? "QCOM" : "");
     return 0;

 pm_release:
diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
index a9df77215141..5d23a138b4a7 100644
--- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
@@ -13,7 +13,7 @@
 #include <linux/sysfs.h>

 #include "coresight-cti.h"
-
+#include "qcom-cti.h"
 /*
  * Declare the number of static declared attribute groups
  * Value includes groups + NULL value at end of table.
@@ -183,7 +183,7 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
     pm_runtime_get_sync(dev->parent);
     raw_spin_lock(&drvdata->spinlock);
     if (drvdata->config.hw_powered)
-        val = readl_relaxed(drvdata->base + cti_attr->off);
+        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, cti_attr->off));
     raw_spin_unlock(&drvdata->spinlock);
     pm_runtime_put_sync(dev->parent);
     return sysfs_emit(buf, "0x%x\n", val);
@@ -204,7 +204,7 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
     pm_runtime_get_sync(dev->parent);
     raw_spin_lock(&drvdata->spinlock);
     if (drvdata->config.hw_powered)
-        cti_write_single_reg(drvdata, cti_attr->off, val);
+        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, cti_attr->off), val);
     raw_spin_unlock(&drvdata->spinlock);
     pm_runtime_put_sync(dev->parent);
     return size;
@@ -267,7 +267,7 @@ static ssize_t cti_reg32_show(struct device *dev, char *buf,
     raw_spin_lock(&drvdata->spinlock);
     if ((reg_offset >= 0) && cti_active(config)) {
         CS_UNLOCK(drvdata->base);
-        val = readl_relaxed(drvdata->base + reg_offset);
+        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, reg_offset));
         if (pcached_val)
             *pcached_val = val;
         CS_LOCK(drvdata->base);
@@ -300,7 +300,7 @@ static ssize_t cti_reg32_store(struct device *dev, const char *buf,

     /* write through if offset and enabled */
     if ((reg_offset >= 0) && cti_active(config))
-        cti_write_single_reg(drvdata, reg_offset, val);
+        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, reg_offset), val);
     raw_spin_unlock(&drvdata->spinlock);
     return size;
 }
@@ -389,7 +389,7 @@ static ssize_t inen_store(struct device *dev,

     /* write through if enabled */
     if (cti_active(config))
-        cti_write_single_reg(drvdata, CTIINEN(index), val);
+        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIINEN(0), index), val);
     raw_spin_unlock(&drvdata->spinlock);
     return size;
 }
@@ -428,7 +428,7 @@ static ssize_t outen_store(struct device *dev,

     /* write through if enabled */
     if (cti_active(config))
-        cti_write_single_reg(drvdata, CTIOUTEN(index), val);
+        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), index), val);
     raw_spin_unlock(&drvdata->spinlock);
     return size;
 }
@@ -470,7 +470,7 @@ static ssize_t appclear_store(struct device *dev,

     /* write through if enabled */
     if (cti_active(config))
-        cti_write_single_reg(drvdata, CTIAPPCLEAR, val);
+        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPCLEAR), val);
     raw_spin_unlock(&drvdata->spinlock);
     return size;
 }
@@ -491,7 +491,7 @@ static ssize_t apppulse_store(struct device *dev,

     /* write through if enabled */
     if (cti_active(config))
-        cti_write_single_reg(drvdata, CTIAPPPULSE, val);
+        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPPULSE), val);
     raw_spin_unlock(&drvdata->spinlock);
     return size;
 }
@@ -513,18 +513,36 @@ static struct attribute *coresight_cti_regs_attrs[] = {
     &dev_attr_appclear.attr,
     &dev_attr_apppulse.attr,
     coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
+    coresight_cti_reg(triginstatus1, CTITRIGINSTATUS1),
+    coresight_cti_reg(triginstatus2, CTITRIGINSTATUS2),
+    coresight_cti_reg(triginstatus3, CTITRIGINSTATUS3),
     coresight_cti_reg(trigoutstatus, CTITRIGOUTSTATUS),
+    coresight_cti_reg(trigoutstatus1, CTITRIGOUTSTATUS1),
+    coresight_cti_reg(trigoutstatus2, CTITRIGOUTSTATUS2),
+    coresight_cti_reg(trigoutstatus3, CTITRIGOUTSTATUS3),
     coresight_cti_reg(chinstatus, CTICHINSTATUS),
     coresight_cti_reg(choutstatus, CTICHOUTSTATUS),
 #ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
     coresight_cti_reg_rw(itctrl, CORESIGHT_ITCTRL),
     coresight_cti_reg(ittrigin, ITTRIGIN),
+    coresight_cti_reg(ittrigin1, ITTRIGIN1),
+    coresight_cti_reg(ittrigin2, ITTRIGIN2),
+    coresight_cti_reg(ittrigin3, ITTRIGIN3),
     coresight_cti_reg(itchin, ITCHIN),
     coresight_cti_reg_rw(ittrigout, ITTRIGOUT),
+    coresight_cti_reg_rw(ittrigout1, ITTRIGOUT1),
+    coresight_cti_reg_rw(ittrigout2, ITTRIGOUT2),
+    coresight_cti_reg_rw(ittrigout3, ITTRIGOUT3),
     coresight_cti_reg_rw(itchout, ITCHOUT),
     coresight_cti_reg(itchoutack, ITCHOUTACK),
     coresight_cti_reg(ittrigoutack, ITTRIGOUTACK),
+    coresight_cti_reg(ittrigoutack1, ITTRIGOUTACK1),
+    coresight_cti_reg(ittrigoutack2, ITTRIGOUTACK2),
+    coresight_cti_reg(ittrigoutack3, ITTRIGOUTACK3),
     coresight_cti_reg_wo(ittriginack, ITTRIGINACK),
+    coresight_cti_reg_wo(ittriginack1, ITTRIGINACK1),
+    coresight_cti_reg_wo(ittriginack2, ITTRIGINACK2),
+    coresight_cti_reg_wo(ittriginack3, ITTRIGINACK3),
     coresight_cti_reg_wo(itchinack, ITCHINACK),
 #endif
     NULL,
@@ -1153,6 +1171,41 @@ int cti_create_cons_sysfs(struct device *dev, struct cti_drvdata *drvdata)
     return err;
 }

+  static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
+                  struct attribute *attr, int n)
+  {
+    struct device *dev = container_of(kobj, struct device, kobj);
+    struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+    /* Mute QCOM CTI registers for standard CTI module */
+    if (!drvdata->is_qcom_cti) {
+        if (attr == coresight_cti_regs_attrs[10]
+         || attr == coresight_cti_regs_attrs[11]
+         || attr == coresight_cti_regs_attrs[12]
+         || attr == coresight_cti_regs_attrs[14]
+         || attr == coresight_cti_regs_attrs[15]
+         || attr == coresight_cti_regs_attrs[16]
+#ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
+         || attr == coresight_cti_regs_attrs[21]
+         || attr == coresight_cti_regs_attrs[22]
+         || attr == coresight_cti_regs_attrs[23]
+         || attr == coresight_cti_regs_attrs[26]
+         || attr == coresight_cti_regs_attrs[27]
+         || attr == coresight_cti_regs_attrs[28]
+         || attr == coresight_cti_regs_attrs[32]
+         || attr == coresight_cti_regs_attrs[33]
+         || attr == coresight_cti_regs_attrs[34]
+         || attr == coresight_cti_regs_attrs[36]
+         || attr == coresight_cti_regs_attrs[37]
+         || attr == coresight_cti_regs_attrs[38]
+#endif
+         )
+                  return 0;
+          }
+
+          return attr->mode;
+  }
+
 /* attribute and group sysfs tables. */
 static const struct attribute_group coresight_cti_group = {
     .attrs = coresight_cti_attrs,
@@ -1166,6 +1219,7 @@ static const struct attribute_group coresight_cti_mgmt_group = {
 static const struct attribute_group coresight_cti_regs_group = {
     .attrs = coresight_cti_regs_attrs,
     .name = "regs",
+    .is_visible = coresight_cti_regs_is_visible,
 };

 static const struct attribute_group coresight_cti_channels_group = {
diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
index e7b88b07cffe..413d5ef483e8 100644
--- a/drivers/hwtracing/coresight/coresight-cti.h
+++ b/drivers/hwtracing/coresight/coresight-cti.h
@@ -57,7 +57,7 @@ struct fwnode_handle;
  * Max of in and out defined in the DEVID register.
  * - pick up actual number used from .dts parameters if present.
  */
-#define CTIINOUTEN_MAX        32
+#define CTIINOUTEN_MAX        128

 /**
  * Group of related trigger signals
@@ -181,6 +181,7 @@ struct cti_drvdata {
     struct cti_config config;
     struct list_head node;
     void (*csdev_release)(struct device *dev);
+    bool is_qcom_cti;
 };

 /*
diff --git a/drivers/hwtracing/coresight/qcom-cti.h b/drivers/hwtracing/coresight/qcom-cti.h
new file mode 100644
index 000000000000..aa41f9425b36
--- /dev/null
+++ b/drivers/hwtracing/coresight/qcom-cti.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+#include "coresight-cti.h"
+
+#define ARCHITECT_QCOM 0x477
+
+#define CTITRIGINSTATUS1 0x1000
+#define CTITRIGINSTATUS2 0x1001
+#define CTITRIGINSTATUS3 0x1002
+
+#define CTITRIGOUTSTATUS1 0x1003
+#define CTITRIGOUTSTATUS2 0x1004
+#define CTITRIGOUTSTATUS3 0x1005
+
+#define ITTRIGIN1 0x1006
+#define ITTRIGIN2 0x1007
+#define ITTRIGIN3 0x1008
+
+#define ITTRIGOUT1 0x1009
+#define ITTRIGOUT2 0x100A
+#define ITTRIGOUT3 0x100B
+
+#define ITTRIGINACK1 0x100C
+#define ITTRIGINACK2 0x100D
+#define ITTRIGINACK3 0x100E
+
+#define ITTRIGOUTACK1 0x100F
+#define ITTRIGOUTACK2 0x1010
+#define ITTRIGOUTACK3 0x1011
+/* CTI programming registers */
+#define    QCOM_CTIINTACK        0x020
+#define    QCOM_CTIAPPSET        0x004
+#define    QCOM_CTIAPPCLEAR    0x008
+#define    QCOM_CTIAPPPULSE    0x00C
+#define    QCOM_CTIINEN        0x400
+#define    QCOM_CTIOUTEN        0x800
+#define    QCOM_CTITRIGINSTATUS    0x040
+#define    QCOM_CTITRIGINSTATUS1    0x044
+#define    QCOM_CTITRIGINSTATUS2    0x048
+#define    QCOM_CTITRIGINSTATUS3    0x04C
+#define    QCOM_CTITRIGOUTSTATUS    0x060
+#define    QCOM_CTITRIGOUTSTATUS1    0x064
+#define    QCOM_CTITRIGOUTSTATUS2    0x068
+#define    QCOM_CTITRIGOUTSTATUS3    0x06C
+#define    QCOM_CTICHINSTATUS    0x080
+#define    QCOM_CTICHOUTSTATUS    0x084
+#define    QCOM_CTIGATE        0x088
+#define    QCOM_ASICCTL        0x08c
+/* Integration test registers */
+#define    QCOM_ITCHINACK        0xE70
+#define    QCOM_ITTRIGINACK    0xE80
+#define    QCOM_ITTRIGINACK1    0xE84
+#define    QCOM_ITTRIGINACK2    0xE88
+#define    QCOM_ITTRIGINACK3    0xE8C
+#define    QCOM_ITCHOUT        0xE74
+#define    QCOM_ITTRIGOUT        0xEA0
+#define    QCOM_ITTRIGOUT1        0xEA4
+#define    QCOM_ITTRIGOUT2        0xEA8
+#define    QCOM_ITTRIGOUT3        0xEAC
+#define    QCOM_ITCHOUTACK        0xE78
+#define    QCOM_ITTRIGOUTACK    0xEC0
+#define    QCOM_ITTRIGOUTACK1    0xEC4
+#define    QCOM_ITTRIGOUTACK2    0xEC8
+#define    QCOM_ITTRIGOUTACK3    0xECC
+#define    QCOM_ITCHIN        0xE7C
+#define    QCOM_ITTRIGIN        0xEE0
+#define    QCOM_ITTRIGIN1        0xEE4
+#define    QCOM_ITTRIGIN2        0xEE8
+#define    QCOM_ITTRIGIN3        0xEEC
+
+static noinline u32 cti_qcom_reg_off(u32 offset)
+{
+    switch (offset) {
+        case CTIINTACK:        return QCOM_CTIINTACK;
+        case CTIAPPSET:        return QCOM_CTIAPPSET;
+        case CTIAPPCLEAR:    return QCOM_CTIAPPCLEAR;
+        case CTIAPPPULSE:    return QCOM_CTIAPPPULSE;
+        case CTIINEN(0):    return QCOM_CTIINEN;
+        case CTIOUTEN(0):    return QCOM_CTIOUTEN;
+        case CTITRIGINSTATUS:    return QCOM_CTITRIGINSTATUS;
+        case CTITRIGINSTATUS1:    return QCOM_CTITRIGINSTATUS1;
+        case CTITRIGINSTATUS2:    return QCOM_CTITRIGINSTATUS2;
+        case CTITRIGINSTATUS3:    return QCOM_CTITRIGINSTATUS3;
+        case CTITRIGOUTSTATUS:    return QCOM_CTITRIGOUTSTATUS;
+        case CTITRIGOUTSTATUS1:    return QCOM_CTITRIGOUTSTATUS1;
+        case CTITRIGOUTSTATUS2:    return QCOM_CTITRIGOUTSTATUS2;
+        case CTITRIGOUTSTATUS3:    return QCOM_CTITRIGOUTSTATUS3;
+        case CTICHINSTATUS:        return QCOM_CTICHINSTATUS;
+        case CTICHOUTSTATUS:    return QCOM_CTICHOUTSTATUS;
+        case CTIGATE:        return QCOM_CTIGATE;
+        case ASICCTL:        return QCOM_ASICCTL;
+        case ITCHINACK:        return QCOM_ITCHINACK;
+        case ITTRIGINACK:    return QCOM_ITTRIGINACK;
+        case ITTRIGINACK1:    return QCOM_ITTRIGINACK1;
+        case ITTRIGINACK2:    return QCOM_ITTRIGINACK2;
+        case ITTRIGINACK3:    return QCOM_ITTRIGINACK3;
+        case ITCHOUT:        return QCOM_ITCHOUT;
+        case ITTRIGOUT:        return QCOM_ITTRIGOUT;
+        case ITTRIGOUT1:    return QCOM_ITTRIGOUT1;
+        case ITTRIGOUT2:    return QCOM_ITTRIGOUT2;
+        case ITTRIGOUT3:    return QCOM_ITTRIGOUT3;
+        case ITCHOUTACK:    return QCOM_ITCHOUTACK;
+        case ITTRIGOUTACK:    return QCOM_ITTRIGOUTACK;
+        case ITTRIGOUTACK1:    return QCOM_ITTRIGOUTACK1;
+        case ITTRIGOUTACK2:    return QCOM_ITTRIGOUTACK2;
+        case ITTRIGOUTACK3:    return QCOM_ITTRIGOUTACK3;
+        case ITCHIN:               return QCOM_ITCHIN;
+        case ITTRIGIN:            return QCOM_ITTRIGIN;
+        case ITTRIGIN1:        return QCOM_ITTRIGIN1;
+        case ITTRIGIN2:        return QCOM_ITTRIGIN2;
+        case ITTRIGIN3:        return QCOM_ITTRIGIN3;
+        default:
+                    WARN(1, "Unknown offset=%u\n", offset);
+                    return 0;
+    }
+
+    return 0;
+}
+
+static u32 cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
+                                          u32 reg, u32 nr)
+{
+    /* convert to qcom specific offset */
+    if (unlikely(drvdata->is_qcom_cti))
+        reg = cti_qcom_reg_off(reg);
+
+    return reg + sizeof(u32) * nr;
+}
+
+static u32 cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
+{
+        return cti_reg_addr_with_nr(drvdata, reg, 0);
+}
+


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09  8:16               ` Yingchao Deng
@ 2025-12-09  9:40                 ` Jie Gan
  2025-12-09 11:03                 ` Jie Gan
  2025-12-09 12:19                 ` Leo Yan
  2 siblings, 0 replies; 26+ messages in thread
From: Jie Gan @ 2025-12-09  9:40 UTC (permalink / raw)
  To: Yingchao Deng, mike.leach
  Cc: alexander.shishkin, coresight, james.clark, jinlong.mao, leo.yan,
	linux-arm-kernel, linux-arm-msm, linux-kernel, quic_jinlmao,
	quic_yingdeng, suzuki.poulose, tingwei.zhang



On 12/9/2025 4:16 PM, Yingchao Deng wrote:
> Hi Leo & Mike
> 
> Based on Leo’s suggestions, I created a new patch, but there are three points that do not fully align with his recommendations:
> 
>      1. The helper function for returning the register address now returns only the offset, because returning the full address would conflict with cti_write_single_reg.
>      2. For registers such as triginstatus1...3, I defined additional macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, I chose numbers starting from 0x1000 for the new definitions.
>      3. Regarding the visibility of attributes for triginstatus1...3, since coresight_cti_reg produces an anonymous variable that cannot be directly referenced, I used coresight_cti_regs_attrs[i] to obtain the attribute corresponding to triginstatus1.
> 
> I appreciate both suggestions. After reviewing them, I lean toward Mike's approach.
> 
> Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
> ---
>   .../hwtracing/coresight/coresight-cti-core.c  |  52 +++++--
>   .../hwtracing/coresight/coresight-cti-sysfs.c |  72 ++++++++--
>   drivers/hwtracing/coresight/coresight-cti.h   |   3 +-
>   drivers/hwtracing/coresight/qcom-cti.h        | 136 ++++++++++++++++++
>   4 files changed, 238 insertions(+), 25 deletions(-)
>   create mode 100644 drivers/hwtracing/coresight/qcom-cti.h
> 
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index f9970e40dd59..d2b0b46f2846 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -21,7 +21,7 @@
> 
>   #include "coresight-priv.h"
>   #include "coresight-cti.h"
> -
> +#include "qcom-cti.h"
>   /*
>    * CTI devices can be associated with a PE, or be connected to CoreSight
>    * hardware. We have a list of all CTIs irrespective of CPU bound or
> @@ -70,15 +70,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
> 
>       /* write the CTI trigger registers */
>       for (i = 0; i < config->nr_trig_max; i++) {
> -        writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
> +        writel_relaxed(config->ctiinen[i],
> +                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIINEN(0), i));
>           writel_relaxed(config->ctiouten[i],
> -                   drvdata->base + CTIOUTEN(i));
> +                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), i));
>       }
> 
>       /* other regs */
> -    writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
> -    writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
> -    writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
> +    writel_relaxed(config->ctigate, drvdata->base + cti_reg_addr(drvdata, CTIGATE));
> +    writel_relaxed(config->asicctl, drvdata->base + cti_reg_addr(drvdata, ASICCTL));
> +    writel_relaxed(config->ctiappset, drvdata->base + cti_reg_addr(drvdata, CTIAPPSET));
> 
>       /* re-enable CTI */
>       writel_relaxed(1, drvdata->base + CTICONTROL);
> @@ -201,7 +202,7 @@ void cti_write_intack(struct device *dev, u32 ackval)
>       raw_spin_lock(&drvdata->spinlock);
>       /* write if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIINTACK, ackval);
> +        cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIINTACK), ackval);
>       raw_spin_unlock(&drvdata->spinlock);
>   }
> 
> @@ -214,6 +215,9 @@ void cti_write_intack(struct device *dev, u32 ackval)
>   /* DEVID[19:16] - number of CTM channels */
>   #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))
> 
> +/* DEVARCH[31:21] - ARCHITECT */
> +#define CTI_DEVARCH_ARCHITECT(devarch_val) ((int)BMVAL(devarch_val, 21, 31))
> +
>   static int cti_set_default_config(struct device *dev,
>                     struct cti_drvdata *drvdata)
>   {
> @@ -394,9 +398,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
> 
>       /* update the local register values */
>       chan_bitmask = BIT(channel_idx);
> -    reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
> -              CTIOUTEN(trigger_idx));
> -
> +    reg_offset = (direction == CTI_TRIG_IN ? cti_reg_addr_with_nr(drvdata, CTIINEN(0), trigger_idx):
> +                         cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), trigger_idx));
>       raw_spin_lock(&drvdata->spinlock);
> 
>       /* read - modify write - the trigger / channel enable value */
> @@ -452,7 +455,7 @@ int cti_channel_gate_op(struct device *dev, enum cti_chan_gate_op op,
>       if (err == 0) {
>           config->ctigate = reg_value;
>           if (cti_active(config))
> -            cti_write_single_reg(drvdata, CTIGATE, reg_value);
> +            cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIGATE), reg_value);
>       }
>       raw_spin_unlock(&drvdata->spinlock);
>       return err;
> @@ -479,19 +482,19 @@ int cti_channel_setop(struct device *dev, enum cti_chan_set_op op,
>       case CTI_CHAN_SET:
>           config->ctiappset |= chan_bitmask;
>           reg_value  = config->ctiappset;
> -        reg_offset = CTIAPPSET;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPSET);
>           break;
> 
>       case CTI_CHAN_CLR:
>           config->ctiappset &= ~chan_bitmask;
>           reg_value = chan_bitmask;
> -        reg_offset = CTIAPPCLEAR;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPCLEAR);
>           break;
> 
>       case CTI_CHAN_PULSE:
>           config->ctiappset &= ~chan_bitmask;
>           reg_value = chan_bitmask;
> -        reg_offset = CTIAPPPULSE;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPPULSE);
>           break;
> 
>       default:
> @@ -895,6 +898,7 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>       struct coresight_desc cti_desc;
>       struct coresight_platform_data *pdata = NULL;
>       struct resource *res = &adev->res;
> +    u32 devarch;
> 
>       /* driver data*/
>       drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> @@ -981,9 +985,27 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>       drvdata->csdev_release = drvdata->csdev->dev.release;
>       drvdata->csdev->dev.release = cti_device_release;
> 
> +    /* check architect value*/
> +    devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
> +    if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
> +        drvdata->is_qcom_cti = 1;
> +
> +        /*
> +         * QCOM CTI does not implement Claimtag functionality as
> +         * per CoreSight specification, but its CLAIMSET register
> +         * is incorrectly initialized to 0xF. This can mislead
> +         * tools or drivers into thinking the component is claimed.
> +         *
> +         * Reset CLAIMSET to 0 to reflect that no claims are active.
> +         */
> +        drvdata->csdev->claim_tag_info = CS_CLAIM_TAG_NOT_IMPL;
> +        //writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> +    }
> +
>       /* all done - dec pm refcount */
>       pm_runtime_put(&adev->dev);
> -    dev_info(&drvdata->csdev->dev, "CTI initialized\n");
> +    dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
> +                    drvdata->is_qcom_cti ? "QCOM" : "");
>       return 0;
> 
>   pm_release:
> diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> index a9df77215141..5d23a138b4a7 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> @@ -13,7 +13,7 @@
>   #include <linux/sysfs.h>
> 
>   #include "coresight-cti.h"
> -
> +#include "qcom-cti.h"
>   /*
>    * Declare the number of static declared attribute groups
>    * Value includes groups + NULL value at end of table.
> @@ -183,7 +183,7 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
>       pm_runtime_get_sync(dev->parent);
>       raw_spin_lock(&drvdata->spinlock);
>       if (drvdata->config.hw_powered)
> -        val = readl_relaxed(drvdata->base + cti_attr->off);
> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, cti_attr->off));
>       raw_spin_unlock(&drvdata->spinlock);
>       pm_runtime_put_sync(dev->parent);
>       return sysfs_emit(buf, "0x%x\n", val);
> @@ -204,7 +204,7 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
>       pm_runtime_get_sync(dev->parent);
>       raw_spin_lock(&drvdata->spinlock);
>       if (drvdata->config.hw_powered)
> -        cti_write_single_reg(drvdata, cti_attr->off, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, cti_attr->off), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       pm_runtime_put_sync(dev->parent);
>       return size;
> @@ -267,7 +267,7 @@ static ssize_t cti_reg32_show(struct device *dev, char *buf,
>       raw_spin_lock(&drvdata->spinlock);
>       if ((reg_offset >= 0) && cti_active(config)) {
>           CS_UNLOCK(drvdata->base);
> -        val = readl_relaxed(drvdata->base + reg_offset);
> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, reg_offset));
>           if (pcached_val)
>               *pcached_val = val;
>           CS_LOCK(drvdata->base);
> @@ -300,7 +300,7 @@ static ssize_t cti_reg32_store(struct device *dev, const char *buf,
> 
>       /* write through if offset and enabled */
>       if ((reg_offset >= 0) && cti_active(config))
> -        cti_write_single_reg(drvdata, reg_offset, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, reg_offset), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -389,7 +389,7 @@ static ssize_t inen_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIINEN(index), val);
> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIINEN(0), index), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -428,7 +428,7 @@ static ssize_t outen_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIOUTEN(index), val);
> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), index), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -470,7 +470,7 @@ static ssize_t appclear_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIAPPCLEAR, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPCLEAR), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -491,7 +491,7 @@ static ssize_t apppulse_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIAPPPULSE, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPPULSE), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -513,18 +513,36 @@ static struct attribute *coresight_cti_regs_attrs[] = {
>       &dev_attr_appclear.attr,
>       &dev_attr_apppulse.attr,
>       coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
> +    coresight_cti_reg(triginstatus1, CTITRIGINSTATUS1),
> +    coresight_cti_reg(triginstatus2, CTITRIGINSTATUS2),
> +    coresight_cti_reg(triginstatus3, CTITRIGINSTATUS3),
>       coresight_cti_reg(trigoutstatus, CTITRIGOUTSTATUS),
> +    coresight_cti_reg(trigoutstatus1, CTITRIGOUTSTATUS1),
> +    coresight_cti_reg(trigoutstatus2, CTITRIGOUTSTATUS2),
> +    coresight_cti_reg(trigoutstatus3, CTITRIGOUTSTATUS3),
>       coresight_cti_reg(chinstatus, CTICHINSTATUS),
>       coresight_cti_reg(choutstatus, CTICHOUTSTATUS),
>   #ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
>       coresight_cti_reg_rw(itctrl, CORESIGHT_ITCTRL),
>       coresight_cti_reg(ittrigin, ITTRIGIN),
> +    coresight_cti_reg(ittrigin1, ITTRIGIN1),
> +    coresight_cti_reg(ittrigin2, ITTRIGIN2),
> +    coresight_cti_reg(ittrigin3, ITTRIGIN3),
>       coresight_cti_reg(itchin, ITCHIN),
>       coresight_cti_reg_rw(ittrigout, ITTRIGOUT),
> +    coresight_cti_reg_rw(ittrigout1, ITTRIGOUT1),
> +    coresight_cti_reg_rw(ittrigout2, ITTRIGOUT2),
> +    coresight_cti_reg_rw(ittrigout3, ITTRIGOUT3),
>       coresight_cti_reg_rw(itchout, ITCHOUT),
>       coresight_cti_reg(itchoutack, ITCHOUTACK),
>       coresight_cti_reg(ittrigoutack, ITTRIGOUTACK),
> +    coresight_cti_reg(ittrigoutack1, ITTRIGOUTACK1),
> +    coresight_cti_reg(ittrigoutack2, ITTRIGOUTACK2),
> +    coresight_cti_reg(ittrigoutack3, ITTRIGOUTACK3),
>       coresight_cti_reg_wo(ittriginack, ITTRIGINACK),
> +    coresight_cti_reg_wo(ittriginack1, ITTRIGINACK1),
> +    coresight_cti_reg_wo(ittriginack2, ITTRIGINACK2),
> +    coresight_cti_reg_wo(ittriginack3, ITTRIGINACK3),
>       coresight_cti_reg_wo(itchinack, ITCHINACK),
>   #endif
>       NULL,
> @@ -1153,6 +1171,41 @@ int cti_create_cons_sysfs(struct device *dev, struct cti_drvdata *drvdata)
>       return err;
>   }
> 
> +  static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
> +                  struct attribute *attr, int n)
> +  {
> +    struct device *dev = container_of(kobj, struct device, kobj);
> +    struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +    /* Mute QCOM CTI registers for standard CTI module */
> +    if (!drvdata->is_qcom_cti) {
> +        if (attr == coresight_cti_regs_attrs[10]
> +         || attr == coresight_cti_regs_attrs[11]
> +         || attr == coresight_cti_regs_attrs[12]
> +         || attr == coresight_cti_regs_attrs[14]
> +         || attr == coresight_cti_regs_attrs[15]
> +         || attr == coresight_cti_regs_attrs[16]
> +#ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
> +         || attr == coresight_cti_regs_attrs[21]
> +         || attr == coresight_cti_regs_attrs[22]
> +         || attr == coresight_cti_regs_attrs[23]
> +         || attr == coresight_cti_regs_attrs[26]
> +         || attr == coresight_cti_regs_attrs[27]
> +         || attr == coresight_cti_regs_attrs[28]
> +         || attr == coresight_cti_regs_attrs[32]
> +         || attr == coresight_cti_regs_attrs[33]
> +         || attr == coresight_cti_regs_attrs[34]
> +         || attr == coresight_cti_regs_attrs[36]
> +         || attr == coresight_cti_regs_attrs[37]
> +         || attr == coresight_cti_regs_attrs[38]
> +#endif
> +         )
> +                  return 0;
> +          }
> +
> +          return attr->mode;
> +  }
> +
>   /* attribute and group sysfs tables. */
>   static const struct attribute_group coresight_cti_group = {
>       .attrs = coresight_cti_attrs,
> @@ -1166,6 +1219,7 @@ static const struct attribute_group coresight_cti_mgmt_group = {
>   static const struct attribute_group coresight_cti_regs_group = {
>       .attrs = coresight_cti_regs_attrs,
>       .name = "regs",
> +    .is_visible = coresight_cti_regs_is_visible,
>   };
> 
>   static const struct attribute_group coresight_cti_channels_group = {
> diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
> index e7b88b07cffe..413d5ef483e8 100644
> --- a/drivers/hwtracing/coresight/coresight-cti.h
> +++ b/drivers/hwtracing/coresight/coresight-cti.h
> @@ -57,7 +57,7 @@ struct fwnode_handle;
>    * Max of in and out defined in the DEVID register.
>    * - pick up actual number used from .dts parameters if present.
>    */
> -#define CTIINOUTEN_MAX        32
> +#define CTIINOUTEN_MAX        128
> 
>   /**
>    * Group of related trigger signals
> @@ -181,6 +181,7 @@ struct cti_drvdata {
>       struct cti_config config;
>       struct list_head node;
>       void (*csdev_release)(struct device *dev);
> +    bool is_qcom_cti;
>   };
> 
>   /*
> diff --git a/drivers/hwtracing/coresight/qcom-cti.h b/drivers/hwtracing/coresight/qcom-cti.h
> new file mode 100644
> index 000000000000..aa41f9425b36
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/qcom-cti.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +#include "coresight-cti.h"
> +
> +#define ARCHITECT_QCOM 0x477
> +
> +#define CTITRIGINSTATUS1 0x1000
> +#define CTITRIGINSTATUS2 0x1001
> +#define CTITRIGINSTATUS3 0x1002
> +
> +#define CTITRIGOUTSTATUS1 0x1003
> +#define CTITRIGOUTSTATUS2 0x1004
> +#define CTITRIGOUTSTATUS3 0x1005
> +
> +#define ITTRIGIN1 0x1006
> +#define ITTRIGIN2 0x1007
> +#define ITTRIGIN3 0x1008
> +
> +#define ITTRIGOUT1 0x1009
> +#define ITTRIGOUT2 0x100A
> +#define ITTRIGOUT3 0x100B
> +
> +#define ITTRIGINACK1 0x100C
> +#define ITTRIGINACK2 0x100D
> +#define ITTRIGINACK3 0x100E
> +
> +#define ITTRIGOUTACK1 0x100F
> +#define ITTRIGOUTACK2 0x1010
> +#define ITTRIGOUTACK3 0x1011
> +/* CTI programming registers */
> +#define    QCOM_CTIINTACK        0x020
> +#define    QCOM_CTIAPPSET        0x004
> +#define    QCOM_CTIAPPCLEAR    0x008
> +#define    QCOM_CTIAPPPULSE    0x00C
> +#define    QCOM_CTIINEN        0x400
> +#define    QCOM_CTIOUTEN        0x800
> +#define    QCOM_CTITRIGINSTATUS    0x040
> +#define    QCOM_CTITRIGINSTATUS1    0x044
> +#define    QCOM_CTITRIGINSTATUS2    0x048
> +#define    QCOM_CTITRIGINSTATUS3    0x04C
> +#define    QCOM_CTITRIGOUTSTATUS    0x060
> +#define    QCOM_CTITRIGOUTSTATUS1    0x064
> +#define    QCOM_CTITRIGOUTSTATUS2    0x068
> +#define    QCOM_CTITRIGOUTSTATUS3    0x06C
> +#define    QCOM_CTICHINSTATUS    0x080
> +#define    QCOM_CTICHOUTSTATUS    0x084
> +#define    QCOM_CTIGATE        0x088
> +#define    QCOM_ASICCTL        0x08c
> +/* Integration test registers */
> +#define    QCOM_ITCHINACK        0xE70
> +#define    QCOM_ITTRIGINACK    0xE80
> +#define    QCOM_ITTRIGINACK1    0xE84
> +#define    QCOM_ITTRIGINACK2    0xE88
> +#define    QCOM_ITTRIGINACK3    0xE8C
> +#define    QCOM_ITCHOUT        0xE74
> +#define    QCOM_ITTRIGOUT        0xEA0
> +#define    QCOM_ITTRIGOUT1        0xEA4
> +#define    QCOM_ITTRIGOUT2        0xEA8
> +#define    QCOM_ITTRIGOUT3        0xEAC
> +#define    QCOM_ITCHOUTACK        0xE78
> +#define    QCOM_ITTRIGOUTACK    0xEC0
> +#define    QCOM_ITTRIGOUTACK1    0xEC4
> +#define    QCOM_ITTRIGOUTACK2    0xEC8
> +#define    QCOM_ITTRIGOUTACK3    0xECC
> +#define    QCOM_ITCHIN        0xE7C
> +#define    QCOM_ITTRIGIN        0xEE0
> +#define    QCOM_ITTRIGIN1        0xEE4
> +#define    QCOM_ITTRIGIN2        0xEE8
> +#define    QCOM_ITTRIGIN3        0xEEC
> +
> +static noinline u32 cti_qcom_reg_off(u32 offset)
> +{
> +    switch (offset) {
> +        case CTIINTACK:        return QCOM_CTIINTACK;
> +        case CTIAPPSET:        return QCOM_CTIAPPSET;
> +        case CTIAPPCLEAR:    return QCOM_CTIAPPCLEAR;
> +        case CTIAPPPULSE:    return QCOM_CTIAPPPULSE;
> +        case CTIINEN(0):    return QCOM_CTIINEN;
> +        case CTIOUTEN(0):    return QCOM_CTIOUTEN;
> +        case CTITRIGINSTATUS:    return QCOM_CTITRIGINSTATUS;
> +        case CTITRIGINSTATUS1:    return QCOM_CTITRIGINSTATUS1;
> +        case CTITRIGINSTATUS2:    return QCOM_CTITRIGINSTATUS2;
> +        case CTITRIGINSTATUS3:    return QCOM_CTITRIGINSTATUS3;
> +        case CTITRIGOUTSTATUS:    return QCOM_CTITRIGOUTSTATUS;
> +        case CTITRIGOUTSTATUS1:    return QCOM_CTITRIGOUTSTATUS1;
> +        case CTITRIGOUTSTATUS2:    return QCOM_CTITRIGOUTSTATUS2;
> +        case CTITRIGOUTSTATUS3:    return QCOM_CTITRIGOUTSTATUS3;
> +        case CTICHINSTATUS:        return QCOM_CTICHINSTATUS;
> +        case CTICHOUTSTATUS:    return QCOM_CTICHOUTSTATUS;
> +        case CTIGATE:        return QCOM_CTIGATE;
> +        case ASICCTL:        return QCOM_ASICCTL;
> +        case ITCHINACK:        return QCOM_ITCHINACK;
> +        case ITTRIGINACK:    return QCOM_ITTRIGINACK;
> +        case ITTRIGINACK1:    return QCOM_ITTRIGINACK1;
> +        case ITTRIGINACK2:    return QCOM_ITTRIGINACK2;
> +        case ITTRIGINACK3:    return QCOM_ITTRIGINACK3;
> +        case ITCHOUT:        return QCOM_ITCHOUT;
> +        case ITTRIGOUT:        return QCOM_ITTRIGOUT;
> +        case ITTRIGOUT1:    return QCOM_ITTRIGOUT1;
> +        case ITTRIGOUT2:    return QCOM_ITTRIGOUT2;
> +        case ITTRIGOUT3:    return QCOM_ITTRIGOUT3;
> +        case ITCHOUTACK:    return QCOM_ITCHOUTACK;
> +        case ITTRIGOUTACK:    return QCOM_ITTRIGOUTACK;
> +        case ITTRIGOUTACK1:    return QCOM_ITTRIGOUTACK1;
> +        case ITTRIGOUTACK2:    return QCOM_ITTRIGOUTACK2;
> +        case ITTRIGOUTACK3:    return QCOM_ITTRIGOUTACK3;
> +        case ITCHIN:               return QCOM_ITCHIN;
> +        case ITTRIGIN:            return QCOM_ITTRIGIN;
> +        case ITTRIGIN1:        return QCOM_ITTRIGIN1;
> +        case ITTRIGIN2:        return QCOM_ITTRIGIN2;
> +        case ITTRIGIN3:        return QCOM_ITTRIGIN3;
> +        default:
> +                    WARN(1, "Unknown offset=%u\n", offset);
> +                    return 0;
> +    }
> +
> +    return 0;
> +}
> +
> +static u32 cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
> +                                          u32 reg, u32 nr)
> +{
> +    /* convert to qcom specific offset */
> +    if (unlikely(drvdata->is_qcom_cti))

Hi yingchao,

For standard ARM cti, it's unlikely to true, but for qcom_cti, it's 
likely to true, so I think the compile predication here is not mandatory?

Thanks,
Jie

> +        reg = cti_qcom_reg_off(reg);
> +
> +    return reg + sizeof(u32) * nr;
> +}
> +
> +static u32 cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
> +{
> +        return cti_reg_addr_with_nr(drvdata, reg, 0);
> +}
> +
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09  8:16               ` Yingchao Deng
  2025-12-09  9:40                 ` Jie Gan
@ 2025-12-09 11:03                 ` Jie Gan
  2025-12-09 12:42                   ` Yingchao Deng (Consultant)
  2025-12-09 12:19                 ` Leo Yan
  2 siblings, 1 reply; 26+ messages in thread
From: Jie Gan @ 2025-12-09 11:03 UTC (permalink / raw)
  To: Yingchao Deng, mike.leach
  Cc: alexander.shishkin, coresight, james.clark, jinlong.mao, leo.yan,
	linux-arm-kernel, linux-arm-msm, linux-kernel, quic_jinlmao,
	quic_yingdeng, suzuki.poulose, tingwei.zhang



On 12/9/2025 4:16 PM, Yingchao Deng wrote:
> Hi Leo & Mike
> 
> Based on Leo’s suggestions, I created a new patch, but there are three points that do not fully align with his recommendations:
> 
>      1. The helper function for returning the register address now returns only the offset, because returning the full address would conflict with cti_write_single_reg.
>      2. For registers such as triginstatus1...3, I defined additional macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, I chose numbers starting from 0x1000 for the new definitions.
>      3. Regarding the visibility of attributes for triginstatus1...3, since coresight_cti_reg produces an anonymous variable that cannot be directly referenced, I used coresight_cti_regs_attrs[i] to obtain the attribute corresponding to triginstatus1.
> 
> I appreciate both suggestions. After reviewing them, I lean toward Mike's approach.
> 
> Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
> ---
>   .../hwtracing/coresight/coresight-cti-core.c  |  52 +++++--
>   .../hwtracing/coresight/coresight-cti-sysfs.c |  72 ++++++++--
>   drivers/hwtracing/coresight/coresight-cti.h   |   3 +-
>   drivers/hwtracing/coresight/qcom-cti.h        | 136 ++++++++++++++++++
>   4 files changed, 238 insertions(+), 25 deletions(-)
>   create mode 100644 drivers/hwtracing/coresight/qcom-cti.h
> 
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index f9970e40dd59..d2b0b46f2846 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -21,7 +21,7 @@
> 
>   #include "coresight-priv.h"
>   #include "coresight-cti.h"
> -
> +#include "qcom-cti.h"
>   /*
>    * CTI devices can be associated with a PE, or be connected to CoreSight
>    * hardware. We have a list of all CTIs irrespective of CPU bound or
> @@ -70,15 +70,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
> 
>       /* write the CTI trigger registers */
>       for (i = 0; i < config->nr_trig_max; i++) {
> -        writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
> +        writel_relaxed(config->ctiinen[i],
> +                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIINEN(0), i));
>           writel_relaxed(config->ctiouten[i],
> -                   drvdata->base + CTIOUTEN(i));
> +                drvdata->base + cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), i));
>       }
> 
>       /* other regs */
> -    writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
> -    writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
> -    writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
> +    writel_relaxed(config->ctigate, drvdata->base + cti_reg_addr(drvdata, CTIGATE));
> +    writel_relaxed(config->asicctl, drvdata->base + cti_reg_addr(drvdata, ASICCTL));
> +    writel_relaxed(config->ctiappset, drvdata->base + cti_reg_addr(drvdata, CTIAPPSET));
> 
>       /* re-enable CTI */
>       writel_relaxed(1, drvdata->base + CTICONTROL);
> @@ -201,7 +202,7 @@ void cti_write_intack(struct device *dev, u32 ackval)
>       raw_spin_lock(&drvdata->spinlock);
>       /* write if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIINTACK, ackval);
> +        cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIINTACK), ackval);
>       raw_spin_unlock(&drvdata->spinlock);
>   }
> 
> @@ -214,6 +215,9 @@ void cti_write_intack(struct device *dev, u32 ackval)
>   /* DEVID[19:16] - number of CTM channels */
>   #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 16, 19))
> 
> +/* DEVARCH[31:21] - ARCHITECT */
> +#define CTI_DEVARCH_ARCHITECT(devarch_val) ((int)BMVAL(devarch_val, 21, 31))
> +
>   static int cti_set_default_config(struct device *dev,
>                     struct cti_drvdata *drvdata)
>   {
> @@ -394,9 +398,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
> 
>       /* update the local register values */
>       chan_bitmask = BIT(channel_idx);
> -    reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
> -              CTIOUTEN(trigger_idx));
> -
> +    reg_offset = (direction == CTI_TRIG_IN ? cti_reg_addr_with_nr(drvdata, CTIINEN(0), trigger_idx):
> +                         cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), trigger_idx));
>       raw_spin_lock(&drvdata->spinlock);
> 
>       /* read - modify write - the trigger / channel enable value */
> @@ -452,7 +455,7 @@ int cti_channel_gate_op(struct device *dev, enum cti_chan_gate_op op,
>       if (err == 0) {
>           config->ctigate = reg_value;
>           if (cti_active(config))
> -            cti_write_single_reg(drvdata, CTIGATE, reg_value);
> +            cti_write_single_reg(drvdata, cti_reg_addr(drvdata, CTIGATE), reg_value);
>       }
>       raw_spin_unlock(&drvdata->spinlock);
>       return err;
> @@ -479,19 +482,19 @@ int cti_channel_setop(struct device *dev, enum cti_chan_set_op op,
>       case CTI_CHAN_SET:
>           config->ctiappset |= chan_bitmask;
>           reg_value  = config->ctiappset;
> -        reg_offset = CTIAPPSET;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPSET);
>           break;
> 
>       case CTI_CHAN_CLR:
>           config->ctiappset &= ~chan_bitmask;
>           reg_value = chan_bitmask;
> -        reg_offset = CTIAPPCLEAR;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPCLEAR);
>           break;
> 
>       case CTI_CHAN_PULSE:
>           config->ctiappset &= ~chan_bitmask;
>           reg_value = chan_bitmask;
> -        reg_offset = CTIAPPPULSE;
> +        reg_offset = cti_reg_addr(drvdata, CTIAPPPULSE);
>           break;
> 
>       default:
> @@ -895,6 +898,7 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>       struct coresight_desc cti_desc;
>       struct coresight_platform_data *pdata = NULL;
>       struct resource *res = &adev->res;
> +    u32 devarch;
> 
>       /* driver data*/
>       drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> @@ -981,9 +985,27 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id)
>       drvdata->csdev_release = drvdata->csdev->dev.release;
>       drvdata->csdev->dev.release = cti_device_release;
> 
> +    /* check architect value*/
> +    devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
> +    if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
> +        drvdata->is_qcom_cti = 1;
> +
> +        /*
> +         * QCOM CTI does not implement Claimtag functionality as
> +         * per CoreSight specification, but its CLAIMSET register
> +         * is incorrectly initialized to 0xF. This can mislead
> +         * tools or drivers into thinking the component is claimed.
> +         *
> +         * Reset CLAIMSET to 0 to reflect that no claims are active.
> +         */
> +        drvdata->csdev->claim_tag_info = CS_CLAIM_TAG_NOT_IMPL;
> +        //writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
> +    }
> +
>       /* all done - dec pm refcount */
>       pm_runtime_put(&adev->dev);
> -    dev_info(&drvdata->csdev->dev, "CTI initialized\n");
> +    dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
> +                    drvdata->is_qcom_cti ? "QCOM" : "");
>       return 0;
> 
>   pm_release:
> diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> index a9df77215141..5d23a138b4a7 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
> @@ -13,7 +13,7 @@
>   #include <linux/sysfs.h>
> 
>   #include "coresight-cti.h"
> -
> +#include "qcom-cti.h"
>   /*
>    * Declare the number of static declared attribute groups
>    * Value includes groups + NULL value at end of table.
> @@ -183,7 +183,7 @@ static ssize_t coresight_cti_reg_show(struct device *dev,
>       pm_runtime_get_sync(dev->parent);
>       raw_spin_lock(&drvdata->spinlock);
>       if (drvdata->config.hw_powered)
> -        val = readl_relaxed(drvdata->base + cti_attr->off);
> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, cti_attr->off));
>       raw_spin_unlock(&drvdata->spinlock);
>       pm_runtime_put_sync(dev->parent);
>       return sysfs_emit(buf, "0x%x\n", val);
> @@ -204,7 +204,7 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
>       pm_runtime_get_sync(dev->parent);
>       raw_spin_lock(&drvdata->spinlock);
>       if (drvdata->config.hw_powered)
> -        cti_write_single_reg(drvdata, cti_attr->off, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, cti_attr->off), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       pm_runtime_put_sync(dev->parent);
>       return size;
> @@ -267,7 +267,7 @@ static ssize_t cti_reg32_show(struct device *dev, char *buf,
>       raw_spin_lock(&drvdata->spinlock);
>       if ((reg_offset >= 0) && cti_active(config)) {
>           CS_UNLOCK(drvdata->base);
> -        val = readl_relaxed(drvdata->base + reg_offset);
> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, reg_offset));
>           if (pcached_val)
>               *pcached_val = val;
>           CS_LOCK(drvdata->base);
> @@ -300,7 +300,7 @@ static ssize_t cti_reg32_store(struct device *dev, const char *buf,
> 
>       /* write through if offset and enabled */
>       if ((reg_offset >= 0) && cti_active(config))
> -        cti_write_single_reg(drvdata, reg_offset, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, reg_offset), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -389,7 +389,7 @@ static ssize_t inen_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIINEN(index), val);
> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIINEN(0), index), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -428,7 +428,7 @@ static ssize_t outen_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIOUTEN(index), val);
> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), index), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -470,7 +470,7 @@ static ssize_t appclear_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIAPPCLEAR, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPCLEAR), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -491,7 +491,7 @@ static ssize_t apppulse_store(struct device *dev,
> 
>       /* write through if enabled */
>       if (cti_active(config))
> -        cti_write_single_reg(drvdata, CTIAPPPULSE, val);
> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, CTIAPPPULSE), val);
>       raw_spin_unlock(&drvdata->spinlock);
>       return size;
>   }
> @@ -513,18 +513,36 @@ static struct attribute *coresight_cti_regs_attrs[] = {
>       &dev_attr_appclear.attr,
>       &dev_attr_apppulse.attr,
>       coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
> +    coresight_cti_reg(triginstatus1, CTITRIGINSTATUS1),
> +    coresight_cti_reg(triginstatus2, CTITRIGINSTATUS2),
> +    coresight_cti_reg(triginstatus3, CTITRIGINSTATUS3),
>       coresight_cti_reg(trigoutstatus, CTITRIGOUTSTATUS),
> +    coresight_cti_reg(trigoutstatus1, CTITRIGOUTSTATUS1),
> +    coresight_cti_reg(trigoutstatus2, CTITRIGOUTSTATUS2),
> +    coresight_cti_reg(trigoutstatus3, CTITRIGOUTSTATUS3),
>       coresight_cti_reg(chinstatus, CTICHINSTATUS),
>       coresight_cti_reg(choutstatus, CTICHOUTSTATUS),
>   #ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
>       coresight_cti_reg_rw(itctrl, CORESIGHT_ITCTRL),
>       coresight_cti_reg(ittrigin, ITTRIGIN),
> +    coresight_cti_reg(ittrigin1, ITTRIGIN1),
> +    coresight_cti_reg(ittrigin2, ITTRIGIN2),
> +    coresight_cti_reg(ittrigin3, ITTRIGIN3),
>       coresight_cti_reg(itchin, ITCHIN),
>       coresight_cti_reg_rw(ittrigout, ITTRIGOUT),
> +    coresight_cti_reg_rw(ittrigout1, ITTRIGOUT1),
> +    coresight_cti_reg_rw(ittrigout2, ITTRIGOUT2),
> +    coresight_cti_reg_rw(ittrigout3, ITTRIGOUT3),
>       coresight_cti_reg_rw(itchout, ITCHOUT),
>       coresight_cti_reg(itchoutack, ITCHOUTACK),
>       coresight_cti_reg(ittrigoutack, ITTRIGOUTACK),
> +    coresight_cti_reg(ittrigoutack1, ITTRIGOUTACK1),
> +    coresight_cti_reg(ittrigoutack2, ITTRIGOUTACK2),
> +    coresight_cti_reg(ittrigoutack3, ITTRIGOUTACK3),
>       coresight_cti_reg_wo(ittriginack, ITTRIGINACK),
> +    coresight_cti_reg_wo(ittriginack1, ITTRIGINACK1),
> +    coresight_cti_reg_wo(ittriginack2, ITTRIGINACK2),
> +    coresight_cti_reg_wo(ittriginack3, ITTRIGINACK3),
>       coresight_cti_reg_wo(itchinack, ITCHINACK),
>   #endif
>       NULL,
> @@ -1153,6 +1171,41 @@ int cti_create_cons_sysfs(struct device *dev, struct cti_drvdata *drvdata)
>       return err;
>   }
> 
> +  static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
> +                  struct attribute *attr, int n)
> +  {

I was thinking can we create a separate attribute_group for added Qcom 
regs? Let devarch to determine whether to create these sysfs nodes?

Thanks,
Jie

> +    struct device *dev = container_of(kobj, struct device, kobj);
> +    struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +    /* Mute QCOM CTI registers for standard CTI module */
> +    if (!drvdata->is_qcom_cti) {
> +        if (attr == coresight_cti_regs_attrs[10]
> +         || attr == coresight_cti_regs_attrs[11]
> +         || attr == coresight_cti_regs_attrs[12]
> +         || attr == coresight_cti_regs_attrs[14]
> +         || attr == coresight_cti_regs_attrs[15]
> +         || attr == coresight_cti_regs_attrs[16]
> +#ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
> +         || attr == coresight_cti_regs_attrs[21]
> +         || attr == coresight_cti_regs_attrs[22]
> +         || attr == coresight_cti_regs_attrs[23]
> +         || attr == coresight_cti_regs_attrs[26]
> +         || attr == coresight_cti_regs_attrs[27]
> +         || attr == coresight_cti_regs_attrs[28]
> +         || attr == coresight_cti_regs_attrs[32]
> +         || attr == coresight_cti_regs_attrs[33]
> +         || attr == coresight_cti_regs_attrs[34]
> +         || attr == coresight_cti_regs_attrs[36]
> +         || attr == coresight_cti_regs_attrs[37]
> +         || attr == coresight_cti_regs_attrs[38]
> +#endif
> +         )
> +                  return 0;
> +          }
> +
> +          return attr->mode;
> +  }
> +
>   /* attribute and group sysfs tables. */
>   static const struct attribute_group coresight_cti_group = {
>       .attrs = coresight_cti_attrs,
> @@ -1166,6 +1219,7 @@ static const struct attribute_group coresight_cti_mgmt_group = {
>   static const struct attribute_group coresight_cti_regs_group = {
>       .attrs = coresight_cti_regs_attrs,
>       .name = "regs",
> +    .is_visible = coresight_cti_regs_is_visible,
>   };
> 
>   static const struct attribute_group coresight_cti_channels_group = {
> diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
> index e7b88b07cffe..413d5ef483e8 100644
> --- a/drivers/hwtracing/coresight/coresight-cti.h
> +++ b/drivers/hwtracing/coresight/coresight-cti.h
> @@ -57,7 +57,7 @@ struct fwnode_handle;
>    * Max of in and out defined in the DEVID register.
>    * - pick up actual number used from .dts parameters if present.
>    */
> -#define CTIINOUTEN_MAX        32
> +#define CTIINOUTEN_MAX        128
> 
>   /**
>    * Group of related trigger signals
> @@ -181,6 +181,7 @@ struct cti_drvdata {
>       struct cti_config config;
>       struct list_head node;
>       void (*csdev_release)(struct device *dev);
> +    bool is_qcom_cti;
>   };
> 
>   /*
> diff --git a/drivers/hwtracing/coresight/qcom-cti.h b/drivers/hwtracing/coresight/qcom-cti.h
> new file mode 100644
> index 000000000000..aa41f9425b36
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/qcom-cti.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +#include "coresight-cti.h"
> +
> +#define ARCHITECT_QCOM 0x477
> +
> +#define CTITRIGINSTATUS1 0x1000
> +#define CTITRIGINSTATUS2 0x1001
> +#define CTITRIGINSTATUS3 0x1002
> +
> +#define CTITRIGOUTSTATUS1 0x1003
> +#define CTITRIGOUTSTATUS2 0x1004
> +#define CTITRIGOUTSTATUS3 0x1005
> +
> +#define ITTRIGIN1 0x1006
> +#define ITTRIGIN2 0x1007
> +#define ITTRIGIN3 0x1008
> +
> +#define ITTRIGOUT1 0x1009
> +#define ITTRIGOUT2 0x100A
> +#define ITTRIGOUT3 0x100B
> +
> +#define ITTRIGINACK1 0x100C
> +#define ITTRIGINACK2 0x100D
> +#define ITTRIGINACK3 0x100E
> +
> +#define ITTRIGOUTACK1 0x100F
> +#define ITTRIGOUTACK2 0x1010
> +#define ITTRIGOUTACK3 0x1011
> +/* CTI programming registers */
> +#define    QCOM_CTIINTACK        0x020
> +#define    QCOM_CTIAPPSET        0x004
> +#define    QCOM_CTIAPPCLEAR    0x008
> +#define    QCOM_CTIAPPPULSE    0x00C
> +#define    QCOM_CTIINEN        0x400
> +#define    QCOM_CTIOUTEN        0x800
> +#define    QCOM_CTITRIGINSTATUS    0x040
> +#define    QCOM_CTITRIGINSTATUS1    0x044
> +#define    QCOM_CTITRIGINSTATUS2    0x048
> +#define    QCOM_CTITRIGINSTATUS3    0x04C
> +#define    QCOM_CTITRIGOUTSTATUS    0x060
> +#define    QCOM_CTITRIGOUTSTATUS1    0x064
> +#define    QCOM_CTITRIGOUTSTATUS2    0x068
> +#define    QCOM_CTITRIGOUTSTATUS3    0x06C
> +#define    QCOM_CTICHINSTATUS    0x080
> +#define    QCOM_CTICHOUTSTATUS    0x084
> +#define    QCOM_CTIGATE        0x088
> +#define    QCOM_ASICCTL        0x08c
> +/* Integration test registers */
> +#define    QCOM_ITCHINACK        0xE70
> +#define    QCOM_ITTRIGINACK    0xE80
> +#define    QCOM_ITTRIGINACK1    0xE84
> +#define    QCOM_ITTRIGINACK2    0xE88
> +#define    QCOM_ITTRIGINACK3    0xE8C
> +#define    QCOM_ITCHOUT        0xE74
> +#define    QCOM_ITTRIGOUT        0xEA0
> +#define    QCOM_ITTRIGOUT1        0xEA4
> +#define    QCOM_ITTRIGOUT2        0xEA8
> +#define    QCOM_ITTRIGOUT3        0xEAC
> +#define    QCOM_ITCHOUTACK        0xE78
> +#define    QCOM_ITTRIGOUTACK    0xEC0
> +#define    QCOM_ITTRIGOUTACK1    0xEC4
> +#define    QCOM_ITTRIGOUTACK2    0xEC8
> +#define    QCOM_ITTRIGOUTACK3    0xECC
> +#define    QCOM_ITCHIN        0xE7C
> +#define    QCOM_ITTRIGIN        0xEE0
> +#define    QCOM_ITTRIGIN1        0xEE4
> +#define    QCOM_ITTRIGIN2        0xEE8
> +#define    QCOM_ITTRIGIN3        0xEEC
> +
> +static noinline u32 cti_qcom_reg_off(u32 offset)
> +{
> +    switch (offset) {
> +        case CTIINTACK:        return QCOM_CTIINTACK;
> +        case CTIAPPSET:        return QCOM_CTIAPPSET;
> +        case CTIAPPCLEAR:    return QCOM_CTIAPPCLEAR;
> +        case CTIAPPPULSE:    return QCOM_CTIAPPPULSE;
> +        case CTIINEN(0):    return QCOM_CTIINEN;
> +        case CTIOUTEN(0):    return QCOM_CTIOUTEN;
> +        case CTITRIGINSTATUS:    return QCOM_CTITRIGINSTATUS;
> +        case CTITRIGINSTATUS1:    return QCOM_CTITRIGINSTATUS1;
> +        case CTITRIGINSTATUS2:    return QCOM_CTITRIGINSTATUS2;
> +        case CTITRIGINSTATUS3:    return QCOM_CTITRIGINSTATUS3;
> +        case CTITRIGOUTSTATUS:    return QCOM_CTITRIGOUTSTATUS;
> +        case CTITRIGOUTSTATUS1:    return QCOM_CTITRIGOUTSTATUS1;
> +        case CTITRIGOUTSTATUS2:    return QCOM_CTITRIGOUTSTATUS2;
> +        case CTITRIGOUTSTATUS3:    return QCOM_CTITRIGOUTSTATUS3;
> +        case CTICHINSTATUS:        return QCOM_CTICHINSTATUS;
> +        case CTICHOUTSTATUS:    return QCOM_CTICHOUTSTATUS;
> +        case CTIGATE:        return QCOM_CTIGATE;
> +        case ASICCTL:        return QCOM_ASICCTL;
> +        case ITCHINACK:        return QCOM_ITCHINACK;
> +        case ITTRIGINACK:    return QCOM_ITTRIGINACK;
> +        case ITTRIGINACK1:    return QCOM_ITTRIGINACK1;
> +        case ITTRIGINACK2:    return QCOM_ITTRIGINACK2;
> +        case ITTRIGINACK3:    return QCOM_ITTRIGINACK3;
> +        case ITCHOUT:        return QCOM_ITCHOUT;
> +        case ITTRIGOUT:        return QCOM_ITTRIGOUT;
> +        case ITTRIGOUT1:    return QCOM_ITTRIGOUT1;
> +        case ITTRIGOUT2:    return QCOM_ITTRIGOUT2;
> +        case ITTRIGOUT3:    return QCOM_ITTRIGOUT3;
> +        case ITCHOUTACK:    return QCOM_ITCHOUTACK;
> +        case ITTRIGOUTACK:    return QCOM_ITTRIGOUTACK;
> +        case ITTRIGOUTACK1:    return QCOM_ITTRIGOUTACK1;
> +        case ITTRIGOUTACK2:    return QCOM_ITTRIGOUTACK2;
> +        case ITTRIGOUTACK3:    return QCOM_ITTRIGOUTACK3;
> +        case ITCHIN:               return QCOM_ITCHIN;
> +        case ITTRIGIN:            return QCOM_ITTRIGIN;
> +        case ITTRIGIN1:        return QCOM_ITTRIGIN1;
> +        case ITTRIGIN2:        return QCOM_ITTRIGIN2;
> +        case ITTRIGIN3:        return QCOM_ITTRIGIN3;
> +        default:
> +                    WARN(1, "Unknown offset=%u\n", offset);
> +                    return 0;
> +    }
> +
> +    return 0;
> +}
> +
> +static u32 cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
> +                                          u32 reg, u32 nr)
> +{
> +    /* convert to qcom specific offset */
> +    if (unlikely(drvdata->is_qcom_cti))
> +        reg = cti_qcom_reg_off(reg);
> +
> +    return reg + sizeof(u32) * nr;
> +}
> +
> +static u32 cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
> +{
> +        return cti_reg_addr_with_nr(drvdata, reg, 0);
> +}
> +
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09  8:16               ` Yingchao Deng
  2025-12-09  9:40                 ` Jie Gan
  2025-12-09 11:03                 ` Jie Gan
@ 2025-12-09 12:19                 ` Leo Yan
  2025-12-09 12:51                   ` Yingchao Deng (Consultant)
  2 siblings, 1 reply; 26+ messages in thread
From: Leo Yan @ 2025-12-09 12:19 UTC (permalink / raw)
  To: Yingchao Deng
  Cc: mike.leach, alexander.shishkin, coresight, james.clark,
	jinlong.mao, linux-arm-kernel, linux-arm-msm, linux-kernel,
	quic_jinlmao, quic_yingdeng, suzuki.poulose, tingwei.zhang

Hi Yingchao,

On Tue, Dec 09, 2025 at 04:16:28PM +0800, Yingchao Deng wrote:
> Hi Leo & Mike
> 
> Based on Leo’s suggestions, I created a new patch, but there are three points that do not fully align with his recommendations:
> 
>     1. The helper function for returning the register address now returns only the offset, because returning the full address would conflict with cti_write_single_reg.

No need to change each callsite for cti_write_single_reg().  You could
update cti_write_single_reg() instead:

  void cti_write_single_reg(struct cti_drvdata *drvdata,
                            int offset, u32 value)
  {
          CS_UNLOCK(drvdata->base);
          writel_relaxed(value, cti_reg_addr(drvdata, offset));
          CS_LOCK(drvdata->base);
  }

>     2. For registers such as triginstatus1...3, I defined additional macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, I chose numbers starting from 0x1000 for the new definitions.

To avoid the register naming pollution, please don't define the common
names but only used for Qcom registers.

AFAIK, you even don't need to define these registers.  These registers
are only used for sysfs knobs, we can define an extra "nr" field (e.g.,
bits[31..28] for indexing these registers, something like:

  #define CIT_REG_NR_SHIFT          28
  #define CIT_REG_NR_MASK           GENMASK(31, 28)
  #define CTI_REG_GET_NR(reg)       FIELD_GET(CIT_REG_NR_MASK, (reg))
  #define CTI_REG_SET_NR(reg, nr)   ((reg) | FIELD_PREP(CIT_REG_NR_MASK, (nr))

  static struct attribute *coresight_cti_regs_attrs[] = {
  ...
    coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
    coresight_cti_reg(triginstatus1, CTI_REG_SET_NR(CTITRIGINSTATUS, 1)),
    coresight_cti_reg(triginstatus2, CTI_REG_SET_NR(CTITRIGINSTATUS, 2)),
    coresight_cti_reg(triginstatus3, CTI_REG_SET_NR(CTITRIGINSTATUS, 3)),
  ...

Then, you just need to decode "nr" fields in cti_qcom_reg_off().

>     3. Regarding the visibility of attributes for triginstatus1...3, since coresight_cti_reg produces an anonymous variable that cannot be directly referenced, I used coresight_cti_regs_attrs[i] to obtain the attribute corresponding to triginstatus1.

Okay, I get the meaning for "an anonymous variable" - there have no
field naming when define attr with the macro coresight_cti_reg().

but you could comparing the attr string?

  if (!strcmp(attr->name, "triginstatus1") ||
      !strcmp(attr->name, "triginstatus2") ||
      !strcmp(attr->name, "triginstatus3"))
      ...

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09 11:03                 ` Jie Gan
@ 2025-12-09 12:42                   ` Yingchao Deng (Consultant)
  0 siblings, 0 replies; 26+ messages in thread
From: Yingchao Deng (Consultant) @ 2025-12-09 12:42 UTC (permalink / raw)
  To: Jie Gan, Yingchao Deng
  Cc: alexander.shishkin, coresight, james.clark, jinlong.mao, leo.yan,
	linux-arm-kernel, linux-arm-msm, linux-kernel, quic_jinlmao,
	suzuki.poulose, tingwei.zhang, Mike Leach


On 12/9/2025 7:03 PM, Jie Gan wrote:
>
>
> On 12/9/2025 4:16 PM, Yingchao Deng wrote:
>> Hi Leo & Mike
>>
>> Based on Leo’s suggestions, I created a new patch, but there are 
>> three points that do not fully align with his recommendations:
>>
>>      1. The helper function for returning the register address now 
>> returns only the offset, because returning the full address would 
>> conflict with cti_write_single_reg.
>>      2. For registers such as triginstatus1...3, I defined additional 
>> macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 
>> equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, 
>> I chose numbers starting from 0x1000 for the new definitions.
>>      3. Regarding the visibility of attributes for triginstatus1...3, 
>> since coresight_cti_reg produces an anonymous variable that cannot be 
>> directly referenced, I used coresight_cti_regs_attrs[i] to obtain the 
>> attribute corresponding to triginstatus1.
>>
>> I appreciate both suggestions. After reviewing them, I lean toward 
>> Mike's approach.
>>
>> Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
>> ---
>>   .../hwtracing/coresight/coresight-cti-core.c  |  52 +++++--
>>   .../hwtracing/coresight/coresight-cti-sysfs.c |  72 ++++++++--
>>   drivers/hwtracing/coresight/coresight-cti.h   |   3 +-
>>   drivers/hwtracing/coresight/qcom-cti.h        | 136 ++++++++++++++++++
>>   4 files changed, 238 insertions(+), 25 deletions(-)
>>   create mode 100644 drivers/hwtracing/coresight/qcom-cti.h
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c 
>> b/drivers/hwtracing/coresight/coresight-cti-core.c
>> index f9970e40dd59..d2b0b46f2846 100644
>> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
>> @@ -21,7 +21,7 @@
>>
>>   #include "coresight-priv.h"
>>   #include "coresight-cti.h"
>> -
>> +#include "qcom-cti.h"
>>   /*
>>    * CTI devices can be associated with a PE, or be connected to 
>> CoreSight
>>    * hardware. We have a list of all CTIs irrespective of CPU bound or
>> @@ -70,15 +70,16 @@ void cti_write_all_hw_regs(struct cti_drvdata 
>> *drvdata)
>>
>>       /* write the CTI trigger registers */
>>       for (i = 0; i < config->nr_trig_max; i++) {
>> -        writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
>> +        writel_relaxed(config->ctiinen[i],
>> +                drvdata->base + cti_reg_addr_with_nr(drvdata, 
>> CTIINEN(0), i));
>>           writel_relaxed(config->ctiouten[i],
>> -                   drvdata->base + CTIOUTEN(i));
>> +                drvdata->base + cti_reg_addr_with_nr(drvdata, 
>> CTIOUTEN(0), i));
>>       }
>>
>>       /* other regs */
>> -    writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
>> -    writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
>> -    writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
>> +    writel_relaxed(config->ctigate, drvdata->base + 
>> cti_reg_addr(drvdata, CTIGATE));
>> +    writel_relaxed(config->asicctl, drvdata->base + 
>> cti_reg_addr(drvdata, ASICCTL));
>> +    writel_relaxed(config->ctiappset, drvdata->base + 
>> cti_reg_addr(drvdata, CTIAPPSET));
>>
>>       /* re-enable CTI */
>>       writel_relaxed(1, drvdata->base + CTICONTROL);
>> @@ -201,7 +202,7 @@ void cti_write_intack(struct device *dev, u32 
>> ackval)
>>       raw_spin_lock(&drvdata->spinlock);
>>       /* write if enabled */
>>       if (cti_active(config))
>> -        cti_write_single_reg(drvdata, CTIINTACK, ackval);
>> +        cti_write_single_reg(drvdata, cti_reg_addr(drvdata, 
>> CTIINTACK), ackval);
>>       raw_spin_unlock(&drvdata->spinlock);
>>   }
>>
>> @@ -214,6 +215,9 @@ void cti_write_intack(struct device *dev, u32 
>> ackval)
>>   /* DEVID[19:16] - number of CTM channels */
>>   #define CTI_DEVID_CTMCHANNELS(devid_val) ((int) BMVAL(devid_val, 
>> 16, 19))
>>
>> +/* DEVARCH[31:21] - ARCHITECT */
>> +#define CTI_DEVARCH_ARCHITECT(devarch_val) ((int)BMVAL(devarch_val, 
>> 21, 31))
>> +
>>   static int cti_set_default_config(struct device *dev,
>>                     struct cti_drvdata *drvdata)
>>   {
>> @@ -394,9 +398,8 @@ int cti_channel_trig_op(struct device *dev, enum 
>> cti_chan_op op,
>>
>>       /* update the local register values */
>>       chan_bitmask = BIT(channel_idx);
>> -    reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
>> -              CTIOUTEN(trigger_idx));
>> -
>> +    reg_offset = (direction == CTI_TRIG_IN ? 
>> cti_reg_addr_with_nr(drvdata, CTIINEN(0), trigger_idx):
>> +                         cti_reg_addr_with_nr(drvdata, CTIOUTEN(0), 
>> trigger_idx));
>>       raw_spin_lock(&drvdata->spinlock);
>>
>>       /* read - modify write - the trigger / channel enable value */
>> @@ -452,7 +455,7 @@ int cti_channel_gate_op(struct device *dev, enum 
>> cti_chan_gate_op op,
>>       if (err == 0) {
>>           config->ctigate = reg_value;
>>           if (cti_active(config))
>> -            cti_write_single_reg(drvdata, CTIGATE, reg_value);
>> +            cti_write_single_reg(drvdata, cti_reg_addr(drvdata, 
>> CTIGATE), reg_value);
>>       }
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return err;
>> @@ -479,19 +482,19 @@ int cti_channel_setop(struct device *dev, enum 
>> cti_chan_set_op op,
>>       case CTI_CHAN_SET:
>>           config->ctiappset |= chan_bitmask;
>>           reg_value  = config->ctiappset;
>> -        reg_offset = CTIAPPSET;
>> +        reg_offset = cti_reg_addr(drvdata, CTIAPPSET);
>>           break;
>>
>>       case CTI_CHAN_CLR:
>>           config->ctiappset &= ~chan_bitmask;
>>           reg_value = chan_bitmask;
>> -        reg_offset = CTIAPPCLEAR;
>> +        reg_offset = cti_reg_addr(drvdata, CTIAPPCLEAR);
>>           break;
>>
>>       case CTI_CHAN_PULSE:
>>           config->ctiappset &= ~chan_bitmask;
>>           reg_value = chan_bitmask;
>> -        reg_offset = CTIAPPPULSE;
>> +        reg_offset = cti_reg_addr(drvdata, CTIAPPPULSE);
>>           break;
>>
>>       default:
>> @@ -895,6 +898,7 @@ static int cti_probe(struct amba_device *adev, 
>> const struct amba_id *id)
>>       struct coresight_desc cti_desc;
>>       struct coresight_platform_data *pdata = NULL;
>>       struct resource *res = &adev->res;
>> +    u32 devarch;
>>
>>       /* driver data*/
>>       drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>> @@ -981,9 +985,27 @@ static int cti_probe(struct amba_device *adev, 
>> const struct amba_id *id)
>>       drvdata->csdev_release = drvdata->csdev->dev.release;
>>       drvdata->csdev->dev.release = cti_device_release;
>>
>> +    /* check architect value*/
>> +    devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
>> +    if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
>> +        drvdata->is_qcom_cti = 1;
>> +
>> +        /*
>> +         * QCOM CTI does not implement Claimtag functionality as
>> +         * per CoreSight specification, but its CLAIMSET register
>> +         * is incorrectly initialized to 0xF. This can mislead
>> +         * tools or drivers into thinking the component is claimed.
>> +         *
>> +         * Reset CLAIMSET to 0 to reflect that no claims are active.
>> +         */
>> +        drvdata->csdev->claim_tag_info = CS_CLAIM_TAG_NOT_IMPL;
>> +        //writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
>> +    }
>> +
>>       /* all done - dec pm refcount */
>>       pm_runtime_put(&adev->dev);
>> -    dev_info(&drvdata->csdev->dev, "CTI initialized\n");
>> +    dev_info(&drvdata->csdev->dev, "%s CTI initialized\n",
>> +                    drvdata->is_qcom_cti ? "QCOM" : "");
>>       return 0;
>>
>>   pm_release:
>> diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c 
>> b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
>> index a9df77215141..5d23a138b4a7 100644
>> --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c
>> +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c
>> @@ -13,7 +13,7 @@
>>   #include <linux/sysfs.h>
>>
>>   #include "coresight-cti.h"
>> -
>> +#include "qcom-cti.h"
>>   /*
>>    * Declare the number of static declared attribute groups
>>    * Value includes groups + NULL value at end of table.
>> @@ -183,7 +183,7 @@ static ssize_t coresight_cti_reg_show(struct 
>> device *dev,
>>       pm_runtime_get_sync(dev->parent);
>>       raw_spin_lock(&drvdata->spinlock);
>>       if (drvdata->config.hw_powered)
>> -        val = readl_relaxed(drvdata->base + cti_attr->off);
>> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, 
>> cti_attr->off));
>>       raw_spin_unlock(&drvdata->spinlock);
>>       pm_runtime_put_sync(dev->parent);
>>       return sysfs_emit(buf, "0x%x\n", val);
>> @@ -204,7 +204,7 @@ static __maybe_unused ssize_t 
>> coresight_cti_reg_store(struct device *dev,
>>       pm_runtime_get_sync(dev->parent);
>>       raw_spin_lock(&drvdata->spinlock);
>>       if (drvdata->config.hw_powered)
>> -        cti_write_single_reg(drvdata, cti_attr->off, val);
>> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, 
>> cti_attr->off), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       pm_runtime_put_sync(dev->parent);
>>       return size;
>> @@ -267,7 +267,7 @@ static ssize_t cti_reg32_show(struct device *dev, 
>> char *buf,
>>       raw_spin_lock(&drvdata->spinlock);
>>       if ((reg_offset >= 0) && cti_active(config)) {
>>           CS_UNLOCK(drvdata->base);
>> -        val = readl_relaxed(drvdata->base + reg_offset);
>> +        val = readl_relaxed(drvdata->base + cti_reg_addr(drvdata, 
>> reg_offset));
>>           if (pcached_val)
>>               *pcached_val = val;
>>           CS_LOCK(drvdata->base);
>> @@ -300,7 +300,7 @@ static ssize_t cti_reg32_store(struct device 
>> *dev, const char *buf,
>>
>>       /* write through if offset and enabled */
>>       if ((reg_offset >= 0) && cti_active(config))
>> -        cti_write_single_reg(drvdata, reg_offset, val);
>> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, 
>> reg_offset), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return size;
>>   }
>> @@ -389,7 +389,7 @@ static ssize_t inen_store(struct device *dev,
>>
>>       /* write through if enabled */
>>       if (cti_active(config))
>> -        cti_write_single_reg(drvdata, CTIINEN(index), val);
>> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, 
>> CTIINEN(0), index), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return size;
>>   }
>> @@ -428,7 +428,7 @@ static ssize_t outen_store(struct device *dev,
>>
>>       /* write through if enabled */
>>       if (cti_active(config))
>> -        cti_write_single_reg(drvdata, CTIOUTEN(index), val);
>> +        cti_write_single_reg(drvdata, cti_reg_addr_with_nr(drvdata, 
>> CTIOUTEN(0), index), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return size;
>>   }
>> @@ -470,7 +470,7 @@ static ssize_t appclear_store(struct device *dev,
>>
>>       /* write through if enabled */
>>       if (cti_active(config))
>> -        cti_write_single_reg(drvdata, CTIAPPCLEAR, val);
>> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, 
>> CTIAPPCLEAR), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return size;
>>   }
>> @@ -491,7 +491,7 @@ static ssize_t apppulse_store(struct device *dev,
>>
>>       /* write through if enabled */
>>       if (cti_active(config))
>> -        cti_write_single_reg(drvdata, CTIAPPPULSE, val);
>> +        cti_write_single_reg(drvdata,  cti_reg_addr(drvdata, 
>> CTIAPPPULSE), val);
>>       raw_spin_unlock(&drvdata->spinlock);
>>       return size;
>>   }
>> @@ -513,18 +513,36 @@ static struct attribute 
>> *coresight_cti_regs_attrs[] = {
>>       &dev_attr_appclear.attr,
>>       &dev_attr_apppulse.attr,
>>       coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
>> +    coresight_cti_reg(triginstatus1, CTITRIGINSTATUS1),
>> +    coresight_cti_reg(triginstatus2, CTITRIGINSTATUS2),
>> +    coresight_cti_reg(triginstatus3, CTITRIGINSTATUS3),
>>       coresight_cti_reg(trigoutstatus, CTITRIGOUTSTATUS),
>> +    coresight_cti_reg(trigoutstatus1, CTITRIGOUTSTATUS1),
>> +    coresight_cti_reg(trigoutstatus2, CTITRIGOUTSTATUS2),
>> +    coresight_cti_reg(trigoutstatus3, CTITRIGOUTSTATUS3),
>>       coresight_cti_reg(chinstatus, CTICHINSTATUS),
>>       coresight_cti_reg(choutstatus, CTICHOUTSTATUS),
>>   #ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
>>       coresight_cti_reg_rw(itctrl, CORESIGHT_ITCTRL),
>>       coresight_cti_reg(ittrigin, ITTRIGIN),
>> +    coresight_cti_reg(ittrigin1, ITTRIGIN1),
>> +    coresight_cti_reg(ittrigin2, ITTRIGIN2),
>> +    coresight_cti_reg(ittrigin3, ITTRIGIN3),
>>       coresight_cti_reg(itchin, ITCHIN),
>>       coresight_cti_reg_rw(ittrigout, ITTRIGOUT),
>> +    coresight_cti_reg_rw(ittrigout1, ITTRIGOUT1),
>> +    coresight_cti_reg_rw(ittrigout2, ITTRIGOUT2),
>> +    coresight_cti_reg_rw(ittrigout3, ITTRIGOUT3),
>>       coresight_cti_reg_rw(itchout, ITCHOUT),
>>       coresight_cti_reg(itchoutack, ITCHOUTACK),
>>       coresight_cti_reg(ittrigoutack, ITTRIGOUTACK),
>> +    coresight_cti_reg(ittrigoutack1, ITTRIGOUTACK1),
>> +    coresight_cti_reg(ittrigoutack2, ITTRIGOUTACK2),
>> +    coresight_cti_reg(ittrigoutack3, ITTRIGOUTACK3),
>>       coresight_cti_reg_wo(ittriginack, ITTRIGINACK),
>> +    coresight_cti_reg_wo(ittriginack1, ITTRIGINACK1),
>> +    coresight_cti_reg_wo(ittriginack2, ITTRIGINACK2),
>> +    coresight_cti_reg_wo(ittriginack3, ITTRIGINACK3),
>>       coresight_cti_reg_wo(itchinack, ITCHINACK),
>>   #endif
>>       NULL,
>> @@ -1153,6 +1171,41 @@ int cti_create_cons_sysfs(struct device *dev, 
>> struct cti_drvdata *drvdata)
>>       return err;
>>   }
>>
>> +  static umode_t coresight_cti_regs_is_visible(struct kobject *kobj,
>> +                  struct attribute *attr, int n)
>> +  {
>
> I was thinking can we create a separate attribute_group for added Qcom 
> regs? Let devarch to determine whether to create these sysfs nodes?
>
> Thanks,
> Jie
>
As there seems to be no interface to create an attribute_group within 
another attribute_group, the group would have to reside under the 
.../devices/cti-xxx/ directory. I'm uncertain whether this arrangement 
would be considered acceptable.

Thanks
Yingchao
>> +    struct device *dev = container_of(kobj, struct device, kobj);
>> +    struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
>> +
>> +    /* Mute QCOM CTI registers for standard CTI module */
>> +    if (!drvdata->is_qcom_cti) {
>> +        if (attr == coresight_cti_regs_attrs[10]
>> +         || attr == coresight_cti_regs_attrs[11]
>> +         || attr == coresight_cti_regs_attrs[12]
>> +         || attr == coresight_cti_regs_attrs[14]
>> +         || attr == coresight_cti_regs_attrs[15]
>> +         || attr == coresight_cti_regs_attrs[16]
>> +#ifdef CONFIG_CORESIGHT_CTI_INTEGRATION_REGS
>> +         || attr == coresight_cti_regs_attrs[21]
>> +         || attr == coresight_cti_regs_attrs[22]
>> +         || attr == coresight_cti_regs_attrs[23]
>> +         || attr == coresight_cti_regs_attrs[26]
>> +         || attr == coresight_cti_regs_attrs[27]
>> +         || attr == coresight_cti_regs_attrs[28]
>> +         || attr == coresight_cti_regs_attrs[32]
>> +         || attr == coresight_cti_regs_attrs[33]
>> +         || attr == coresight_cti_regs_attrs[34]
>> +         || attr == coresight_cti_regs_attrs[36]
>> +         || attr == coresight_cti_regs_attrs[37]
>> +         || attr == coresight_cti_regs_attrs[38]
>> +#endif
>> +         )
>> +                  return 0;
>> +          }
>> +
>> +          return attr->mode;
>> +  }
>> +
>>   /* attribute and group sysfs tables. */
>>   static const struct attribute_group coresight_cti_group = {
>>       .attrs = coresight_cti_attrs,
>> @@ -1166,6 +1219,7 @@ static const struct attribute_group 
>> coresight_cti_mgmt_group = {
>>   static const struct attribute_group coresight_cti_regs_group = {
>>       .attrs = coresight_cti_regs_attrs,
>>       .name = "regs",
>> +    .is_visible = coresight_cti_regs_is_visible,
>>   };
>>
>>   static const struct attribute_group coresight_cti_channels_group = {
>> diff --git a/drivers/hwtracing/coresight/coresight-cti.h 
>> b/drivers/hwtracing/coresight/coresight-cti.h
>> index e7b88b07cffe..413d5ef483e8 100644
>> --- a/drivers/hwtracing/coresight/coresight-cti.h
>> +++ b/drivers/hwtracing/coresight/coresight-cti.h
>> @@ -57,7 +57,7 @@ struct fwnode_handle;
>>    * Max of in and out defined in the DEVID register.
>>    * - pick up actual number used from .dts parameters if present.
>>    */
>> -#define CTIINOUTEN_MAX        32
>> +#define CTIINOUTEN_MAX        128
>>
>>   /**
>>    * Group of related trigger signals
>> @@ -181,6 +181,7 @@ struct cti_drvdata {
>>       struct cti_config config;
>>       struct list_head node;
>>       void (*csdev_release)(struct device *dev);
>> +    bool is_qcom_cti;
>>   };
>>
>>   /*
>> diff --git a/drivers/hwtracing/coresight/qcom-cti.h 
>> b/drivers/hwtracing/coresight/qcom-cti.h
>> new file mode 100644
>> index 000000000000..aa41f9425b36
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/qcom-cti.h
>> @@ -0,0 +1,136 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
>> + */
>> +#include "coresight-cti.h"
>> +
>> +#define ARCHITECT_QCOM 0x477
>> +
>> +#define CTITRIGINSTATUS1 0x1000
>> +#define CTITRIGINSTATUS2 0x1001
>> +#define CTITRIGINSTATUS3 0x1002
>> +
>> +#define CTITRIGOUTSTATUS1 0x1003
>> +#define CTITRIGOUTSTATUS2 0x1004
>> +#define CTITRIGOUTSTATUS3 0x1005
>> +
>> +#define ITTRIGIN1 0x1006
>> +#define ITTRIGIN2 0x1007
>> +#define ITTRIGIN3 0x1008
>> +
>> +#define ITTRIGOUT1 0x1009
>> +#define ITTRIGOUT2 0x100A
>> +#define ITTRIGOUT3 0x100B
>> +
>> +#define ITTRIGINACK1 0x100C
>> +#define ITTRIGINACK2 0x100D
>> +#define ITTRIGINACK3 0x100E
>> +
>> +#define ITTRIGOUTACK1 0x100F
>> +#define ITTRIGOUTACK2 0x1010
>> +#define ITTRIGOUTACK3 0x1011
>> +/* CTI programming registers */
>> +#define    QCOM_CTIINTACK        0x020
>> +#define    QCOM_CTIAPPSET        0x004
>> +#define    QCOM_CTIAPPCLEAR    0x008
>> +#define    QCOM_CTIAPPPULSE    0x00C
>> +#define    QCOM_CTIINEN        0x400
>> +#define    QCOM_CTIOUTEN        0x800
>> +#define    QCOM_CTITRIGINSTATUS    0x040
>> +#define    QCOM_CTITRIGINSTATUS1    0x044
>> +#define    QCOM_CTITRIGINSTATUS2    0x048
>> +#define    QCOM_CTITRIGINSTATUS3    0x04C
>> +#define    QCOM_CTITRIGOUTSTATUS    0x060
>> +#define    QCOM_CTITRIGOUTSTATUS1    0x064
>> +#define    QCOM_CTITRIGOUTSTATUS2    0x068
>> +#define    QCOM_CTITRIGOUTSTATUS3    0x06C
>> +#define    QCOM_CTICHINSTATUS    0x080
>> +#define    QCOM_CTICHOUTSTATUS    0x084
>> +#define    QCOM_CTIGATE        0x088
>> +#define    QCOM_ASICCTL        0x08c
>> +/* Integration test registers */
>> +#define    QCOM_ITCHINACK        0xE70
>> +#define    QCOM_ITTRIGINACK    0xE80
>> +#define    QCOM_ITTRIGINACK1    0xE84
>> +#define    QCOM_ITTRIGINACK2    0xE88
>> +#define    QCOM_ITTRIGINACK3    0xE8C
>> +#define    QCOM_ITCHOUT        0xE74
>> +#define    QCOM_ITTRIGOUT        0xEA0
>> +#define    QCOM_ITTRIGOUT1        0xEA4
>> +#define    QCOM_ITTRIGOUT2        0xEA8
>> +#define    QCOM_ITTRIGOUT3        0xEAC
>> +#define    QCOM_ITCHOUTACK        0xE78
>> +#define    QCOM_ITTRIGOUTACK    0xEC0
>> +#define    QCOM_ITTRIGOUTACK1    0xEC4
>> +#define    QCOM_ITTRIGOUTACK2    0xEC8
>> +#define    QCOM_ITTRIGOUTACK3    0xECC
>> +#define    QCOM_ITCHIN        0xE7C
>> +#define    QCOM_ITTRIGIN        0xEE0
>> +#define    QCOM_ITTRIGIN1        0xEE4
>> +#define    QCOM_ITTRIGIN2        0xEE8
>> +#define    QCOM_ITTRIGIN3        0xEEC
>> +
>> +static noinline u32 cti_qcom_reg_off(u32 offset)
>> +{
>> +    switch (offset) {
>> +        case CTIINTACK:        return QCOM_CTIINTACK;
>> +        case CTIAPPSET:        return QCOM_CTIAPPSET;
>> +        case CTIAPPCLEAR:    return QCOM_CTIAPPCLEAR;
>> +        case CTIAPPPULSE:    return QCOM_CTIAPPPULSE;
>> +        case CTIINEN(0):    return QCOM_CTIINEN;
>> +        case CTIOUTEN(0):    return QCOM_CTIOUTEN;
>> +        case CTITRIGINSTATUS:    return QCOM_CTITRIGINSTATUS;
>> +        case CTITRIGINSTATUS1:    return QCOM_CTITRIGINSTATUS1;
>> +        case CTITRIGINSTATUS2:    return QCOM_CTITRIGINSTATUS2;
>> +        case CTITRIGINSTATUS3:    return QCOM_CTITRIGINSTATUS3;
>> +        case CTITRIGOUTSTATUS:    return QCOM_CTITRIGOUTSTATUS;
>> +        case CTITRIGOUTSTATUS1:    return QCOM_CTITRIGOUTSTATUS1;
>> +        case CTITRIGOUTSTATUS2:    return QCOM_CTITRIGOUTSTATUS2;
>> +        case CTITRIGOUTSTATUS3:    return QCOM_CTITRIGOUTSTATUS3;
>> +        case CTICHINSTATUS:        return QCOM_CTICHINSTATUS;
>> +        case CTICHOUTSTATUS:    return QCOM_CTICHOUTSTATUS;
>> +        case CTIGATE:        return QCOM_CTIGATE;
>> +        case ASICCTL:        return QCOM_ASICCTL;
>> +        case ITCHINACK:        return QCOM_ITCHINACK;
>> +        case ITTRIGINACK:    return QCOM_ITTRIGINACK;
>> +        case ITTRIGINACK1:    return QCOM_ITTRIGINACK1;
>> +        case ITTRIGINACK2:    return QCOM_ITTRIGINACK2;
>> +        case ITTRIGINACK3:    return QCOM_ITTRIGINACK3;
>> +        case ITCHOUT:        return QCOM_ITCHOUT;
>> +        case ITTRIGOUT:        return QCOM_ITTRIGOUT;
>> +        case ITTRIGOUT1:    return QCOM_ITTRIGOUT1;
>> +        case ITTRIGOUT2:    return QCOM_ITTRIGOUT2;
>> +        case ITTRIGOUT3:    return QCOM_ITTRIGOUT3;
>> +        case ITCHOUTACK:    return QCOM_ITCHOUTACK;
>> +        case ITTRIGOUTACK:    return QCOM_ITTRIGOUTACK;
>> +        case ITTRIGOUTACK1:    return QCOM_ITTRIGOUTACK1;
>> +        case ITTRIGOUTACK2:    return QCOM_ITTRIGOUTACK2;
>> +        case ITTRIGOUTACK3:    return QCOM_ITTRIGOUTACK3;
>> +        case ITCHIN:               return QCOM_ITCHIN;
>> +        case ITTRIGIN:            return QCOM_ITTRIGIN;
>> +        case ITTRIGIN1:        return QCOM_ITTRIGIN1;
>> +        case ITTRIGIN2:        return QCOM_ITTRIGIN2;
>> +        case ITTRIGIN3:        return QCOM_ITTRIGIN3;
>> +        default:
>> +                    WARN(1, "Unknown offset=%u\n", offset);
>> +                    return 0;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static u32 cti_reg_addr_with_nr(struct cti_drvdata *drvdata,
>> +                                          u32 reg, u32 nr)
>> +{
>> +    /* convert to qcom specific offset */
>> +    if (unlikely(drvdata->is_qcom_cti))
>> +        reg = cti_qcom_reg_off(reg);
>> +
>> +    return reg + sizeof(u32) * nr;
>> +}
>> +
>> +static u32 cti_reg_addr(struct cti_drvdata *drvdata, u32 reg)
>> +{
>> +        return cti_reg_addr_with_nr(drvdata, reg, 0);
>> +}
>> +
>>
>>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09 12:19                 ` Leo Yan
@ 2025-12-09 12:51                   ` Yingchao Deng (Consultant)
  2025-12-09 14:24                     ` Leo Yan
  0 siblings, 1 reply; 26+ messages in thread
From: Yingchao Deng (Consultant) @ 2025-12-09 12:51 UTC (permalink / raw)
  To: Leo Yan, Yingchao Deng
  Cc: mike.leach, alexander.shishkin, coresight, james.clark,
	jinlong.mao, linux-arm-kernel, linux-arm-msm, linux-kernel,
	quic_jinlmao, suzuki.poulose, tingwei.zhang

Hi Leo,

On 12/9/2025 8:19 PM, Leo Yan wrote:
> Hi Yingchao,
>
> On Tue, Dec 09, 2025 at 04:16:28PM +0800, Yingchao Deng wrote:
>> Hi Leo & Mike
>>
>> Based on Leo’s suggestions, I created a new patch, but there are three points that do not fully align with his recommendations:
>>
>>      1. The helper function for returning the register address now returns only the offset, because returning the full address would conflict with cti_write_single_reg.
> No need to change each callsite for cti_write_single_reg().  You could
> update cti_write_single_reg() instead:
>
>    void cti_write_single_reg(struct cti_drvdata *drvdata,
>                              int offset, u32 value)
>    {
>            CS_UNLOCK(drvdata->base);
>            writel_relaxed(value, cti_reg_addr(drvdata, offset));
>            CS_LOCK(drvdata->base);
>    }

However, since we also need to handle cti_reg_addr_with_nr, it will be 
necessary to add an additional parameter "nr" to cti_write_single_reg?

Thanks
Yingchao
>>      2. For registers such as triginstatus1...3, I defined additional macros CTITRIGINSTATUS1...3. This is because CTITRIGINSTATUS + 0x4 equals CTITRIGOUTSTATUS, and to avoid conflicts with existing macros, I chose numbers starting from 0x1000 for the new definitions.
> To avoid the register naming pollution, please don't define the common
> names but only used for Qcom registers.
>
> AFAIK, you even don't need to define these registers.  These registers
> are only used for sysfs knobs, we can define an extra "nr" field (e.g.,
> bits[31..28] for indexing these registers, something like:
>
>    #define CIT_REG_NR_SHIFT          28
>    #define CIT_REG_NR_MASK           GENMASK(31, 28)
>    #define CTI_REG_GET_NR(reg)       FIELD_GET(CIT_REG_NR_MASK, (reg))
>    #define CTI_REG_SET_NR(reg, nr)   ((reg) | FIELD_PREP(CIT_REG_NR_MASK, (nr))
>
>    static struct attribute *coresight_cti_regs_attrs[] = {
>    ...
>      coresight_cti_reg(triginstatus, CTITRIGINSTATUS),
>      coresight_cti_reg(triginstatus1, CTI_REG_SET_NR(CTITRIGINSTATUS, 1)),
>      coresight_cti_reg(triginstatus2, CTI_REG_SET_NR(CTITRIGINSTATUS, 2)),
>      coresight_cti_reg(triginstatus3, CTI_REG_SET_NR(CTITRIGINSTATUS, 3)),
>    ...
>
> Then, you just need to decode "nr" fields in cti_qcom_reg_off().
>
>>      3. Regarding the visibility of attributes for triginstatus1...3, since coresight_cti_reg produces an anonymous variable that cannot be directly referenced, I used coresight_cti_regs_attrs[i] to obtain the attribute corresponding to triginstatus1.
> Okay, I get the meaning for "an anonymous variable" - there have no
> field naming when define attr with the macro coresight_cti_reg().
>
> but you could comparing the attr string?
>
>    if (!strcmp(attr->name, "triginstatus1") ||
>        !strcmp(attr->name, "triginstatus2") ||
>        !strcmp(attr->name, "triginstatus3"))
>        ...
>
> Thanks,
> Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-08 14:47             ` Mike Leach
  2025-12-09  8:16               ` Yingchao Deng
@ 2025-12-09 13:59               ` Leo Yan
  1 sibling, 0 replies; 26+ messages in thread
From: Leo Yan @ 2025-12-09 13:59 UTC (permalink / raw)
  To: Mike Leach
  Cc: Yingchao Deng, Suzuki K Poulose, James Clark, Alexander Shishkin,
	Tingwei Zhang, quic_yingdeng, coresight, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jinlong Mao, Mao Jinlong

Hi Mike,

On Mon, Dec 08, 2025 at 02:47:21PM +0000, Mike Leach wrote:

[...]

> > I tested locally and did not see the GCC complaint for this approach.
> > And this is a global structure with about 16KiB (~4K items x
> 
> Which is precisely the issue - why use 16k bytes of space when a pair
> of indexed tables will use 21 x 32bit locations per table -> 168 bytes
> - 100x smaller!
> 
> This space matters little to high end server systems but is much more
> important in smaller embedded systems.

For the concern of performance and footprint, my approach can
avoid any conversion for standard registers, we end up need to
convert registers for non-standard registers anyway.

I understand your concern for using an array for conversion, this is
cost 16KiB memory but this can benefit a bit performance.  It is a
trade-off between memory and speed.  As said, we can use a static
function for register conversion, the side effect is this might cause
more time.

Given the CTI MMIO register access, I don't think an extra branch
instruction (checking the flag) would cause significant panelty,
given the flag is set once at init and never changed afterwards.

> Moreover the table + inline helper is more efficient at extracting the
> correct offset value. The helper is a simple de-reference - whereas
> the helper functions you suggest require the code to make the
> comparison at every register access.
> The "if qcom ..." may be contained in one place in the source code,
> but is called and executed for every access.
> 
> Why add inefficiencies, either in footprint or execution?

This is about how we design a driver that supports both a standard IP
and non-standard implementations.

Because the standard IP is well defined, its register layout should be
the default; it keeps the code simple and makes future CTI extensions
easier.  For non-standard IPs, we only apply the register translations
needed.

TBH, the optimization topic is a bit over design for me now.  The CTI
module is configured once and remains untouched until it is disabled,
so it is not a hot path.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
  2025-12-09 12:51                   ` Yingchao Deng (Consultant)
@ 2025-12-09 14:24                     ` Leo Yan
  0 siblings, 0 replies; 26+ messages in thread
From: Leo Yan @ 2025-12-09 14:24 UTC (permalink / raw)
  To: Yingchao Deng (Consultant)
  Cc: mike.leach, alexander.shishkin, coresight, james.clark,
	jinlong.mao, linux-arm-kernel, linux-arm-msm, linux-kernel,
	quic_jinlmao, suzuki.poulose, tingwei.zhang

On Tue, Dec 09, 2025 at 08:51:38PM +0800, Yingchao Deng (Consultant) wrote:

[...]

> >    void cti_write_single_reg(struct cti_drvdata *drvdata,
> >                              int offset, u32 value)
> >    {
> >            CS_UNLOCK(drvdata->base);
> >            writel_relaxed(value, cti_reg_addr(drvdata, offset));
> >            CS_LOCK(drvdata->base);
> >    }
> 
> However, since we also need to handle cti_reg_addr_with_nr, it will be
> necessary to add an additional parameter "nr" to cti_write_single_reg?

I expect the argument "offset" has already containted the nr in
bits[31..28], so don't need to pass "nr" parameter to
cti_write_single_reg().

You will change inen_store() / outen_store(), e.g.,:

    cti_write_single_reg(drvdata, CTI_REG_SET_NR(CTIINEN, index),
                         value);

Just remind, this might be a separate refactor for common code and you
need to write a patch for this, then is followed by QCOM CTI support
patch.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-12-09 14:24 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-02  6:42 [PATCH v6 0/2] Add Qualcomm extended CTI support Yingchao Deng
2025-12-02  6:42 ` [PATCH v6 1/2] coresight: cti: Convert trigger usage fields to dynamic bitmaps and arrays Yingchao Deng
2025-12-04  9:54   ` Mike Leach
2025-12-02  6:42 ` [PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support Yingchao Deng
2025-12-03 18:29   ` Leo Yan
2025-12-04  8:38     ` Leo Yan
2025-12-04  9:04       ` Mike Leach
2025-12-04 10:02         ` Leo Yan
2025-12-04  9:07     ` Mike Leach
2025-12-04 10:31       ` Leo Yan
2025-12-04 16:17         ` Mike Leach
2025-12-05 10:04           ` Leo Yan
2025-12-08 14:47             ` Mike Leach
2025-12-09  8:16               ` Yingchao Deng
2025-12-09  9:40                 ` Jie Gan
2025-12-09 11:03                 ` Jie Gan
2025-12-09 12:42                   ` Yingchao Deng (Consultant)
2025-12-09 12:19                 ` Leo Yan
2025-12-09 12:51                   ` Yingchao Deng (Consultant)
2025-12-09 14:24                     ` Leo Yan
2025-12-09 13:59               ` Leo Yan
2025-12-04  9:15     ` Mike Leach
2025-12-04 10:47       ` Leo Yan
2025-12-04 15:07         ` Mike Leach
2025-12-05 10:27           ` Leo Yan
2025-12-08 14:25             ` Mike Leach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).