linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Ampere Computing ETMv4.x Support
@ 2023-01-20  0:51 Steve Clevenger
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-01-20  0:51 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

First pass at Ampere ETMv4.x support. Added Ampere ETM ID, and changes
required by the Ampere ETMv4.x hardware implementation.

Steve Clevenger (3):
  coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
  coresight etm4x: Add 32-bit read/write option to split 64-bit words
  coresight etm4x: Add pr_debug statement for Coresight component
    PID/CID

 drivers/amba/bus.c                            |   7 ++
 .../coresight/coresight-etm4x-core.c          | 115 ++++++++++++++----
 drivers/hwtracing/coresight/coresight-etm4x.h |  34 ++++++
 3 files changed, 131 insertions(+), 25 deletions(-)

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 0/3] Ampere Computing ETMv4.x Support
@ 2023-03-06  5:54 Steve Clevenger
  2023-03-06  5:54 ` [PATCH 1/3] Add known list of Ampere ETMv4 errata Steve Clevenger
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-06  5:54 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
the Ampere ETMv4.x hardware implementation.

Steve Clevenger (3):
  Add known list of Ampere ETMv4 errata
  coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
  coresight etm4x: Add 32-bit read/write option to split 64-bit words

 Documentation/arm64/silicon-errata.rst        |  6 +-
 .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
 drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
 include/linux/coresight.h                     |  3 +
 4 files changed, 89 insertions(+), 28 deletions(-)

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/3] Add known list of Ampere ETMv4 errata
  2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
@ 2023-03-06  5:54 ` Steve Clevenger
  2023-03-06  5:54 ` [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read Steve Clevenger
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-06  5:54 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

An Ampere Computing design decision is MMIO reads are considered the
same as an external debug access. If TRCOSLAR.OSLK is left set, a
TRCIDR1 access results in a bus fault followed by a kernel panic. A
TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
access (now deprecated) is supported.

The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
TRCIDR1 access. Do not distinguish between manufacturers.
AC03_DEBUG_06 is described in the AmpereOne Developer Errata document.

An Ampere Computing design decision to not support 64-bit read/write
access in the ETMv4.6 implementation.

The Ampere work around is to split ETMv4.6 64-bit register access
into 2 ea. 32-bit read/write accesses.
AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:

https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation

Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
---
 Documentation/arm64/silicon-errata.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ec5f889d7681..a4de304a8bd8 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -205,7 +205,11 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Qualcomm Tech. | Kryo4xx Gold    | N/A             | ARM64_ERRATUM_1286807       |
 +----------------+-----------------+-----------------+-----------------------------+
-
 +----------------+-----------------+-----------------+-----------------------------+
 | Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |
 +----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Ampere         | AmpereOne       | AC03_DEBUG_06   | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| Ampere         | AmpereOne       | AC03_DEBUG_10   | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
  2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
  2023-03-06  5:54 ` [PATCH 1/3] Add known list of Ampere ETMv4 errata Steve Clevenger
@ 2023-03-06  5:54 ` Steve Clevenger
  2023-03-17 11:58   ` Suzuki K Poulose
  2023-03-06  5:54 ` [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words Steve Clevenger
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-06  5:54 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

An Ampere Computing design decision is MMIO reads are considered the
same as an external debug access. If TRCOSLAR.OSLK is left set, the
TRCIDR1 access results in a bus fault followed by a kernel panic. A
TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
access (now deprecated) is supported.

The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
TRCIDR1 access. Do not distinguish between manufacturers.
AC03_DEBUG_06 is described in the AmpereOne Developer Errata:

Add Ampere ETM PID required for Coresight ETM driver support.

https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation

Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
---
 .../coresight/coresight-etm4x-core.c          | 50 ++++++++++++++-----
 include/linux/coresight.h                     |  3 ++
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 104333c2c8a3..6f7a14c4c78c 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -1047,6 +1047,7 @@ static bool etm4_init_sysreg_access(struct etmv4_drvdata *drvdata,
 	case ETM_DEVARCH_ETMv4x_ARCH:
 		*csa = (struct csdev_access) {
 			.io_mem	= false,
+			.no_quad_mmio = false,
 			.read	= etm4x_sysreg_read,
 			.write	= etm4x_sysreg_write,
 		};
@@ -1054,6 +1055,7 @@ static bool etm4_init_sysreg_access(struct etmv4_drvdata *drvdata,
 	case ETM_DEVARCH_ETE_ARCH:
 		*csa = (struct csdev_access) {
 			.io_mem	= false,
+			.no_quad_mmio = false,
 			.read	= ete_sysreg_read,
 			.write	= ete_sysreg_write,
 		};
@@ -1069,8 +1071,20 @@ static bool etm4_init_sysreg_access(struct etmv4_drvdata *drvdata,
 static bool etm4_init_iomem_access(struct etmv4_drvdata *drvdata,
 				   struct csdev_access *csa)
 {
-	u32 devarch = readl_relaxed(drvdata->base + TRCDEVARCH);
-	u32 idr1 = readl_relaxed(drvdata->base + TRCIDR1);
+	u32 devarch;
+	u32 idr1;
+
+	/* Detect the support for OS Lock before we actually use it */
+	etm_detect_os_lock(drvdata, csa);
+
+	/*
+	 * An ETM implementation may consider memory mapped I/O
+	 * an external debug access so clear TRCOSLAR.OSLK early.
+	 */
+	etm4_os_unlock_csa(drvdata, csa);
+
+	devarch = readl_relaxed(drvdata->base + TRCDEVARCH);
+	idr1 = readl_relaxed(drvdata->base + TRCIDR1);
 
 	/*
 	 * All ETMs must implement TRCDEVARCH to indicate that
@@ -1089,7 +1103,10 @@ static bool etm4_init_iomem_access(struct etmv4_drvdata *drvdata,
 		drvdata->arch = etm_trcidr_to_arch(idr1);
 	}
 
-	*csa = CSDEV_ACCESS_IOMEM(drvdata->base);
+	/* preserve pre-initialized fields */
+	csa->io_mem = true;
+	csa->base = drvdata->base;
+
 	return true;
 }
 
@@ -1151,18 +1168,13 @@ static void etm4_init_arch_data(void *info)
 	csa = init_arg->csa;
 
 	/*
-	 * If we are unable to detect the access mechanism,
-	 * or unable to detect the trace unit type, fail
-	 * early.
+	 * Make sure all registers are accessible. If we are unable
+	 * to detect the access mechanism, or unable to detect the
+	 * trace unit type, fail early.
 	 */
-	if (!etm4_init_csdev_access(drvdata, csa))
-		return;
-
-	/* Detect the support for OS Lock before we actually use it */
-	etm_detect_os_lock(drvdata, csa);
+	if (!etm4_init_csdev_access(drvdata, csa)) 
+               return;
 
-	/* Make sure all registers are accessible */
-	etm4_os_unlock_csa(drvdata, csa);
 	etm4_cs_unlock(drvdata, csa);
 
 	etm4_check_arch_features(drvdata, init_arg->pid);
@@ -2084,6 +2096,17 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	init_arg.csa = &access;
 	init_arg.pid = etm_pid;
 
+        /*
+         * Ampere ETM v4.6 does not suport 64-bit MMIO access.
+         * This mask isolates the manufacturer JEP106 ID in the PID.
+         * TRCPIDR2 (JEDC|DES_1) << 16 | TRCPIDR1 (DES_0) << 8).
+         * Ampere does not currently use the implementation defined
+         * TRCIDR0 PART_0 and/or TRCIDR1 PART_1 fields to further
+         * describe the ETMv4 parts.
+	 */
+	if ((init_arg.pid & 0x000FF000) == 0x00096000)
+		init_arg.csa->no_quad_mmio = true;
+
 	/*
 	 * Serialize against CPUHP callbacks to avoid race condition
 	 * between the smp call and saving the delayed probe.
@@ -2249,6 +2272,7 @@ static const struct amba_id etm4_ids[] = {
 	CS_AMBA_ID(0x000bb95e),			/* Cortex-A57 */
 	CS_AMBA_ID(0x000bb95a),			/* Cortex-A72 */
 	CS_AMBA_ID(0x000bb959),			/* Cortex-A73 */
+	CS_AMBA_UCI_ID(0x00096000, uci_id_etm4),/* Ampere ARMv8 */
 	CS_AMBA_UCI_ID(0x000bb9da, uci_id_etm4),/* Cortex-A35 */
 	CS_AMBA_UCI_ID(0x000bbd05, uci_id_etm4),/* Cortex-A55 */
 	CS_AMBA_UCI_ID(0x000bbd0a, uci_id_etm4),/* Cortex-A75 */
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index f19a47b9bb5a..285d7aad6ef4 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -116,12 +116,14 @@ struct coresight_platform_data {
  * struct csdev_access - Abstraction of a CoreSight device access.
  *
  * @io_mem	: True if the device has memory mapped I/O
+ * @no_quad_mmio: True if 64-bit register access is not suported
  * @base	: When io_mem == true, base address of the component
  * @read	: Read from the given "offset" of the given instance.
  * @write	: Write "val" to the given "offset".
  */
 struct csdev_access {
 	bool io_mem;
+	bool no_quad_mmio;
 	union {
 		void __iomem *base;
 		struct {
@@ -135,6 +137,7 @@ struct csdev_access {
 #define CSDEV_ACCESS_IOMEM(_addr)		\
 	((struct csdev_access)	{		\
 		.io_mem		= true,		\
+		.no_quad_mmio	= false,	\
 		.base		= (_addr),	\
 	})
 
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
  2023-03-06  5:54 ` [PATCH 1/3] Add known list of Ampere ETMv4 errata Steve Clevenger
  2023-03-06  5:54 ` [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read Steve Clevenger
@ 2023-03-06  5:54 ` Steve Clevenger
  2023-03-08  7:04   ` Leo Yan
  2023-03-06 10:29 ` [PATCH 0/3] Ampere Computing ETMv4.x Support Suzuki K Poulose
  2023-03-06 10:54 ` Suzuki K Poulose
  4 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-06  5:54 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

An Ampere Computing design decision to not support 64-bit read/write
access in the ETMv4.6 implementation.

The Ampere work around is to split ETMv4.6 64-bit register access
into 2 ea. 32-bit read/write accesses.
AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:

https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation

Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
---
 drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 434f4e95ee17..17457ec71876 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -539,11 +539,6 @@
 		 readl_relaxed((csa)->base + (offset)) :		\
 		 read_etm4x_sysreg_offset((offset), false)))
 
-#define etm4x_relaxed_read64(csa, offset)				\
-	((u64)((csa)->io_mem ?						\
-		 readq_relaxed((csa)->base + (offset)) :		\
-		 read_etm4x_sysreg_offset((offset), true)))
-
 #define etm4x_read32(csa, offset)					\
 	({								\
 		u32 __val = etm4x_relaxed_read32((csa), (offset));	\
@@ -567,15 +562,6 @@
 						  false);		\
 	} while (0)
 
-#define etm4x_relaxed_write64(csa, val, offset)				\
-	do {								\
-		if ((csa)->io_mem)					\
-			writeq_relaxed((val), (csa)->base + (offset));	\
-		else							\
-			write_etm4x_sysreg_offset((val), (offset),	\
-						  true);		\
-	} while (0)
-
 #define etm4x_write32(csa, val, offset)					\
 	do {								\
 		__io_bw();						\
@@ -1091,6 +1077,50 @@ void etm4_config_trace_mode(struct etmv4_config *config);
 u64 etm4x_sysreg_read(u32 offset, bool _relaxed, bool _64bit);
 void etm4x_sysreg_write(u64 val, u32 offset, bool _relaxed, bool _64bit);
 
+/* 64-bit aligned to convert 64-bit access to 2 ea. 32-bit access */
+#pragma pack(push, 8)
+
+struct etm_quad_split {
+	u32 lsw;
+	u32 msw;
+};
+
+#pragma pack(pop)
+
+static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
+{
+	if (csa->io_mem) {
+		if (csa->no_quad_mmio) {
+			/* split 64-bit reads into 2 consecutive 32-bit reads */
+			struct etm_quad_split container;
+
+			container.lsw = etm4x_read32(csa, offset);
+			container.msw = etm4x_read32(csa, offset + sizeof(u32));
+
+			return *(u64 *) &container;
+		} else
+			return readq_relaxed(csa->base + offset);
+	} else
+		return read_etm4x_sysreg_offset(offset, true);
+}
+
+static inline void etm4x_relaxed_write64(struct csdev_access *csa, u64 quad, unsigned int offset)
+{
+	if (csa->io_mem) {
+		/* split 64-bit writes into 2 consecutive 32-bit writes */
+		if (csa->no_quad_mmio) {
+			struct etm_quad_split container;
+
+			*(u64 *) &container = quad;
+
+			etm4x_relaxed_write32(csa, container.lsw, offset);
+			etm4x_relaxed_write32(csa, container.msw, offset + sizeof(u32));
+		} else
+			writeq_relaxed(quad, csa->base + offset);
+	} else
+		write_etm4x_sysreg_offset(quad, offset, true);		\
+}
+
 static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)
 {
 	return drvdata->arch >= ETM_ARCH_ETE;
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
                   ` (2 preceding siblings ...)
  2023-03-06  5:54 ` [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words Steve Clevenger
@ 2023-03-06 10:29 ` Suzuki K Poulose
  2023-03-07  1:23   ` Steve Clevenger
  2023-03-06 10:54 ` Suzuki K Poulose
  4 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2023-03-06 10:29 UTC (permalink / raw)
  To: Steve Clevenger, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

Hi Steve,

On 06/03/2023 05:54, Steve Clevenger wrote:
> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
> the Ampere ETMv4.x hardware implementation.
> 

I don't see any mention of the access via system instructions. Where did
that end up ? What is the conclusion on that front ?

Kind regards
Suzuki


> Steve Clevenger (3):
>    Add known list of Ampere ETMv4 errata
>    coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>    coresight etm4x: Add 32-bit read/write option to split 64-bit words
> 
>   Documentation/arm64/silicon-errata.rst        |  6 +-
>   .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>   drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>   include/linux/coresight.h                     |  3 +
>   4 files changed, 89 insertions(+), 28 deletions(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
                   ` (3 preceding siblings ...)
  2023-03-06 10:29 ` [PATCH 0/3] Ampere Computing ETMv4.x Support Suzuki K Poulose
@ 2023-03-06 10:54 ` Suzuki K Poulose
  2023-03-07  1:24   ` Steve Clevenger
  4 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2023-03-06 10:54 UTC (permalink / raw)
  To: Steve Clevenger, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

On 06/03/2023 05:54, Steve Clevenger wrote:
> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
> the Ampere ETMv4.x hardware implementation.
> 

nit: Subject should be :

[PATCH <version> 0/X] <subsystem>: <component>: <Title>

in this case case:

<version> = NULL if version == 1
           = version, otherwise
Helps us to keep track of different versions of the postings.

<subsystem> == coresight

This helps the reviewers to "notice" or "ignore" a given series
depending on their interest.

<component> == etm4x
This further helps the reviewers to filter the mails within a subsystem.

[PATCH v2 0/3] coresight: etm4x: Support for Ampere computing

Also, please include a changelog from the previous version to indicate
what has changed. e.g,

"Changes since v1:
   - Modified xyz
   - Dropped abc
   - Addressed comments on ijk (<name of the requester>
"

That helps the reviewers to get a picture of what they should be looking
at and better spend their time.

Kind regards
Suzuki
> Steve Clevenger (3):
>    Add known list of Ampere ETMv4 errata
>    coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>    coresight etm4x: Add 32-bit read/write option to split 64-bit words
> 
>   Documentation/arm64/silicon-errata.rst        |  6 +-
>   .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>   drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>   include/linux/coresight.h                     |  3 +
>   4 files changed, 89 insertions(+), 28 deletions(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-06 10:29 ` [PATCH 0/3] Ampere Computing ETMv4.x Support Suzuki K Poulose
@ 2023-03-07  1:23   ` Steve Clevenger
  2023-03-07 10:21     ` Suzuki K Poulose
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-07  1:23 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel


Hi Suzuki,

To answer your question, Ampere plans to use the existing MMIO
implementation to introduce CoreSight HW Assisted Trace since we're
preparing for release. As a minimum, we know this would require a respin
of our CoreSight DSDT. Also, if I didn't misunderstand, it sounded like
you planned supporting work (e.g. ETMv4 not handled as an AMBA
device). Since our ETMv4 sink components (ETF, ETR, +CATU) remain memory
mapped, do these remain AMBA?

We understand ETMv4/ETE MMIO is going away. As a sysreg quick test, I
bypassed the code which checks for an ETMv4 base address in order to
default to sysreg access. Trace collection failed with an error. I don't
have the time to chase after this right now, but I intend to budget the
time in the near future.

Thanks and regards,
Steve

On 3/6/2023 2:29 AM, Suzuki K Poulose wrote:
> Hi Steve,
> 
> On 06/03/2023 05:54, Steve Clevenger wrote:
>> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
>> the Ampere ETMv4.x hardware implementation.
>>
> 
> I don't see any mention of the access via system instructions. Where did
> that end up ? What is the conclusion on that front ?
> 
> Kind regards
> Suzuki
> 
> 
>> Steve Clevenger (3):
>>    Add known list of Ampere ETMv4 errata
>>    coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>>    coresight etm4x: Add 32-bit read/write option to split 64-bit words
>>
>>   Documentation/arm64/silicon-errata.rst        |  6 +-
>>   .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>>   drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>>   include/linux/coresight.h                     |  3 +
>>   4 files changed, 89 insertions(+), 28 deletions(-)
>>
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-06 10:54 ` Suzuki K Poulose
@ 2023-03-07  1:24   ` Steve Clevenger
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-07  1:24 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel


Hi Suzuki,

I submitted the Ampere patches on a new thread to forego the delta. But,
here's a summary of the changes from Version 1. Please allow some nits
(title, etc.) to remain in place so I don't have to resubmit all the
patch files.

Thanks,
Steve

Version 2 changes:
Addressed Suzuki Poulose, Mike Leach, and James Clark V1 comments:

1. Moved early clear of TRCOSLAR.OSLK code from etm4_init_arch_data to
etm4_init_iomem_access. The early clear is now done for all
manufacturers, instead of based on drvdata mmio_external flag (set for
Ampere). The flag is deprecated.
2. Moved et4x_split_read64 implementation into etm4x_relaxed_read64, and
etm4x_split_write64 into etm4x_relaxed_write64. This cleared up code
changes in coresight-etm4x-core.c where ever these were used.
3. etm4x_relaxed_read64 and etm4x_relaxed_write64 are implemented as
static inline instead of macro definitions.
4. Removed struct etm4_drvdata mmio_external flag, and moved
no_quad_mmio flag to struct csdev_access. Then ensure pre-initialized
csdev_access fields get preserved.
5. Dropped /linux/drivers/amba/bus.c debug statement patch to provide
visibility to CoreSight PID/CID.

On 3/6/2023 2:54 AM, Suzuki K Poulose wrote:
> On 06/03/2023 05:54, Steve Clevenger wrote:
>> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
>> the Ampere ETMv4.x hardware implementation.
>>
> 
> nit: Subject should be :
> 
> [PATCH <version> 0/X] <subsystem>: <component>: <Title>
> 
> in this case case:
> 
> <version> = NULL if version == 1
>           = version, otherwise
> Helps us to keep track of different versions of the postings.
> 
> <subsystem> == coresight
> 
> This helps the reviewers to "notice" or "ignore" a given series
> depending on their interest.
> 
> <component> == etm4x
> This further helps the reviewers to filter the mails within a subsystem.
> 
> [PATCH v2 0/3] coresight: etm4x: Support for Ampere computing
> 
> Also, please include a changelog from the previous version to indicate
> what has changed. e.g,
> 
> "Changes since v1:
>   - Modified xyz
>   - Dropped abc
>   - Addressed comments on ijk (<name of the requester>
> "
> 
> That helps the reviewers to get a picture of what they should be looking
> at and better spend their time.
> 
> Kind regards
> Suzuki
>> Steve Clevenger (3):
>>    Add known list of Ampere ETMv4 errata
>>    coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>>    coresight etm4x: Add 32-bit read/write option to split 64-bit words
>>
>>   Documentation/arm64/silicon-errata.rst        |  6 +-
>>   .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>>   drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>>   include/linux/coresight.h                     |  3 +
>>   4 files changed, 89 insertions(+), 28 deletions(-)
>>
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-07  1:23   ` Steve Clevenger
@ 2023-03-07 10:21     ` Suzuki K Poulose
  2023-03-07 23:39       ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2023-03-07 10:21 UTC (permalink / raw)
  To: scclevenger, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

Hi Steve

On 07/03/2023 01:23, Steve Clevenger wrote:
> 
> Hi Suzuki,
> 
> To answer your question, Ampere plans to use the existing MMIO
> implementation to introduce CoreSight HW Assisted Trace since we're
> preparing for release. As a minimum, we know this would require a respin

Please note that MMIO interface is for "external debug" not for 
self-hosted usecase, with system instruction support. This is why you need
to work around the driver "as an errata".


> of our CoreSight DSDT. Also, if I didn't misunderstand, it sounded like
> you planned supporting work (e.g. ETMv4 not handled as an AMBA
> device). Since our ETMv4 sink components (ETF, ETR, +CATU) remain memory
> mapped, do these remain AMBA?

Just to be clear, all of these components will be MMIO. We are simply
moving the "Linux" driver framework from AMBA driver to platform device.
This will be done in stages. ETMv4x would be moved in the first pass to
allow us to support system instruction based devices seemlessly with
ACPI.

> 
> We understand ETMv4/ETE MMIO is going away. As a sysreg quick test, I

As mentioned above it is not going away. MMIO is the way to access if
that is *the only* access mode. For ETMv4.4+ and ETE MMIO system
instructions is *the mode* of access for self-hosted. Please be aware
that, going down this route may prevent them being virtualised (when
we add the support in the near future).

> bypassed the code which checks for an ETMv4 base address in order to
> default to sysreg access. Trace collection failed with an error. I don't
> have the time to chase after this right now, but I intend to budget the
> time in the near future.

If we can get to the bottom of this, we should be able to support
the platform in a future proof way than adding this ugly work around
of accessing via the *external MMIO* interface.

Suzuki

> 
> Thanks and regards,
> Steve
> 
> On 3/6/2023 2:29 AM, Suzuki K Poulose wrote:
>> Hi Steve,
>>
>> On 06/03/2023 05:54, Steve Clevenger wrote:
>>> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
>>> the Ampere ETMv4.x hardware implementation.
>>>
>>
>> I don't see any mention of the access via system instructions. Where did
>> that end up ? What is the conclusion on that front ?
>>
>> Kind regards
>> Suzuki
>>
>>
>>> Steve Clevenger (3):
>>>     Add known list of Ampere ETMv4 errata
>>>     coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>>>     coresight etm4x: Add 32-bit read/write option to split 64-bit words
>>>
>>>    Documentation/arm64/silicon-errata.rst        |  6 +-
>>>    .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>>>    drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>>>    include/linux/coresight.h                     |  3 +
>>>    4 files changed, 89 insertions(+), 28 deletions(-)
>>>
>>


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/3] Ampere Computing ETMv4.x Support
  2023-03-07 10:21     ` Suzuki K Poulose
@ 2023-03-07 23:39       ` Steve Clevenger
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-07 23:39 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel


Hi Suzuki,

Comments inline.

Steve

On 3/7/2023 2:21 AM, Suzuki K Poulose wrote:
> Hi Steve
> 
> On 07/03/2023 01:23, Steve Clevenger wrote:
>>
>> Hi Suzuki,
>>
>> To answer your question, Ampere plans to use the existing MMIO
>> implementation to introduce CoreSight HW Assisted Trace since we're
>> preparing for release. As a minimum, we know this would require a respin
> 
> Please note that MMIO interface is for "external debug" not for
> self-hosted usecase, with system instruction support. This is why you need
> to work around the driver "as an errata".
> 

Sorry, my earlier response was poorly worded.

> 
>> of our CoreSight DSDT. Also, if I didn't misunderstand, it sounded like
>> you planned supporting work (e.g. ETMv4 not handled as an AMBA
>> device). Since our ETMv4 sink components (ETF, ETR, +CATU) remain memory
>> mapped, do these remain AMBA?
> 
> Just to be clear, all of these components will be MMIO. We are simply
> moving the "Linux" driver framework from AMBA driver to platform device.
> This will be done in stages. ETMv4x would be moved in the first pass to
> allow us to support system instruction based devices seemlessly with
> ACPI.
> 
>>
>> We understand ETMv4/ETE MMIO is going away. As a sysreg quick test, I
> 
> As mentioned above it is not going away. MMIO is the way to access if
> that is *the only* access mode. For ETMv4.4+ and ETE MMIO system
> instructions is *the mode* of access for self-hosted. Please be aware
> that, going down this route may prevent them being virtualised (when
> we add the support in the near future).

Ampere intends to support sysreg access going forward. What we're
lacking in this moment is time to work out the kinks.

> 
>> bypassed the code which checks for an ETMv4 base address in order to
>> default to sysreg access. Trace collection failed with an error. I don't
>> have the time to chase after this right now, but I intend to budget the
>> time in the near future.
> 
> If we can get to the bottom of this, we should be able to support
> the platform in a future proof way than adding this ugly work around
> of accessing via the *external MMIO* interface.

As I mention above, my assessment is it's too far along in our product
cycle to work through unforeseen sysreg issues which might surface. I
plan to devote time to this shortly. At present, a subset of our cores
have working self-hosted trace using MMIO. Our priority right now is to
scale the solution for our SoC. There are challenges here which span
OpenCSD, perf, and possibly the CoreSight driver.

In this patch submission, I've addressed the maintainer comments brought
to my attention. Nothing else has surfaced. The Ampere Linux team meets
every week or biweekly with ARM. I'll ask they add this topic to the
agenda if you believe it's warranted. Otherwise, can I ask your support
to move these upstream?

Thank you

> 
> Suzuki
> 
>>
>> Thanks and regards,
>> Steve
>>
>> On 3/6/2023 2:29 AM, Suzuki K Poulose wrote:
>>> Hi Steve,
>>>
>>> On 06/03/2023 05:54, Steve Clevenger wrote:
>>>> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
>>>> the Ampere ETMv4.x hardware implementation.
>>>>
>>>
>>> I don't see any mention of the access via system instructions. Where did
>>> that end up ? What is the conclusion on that front ?
>>>
>>> Kind regards
>>> Suzuki
>>>
>>>
>>>> Steve Clevenger (3):
>>>>     Add known list of Ampere ETMv4 errata
>>>>     coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>>>>     coresight etm4x: Add 32-bit read/write option to split 64-bit words
>>>>
>>>>    Documentation/arm64/silicon-errata.rst        |  6 +-
>>>>    .../coresight/coresight-etm4x-core.c          | 50 +++++++++++-----
>>>>    drivers/hwtracing/coresight/coresight-etm4x.h | 58
>>>> ++++++++++++++-----
>>>>    include/linux/coresight.h                     |  3 +
>>>>    4 files changed, 89 insertions(+), 28 deletions(-)
>>>>
>>>
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-06  5:54 ` [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words Steve Clevenger
@ 2023-03-08  7:04   ` Leo Yan
  2023-03-08 10:06     ` Al Grant
  0 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2023-03-08  7:04 UTC (permalink / raw)
  To: Steve Clevenger
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, coresight,
	linux-arm-kernel

Hi Steve,

On Sun, Mar 05, 2023 at 10:54:08PM -0700, Steve Clevenger wrote:
> An Ampere Computing design decision to not support 64-bit read/write
> access in the ETMv4.6 implementation.
> 
> The Ampere work around is to split ETMv4.6 64-bit register access
> into 2 ea. 32-bit read/write accesses.
> AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:
> 
> https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation
> 
> Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
> ---
>  drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>  1 file changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
> index 434f4e95ee17..17457ec71876 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> @@ -539,11 +539,6 @@
>  		 readl_relaxed((csa)->base + (offset)) :		\
>  		 read_etm4x_sysreg_offset((offset), false)))
>  
> -#define etm4x_relaxed_read64(csa, offset)				\
> -	((u64)((csa)->io_mem ?						\
> -		 readq_relaxed((csa)->base + (offset)) :		\
> -		 read_etm4x_sysreg_offset((offset), true)))
> -
>  #define etm4x_read32(csa, offset)					\
>  	({								\
>  		u32 __val = etm4x_relaxed_read32((csa), (offset));	\
> @@ -567,15 +562,6 @@
>  						  false);		\
>  	} while (0)
>  
> -#define etm4x_relaxed_write64(csa, val, offset)				\
> -	do {								\
> -		if ((csa)->io_mem)					\
> -			writeq_relaxed((val), (csa)->base + (offset));	\
> -		else							\
> -			write_etm4x_sysreg_offset((val), (offset),	\
> -						  true);		\
> -	} while (0)
> -
>  #define etm4x_write32(csa, val, offset)					\
>  	do {								\
>  		__io_bw();						\
> @@ -1091,6 +1077,50 @@ void etm4_config_trace_mode(struct etmv4_config *config);
>  u64 etm4x_sysreg_read(u32 offset, bool _relaxed, bool _64bit);
>  void etm4x_sysreg_write(u64 val, u32 offset, bool _relaxed, bool _64bit);
>  
> +/* 64-bit aligned to convert 64-bit access to 2 ea. 32-bit access */
> +#pragma pack(push, 8)
> +
> +struct etm_quad_split {
> +	u32 lsw;
> +	u32 msw;
> +};
> +
> +#pragma pack(pop)
> +
> +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> +{
> +	if (csa->io_mem) {
> +		if (csa->no_quad_mmio) {
> +			/* split 64-bit reads into 2 consecutive 32-bit reads */
> +			struct etm_quad_split container;
> +
> +			container.lsw = etm4x_read32(csa, offset);
> +			container.msw = etm4x_read32(csa, offset + sizeof(u32));
> +
> +			return *(u64 *) &container;

To be honest, I am not familiar with this part, just want to remind two
things.

Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
hi_lo_readq_relaxed() does the same thing with above section and
you don't need to define the structure etm_quad_split().

Secondly, IIUC, a main problem with splitting 64-bit access into two
32-bit accesses is breaking atomicity.  If here have race condition
between reading and writing 64-bit registers, we need to consider to
use spinlock for register accessing.

I'd like to leave the second issue to Suzuki/Mike/James for
comfirmation in case I introduced complexity.

> +		} else
> +			return readq_relaxed(csa->base + offset);
> +	} else
> +		return read_etm4x_sysreg_offset(offset, true);

Here need to add brackets:

        } else {
                return read_etm4x_sysreg_offset(offset, true);
        }

> +}
> +
> +static inline void etm4x_relaxed_write64(struct csdev_access *csa, u64 quad, unsigned int offset)
> +{
> +	if (csa->io_mem) {
> +		/* split 64-bit writes into 2 consecutive 32-bit writes */
> +		if (csa->no_quad_mmio) {
> +			struct etm_quad_split container;
> +
> +			*(u64 *) &container = quad;
> +
> +			etm4x_relaxed_write32(csa, container.lsw, offset);
> +			etm4x_relaxed_write32(csa, container.msw, offset + sizeof(u32));
> +		} else
> +			writeq_relaxed(quad, csa->base + offset);
> +	} else
> +		write_etm4x_sysreg_offset(quad, offset, true);		\

Ditto.  Please drop the character '\' at the end of the line.

Thanks,
Leo

P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I
am glad to give a test if you can confirm this patch set can apply on
it (and please clarify if there have any prerequisite for firmwares).

> +}
> +
>  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)
>  {
>  	return drvdata->arch >= ETM_ARCH_ETE;
> -- 
> 2.25.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-08  7:04   ` Leo Yan
@ 2023-03-08 10:06     ` Al Grant
  2023-03-08 11:26       ` Mike Leach
  0 siblings, 1 reply; 23+ messages in thread
From: Al Grant @ 2023-03-08 10:06 UTC (permalink / raw)
  To: leo.yan@linaro.org, Steve Clevenger
  Cc: mathieu.poirier@linaro.org, mike.leach@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org

> Hi Steve,
> 
> On Sun, Mar 05, 2023 at 10:54:08PM -0700, Steve Clevenger wrote:
> > An Ampere Computing design decision to not support 64-bit read/write
> > access in the ETMv4.6 implementation.
> >
> > The Ampere work around is to split ETMv4.6 64-bit register access into
> > 2 ea. 32-bit read/write accesses.
> > AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:
> >
> > https://solutions.amperecomputing.com/customer-connect/products/Ampere
> > One-device-documentation
> >
> > Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
> > ---
> >  drivers/hwtracing/coresight/coresight-etm4x.h | 58
> > ++++++++++++++-----
> >  1 file changed, 44 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h
> > b/drivers/hwtracing/coresight/coresight-etm4x.h
> > index 434f4e95ee17..17457ec71876 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> > +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> > @@ -539,11 +539,6 @@
> >  		 readl_relaxed((csa)->base + (offset)) :		\
> >  		 read_etm4x_sysreg_offset((offset), false)))
> >
> > -#define etm4x_relaxed_read64(csa, offset)				\
> > -	((u64)((csa)->io_mem ?						\
> > -		 readq_relaxed((csa)->base + (offset)) :		\
> > -		 read_etm4x_sysreg_offset((offset), true)))
> > -
> >  #define etm4x_read32(csa, offset)					\
> >  	({								\
> >  		u32 __val = etm4x_relaxed_read32((csa), (offset));	\
> > @@ -567,15 +562,6 @@
> >  						  false);		\
> >  	} while (0)
> >
> > -#define etm4x_relaxed_write64(csa, val, offset)
> 	\
> > -	do {								\
> > -		if ((csa)->io_mem)					\
> > -			writeq_relaxed((val), (csa)->base + (offset));	\
> > -		else							\
> > -			write_etm4x_sysreg_offset((val), (offset),	\
> > -						  true);		\
> > -	} while (0)
> > -
> >  #define etm4x_write32(csa, val, offset)
> 	\
> >  	do {								\
> >  		__io_bw();						\
> > @@ -1091,6 +1077,50 @@ void etm4_config_trace_mode(struct
> etmv4_config
> > *config);
> >  u64 etm4x_sysreg_read(u32 offset, bool _relaxed, bool _64bit);  void
> > etm4x_sysreg_write(u64 val, u32 offset, bool _relaxed, bool _64bit);
> >
> > +/* 64-bit aligned to convert 64-bit access to 2 ea. 32-bit access */
> > +#pragma pack(push, 8)
> > +
> > +struct etm_quad_split {
> > +	u32 lsw;
> > +	u32 msw;
> > +};
> > +
> > +#pragma pack(pop)
> > +
> > +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa,
> > +unsigned int offset) {
> > +	if (csa->io_mem) {
> > +		if (csa->no_quad_mmio) {
> > +			/* split 64-bit reads into 2 consecutive 32-bit reads */
> > +			struct etm_quad_split container;
> > +
> > +			container.lsw = etm4x_read32(csa, offset);
> > +			container.msw = etm4x_read32(csa, offset +
> sizeof(u32));
> > +
> > +			return *(u64 *) &container;
> 
> To be honest, I am not familiar with this part, just want to remind two things.
> 
> Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
> hi_lo_readq_relaxed() does the same thing with above section and you don't
> need to define the structure etm_quad_split().
> 
> Secondly, IIUC, a main problem with splitting 64-bit access into two 32-bit
> accesses is breaking atomicity.  If here have race condition between reading and
> writing 64-bit registers, we need to consider to use spinlock for register
> accessing.

That shouldn't be an issue. 32-bit memory-mapped interface is 
normal for ETMs. The peripheral bus interface to ETMs (debug APB)
is maximum 32-bit. The ETM architecture defines that registers are
only single-copy atomic up to 32-bit. This is generally true of
CoreSight programming registers.

What typically happens to a 64-bit read/write from a CPU is that it
is split into two 32-bit accesses by a downsizing bridge on the path
from the main system interconnect to the debug APB (see 4.3.5 
in the ETM architecture spec). In this case, it sounds like there is no
downsizing bridge. But because there should be no existing code
relying on 64-bit atomic access to ETM registers, it should be safe
to split the accesses in software without worrying about atomicity.

Overall the ETM configuration involves complicated interactions
between multiple registers, so if you've got one CPU reading out the
configuration while another CPU is writing it, you've likely got much
bigger problems than 64-bit atomicity.

Al


> 
> I'd like to leave the second issue to Suzuki/Mike/James for comfirmation in case
> I introduced complexity.
> 
> > +		} else
> > +			return readq_relaxed(csa->base + offset);
> > +	} else
> > +		return read_etm4x_sysreg_offset(offset, true);
> 
> Here need to add brackets:
> 
>         } else {
>                 return read_etm4x_sysreg_offset(offset, true);
>         }
> 
> > +}
> > +
> > +static inline void etm4x_relaxed_write64(struct csdev_access *csa,
> > +u64 quad, unsigned int offset) {
> > +	if (csa->io_mem) {
> > +		/* split 64-bit writes into 2 consecutive 32-bit writes */
> > +		if (csa->no_quad_mmio) {
> > +			struct etm_quad_split container;
> > +
> > +			*(u64 *) &container = quad;
> > +
> > +			etm4x_relaxed_write32(csa, container.lsw, offset);
> > +			etm4x_relaxed_write32(csa, container.msw, offset +
> sizeof(u32));
> > +		} else
> > +			writeq_relaxed(quad, csa->base + offset);
> > +	} else
> > +		write_etm4x_sysreg_offset(quad, offset, true);		\
> 
> Ditto.  Please drop the character '\' at the end of the line.
> 
> Thanks,
> Leo
> 
> P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I am glad to
> give a test if you can confirm this patch set can apply on it (and please clarify if
> there have any prerequisite for firmwares).
> 
> > +}
> > +
> >  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)  {
> >  	return drvdata->arch >= ETM_ARCH_ETE;
> > --
> > 2.25.1
> >
> _______________________________________________
> CoreSight mailing list -- coresight@lists.linaro.org To unsubscribe send an email
> to coresight-leave@lists.linaro.org

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-08 10:06     ` Al Grant
@ 2023-03-08 11:26       ` Mike Leach
  2023-03-08 11:54         ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Leach @ 2023-03-08 11:26 UTC (permalink / raw)
  To: Al Grant
  Cc: leo.yan@linaro.org, Steve Clevenger, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org

Hi,

On Wed, 8 Mar 2023 at 10:07, Al Grant <Al.Grant@arm.com> wrote:
>
> > Hi Steve,
> >
> > On Sun, Mar 05, 2023 at 10:54:08PM -0700, Steve Clevenger wrote:
> > > An Ampere Computing design decision to not support 64-bit read/write
> > > access in the ETMv4.6 implementation.
> > >
> > > The Ampere work around is to split ETMv4.6 64-bit register access into
> > > 2 ea. 32-bit read/write accesses.
> > > AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:
> > >
> > > https://solutions.amperecomputing.com/customer-connect/products/Ampere
> > > One-device-documentation
> > >
> > > Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
> > > ---
> > >  drivers/hwtracing/coresight/coresight-etm4x.h | 58
> > > ++++++++++++++-----
> > >  1 file changed, 44 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h
> > > b/drivers/hwtracing/coresight/coresight-etm4x.h
> > > index 434f4e95ee17..17457ec71876 100644
> > > --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> > > +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> > > @@ -539,11 +539,6 @@
> > >              readl_relaxed((csa)->base + (offset)) :                \
> > >              read_etm4x_sysreg_offset((offset), false)))
> > >
> > > -#define etm4x_relaxed_read64(csa, offset)                          \
> > > -   ((u64)((csa)->io_mem ?                                          \
> > > -            readq_relaxed((csa)->base + (offset)) :                \
> > > -            read_etm4x_sysreg_offset((offset), true)))
> > > -
> > >  #define etm4x_read32(csa, offset)                                  \
> > >     ({                                                              \
> > >             u32 __val = etm4x_relaxed_read32((csa), (offset));      \
> > > @@ -567,15 +562,6 @@
> > >                                               false);               \
> > >     } while (0)
> > >
> > > -#define etm4x_relaxed_write64(csa, val, offset)
> >       \
> > > -   do {                                                            \
> > > -           if ((csa)->io_mem)                                      \
> > > -                   writeq_relaxed((val), (csa)->base + (offset));  \
> > > -           else                                                    \
> > > -                   write_etm4x_sysreg_offset((val), (offset),      \
> > > -                                             true);                \
> > > -   } while (0)
> > > -
> > >  #define etm4x_write32(csa, val, offset)
> >       \
> > >     do {                                                            \
> > >             __io_bw();                                              \
> > > @@ -1091,6 +1077,50 @@ void etm4_config_trace_mode(struct
> > etmv4_config
> > > *config);
> > >  u64 etm4x_sysreg_read(u32 offset, bool _relaxed, bool _64bit);  void
> > > etm4x_sysreg_write(u64 val, u32 offset, bool _relaxed, bool _64bit);
> > >
> > > +/* 64-bit aligned to convert 64-bit access to 2 ea. 32-bit access */
> > > +#pragma pack(push, 8)
> > > +
> > > +struct etm_quad_split {
> > > +   u32 lsw;
> > > +   u32 msw;
> > > +};
> > > +
> > > +#pragma pack(pop)
> > > +
> > > +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa,
> > > +unsigned int offset) {
> > > +   if (csa->io_mem) {
> > > +           if (csa->no_quad_mmio) {
> > > +                   /* split 64-bit reads into 2 consecutive 32-bit reads */
> > > +                   struct etm_quad_split container;
> > > +
> > > +                   container.lsw = etm4x_read32(csa, offset);
> > > +                   container.msw = etm4x_read32(csa, offset +
> > sizeof(u32));
> > > +
> > > +                   return *(u64 *) &container;
> >
> > To be honest, I am not familiar with this part, just want to remind two things.
> >
> > Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
> > hi_lo_readq_relaxed() does the same thing with above section and you don't
> > need to define the structure etm_quad_split().
> >
> > Secondly, IIUC, a main problem with splitting 64-bit access into two 32-bit
> > accesses is breaking atomicity.  If here have race condition between reading and
> > writing 64-bit registers, we need to consider to use spinlock for register
> > accessing.

The drivers ensure that:-
a) the ETM is only accessed by the CPU it is associated with.
b) there are appropriate locks in place while the ETM is programmed up.

Moreover, the 64 bit registers are programmed only when the device is
disabled - as required by the spec, so there cannot be a case of using
"half" an updated value as all the required registers are programmed
before the device is finally enabled to begin tracing

Regards

Mike


>
> That shouldn't be an issue. 32-bit memory-mapped interface is
> normal for ETMs. The peripheral bus interface to ETMs (debug APB)
> is maximum 32-bit. The ETM architecture defines that registers are
> only single-copy atomic up to 32-bit. This is generally true of
> CoreSight programming registers.
>
> What typically happens to a 64-bit read/write from a CPU is that it
> is split into two 32-bit accesses by a downsizing bridge on the path
> from the main system interconnect to the debug APB (see 4.3.5
> in the ETM architecture spec). In this case, it sounds like there is no
> downsizing bridge. But because there should be no existing code
> relying on 64-bit atomic access to ETM registers, it should be safe
> to split the accesses in software without worrying about atomicity.
>
> Overall the ETM configuration involves complicated interactions
> between multiple registers, so if you've got one CPU reading out the
> configuration while another CPU is writing it, you've likely got much
> bigger problems than 64-bit atomicity.
>
> Al
>
>
> >
> > I'd like to leave the second issue to Suzuki/Mike/James for comfirmation in case
> > I introduced complexity.
> >
> > > +           } else
> > > +                   return readq_relaxed(csa->base + offset);
> > > +   } else
> > > +           return read_etm4x_sysreg_offset(offset, true);
> >
> > Here need to add brackets:
> >
> >         } else {
> >                 return read_etm4x_sysreg_offset(offset, true);
> >         }
> >
> > > +}
> > > +
> > > +static inline void etm4x_relaxed_write64(struct csdev_access *csa,
> > > +u64 quad, unsigned int offset) {
> > > +   if (csa->io_mem) {
> > > +           /* split 64-bit writes into 2 consecutive 32-bit writes */
> > > +           if (csa->no_quad_mmio) {
> > > +                   struct etm_quad_split container;
> > > +
> > > +                   *(u64 *) &container = quad;
> > > +
> > > +                   etm4x_relaxed_write32(csa, container.lsw, offset);
> > > +                   etm4x_relaxed_write32(csa, container.msw, offset +
> > sizeof(u32));
> > > +           } else
> > > +                   writeq_relaxed(quad, csa->base + offset);
> > > +   } else
> > > +           write_etm4x_sysreg_offset(quad, offset, true);          \
> >
> > Ditto.  Please drop the character '\' at the end of the line.
> >
> > Thanks,
> > Leo
> >
> > P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I am glad to
> > give a test if you can confirm this patch set can apply on it (and please clarify if
> > there have any prerequisite for firmwares).
> >
> > > +}
> > > +
> > >  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)  {
> > >     return drvdata->arch >= ETM_ARCH_ETE;
> > > --
> > > 2.25.1
> > >
> > _______________________________________________
> > CoreSight mailing list -- coresight@lists.linaro.org To unsubscribe send an email
> > to coresight-leave@lists.linaro.org



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-08 11:26       ` Mike Leach
@ 2023-03-08 11:54         ` Leo Yan
  2023-03-08 19:08           ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2023-03-08 11:54 UTC (permalink / raw)
  To: Mike Leach
  Cc: Al Grant, Steve Clevenger, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org

On Wed, Mar 08, 2023 at 11:26:48AM +0000, Mike Leach wrote:

[...]

> > > > +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa,
> > > > +unsigned int offset) {
> > > > +   if (csa->io_mem) {
> > > > +           if (csa->no_quad_mmio) {
> > > > +                   /* split 64-bit reads into 2 consecutive 32-bit reads */
> > > > +                   struct etm_quad_split container;
> > > > +
> > > > +                   container.lsw = etm4x_read32(csa, offset);
> > > > +                   container.msw = etm4x_read32(csa, offset +
> > > sizeof(u32));
> > > > +
> > > > +                   return *(u64 *) &container;
> > >
> > > To be honest, I am not familiar with this part, just want to remind two things.
> > >
> > > Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
> > > hi_lo_readq_relaxed() does the same thing with above section and you don't
> > > need to define the structure etm_quad_split().
> > >
> > > Secondly, IIUC, a main problem with splitting 64-bit access into two 32-bit
> > > accesses is breaking atomicity.  If here have race condition between reading and
> > > writing 64-bit registers, we need to consider to use spinlock for register
> > > accessing.
> 
> The drivers ensure that:-
> a) the ETM is only accessed by the CPU it is associated with.
> b) there are appropriate locks in place while the ETM is programmed up.
> 
> Moreover, the 64 bit registers are programmed only when the device is
> disabled - as required by the spec, so there cannot be a case of using
> "half" an updated value as all the required registers are programmed
> before the device is finally enabled to begin tracing

Thanks a lot for confirmation, Al and Mike.  So the second issue is
not valid anymore and please ignore it.

Thanks,
Leo

> Regards
> 
> Mike
> 
> 
> >
> > That shouldn't be an issue. 32-bit memory-mapped interface is
> > normal for ETMs. The peripheral bus interface to ETMs (debug APB)
> > is maximum 32-bit. The ETM architecture defines that registers are
> > only single-copy atomic up to 32-bit. This is generally true of
> > CoreSight programming registers.
> >
> > What typically happens to a 64-bit read/write from a CPU is that it
> > is split into two 32-bit accesses by a downsizing bridge on the path
> > from the main system interconnect to the debug APB (see 4.3.5
> > in the ETM architecture spec). In this case, it sounds like there is no
> > downsizing bridge. But because there should be no existing code
> > relying on 64-bit atomic access to ETM registers, it should be safe
> > to split the accesses in software without worrying about atomicity.
> >
> > Overall the ETM configuration involves complicated interactions
> > between multiple registers, so if you've got one CPU reading out the
> > configuration while another CPU is writing it, you've likely got much
> > bigger problems than 64-bit atomicity.
> >
> > Al
> >
> >
> > >
> > > I'd like to leave the second issue to Suzuki/Mike/James for comfirmation in case
> > > I introduced complexity.
> > >
> > > > +           } else
> > > > +                   return readq_relaxed(csa->base + offset);
> > > > +   } else
> > > > +           return read_etm4x_sysreg_offset(offset, true);
> > >
> > > Here need to add brackets:
> > >
> > >         } else {
> > >                 return read_etm4x_sysreg_offset(offset, true);
> > >         }
> > >
> > > > +}
> > > > +
> > > > +static inline void etm4x_relaxed_write64(struct csdev_access *csa,
> > > > +u64 quad, unsigned int offset) {
> > > > +   if (csa->io_mem) {
> > > > +           /* split 64-bit writes into 2 consecutive 32-bit writes */
> > > > +           if (csa->no_quad_mmio) {
> > > > +                   struct etm_quad_split container;
> > > > +
> > > > +                   *(u64 *) &container = quad;
> > > > +
> > > > +                   etm4x_relaxed_write32(csa, container.lsw, offset);
> > > > +                   etm4x_relaxed_write32(csa, container.msw, offset +
> > > sizeof(u32));
> > > > +           } else
> > > > +                   writeq_relaxed(quad, csa->base + offset);
> > > > +   } else
> > > > +           write_etm4x_sysreg_offset(quad, offset, true);          \
> > >
> > > Ditto.  Please drop the character '\' at the end of the line.
> > >
> > > Thanks,
> > > Leo
> > >
> > > P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I am glad to
> > > give a test if you can confirm this patch set can apply on it (and please clarify if
> > > there have any prerequisite for firmwares).
> > >
> > > > +}
> > > > +
> > > >  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)  {
> > > >     return drvdata->arch >= ETM_ARCH_ETE;
> > > > --
> > > > 2.25.1
> > > >
> > > _______________________________________________
> > > CoreSight mailing list -- coresight@lists.linaro.org To unsubscribe send an email
> > > to coresight-leave@lists.linaro.org
> 
> 
> 
> -- 
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-08 11:54         ` Leo Yan
@ 2023-03-08 19:08           ` Steve Clevenger
  2023-03-09 12:11             ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-08 19:08 UTC (permalink / raw)
  To: Leo Yan, Mike Leach
  Cc: Al Grant, mathieu.poirier@linaro.org, coresight@lists.linaro.org,
	linux-arm-kernel@lists.infradead.org


Hi Leo,

Thanks for your feedback. Comments inline.

Steve

On 3/8/2023 3:54 AM, Leo Yan wrote:
> On Wed, Mar 08, 2023 at 11:26:48AM +0000, Mike Leach wrote:
> 
> [...]
> 
>>>>> +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa,
>>>>> +unsigned int offset) {
>>>>> +   if (csa->io_mem) {
>>>>> +           if (csa->no_quad_mmio) {
>>>>> +                   /* split 64-bit reads into 2 consecutive 32-bit reads */
>>>>> +                   struct etm_quad_split container;
>>>>> +
>>>>> +                   container.lsw = etm4x_read32(csa, offset);
>>>>> +                   container.msw = etm4x_read32(csa, offset +
>>>> sizeof(u32));
>>>>> +
>>>>> +                   return *(u64 *) &container;
>>>>
>>>> To be honest, I am not familiar with this part, just want to remind two things.
>>>>
>>>> Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
>>>> hi_lo_readq_relaxed() does the same thing with above section and you don't
>>>> need to define the structure etm_quad_split().

I'm familiar with this interface. The reason I chose not to use it is
this solution is configured at compile time. The writeq_relaxed (used by
etm4x_relaxed_write64) and readq_relaxed (used by etm4x_relaxed_read64)
otherwise default to 64-bit access. I was uncertain how a compile time
change would go over with the maintainers. Correct me, but compile time
configurations in the kernel seem to be related to ARM64 erratum, versus
manufacturer specific?

My take (based on limited knowledge) of the changes necessary to support
a compile time decision seemed to exceed the changes compared with the
implementation I submitted upstream. If this becomes a sticking point,
let me know.

>>>>
>>>> Secondly, IIUC, a main problem with splitting 64-bit access into two 32-bit
>>>> accesses is breaking atomicity.  If here have race condition between reading and
>>>> writing 64-bit registers, we need to consider to use spinlock for register
>>>> accessing.
>>
>> The drivers ensure that:-
>> a) the ETM is only accessed by the CPU it is associated with.
>> b) there are appropriate locks in place while the ETM is programmed up.
>>
>> Moreover, the 64 bit registers are programmed only when the device is
>> disabled - as required by the spec, so there cannot be a case of using
>> "half" an updated value as all the required registers are programmed
>> before the device is finally enabled to begin tracing
> 
> Thanks a lot for confirmation, Al and Mike.  So the second issue is
> not valid anymore and please ignore it.
> 
> Thanks,
> Leo
> 
>> Regards
>>
>> Mike
>>
>>
>>>
>>> That shouldn't be an issue. 32-bit memory-mapped interface is
>>> normal for ETMs. The peripheral bus interface to ETMs (debug APB)
>>> is maximum 32-bit. The ETM architecture defines that registers are
>>> only single-copy atomic up to 32-bit. This is generally true of
>>> CoreSight programming registers.
>>>
>>> What typically happens to a 64-bit read/write from a CPU is that it
>>> is split into two 32-bit accesses by a downsizing bridge on the path
>>> from the main system interconnect to the debug APB (see 4.3.5
>>> in the ETM architecture spec). In this case, it sounds like there is no
>>> downsizing bridge. But because there should be no existing code
>>> relying on 64-bit atomic access to ETM registers, it should be safe
>>> to split the accesses in software without worrying about atomicity.
>>>
>>> Overall the ETM configuration involves complicated interactions
>>> between multiple registers, so if you've got one CPU reading out the
>>> configuration while another CPU is writing it, you've likely got much
>>> bigger problems than 64-bit atomicity.
>>>
>>> Al
>>>
>>>
>>>>
>>>> I'd like to leave the second issue to Suzuki/Mike/James for comfirmation in case
>>>> I introduced complexity.
>>>>
>>>>> +           } else
>>>>> +                   return readq_relaxed(csa->base + offset);
>>>>> +   } else
>>>>> +           return read_etm4x_sysreg_offset(offset, true);
>>>>
>>>> Here need to add brackets:
>>>>
>>>>         } else {
>>>>                 return read_etm4x_sysreg_offset(offset, true);
>>>>         }
>>>>
>>>>> +}
>>>>> +
>>>>> +static inline void etm4x_relaxed_write64(struct csdev_access *csa,
>>>>> +u64 quad, unsigned int offset) {
>>>>> +   if (csa->io_mem) {
>>>>> +           /* split 64-bit writes into 2 consecutive 32-bit writes */
>>>>> +           if (csa->no_quad_mmio) {
>>>>> +                   struct etm_quad_split container;
>>>>> +
>>>>> +                   *(u64 *) &container = quad;
>>>>> +
>>>>> +                   etm4x_relaxed_write32(csa, container.lsw, offset);
>>>>> +                   etm4x_relaxed_write32(csa, container.msw, offset +
>>>> sizeof(u32));
>>>>> +           } else
>>>>> +                   writeq_relaxed(quad, csa->base + offset);
>>>>> +   } else
>>>>> +           write_etm4x_sysreg_offset(quad, offset, true);          \
>>>>
>>>> Ditto.  Please drop the character '\' at the end of the line.
>>>>
>>>> Thanks,
>>>> Leo
>>>>
>>>> P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I am glad to
>>>> give a test if you can confirm this patch set can apply on it (and please clarify if
>>>> there have any prerequisite for firmwares).
>>>>
>>>>> +}
>>>>> +
>>>>>  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)  {
>>>>>     return drvdata->arch >= ETM_ARCH_ETE;
>>>>> --
>>>>> 2.25.1
>>>>>
>>>> _______________________________________________
>>>> CoreSight mailing list -- coresight@lists.linaro.org To unsubscribe send an email
>>>> to coresight-leave@lists.linaro.org
>>
>>
>>
>> -- 
>> Mike Leach
>> Principal Engineer, ARM Ltd.
>> Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-08 19:08           ` Steve Clevenger
@ 2023-03-09 12:11             ` Leo Yan
  2023-03-09 20:46               ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2023-03-09 12:11 UTC (permalink / raw)
  To: Steve Clevenger
  Cc: Mike Leach, Al Grant, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org

Hi Steve,

On Wed, Mar 08, 2023 at 11:08:34AM -0800, Steve Clevenger wrote:

[...]

> I'm familiar with this interface. The reason I chose not to use it is
> this solution is configured at compile time. The writeq_relaxed (used by
> etm4x_relaxed_write64) and readq_relaxed (used by etm4x_relaxed_read64)
> otherwise default to 64-bit access. I was uncertain how a compile time
> change would go over with the maintainers. Correct me, but compile time
> configurations in the kernel seem to be related to ARM64 erratum, versus
> manufacturer specific?
> 
> My take (based on limited knowledge) of the changes necessary to support
> a compile time decision seemed to exceed the changes compared with the
> implementation I submitted upstream. If this becomes a sticking point,
> let me know.

I am suggesting something like below:

#include <linux/io-64-nonatomic-hi-lo.h>

static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
{
	if (csa->io_mem) {
		if (csa->no_quad_mmio)
                        return hi_lo_readq_relaxed(csa->base + offset);
		else
			return readq_relaxed(csa->base + offset);
	} else {
		return read_etm4x_sysreg_offset(offset, true);
        }
}

I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
fine to directly call hi_lo_readq_relaxed()?

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-09 12:11             ` Leo Yan
@ 2023-03-09 20:46               ` Steve Clevenger
  2023-03-10  1:35                 ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-09 20:46 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mike Leach, Al Grant, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org


Hi Leo,

Comments below.

Steve

On 3/9/2023 4:11 AM, Leo Yan wrote:
> Hi Steve,
> 
> On Wed, Mar 08, 2023 at 11:08:34AM -0800, Steve Clevenger wrote:
> 
> [...]
> 
>> I'm familiar with this interface. The reason I chose not to use it is
>> this solution is configured at compile time. The writeq_relaxed (used by
>> etm4x_relaxed_write64) and readq_relaxed (used by etm4x_relaxed_read64)
>> otherwise default to 64-bit access. I was uncertain how a compile time
>> change would go over with the maintainers. Correct me, but compile time
>> configurations in the kernel seem to be related to ARM64 erratum, versus
>> manufacturer specific?
>>
>> My take (based on limited knowledge) of the changes necessary to support
>> a compile time decision seemed to exceed the changes compared with the
>> implementation I submitted upstream. If this becomes a sticking point,
>> let me know.
> 
> I am suggesting something like below:
> 
> #include <linux/io-64-nonatomic-hi-lo.h>
> 
> static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> {
> 	if (csa->io_mem) {
> 		if (csa->no_quad_mmio)
>                         return hi_lo_readq_relaxed(csa->base + offset);
> 		else
> 			return readq_relaxed(csa->base + offset);
> 	} else {
> 		return read_etm4x_sysreg_offset(offset, true);
>         }
> }
> 
> I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
> fine to directly call hi_lo_readq_relaxed()?

I can do this, and it would work fine. Without compile protection, one
concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
The only (minor) risk I see is to performance.

Second, it seems to me the hi_lo and lo_hi macros were intended to deal
with endianness. IMO, the implementation I offered is agnostic without a
hint to endianness or atomicity. The CoreSight maintainer audience is
limited and already aware of these issues, so I'll make the change for you.

> 
> Thanks,
> Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-09 20:46               ` Steve Clevenger
@ 2023-03-10  1:35                 ` Leo Yan
  2023-03-10 16:55                   ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2023-03-10  1:35 UTC (permalink / raw)
  To: Steve Clevenger
  Cc: Mike Leach, Al Grant, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org

On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:

[...]

> > #include <linux/io-64-nonatomic-hi-lo.h>
> > 
> > static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> > {
> > 	if (csa->io_mem) {
> > 		if (csa->no_quad_mmio)
> >                         return hi_lo_readq_relaxed(csa->base + offset);
> > 		else
> > 			return readq_relaxed(csa->base + offset);
> > 	} else {
> > 		return read_etm4x_sysreg_offset(offset, true);
> >         }
> > }
> > 
> > I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
> > fine to directly call hi_lo_readq_relaxed()?
> 
> I can do this, and it would work fine. Without compile protection, one
> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
> The only (minor) risk I see is to performance.

Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:

  #include <linux/io.h>
  #include <linux/io-64-nonatomic-lo-hi.h>

Thus we can get the definitions for {readq|writeq}_relaxed() from
"io.h".

> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
> with endianness. IMO, the implementation I offered is agnostic without a
> hint to endianness or atomicity.

Seems to me, kernel functions with explict endianness naming is not bad
thing :)

> The CoreSight maintainer audience is
> limited and already aware of these issues, so I'll make the change for you.

Thanks!  Suzuki, Mike, I think it's good to get your opinion before
Steve can proceed for updating patch.

Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-10  1:35                 ` Leo Yan
@ 2023-03-10 16:55                   ` Steve Clevenger
  2023-03-13 16:14                     ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Steve Clevenger @ 2023-03-10 16:55 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mike Leach, Al Grant, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org



On 3/9/2023 5:35 PM, Leo Yan wrote:
> On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:
> 
> [...]
> 
>>> #include <linux/io-64-nonatomic-hi-lo.h>
>>>
>>> static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
>>> {
>>> 	if (csa->io_mem) {
>>> 		if (csa->no_quad_mmio)
>>>                         return hi_lo_readq_relaxed(csa->base + offset);
>>> 		else
>>> 			return readq_relaxed(csa->base + offset);
>>> 	} else {
>>> 		return read_etm4x_sysreg_offset(offset, true);
>>>         }
>>> }
>>>
>>> I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
>>> fine to directly call hi_lo_readq_relaxed()?
>>
>> I can do this, and it would work fine. Without compile protection, one
>> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
>> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
>> The only (minor) risk I see is to performance.
> 
> Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:
> 
>   #include <linux/io.h>
>   #include <linux/io-64-nonatomic-lo-hi.h>
> 
> Thus we can get the definitions for {readq|writeq}_relaxed() from
> "io.h".

It seems the actual read_relaxed/write_relaxed definitions come from the
asm-generic/io.h version already in the include hierarchy without adding
linux/io.h. A more foolproof protection is to check whether these are
defined before the io-64-nonatomic-hi-lo.h include and error out if not.

> 
>> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
>> with endianness. IMO, the implementation I offered is agnostic without a
>> hint to endianness or atomicity.
> 
> Seems to me, kernel functions with explict endianness naming is not bad
> thing :)
> 
>> The CoreSight maintainer audience is
>> limited and already aware of these issues, so I'll make the change for you.
> 
> Thanks!  Suzuki, Mike, I think it's good to get your opinion before
> Steve can proceed for updating patch.
> 
> Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words
  2023-03-10 16:55                   ` Steve Clevenger
@ 2023-03-13 16:14                     ` Steve Clevenger
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-13 16:14 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mike Leach, Al Grant, mathieu.poirier@linaro.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org


It seems there's a bit of a lull in this patch discussion. Leo, I'll
resubmit the patch with the compile protection I suggested in my last
response.

Steve

On 3/10/2023 8:55 AM, Steve Clevenger wrote:
> 
> 
> On 3/9/2023 5:35 PM, Leo Yan wrote:
>> On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:
>>
>> [...]
>>
>>>> #include <linux/io-64-nonatomic-hi-lo.h>
>>>>
>>>> static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
>>>> {
>>>> 	if (csa->io_mem) {
>>>> 		if (csa->no_quad_mmio)
>>>>                         return hi_lo_readq_relaxed(csa->base + offset);
>>>> 		else
>>>> 			return readq_relaxed(csa->base + offset);
>>>> 	} else {
>>>> 		return read_etm4x_sysreg_offset(offset, true);
>>>>         }
>>>> }
>>>>
>>>> I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
>>>> fine to directly call hi_lo_readq_relaxed()?
>>>
>>> I can do this, and it would work fine. Without compile protection, one
>>> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
>>> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
>>> The only (minor) risk I see is to performance.
>>
>> Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:
>>
>>   #include <linux/io.h>
>>   #include <linux/io-64-nonatomic-lo-hi.h>
>>
>> Thus we can get the definitions for {readq|writeq}_relaxed() from
>> "io.h".
> 
> It seems the actual read_relaxed/write_relaxed definitions come from the
> asm-generic/io.h version already in the include hierarchy without adding
> linux/io.h. A more foolproof protection is to check whether these are
> defined before the io-64-nonatomic-hi-lo.h include and error out if not.
> 
>>
>>> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
>>> with endianness. IMO, the implementation I offered is agnostic without a
>>> hint to endianness or atomicity.
>>
>> Seems to me, kernel functions with explict endianness naming is not bad
>> thing :)
>>
>>> The CoreSight maintainer audience is
>>> limited and already aware of these issues, so I'll make the change for you.
>>
>> Thanks!  Suzuki, Mike, I think it's good to get your opinion before
>> Steve can proceed for updating patch.
>>
>> Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
  2023-03-06  5:54 ` [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read Steve Clevenger
@ 2023-03-17 11:58   ` Suzuki K Poulose
  2023-03-17 20:53     ` Steve Clevenger
  0 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2023-03-17 11:58 UTC (permalink / raw)
  To: Steve Clevenger, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel

Hi Steve

On 06/03/2023 05:54, Steve Clevenger wrote:
> An Ampere Computing design decision is MMIO reads are considered the
> same as an external debug access. If TRCOSLAR.OSLK is left set, the
> TRCIDR1 access results in a bus fault followed by a kernel panic. A
> TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
> access (now deprecated) is supported.
> 
> The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
> TRCIDR1 access. Do not distinguish between manufacturers.
> AC03_DEBUG_06 is described in the AmpereOne Developer Errata:
> 
> Add Ampere ETM PID required for Coresight ETM driver support.
> 
> https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation
> 
> Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>

I went back to the CoreSight ETM4x architecture and this is not an
erratum at your end. This is actually a bug in the ETMv4 driver.

TRCIDR1 is part of the "Trace" and not "Management" registers of
ETMv4. Access to "Trace" registers without OSLK is going to
cause issues. (Un)Fortunately, we never hit this on existing
systems, but that doesn't mean it can stay there.

So we should really fix our detection and only rely on
accessing the TRCDEVARCH to detect the device to be ETMv4.

I have a sent out the fix here  [0], are you able to test it :

[0] 
https://lkml.kernel.org/r/20230317115728.1358368-1-suzuki.poulose@arm.com

Suzuki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
  2023-03-17 11:58   ` Suzuki K Poulose
@ 2023-03-17 20:53     ` Steve Clevenger
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Clevenger @ 2023-03-17 20:53 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier
  Cc: mike.leach, leo.yan, coresight, linux-arm-kernel


Hi Suzuki,

I don't need to test this explicitly provided the TRCIDR1 access is
removed in etm4_init_iomem_access. When I first encountered this, I
substituted a hard coded value for the TRCIDR1 read. This hack
eliminated the initialization problem. I later submitted a patch to
clear TRCOSLAR.OSLK before the access. I didn't have visibility to other
implementations which could rely on the fallback.

Thanks,
Steve

On 3/17/2023 4:58 AM, Suzuki K Poulose wrote:
> Hi Steve
> 
> On 06/03/2023 05:54, Steve Clevenger wrote:
>> An Ampere Computing design decision is MMIO reads are considered the
>> same as an external debug access. If TRCOSLAR.OSLK is left set, the
>> TRCIDR1 access results in a bus fault followed by a kernel panic. A
>> TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
>> access (now deprecated) is supported.
>>
>> The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
>> TRCIDR1 access. Do not distinguish between manufacturers.
>> AC03_DEBUG_06 is described in the AmpereOne Developer Errata:
>>
>> Add Ampere ETM PID required for Coresight ETM driver support.
>>
>> https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-device-documentation
>>
>> Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com>
> 
> I went back to the CoreSight ETM4x architecture and this is not an
> erratum at your end. This is actually a bug in the ETMv4 driver.
> 
> TRCIDR1 is part of the "Trace" and not "Management" registers of
> ETMv4. Access to "Trace" registers without OSLK is going to
> cause issues. (Un)Fortunately, we never hit this on existing
> systems, but that doesn't mean it can stay there.
> 
> So we should really fix our detection and only rely on
> accessing the TRCDEVARCH to detect the device to be ETMv4.
> 
> I have a sent out the fix here  [0], are you able to test it :
> 
> [0]
> https://lkml.kernel.org/r/20230317115728.1358368-1-suzuki.poulose@arm.com
> 
> Suzuki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-03-17 20:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-06  5:54 [PATCH 0/3] Ampere Computing ETMv4.x Support Steve Clevenger
2023-03-06  5:54 ` [PATCH 1/3] Add known list of Ampere ETMv4 errata Steve Clevenger
2023-03-06  5:54 ` [PATCH 2/3] coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read Steve Clevenger
2023-03-17 11:58   ` Suzuki K Poulose
2023-03-17 20:53     ` Steve Clevenger
2023-03-06  5:54 ` [PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words Steve Clevenger
2023-03-08  7:04   ` Leo Yan
2023-03-08 10:06     ` Al Grant
2023-03-08 11:26       ` Mike Leach
2023-03-08 11:54         ` Leo Yan
2023-03-08 19:08           ` Steve Clevenger
2023-03-09 12:11             ` Leo Yan
2023-03-09 20:46               ` Steve Clevenger
2023-03-10  1:35                 ` Leo Yan
2023-03-10 16:55                   ` Steve Clevenger
2023-03-13 16:14                     ` Steve Clevenger
2023-03-06 10:29 ` [PATCH 0/3] Ampere Computing ETMv4.x Support Suzuki K Poulose
2023-03-07  1:23   ` Steve Clevenger
2023-03-07 10:21     ` Suzuki K Poulose
2023-03-07 23:39       ` Steve Clevenger
2023-03-06 10:54 ` Suzuki K Poulose
2023-03-07  1:24   ` Steve Clevenger
  -- strict thread matches above, loose matches on Subject: below --
2023-01-20  0:51 Steve Clevenger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).