linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/2] Fix Coresight instruction synthesis logic
@ 2025-03-27 11:11 Tanmay Jagdale
  2025-03-27 11:11 ` [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
  2025-03-27 11:11 ` [PATCH V3 2/2] perf: cs-etm: Store previous timestamp in packet queue Tanmay Jagdale
  0 siblings, 2 replies; 5+ messages in thread
From: Tanmay Jagdale @ 2025-03-27 11:11 UTC (permalink / raw)
  To: suzuki.poulose, mike.leach, james.clark, john.g.garry, leo.yan,
	will, acme, adrian.hunter
  Cc: linux-arm-kernel, linux-perf-users, coresight, linux-kernel,
	sgoutham, gcherian, lcherian, Tanmay Jagdale

When we use perf to catpure Coresight trace and generate instruction
trace using 'perf script', we get the following output:

# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1
# perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent
..
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed54 coresight_timeout+0x28
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed58 coresight_timeout+0x2c
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed5c coresight_timeout+0x30
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed60 coresight_timeout+0x34
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed7c coresight_timeout+0x50
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed80 coresight_timeout+0x54
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed84 coresight_timeout+0x58
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aede4 coresight_timeout+0xb8
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aede8 coresight_timeout+0xbc
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aedec coresight_timeout+0xc0
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aedf0 coresight_timeout+0xc4
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccec ete_sysreg_read+0x0
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf0 ete_sysreg_read+0x4
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf4 ete_sysreg_read+0x8
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf8 ete_sysreg_read+0xc
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccfc ete_sysreg_read+0x10
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bcd00 ete_sysreg_read+0x14

This output has the following issues:
1. Non-branch instructions have mnemonics of branch instructions (Column 6)
2. Branch target address is missing (Column 7)

This patch fixes these issues by changing the logic of instruction syntehsis
for the Coresight trace queues.

Output after applying the patch:
 ...
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed54 coresight_timeout+0x28
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed58 coresight_timeout+0x2c
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed5c coresight_timeout+0x30
 perf  6111/6111  [008]   457.332794461:      instructions:   jmp                   ffffb305591aed7c ffffb305591aed60 coresight_timeout+0x34
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed7c coresight_timeout+0x50
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed80 coresight_timeout+0x54
 perf  6111/6111  [008]   457.332794461:      instructions:   jcc                   ffffb305591aede4 ffffb305591aed84 coresight_timeout+0x58
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aede4 coresight_timeout+0xb8
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aede8 coresight_timeout+0xbc
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aedec coresight_timeout+0xc0
 perf  6111/6111  [008]   457.332794462:      instructions:   call                  ffffb305591bccec ffffb305591aedf0 coresight_timeout+0xc4
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccec ete_sysreg_read+0x0
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf0 ete_sysreg_read+0x4
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf4 ete_sysreg_read+0x8
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf8 ete_sysreg_read+0xc
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccfc ete_sysreg_read+0x10
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bcd00 ete_sysreg_read+0x14

Changes in V3
  - Rebased to latest coresight-next branch
  - Added Reviewed-by tags from V2 [2]

Changes in V2
  - Updated commit message of Patch 1
  - As discussed in the previous version [1], there were differences in instruction
    trace output before and after the patch. The timestamps for the instructions
    were not in sync. Added a patch 2 which fixes this issue.

Changes in V1
  - https://lkml.org/lkml/2023/6/23/912

[1] https://lkml.org/lkml/2023/6/28/506
[2] https://lore.kernel.org/lkml/b2c02eb9-0940-4770-a4b7-22d2af8078db@arm.com/

Tanmay Jagdale (2):
  perf: cs-etm: Fixes in instruction sample synthesis
  perf: cs-etm: Store previous timestamp in packet queue

 tools/perf/util/cs-etm.c | 49 +++++++++++++++++++++++++++++++---------
 tools/perf/util/cs-etm.h |  1 +
 2 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis
  2025-03-27 11:11 [PATCH V3 0/2] Fix Coresight instruction synthesis logic Tanmay Jagdale
@ 2025-03-27 11:11 ` Tanmay Jagdale
  2025-03-27 15:35   ` Leo Yan
  2025-03-27 11:11 ` [PATCH V3 2/2] perf: cs-etm: Store previous timestamp in packet queue Tanmay Jagdale
  1 sibling, 1 reply; 5+ messages in thread
From: Tanmay Jagdale @ 2025-03-27 11:11 UTC (permalink / raw)
  To: suzuki.poulose, mike.leach, james.clark, john.g.garry, leo.yan,
	will, acme, adrian.hunter
  Cc: linux-arm-kernel, linux-perf-users, coresight, linux-kernel,
	sgoutham, gcherian, lcherian, Tanmay Jagdale, James Clark

The existing method to synthesize instruction samples has the
following issues:
1. Branch instruction mnemonics were being added to non-branch
   instructions too.
2. Branch target address was missing

To fix the issues, start synthesizing the instructions from the
previous packet (tidq->prev_packet) instead of current packet
(tidq->packet). This way it's easy to figure out the target
address of the branch instruction in tidq->prev_packet which
is the current packet's (tidq->packet) first executed instruction.

Since we have now switched to processing the previous packet
first, we need not swap the packets during cs_etm__flush().

Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Reviewed-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 0bf9e5c27b59..ebed5b98860e 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1576,10 +1576,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
 	sample.cpu = tidq->packet->cpu;
-	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
+
+	/* Populate branch target information only when we encounter
+	 * branch instruction, which is at the end of tidq->prev_packet.
+	 */
+	if (addr == (tidq->prev_packet->end_addr - 4)) {
+		/* Update the perf_sample flags using the prev_packet
+		 * since that is the queue we are synthesizing.
+		 */
+		sample.flags = tidq->prev_packet->flags;
+
+		/* The last instruction of the previous queue would be a
+		 * branch operation. Get the target of that branch by looking
+		 * into the first executed instruction of the current packet
+		 * queue.
+		 */
+		sample.addr = cs_etm__first_executed_instr(tidq->packet);
+	}
 
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
@@ -1771,7 +1787,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	/* Get instructions remainder from previous packet */
 	instrs_prev = tidq->period_instructions;
 
-	tidq->period_instructions += tidq->packet->instr_count;
+	tidq->period_instructions += tidq->prev_packet->instr_count;
 
 	/*
 	 * Record a branch when the last instruction in
@@ -1851,8 +1867,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * been executed, but PC has not advanced to next
 			 * instruction)
 			 */
+			/* Get address from prev_packet since we are synthesizing
+			 * that in cs_etm__synth_instruction_sample()
+			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet, offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
@@ -1916,7 +1935,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 
 	/* Handle start tracing packet */
 	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
-		goto swap_packet;
+		goto reset_last_br;
 
 	if (etmq->etm->synth_opts.last_branch &&
 	    etmq->etm->synth_opts.instructions &&
@@ -1952,8 +1971,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 			return err;
 	}
 
-swap_packet:
-	cs_etm__packet_swap(etm, tidq);
+reset_last_br:
 
 	/* Reset last branches after flush the trace */
 	if (etm->synth_opts.last_branch)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V3 2/2] perf: cs-etm: Store previous timestamp in packet queue
  2025-03-27 11:11 [PATCH V3 0/2] Fix Coresight instruction synthesis logic Tanmay Jagdale
  2025-03-27 11:11 ` [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
@ 2025-03-27 11:11 ` Tanmay Jagdale
  1 sibling, 0 replies; 5+ messages in thread
From: Tanmay Jagdale @ 2025-03-27 11:11 UTC (permalink / raw)
  To: suzuki.poulose, mike.leach, james.clark, john.g.garry, leo.yan,
	will, acme, adrian.hunter
  Cc: linux-arm-kernel, linux-perf-users, coresight, linux-kernel,
	sgoutham, gcherian, lcherian, Tanmay Jagdale, James Clark

Since logic in cs_etm__sample is changed to synthesizing the previous
packet (tidq->prev_packet) instead of current packet (tidq->packet),
the first time this function is called, tidq->prev_packet is NULL
and we return without processing anything.

This is as expected but, in the first call, we would have a valid
timestamp (stored in tidq->packet_queue.cs_timestamp) which belongs
to tidq->packet. This would be lost due to no processing.

Losing this timestamp results in all the synthesized packets being
associated with the next timestamp and not their corresponding one.

To fix this, introduce a new variable (prev_cs_timestamp) to store the
queue's timestamp in cs_etm__sample(). When we start synthesizing the
prev_packet, use this cached value instead of the current timestamp
(cs_timestamp).

Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Reviewed-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 17 +++++++++++++----
 tools/perf/util/cs-etm.h |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index ebed5b98860e..b52b58328ca0 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1542,13 +1542,15 @@ u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
 }
 
 static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
-					       struct cs_etm_traceid_queue *tidq)
+					       struct cs_etm_traceid_queue *tidq,
+					       bool instruction_sample)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet_queue *packet_queue = &tidq->packet_queue;
 
 	if (!etm->timeless_decoding && etm->has_virtual_ts)
-		return packet_queue->cs_timestamp;
+		return instruction_sample ? packet_queue->prev_cs_timestamp :
+					    packet_queue->cs_timestamp;
 	else
 		return etm->latest_kernel_timestamp;
 }
@@ -1567,7 +1569,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
-	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
+	sample.time = cs_etm__resolve_sample_time(etmq, tidq, true);
 
 	sample.ip = addr;
 	sample.pid = thread__pid(tidq->thread);
@@ -1643,7 +1645,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
-	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
+	sample.time = cs_etm__resolve_sample_time(etmq, tidq, false);
 
 	sample.ip = ip;
 	sample.pid = thread__pid(tidq->prev_packet_thread);
@@ -1903,6 +1905,13 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		}
 	}
 
+	/*
+	 * Since we synthesize the prev_packet, store the current timestamp
+	 * here in prev_cs_timestamp so that when we *actually* synthesize
+	 * the prev_packet, we use this timestamp and not the future one.
+	 */
+	tidq->packet_queue.prev_cs_timestamp = tidq->packet_queue.cs_timestamp;
+
 	cs_etm__packet_swap(etm, tidq);
 
 	return 0;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index a8caeea720aa..359d6233dd9b 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -205,6 +205,7 @@ struct cs_etm_packet_queue {
 	u32 instr_count;
 	u64 cs_timestamp; /* Timestamp from trace data, converted to ns if possible */
 	u64 next_cs_timestamp;
+	u64 prev_cs_timestamp;
 	struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER];
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis
  2025-03-27 11:11 ` [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
@ 2025-03-27 15:35   ` Leo Yan
  2025-04-01 16:58     ` Tanmay Jagdale
  0 siblings, 1 reply; 5+ messages in thread
From: Leo Yan @ 2025-03-27 15:35 UTC (permalink / raw)
  To: Tanmay Jagdale
  Cc: suzuki.poulose, mike.leach, james.clark, john.g.garry, leo.yan,
	will, acme, adrian.hunter, linux-arm-kernel, linux-perf-users,
	coresight, linux-kernel, sgoutham, gcherian

Hi Tanmay,

On Thu, Mar 27, 2025 at 04:41:48PM +0530, Tanmay Jagdale wrote:
> The existing method to synthesize instruction samples has the
> following issues:
> 1. Branch instruction mnemonics were being added to non-branch
>    instructions too.
> 2. Branch target address was missing
> 
> To fix the issues, start synthesizing the instructions from the
> previous packet (tidq->prev_packet) instead of current packet
> (tidq->packet). This way it's easy to figure out the target
> address of the branch instruction in tidq->prev_packet which
> is the current packet's (tidq->packet) first executed instruction.
> 
> Since we have now switched to processing the previous packet
> first, we need not swap the packets during cs_etm__flush().
> 
> Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
> Reviewed-by: James Clark <james.clark@arm.com>

I saw James's reviewed tag.  However, I have several comments.

Sorry I jumped in too late.

> ---
>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 0bf9e5c27b59..ebed5b98860e 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1576,10 +1576,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,

Seems to me, the problem is cs_etm__synth_instruction_sample() is
invoked from multiple callers.

Both the previous packet and packet are valid fo the flow:
  cs_etm__sample()
    `> cs_etm__synth_instruction_sample()

Only the previous packet is valid and the current packet stores stale
data for the flows:

  cs_etm__flush()
    `> cs_etm__synth_instruction_sample()

  cs_etm__end_block()
    `> cs_etm__synth_instruction_sample()

First, as a prerequisite, I think we should resolve the stale data in
the packet.  So we need a fix like:

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c                
index 0bf9e5c27b59..b7b17c0e4806 100644                                         
--- a/tools/perf/util/cs-etm.c                                                  
+++ b/tools/perf/util/cs-etm.c                                                  
@@ -741,6 +741,9 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
                                                                                
        if (etm->synth_opts.branches || etm->synth_opts.last_branch ||          
            etm->synth_opts.instructions) {                                     
+               /* The previous packet will not be used, cleanup it */          
+               memset(tidq->prev_packet, 0x0, sizeof(*tidq->packet));          
+                                                                               
                /*                                                              
                 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for 
                 * the next incoming packet.                                    

>  	sample.stream_id = etmq->etm->instructions_id;
>  	sample.period = period;
>  	sample.cpu = tidq->packet->cpu;

Should we use "prev_packet->cpu" at here?

Even for a branch instruction, as its IP address is from the previous
packet, we should use "prev_packet->cpu" for CPU ID as well.

> -	sample.flags = tidq->prev_packet->flags;
>  	sample.cpumode = event->sample.header.misc;
>  
> -	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> +	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
> +
> +	/* Populate branch target information only when we encounter
> +	 * branch instruction, which is at the end of tidq->prev_packet.
> +	 */
> +	if (addr == (tidq->prev_packet->end_addr - 4)) {

  if (!addr && addr == cs_etm__last_executed_instr(tidq->prev_packet))

> +		/* Update the perf_sample flags using the prev_packet
> +		 * since that is the queue we are synthesizing.
> +		 */
> +		sample.flags = tidq->prev_packet->flags;
> +
> +		/* The last instruction of the previous queue would be a
> +		 * branch operation. Get the target of that branch by looking
> +		 * into the first executed instruction of the current packet
> +		 * queue.
> +		 */
> +		sample.addr = cs_etm__first_executed_instr(tidq->packet);

If connected to the change suggested for cleaning up packet in
cs_etm__packet_swap(), when run at here, if "tidq->packet" is a valid
packet, then it will return a branch target address, otherwise, it
will return 0.

> +	}
>  
>  	if (etm->synth_opts.last_branch)
>  		sample.branch_stack = tidq->last_branch;
> @@ -1771,7 +1787,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  	/* Get instructions remainder from previous packet */
>  	instrs_prev = tidq->period_instructions;
>  
> -	tidq->period_instructions += tidq->packet->instr_count;
> +	tidq->period_instructions += tidq->prev_packet->instr_count;

A side effect for this change is we will defer to synthesize instruction
samples for _current_ packet, either the packet will be handled after
a new packet incoming, or at the end of a trace chunk.

The problem is for the later one, we can see cs_etm__end_block() and
cs_etm__flush() both only handle the previous packet. As a result, the
last packet will be ignored.

I would suggest we need to firstly fix this issue in
cs_etm__end_block() and cs_etm__flush() (maybe we need to consider to
consolidate the code with cs_etm__sample()).

>  	/*
>  	 * Record a branch when the last instruction in
> @@ -1851,8 +1867,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  			 * been executed, but PC has not advanced to next
>  			 * instruction)
>  			 */
> +			/* Get address from prev_packet since we are synthesizing
> +			 * that in cs_etm__synth_instruction_sample()
> +			 */
>  			addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -						  tidq->packet, offset - 1);
> +						  tidq->prev_packet, offset - 1);
>  			ret = cs_etm__synth_instruction_sample(
>  				etmq, tidq, addr,
>  				etm->instructions_sample_period);
> @@ -1916,7 +1935,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Handle start tracing packet */
>  	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
> -		goto swap_packet;
> +		goto reset_last_br;
>  
>  	if (etmq->etm->synth_opts.last_branch &&
>  	    etmq->etm->synth_opts.instructions &&
> @@ -1952,8 +1971,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  			return err;
>  	}
>  
> -swap_packet:
> -	cs_etm__packet_swap(etm, tidq);
> +reset_last_br:

As said, if we consolidate cs_etm__flush() for processing both
previous packet and current packet, then we don't need to remove
cs_etm__packet_swap() at here, right?

Thanks,
Leo

>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -- 
> 2.43.0
> 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis
  2025-03-27 15:35   ` Leo Yan
@ 2025-04-01 16:58     ` Tanmay Jagdale
  0 siblings, 0 replies; 5+ messages in thread
From: Tanmay Jagdale @ 2025-04-01 16:58 UTC (permalink / raw)
  To: leo.yan, Tanmay Jagdale
  Cc: suzuki.poulose, mike.leach, james.clark, john.g.garry, leo.yan,
	will, acme, adrian.hunter, linux-arm-kernel, linux-perf-users,
	coresight, linux-kernel, sgoutham, gcherian

From: Leo Yan <leo.yan@arm.com>

Hi Leo,

I was on vacation so could not get back earlier.
> 
> > Hi Tanmay,
> > 
> > On Thu, Mar 27, 2025 at 04:41:48PM +0530, Tanmay Jagdale wrote:
>>> The existing method to synthesize instruction samples has the
>>> following issues:
>>> 1. Branch instruction mnemonics were being added to non-branch
>>>    instructions too.
>>> 2. Branch target address was missing
>>> 
>>> To fix the issues, start synthesizing the instructions from the
>>> previous packet (tidq->prev_packet) instead of current packet
>>> (tidq->packet). This way it's easy to figure out the target
>>> address of the branch instruction in tidq->prev_packet which
>>> is the current packet's (tidq->packet) first executed instruction.
>>> 
>>> Since we have now switched to processing the previous packet
>>> first, we need not swap the packets during cs_etm__flush().
>>> 
>>> Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
>>> Reviewed-by: James Clark <james.clark@arm.com>
>>
>> I saw James's reviewed tag.  However, I have several comments.
>> 
>> Sorry I jumped in too late.
No problem, thanks for the review.

> 
>>> ---
>>>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
>>>  1 file changed, 25 insertions(+), 7 deletions(-)
>>> 
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 0bf9e5c27b59..ebed5b98860e 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -1576,10 +1576,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> 
> Seems to me, the problem is cs_etm__synth_instruction_sample() is
> invoked from multiple callers.
> 
> Both the previous packet and packet are valid fo the flow:
>   cs_etm__sample()
>     `> cs_etm__synth_instruction_sample()
> 
> Only the previous packet is valid and the current packet stores stale
> data for the flows:
> 
>   cs_etm__flush()
>     `> cs_etm__synth_instruction_sample()
> 
>  cs_etm__end_block()
>    `> cs_etm__synth_instruction_sample()
> 
> First, as a prerequisite, I think we should resolve the stale data in
> the packet.  So we need a fix like:
Agree.

> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c                
> index 0bf9e5c27b59..b7b17c0e4806 100644                                         
> --- a/tools/perf/util/cs-etm.c                                                  
> +++ b/tools/perf/util/cs-etm.c                                                  
> @@ -741,6 +741,9 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
>                                                                                 
>         if (etm->synth_opts.branches || etm->synth_opts.last_branch ||          
>             etm->synth_opts.instructions) {                                     
> +               /* The previous packet will not be used, cleanup it */          
> +               memset(tidq->prev_packet, 0x0, sizeof(*tidq->packet));          
> +                                                                               
>                 /*                                                              
>                  * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for 
>                  * the next incoming packet.                                    
> 
Thanks for pointing out, I'll include this fix.

>>>  	sample.stream_id = etmq->etm->instructions_id;
>>>  	sample.period = period;
>>>  	sample.cpu = tidq->packet->cpu;
> 
> Should we use "prev_packet->cpu" at here?
> 
> Even for a branch instruction, as its IP address is from the previous
> packet, we should use "prev_packet->cpu" for CPU ID as well.
ACK.

> 
>>> -	sample.flags = tidq->prev_packet->flags;
>>>  	sample.cpumode = event->sample.header.misc;
>>>  
>>> -	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>>> +	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
>>> +
>>> +	/* Populate branch target information only when we encounter
>>> +	 * branch instruction, which is at the end of tidq->prev_packet.
>>> +	 */
>>> +	if (addr == (tidq->prev_packet->end_addr - 4)) {
> 
>   if (!addr && addr == cs_etm__last_executed_instr(tidq->prev_packet))
> 
>>> +		/* Update the perf_sample flags using the prev_packet
>>> +		 * since that is the queue we are synthesizing.
>>> +		 */
>>> +		sample.flags = tidq->prev_packet->flags;
>>> +
>>> +		/* The last instruction of the previous queue would be a
>>> +		 * branch operation. Get the target of that branch by looking
>>> +		 * into the first executed instruction of the current packet
>>> +		 * queue.
>>> +		 */
>>> +		sample.addr = cs_etm__first_executed_instr(tidq->packet);
> 
> If connected to the change suggested for cleaning up packet in
> cs_etm__packet_swap(), when run at here, if "tidq->packet" is a valid
> packet, then it will return a branch target address, otherwise, it
> will return 0.
> 
>>> +	}
>>>  
>>>  	if (etm->synth_opts.last_branch)
>>>  		sample.branch_stack = tidq->last_branch;
>>> @@ -1771,7 +1787,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>>  	/* Get instructions remainder from previous packet */
>>>  	instrs_prev = tidq->period_instructions;
>>>  
>>> -	tidq->period_instructions += tidq->packet->instr_count;
>>> +	tidq->period_instructions += tidq->prev_packet->instr_count;
> 
> A side effect for this change is we will defer to synthesize instruction
> samples for _current_ packet, either the packet will be handled after
> a new packet incoming, or at the end of a trace chunk.
> 
> The problem is for the later one, we can see cs_etm__end_block() and
> cs_etm__flush() both only handle the previous packet. As a result, the
> last packet will be ignored.
Yes I agree, this is a side effect of the patch. The last packet's instructions
are not handled.

> 
> I would suggest we need to firstly fix this issue in
> cs_etm__end_block() and cs_etm__flush() (maybe we need to consider to
> consolidate the code with cs_etm__sample()).
Okay sure. I will take a look at consolidating the code and post them in
the next version.

> 
>>>  	/*
>>>  	 * Record a branch when the last instruction in
>>> @@ -1851,8 +1867,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>>  			 * been executed, but PC has not advanced to next
>>>  			 * instruction)
>>>  			 */
>>> +			/* Get address from prev_packet since we are synthesizing
>>> +			 * that in cs_etm__synth_instruction_sample()
>>> +			 */
>>>  			addr = cs_etm__instr_addr(etmq, trace_chan_id,
>>> -						  tidq->packet, offset - 1);
>>> +						  tidq->prev_packet, offset - 1);
>>>  			ret = cs_etm__synth_instruction_sample(
>>>  				etmq, tidq, addr,
>>>  				etm->instructions_sample_period);
>>> @@ -1916,7 +1935,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>>  
>>>  	/* Handle start tracing packet */
>>>  	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
>>> -		goto swap_packet;
>>> +		goto reset_last_br;
>>>  
>>>  	if (etmq->etm->synth_opts.last_branch &&
>>>  	    etmq->etm->synth_opts.instructions &&
>>> @@ -1952,8 +1971,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>>  			return err;
>>>  	}
>>>  
>>> -swap_packet:
>>> -	cs_etm__packet_swap(etm, tidq);
>>> +reset_last_br:
> 
> As said, if we consolidate cs_etm__flush() for processing both
> previous packet and current packet, then we don't need to remove
> cs_etm__packet_swap() at here, right?
Yes I think so too.

Thanks,
Tanmay
> 
> Thanks,
> Leo
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-04-01 17:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-27 11:11 [PATCH V3 0/2] Fix Coresight instruction synthesis logic Tanmay Jagdale
2025-03-27 11:11 ` [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
2025-03-27 15:35   ` Leo Yan
2025-04-01 16:58     ` Tanmay Jagdale
2025-03-27 11:11 ` [PATCH V3 2/2] perf: cs-etm: Store previous timestamp in packet queue Tanmay Jagdale

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).