* [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
@ 2024-07-19 9:26 Ganapatrao Kulkarni
2024-07-19 14:39 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-07-19 9:26 UTC
To: james.clark, mike.leach, suzuki.poulose
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, gankulkarni
To generate the instruction trace, the script uses the address range
of 2 contiguous packets. If there is a continuity break due to a
discontiguous branch address, the tracing must be reset and restarted
with the new set of contiguous packets.
Add a change to identify the break, complete the remaining tracing of
the current packets, and restart tracing from the new set of packets
once continuity is established.
Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
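A minimal standalone sketch of the range logic described above. It assumes, consistently with the objdump addresses in the logs later in this thread, that each branch sample's disassembly range runs from the previous sample's branch target (addr) to the current branch source plus 4 (ip + 4), and that the patch keeps that stop address as the fallback start for the next sample. disasm_range() and its parameters are hypothetical names, not the script's code.

```python
def disasm_range(prev_addr, prev_stop, ip):
    """Compute the [start, stop) range to disassemble for one branch sample.

    prev_addr -- branch target (addr) recorded from the previous sample
    prev_stop -- previous sample's ip + 4, i.e. what the patch saves as 'ip'
    ip        -- branch source of the current sample
    """
    start_addr = prev_addr
    stop_addr = ip + 4  # + 4 so objdump includes the branch instruction itself
    if stop_addr < start_addr:
        # Continuity broken: resume right after the previous branch instead
        start_addr = prev_stop
    return start_addr, stop_addr


# Contiguous case: the range starts at the previous branch target
print([hex(a) for a in disasm_range(0x1000, 0x0f08, 0x1010)])  # ['0x1000', '0x1014']
# Broken case: the computed range would run backwards, so fall back
print([hex(a) for a in disasm_range(0x2000, 0x1008, 0x1020)])  # ['0x1008', '0x1024']
```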
tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..ad10cee2c35e 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -198,6 +198,10 @@ def process_event(param_dict):
cpu_data[str(cpu) + 'addr'] = addr
return
+ if (cpu_data.get(str(cpu) + 'ip') == None):
+ cpu_data[str(cpu) + 'ip'] = ip
+
+ prev_ip = cpu_data[str(cpu) + 'ip']
if (options.verbose == True):
print("Event type: %s" % name)
@@ -243,12 +247,18 @@ def process_event(param_dict):
# Record for previous sample packet
cpu_data[str(cpu) + 'addr'] = addr
+ cpu_data[str(cpu) + 'ip'] = stop_addr
# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
if (start_addr == 0 and stop_addr == 4):
print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
return
+ if (stop_addr < start_addr):
+ # Continuity of the Packets broken, set start_addr to previous
+ # packet ip to complete the remaining tracing of the address range.
+ start_addr = prev_ip
+
if (start_addr < int(dso_start) or start_addr > int(dso_end)):
print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
return
--
2.45.2
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-19 9:26 [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken Ganapatrao Kulkarni
@ 2024-07-19 14:39 ` James Clark
2024-07-22 10:02 ` Ganapatrao Kulkarni
From: James Clark @ 2024-07-19 14:39 UTC
To: Ganapatrao Kulkarni, james.clark, mike.leach, suzuki.poulose,
Leo Yan
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger
On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
> To generate the instruction trace, the script uses the address range
> of 2 contiguous packets. If there is a continuity break due to a
> discontiguous branch address, the tracing must be reset and restarted
> with the new set of contiguous packets.
>
> Add a change to identify the break, complete the remaining tracing of
> the current packets, and restart tracing from the new set of packets
> once continuity is established.
>
Hi Ganapatrao,
Can you add a before and after example of what's changed to the commit
message? It wasn't immediately obvious to me if this is adding missing
output, or it was correcting the tail end of the output that was
previously wrong.
> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> ---
> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> index d973c2baed1c..ad10cee2c35e 100755
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -198,6 +198,10 @@ def process_event(param_dict):
> cpu_data[str(cpu) + 'addr'] = addr
> return
>
> + if (cpu_data.get(str(cpu) + 'ip') == None):
> + cpu_data[str(cpu) + 'ip'] = ip
> +
Do you need to write into the global cpu_data here? Doesn't it get
overwritten after you load it back into 'prev_ip'?
prev_ip = cpu_data[str(cpu) + 'ip']
... then ...
# Record for previous sample packet
cpu_data[str(cpu) + 'addr'] = addr
cpu_data[str(cpu) + 'ip'] = stop_addr
Would a local variable not accomplish the same thing?
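To illustrate the question in isolation: the hunk above seeds the shared cpu_data entry only to read it straight back, which dict.get() with a default can express without writing to the shared state. This is a sketch; cpu_data, cpu and ip mirror the script's variables, while both helper names are hypothetical.

```python
def prev_ip_as_patched(cpu_data, cpu, ip):
    # Patch behaviour: write a default into the shared dict, then read it back
    if cpu_data.get(str(cpu) + 'ip') is None:
        cpu_data[str(cpu) + 'ip'] = ip
    return cpu_data[str(cpu) + 'ip']


def prev_ip_local(cpu_data, cpu, ip):
    # Same value when the key is missing, but the default stays local.
    # (The two would differ only if None were ever stored explicitly.)
    return cpu_data.get(str(cpu) + 'ip', ip)


shared, untouched = {}, {}
print(prev_ip_as_patched(shared, 1, 0x10c), shared)   # 268 {'1ip': 268}
print(prev_ip_local(untouched, 1, 0x10c), untouched)  # 268 {}
```

In both variants the patch's second hunk then stores stop_addr in the same entry, and that is the value that actually has to persist between calls.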
> + prev_ip = cpu_data[str(cpu) + 'ip']
>
> if (options.verbose == True):
> print("Event type: %s" % name)
> @@ -243,12 +247,18 @@ def process_event(param_dict):
>
> # Record for previous sample packet
> cpu_data[str(cpu) + 'addr'] = addr
> + cpu_data[str(cpu) + 'ip'] = stop_addr
>
> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
> if (start_addr == 0 and stop_addr == 4):
> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
> return
>
> + if (stop_addr < start_addr):
> + # Continuity of the Packets broken, set start_addr to previous
> + # packet ip to complete the remaining tracing of the address range.
> + start_addr = prev_ip
> +
> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
> return
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-19 14:39 ` James Clark
@ 2024-07-22 10:02 ` Ganapatrao Kulkarni
2024-07-23 13:10 ` James Clark
From: Ganapatrao Kulkarni @ 2024-07-22 10:02 UTC
To: James Clark, james.clark, mike.leach, suzuki.poulose, Leo Yan
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger
Hi James,
On 19-07-2024 08:09 pm, James Clark wrote:
>
>
> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>> To generate the instruction trace, the script uses the address range
>> of 2 contiguous packets. If there is a continuity break due to a
>> discontiguous branch address, the tracing must be reset and restarted
>> with the new set of contiguous packets.
>>
>> Add a change to identify the break, complete the remaining tracing of
>> the current packets, and restart tracing from the new set of packets
>> once continuity is established.
>>
>
> Hi Ganapatrao,
>
> Can you add a before and after example of what's changed to the commit
> message? It wasn't immediately obvious to me if this is adding missing
> output, or it was correcting the tail end of the output that was
> previously wrong.
It adds the tail end of the trace and also avoids a crash of the perf
application. Without this change, perf aborts with the log below:
./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
objdump: error: the stop address should be after the start address
Traceback (most recent call last):
  File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
    print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
  File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
    for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
    disasm_output = check_output(disasm).decode('utf-8').split('\n')
                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
Fatal Python error: handler_call_die: problem in Python trace event handler
Python runtime state: initialized
Current thread 0x0000ffffb05054e0 (most recent call first):
<no Python frame>
Extension modules: perf_trace_context, systemd._journal,
systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
problem._py3abrt (total: 7)
Aborted (core dumped)
>
>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>> ---
>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> index d973c2baed1c..ad10cee2c35e 100755
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>> cpu_data[str(cpu) + 'addr'] = addr
>> return
>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>> + cpu_data[str(cpu) + 'ip'] = ip
>> +
>
> Do you need to write into the global cpu_data here? Doesn't it get
> overwritten after you load it back into 'prev_ip'?
No, the logic is the same as for holding the addr of the previous packet:
the ip saved by the previous packet is copied into prev_ip before it is
overwritten with the current packet's value.
>
> prev_ip = cpu_data[str(cpu) + 'ip']
>
> ... then ...
>
> # Record for previous sample packet
> cpu_data[str(cpu) + 'addr'] = addr
> cpu_data[str(cpu) + 'ip'] = stop_addr
>
> Would a local variable not accomplish the same thing?
No, we need a global to hold the ip of the previous packet.
>
>> + prev_ip = cpu_data[str(cpu) + 'ip']
>> if (options.verbose == True):
>> print("Event type: %s" % name)
>> @@ -243,12 +247,18 @@ def process_event(param_dict):
>> # Record for previous sample packet
>> cpu_data[str(cpu) + 'addr'] = addr
>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>> if (start_addr == 0 and stop_addr == 4):
>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>> return
>> + if (stop_addr < start_addr):
>> + # Continuity of the Packets broken, set start_addr to previous
>> + # packet ip to complete the remaining tracing of the address range.
>> + start_addr = prev_ip
>> +
>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>> return
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-22 10:02 ` Ganapatrao Kulkarni
@ 2024-07-23 13:10 ` James Clark
2024-07-23 15:26 ` Ganapatrao Kulkarni
From: James Clark @ 2024-07-23 13:10 UTC
To: Ganapatrao Kulkarni, james.clark, mike.leach, suzuki.poulose,
Leo Yan
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger
On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>
> Hi James,
>
> On 19-07-2024 08:09 pm, James Clark wrote:
>>
>>
>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>> To generate the instruction trace, the script uses the address range
>>> of 2 contiguous packets. If there is a continuity break due to a
>>> discontiguous branch address, the tracing must be reset and restarted
>>> with the new set of contiguous packets.
>>>
>>> Add a change to identify the break, complete the remaining tracing of
>>> the current packets, and restart tracing from the new set of packets
>>> once continuity is established.
>>>
>>
>> Hi Ganapatrao,
>>
>> Can you add a before and after example of what's changed to the commit
>> message? It wasn't immediately obvious to me if this is adding missing
>> output, or it was correcting the tail end of the output that was
>> previously wrong.
>
> It adds the tail end of the trace and also avoids a crash of the perf
> application. Without this change, perf aborts with the log below:
>
>
>>
>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>> ---
>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> index d973c2baed1c..ad10cee2c35e 100755
>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> return
>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>> + cpu_data[str(cpu) + 'ip'] = ip
>>> +
>>
>> Do you need to write into the global cpu_data here? Doesn't it get
>> overwritten after you load it back into 'prev_ip'?
>
> No, the logic is the same as for holding the addr of the previous packet:
> the ip saved by the previous packet is copied into prev_ip before it is
> overwritten with the current packet's value.
It's not exactly the same logic as holding the addr of the previous
sample. For addr, we return on the first None; with your change we now
"pretend" that the second one is also the previous one:
if (cpu_data.get(str(cpu) + 'addr') == None):
cpu_data[str(cpu) + 'addr'] = addr
return <----------------------------sample 0 return
if (cpu_data.get(str(cpu) + 'ip') == None):
cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
Then for sample 1 'prev_ip' is actually now the 'current' IP:
prev_ip = cpu_data[str(cpu) + 'ip']
This means that prev_ip is only sometimes the previous sample's IP (for
the samples following 1); otherwise it's the current IP. Does your fix
actually require this bit? We already save the 'real' previous one:
cpu_data[str(cpu) + 'ip'] = stop_addr
Also, normally we save ip + 4 (stop_addr), whereas you save ip. It's not
clear why there is no need to add the 4?
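Both observations can be reproduced with a simplified, single-CPU replay of the patch's bookkeeping. This is a sketch only: the str(cpu) key prefix, verbose printing and dso checks are left out, and run_patched() is a hypothetical helper. For sample 1 the value read back as prev_ip is that sample's own ip, while later samples see the previous ip plus 4.

```python
def run_patched(samples):
    """Replay (addr, ip) pairs for one CPU through a simplified model of the
    patch and return the prev_ip value seen by each processed sample."""
    cpu_data = {}
    seen = []
    for addr, ip in samples:
        if cpu_data.get('addr') is None:
            cpu_data['addr'] = addr    # sample 0: record and return early
            continue
        if cpu_data.get('ip') is None:
            cpu_data['ip'] = ip        # sample 1: seeded with its own ip
        prev_ip = cpu_data['ip']
        stop_addr = ip + 4
        cpu_data['addr'] = addr        # "Record for previous sample packet"
        cpu_data['ip'] = stop_addr     # what the next sample reads as prev_ip
        seen.append(prev_ip)
    return seen


samples = [(0x100, 0x0f0), (0x200, 0x10c), (0x300, 0x20c)]
print([hex(v) for v in run_patched(samples)])  # ['0x10c', '0x110']
```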
>>
>> prev_ip = cpu_data[str(cpu) + 'ip']
>>
>> ... then ...
>>
>> # Record for previous sample packet
>> cpu_data[str(cpu) + 'addr'] = addr
>> cpu_data[str(cpu) + 'ip'] = stop_addr
>>
>> Would a local variable not accomplish the same thing?
>
> No, we need a global to hold the ip of the previous packet.
>>
>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>> if (options.verbose == True):
>>> print("Event type: %s" % name)
>>> @@ -243,12 +247,18 @@ def process_event(param_dict):
>>> # Record for previous sample packet
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>> if (start_addr == 0 and stop_addr == 4):
>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>> return
>>> + if (stop_addr < start_addr):
>>> + # Continuity of the Packets broken, set start_addr to previous
>>> + # packet ip to complete the remaining tracing of the address range.
After looking a bit more I'm also not sure why stop_addr < start_addr
signifies a discontinuity. What if the discontinuity ends up with
stop_addr > start_addr? There's no reason it can't jump forwards as well
as backwards.
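To make the concern concrete: the patch only reacts when the computed range runs backwards. A sketch of that predicate (is_flagged() is a hypothetical name; it uses the script's stop_addr = ip + 4) shows that a gap landing at a higher address is not detected. If both addresses fall in the same dso, the script would then presumably disassemble everything in between.

```python
def is_flagged(prev_addr, ip):
    # The patch's test: stop_addr (ip + 4) lies below start_addr (prev addr)
    start_addr, stop_addr = prev_addr, ip + 4
    return stop_addr < start_addr


# Backward gap, values from the failing objdump call in this thread: detected
print(is_flagged(0xffff80008125b758, 0xffff80008125a930))  # True
# Forward gap to an unrelated, higher address (made-up values): not detected
print(is_flagged(0x1000, 0x5000))  # False
```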
Can you share the 3 samples from the --verbose output to the script that
cause the issue?
I see discontinuities as having the branch source (ip) set to 0, which is
what we do at the start:
Sample = { cpu: 0000 addr: 0x0000ffffb807adac phys_addr: 0x0000000000000000 ip: 0x0000000000000000 pid: 28388 }
Then the ending one has the branch target (addr) set to 0:
Sample = { cpu: 0000 addr: 0x0000000000000000 phys_addr: 0x0000000000000000 ip: 0x0000ffffb7eee168 pid: 28388 }
And it doesn't hit objdump because of the range check:
Start address 0x0 is out of range ...
So I don't see any missing disassembly or crashes for this.
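The range check referred to here boils down to the predicate below. It is a sketch mirroring the script's condition; in_dso() is a hypothetical name and the dso bounds are made-up values. A start address of 0 will not normally fall inside a dso mapping, so such samples return before objdump is invoked.

```python
def in_dso(start_addr, dso_start, dso_end):
    # Negation of the script's test:
    # start_addr < int(dso_start) or start_addr > int(dso_end)
    return not (start_addr < dso_start or start_addr > dso_end)


DSO_START, DSO_END = 0x0000ffffb7e00000, 0x0000ffffb8000000  # made-up bounds
print(in_dso(0x0, DSO_START, DSO_END))                 # False: rejected early
print(in_dso(0x0000ffffb7eee168, DSO_START, DSO_END))  # True
```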
>>> + start_addr = prev_ip
>>> +
>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>> return
>
> Thanks,
> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-23 13:10 ` James Clark
@ 2024-07-23 15:26 ` Ganapatrao Kulkarni
2024-07-23 15:46 ` James Clark
From: Ganapatrao Kulkarni @ 2024-07-23 15:26 UTC
To: James Clark, james.clark, mike.leach, suzuki.poulose, Leo Yan
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger
On 23-07-2024 06:40 pm, James Clark wrote:
>
>
> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>
>> Hi James,
>>
>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>
>>>
>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>> To generate the instruction trace, the script uses the address range
>>>> of 2 contiguous packets. If there is a continuity break due to a
>>>> discontiguous branch address, the tracing must be reset and restarted
>>>> with the new set of contiguous packets.
>>>>
>>>> Add a change to identify the break, complete the remaining tracing of
>>>> the current packets, and restart tracing from the new set of packets
>>>> once continuity is established.
>>>>
>>>
>>> Hi Ganapatrao,
>>>
>>> Can you add a before and after example of what's changed to the
>>> commit message? It wasn't immediately obvious to me if this is adding
>>> missing output, or it was correcting the tail end of the output that
>>> was previously wrong.
>>
>> It adds the tail end of the trace and also avoids a crash of the perf
>> application. Without this change, perf aborts with the log below:
>>
>>
>>>
>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>> ---
>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>> 1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> return
>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>> +
>>>
>>> Do you need to write into the global cpu_data here? Doesn't it get
>>> overwritten after you load it back into 'prev_ip'?
>>
>> No, the logic is the same as for holding the addr of the previous packet:
>> the ip saved by the previous packet is copied into prev_ip before it is
>> overwritten with the current packet's value.
>
> It's not exactly the same logic as holding the addr of the previous
> sample. For addr, we return on the first None; with your change we now
> "pretend" that the second one is also the previous one:
>
> if (cpu_data.get(str(cpu) + 'addr') == None):
> cpu_data[str(cpu) + 'addr'] = addr
> return <----------------------------sample 0 return
>
> if (cpu_data.get(str(cpu) + 'ip') == None):
> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>
> Then for sample 1 'prev_ip' is actually now the 'current' IP:
Yes, it is a dummy value for the first packet. I added it anticipating
that we won't hit a discontinuity on the very first packet.
Can this be changed to something more intuitive, like below?
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..d49f5090059f 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -198,6 +198,8 @@ def process_event(param_dict):
cpu_data[str(cpu) + 'addr'] = addr
return
+ if (cpu_data.get(str(cpu) + 'ip') != None):
+ prev_ip = cpu_data[str(cpu) + 'ip']
if (options.verbose == True):
print("Event type: %s" % name)
@@ -243,12 +245,18 @@ def process_event(param_dict):
# Record for previous sample packet
cpu_data[str(cpu) + 'addr'] = addr
+ cpu_data[str(cpu) + 'ip'] = stop_addr
# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
if (start_addr == 0 and stop_addr == 4):
print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
return
+ if (stop_addr < start_addr and prev_ip != 0):
+ # Continuity of the Packets broken, set start_addr to previous
+ # packet ip to complete the remaining tracing of the address range.
+ start_addr = prev_ip
+
if (start_addr < int(dso_start) or start_addr > int(dso_end)):
print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
return
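One thing worth checking in this version: prev_ip is only assigned when a saved ip already exists, and the "and" only skips evaluating prev_ip when the first comparison is false. Unless something else in process_event() assigns prev_ip (nothing in these hunks does), a backwards range on the first processed sample of a CPU would raise UnboundLocalError. A minimal sketch of the issue and of one possible alternative that defaults prev_ip to 0 with dict.get(); both function names are hypothetical, and the sketch reads the saved ip at the top just as the hunk does.

```python
def adjust_start(cpu_data, cpu, start_addr, stop_addr):
    # Shape of the proposal: prev_ip is bound only if an ip was saved earlier
    if cpu_data.get(str(cpu) + 'ip') is not None:
        prev_ip = cpu_data[str(cpu) + 'ip']
    if stop_addr < start_addr and prev_ip != 0:
        start_addr = prev_ip
    return start_addr


def adjust_start_safe(cpu_data, cpu, start_addr, stop_addr):
    # 0 doubles as "no previous ip yet", matching the prev_ip != 0 test
    prev_ip = cpu_data.get(str(cpu) + 'ip', 0)
    if stop_addr < start_addr and prev_ip != 0:
        start_addr = prev_ip
    return start_addr


try:
    adjust_start({}, 1, 0x2000, 0x1024)
except UnboundLocalError as err:
    print("proposal:", type(err).__name__)  # proposal: UnboundLocalError
print(hex(adjust_start_safe({}, 1, 0x2000, 0x1024)))              # 0x2000
print(hex(adjust_start_safe({'1ip': 0x1008}, 1, 0x2000, 0x1024)))  # 0x1008
```

Note that with no previous ip the range is left running backwards, so such a sample would presumably still need to be skipped before objdump is invoked.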
For reference, below is the failure log without this patch (perf aborts with a core dump).
[root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.087 MB perf.data ]
[root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
objdump: error: the stop address should be after the start address
Traceback (most recent call last):
  File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
    print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
  File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
    for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
    disasm_output = check_output(disasm).decode('utf-8').split('\n')
                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
Fatal Python error: handler_call_die: problem in Python trace event handler
Python runtime state: initialized
Current thread 0x0000ffffb90d54e0 (most recent call first):
<no Python frame>
Extension modules: perf_trace_context, systemd._journal,
systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
problem._py3abrt (total: 7)
Aborted (core dumped)
dump snippet:
============
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
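The failing objdump call can be reproduced by hand from the last two samples above, using the script's rule that start_addr is the previous sample's addr and stop_addr is ip + 4. The fallback start, the previous ip plus 4 (the value the patch saves), is the first address of the extra disassembly in the "With fix" output.

```python
# Values copied from the last two samples of the dump above
prev_addr = 0xffff80008125b758  # branch target of the previous sample
prev_ip = 0xffff80008125a8e4    # branch source of the previous sample (blr x0)
ip = 0xffff80008125a930         # branch source of the last sample (ret)

start_addr = prev_addr
stop_addr = ip + 4
print(hex(start_addr), hex(stop_addr))  # 0xffff80008125b758 0xffff80008125a934
print(stop_addr < start_addr)           # True: objdump rejects this range

fallback_start = prev_ip + 4            # what the patch stores as 'ip'
print(hex(fallback_start))              # 0xffff80008125a8e8
```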
With fix:
=========
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8e8 <sched_clock_noinstr+0x40>:
ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
ffff80008125a8ec: a9409666 ldp x6, x5, [x19, #8]
ffff80008125a8f0: 29441261 ldp w1, w4, [x19, #32]
ffff80008125a8f4: d50339bf dmb ishld
ffff80008125a8f8: b9400282 ldr w2, [x20]
ffff80008125a8fc: 6b16005f cmp w2, w22
ffff80008125a900: 54fffe81 b.ne
ffff80008125a8d0 <sched_clock_noinstr+0x28> // b.any
ffff80008125a904: cb060000 sub x0, x0, x6
ffff80008125a908: 2a0103e1 mov w1, w1
ffff80008125a90c: 8a050000 and x0, x0, x5
ffff80008125a910: a94153f3 ldp x19, x20, [sp, #16]
ffff80008125a914: 9b017c00 mul x0, x0, x1
ffff80008125a918: a9425bf5 ldp x21, x22, [sp, #32]
ffff80008125a91c: a94363f7 ldp x23, x24, [sp, #48]
ffff80008125a920: 9ac42400 lsr x0, x0, x4
ffff80008125a924: a8c47bfd ldp x29, x30, [sp], #64
ffff80008125a928: d50323bf autiasp
ffff80008125a92c: 8b030000 add x0, x0, x3
ffff80008125a930: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040
sched_clock_noinstr+0x88
...sight/linux/kernel/time/sched_clock.c 99 }
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4ec phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4e4 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4cc <sched_clock+0x24>:
ffff8000801bb4cc: aa0003f4 mov x20, x0
ffff8000801bb4d0: f9400a61 ldr x1, [x19, #16]
ffff8000801bb4d4: d1000421 sub x1, x1, #0x1
ffff8000801bb4d8: b9001261 str w1, [x19, #16]
ffff8000801bb4dc: b4000061 cbz x1,
ffff8000801bb4e8 <sched_clock+0x40>
ffff8000801bb4e0: f9400a60 ldr x0, [x19, #16]
ffff8000801bb4e4: b5000040 cbnz x0,
ffff8000801bb4ec <sched_clock+0x44>
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x3c
...ux/./arch/arm64/include/asm/preempt.h 74 return !pc ||
!READ_ONCE(ti->preempt_count);
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb10 phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4fc pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4ec <sched_clock+0x44>:
ffff8000801bb4ec: aa1403e0 mov x0, x20
ffff8000801bb4f0: a94153f3 ldp x19, x20, [sp, #16]
ffff8000801bb4f4: a8c27bfd ldp x29, x30, [sp], #32
ffff8000801bb4f8: d50323bf autiasp
ffff8000801bb4fc: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x54
...sight/linux/kernel/time/sched_clock.c 108 }
We still miss tracing of 0xffff80008125b758; however, the segfault is avoided.
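To make the per-CPU bookkeeping concrete, here is a minimal sketch (not the script's actual code) that replays three consecutive CPU1 samples from the dump above. The class `CpuTraceState` and its `next_range()` method are hypothetical. It assumes the convention discussed in this thread: `start_addr` is the previous sample's branch target (`addr`) and `stop_addr` is the current sample's `ip + 4`. The previous stop address is read before it is overwritten, and the first sample on a CPU only records state, so no dummy value is needed and the name is always bound.

```python
from typing import Dict, Optional, Tuple


class CpuTraceState:
    """Hypothetical per-CPU bookkeeping for CoreSight branch samples."""

    def __init__(self) -> None:
        self.prev_addr: Dict[int, int] = {}  # cpu -> previous branch target
        self.prev_stop: Dict[int, int] = {}  # cpu -> previous ip + 4

    def next_range(self, cpu: int, addr: int,
                   ip: int) -> Optional[Tuple[int, int, Optional[int]]]:
        """Return (start_addr, stop_addr, prev_stop) for this sample, or None
        for the first sample on a CPU, which only records state."""
        start_addr = self.prev_addr.get(cpu)
        prev_stop = self.prev_stop.get(cpu)  # read before overwriting
        stop_addr = ip + 4                   # include the branch instruction
        self.prev_addr[cpu] = addr
        self.prev_stop[cpu] = stop_addr
        if start_addr is None:
            return None
        return start_addr, stop_addr, prev_stop


# (addr, ip) pairs of three consecutive CPU1 samples from the dump above.
samples = [
    (0xffff80008125a8a8, 0xffff8000801bb4c8),  # bl sched_clock_noinstr
    (0xffff80008125b758, 0xffff80008125a8e4),  # blr x0
    (0xffff8000801bb4cc, 0xffff80008125a930),  # ret
]

state = CpuTraceState()
ranges = [state.next_range(1, addr, ip) for addr, ip in samples]
start, stop, prev_stop = ranges[2]
if stop < start and prev_stop is not None:
    # Inverted range: resume from the end of the previous range instead.
    start = prev_stop
print(hex(start), hex(stop))
```

With these samples the third range comes out inverted (start 0xffff80008125b758, stop 0xffff80008125a934, the same pair objdump rejects in the traceback quoted later in the thread), and the fallback turns it into 0xffff80008125a8e8..0xffff80008125a934. The code at the BLR target 0xffff80008125b758 is still never disassembled, which matches the missed tracing noted above.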
>
> prev_ip = cpu_data[str(cpu) + 'ip']
>
> This means that prev_ip is the previous sample's IP only some of the time
> (samples after sample 1); otherwise it's the current IP. Does your fix
> actually require this bit? We already save the 'real'
> previous one:
>
> cpu_data[str(cpu) + 'ip'] = stop_addr
>
> Also, normally we save ip + 4 (stop_addr), whereas you save ip. Why is
> there no need to add the 4?
>
>
>>>
>>> prev_ip = cpu_data[str(cpu) + 'ip']
>>>
>>> ... then ...
>>>
>>> # Record for previous sample packet
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> cpu_data[str(cpu) + 'ip'] = stop_addr
>>>
>>> Would a local variable not accomplish the same thing?
>>
>> No, we need a global to hold the ip of the previous packet.
>>>
>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>> if (options.verbose == True):
>>>> print("Event type: %s" % name)
>>>> @@ -243,12 +247,18 @@ def process_event(param_dict):
>>>> # Record for previous sample packet
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>> if (start_addr == 0 and stop_addr == 4):
>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>> return
>>>> + if (stop_addr < start_addr):
>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>> + # packet ip to complete the remaining tracing of the address range.
>
> After looking a bit more I'm also not sure why stop_addr < start_addr
> signifies a discontinuity. What if the discontinuity ends up with
> stop_addr > start_addr? There's no reason it can't jump forwards as well
> as backwards.
>
> Can you share the 3 samples from the --verbose output to the script that
> cause the issue?
>
> I see discontinuities as having the branch source (ip) set to 0 which is
> what we do at the start:
>
> Sample = { cpu: 0000 addr: 0x0000ffffb807adac phys_addr: 0x0000000000000000 ip: 0x0000000000000000 pid: 28388 }
>
> Then the ending one has the branch target (addr) set to 0:
>
> Sample = { cpu: 0000 addr: 0x0000000000000000 phys_addr: 0x0000000000000000 ip: 0x0000ffffb7eee168 pid: 28388 }
>
>
> And it doesn't hit objdump because of the range check:
>
> Start address 0x0 is out of range ...
>
> So I don't see any missing disassembly or crashes for this.
>
>>>> + start_addr = prev_ip
>>>> +
>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>> return
>>
>> Thanks,
>> Ganapat
Thanks,
Ganapat
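The crash itself comes from objdump refusing a range whose stop address is below its start address (`objdump: error: the stop address should be after the start address`), after which `check_output()` raises `CalledProcessError`. A minimal sketch of a guard in the spirit of the patch subject follows; `build_objdump_cmd()` is a hypothetical helper, not a function in arm-cs-trace-disasm.py, and its argument list simply mirrors the objdump command shown in the traceback in this thread.

```python
from typing import List, Optional


def build_objdump_cmd(objdump: str, dso_fname: str,
                      start_addr: int, stop_addr: int) -> Optional[List[str]]:
    """Hypothetical helper: return the objdump argv for [start_addr, stop_addr),
    or None when the range is empty or inverted, since objdump would then
    exit with status 1 and check_output() would raise CalledProcessError."""
    if stop_addr <= start_addr:
        return None
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            dso_fname]


# The failing range: branch target 0xffff80008125b758 as the start, and
# ip 0xffff80008125a930 + 4 as the stop.
bad = build_objdump_cmd("objdump", "../../vmlinux",
                        0xffff80008125b758, 0xffff80008125a934)
print(bad)  # None: skip disassembly instead of crashing

good = build_objdump_cmd("objdump", "../../vmlinux",
                         0xffff80008125a8e8, 0xffff80008125a934)
print(good)
```

Returning `None` lets the caller print a note and skip the sample rather than abort perf; it does not recover the missing instructions, which is the separate question raised in the review.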
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-23 15:26 ` Ganapatrao Kulkarni
@ 2024-07-23 15:46 ` James Clark
2024-07-24 6:38 ` Ganapatrao Kulkarni
From: James Clark @ 2024-07-23 15:46 UTC (permalink / raw)
To: Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, mike.leach, suzuki.poulose, Leo Yan
On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>
>
> On 23-07-2024 06:40 pm, James Clark wrote:
>>
>>
>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>
>>> Hi James,
>>>
>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>
>>>>
>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>> To generate the instruction trace, the script uses the address range
>>>>> spanned by two contiguous packets. If there is a continuity break due
>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>> restarted with the new set of contiguous packets.
>>>>>
>>>>> Add a change to identify the break, complete the remaining tracing of
>>>>> the current packets, and restart tracing from the new set of packets
>>>>> if continuity is established.
>>>>>
>>>>
>>>> Hi Ganapatrao,
>>>>
>>>> Can you add a before and after example of what's changed to the
>>>> commit message? It wasn't immediately obvious to me if this is
>>>> adding missing output, or it was correcting the tail end of the
>>>> output that was previously wrong.
>>>
>>> It adds the tail end of the trace and also avoids the segfault of
>>> the perf application. Without this change, perf segfaults as shown in
>>> the log below:
>>>
>>>
>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>> objdump: error: the stop address should be after the start address
>>> Traceback (most recent call last):
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>> ^^^^^^^^^^^^^^^^^^^^
>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>> raise CalledProcessError(retcode, process.args,
>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>> Python runtime state: initialized
>>>
>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>> <no Python frame>
>>>
>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>> Aborted (core dumped)
>>>
>>>>
>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>> ---
>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>> 1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>> return
>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>> +
>>>>
>>>> Do you need to write into the global cpu_data here? Doesn't it get
>>>> overwritten after you load it back into 'prev_ip'?
>>>
>>> No, the logic is the same as holding the addr of the previous packet:
>>> the previous packet's saved ip is copied into prev_ip before it is
>>> overwritten with the current packet's.
>>
>> It's not exactly the same logic as holding the addr of the previous
>> sample. For addr, we return on the first None, with your change we now
>> "pretend" that the second one is also the previous one:
>>
>> if (cpu_data.get(str(cpu) + 'addr') == None):
>> cpu_data[str(cpu) + 'addr'] = addr
>> return <----------------------------sample 0 return
>>
>> if (cpu_data.get(str(cpu) + 'ip') == None):
>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>
>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>
> Yes, it is a dummy for the first packet. I added it anticipating that we
> won't hit the discontinuity on the first packet itself.
>
> Can this be changed to something more intuitive, like below?
>
> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> index d973c2baed1c..d49f5090059f 100755
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -198,6 +198,8 @@ def process_event(param_dict):
> cpu_data[str(cpu) + 'addr'] = addr
> return
>
> + if (cpu_data.get(str(cpu) + 'ip') != None):
> + prev_ip = cpu_data[str(cpu) + 'ip']
>
> if (options.verbose == True):
> print("Event type: %s" % name)
> @@ -243,12 +245,18 @@ def process_event(param_dict):
>
> # Record for previous sample packet
> cpu_data[str(cpu) + 'addr'] = addr
> + cpu_data[str(cpu) + 'ip'] = stop_addr
>
> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
> if (start_addr == 0 and stop_addr == 4):
> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
> return
>
> + if (stop_addr < start_addr and prev_ip != 0):
> + # Continuity of the Packets broken, set start_addr to previous
> + # packet ip to complete the remaining tracing of the address range.
> + start_addr = prev_ip
> +
> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
> return
>
> Without this patch, the failure log (with the segfault) is below for reference.
>
> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.087 MB perf.data ]
> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
> objdump: error: the stop address should be after the start address
> Traceback (most recent call last):
> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
> disasm_output = check_output(disasm).decode('utf-8').split('\n')
> ^^^^^^^^^^^^^^^^^^^^
> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
> raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
> Fatal Python error: handler_call_die: problem in Python trace event handler
> Python runtime state: initialized
>
> Current thread 0x0000ffffb90d54e0 (most recent call first):
> <no Python frame>
>
> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
> Aborted (core dumped)
>
>
> dump snippet:
> ============
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> ffff800080313f0c: d63f0000 blr x0
> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff80008030cb00 <local_clock>:
> ffff80008030cb00: d503233f paciasp
> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> ffff80008030cb08: 910003fd mov x29, sp
> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff8000801bb4a8 <sched_clock>:
> ffff8000801bb4a8: d503233f paciasp
> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> ffff8000801bb4b0: 910003fd mov x29, sp
> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff80008125a8a8 <sched_clock_noinstr>:
> ffff80008125a8a8: d503233f paciasp
> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> ffff80008125a8b0: 910003fd mov x29, sp
> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> ffff80008125a8d0: b9400296 ldr w22, [x20]
> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> ffff80008125a8e4: d63f0000 blr x0
> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>
>
> With fix:
> =========
>
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> ffff800080313f0c: d63f0000 blr x0
> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff80008030cb00 <local_clock>:
> ffff80008030cb00: d503233f paciasp
> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> ffff80008030cb08: 910003fd mov x29, sp
> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff8000801bb4a8 <sched_clock>:
> ffff8000801bb4a8: d503233f paciasp
> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> ffff8000801bb4b0: 910003fd mov x29, sp
> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff80008125a8a8 <sched_clock_noinstr>:
> ffff80008125a8a8: d503233f paciasp
> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> ffff80008125a8b0: 910003fd mov x29, sp
> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> ffff80008125a8d0: b9400296 ldr w22, [x20]
> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> ffff80008125a8e4: d63f0000 blr x0
It looks like the disassembly now assumes this BLR wasn't taken. We go
from ffff80008125a8e4 straight through to ...
> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> Event type: branches
> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
ffff80008125a8e8, which is just the previous one + 4. Isn't your issue
actually a decode issue in Perf itself? Why is there a discontinuity
without branch samples being generated where either the source or
destination address is 0?
What are your record options to create this issue? As I mentioned in the
previous reply I haven't been able to reproduce it.
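James's earlier reply describes how trace boundaries normally appear: a sample whose branch source `ip` is 0 where tracing (re)starts, and a sample whose branch target `addr` is 0 where it stops, with the resulting `0x0` start address then rejected by the DSO range check. The sketch below applies that convention to the sample values quoted in the thread. The function names, return strings, and the DSO bounds are hypothetical, chosen only to contain the kernel addresses used here; the ordering check is the extra guard under discussion, not existing script behaviour.

```python
from typing import Optional


def classify_sample(addr: int, ip: int) -> str:
    """Follow the convention described in the thread: ip == 0 marks a trace
    (re)start and addr == 0 marks a trace end."""
    if ip == 0:
        return "trace-start"
    if addr == 0:
        return "trace-end"
    return "branch"


def skip_reason(start_addr: int, stop_addr: int,
                dso_start: int, dso_end: int) -> Optional[str]:
    """Return why [start_addr, stop_addr) should not be passed to objdump,
    or None if it can be. The start-address check mirrors the one quoted in
    the thread; the ordering check is the proposed extra guard."""
    if start_addr < dso_start or start_addr > dso_end:
        return "start out of range"
    if stop_addr <= start_addr:
        return "inverted range"
    return None


# Hypothetical kernel text bounds, chosen only to contain the addresses below.
DSO_START, DSO_END = 0xffff800080000000, 0xffff800083000000

print(classify_sample(addr=0x0000ffffb807adac, ip=0x0))  # trace-start
print(classify_sample(addr=0x0, ip=0x0000ffffb7eee168))  # trace-end
print(classify_sample(addr=0xffff80008125b758, ip=0xffff80008125a8e4))  # branch
# A zero branch target becomes the next start address and fails the range check.
print(skip_reason(0x0, 0x1000, DSO_START, DSO_END))  # start out of range
# The failing CPU1 range passes the range check but is inverted.
print(skip_reason(0xffff80008125b758, 0xffff80008125a934,
                  DSO_START, DSO_END))  # inverted range
```

The failing CPU1 sample has neither address zeroed, so it slips past the range check and reaches objdump with an inverted range. A forward jump across a gap would return `None` from `skip_reason()`, which is why an ordering test alone cannot detect every discontinuity.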
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-23 15:46 ` James Clark
@ 2024-07-24 6:38 ` Ganapatrao Kulkarni
2024-07-24 14:45 ` James Clark
From: Ganapatrao Kulkarni @ 2024-07-24 6:38 UTC (permalink / raw)
To: James Clark
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, mike.leach, suzuki.poulose, Leo Yan
On 23-07-2024 09:16 pm, James Clark wrote:
>
>
> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>
>>
>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>
>>>
>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>
>>>> Hi James,
>>>>
>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>> To generate the instruction trace, the script uses the address range
>>>>>> spanned by two contiguous packets. If there is a continuity break due
>>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>>> restarted with the new set of contiguous packets.
>>>>>>
>>>>>> Add a change to identify the break, complete the remaining tracing of
>>>>>> the current packets, and restart tracing from the new set of packets
>>>>>> if continuity is established.
>>>>>>
>>>>>
>>>>> Hi Ganapatrao,
>>>>>
>>>>> Can you add a before and after example of what's changed to the
>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>> adding missing output, or it was correcting the tail end of the
>>>>> output that was previously wrong.
>>>>
>>>> It adds the tail end of the trace and also avoids the segfault of
>>>> the perf application. Without this change, perf segfaults as shown in
>>>> the log below:
>>>>
>>>>
>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>> objdump: error: the stop address should be after the start address
>>>> Traceback (most recent call last):
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>> ^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>> raise CalledProcessError(retcode, process.args,
>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>> Python runtime state: initialized
>>>>
>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>> <no Python frame>
>>>>
>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>> Aborted (core dumped)
>>>>
>>>>>
>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>> ---
>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>> 1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>> return
>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>> +
>>>>>
>>>>> Do you need to write into the global cpu_data here? Doesn't it get
>>>>> overwritten after you load it back into 'prev_ip'?
>>>>
>>>> No, the logic is the same as holding the addr of the previous packet:
>>>> the previous packet's saved ip is copied into prev_ip before it is
>>>> overwritten with the current packet's.
>>>
>>> It's not exactly the same logic as holding the addr of the previous
>>> sample. For addr, we return on the first None, with your change we
>>> now "pretend" that the second one is also the previous one:
>>>
>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> return <----------------------------sample 0 return
>>>
>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>
>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>
>> Yes, it is a dummy for the first packet. I added it anticipating that we
>> won't hit the discontinuity on the first packet itself.
>>
>> Can this be changed to something more intuitive, like below?
>>
>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> index d973c2baed1c..d49f5090059f 100755
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>> cpu_data[str(cpu) + 'addr'] = addr
>> return
>>
>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>
>> if (options.verbose == True):
>> print("Event type: %s" % name)
>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>
>> # Record for previous sample packet
>> cpu_data[str(cpu) + 'addr'] = addr
>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>
>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>> if (start_addr == 0 and stop_addr == 4):
>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>> return
>>
>> + if (stop_addr < start_addr and prev_ip != 0):
>> + # Continuity of the Packets broken, set start_addr to previous
>> + # packet ip to complete the remaining tracing of the address range.
>> + start_addr = prev_ip
>> +
>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>> return
>>
>> Without this patch, the failure log (with the segfault) is below for reference.
>>
>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>> objdump: error: the stop address should be after the start address
>> Traceback (most recent call last):
>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>> ^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>> raise CalledProcessError(retcode, process.args,
>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>> Fatal Python error: handler_call_die: problem in Python trace event handler
>> Python runtime state: initialized
>>
>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>> <no Python frame>
>>
>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>> Aborted (core dumped)
>>
>>
>> dump snippet:
>> ============
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>> ffff800080313f0c: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>> ffff80008030cb00 <local_clock>:
>> ffff80008030cb00: d503233f paciasp
>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>> ffff80008030cb08: 910003fd mov x29, sp
>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>> ffff8000801bb4a8 <sched_clock>:
>> ffff8000801bb4a8: d503233f paciasp
>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>> ffff8000801bb4b0: 910003fd mov x29, sp
>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>> ffff80008125a8a8 <sched_clock_noinstr>:
>> ffff80008125a8a8: d503233f paciasp
>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>> ffff80008125a8b0: 910003fd mov x29, sp
>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>> ffff80008125a8e4: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>
>>
>> With fix:
>> =========
>>
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>> ffff800080313f04: 36100094 tbz w20, #2,
>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>> ffff800080313f0c: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040
>> __perf_event_header__init_id+0x54
>> .../coresight/linux/kernel/events/core.c 586 return
>> event->clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008030cb00 <local_clock>:
>> ffff80008030cb00: d503233f paciasp
>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>> [sp, #-16]!
>> ffff80008030cb08: 910003fd mov x29, sp
>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>> <sched_clock>
>> perf 12720/12720 [0001] 5986.372298040
>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>> return sched_clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff8000801bb4a8 <sched_clock>:
>> ffff8000801bb4a8: d503233f paciasp
>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>> [sp, #-32]!
>> ffff8000801bb4b0: 910003fd mov x29, sp
>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>> <sched_clock_noinstr>
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>> sched_clock_noinstr();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008125a8a8 <sched_clock_noinstr>:
>> ffff80008125a8a8: d503233f paciasp
>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>> [sp, #-64]!
>> ffff80008125a8b0: 910003fd mov x29, sp
>> ffff80008125a8b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff80008125a8b8: b000e354 adrp x20,
>> ffff800082ec3000 <tick_bc_dev+0x140>
>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>> ffff80008125a8c0: a90363f7 stp x23, x24,
>> [sp, #48]
>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>> ffff80008125a8c8: 52800518 mov w24, #0x28
>> // #40
>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>> [sp, #32]
>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>> ffff80008125a8e4: d63f0000 blr x0
>
> It looks like the disassembly now assumes this BLR wasn't taken. We go
> from ffff80008125a8e4 straight through to ...
>
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>
> ffff80008125a8e8, which is just the previous one +4. Isn't your issue
> actually a decode issue in Perf itself? Why is there a discontinuity
> without branch samples being generated where either the source or
> destination address is 0?
>
> What are your record options to create this issue? As I mentioned in the
> previous reply I haven't been able to reproduce it.
I am using the perf record command below:
timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>
Thanks,
Ganapat
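A minimal sketch of the guard implied by the patch title ("Skip disasm if address continuity is broken"). The argument list mirrors the objdump invocation shown in the traceback quoted in this thread; `objdump_cmd()` and `disasm_lines()` are hypothetical helpers, not the script's `read_disam()`. An inverted range is skipped instead of being handed to objdump.

```python
import subprocess
from typing import List, Optional


def objdump_cmd(objdump: str, dso_fname: str,
                start_addr: int, stop_addr: int) -> Optional[List[str]]:
    """Return the objdump argv for [start_addr, stop_addr), or None when
    the range is empty or inverted (objdump refuses stop < start)."""
    if stop_addr <= start_addr:
        return None
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            dso_fname]


def disasm_lines(objdump: str, dso_fname: str,
                 start_addr: int, stop_addr: int) -> List[str]:
    """Disassemble a range, skipping it instead of raising when the
    address continuity is broken."""
    cmd = objdump_cmd(objdump, dso_fname, start_addr, stop_addr)
    if cmd is None:
        # Without this check, check_output() raises CalledProcessError
        # (objdump exits with status 1) and the perf script handler dies.
        return []
    return subprocess.check_output(cmd).decode("utf-8").split("\n")


# Addresses taken from the failing command in the traceback.
print(objdump_cmd("objdump", "../../vmlinux",
                  0xffff80008125b758, 0xffff80008125a934))  # prints None
```

Skipping only avoids the crash; it does not explain why the decoder produced the inverted range in the first place, which is the open question in the review.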
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-24 6:38 ` Ganapatrao Kulkarni
@ 2024-07-24 14:45 ` James Clark
2024-08-01 10:00 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-07-24 14:45 UTC
To: Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, mike.leach, suzuki.poulose, Leo Yan
On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>
>
> On 23-07-2024 09:16 pm, James Clark wrote:
>>
>>
>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>
>>>
>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>
>>>>
>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>
>>>>> Hi James,
>>>>>
>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>> To generate the instruction trace, the script uses the address
>>>>>>> range of 2 contiguous packets. If there is a continuity break due
>>>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>>>> restarted with the new set of contiguous packets.
>>>>>>>
>>>>>>> This change identifies the break, completes the remaining tracing
>>>>>>> of the current packets, and restarts tracing from the new set of
>>>>>>> packets if continuity is established.
>>>>>>>
>>>>>>
>>>>>> Hi Ganapatrao,
>>>>>>
>>>>>> Can you add a before and after example of what's changed to the
>>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>>> adding missing output, or it was correcting the tail end of the
>>>>>> output that was previously wrong.
>>>>>
>>>>> It adds the tail end of the trace and also avoids the crash of
>>>>> the perf application. Without this change, perf crashes with the
>>>>> log below:
>>>>>
>>>>>
>>>>> ./perf script
>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>> objdump: error: the stop address should be after the start address
>>>>> Traceback (most recent call last):
>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>> process_event
>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>> print_disam
>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>> stop_addr):
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>> read_disam
>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>> check_output
>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>> raise CalledProcessError(retcode, process.args,
>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>> '--start-address=0xffff80008125b758',
>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>> non-zero exit status 1.
>>>>> Fatal Python error: handler_call_die: problem in Python trace event
>>>>> handler
>>>>> Python runtime state: initialized
>>>>>
>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>> <no Python frame>
>>>>>
>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>>> problem._py3abrt (total: 7)
>>>>> Aborted (core dumped)
>>>>>
>>>>>>
>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>> ---
>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>
>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>> return
>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>> +
>>>>>>
>>>>>> Do you need to write into the global cpu_data here? Doesn't it get
>>>>>> overwritten after you load it back into 'prev_ip'?
>>>>>
>>>>> No, the logic is the same as holding the addr of the previous
>>>>> packet: the ip saved for the previous packet is loaded into prev_ip
>>>>> before it is overwritten with the current packet's value.
>>>>
>>>> It's not exactly the same logic as holding the addr of the previous
>>>> sample. For addr, we return on the first None; with your change we
>>>> now "pretend" that the second one is also the previous one:
>>>>
>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> return <----------------------------sample 0 return
>>>>
>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>
>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>
>>> Yes, it is a dummy for the first packet. I added it anticipating that
>>> we won't hit the discontinuity on the first packet itself.
>>>
>>> Can this be changed to something more intuitive, like below?
>>>
>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> index d973c2baed1c..d49f5090059f 100755
>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> return
>>>
>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>
>>> if (options.verbose == True):
>>> print("Event type: %s" % name)
>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>
>>> # Record for previous sample packet
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>
>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>> if (start_addr == 0 and stop_addr == 4):
>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>> return
>>>
>>> + if (stop_addr < start_addr and prev_ip != 0):
>>> + # Continuity of the Packets broken, set start_addr to previous
>>> + # packet ip to complete the remaining tracing of the address range.
>>> + start_addr = prev_ip
>>> +
>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>> return
>>>
>>> Without this patch, below is the failure log (with the crash) for
>>> reference.
>>>
>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1
>>> dd if=/dev/zero of=/dev/null
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>> [root@sut01sys-r214 perf]# ./perf script
>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump
>>> -k ../../vmlinux -v $* > dump
>>> objdump: error: the stop address should be after the start address
>>> Traceback (most recent call last):
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>> process_event
>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>> print_disam
>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>> stop_addr):
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>> read_disam
>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>> ^^^^^^^^^^^^^^^^^^^^
>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>> raise CalledProcessError(retcode, process.args,
>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>> '--start-address=0xffff80008125b758',
>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>> non-zero exit status 1.
>>> Fatal Python error: handler_call_die: problem in Python trace event
>>> handler
>>> Python runtime state: initialized
>>>
>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>> <no Python frame>
>>>
>>> Extension modules: perf_trace_context, systemd._journal,
>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>> problem._py3abrt (total: 7)
>>> Aborted (core dumped)
>>>
>>>
>>> dump snippet:
>>> ============
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>> ffff800080313f04: 36100094 tbz w20, #2,
>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>> ffff800080313f0c: d63f0000 blr x0
>>> perf 12720/12720 [0001] 5986.372298040
>>> __perf_event_header__init_id+0x54
>>> .../coresight/linux/kernel/events/core.c 586 return
>>> event->clock();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff80008030cb00 <local_clock>:
>>> ffff80008030cb00: d503233f paciasp
>>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>>> [sp, #-16]!
>>> ffff80008030cb08: 910003fd mov x29, sp
>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>>> <sched_clock>
>>> perf 12720/12720 [0001] 5986.372298040
>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return
>>> sched_clock();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff8000801bb4a8 <sched_clock>:
>>> ffff8000801bb4a8: d503233f paciasp
>>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>>> [sp, #-32]!
>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>>> [sp, #16]
>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>>> <sched_clock_noinstr>
>>> perf 12720/12720 [0001] 5986.372298040
>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>>> sched_clock_noinstr();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>> ffff80008125a8a8: d503233f paciasp
>>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>>> [sp, #-64]!
>>> ffff80008125a8b0: 910003fd mov x29, sp
>>> ffff80008125a8b4: a90153f3 stp x19, x20,
>>> [sp, #16]
>>> ffff80008125a8b8: b000e354 adrp x20,
>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>> ffff80008125a8bc: 910d0294 add x20, x20,
>>> #0x340
>>> ffff80008125a8c0: a90363f7 stp x23, x24,
>>> [sp, #48]
>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>> ffff80008125a8c8: 52800518 mov w24, #0x28
>>> // #40
>>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>>> [sp, #32]
>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>> ffff80008125a8e4: d63f0000 blr x0
>>> perf 12720/12720 [0001] 5986.372298040
>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>>
>>>
>>> With fix:
>>> =========
>>>
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>> ffff800080313f04: 36100094 tbz w20, #2,
>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>> ffff800080313f0c: d63f0000 blr x0
>>> perf 12720/12720 [0001] 5986.372298040
>>> __perf_event_header__init_id+0x54
>>> .../coresight/linux/kernel/events/core.c 586 return
>>> event->clock();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff80008030cb00 <local_clock>:
>>> ffff80008030cb00: d503233f paciasp
>>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>>> [sp, #-16]!
>>> ffff80008030cb08: 910003fd mov x29, sp
>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>>> <sched_clock>
>>> perf 12720/12720 [0001] 5986.372298040
>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return
>>> sched_clock();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff8000801bb4a8 <sched_clock>:
>>> ffff8000801bb4a8: d503233f paciasp
>>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>>> [sp, #-32]!
>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>>> [sp, #16]
>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>>> <sched_clock_noinstr>
>>> perf 12720/12720 [0001] 5986.372298040
>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>>> sched_clock_noinstr();
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>> ffff80008125a8a8: d503233f paciasp
>>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>>> [sp, #-64]!
>>> ffff80008125a8b0: 910003fd mov x29, sp
>>> ffff80008125a8b4: a90153f3 stp x19, x20,
>>> [sp, #16]
>>> ffff80008125a8b8: b000e354 adrp x20,
>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>> ffff80008125a8bc: 910d0294 add x20, x20,
>>> #0x340
>>> ffff80008125a8c0: a90363f7 stp x23, x24,
>>> [sp, #48]
>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>> ffff80008125a8c8: 52800518 mov w24, #0x28
>>> // #40
>>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>>> [sp, #32]
>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>> ffff80008125a8e4: d63f0000 blr x0
>>
>> It looks like the disassembly now assumes this BLR wasn't taken. We go
>> from ffff80008125a8e4 straight through to ...
>>
>>> perf 12720/12720 [0001] 5986.372298040
>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>> Event type: branches
>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>> period: 1 time: 5986372298040 }
>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>
>> ffff80008125a8e8, which is just the previous one +4. Isn't your issue
>> actually a decode issue in Perf itself? Why is there a discontinuity
>> without branch samples being generated where either the source or
>> destination address is 0?
>>
>> What are your record options to create this issue? As I mentioned in
>> the previous reply I haven't been able to reproduce it.
>
> I am using the perf record command below:
>
> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>
Thanks, I managed to reproduce it. I'll take a look to see whether the
issue lies somewhere else.
>>
>
> Thanks,
> Ganapat
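A minimal sketch of the per-CPU bookkeeping under discussion, replaying the five branch samples from the dump snippet. It assumes, as the `--stop-address` values in the traceback suggest, that a range ends at `ip + 4`; `replay()` and its return shape are hypothetical, not the script's `process_event()`. Reading the previous stop before overwriting it means the first range sees `None` rather than a "pretend" previous value.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, int, int]                      # (cpu, addr, ip)
Result = Optional[Tuple[str, int, int, Optional[int]]]


def replay(samples: List[Sample]) -> List[Result]:
    """Pair each branch sample with the previous one on the same CPU.

    Returns None for the first sample on a CPU, otherwise
    (action, start, stop, prev_stop): action is "disasm" for a forward
    range and "skip" for an inverted one, and prev_stop is the previous
    range's stop address, read before it is overwritten (None if there
    was no earlier range).
    """
    prev_addr: Dict[int, int] = {}
    prev_stop: Dict[int, int] = {}
    results: List[Result] = []
    for cpu, addr, ip in samples:
        start = prev_addr.get(cpu)
        last_stop = prev_stop.get(cpu)
        prev_addr[cpu] = addr
        if start is None:
            results.append(None)       # nothing to pair with yet
            continue
        stop = ip + 4                  # assumed: range ends after the branch at ip
        prev_stop[cpu] = stop
        action = "skip" if stop < start else "disasm"
        results.append((action, start, stop, last_stop))
    return results


# (cpu, addr, ip) from the five "Sample = { ... }" lines in the dump snippet.
dump = [
    (1, 0xffff80008030cb00, 0xffff800080313f0c),
    (1, 0xffff8000801bb4a8, 0xffff80008030cb0c),
    (1, 0xffff80008125a8a8, 0xffff8000801bb4c8),
    (1, 0xffff80008125b758, 0xffff80008125a8e4),
    (1, 0xffff8000801bb4cc, 0xffff80008125a930),
]
action, start, stop, last_stop = replay(dump)[-1]
print(action, hex(start), hex(stop), hex(last_stop))
# skip 0xffff80008125b758 0xffff80008125a934 0xffff80008125a8e8
```

On the inverted fifth sample, the stored previous stop is 0xffff80008125a8e8, the address at which the "With fix" output resumes, right after `blr x0`. Using it as the start address is what makes the disassembly read as if that BLR fell through, which is the concern raised in the review.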
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-07-24 14:45 ` James Clark
@ 2024-08-01 10:00 ` James Clark
2024-08-01 10:28 ` Al Grant
2024-08-05 12:22 ` Ganapatrao Kulkarni
0 siblings, 2 replies; 45+ messages in thread
From: James Clark @ 2024-08-01 10:00 UTC
To: Ganapatrao Kulkarni, Mike Leach
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan
On 24/07/2024 3:45 pm, James Clark wrote:
>
>
> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>
>>
>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>
>>>
>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>> Hi James,
>>>>>>
>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>> To generate the instruction trace, the script uses the address
>>>>>>>> range of 2 contiguous packets. If there is a continuity break due
>>>>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>>>>> restarted with the new set of contiguous packets.
>>>>>>>>
>>>>>>>> This change identifies the break, completes the remaining tracing
>>>>>>>> of the current packets, and restarts tracing from the new set of
>>>>>>>> packets if continuity is established.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Ganapatrao,
>>>>>>>
>>>>>>> Can you add a before and after example of what's changed to the
>>>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>>>> adding missing output, or it was correcting the tail end of the
>>>>>>> output that was previously wrong.
>>>>>>
>>>>>> It adds the tail end of the trace and also avoids the crash
>>>>>> of the perf application. Without this change, perf crashes
>>>>>> with the log below:
>>>>>>
>>>>>>
>>>>>> ./perf script
>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>> objdump: error: the stop address should be after the start address
>>>>>> Traceback (most recent call last):
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>> process_event
>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>> print_disam
>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>> stop_addr):
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>> read_disam
>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>> check_output
>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>> '--start-address=0xffff80008125b758',
>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>> non-zero exit status 1.
>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>> event handler
>>>>>> Python runtime state: initialized
>>>>>>
>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>> <no Python frame>
>>>>>>
>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>>>> problem._py3abrt (total: 7)
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>>>
>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>> ---
>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> return
>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>> +
>>>>>>>
>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>> get overwritten after you load it back into 'prev_ip'?
>>>>>>
>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>> packet: the ip saved for the previous packet is loaded into
>>>>>> prev_ip before it is overwritten with the current packet's value.
>>>>>
>>>>> It's not exactly the same logic as holding the addr of the previous
>>>>> sample. For addr, we return on the first None; with your change we
>>>>> now "pretend" that the second one is also the previous one:
>>>>>
>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>> return <----------------------------sample 0 return
>>>>>
>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>
>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>
>>>> Yes, it is a dummy for the first packet. I added it anticipating that
>>>> we won't hit the discontinuity on the first packet itself.
>>>>
>>>> Can this be changed to something more intuitive, like below?
>>>>
>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> index d973c2baed1c..d49f5090059f 100755
>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> return
>>>>
>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>
>>>> if (options.verbose == True):
>>>> print("Event type: %s" % name)
>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>
>>>> # Record for previous sample packet
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>
>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>> if (start_addr == 0 and stop_addr == 4):
>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>> return
>>>>
>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>> + # packet ip to complete the remaining tracing of the address range.
>>>> + start_addr = prev_ip
>>>> +
>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>> return
>>>>
>>>> Without this patch, below is the failure log (with the crash) for
>>>> reference.
>>>>
>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1
>>>> dd if=/dev/zero of=/dev/null
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>> [root@sut01sys-r214 perf]# ./perf script
>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>> objdump -k ../../vmlinux -v $* > dump
>>>> objdump: error: the stop address should be after the start address
>>>> Traceback (most recent call last):
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>> process_event
>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>> print_disam
>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>> stop_addr):
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>> read_disam
>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>> ^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>> check_output
>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>> raise CalledProcessError(retcode, process.args,
>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>> '--start-address=0xffff80008125b758',
>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>> non-zero exit status 1.
>>>> Fatal Python error: handler_call_die: problem in Python trace event
>>>> handler
>>>> Python runtime state: initialized
>>>>
>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>> <no Python frame>
>>>>
>>>> Extension modules: perf_trace_context, systemd._journal,
>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>> problem._py3abrt (total: 7)
>>>> Aborted (core dumped)
>>>>
>>>>
>>>> dump snippet:
>>>> ============
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>> ffff800080313f0c: d63f0000 blr x0
>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff80008030cb00 <local_clock>:
>>>> ffff80008030cb00: d503233f paciasp
>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff8000801bb4a8 <sched_clock>:
>>>> ffff8000801bb4a8: d503233f paciasp
>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>> ffff80008125a8a8: d503233f paciasp
>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>> ffff80008125a8e4: d63f0000 blr x0
>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>
>>>>
>>>> With fix:
>>>> =========
>>>>
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>> ffff800080313f0c: d63f0000 blr x0
>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff80008030cb00 <local_clock>:
>>>> ffff80008030cb00: d503233f paciasp
>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff8000801bb4a8 <sched_clock>:
>>>> ffff8000801bb4a8: d503233f paciasp
>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>> ffff80008125a8a8: d503233f paciasp
>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>> ffff80008125a8e4: d63f0000 blr x0
>>>
>>> It looks like the disassembly now assumes this BLR wasn't taken. We
>>> go from ffff80008125a8e4 straight through to ...
>>>
>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>> Event type: branches
>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>
>>> ffff80008125a8e8 which is just the previous one +4. Isn't your issue
>>> actually a decode issue in Perf itself? Why is there a discontinuity
>>> without branch samples being generated where either the source or
>>> destination address is 0?
>>>
>>> What are your record options to create this issue? As I mentioned in
>>> the previous reply I haven't been able to reproduce it.
>>
>> I am using the perf record command below.
>>
>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>
>
> Thanks I managed to reproduce it. I'll take a look to see if I think the
> issue is somewhere else.
>
At least for the failures I encountered, the issue is due to the
alternatives runtime instruction patching mechanism. vmlinux ends up
being the wrong image to decode with because a load of branches are
actually turned into nops.

Can you confirm whether you still get failures if you use --kcore
instead of vmlinux:

sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
dd if=/dev/zero of=/dev/null

perf script -i <output-folder.data> \
tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
-k <output-folder.data>/kcore_dir/kcore

But I still think bad decode detection should be moved as much as
possible into OpenCSD and Perf rather than this script. Otherwise every
tool will have to re-implement it, and OpenCSD has a lot more info to
make decisions with.

One change we can make is to desynchronize when an N atom is an
unconditional branch:
diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
index c557998..3eefd5d 100644
--- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
+++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
@@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
// save recorded next instuction address
ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;

+ // must have lost sync if an unconditional branch wasn't taken
+ if (atom == ATOM_N && !m_instr_info.is_conditional) {
+ m_need_addr = true;
+ m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
+ // wait for next address
+ return OCSD_OK;
+ }
+

Another one we can spot is when a new address comes that is before the
current decode address (basically the backwards check that you added).

There are probably others that can be spotted like an address appearing
after a direct branch that doesn't match the branch target.
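
A rough Python sketch of those two checks, phrased in terms of the script's start_addr/stop_addr ranges rather than OpenCSD internals. It assumes A64 code and that the 32-bit instruction word at the end of the previous range is available (the script does not fetch it today, and that lookup is not shown). direct_branch_target(), range_is_plausible() and target_matches() are hypothetical helpers, not perf or OpenCSD APIs, and only B/BL are decoded; conditional and compare-and-branch forms are left out.

```python
from typing import Optional

MASK64 = (1 << 64) - 1


def direct_branch_target(pc: int, insn: int) -> Optional[int]:
    """Target of an A64 B or BL at `pc`, or None for any other instruction.

    Bits [30:26] are 0b00101 for both B (bit 31 = 0) and BL (bit 31 = 1);
    bits [25:0] hold a signed word offset (imm26).
    """
    if ((insn >> 26) & 0x1F) != 0b00101:
        return None
    imm26 = insn & 0x03FFFFFF
    if imm26 & (1 << 25):  # sign-extend the 26-bit field
        imm26 -= 1 << 26
    return (pc + (imm26 << 2)) & MASK64


def range_is_plausible(start_addr: int, stop_addr: int) -> bool:
    """Backwards check: a straight-line run from a branch target up to the
    next branch source cannot end before it starts."""
    return stop_addr >= start_addr


def target_matches(branch_pc: int, branch_insn: int, next_start: int) -> bool:
    """If the previous range ended on a direct branch, the next range must
    start at that branch's target. Indirect branches cannot be checked here."""
    target = direct_branch_target(branch_pc, branch_insn)
    return target is None or target == next_start


# Addresses and encodings taken from the dump earlier in this thread.
assert direct_branch_target(0xFFFF8000801BB4C8, 0x94427CF8) == 0xFFFF80008125A8A8  # bl <sched_clock_noinstr>
assert direct_branch_target(0xFFFF80008030CB0C, 0x97FABA67) == 0xFFFF8000801BB4A8  # bl <sched_clock>
assert direct_branch_target(0xFFFF80008125A8E4, 0xD63F0000) is None                # blr x0 (indirect)
assert target_matches(0xFFFF8000801BB4C8, 0x94427CF8, 0xFFFF80008125A8A8)
assert not range_is_plausible(0xFFFF80008125B758, 0xFFFF80008125A934)  # the range objdump rejected
```

In the dump, both BL targets match the addr of the following sample, so the direct-branch check passes; only the final range trips the backwards check.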

I think at that point, desynchronising should cause the disassembly
script to throw away the last bit, rather than force it to be printed as
in this patch. As I mentioned above in the thread, it leads to printing
disassembly that's implausible and misleading (where an unconditional
branch wasn't taken).
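
A minimal sketch of that on the script side, assuming the script's existing per-CPU bookkeeping (start_addr is the previous sample's addr and stop_addr is ip + 4): an implausible range is dropped rather than patched up, and the new sample's addr still becomes the next start. RangeTracker is a hypothetical stand-in for the script's cpu_data dict, not code from the script.

```python
from typing import Dict, Optional, Tuple


class RangeTracker:
    """Hypothetical per-CPU range bookkeeping (stands in for the script's cpu_data)."""

    def __init__(self) -> None:
        self.prev_addr: Dict[int, int] = {}  # last branch target seen on each CPU
        self.dropped = 0                     # ranges thrown away as implausible

    def next_range(self, cpu: int, addr: int, ip: int) -> Optional[Tuple[int, int]]:
        """Return (start_addr, stop_addr) to disassemble, or None to print nothing."""
        start_addr = self.prev_addr.get(cpu)
        # Always resynchronise on the new branch target, whatever happens below.
        self.prev_addr[cpu] = addr
        if start_addr is None:
            return None  # first sample on this CPU: nothing to print yet
        stop_addr = ip + 4  # include the branch instruction itself
        if stop_addr < start_addr:
            # Lost sync: throw this range away rather than guessing a start point.
            self.dropped += 1
            return None
        return (start_addr, stop_addr)


samples = [  # (cpu, addr, ip) from the dump earlier in this thread
    (1, 0xFFFF80008030CB00, 0xFFFF800080313F0C),
    (1, 0xFFFF8000801BB4A8, 0xFFFF80008030CB0C),
    (1, 0xFFFF80008125A8A8, 0xFFFF8000801BB4C8),
    (1, 0xFFFF80008125B758, 0xFFFF80008125A8E4),
    (1, 0xFFFF8000801BB4CC, 0xFFFF80008125A930),
]
tracker = RangeTracker()
ranges = [tracker.next_range(cpu, addr, ip) for cpu, addr, ip in samples]
# ranges[4] is None and tracker.dropped == 1: the backwards range is skipped.
```

Compared with setting start_addr = prev_ip, this prints nothing for the broken range, so it cannot show a not-taken BLR; the cost is that any instructions that really did execute in that gap are not shown either.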
^ permalink raw reply related [flat|nested] 45+ messages in thread
* RE: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-01 10:00 ` James Clark
@ 2024-08-01 10:28 ` Al Grant
2024-08-01 11:26 ` James Clark
2024-08-05 12:22 ` Ganapatrao Kulkarni
1 sibling, 1 reply; 45+ messages in thread
From: Al Grant @ 2024-08-01 10:28 UTC (permalink / raw)
To: James Clark, Ganapatrao Kulkarni, Mike Leach
Cc: acme@redhat.com, coresight@lists.linaro.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, darren@os.amperecomputing.com,
scclevenger@os.amperecomputing.com, Leo Yan
> -----Original Message-----
> From: James Clark <james.clark@linaro.org>
> Sent: Thursday, August 1, 2024 11:00 AM
> To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>; Mike Leach
> <mike.leach@linaro.org>
> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com; Leo
> Yan <Leo.Yan@arm.com>
> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if
> address continuity is broken
>
>
>
> On 24/07/2024 3:45 pm, James Clark wrote:
> >
> >
> > On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
> >>
> >>
> >> On 23-07-2024 09:16 pm, James Clark wrote:
> >>>
> >>>
> >>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
> >>>>
> >>>>
> >>>> On 23-07-2024 06:40 pm, James Clark wrote:
> >>>>>
> >>>>>
> >>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
> >>>>>>
> >>>>>> Hi James,
> >>>>>>
> >>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
> >>>>>>>> To generate the instruction tracing, the script uses the address
> >>>>>>>> range of 2 contiguous packets. If there is a continuity break due
> >>>>>>>> to a discontiguous branch address, it is required to reset the
> >>>>>>>> tracing and start tracing with the new set of contiguous
> >>>>>>>> packets.
> >>>>>>>>
> >>>>>>>> Adding change to identify the break and complete the remaining
> >>>>>>>> tracing of current packets and restart tracing from new set of
> >>>>>>>> packets, if continuity is established.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Ganapatrao,
> >>>>>>>
> >>>>>>> Can you add a before and after example of what's changed to the
> >>>>>>> commit message? It wasn't immediately obvious to me if this is
> >>>>>>> adding missing output, or it was correcting the tail end of the
> >>>>>>> output that was previously wrong.
> >>>>>>
> >>>>>> It adds the tail end of the trace as well as avoiding the segfault
> >>>>>> of the perf application. Without this change, perf segfaults
> >>>>>> with the log below:
> >>>>>>
> >>>>>>
> >>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
> >>>>>> objdump: error: the stop address should be after the start address
> >>>>>> Traceback (most recent call last):
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
> >>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
> >>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
> >>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
> >>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
> >>>>>> ^^^^^^^^^^^^^^^^^^^^
> >>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
> >>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> >>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
> >>>>>> raise CalledProcessError(retcode, process.args,
> >>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
> >>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
> >>>>>> Python runtime state: initialized
> >>>>>>
> >>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
> >>>>>> <no Python frame>
> >>>>>>
> >>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
> >>>>>> Aborted (core dumped)
> >>>>>>
> >>>>>>>
> >>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> >>>>>>>> ---
> >>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
> >>>>>>>> 1 file changed, 10 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
> >>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
> >>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>>>>> return
> >>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
> >>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
> >>>>>>>> +
> >>>>>>>
> >>>>>>> Do you need to write into the global cpu_data here? Doesn't it
> >>>>>>> get overwritten after you load it back into 'prev_ip'?
> >>>>>>
> >>>>>> No, the logic is the same as holding the addr of the previous packet:
> >>>>>> the ip saved from the previous packet is loaded into prev_ip before
> >>>>>> it is overwritten with the current packet's.
> >>>>>
> >>>>> It's not exactly the same logic as holding the addr of the previous
> >>>>> sample. For addr, we return on the first None, with your change we
> >>>>> now "pretend" that the second one is also the previous one:
> >>>>>
> >>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
> >>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>> return <----------------------------sample 0 return
> >>>>>
> >>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
> >>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no
> >>>>> return
> >>>>>
> >>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
> >>>>
> >>>> Yes, it is a dummy for the first packet, added on the assumption that
> >>>> we won't hit the discontinuity on the first packet itself.
> >>>>
> >>>> Can this be changed to something more intuitive, like below?
> >>>>
> >>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>> index d973c2baed1c..d49f5090059f 100755
> >>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
> >>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>> return
> >>>>
> >>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
> >>>> + prev_ip = cpu_data[str(cpu) + 'ip']
> >>>>
> >>>> if (options.verbose == True):
> >>>> print("Event type: %s" % name)
> >>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
> >>>>
> >>>> # Record for previous sample packet
> >>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
> >>>>
> >>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
> >>>> if (start_addr == 0 and stop_addr == 4):
> >>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
> >>>> return
> >>>>
> >>>> + if (stop_addr < start_addr and prev_ip != 0):
> >>>> + # Continuity of the Packets broken, set start_addr to previous
> >>>> + # packet ip to complete the remaining tracing of the address range.
> >>>> + start_addr = prev_ip
> >>>> +
> >>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
> >>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
> >>>> return
> >>>>
> >>>> Without this patch, below is the failure log (with segfault) for
> >>>> reference.
> >>>>
> >>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
> >>>> [ perf record: Woken up 1 times to write data ]
> >>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
> >>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
> >>>> objdump: error: the stop address should be after the start address
> >>>> Traceback (most recent call last):
> >>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
> >>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
> >>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
> >>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
> >>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
> >>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
> >>>> ^^^^^^^^^^^^^^^^^^^^
> >>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
> >>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> >>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
> >>>> raise CalledProcessError(retcode, process.args,
> >>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
> >>>> Fatal Python error: handler_call_die: problem in Python trace event handler
> >>>> Python runtime state: initialized
> >>>>
> >>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
> >>>> <no Python frame>
> >>>>
> >>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
> >>>> Aborted (core dumped)
> >>>>
> >>>>
> >>>> dump snippet:
> >>>> ============
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> >>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> >>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> >>>> ffff800080313f0c: d63f0000 blr x0
> >>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff80008030cb00 <local_clock>:
> >>>> ffff80008030cb00: d503233f paciasp
> >>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> >>>> ffff80008030cb08: 910003fd mov x29, sp
> >>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> >>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff8000801bb4a8 <sched_clock>:
> >>>> ffff8000801bb4a8: d503233f paciasp
> >>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> >>>> ffff8000801bb4b0: 910003fd mov x29, sp
> >>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> >>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> >>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> >>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> >>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> >>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> >>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff80008125a8a8 <sched_clock_noinstr>:
> >>>> ffff80008125a8a8: d503233f paciasp
> >>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> >>>> ffff80008125a8b0: 910003fd mov x29, sp
> >>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> >>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> >>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> >>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> >>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> >>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> >>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> >>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
> >>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> >>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> >>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> >>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> >>>> ffff80008125a8e4: d63f0000 blr x0
> >>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>
> >>>>
> >>>> With fix:
> >>>> =========
> >>>>
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> >>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> >>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> >>>> ffff800080313f0c: d63f0000 blr x0
> >>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff80008030cb00 <local_clock>:
> >>>> ffff80008030cb00: d503233f paciasp
> >>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> >>>> ffff80008030cb08: 910003fd mov x29, sp
> >>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> >>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff8000801bb4a8 <sched_clock>:
> >>>> ffff8000801bb4a8: d503233f paciasp
> >>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> >>>> ffff8000801bb4b0: 910003fd mov x29, sp
> >>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> >>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> >>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> >>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> >>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> >>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> >>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff80008125a8a8 <sched_clock_noinstr>:
> >>>> ffff80008125a8a8: d503233f paciasp
> >>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> >>>> ffff80008125a8b0: 910003fd mov x29, sp
> >>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> >>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> >>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> >>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> >>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> >>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> >>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> >>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
> >>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> >>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> >>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> >>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> >>>> ffff80008125a8e4: d63f0000 blr x0
> >>>
> >>> It looks like the disassembly now assumes this BLR wasn't taken. We
> >>> go from ffff80008125a8e4 straight through to ...
> >>>
> >>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> >>>> Event type: branches
> >>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
> >>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
> >>>
> >>> ffff80008125a8e8 which is just the previous one +4. Isn't your issue
> >>> actually a decode issue in Perf itself? Why is there a discontinuity
> >>> without branch samples being generated where either the source or
> >>> destination address is 0?
> >>>
> >>> What are your record options to create this issue? As I mentioned in
> >>> the previous reply I haven't been able to reproduce it.
> >>
> >> I am using the perf record command below.
> >>
> >> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
> >>
> >
> > Thanks I managed to reproduce it. I'll take a look to see if I think the
> > issue is somewhere else.
> >
>
> At least for the failures I encountered, the issue is due to the
> alternatives runtime instruction patching mechanism. vmlinux ends up
> being the wrong image to decode with because a load of branches are
> actually turned into nops.
>
> Can you confirm if you use --kcore instead of vmlinux that you still get
> failures:
>
> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
> dd if=/dev/zero of=/dev/null
>
> perf script -i <output-folder.data> \
> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
> -k <output-folder.data>/kcore_dir/kcore
>
> But I still think bad decode detection should be moved as much as
> possible into OpenCSD and Perf rather than this script. Otherwise every
> tool will have to re-implement it, and OpenCSD has a lot more info to
> make decisions with.
>
> One change we can make is to desynchronize when an N atom is an
> unconditional branch:

There's a CPU hardware erratum affecting multiple CPU types and
generations (including Neoverse N1 and V1), where a branch to the
next instruction will be traced as an N atom regardless of whether it's
unconditional, taken conditional, indirect etc. This was detected by
a similar check in one of our other ETM decoders and we root-caused
it to an incorrect ETM implementation.

The safe check for current silicon is to treat an N atom as a loss of
sync only when the branch is unconditional, direct, and its target is
not the next instruction.

You can't infer that an N atom on an unconditional indirect branch is
a synchronization error, since it may have actually branched to the
next instruction, e.g. in a switch-like construction.

Maybe OpenCSD could make the stricter check (as written below)
configurable so you could enable it if you knew for sure that the trace
wasn't affected by this erratum, but that's not a safe default.
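
A small Python sketch contrasting the safe check with the stricter one from the patch, with strict mode as an opt-in flag along the lines suggested above. Branch and n_atom_means_lost_sync() are hypothetical names for illustration only; in OpenCSD the equivalent data would come from the decoder's instruction info (the patch uses m_instr_info.is_conditional), and the strict flag only models the idea of making the check configurable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical summary of a decoded branch (not an OpenCSD type)."""
    addr: int
    size: int               # 4 for A64 instructions
    is_conditional: bool
    target: Optional[int]   # known target for direct branches, None for indirect


def n_atom_means_lost_sync(br: Branch, strict: bool = False) -> bool:
    """Should an N (not-taken) atom on `br` be treated as a loss of sync?"""
    if br.is_conditional:
        return False        # conditional branches are legitimately not taken
    if strict:
        # The check as written in the patch: only valid if the trace is
        # known not to be affected by the erratum.
        return True
    if br.target is None:
        return False        # indirect: may really have gone to the next instruction
    # Direct and unconditional: because of the erratum, only a branch to
    # somewhere other than the next instruction is safe to flag.
    return br.target != br.addr + br.size


bl = Branch(addr=0xFFFF8000801BB4C8, size=4, is_conditional=False, target=0xFFFF80008125A8A8)
blr = Branch(addr=0xFFFF80008125A8E4, size=4, is_conditional=False, target=None)
b_next = Branch(addr=0x1000, size=4, is_conditional=False, target=0x1004)  # made-up branch-to-next
print(n_atom_means_lost_sync(bl), n_atom_means_lost_sync(blr), n_atom_means_lost_sync(b_next))
# True False False
```

With the default setting, the blr x0 from the earlier dump is never flagged because it is indirect; only strict=True would flag it.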
Al
>
> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> index c557998..3eefd5d 100644
> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
> // save recorded next instuction address
> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>
> + // must have lost sync if an unconditional branch wasn't taken
> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
> + m_need_addr = true;
> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
> + // wait for next address
> + return OCSD_OK;
> + }
> +
>
> Another one we can spot is when a new address comes that is before the
> current decode address (basically the backwards check that you added).
>
> There are probably others that can be spotted like an address appearing
> after a direct branch that doesn't match the branch target.
>
> I think at that point, desynchronising should cause the disassembly
> script to throw away the last bit, rather than force it to be printed as
> in this patch. As I mentioned above in the thread, it leads to printing
> disassembly that's implausible and misleading (where an unconditional
> branch wasn't taken).
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-01 10:28 ` Al Grant
@ 2024-08-01 11:26 ` James Clark
2024-08-01 11:58 ` Al Grant
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-01 11:26 UTC (permalink / raw)
To: Al Grant, Ganapatrao Kulkarni, Mike Leach
Cc: acme@redhat.com, coresight@lists.linaro.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, darren@os.amperecomputing.com,
scclevenger@os.amperecomputing.com, Leo Yan
On 01/08/2024 11:28 am, Al Grant wrote:
>
>
>> -----Original Message-----
>> From: James Clark <james.clark@linaro.org>
>> Sent: Thursday, August 1, 2024 11:00 AM
>> To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>; Mike Leach
>> <mike.leach@linaro.org>
>> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
>> darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com; Leo
>> Yan <Leo.Yan@arm.com>
>> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if
>> address continuity is broken
>>
>>
>>
>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>
>>>
>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>> To generate the instruction trace, the script uses the address
>>>>>>>>>> range of 2 contiguous packets. If there is a continuity break due
>>>>>>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>>>>>>> restarted with the new set of contiguous packets.
>>>>>>>>>>
>>>>>>>>>> Add a change to identify the break, complete the remaining
>>>>>>>>>> tracing of the current packets, and restart tracing from the new
>>>>>>>>>> set of packets if continuity is established.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>
>>>>>>>>> Can you add a before and after example of what's changed to the
>>>>>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>>>>>> adding missing output, or it was correcting the tail end of the
>>>>>>>>> output that was previously wrong.
>>>>>>>>
>>>>>>>> It adds the tail end of the trace as well as avoiding the segfault
>>>>>>>> of the perf application. Without this change, perf segfaults
>>>>>>>> with the log below:
>>>>>>>>
>>>>>>>>
>>>>>>>> ./perf script
>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>> Traceback (most recent call last):
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>>>> process_event
>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>>>> print_disam
>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>> stop_addr):
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>>>> read_disam
>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>> check_output
>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> ^^^^
>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>>>> '--start-address=0xffff80008125b758',
>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>>>> non-zero exit status 1.
>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>> event handler
>>>>>>>> Python runtime state: initialized
>>>>>>>>
>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>> <no Python frame>
>>>>>>>>
>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>>>>>> problem._py3abrt (total: 7)
>>>>>>>> Aborted (core dumped)
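The failing objdump range can be reproduced by hand. This sketch assumes the script pairs consecutive branch samples into [previous addr, current ip + 4). That assumption matches the failing command:

- The start is the branch target 0xffff80008125b758.
- The stop is 0xffff80008125a930 + 4, where 0xffff80008125a930 is the ip of the next sample in the dump snippet posted in this thread.

disasm_range and objdump_range_ok are hypothetical helpers, not functions in arm-cs-trace-disasm.py.

```python
from typing import Tuple


def disasm_range(prev_addr: int, ip: int) -> Tuple[int, int]:
    """Hypothetical helper: objdump range for two consecutive branch samples.

    prev_addr: branch target ('addr') of the previous sample on this CPU.
    ip: branch source ('ip') of the current sample; 4 is added so that the
        branch instruction itself appears in the objdump output.
    """
    return prev_addr, ip + 4


def objdump_range_ok(start_addr: int, stop_addr: int) -> bool:
    # A stop address that is not after the start address is unusable;
    # objdump rejects the backwards case with the error shown above.
    return stop_addr > start_addr


start, stop = disasm_range(0xffff80008125b758, 0xffff80008125a930)
print(hex(start), hex(stop), objdump_range_ok(start, stop))
```

This prints "0xffff80008125b758 0xffff80008125a934 False", which is the same pair of addresses that objdump refused.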
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>>> ---
>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>> return
>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>>>> get overwritten after you load it back into 'prev_ip'?
>>>>>>>>
>>>>>>>> No, the logic is the same as holding the addr of the previous packet:
>>>>>>>> the ip saved for the previous packet is copied into prev_ip before it
>>>>>>>> is overwritten with the current packet's.
>>>>>>>
>>>>>>> It's not exactly the same logic as holding the addr of the previous
>>>>>>> sample. For addr, we return on the first None, with your change we
>>>>>>> now "pretend" that the second one is also the previous one:
>>>>>>>
>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>> return <----------------------------sample 0 return
>>>>>>>
>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no
>>>>>>> return
>>>>>>>
>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
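The sample-1 behaviour described above can be replayed with a simplified model of the first version of the patch. This sketch ignores the dso and event-type checks in process_event(). prev_ip_per_sample is a hypothetical name.

```python
def prev_ip_per_sample(samples):
    """Replay the per-CPU bookkeeping of the first version of the patch.

    samples: (cpu, ip, addr) tuples. Returns the prev_ip seen by each sample,
    or None for the first sample on a CPU (which returns early).
    """
    cpu_data = {}
    seen = []
    for cpu, ip, addr in samples:
        key = str(cpu)
        if cpu_data.get(key + 'addr') is None:
            cpu_data[key + 'addr'] = addr
            seen.append(None)          # sample 0: early return
            continue
        if cpu_data.get(key + 'ip') is None:
            cpu_data[key + 'ip'] = ip  # sample 1: seeded with its own ip
        seen.append(cpu_data[key + 'ip'])
        cpu_data[key + 'addr'] = addr
        cpu_data[key + 'ip'] = ip + 4  # stop_addr, as stored by the patch
    return seen


result = prev_ip_per_sample([(1, 0x100, 0x200), (1, 0x210, 0x300), (1, 0x310, 0x400)])
print([None if v is None else hex(v) for v in result])
```

It prints "[None, '0x210', '0x214']". For the second sample, prev_ip is that sample's own ip (0x210), not an address taken from an earlier packet.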
>>>>>>
>>>>>> Yes, it is a dummy for the first packet. I added it anticipating that
>>>>>> we won't hit the discontinuity on the first packet itself.
>>>>>>
>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>> return
>>>>>>
>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>
>>>>>> if (options.verbose == True):
>>>>>> print("Event type: %s" % name)
>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>
>>>>>> # Record for previous sample packet
>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>
>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>> return
>>>>>>
>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>> + start_addr = prev_ip
>>>>>> +
>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>> return
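The range logic of the revised diff can be modelled in a few lines of Python to show what it prints for the failing samples. The model makes two assumptions:

- The dso and event-type checks are omitted.
- prev_ip defaults to 0 when no 'ip' entry exists.

As written, the diff appears to bind prev_ip only inside the "!= None" branch. A backwards range on the first disassembled sample would then raise UnboundLocalError. The default makes the "prev_ip != 0" guard cover that case. revised_range is a hypothetical name.

```python
def revised_range(cpu_data, cpu, ip, addr):
    """Sketch of the revised diff's range logic (dso/event checks omitted).

    Returns (start_addr, stop_addr), or None for the first sample on a CPU.
    """
    key = str(cpu)
    if cpu_data.get(key + 'addr') is None:
        cpu_data[key + 'addr'] = addr
        return None
    # Assumption: default to 0 so prev_ip is always bound.
    prev_ip = cpu_data.get(key + 'ip', 0)
    start_addr = cpu_data[key + 'addr']
    stop_addr = ip + 4
    cpu_data[key + 'addr'] = addr
    cpu_data[key + 'ip'] = stop_addr
    if stop_addr < start_addr and prev_ip != 0:
        start_addr = prev_ip  # resume where the previous range ended
    return start_addr, stop_addr


cpu_data = {}
# (ip, addr) pairs for CPU 1, taken from the dump snippet in this thread
for ip, addr in [(0xffff8000801bb4c8, 0xffff80008125a8a8),
                 (0xffff80008125a8e4, 0xffff80008125b758),
                 (0xffff80008125a930, 0xffff8000801bb4cc)]:
    rng = revised_range(cpu_data, 1, ip, addr)
    if rng:
        print(hex(rng[0]), hex(rng[1]))
```

The second line printed is "0xffff80008125a8e8 0xffff80008125a934". 0xffff80008125a8e8 is the instruction right after the "blr x0" at 0xffff80008125a8e4. The substituted range is therefore disassembled as though that branch had not been taken.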
>>>>>>
>>>>>> Without this patch, below is the failure log (with segfault) for
>>>>>> reference.
>>>>>>
>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1
>>>>>> dd if=/dev/zero of=/dev/null
>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>> [root@sut01sys-r214 perf]# ./perf script
>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>> objdump: error: the stop address should be after the start address
>>>>>> Traceback (most recent call last):
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>> process_event
>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>> print_disam
>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>> stop_addr):
>>>>>>
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>> read_disam
>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>> check_output
>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> ^^
>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>> '--start-address=0xffff80008125b758',
>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>> non-zero exit status 1.
>>>>>> Fatal Python error: handler_call_die: problem in Python trace event
>>>>>> handler
>>>>>> Python runtime state: initialized
>>>>>>
>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>> <no Python frame>
>>>>>>
>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>>>> problem._py3abrt (total: 7)
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>>
>>>>>> dump snippet:
>>>>>> ============
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>> ffff800080313f04: 36100094 tbz w20, #2,
>>>>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21,
>>>>>> #968]
>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> __perf_event_header__init_id+0x54
>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>> event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff80008030cb00 <local_clock>:
>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>>>>>> [sp, #-16]!
>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>>>>>> <sched_clock>
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return
>>>>>> sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>>>>>> [sp, #-32]!
>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>>>>>> [sp, #16]
>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>>>>>> <sched_clock_noinstr>
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>>>>>> sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>>>>>> [sp, #-64]!
>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20,
>>>>>> [sp, #16]
>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>> ffff80008125a8bc: 910d0294 add x20, x20,
>>>>>> #0x340
>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24,
>>>>>> [sp, #48]
>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28
>>>>>> // #40
>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>>>>>> [sp, #32]
>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>>>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>>
>>>>>>
>>>>>> With fix:
>>>>>> =========
>>>>>>
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>> ffff800080313f04: 36100094 tbz w20, #2,
>>>>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21,
>>>>>> #968]
>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> __perf_event_header__init_id+0x54
>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>> event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff80008030cb00 <local_clock>:
>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>>>>>> [sp, #-16]!
>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>>>>>> <sched_clock>
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return
>>>>>> sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>>>>>> [sp, #-32]!
>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>>>>>> [sp, #16]
>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>>>>>> <sched_clock_noinstr>
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>>>>>> sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>>>>>> [sp, #-64]!
>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20,
>>>>>> [sp, #16]
>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>> ffff80008125a8bc: 910d0294 add x20, x20,
>>>>>> #0x340
>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24,
>>>>>> [sp, #48]
>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28
>>>>>> // #40
>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>>>>>> [sp, #32]
>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>
>>>>> It looks like the disassembly now assumes this BLR wasn't taken. We
>>>>> go from ffff80008125a8e4 straight through to ...
>>>>>
>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>>>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>> period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>
>>>>> ffff80008125a8e8, which is just the previous one + 4. Isn't your issue
>>>>> actually a decode issue in Perf itself? Why is there a discontinuity
>>>>> without branch samples being generated where either the source or
>>>>> destination address is 0?
>>>>>
>>>>> What are your record options to create this issue? As I mentioned in
>>>>> the previous reply I haven't been able to reproduce it.
>>>>
>>>> I am using the perf record command below:
>>>>
>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>
>>>
>>> Thanks, I managed to reproduce it. I'll take a look to see if I think the
>>> issue is somewhere else.
>>>
>>
>> At least for the failures I encountered, the issue is due to the
>> alternatives runtime instruction patching mechanism. vmlinux ends up
>> being the wrong image to decode with because a load of branches are
>> actually turned into nops.
>>
>> Can you confirm whether you still get failures if you use --kcore
>> instead of vmlinux:
>>
>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>> dd if=/dev/zero of=/dev/null
>>
>> perf script -i <output-folder.data> \
>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>> -k <output-folder.data>/kcore_dir/kcore
>>
>> But I still think bad decode detection should be moved as much as
>> possible into OpenCSD and Perf rather than this script. Otherwise every
>> tool will have to re-implement it, and OpenCSD has a lot more info to
>> make decisions with.
>>
>> One change we can make is to desynchronize when an N atom is seen on an
>> unconditional branch:
>
> There's a CPU hardware erratum affecting multiple CPU types and
> generations (including Neoverse N1 and V1), where a branch to the
> next instruction will be traced as an N atom regardless of whether it's
> unconditional, taken conditional, indirect etc. This was detected by
> a similar check in one of our other ETM decoders and we root-caused
> it to an incorrect ETM implementation.
>
> The safe check for current silicon is that it's an unconditional branch
> that is direct and whose target is not the next instruction.
>
> You can't infer that an N atom on an unconditional indirect branch is
> a synchronization error, since it may have actually branched to the
> next instruction, e.g. in a switch-like construction.
>
> Maybe OpenCSD could make the stricter check (as written below)
> configurable so you could enable it if you knew for sure that the trace
> wasn't affected by this erratum, but that's not a safe default.
>
> Al
>
>
That's good to know. In that case, it would be better to exclude branches
to the next instruction from the check rather than make it
configurable. At least for direct branches, it then "just works".
Indirect branches look a bit more complicated because you have to wait
for the address, but I'm sure it can be done one way or another.
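The erratum-safe rule described above can be written as a small predicate. It also covers the suggestion to exclude branches to the next instruction. This Python sketch is illustrative only: BranchInfo and n_atom_means_lost_sync are hypothetical names, not the OpenCSD API. A real change would live in the decoder's C++ processAtom() path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchInfo:
    """Hypothetical decoded-branch record (not OpenCSD's ocsd_instr_info)."""
    addr: int
    size: int                     # 4 for A64 instructions
    is_conditional: bool
    is_direct: bool
    target: Optional[int] = None  # encoded target; known only for direct branches


def n_atom_means_lost_sync(br: BranchInfo) -> bool:
    """Return True only if an N (not-taken) atom on br proves lost sync.

    Affected CPUs may trace a branch to the next instruction as N whatever
    its type, so only an unconditional direct branch whose target is not
    the next instruction is treated as a desync. Indirect branches are never
    flagged here, because their target is unknown until an address arrives.
    """
    if br.is_conditional or not br.is_direct or br.target is None:
        return False
    return br.target != br.addr + br.size
```

Handling indirect branches would need the decoder to defer the decision until the next address packet. This sketch does not attempt that.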
>>
>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> index c557998..3eefd5d 100644
>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>> // save recorded next instuction address
>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>
>> + // must have lost sync if an unconditional branch wasn't taken
>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>> + m_need_addr = true;
>> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
>> + // wait for next address
>> + return OCSD_OK;
>> + }
>> +
>>
>> Another one we can spot is when a new address comes that is before the
>> current decode address (basically the backwards check that you added).
>>
>> There are probably others that can be spotted like an address appearing
>> after a direct branch that doesn't match the branch target.
>>
>> I think at that point, desynchronising should cause the disassembly
>> script to throw away the last bit, rather than force it to be printed as
>> in this patch. As I mentioned above in the thread, it leads to printing
>> disassembly that's implausible and misleading (where an unconditional
>> branch wasn't taken).
* RE: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-01 11:26 ` James Clark
@ 2024-08-01 11:58 ` Al Grant
2024-08-01 14:58 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Al Grant @ 2024-08-01 11:58 UTC (permalink / raw)
To: James Clark, Ganapatrao Kulkarni, Mike Leach
Cc: acme@redhat.com, coresight@lists.linaro.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, darren@os.amperecomputing.com,
scclevenger@os.amperecomputing.com, Leo Yan
> -----Original Message-----
> From: James Clark <james.clark@linaro.org>
> Sent: Thursday, August 1, 2024 12:26 PM
> To: Al Grant <Al.Grant@arm.com>; Ganapatrao Kulkarni
> <gankulkarni@os.amperecomputing.com>; Mike Leach <mike.leach@linaro.org>
> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com; Leo
> Yan <Leo.Yan@arm.com>
> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if
> address continuity is broken
>
>
>
> On 01/08/2024 11:28 am, Al Grant wrote:
> >
> >
> >> -----Original Message-----
> >> From: James Clark <james.clark@linaro.org>
> >> Sent: Thursday, August 1, 2024 11:00 AM
> >> To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>; Mike
> >> Leach <mike.leach@linaro.org>
> >> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-
> >> kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> >> darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com;
> >> Leo Yan <Leo.Yan@arm.com>
> >> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip
> >> disasm if address continuity is broken
> >>
> >>
> >>
> >> On 24/07/2024 3:45 pm, James Clark wrote:
> >>>
> >>>
> >>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
> >>>>
> >>>>
> >>>> On 23-07-2024 09:16 pm, James Clark wrote:
> >>>>>
> >>>>>
> >>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
> >>>>>>>>
> >>>>>>>> Hi James,
> >>>>>>>>
> >>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
> >>>>>>>>>> To generate the instruction trace, the script uses the address
> >>>>>>>>>> range of 2 contiguous packets. If there is a continuity break due
> >>>>>>>>>> to a discontiguous branch address, the tracing must be reset and
> >>>>>>>>>> restarted with the new set of contiguous packets.
> >>>>>>>>>>
> >>>>>>>>>> Add a change to identify the break, complete the remaining
> >>>>>>>>>> tracing of the current packets, and restart tracing from the new
> >>>>>>>>>> set of packets if continuity is established.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi Ganapatrao,
> >>>>>>>>>
> >>>>>>>>> Can you add a before and after example of what's changed to
> >>>>>>>>> the commit message? It wasn't immediately obvious to me if
> >>>>>>>>> this is adding missing output, or it was correcting the tail
> >>>>>>>>> end of the output that was previously wrong.
> >>>>>>>>
> >>>>>>>> It adds the tail end of the trace as well as avoiding the
> >>>>>>>> segfault of the perf application. Without this change, perf
> >>>>>>>> segfaults with the log below:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ./perf script
> >>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
> >>>>>>>> objdump -k ../../vmlinux -v $* > dump
> >>>>>>>> objdump: error: the stop address should be after the start address
> >>>>>>>> Traceback (most recent call last):
> >>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
> >>>>>>>> in process_event
> >>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
> >>>>>>>> stop_addr)
> >>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
> >>>>>>>> in print_disam
> >>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
> >>>>>>>> stop_addr):
> >>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
> >>>>>>>> read_disam
> >>>>>>>> disasm_output =
> >>>>>>>> check_output(disasm).decode('utf-8').split('\n')
> >>>>>>>> ^^^^^^^^^^^^^^^^^^^^
> >>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
> >>>>>>>> check_output
> >>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> >>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in
> >>>>>>>> run
> >>>>>>>> raise CalledProcessError(retcode, process.args,
> >>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
> >>>>>>>> '--start-address=0xffff80008125b758',
> >>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
> >>>>>>>> non-zero exit status 1.
> >>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
> >>>>>>>> event handler
> >>>>>>>> Python runtime state: initialized
> >>>>>>>>
> >>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
> >>>>>>>> <no Python frame>
> >>>>>>>>
> >>>>>>>> Extension modules: perf_trace_context, systemd._journal,
> >>>>>>>> systemd._reader, systemd.id128, report._py3report,
> >>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
> >>>>>>>> Aborted (core dumped)
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
> >>>>>>>>>> <gankulkarni@os.amperecomputing.com>
> >>>>>>>>>> ---
> >>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
> >>>>>>>>>> 1 file changed, 10 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
> >>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
> >>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>>>>>>> return
> >>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
> >>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
> >>>>>>>>>> +
> >>>>>>>>>
> >>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
> >>>>>>>>> get overwritten after you load it back into 'prev_ip'?
> >>>>>>>>
> >>>>>>>> No, the logic is the same as holding the addr of the previous
> >>>>>>>> packet: the ip saved for the previous packet is copied into prev_ip
> >>>>>>>> before it is overwritten with the current packet's.
> >>>>>>>
> >>>>>>> It's not exactly the same logic as holding the addr of the
> >>>>>>> previous sample. For addr, we return on the first None, with
> >>>>>>> your change we now "pretend" that the second one is also the
> >>>>>>> previous one:
> >>>>>>>
> >>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
> >>>>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>>>> return <----------------------------sample 0 return
> >>>>>>>
> >>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
> >>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but
> >>>>>>> no return
> >>>>>>>
> >>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
> >>>>>>
> >>>>>> Yes, it is a dummy for the first packet. I added it anticipating
> >>>>>> that we won't hit the discontinuity on the first packet itself.
> >>>>>>
> >>>>>> Can this be changed to something more intuitive, like below?
> >>>>>>
> >>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> index d973c2baed1c..d49f5090059f 100755
> >>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
> >>>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>>> return
> >>>>>>
> >>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
> >>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
> >>>>>>
> >>>>>> if (options.verbose == True):
> >>>>>> print("Event type: %s" % name)
> >>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
> >>>>>>
> >>>>>> # Record for previous sample packet
> >>>>>> cpu_data[str(cpu) + 'addr'] = addr
> >>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
> >>>>>>
> >>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
> >>>>>> if (start_addr == 0 and stop_addr == 4):
> >>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
> >>>>>> return
> >>>>>>
> >>>>>> + if (stop_addr < start_addr and prev_ip != 0):
> >>>>>> + # Continuity of the Packets broken, set start_addr to previous
> >>>>>> + # packet ip to complete the remaining tracing of the address range.
> >>>>>> + start_addr = prev_ip
> >>>>>> +
> >>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
> >>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
> >>>>>> return
> >>>>>>
> >>>>>> Without this patch, below is the failure log (with segfault) for
> >>>>>> reference.
> >>>>>>
> >>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1
> >>>>>> dd if=/dev/zero of=/dev/null
> >>>>>> [ perf record: Woken up 1 times to write data ]
> >>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
> >>>>>> [root@sut01sys-r214 perf]# ./perf script
> >>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
> >>>>>> objdump -k ../../vmlinux -v $* > dump
> >>>>>> objdump: error: the stop address should be after the start address
> >>>>>> Traceback (most recent call last):
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
> >>>>>> process_event
> >>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
> >>>>>> print_disam
> >>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
> >>>>>> stop_addr):
> >>>>>>
> >>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
> >>>>>> read_disam
> >>>>>> disasm_output =
> >>>>>> check_output(disasm).decode('utf-8').split('\n')
> >>>>>> ^^^^^^^^^^^^^^^^^^^^
> >>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
> >>>>>> check_output
> >>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
> >>>>>> check=True,
> >>>>>>
> >>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> ^^
> >>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
> >>>>>> raise CalledProcessError(retcode, process.args,
> >>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
> >>>>>> '--start-address=0xffff80008125b758',
> >>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
> >>>>>> non-zero exit status 1.
> >>>>>> Fatal Python error: handler_call_die: problem in Python trace
> >>>>>> event handler Python runtime state: initialized
> >>>>>>
> >>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
> >>>>>> <no Python frame>
> >>>>>>
> >>>>>> Extension modules: perf_trace_context, systemd._journal,
> >>>>>> systemd._reader, systemd.id128, report._py3report,
> >>>>>> _dbus_bindings, problem._py3abrt (total: 7) Aborted (core dumped)
> >>>>>>
> >>>>>>
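The abort above comes from `subprocess.check_output()` raising `CalledProcessError` inside the Python event handler, which perf then reports as a fatal handler error. A minimal sketch of two defensive measures a script like this could take, assuming the same `objdump -d -z --start-address/--stop-address` invocation shown in the traceback: refuse to build a command for an empty or inverted range, and treat a failing disassembler as "no output" instead of letting the exception escape. `objdump_args` and `run_disasm` are hypothetical helpers, not functions in arm-cs-trace-disasm.py.

```python
import subprocess
import sys

def objdump_args(objdump, image, start, stop):
    """Build the objdump argv for [start, stop), or None if the range is empty/inverted."""
    if stop <= start:
        return None
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start,
            "--stop-address=0x%x" % stop,
            image]

def run_disasm(argv):
    """Run the disassembler and return its output lines, or [] if it fails."""
    try:
        out = subprocess.check_output(argv, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError as err:
        print("disassembler exited with status %d, skipping range" % err.returncode,
              file=sys.stderr)
        return []
    return out.decode("utf-8").split("\n")

# The inverted range from the traceback yields no command at all.
print(objdump_args("objdump", "../../vmlinux", 0xffff80008125b758, 0xffff80008125a934))
# None
```

Skipping a range trades a crash for a gap in the output; it does not say anything about why the addresses went backwards in the first place.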
> >>>>>> dump snippet:
> >>>>>> ============
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> >>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> >>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> >>>>>> ffff800080313f0c: d63f0000 blr x0
> >>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff80008030cb00 <local_clock>:
> >>>>>> ffff80008030cb00: d503233f paciasp
> >>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> >>>>>> ffff80008030cb08: 910003fd mov x29, sp
> >>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> >>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff8000801bb4a8 <sched_clock>:
> >>>>>> ffff8000801bb4a8: d503233f paciasp
> >>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> >>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
> >>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> >>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> >>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> >>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> >>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> >>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> >>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
> >>>>>> ffff80008125a8a8: d503233f paciasp
> >>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> >>>>>> ffff80008125a8b0: 910003fd mov x29, sp
> >>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> >>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> >>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> >>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> >>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> >>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> >>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> >>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
> >>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> >>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> >>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> >>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> >>>>>> ffff80008125a8e4: d63f0000 blr x0
> >>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>>
> >>>>>>
> >>>>>> With fix:
> >>>>>> =========
> >>>>>>
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
> >>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
> >>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
> >>>>>> ffff800080313f0c: d63f0000 blr x0
> >>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff80008030cb00 <local_clock>:
> >>>>>> ffff80008030cb00: d503233f paciasp
> >>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
> >>>>>> ffff80008030cb08: 910003fd mov x29, sp
> >>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
> >>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff8000801bb4a8 <sched_clock>:
> >>>>>> ffff8000801bb4a8: d503233f paciasp
> >>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
> >>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
> >>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
> >>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
> >>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
> >>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
> >>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
> >>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
> >>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
> >>>>>> ffff80008125a8a8: d503233f paciasp
> >>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
> >>>>>> ffff80008125a8b0: 910003fd mov x29, sp
> >>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
> >>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
> >>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
> >>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
> >>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
> >>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
> >>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
> >>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
> >>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
> >>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
> >>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
> >>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
> >>>>>> ffff80008125a8e4: d63f0000 blr x0
> >>>>>
> >>>>> It looks like the disassembly now assumes this BLR wasn't taken.
> >>>>> We go from ffff80008125a8e4 straight through to ...
> >>>>>
> >>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> >>>>>> Event type: branches
> >>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
> >>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
> >>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
> >>>>>
> >>>>> ffff80008125a8e8, which is just the previous one + 4. Isn't your
> >>>>> issue actually a decode issue in Perf itself? Why is there a
> >>>>> discontinuity without branch samples being generated where either
> >>>>> the source or destination address is 0?
> >>>>>
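The implausible fall-through James points out can be detected mechanically on the script's own output: if one printed range ends in an unconditional branch and the next range starts exactly 4 bytes later, the disassembly implies that the branch was not taken. A minimal sketch, assuming the `address: encoding mnemonic operands` line layout shown in the dump and a small hand-picked subset of A64 unconditional branch mnemonics; `parse_insn` and `implausible_fallthrough` are hypothetical helpers. A branch whose target really is the next instruction also matches, so this flags candidates rather than proving a decode error.

```python
# Assumed subset of A64 unconditional branches; PAC variants such as
# retaa/blraa are deliberately left out of this sketch.
UNCOND_BRANCHES = {"b", "bl", "br", "blr", "ret"}
A64_INSN_SIZE = 4

def parse_insn(line):
    """Split 'ffff80008125a8e4: d63f0000 blr x0' into (address, mnemonic)."""
    addr_text, rest = line.split(":", 1)
    fields = rest.split()
    # fields[0] is the instruction encoding, fields[1] the mnemonic
    return int(addr_text.strip(), 16), fields[1]

def implausible_fallthrough(last_line_of_range, next_range_start):
    """True if a range ending in an unconditional branch is followed by code at +4."""
    addr, mnemonic = parse_insn(last_line_of_range)
    return (mnemonic in UNCOND_BRANCHES
            and next_range_start == addr + A64_INSN_SIZE)

# The "With fix" output resumes at ffff80008125a8e8 right after this blr.
print(implausible_fallthrough("ffff80008125a8e4: d63f0000 blr x0", 0xffff80008125a8e8))
# True
```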
> >>>>> What are your record options to create this issue? As I mentioned
> >>>>> in the previous reply I haven't been able to reproduce it.
> >>>>
> >>>> I am using the perf record command below:
> >>>>
> >>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
> >>>>
> >>>
> >>> Thanks, I managed to reproduce it. I'll take a look to see whether
> >>> the issue is somewhere else.
> >>>
> >>
> >> At least for the failures I encountered, the issue is due to the
> >> alternatives runtime instruction patching mechanism. vmlinux ends up
> >> being the wrong image to decode with because a load of branches are
> >> actually turned into nops.
> >>
> >> Can you confirm whether you still get failures if you use --kcore
> >> instead of vmlinux:
> >>
> >> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
> >> dd if=/dev/zero of=/dev/null
> >>
> >> perf script -i <output-folder.data> \
> >> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
> >> -k <output-folder.data>/kcore_dir/kcore
> >>
> >> But I still think bad decode detection should be moved as much as
> >> possible into OpenCSD and Perf rather than this script. Otherwise
> >> every tool will have to re-implement it, and OpenCSD has a lot more
> >> info to make decisions with.
> >>
> >> One change we can make is to desynchronize when an N atom is an
> >> unconditional branch:
> >
> > There's a CPU hardware erratum affecting multiple CPU types and
> > generations (including Neoverse N1 and V1), where a branch to the next
> > instruction will be traced as an N atom regardless of whether it's
> > unconditional, taken conditional, indirect etc. This was detected by a
> > similar check in one of our other ETM decoders and we root-caused it
> > to incorrect ETM implementation.
> >
> > The safe check for current silicon is that it's an unconditional
> > branch that is direct and whose target is not the next instruction.
> >
> > You can't infer that an N atom on an unconditional indirect branch is
> > a synchronization error, since it may have actually branched to the
> > next instruction, e.g. in a switch-like construction.
> >
> > Maybe OpenCSD could make the stricter check (as written below)
> > configurable so you could enable it if you knew for sure that the
> > trace wasn't affected by this erratum, but that's not a safe default.
> >
> > Al
> >
> >
>
> That's good to know. In that case it would be better to exclude branches to the
> next instruction from the check rather than make it configurable. At least for
> direct branches, that way it "just works".
There's a tradeoff between "just working around" these particular
buggy ETM implementations and early detection of sync errors.
If you knew that your CPU didn't have the bug, you may prefer to
go for early detection of sync errors. There are use cases for trace
besides generating AutoFDO profiles, where accuracy is more critical,
and where it's better to fail early rather than trace an incorrect code
path. You'd probably also want the tighter check if you were testing
OpenCSD against your own ETM/ETE implementation.
Up to you whether to put in that configurability into OpenCSD.
For Linux perf, going with just the more tolerant check is probably fine.
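The tradeoff described above can be written as one predicate with a switch. A minimal Python sketch, assuming a hypothetical `Insn` record carrying what a decoder already knows about the instruction at the atom (conditional or not, direct or indirect, and the direct target); it is not OpenCSD's API. In tolerant mode only a direct unconditional branch whose target is not the next instruction counts as a sync loss, which is the check described above as safe for current silicon; strict mode treats any not-taken unconditional branch as a sync loss.

```python
from dataclasses import dataclass
from typing import Optional

A64_INSN_SIZE = 4

@dataclass
class Insn:
    # Hypothetical decoded-instruction record; field names are illustrative.
    addr: int
    is_branch: bool
    is_conditional: bool
    is_direct: bool
    target: Optional[int] = None  # known only for direct branches

def n_atom_means_lost_sync(insn: Insn, strict: bool = False) -> bool:
    """Decide whether an N (not-taken) atom on this instruction implies lost sync."""
    if not insn.is_branch or insn.is_conditional:
        return False  # N atoms are normal for conditional branches
    if strict:
        return True   # any unconditional branch should have been taken
    # Tolerant mode: the erratum can report a branch to the next instruction
    # as N, and an indirect branch may legitimately target the next instruction.
    return insn.is_direct and insn.target != insn.addr + A64_INSN_SIZE
```

With `strict=False`, the `blr x0` at 0xffff80008125a8e4 (indirect) is not flagged, while the direct `bl` at 0xffff80008030cb0c would be.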
> Indirect looks a bit more complicated because you have to wait for the address,
> but I'm sure it can be done one way or another.
If the indirect branch is traced as an N atom, it doesn't generate an
address packet; that's part of the problem.
Al
>
> >>
> >> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> >> index c557998..3eefd5d 100644
> >> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> >> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> >> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
> >> // save recorded next instuction address
> >> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
> >>
> >> + // must have lost sync if an unconditional branch wasn't taken
> >> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
> >> + m_need_addr = true;
> >> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
> >> + // wait for next address
> >> + return OCSD_OK;
> >> + }
> >> +
> >>
> >> Another one we can spot is when a new address comes that is before
> >> the current decode address (basically the backwards check that you added).
> >>
> >> There are probably others that can be spotted like an address
> >> appearing after a direct branch that doesn't match the branch target.
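The two extra checks listed above can be sketched as simple predicates over consecutive branch samples, in the script's own terms. A minimal Python sketch under stated assumptions: `prev_target` is the previous sample's branch target, `cur_source` the current sample's branch source, and `decoded_target` the target encoded in a direct branch instruction (for example parsed from the disassembly), or `None` for an indirect branch. All names are hypothetical.

```python
from typing import Optional

def backwards(prev_target: int, cur_source: int) -> bool:
    """Sequential execution from prev_target cannot reach an earlier branch source."""
    return cur_source < prev_target

def direct_target_mismatch(decoded_target: Optional[int], sample_target: int) -> bool:
    """For a direct branch, the traced target must equal the target in the encoding.

    decoded_target is None for indirect branches, which cannot be checked this way.
    """
    return decoded_target is not None and decoded_target != sample_target

# The failing pair from the log: target 0xffff80008125b758, next source 0xffff80008125a930.
print(backwards(0xffff80008125b758, 0xffff80008125a930))
# True
```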
> >>
> >> I think at that point, desynchronising should cause the disassembly
> >> script to throw away the last bit, rather than force it to be printed
> >> as in this patch. As I mentioned above in the thread, it leads to
> >> printing disassembly that's implausible and misleading (where an
> >> unconditional branch wasn't taken).
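One way a consumer of the samples could throw away the last bit instead of printing it: hold the most recent range back until the next in-sync range arrives, and drop it if a desynchronisation is signalled in between. A minimal Python sketch; the `PendingRanges` class and its `desync()` hook are hypothetical, and how perf would deliver such a signal to the script (for example from an `OCSD_GEN_TRC_ELEM_NO_SYNC` element as in the diff above) is an assumption.

```python
class PendingRanges:
    """Buffer one [start, stop) range and emit it only if no desync follows it."""

    def __init__(self):
        self.pending = None
        self.emitted = []

    def add(self, start, stop):
        # A new range with no desync in between means the previous one stands.
        if self.pending is not None:
            self.emitted.append(self.pending)
        self.pending = (start, stop)

    def desync(self):
        # Throw away the unconfirmed tail rather than print implausible code.
        self.pending = None

    def flush(self):
        if self.pending is not None:
            self.emitted.append(self.pending)
            self.pending = None
        return self.emitted
```

The cost is one range of latency before anything is printed, which is harmless for a post-processing script.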
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-01 11:58 ` Al Grant
@ 2024-08-01 14:58 ` James Clark
0 siblings, 0 replies; 45+ messages in thread
From: James Clark @ 2024-08-01 14:58 UTC (permalink / raw)
To: Al Grant, Ganapatrao Kulkarni, Mike Leach
Cc: acme@redhat.com, coresight@lists.linaro.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, darren@os.amperecomputing.com,
scclevenger@os.amperecomputing.com, Leo Yan
On 01/08/2024 12:58 pm, Al Grant wrote:
>
>
>> -----Original Message-----
>> From: James Clark <james.clark@linaro.org>
>> Sent: Thursday, August 1, 2024 12:26 PM
>> To: Al Grant <Al.Grant@arm.com>; Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>; Mike Leach <mike.leach@linaro.org>
>> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com; Leo Yan <Leo.Yan@arm.com>
>> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
>>
>>
>>
>> On 01/08/2024 11:28 am, Al Grant wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: James Clark <james.clark@linaro.org>
>>>> Sent: Thursday, August 1, 2024 11:00 AM
>>>> To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>; Mike Leach <mike.leach@linaro.org>
>>>> Cc: acme@redhat.com; coresight@lists.linaro.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; darren@os.amperecomputing.com; scclevenger@os.amperecomputing.com; Leo Yan <Leo.Yan@arm.com>
>>>> Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
>>>>
>>>>
>>>>
>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>
>>>>>>>>>> Hi James,
>>>>>>>>>>
>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>> To generate the instruction trace, the script uses the address
>>>>>>>>>>>> range between 2 contiguous packets. If there is a continuity break
>>>>>>>>>>>> due to a discontiguous branch address, the tracing must be reset
>>>>>>>>>>>> and restarted with the new set of contiguous packets.
>>>>>>>>>>>>
>>>>>>>>>>>> Add a change to identify the break, complete the remaining tracing
>>>>>>>>>>>> of the current packets, and restart tracing from the new set of
>>>>>>>>>>>> packets if continuity is established.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>
>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>> this is adding missing output, or it was correcting the tail
>>>>>>>>>>> end of the output that was previously wrong.
>>>>>>>>>>
>>>>>>>>>> It adds the tail end of the trace and also avoids the crash of
>>>>>>>>>> the perf application. Without this change, perf crashes with the
>>>>>>>>>> log below:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>
>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>> <no Python frame>
>>>>>>>>>>
>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> return
>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>>>>>> get overwritten after you load it back into 'prev_ip'?
>>>>>>>>>>
>>>>>>>>>> No, the logic is the same as holding the addr of the previous packet.
>>>>>>>>>> The ip saved for the previous packet is loaded into prev_ip before
>>>>>>>>>> it is overwritten with the current packet's value.
>>>>>>>>>
>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>> previous sample. For addr, we return on the first None; with
>>>>>>>>> your change we now "pretend" that the second one is also the previous
>>>>>>>>> one:
>>>>>>>>>
>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>
>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>>>>>
>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
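The effect described here shows up when the first few samples are replayed through the patched bookkeeping. A minimal sketch that keeps only the `cpu_data` handling from the patch; the `replay_prev_ip` wrapper is hypothetical, the rest of `process_event` is omitted, and `stop_addr = ip + 4` is assumed from the stop addresses in the log.

```python
def replay_prev_ip(samples, cpu=1):
    """Replay (ip, addr) samples through the patched cpu_data logic.

    Returns the prev_ip seen by each sample, or None for a sample that
    returns early.
    """
    cpu_data = {}
    seen = []
    for ip, addr in samples:
        if cpu_data.get(str(cpu) + 'addr') is None:
            cpu_data[str(cpu) + 'addr'] = addr
            seen.append(None)               # sample 0 returns here
            continue
        if cpu_data.get(str(cpu) + 'ip') is None:
            cpu_data[str(cpu) + 'ip'] = ip   # sample 1 stores its *own* ip
        prev_ip = cpu_data[str(cpu) + 'ip']
        stop_addr = ip + 4
        cpu_data[str(cpu) + 'addr'] = addr
        cpu_data[str(cpu) + 'ip'] = stop_addr
        seen.append(prev_ip)
    return seen

print([None if v is None else hex(v)
       for v in replay_prev_ip([(0x100, 0x200), (0x210, 0x300), (0x310, 0x400)])])
# [None, '0x210', '0x214']
```

Sample 1 sees its own ip (0x210) as `prev_ip`; only from sample 2 onwards does `prev_ip` hold the previous sample's stop address.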
>>>>>>>>
>>>>>>>> Yes, it is a dummy value for the first packet. I added it anticipating
>>>>>>>> that we won't hit the discontinuity on the first packet itself.
>>>>>>>>
>>>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>>>
>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> return
>>>>>>>>
>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>
>>>>>>>> if (options.verbose == True):
>>>>>>>> print("Event type: %s" % name)
>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>
>>>>>>>> # Record for previous sample packet
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>
>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>> return
>>>>>>>>
>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>> + start_addr = prev_ip
>>>>>>>> +
>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>> return
>>>>>>>>
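In the alternative above, `prev_ip` is only assigned when a saved value exists, so on the first sample that gets past the early return it is unbound. If that sample also has `stop_addr < start_addr`, evaluating `prev_ip != 0` raises `UnboundLocalError` (when the first comparison is false, short-circuit evaluation hides the problem). A minimal sketch of the same idea with an explicit default; `pick_start` is a hypothetical helper that takes the per-CPU dict and returns the start address to use.

```python
def pick_start(cpu_data, cpu, start_addr, stop_addr):
    """Return the start address to disassemble from, falling back to the
    previous packet's stop address only when one has been recorded."""
    prev_ip = cpu_data.get(str(cpu) + 'ip')  # None until a range has been recorded
    if stop_addr < start_addr and prev_ip is not None:
        # Continuity broken: resume after the previous packet's range.
        return prev_ip
    return start_addr

print(hex(pick_start({'1ip': 0xffff80008125a8e8}, 1,
                     0xffff80008125b758, 0xffff80008125a934)))
# 0xffff80008125a8e8
```

With no recorded stop address the inverted range is passed through unchanged, so a range check before calling objdump is still needed in that case.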
>>>>>>>> Below is the failure log without this patch (perf aborts and dumps
>>>>>>>> core), for reference.
>>>>>>>>
>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>> Traceback (most recent call last):
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>> Python runtime state: initialized
>>>>>>>>
>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>> <no Python frame>
>>>>>>>>
>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>> Aborted (core dumped)
>>>>>>>>
>>>>>>>>
>>>>>>>> dump snippet:
>>>>>>>> ============
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>
>>>>>>>>
>>>>>>>> With fix:
>>>>>>>> =========
>>>>>>>>
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>
>>>>>>> It looks like the disassembly now assumes this BLR wasn't taken.
>>>>>>> We go from ffff80008125a8e4 straight through to ...
>>>>>>>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>>
>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>> discontinuity without branch samples being generated where either
>>>>>>> the source or destination address is 0?
>>>>>>>
>>>>>>> What are your record options to create this issue? As I mentioned
>>>>>>> in the previous reply I haven't been able to reproduce it.
>>>>>>
>>>>>> I am using the perf record command below:
>>>>>>
>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>
>>>>>
>>>>> Thanks, I managed to reproduce it. I'll take a look to see if I think
>>>>> the issue is somewhere else.
>>>>>
>>>>
>>>> At least for the failures I encountered, the issue is due to the
>>>> alternatives runtime instruction patching mechanism. vmlinux ends up
>>>> being the wrong image to decode with because a load of branches are
>>>> actually turned into nops.
>>>>
>>>> Can you confirm whether you still get failures if you use --kcore
>>>> instead of vmlinux:
>>>>
>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>> dd if=/dev/zero of=/dev/null
>>>>
>>>> perf script -i <output-folder.data> \
>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>
>>>> But I still think bad decode detection should be moved as much as
>>>> possible into OpenCSD and Perf rather than this script. Otherwise
>>>> every tool will have to re-implement it, and OpenCSD has a lot more
>>>> info to make decisions with.
>>>>
>>>> One change we can make is to desynchronize when an N atom is an
>>>> unconditional branch:
>>>
>>> There's a CPU hardware erratum affecting multiple CPU types and
>>> generations (including Neoverse N1 and V1), where a branch to the next
>>> instruction will be traced as an N atom regardless of whether it's
>>> unconditional, taken conditional, indirect etc. This was detected by a
>>> similar check in one of our other ETM decoders and we root-caused it
>>> to incorrect ETM implementation.
>>>
>>> The safe check for current silicon is that it's an unconditional
>>> branch that is direct and whose target is not the next instruction.
>>>
>>> You can't infer that an N atom on an unconditional indirect branch is
>>> a synchronization error, since it may have actually branched to the
>>> next instruction, e.g. in a switch-like construction.
>>>
>>> Maybe OpenCSD could make the stricter check (as written below)
>>> configurable so you could enable it if you knew for sure that the
>>> trace wasn't affected by this erratum, but that's not a safe default.
>>>
>>> Al
>>>
>>>
>>
>> That's good to know. In that case it would be better to exclude branches to the
>> next instruction from the check rather than make it configurable. At least for
>> direct branches, that way it "just works".
>
> There's a tradeoff between "just working around" these particular
> buggy ETM implementations, and early detection of sync errors.
> If you knew that your CPU didn't have the bug, you may prefer to
> go for early detection of sync errors. There are use cases for trace
> besides generating AutoFDO profiles, where accuracy is more critical,
> and where it's better to fail early rather than trace an incorrect code
> path. You'd probably also want the tighter check if you were testing
> OpenCSD against your own ETM/ETE implementation.
>
> Up to you whether to put in that configurability into OpenCSD.
> For Linux perf, going with just the more tolerant check is probably fine.
>
>> Indirect looks a bit more complicated because you have to wait for the address,
>> but I'm sure it can be done one way or another.
>
> If the indirect branch is traced as an N atom, it doesn't generate an
> address packet, that's part of the problem.
>
> Al
>
>
Ah ok, in that case I'd propose only adding the check for direct
branches. Then any issues with indirect ones are deferred until some
other decode issue later. That gives some improvement to earlier
detection but without having to add extra configuration.
I just feel that it would be hard for people to know when to use extra
configuration, or we'd have to maintain lists of affected devices, etc.
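The rule converged on above (treat an N atom as lost sync only for a direct, unconditional branch whose target is not the next instruction) is small enough to model directly. The Python below is an illustrative sketch, not OpenCSD code: `BranchInfo` and `n_atom_implies_lost_sync` are hypothetical names, and a real check would live in the ETMv4 decoder's atom handling, as in the OpenCSD diff quoted in this thread.

```python
from dataclasses import dataclass
from typing import Optional

AARCH64_INSN_SIZE = 4  # A64 instructions are always 4 bytes


@dataclass
class BranchInfo:
    """Hypothetical, simplified view of a decoded branch instruction."""
    addr: int               # address of the branch instruction
    is_conditional: bool    # e.g. b.cond, cbz, tbz
    is_indirect: bool       # e.g. br, blr, ret
    target: Optional[int]   # encoded target for direct branches, None otherwise


def n_atom_implies_lost_sync(br: BranchInfo) -> bool:
    """Return True only when an N (not taken) atom on `br` cannot be legitimate.

    Conditional branches may simply not be taken. Indirect branches are skipped:
    on CPUs with the erratum, a branch to the next instruction is traced as N,
    and no address packet follows to show where it really went. Direct
    unconditional branches to the next instruction are skipped for the same reason.
    """
    if br.is_conditional or br.is_indirect:
        return False
    return br.target != br.addr + AARCH64_INSN_SIZE
```

With addresses from the dumps in this thread, the direct `bl` at ffff80008030cb0c (target ffff8000801bb4a8) would be flagged if it were traced as N, whereas the indirect `blr x0` at ffff80008125a8e4 would not.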
>>
>>>>
>>>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> index c557998..3eefd5d 100644
>>>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>>> // save recorded next instuction address
>>>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>>>
>>>> + // must have lost sync if an unconditional branch wasn't taken
>>>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>>>> + m_need_addr = true;
>>>> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
>>>> + // wait for next address
>>>> + return OCSD_OK;
>>>> + }
>>>> +
>>>>
>>>> Another one we can spot is when a new address comes that is before
>>>> the current decode address (basically the backwards check that you added).
>>>>
>>>> There are probably others that can be spotted like an address
>>>> appearing after a direct branch that doesn't match the branch target.
>>>>
>>>> I think at that point, desynchronising should cause the disassembly
>>>> script to throw away the last bit, rather than force it to be printed
>>>> as in this patch. As I mentioned above in the thread, it leads to
>>>> printing disassembly that's implausible and misleading (where an
>>>> unconditional branch wasn't taken).
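A minimal sketch of the script-side behaviour suggested in the paragraph above: compute the executed range, and drop it when it runs backwards instead of forcing it out. It assumes, as the `--stop-address` value in the tracebacks in this thread indicates, that the script disassembles from the previous sample's `addr` to the current sample's `ip + 4`. `executed_range` and `direct_target_mismatch` are hypothetical helpers, not functions in arm-cs-trace-disasm.py.

```python
from typing import Optional, Tuple

AARCH64_INSN_SIZE = 4


def executed_range(prev_branch_target: int, branch_ip: int) -> Optional[Tuple[int, int]]:
    """Return the [start, stop) range executed between two branch samples.

    Execution is assumed to fall through sequentially from the previous branch
    target up to and including the current branch instruction. If stop is not
    after start, the trace cannot be explained without a missing branch, so the
    range is dropped (None) rather than patched up and printed.
    """
    start = prev_branch_target
    stop = branch_ip + AARCH64_INSN_SIZE
    if stop <= start:
        return None
    return start, stop


def direct_target_mismatch(encoded_target: Optional[int], next_trace_addr: int) -> bool:
    """True if a direct branch's encoded target disagrees with the next traced address."""
    return encoded_target is not None and encoded_target != next_trace_addr
```

For the failing sample in this thread (previous `addr` 0xffff80008125b758, `ip` 0xffff80008125a930), `executed_range` returns None, so nothing is disassembled for that step.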
>>>> _______________________________________________
>>>> CoreSight mailing list -- coresight@lists.linaro.org
>>>> To unsubscribe send an email to coresight-leave@lists.linaro.org
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-01 10:00 ` James Clark
2024-08-01 10:28 ` Al Grant
@ 2024-08-05 12:22 ` Ganapatrao Kulkarni
2024-08-05 13:59 ` James Clark
1 sibling, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-05 12:22 UTC
To: James Clark, Mike Leach
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan, Al.Grant
On 01-08-2024 03:30 pm, James Clark wrote:
>
>
> On 24/07/2024 3:45 pm, James Clark wrote:
>>
>>
>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>
>>>
>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>
>>>>
>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>
>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>> Hi James,
>>>>>>>
>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>> To generate the instruction trace, the script uses the address range
>>>>>>>>> spanned by two contiguous packets. If there is a continuity break due
>>>>>>>>> to a discontiguous branch address, tracing must be reset and restarted
>>>>>>>>> with the new set of contiguous packets.
>>>>>>>>>
>>>>>>>>> This change identifies the break, completes the remaining tracing of
>>>>>>>>> the current packets, and restarts tracing from the new set of packets
>>>>>>>>> once continuity is established.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Ganapatrao,
>>>>>>>>
>>>>>>>> Can you add a before and after example of what's changed to the
>>>>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>>>>> adding missing output, or it was correcting the tail end of the
>>>>>>>> output that was previously wrong.
>>>>>>>
>>>>>>> It adds the tail end of the trace and also avoids the crash of
>>>>>>> the perf application. Without this change, perf crashes with the
>>>>>>> log below:
>>>>>>>
>>>>>>>
>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>> Python runtime state: initialized
>>>>>>>
>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>   <no Python frame>
>>>>>>>
>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>> Aborted (core dumped)
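For the crash itself, one hedged option is to validate the range before calling objdump and to stop a failing objdump call from killing the perf script handler. The function below is a hypothetical standalone variant of the script's read_disam(); the objdump arguments mirror the command shown in the CalledProcessError above, and everything else is an assumption.

```python
import subprocess
from typing import List


def read_disasm_checked(objdump: str, image: str, start: int, stop: int) -> List[str]:
    """Hypothetical defensive variant of read_disam().

    Returns objdump's output lines, or an empty list if the range is empty or
    reversed, or if objdump cannot be run or exits with an error, so that one
    bad range does not abort the whole perf script run.
    """
    if stop <= start:
        # objdump would fail with "the stop address should be after the start address"
        return []
    cmd = [objdump, "-d", "-z",
           "--start-address=0x%x" % start,
           "--stop-address=0x%x" % stop,
           image]
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError: objdump ran but failed; OSError: it could not be started
        return []
    return out.decode("utf-8").split("\n")
```

Returning an empty list keeps perf running but also hides the discontinuity; whether to print a warning instead, or to drop the range earlier, is a policy choice.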
>>>>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>> ---
>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>> return
>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>> +
>>>>>>>>
>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>>> get overwritten after you load it back into 'prev_ip'?
>>>>>>>
>>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>>> packet: the ip saved by the previous packet is copied into prev_ip
>>>>>>> before it is overwritten with the current packet's value.
>>>>>>
>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>> previous sample. For addr, we return on the first None; with your
>>>>>> change we now "pretend" that the second one is also the previous one:
>>>>>>
>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>> return <----------------------------sample 0 return
>>>>>>
>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>>
>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
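To make this point concrete, here is a reduced Python model of the posted patch's per-CPU bookkeeping. It keeps only the diff lines that touch `cpu_data`, omits everything else in process_event(), uses `is None` in place of `== None`, and assumes stop_addr = ip + 4 (matching the `--stop-address` values in the tracebacks). The function name `process` is hypothetical.

```python
cpu_data = {}


def process(cpu, ip, addr):
    """Reduced model of the posted patch's state handling in process_event().

    Returns the prev_ip value the patch would use for this sample, or None for
    the first sample, which returns early before prev_ip is computed.
    """
    if cpu_data.get(str(cpu) + 'addr') is None:
        cpu_data[str(cpu) + 'addr'] = addr
        return None                          # sample 0 returns here
    if cpu_data.get(str(cpu) + 'ip') is None:
        cpu_data[str(cpu) + 'ip'] = ip       # sample 1 stores its own ip ...
    prev_ip = cpu_data[str(cpu) + 'ip']      # ... and immediately reads it back
    stop_addr = ip + 4
    cpu_data[str(cpu) + 'addr'] = addr
    cpu_data[str(cpu) + 'ip'] = stop_addr    # later samples see the previous stop_addr
    return prev_ip
```

Calling it three times for one CPU returns None, then the second sample's own ip, and only from the third sample onwards the previous sample's stop address.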
>>>>>
>>>>> Yes, it is a dummy value for the first packet. I added it anticipating
>>>>> that we won't hit the discontinuity on the first packet itself.
>>>>>
>>>>> Can this be changed to something more intuitive, like below?
>>>>>
>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>> return
>>>>>
>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>
>>>>> if (options.verbose == True):
>>>>> print("Event type: %s" % name)
>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>
>>>>> # Record for previous sample packet
>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>
>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>> return
>>>>>
>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>> + start_addr = prev_ip
>>>>> +
>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>> return
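One detail in the revised diff: `prev_ip` is only assigned when the `'ip'` key already exists, so on the second sample (before that key has been written) a reversed range would reach `prev_ip != 0` with `prev_ip` unbound and raise UnboundLocalError. A sketch of the same check with the variable always bound, written as a hypothetical standalone helper `choose_start_addr`:

```python
def choose_start_addr(cpu_data: dict, cpu: int, start_addr: int, stop_addr: int) -> int:
    """Hypothetical helper mirroring the revised check, with prev_ip always bound.

    dict.get(..., 0) yields 0 when no previous packet has been recorded, which the
    existing `prev_ip != 0` guard already treats as "no fallback available".
    """
    prev_ip = cpu_data.get(str(cpu) + 'ip', 0)
    if stop_addr < start_addr and prev_ip != 0:
        # Continuity broken: fall back to the previous packet's stop address
        return prev_ip
    return start_addr
```

This only removes the crash path; it does not address the concern raised elsewhere in this thread that the fallback itself can print implausible disassembly.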
>>>>>
>>>>> For reference, below is the failure log (with the crash) without this
>>>>> patch.
>>>>>
>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>> objdump: error: the stop address should be after the start address
>>>>> Traceback (most recent call last):
>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>     raise CalledProcessError(retcode, process.args,
>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>> Python runtime state: initialized
>>>>>
>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>   <no Python frame>
>>>>>
>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>> Aborted (core dumped)
>>>>>
>>>>>
>>>>> dump snippet:
>>>>> ============
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff80008030cb00 <local_clock>:
>>>>> ffff80008030cb00: d503233f paciasp
>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>> ffff80008125a8a8: d503233f paciasp
>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>
>>>>>
>>>>> With fix:
>>>>> =========
>>>>>
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff80008030cb00 <local_clock>:
>>>>> ffff80008030cb00: d503233f paciasp
>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>> ffff80008125a8a8: d503233f paciasp
>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>
>>>> It looks like the disassembly now assumes this BLR wasn't taken. We
>>>> go from ffff80008125a8e4 straight through to ...
>>>>
>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>> Event type: branches
>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>
>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your issue
>>>> actually a decode issue in Perf itself? Why is there a discontinuity
>>>> without branch samples being generated where either the source or
>>>> destination address is 0?
>>>>
>>>> What are your record options to create this issue? As I mentioned in
>>>> the previous reply I haven't been able to reproduce it.
>>>
>>> I am using the perf record command below:
>>>
>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>
>>
>> Thanks, I managed to reproduce it. I'll take a look to see if I think
>> the issue is somewhere else.
>>
>
> At least for the failures I encountered, the issue is due to the
> alternatives runtime instruction patching mechanism. vmlinux ends up
> being the wrong image to decode with because a load of branches are
> actually turned into nops.
>
> Can you confirm whether you still get failures if you use --kcore
> instead of vmlinux:
>
> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
> dd if=/dev/zero of=/dev/null
>
> perf script -i <output-folder.data> \
> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
> -k <output-folder.data>/kcore_dir/kcore
>
With the command combination below, the issue is also seen with kcore,
as reported earlier in this email chain:

timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
dd if=/dev/zero of=/dev/null
./perf script -i kcore/data \
--script=python:./scripts/python/arm-cs-trace-disasm.py -- \
-d objdump -k kcore/kcore_dir/kcore

However, with the sequence below (the same as your command), the issue
is *not* seen:

timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
dd if=/dev/zero of=/dev/null
./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
-- -d objdump -k kcore/kcore_dir/kcore

Do you see any issue with the command that shows the problem? The
output logs produced by the two commands are also different.

The OpenCSD diff you shared (quoted below) has no effect on the failing
case.
> But I still think bad decode detection should be moved as much as
> possible into OpenCSD and Perf rather than this script. Otherwise every
> tool will have to re-implement it, and OpenCSD has a lot more info to
> make decisions with.
>
> One change we can make is to desynchronize when an N atom is an
> unconditional branch:
>
> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> index c557998..3eefd5d 100644
> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
> // save recorded next instuction address
> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>
> + // must have lost sync if an unconditional branch wasn't taken
> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
> + m_need_addr = true;
> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
> + // wait for next address
> + return OCSD_OK;
> + }
> +
>
> Another one we can spot is when a new address comes that is before the
> current decode address (basically the backwards check that you added).
>
> There are probably others that can be spotted like an address appearing
> after a direct branch that doesn't match the branch target.
>
> I think at that point, desynchronising should cause the disassembly
> script to throw away the last bit, rather than force it to be printed as
> in this patch. As I mentioned above in the thread, it leads to printing
> disassembly that's implausible and misleading (where an unconditional
> branch wasn't taken).
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-05 12:22 ` Ganapatrao Kulkarni
@ 2024-08-05 13:59 ` James Clark
2024-08-06 7:02 ` Ganapatrao Kulkarni
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-05 13:59 UTC
To: Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan, Al.Grant,
Mike Leach
On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>
>
> On 01-08-2024 03:30 pm, James Clark wrote:
>>
>>
>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>
>>>
>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>> To generate the instruction trace, the script uses the address range
>>>>>>>>>> spanned by two contiguous packets. If there is a continuity break due
>>>>>>>>>> to a discontiguous branch address, tracing must be reset and restarted
>>>>>>>>>> with the new set of contiguous packets.
>>>>>>>>>>
>>>>>>>>>> This change identifies the break, completes the remaining tracing of
>>>>>>>>>> the current packets, and restarts tracing from the new set of packets
>>>>>>>>>> once continuity is established.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>
>>>>>>>>> Can you add a before and after example of what's changed to the
>>>>>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>>>>>> adding missing output, or it was correcting the tail end of the
>>>>>>>>> output that was previously wrong.
>>>>>>>>
>>>>>>>> It adds the tail end of the trace and also avoids the crash of
>>>>>>>> the perf application. Without this change, perf crashes with the
>>>>>>>> log below:
>>>>>>>>
>>>>>>>>
>>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>> Python runtime state: initialized
>>>>>>>>
>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>   <no Python frame>
>>>>>>>>
>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>> Aborted (core dumped)
>>>>>>>>
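For illustration, a minimal guard in Python that the script could apply before calling objdump. It builds the same argument list seen in the failing command above, but declines to build a command for a backwards (or empty) range. The helper name objdump_argv() is hypothetical and is not part of arm-cs-trace-disasm.py.

```python
def objdump_argv(image, start_addr, stop_addr, objdump="objdump"):
    """Build an objdump command line for [start_addr, stop_addr), or return None.

    Hypothetical helper: objdump rejects a stop address below the start
    address ("the stop address should be after the start address"), and an
    empty range would disassemble nothing, so both cases return None.
    """
    if stop_addr <= start_addr:
        return None
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            image]


# The range from the failing command: stop is below start, so no command is built.
print(objdump_argv("../../vmlinux", 0xffff80008125b758, 0xffff80008125a934))
# A forward range yields the same argument layout as in the traceback.
print(objdump_argv("../../vmlinux", 0xffff80008125a8e8, 0xffff80008125a934))
```

Whether such a range should simply be skipped, or given a substitute start address as this patch does, is the question the rest of the thread debates.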
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>> ---
>>>>>>>>>>  tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>> return
>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>>>> get overwritten after you load it back into 'prev_ip'?
>>>>>>>>
>>>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>>>> packet: the ip saved for the previous packet is loaded into
>>>>>>>> prev_ip before it is overwritten with the current packet's value.
>>>>>>>
>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>> previous sample. For addr, we return on the first None; with your
>>>>>>> change, we now "pretend" that the second one is also the previous
>>>>>>> one:
>>>>>>>
>>>>>>>     if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>         cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>         return  <---------------------------- sample 0 return
>>>>>>>
>>>>>>>     if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>         cpu_data[str(cpu) + 'ip'] = ip  <---- sample 1 save but no return
>>>>>>>
>>>>>>> Then for sample 1, 'prev_ip' is actually now the 'current' IP:
>>>>>>
>>>>>> Yes, it is a dummy value for the first packet. I added it anticipating
>>>>>> that we won't hit the discontinuity on the first packet itself.
>>>>>>
>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>          cpu_data[str(cpu) + 'addr'] = addr
>>>>>>          return
>>>>>>
>>>>>> +    if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>> +        prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>
>>>>>>      if (options.verbose == True):
>>>>>>          print("Event type: %s" % name)
>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>
>>>>>>      # Record for previous sample packet
>>>>>>      cpu_data[str(cpu) + 'addr'] = addr
>>>>>> +    cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>
>>>>>>      # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>      if (start_addr == 0 and stop_addr == 4):
>>>>>>          print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>          return
>>>>>>
>>>>>> +    if (stop_addr < start_addr and prev_ip != 0):
>>>>>> +        # Continuity of the Packets broken, set start_addr to previous
>>>>>> +        # packet ip to complete the remaining tracing of the address range.
>>>>>> +        start_addr = prev_ip
>>>>>> +
>>>>>>      if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>          print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>          return
>>>>>>
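The revised logic can be modelled per CPU as a small pure function, which makes the first-sample behaviour James points out easy to check. This is a sketch, not the script itself: samples are simplified to hypothetical (ip, addr) tuples, the state dict stands in for the 'addr'/'ip' entries of cpu_data, and the final "skip if still backwards" step is an assumption added here. Note also that in the diff as posted, prev_ip is only assigned when an 'ip' entry already exists, so a backwards range before that point would raise UnboundLocalError; the sketch uses None explicitly instead.

```python
def disasm_ranges(samples):
    """Return the (start_addr, stop_addr) ranges the revised diff would disassemble.

    samples: (ip, addr) pairs for one CPU, in trace order (simplified input).
    """
    state = {}                      # stands in for cpu_data's 'addr'/'ip' keys
    ranges = []
    for ip, addr in samples:
        if state.get("addr") is None:
            state["addr"] = addr    # first sample only primes the state
            continue
        prev_ip = state.get("ip")   # stop address of the previous range, if any
        start_addr = state["addr"]  # where the previous branch landed
        stop_addr = ip + 4          # include the branch at the current ip
        state["addr"] = addr
        state["ip"] = stop_addr
        if start_addr == 0 and stop_addr == 4:
            continue                # CS_ETM_TRACE_ON packet
        if stop_addr < start_addr and prev_ip:
            start_addr = prev_ip    # the patch's fallback (skipped if None or 0)
        if stop_addr <= start_addr:
            continue                # assumption: skip a range that is still invalid
        ranges.append((start_addr, stop_addr))
    return ranges


# The last three samples of the failing dump, as (ip, addr).
samples = [
    (0xffff8000801bb4c8, 0xffff80008125a8a8),
    (0xffff80008125a8e4, 0xffff80008125b758),
    (0xffff80008125a930, 0xffff8000801bb4cc),
]
for start, stop in disasm_ranges(samples):
    print(hex(start), hex(stop))
```

With these inputs, the second range starts at 0xffff80008125a8e8, the previous stop address. This is the "BLR not taken" output that James questions later in the thread.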
>>>>>> Without this patch, below is the failure log (with segfault) for
>>>>>> reference.
>>>>>>
>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>> objdump: error: the stop address should be after the start address
>>>>>> Traceback (most recent call last):
>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>> Python runtime state: initialized
>>>>>>
>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>   <no Python frame>
>>>>>>
>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>>
>>>>>> dump snippet:
>>>>>> ============
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff80008030cb00 <local_clock>:
>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>
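The bad range can be worked out by hand from the last two samples of this dump. Assuming the script takes start_addr from the previous sample's addr and sets stop_addr to the current sample's ip + 4 (which matches the --start-address and --stop-address values in the failing objdump command earlier in this message), the arithmetic is:

```python
# Values copied from the last two branch samples in the dump.
prev_addr = 0xffff80008125b758  # previous sample: branch target (addr)
prev_ip = 0xffff80008125a8e4    # previous sample: branch source (ip)
cur_ip = 0xffff80008125a930     # current sample: branch source (ip)

start_addr = prev_addr          # disassembly would start at the previous target
stop_addr = cur_ip + 4          # +4 so objdump includes the instruction at cur_ip
print(hex(start_addr), hex(stop_addr), stop_addr > start_addr)

# The proposed fix instead restarts from the previous range's stop address.
fallback_start = prev_ip + 4
print(hex(fallback_start))
```

0xffff80008125a8e8 is where the "With fix" output resumes (sched_clock_noinstr+0x40), immediately after the BLR at 0xffff80008125a8e4.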
>>>>>>
>>>>>> With fix:
>>>>>> =========
>>>>>>
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff80008030cb00 <local_clock>:
>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>
>>>>> It looks like the disassembly now assumes this BLR wasn't taken. We
>>>>> go from ffff80008125a8e4 straight through to ...
>>>>>
>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>
>>>>> ffff80008125a8e8, which is just the previous one + 4. Isn't your
>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>> discontinuity without branch samples being generated where either
>>>>> the source or destination address is 0?
>>>>>
>>>>> What are your record options to create this issue? As I mentioned
>>>>> in the previous reply I haven't been able to reproduce it.
>>>>
>>>> I am using the perf record command below.
>>>>
>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>
>>>
>>> Thanks I managed to reproduce it. I'll take a look to see if I think
>>> the issue is somewhere else.
>>>
>>
>> At least for the failures I encountered, the issue is due to the
>> alternatives runtime instruction patching mechanism. vmlinux ends up
>> being the wrong image to decode with because a load of branches are
>> actually turned into nops.
>>
>> Can you confirm whether you still get failures if you use --kcore
>> instead of vmlinux:
>>
>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>> dd if=/dev/zero of=/dev/null
>>
>> perf script -i <output-folder.data> \
>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>> -k <output-folder.data>/kcore_dir/kcore
>>
>
> The issue is also seen with kcore, using the command combination below,
> as reported in this email chain.
>
> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
> dd if=/dev/zero of=/dev/null
>
> ./perf script -i kcore/data \
> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
> -d objdump -k kcore/kcore_dir/kcore
>
>
> However, with the sequence below (same as your command), the issue is *not* seen.
>
> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
> dd if=/dev/zero of=/dev/null
>
> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
> -- -d objdump -k kcore/kcore_dir/kcore
>
> Do you see any issue with the command that shows the problem?
> Also, the output logs produced by these two commands are different.
>
Double-check the command I gave. "-i" needs to be the same as "-o" (it's
the folder, not the data file). I think this could be causing your
issue. Unless you give it the folder, it doesn't open kcore along with
the data file.
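A small sketch of the invocation described above, in which -i is given the folder created by perf record -o (not the data file inside it) and -k points at the kcore copied into that folder. The helper perf_script_argv() and its argument names are hypothetical; only the perf options already used in the commands in this thread appear in it.

```python
from pathlib import Path


def perf_script_argv(output_dir, script, objdump="llvm-objdump"):
    """Assemble a 'perf script' command for a trace recorded with --kcore.

    Hypothetical helper. Per the thread, -i must name the folder passed to
    'perf record -o'; otherwise kcore is not opened along with the data.
    """
    out = Path(output_dir)
    if out.is_file():
        # e.g. "kcore/data": the data file inside the folder, not the folder
        raise ValueError("%s is a file; pass the perf record -o folder" % out)
    kcore = out / "kcore_dir" / "kcore"
    return ["perf", "script", "-i", str(out), str(script),
            "-d", objdump, "-k", str(kcore)]


print(" ".join(perf_script_argv(
    "kcore", "tools/perf/scripts/python/arm-cs-trace-disasm.py")))
```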
> The diff below that you shared has no effect on the failing case.
>
>> But I still think bad decode detection should be moved as much as
>> possible into OpenCSD and Perf rather than this script. Otherwise
>> every tool will have to re-implement it, and OpenCSD has a lot more
>> info to make decisions with.
>>
>> One change we can make is to desynchronize when an N atom is reported
>> for an unconditional branch:
>>
>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> index c557998..3eefd5d 100644
>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>      // save recorded next instuction address
>>      ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>
>> +    // must have lost sync if an unconditional branch wasn't taken
>> +    if (atom == ATOM_N && !m_instr_info.is_conditional) {
>> +        m_need_addr = true;
>> +        m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
>> +        // wait for next address
>> +        return OCSD_OK;
>> +    }
>> +
>>
>> Another one we can spot is when a new address arrives that is before the
>> current decode address (basically the backwards check that you added).
>>
>> There are probably others that can be spotted like an address
>> appearing after a direct branch that doesn't match the branch target.
>>
>> I think at that point, desynchronising should cause the disassembly
>> script to throw away the last bit, rather than force it to be printed
>> as in this patch. As I mentioned above in the thread, it leads to
>> printing disassembly that's implausible and misleading (where an
>> unconditional branch wasn't taken).
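The three plausibility checks listed above can be sketched in Python to make them concrete. This is illustrative only and is not OpenCSD's C++ API: Waypoint and desync_reason() are hypothetical, atoms are modelled as the strings "E" and "N", and how reliable each check is in practice (for example, whether the backwards check can misfire on legitimate backward branches) is not settled in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Waypoint:
    """Hypothetical, simplified view of the last decoded branch instruction."""
    addr: int
    conditional: bool
    direct_target: Optional[int] = None   # only known for direct branches


def desync_reason(wp, atom=None, new_addr=None, decode_addr=None):
    """Return why the trace looks implausible, or None if nothing was spotted.

    atom is "E" (taken) or "N" (not taken); new_addr is an address that has
    just arrived; decode_addr is where decode had got to.
    """
    # 1. An unconditional branch can never be "not taken".
    if atom == "N" and not wp.conditional:
        return "N atom on unconditional branch"
    if new_addr is not None:
        # 2. The backwards check: a new address before the current decode point.
        if decode_addr is not None and new_addr < decode_addr:
            return "address went backwards"
        # 3. An address after a direct branch that is not that branch's target.
        if wp.direct_target is not None and new_addr != wp.direct_target:
            return "address does not match direct branch target"
    return None


# The BLR at 0xffff80008125a8e4 is unconditional, so an N atom there means lost sync.
blr = Waypoint(addr=0xffff80008125a8e4, conditional=False)
print(desync_reason(blr, atom="N"))
```

On a non-None result, a consumer such as the disassembly script would drop the pending range rather than print it, which is the behaviour suggested here instead of forcing the tail end out.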
>
> Thanks,
> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-05 13:59 ` James Clark
@ 2024-08-06 7:02 ` Ganapatrao Kulkarni
2024-08-06 9:47 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-06 7:02 UTC (permalink / raw)
To: James Clark
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan, Al.Grant,
Mike Leach
On 05-08-2024 07:29 pm, James Clark wrote:
>
>
> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>
>>
>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>
>>>
>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>
>>>>
>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>
>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>
>>>>>>>>> Hi James,
>>>>>>>>>
>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>> To generate the instruction tracing, script uses 2 contiguous
>>>>>>>>>>> packets
>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>> discontiguous branch
>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>> tracing with the
>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>
>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>> remaining tracing
>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>> packets, if
>>>>>>>>>>> continuity is established.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>
>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>> this is adding missing output, or it was correcting the tail
>>>>>>>>>> end of the output that was previously wrong.
>>>>>>>>>
>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>> segfault of the perf application. With out this change the perf
>>>>>>>>> segfaults with as below log
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ./perf script
>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>>>>> process_event
>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>>>>> print_disam
>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>>> stop_addr):
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>>>>> read_disam
>>>>>>>>> disasm_output =
>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>> check_output
>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>> check=True,
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>>>>> '--start-address=0xffff80008125b758',
>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>>>>> non-zero exit status 1.
>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>> event handler
>>>>>>>>> Python runtime state: initialized
>>>>>>>>>
>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>> <no Python frame>
>>>>>>>>>
>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>> Aborted (core dumped)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>> ---
>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>> ++++++++++
>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>> return
>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it
>>>>>>>>>> get overwritten after you load it back into 'prev_ip'
>>>>>>>>>
>>>>>>>>> No, the logic is same as holding the addr of previous packet.
>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>> overwriting with the current packet.
>>>>>>>>
>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>> previous one:
>>>>>>>>
>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>
>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no
>>>>>>>> return
>>>>>>>>
>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>
>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>
>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>
>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>> return
>>>>>>>
>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>
>>>>>>> if (options.verbose == True):
>>>>>>> print("Event type: %s" % name)
>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>
>>>>>>> # Record for previous sample packet
>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>
>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>> stop_addr=4
>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>> inserted" % cpu)
>>>>>>> return
>>>>>>>
>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>> + # Continuity of the Packets broken, set
>>>>>>> start_addr to previous
>>>>>>> + # packet ip to complete the remaining tracing of
>>>>>>> the address range.
>>>>>>> + start_addr = prev_ip
>>>>>>> +
>>>>>>> if (start_addr < int(dso_start) or start_addr >
>>>>>>> int(dso_end)):
>>>>>>> print("Start address 0x%x is out of range [ 0x%x
>>>>>>> .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>> int(dso_end), dso))
>>>>>>> return
>>>>>>>
>>>>>>> Without this patch below is the failure log(with segfault) for
>>>>>>> reference.
>>>>>>>
>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm//
>>>>>>> -C 1 dd if=/dev/zero of=/dev/null
>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>> [root@sut01sys-r214 perf]# ./perf script
>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>> Traceback (most recent call last):
>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>>> process_event
>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>>> print_disam
>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>> stop_addr):
>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>>> read_disam
>>>>>>> disasm_output =
>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>> check_output
>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>> check=True,
>>>>>>>
>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>>> '--start-address=0xffff80008125b758',
>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>>> non-zero exit status 1.
>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>> event handler
>>>>>>> Python runtime state: initialized
>>>>>>>
>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>> <no Python frame>
>>>>>>>
>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>> Aborted (core dumped)
>>>>>>>
>>>>>>>
>>>>>>> dump snippet:
>>>>>>> ============
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>> ffff800080313f04: 36100094 tbz w20, #2,
>>>>>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>> [x21, #968]
>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> __perf_event_header__init_id+0x54
>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>> event->clock();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>> ffff80008030cb04: a9bf7bfd stp x29,
>>>>>>> x30, [sp, #-16]!
>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>> return sched_clock();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29,
>>>>>>> x30, [sp, #-32]!
>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>> ffff8000801bb4b4: a90153f3 stp x19,
>>>>>>> x20, [sp, #16]
>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>> [x19, #16]
>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0,
>>>>>>> #0x1
>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>> [x19, #16]
>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns
>>>>>>> = sched_clock_noinstr();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>>>>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>
>>>>>>>
>>>>>>> With fix:
>>>>>>> =========
>>>>>>>
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> __perf_event_header__init_id+0x54
>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>> event->clock();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>> return sched_clock();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns
>>>>>>> = sched_clock_noinstr();
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>
>>>>>> It looks like the disassembly now assumes this BLR wasn't taken.
>>>>>> We go from ffff80008125a8e4 straight through to ...
>>>>>>
>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>>>>>>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>> Event type: branches
>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>> period: 1 time: 5986372298040 }
>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>
>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>> discontinuity without branch samples being generated where either
>>>>>> the source or destination address is 0?
>>>>>>
>>>>>> What are your record options to create this issue? As I mentioned
>>>>>> in the previous reply I haven't been able to reproduce it.
>>>>>
>>>>> I am using the perf record command below.
>>>>>
>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>
>>>>
>>>> Thanks, I managed to reproduce it. I'll take a look to see if I think
>>>> the issue is somewhere else.
>>>>
>>>
>>> At least for the failures I encountered, the issue is due to the
>>> alternatives runtime instruction patching mechanism. vmlinux ends up
>>> being the wrong image to decode with because a load of branches are
>>> actually turned into nops.
>>>
>>> Can you confirm if you use --kcore instead of vmlinux that you still
>>> get failures:
>>>
>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>> dd if=/dev/zero of=/dev/null
>>>
>>> perf script -i <output-folder.data> \
>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>> -k <output-folder.data>/kcore_dir/kcore
>>>
>>
>> With the command combination below, the issue is also seen with kcore,
>> as reported in this email chain.
>>
>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>> dd if=/dev/zero of=/dev/null
>>
>> ./perf script -i kcore/data \
>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>> -d objdump -k kcore/kcore_dir/kcore
>>
>>
>> However, with the sequence below (same as your command), the issue is
>> *not* seen.
>>
>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>> dd if=/dev/zero of=/dev/null
>>
>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>> -- -d objdump -k kcore/kcore_dir/kcore
>>
>> Do you see any issue with the command that shows the problem? Also,
>> the output logs produced by the two commands are different.
>>
>
> Double check the command I gave. "-i" needs to be the same as "-o" (it's
> the folder, not the data file). I think this could be causing your
> issue. Unless you give it the folder, it doesn't open kcore along with
> the data file.
>
As per 'perf script --help':

-i, --input=
Input file name. (default: perf.data unless stdin is a fifo)

I also tried giving just the directory, as you suggested, and the result
is the same:

./perf script -i kcore \
--script=python:./scripts/python/arm-cs-trace-disasm.py -- \
-d objdump -k kcore/kcore_dir/kcore

>> The diff below that you shared has no effect on the failing case.
>>
>>> But I still think bad decode detection should be moved as much as
>>> possible into OpenCSD and Perf rather than this script. Otherwise
>>> every tool will have to re-implement it, and OpenCSD has a lot more
>>> info to make decisions with.
>>>
>>> One change we can make is to desynchronize when an N atom is an
>>> unconditional branch:
>>>
>>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>> index c557998..3eefd5d 100644
>>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>> // save recorded next instuction address
>>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>>
>>> + // must have lost sync if an unconditional branch wasn't taken
>>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>>> + m_need_addr = true;
>>> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
>>> + // wait for next address
>>> + return OCSD_OK;
>>> + }
>>> +
>>>
>>> Another one we can spot is when a new address comes that is before
>>> the current decode address (basically the backwards check that you
>>> added).
>>>
>>> There are probably others that can be spotted like an address
>>> appearing after a direct branch that doesn't match the branch target.
>>>
>>> I think at that point, desynchronising should cause the disassembly
>>> script to throw away the last bit, rather than force it to be printed
>>> as in this patch. As I mentioned above in the thread, it leads to
>>> printing disassembly that's implausible and misleading (where an
>>> unconditional branch wasn't taken).
>>
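A minimal Python sketch of the "throw away the last bit" behaviour suggested above. It assumes only what the failing objdump command line earlier in the thread shows about how the script pairs consecutive samples on a CPU: the previous sample's branch target (`addr`) starts the range and the current sample's `ip` + 4 stops it. `next_disasm_range` and its dict-based per-CPU state are hypothetical, not the script's code, and the script's other checks (CS_ETM_TRACE_ON, DSO bounds) are left out. The three samples are copied from the CPU1 dumps above.

```python
from typing import Dict, List, Optional, Tuple


def next_disasm_range(
    last_target: Dict[int, int], cpu: int, addr: int, ip: int
) -> Optional[Tuple[int, int]]:
    """Return the (start, stop) range to disassemble for one branch sample.

    Returns None when there is nothing safe to print: on the first sample
    for a CPU, or when the range runs backwards (stop < start).
    """
    start = last_target.get(cpu)
    # This sample's branch target is where the next range begins.
    last_target[cpu] = addr
    if start is None:
        return None
    # Stop just past the branch instruction (A64 instructions are 4 bytes).
    stop = ip + 4
    if stop < start:
        # Continuity broken: drop this range instead of guessing a start.
        return None
    return start, stop


# (addr, ip) of three consecutive CPU1 samples from the dumps above.
samples = [
    (0xFFFF80008125A8A8, 0xFFFF8000801BB4C8),
    (0xFFFF80008125B758, 0xFFFF80008125A8E4),
    (0xFFFF8000801BB4CC, 0xFFFF80008125A930),
]
state: Dict[int, int] = {}
ranges: List[Optional[Tuple[int, int]]] = [
    next_disasm_range(state, 1, addr, ip) for addr, ip in samples
]
for r in ranges:
    print("skip" if r is None else "disasm 0x%x..0x%x" % r)
```

Fed those samples, it prints `skip`, then the sched_clock_noinstr range 0xffff80008125a8a8..0xffff80008125a8e8, then `skip` for the backwards range (start 0xffff80008125b758, stop 0xffff80008125a934) that made objdump fail, rather than inventing a start address for it.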
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 7:02 ` Ganapatrao Kulkarni
@ 2024-08-06 9:47 ` James Clark
2024-08-06 9:57 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-06 9:47 UTC
To: Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan, Al.Grant,
Mike Leach
On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>
>
> On 05-08-2024 07:29 pm, James Clark wrote:
>>
>>
>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>
>>>
>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>
>>>>
>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>
>>>>>>>>>> Hi James,
>>>>>>>>>>
>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>> To generate the instruction tracing, the script uses the address
>>>>>>>>>>>> range of 2 contiguous packets. If there is a continuity break due
>>>>>>>>>>>> to a discontiguous branch address, the tracing must be reset and
>>>>>>>>>>>> restarted with the new set of contiguous packets.
>>>>>>>>>>>>
>>>>>>>>>>>> Add a change to identify the break, complete the remaining tracing
>>>>>>>>>>>> of the current packets, and restart tracing from the new set of
>>>>>>>>>>>> packets if continuity is established.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>
>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>> this is adding missing output, or it was correcting the tail
>>>>>>>>>>> end of the output that was previously wrong.
>>>>>>>>>>
>>>>>>>>>> It adds the tail end of the trace and also avoids the crash of the
>>>>>>>>>> perf application. Without this change, perf crashes with the log
>>>>>>>>>> below:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ./perf script \
>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>
>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>> <no Python frame>
>>>>>>>>>>
>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> return
>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'?
>>>>>>>>>>
>>>>>>>>>> No, the logic is the same as holding the addr of the previous packet:
>>>>>>>>>> the ip saved for the previous packet is loaded into prev_ip before
>>>>>>>>>> it is overwritten with the current packet's value.
>>>>>>>>>
>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>> previous one:
>>>>>>>>>
>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>     cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>     return <----------------------------sample 0 return
>>>>>>>>>
>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>     cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>>>>>
>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>
>>>>>>>> Yes, it is a dummy for the first packet, added on the assumption that
>>>>>>>> we won't hit a discontinuity on the first packet itself.
>>>>>>>>
>>>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>>>
>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> return
>>>>>>>>
>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>
>>>>>>>> if (options.verbose == True):
>>>>>>>> print("Event type: %s" % name)
>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>
>>>>>>>> # Record for previous sample packet
>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>
>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>> return
>>>>>>>>
>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>> + start_addr = prev_ip
>>>>>>>> +
>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>> return
>>>>>>>>
>>>>>>>> For reference, below is the failure log (with the crash) without this
>>>>>>>> patch.
>>>>>>>>
>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>> [root@sut01sys-r214 perf]# ./perf script \
>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>>>> '--start-address=0xffff80008125b758',
>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>>>> non-zero exit status 1.
>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>> Python runtime state: initialized
>>>>>>>>
>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>> <no Python frame>
>>>>>>>>
>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>> Aborted (core dumped)
>>>>>>>>
>>>>>>>>
>>>>>>>> dump snippet:
>>>>>>>> ============
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>> event->clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>> return sched_clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93 cyc
>>>>>>>> = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>
>>>>>>>>
>>>>>>>> With fix:
>>>>>>>> =========
>>>>>>>>
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>> event->clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>> return sched_clock();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>
>>>>>>> It looks like the disassembly now assumes this BLR wasn't taken.
>>>>>>> We go from ffff80008125a8e4 straight through to ...
>>>>>>>
>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93 cyc
>>>>>>>> = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>> Event type: branches
>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>>
>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>> discontinuity without branch samples being generated where either
>>>>>>> the source or destination address is 0?
>>>>>>>
>>>>>>> What are your record options to create this issue? As I mentioned
>>>>>>> in the previous reply I haven't been able to reproduce it.
>>>>>>
>>>>>> I am using the perf record command below.
>>>>>>
>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>
>>>>>
>>>>> Thanks, I managed to reproduce it. I'll take a look to see if I
>>>>> think the issue is somewhere else.
>>>>>
>>>>
>>>> At least for the failures I encountered, the issue is due to the
>>>> alternatives runtime instruction patching mechanism. vmlinux ends up
>>>> being the wrong image to decode with because a load of branches are
>>>> actually turned into nops.
>>>>
>>>> Can you confirm if you use --kcore instead of vmlinux that you still
>>>> get failures:
>>>>
>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>> dd if=/dev/zero of=/dev/null
>>>>
>>>> perf script -i <output-folder.data> \
>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>
>>>
>>> With the command combination below, the issue is also seen with kcore,
>>> as reported in this email chain.
>>>
>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>> dd if=/dev/zero of=/dev/null
>>>
>>> ./perf script -i kcore/data \
>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>> -d objdump -k kcore/kcore_dir/kcore
>>>
>>>
>>> However, with the sequence below (same as your command), the issue is
>>> *not* seen.
>>>
>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>> dd if=/dev/zero of=/dev/null
>>>
>>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>
>>> Do you see any issue with the command that shows the problem? Also,
>>> the output logs produced by the two commands are different.
>>>
BTW, are you running this on the target or somewhere else? It's
suspicious that "-i kcore/data" works at all, because no kernel image is
given to Perf, unless you are running on the target, in which case I
think it will just open the one from /proc. Or maybe it uses
/boot/vmlinux by default, which also wouldn't work.

Also, the difference between "--script=python:" and just giving the
script name is in how the arguments following " -- " are parsed.
Sometimes they're also parsed as Perf arguments (for example, -v becomes
Perf's verbose option and -k becomes Perf's vmlinux option rather than
the script's). I _think_ you want the " -- " when "--script" is used, and
no "--" when it's not. But there are some other combinations, and you'll
have to debug your two exact scenarios to see why they're different.

But ignoring that issue with the argument format, you mentioned you
didn't see the issue any more with one of the --kcore variants. So I'm
assuming that confirms the issue is just a decode image issue, and we
shouldn't try to patch this script?
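A toy Python model, not perf or OpenCSD code, of how decoding against a stale image desynchronises, and of the N-atom check proposed for OpenCSD in this thread. `Insn`, `walk_atoms`, the dictionary images and the addresses are made up for illustration; it assumes fixed 4-byte instructions and one E/N atom per branch, and it ignores indirect branches, address packets and exceptions, all of which real ETMv4 decode must handle.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    mnemonic: str
    is_branch: bool = False
    is_conditional: bool = False
    target: Optional[int] = None  # direct-branch target; None otherwise


def walk_atoms(
    image: Dict[int, Insn], start: int, atoms: Iterable[str]
) -> Tuple[List[int], str]:
    """Follow `image` from `start`, consuming one E/N atom per branch.

    Returns the addresses walked and a status: "ok", or "no_sync" when an
    N (not-taken) atom lands on an unconditional branch.
    """
    executed: List[int] = []
    pending = iter(atoms)
    addr: Optional[int] = start
    while addr is not None and addr in image:
        insn = image[addr]
        executed.append(addr)
        if not insn.is_branch:
            addr += 4  # non-branches fall through to the next instruction
            continue
        atom = next(pending, None)
        if atom is None:
            break  # trace exhausted
        if atom == "N" and not insn.is_conditional:
            # The image says "always branches" but the trace says "not taken":
            # the image doesn't match what actually ran.
            return executed, "no_sync"
        addr = insn.target if atom == "E" else addr + 4
    return executed, "ok"


# vmlinux still holds an unconditional branch at 0x4 ...
vmlinux = {
    0x0: Insn("mov"),
    0x4: Insn("b", is_branch=True, target=0x10),
    0x8: Insn("mov"),
    0xC: Insn("cbz", is_branch=True, is_conditional=True, target=0x14),
    0x10: Insn("mov"),
}
# ... but at runtime, alternatives patching turned it into a NOP.
runtime = {**vmlinux, 0x4: Insn("nop")}

atoms = ["N"]  # the CPU ran the NOP, then did not take the CBZ
print(walk_atoms(runtime, 0x0, atoms))  # ([0, 4, 8, 12, 16], 'ok')
print(walk_atoms(vmlinux, 0x0, atoms))  # ([0, 4], 'no_sync')
```

With the runtime image, the single N atom belongs to the CBZ. With vmlinux, the decoder spends it on the patched-out B and, under the proposed rule, reports lost sync instead of printing a path that never ran.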
>>
>> Double check the command I gave. "-i" needs to be the same as "-o"
>> (it's the folder, not the data file). I think this could be causing
>> your issue. Unless you give it the folder, it doesn't open kcore
>> along with the data file.
>>
>
> As per 'perf script --help'
>
> -i, --input=
> Input file name. (default: perf.data unless stdin is a fifo)
>
That could probably say "file name, or folder when --kcore is used", if
you mean that you think it's not accurate?
But when you use --kcore the default folder (not file) name is still
perf.data, so the default argument gives you a clue that you're not
supposed to descend into the folder.
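Below is a minimal Python sketch of the folder-vs-file point. It assumes the --kcore output layout used by the commands in this thread: <folder>/data plus <folder>/kcore_dir/kcore. The helper name resolve_kcore_input is hypothetical and is not part of perf or the script. It accepts either the folder or the data file inside it, and returns two things: the folder to pass to -i, and the kcore image to pass to the script's -k.

```python
import os


def resolve_kcore_input(path):
    """Return (folder, kcore_image) for a perf record --kcore output.

    Hypothetical helper. Assumes the layout used in this thread:
    <folder>/data and <folder>/kcore_dir/kcore.
    """
    path = os.path.normpath(path)
    # If we were handed the data file inside the folder, step up one level:
    # perf script -i wants the folder (the same value given to record -o).
    if os.path.basename(path) == "data" and os.path.isfile(path):
        path = os.path.dirname(path) or "."
    kcore = os.path.join(path, "kcore_dir", "kcore")
    if not os.path.isdir(path) or not os.path.isfile(kcore):
        raise ValueError("%s does not look like a --kcore output folder" % path)
    return path, kcore
```

For example, if that layout exists on disk, resolve_kcore_input("kcore/data") returns ("kcore", "kcore/kcore_dir/kcore").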
> Also tried just giving dir as you suggested and still the same.
>
> ./perf script -i kcore
> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k
> kcore/kcore_dir/kcore
>
>>> The below diff that you have shared has no effect on the failing case.
>>>
>>>> But I still think bad decode detection should be moved as much as
>>>> possible into OpenCSD and Perf rather than this script. Otherwise
>>>> every tool will have to re-implement it, and OpenCSD has a lot more
>>>> info to make decisions with.
>>>>
>>>> One change we can make is to desynchronize when an N atom is an
>>>> unconditional branch:
>>>>
>>>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> index c557998..3eefd5d 100644
>>>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>> @@ -1341,6 +1341,14 @@ ocsd_err_t
>>>> TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>>> // save recorded next instuction address
>>>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>>>
>>>> + // must have lost sync if an unconditional branch wasn't taken
>>>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>>>> + m_need_addr = true;
>>>> + m_out_elem.addElemType(m_index_curr_pkt,
>>>> OCSD_GEN_TRC_ELEM_NO_SYNC);
>>>> + // wait for next address
>>>> + return OCSD_OK;
>>>> + }
>>>> +
>>>>
>>>> Another one we can spot is when a new address comes that is before
>>>> the current decode address (basically the backwards check that you
>>>> added).
>>>>
>>>> There are probably others that can be spotted like an address
>>>> appearing after a direct branch that doesn't match the branch target.
>>>>
>>>> I think at that point, desynchronising should cause the disassembly
>>>> script to throw away the last bit, rather than force it to be
>>>> printed as in this patch. As I mentioned above in the thread, it
>>>> leads to printing disassembly that's implausible and misleading
>>>> (where an unconditional branch wasn't taken).
>>>
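The following is a hedged Python sketch that makes the three checks above concrete by applying them to an already-decoded list of sequential address ranges. It is not OpenCSD code and uses none of its API. The Range record, its fields and find_desyncs are hypothetical stand-ins for state a decoder already has: range start and end, the atom for the final branch, whether that branch is conditional, and the static target of a direct branch. The model is deliberately simplified. Flagged ranges are the ones a consumer such as the disassembly script would discard rather than print.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    """A run of sequentially executed instructions (hypothetical model)."""
    start: int                    # address of the first instruction
    end: int                      # address of the last instruction (a branch)
    taken: bool                   # atom for that branch: E (True) or N (False)
    conditional: bool             # whether that branch is conditional
    target: Optional[int] = None  # static target if it is a direct branch


def find_desyncs(ranges: List[Range]) -> List[int]:
    """Return indexes of ranges that cannot be correct and should be dropped."""
    bad = []
    for i, r in enumerate(ranges):
        nxt = ranges[i + 1].start if i + 1 < len(ranges) else None
        if r.end < r.start:
            # Sequential execution went backwards: decode has gone wrong.
            bad.append(i)
        elif not r.taken and not r.conditional:
            # An N atom on an unconditional branch is impossible.
            bad.append(i)
        elif (r.taken and r.target is not None and nxt is not None
              and nxt != r.target):
            # The next address doesn't match the direct branch's known target.
            bad.append(i)
    return bad
```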
>
> Thanks,
> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 9:47 ` James Clark
@ 2024-08-06 9:57 ` James Clark
2024-08-06 15:02 ` Steve Clevenger
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-06 9:57 UTC (permalink / raw)
To: Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
scclevenger, james.clark, suzuki.poulose, Leo Yan, Al.Grant,
Mike Leach
On 06/08/2024 10:47 am, James Clark wrote:
>
>
> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>
>>
>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>
>>>
>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi James,
>>>>>>>>>>>
>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>
>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>> this is adding missing output, or it was correcting the tail
>>>>>>>>>>>> end of the output that was previously wrong.
>>>>>>>>>>>
>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ./perf script
>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>> address
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>> in process_event
>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>> in print_disam
>>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>>>>> stop_addr):
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99,
>>>>>>>>>>> in read_disam
>>>>>>>>>>> disasm_output =
>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>> check_output
>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>> check=True,
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>>>> event handler
>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>
>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>
>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> return
>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>> +
>>>>>>>>>>>>
>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>
>>>>>>>>>>> No, the logic is same as holding the addr of previous packet.
>>>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>>>> overwriting with the current packet.
>>>>>>>>>>
>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>> previous one:
>>>>>>>>>>
>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>
>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but
>>>>>>>>>> no return
>>>>>>>>>>
>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
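This behaviour can be reproduced outside perf with a small simulation. The simulation replays only the per-CPU bookkeeping from the posted patch: the cpu_data keys str(cpu) + 'addr' and str(cpu) + 'ip', with stop_addr = ip + 4 as the script uses it. That formula matches the objdump stop address in the log earlier in the thread. The addresses are made up, and the rest of process_event is left out. replay_bookkeeping is a hypothetical name.

```python
def replay_bookkeeping(samples, cpu=1):
    """Replay the patch's per-CPU bookkeeping; return prev_ip for each sample.

    samples: list of (addr, ip) pairs. Hypothetical harness: only the
    cpu_data updates from the patch are modelled.
    """
    cpu_data = {}
    seen = []
    for addr, ip in samples:
        # Existing script logic: the first sample only records addr.
        if cpu_data.get(str(cpu) + 'addr') is None:
            cpu_data[str(cpu) + 'addr'] = addr
            seen.append(None)
            continue
        # Patch: the second sample saves its own ip but does not return.
        if cpu_data.get(str(cpu) + 'ip') is None:
            cpu_data[str(cpu) + 'ip'] = ip
        prev_ip = cpu_data[str(cpu) + 'ip']
        seen.append(prev_ip)
        stop_addr = ip + 4
        # Record for the next sample, as the patch does.
        cpu_data[str(cpu) + 'addr'] = addr
        cpu_data[str(cpu) + 'ip'] = stop_addr
    return seen
```

With samples [(0x1000, 0xf00), (0x2000, 0x1010), (0x3000, 0x2020)] it returns [None, 0x1010, 0x1014]. On sample 1, prev_ip is that sample's own ip (0x1010), not an earlier one.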
>>>>>>>>>
>>>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>>>
>>>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>>>
>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>> return
>>>>>>>>>
>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>
>>>>>>>>> if (options.verbose == True):
>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>
>>>>>>>>> # Record for previous sample packet
>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>
>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>>>> stop_addr=4
>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>>>> inserted" % cpu)
>>>>>>>>> return
>>>>>>>>>
>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>> + # Continuity of the Packets broken, set
>>>>>>>>> start_addr to previous
>>>>>>>>> + # packet ip to complete the remaining tracing
>>>>>>>>> of the address range.
>>>>>>>>> + start_addr = prev_ip
>>>>>>>>> +
>>>>>>>>> if (start_addr < int(dso_start) or start_addr >
>>>>>>>>> int(dso_end)):
>>>>>>>>> print("Start address 0x%x is out of range [
>>>>>>>>> 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>>>> int(dso_end), dso))
>>>>>>>>> return
>>>>>>>>>
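For comparison, here is a minimal sketch of a guard that skips the objdump call when a range is not strictly forward. Without such a guard, check_output raises, as in the failure log below. The argv mirrors the failing command in that log. objdump_argv, disasm_range_or_skip and the injectable run parameter are hypothetical names, added so the guard can be exercised without objdump. The thread has not settled whether to skip or to print from prev_ip as in the diff above; this sketch shows only the skip.

```python
import subprocess
from typing import Callable, List, Optional


def objdump_argv(objdump: str, image: str, start: int, stop: int) -> List[str]:
    # Same argument shape as the failing command in the log.
    return [objdump, '-d', '-z',
            '--start-address=0x%x' % start,
            '--stop-address=0x%x' % stop,
            image]


def disasm_range_or_skip(objdump: str, image: str, start: int, stop: int,
                         run: Callable[[List[str]], bytes] = subprocess.check_output
                         ) -> Optional[List[str]]:
    """Disassemble [start, stop), or return None if the range is not forward."""
    if stop <= start:
        # Continuity is broken: skip rather than let objdump fail and
        # check_output raise CalledProcessError.
        return None
    return run(objdump_argv(objdump, image, start, stop)).decode('utf-8').split('\n')
```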
>>>>>>>>> Without this patch below is the failure log(with segfault) for
>>>>>>>>> reference.
>>>>>>>>>
>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm//
>>>>>>>>> -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script
>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>>>>>>> process_event
>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>>>>>>> print_disam
>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>>> stop_addr):
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>>>>> read_disam
>>>>>>>>> disasm_output =
>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>> check_output
>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>> check=True,
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>>>>>>> '--start-address=0xffff80008125b758',
>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>>>>>>> non-zero exit status 1.
>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>> event handler
>>>>>>>>> Python runtime state: initialized
>>>>>>>>>
>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>> <no Python frame>
>>>>>>>>>
>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>> Aborted (core dumped)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> dump snippet:
>>>>>>>>> ============
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>> ffff800080313f04: 36100094 tbz w20,
>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>> [x21, #968]
>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>> event->clock();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29,
>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>> return sched_clock();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29,
>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19,
>>>>>>>>> x20, [sp, #16]
>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19,
>>>>>>>>> sp_el0
>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>> [x19, #16]
>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>> w0, #0x1
>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>> [x19, #16]
>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29,
>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19,
>>>>>>>>> x20, [sp, #16]
>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>> ffff80008125a8bc: 910d0294 add x20,
>>>>>>>>> x20, #0x340
>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23,
>>>>>>>>> x24, [sp, #48]
>>>>>>>>> ffff80008125a8c4: 91002297 add x23,
>>>>>>>>> x20, #0x8
>>>>>>>>> ffff80008125a8c8: 52800518 mov w24,
>>>>>>>>> #0x28 // #40
>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21,
>>>>>>>>> x22, [sp, #32]
>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22,
>>>>>>>>> [x20]
>>>>>>>>> ffff80008125a8d4: 120002d5 and w21,
>>>>>>>>> w22, #0x1
>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21,
>>>>>>>>> w21, w24
>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19,
>>>>>>>>> x23, x21
>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>> [x19, #24]
>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93 cyc
>>>>>>>>> = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With fix:
>>>>>>>>> =========
>>>>>>>>>
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>> ffff800080313f04: 36100094 tbz w20,
>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>> [x21, #968]
>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>> event->clock();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29,
>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>> return sched_clock();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29,
>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19,
>>>>>>>>> x20, [sp, #16]
>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19,
>>>>>>>>> sp_el0
>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>> [x19, #16]
>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>> w0, #0x1
>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>> [x19, #16]
>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29,
>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19,
>>>>>>>>> x20, [sp, #16]
>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>> ffff80008125a8bc: 910d0294 add x20,
>>>>>>>>> x20, #0x340
>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23,
>>>>>>>>> x24, [sp, #48]
>>>>>>>>> ffff80008125a8c4: 91002297 add x23,
>>>>>>>>> x20, #0x8
>>>>>>>>> ffff80008125a8c8: 52800518 mov w24,
>>>>>>>>> #0x28 // #40
>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21,
>>>>>>>>> x22, [sp, #32]
>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22,
>>>>>>>>> [x20]
>>>>>>>>> ffff80008125a8d4: 120002d5 and w21,
>>>>>>>>> w22, #0x1
>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21,
>>>>>>>>> w21, w24
>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19,
>>>>>>>>> x23, x21
>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>> [x19, #24]
>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>
>>>>>>>> It looks like the disassembly now assumes this BLR wasn't taken.
>>>>>>>> We go from ffff80008125a8e4 straight through to ...
>>>>>>>>
>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93 cyc
>>>>>>>>> = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>> Event type: branches
>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>>>>>>>>> period: 1 time: 5986372298040 }
>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3,
>>>>>>>>> [x23, x21]
>>>>>>>>
>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>> either the source or destination address is 0?
>>>>>>>>
>>>>>>>> What are your record options to create this issue? As I
>>>>>>>> mentioned in the previous reply I haven't been able to reproduce
>>>>>>>> it.
>>>>>>>
>>>>>>> I am using below perf record command.
>>>>>>>
>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero
>>>>>>> of=/dev/null
>>>>>>>
>>>>>>
>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>> think the issue is somewhere else.
>>>>>>
>>>>>
>>>>> At least for the failures I encountered, the issue is due to the
>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>> up being the wrong image to decode with because a load of branches
>>>>> are actually turned into nops.
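The following sketch, built on several assumptions, shows one rough way to confirm this for a given range. First, read the same code bytes from vmlinux and from kcore; how they are extracted is left out here. Then compare them one 4-byte A64 instruction at a time, and flag words where one image has a NOP and the other a B or BL. The encodings are standard A64: NOP is 0xd503201f, and B and BL have bits[31:26] of 0b000101 and 0b100101 respectively. The helper names are hypothetical, and a little-endian kernel is assumed.

```python
import struct
from typing import List, Tuple

A64_NOP = 0xd503201f


def is_b_or_bl(word: int) -> bool:
    # B: bits[31:26] == 0b000101, BL: bits[31:26] == 0b100101
    return (word & 0xfc000000) in (0x14000000, 0x94000000)


def find_patched_branches(base: int, vmlinux_code: bytes,
                          kcore_code: bytes) -> List[Tuple[int, int, int]]:
    """Return (addr, vmlinux_word, kcore_word) where a B/BL and a NOP differ.

    Assumes both buffers start at virtual address `base` and hold
    little-endian A64 instructions.
    """
    hits = []
    for off in range(0, min(len(vmlinux_code), len(kcore_code)) // 4 * 4, 4):
        (a,) = struct.unpack_from('<I', vmlinux_code, off)
        (b,) = struct.unpack_from('<I', kcore_code, off)
        if (is_b_or_bl(a) and b == A64_NOP) or (a == A64_NOP and is_b_or_bl(b)):
            hits.append((base + off, a, b))
    return hits
```

Runtime patching can rewrite other instructions too; this sketch only catches branch/NOP swaps, which is the case described here.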
>>>>>
>>>>> Can you confirm if you use --kcore instead of vmlinux that you
>>>>> still get failures:
>>>>>
>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o
>>>>> <output-folder.data> -- \
>>>>> dd if=/dev/zero of=/dev/null
>>>>>
>>>>> perf script -i <output-folder.data> \
>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d
>>>>> llvm-objdump \
>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>
>>>>
>>>> With below command combination with kcore also the issue is seen, as
>>>> reported in this email chain.
>>>>
>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>> dd if=/dev/zero of=/dev/null
>>>>
>>>> ./perf script -i kcore/data \
>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>
>>>>
>>>> However, with below sequence(same as your command) the issue is
>>>> *not* seen.
>>>>
>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>> dd if=/dev/zero of=/dev/null
>>>>
>>>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>
>>>> Do you see any issue with the command, which is showing the problem?
>>>> Also the output log produced by these both commands is different.
>>>>
>
> BTW are you running this on the target or somewhere else? It's
> suspicious that "-i kcore/data" works at all because there is no kernel
> image given to Perf. Unless you are running on the target and then I
> think it will just open the one from /proc. Or maybe it uses
> /boot/vmlinux by default which also wouldn't work.
>
> Also the difference between "--script=python:" and just giving the
> script name is in the parsing of the arguments following " -- ".
> Sometimes they're also parsed as Perf arguments (like the -v becomes
> perf verbose and -k becomes the Perf vmlinux rather than the script).
>
> I _think_ you want the " -- " when "--script" is used, and no "--" when
> it's not. But there are some other combinations and you'll have to debug
> it to compare your two exact scenarios to see why they're different.
>
> But ignoring that issue with the argument format, you mentioned you
> didn't see the issue any more with one version of --kcore. So I'm
> assuming that confirms the issue is just a decode image issue, so we
> shouldn't try to patch this script?
>
Although one change we should make to the script is to change the example
to use kcore. We can leave in one vmlinux example for when kcore
isn't available, but add a note that it will fail if any patched code is
traced (which is almost always).
And make the other fixes to OpenCSD to stop it from making samples that
go backwards. That will fix the hard exit in the script and turn it into
a regular desynchronise.
>>>
>>> Double check the command I gave. "-i" needs to be the same as "-o"
>>> (it's the folder, not the data file). I think this could be causing
>>> your issue. Unless you give it the folder it doesn't open kcore along
>>> with the data file.
>>>
>>
>> As per 'perf script --help'
>>
>> -i, --input=
>> Input file name. (default: perf.data unless stdin is a fifo)
>>
>
> That could probably say "file name, or folder when --kcore is used", if
> you mean that you think it's not accurate?
>
> But when you use --kcore the default folder (not file) name is still
> perf.data, so the default argument gives you a clue that you're not
> supposed to descend into the folder.
>
>> Also tried just giving dir as you suggested and still the same.
>>
>> ./perf script -i kcore
>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump
>> -k kcore/kcore_dir/kcore
>>
>>>> The below diff that you have shared has no effect on the failing case.
>>>>
>>>>> But I still think bad decode detection should be moved as much as
>>>>> possible into OpenCSD and Perf rather than this script. Otherwise
>>>>> every tool will have to re-implement it, and OpenCSD has a lot more
>>>>> info to make decisions with.
>>>>>
>>>>> One change we can make is to desynchronize when an N atom is an
>>>>> unconditional branch:
>>>>>
>>>>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>> b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>> index c557998..3eefd5d 100644
>>>>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>> @@ -1341,6 +1341,14 @@ ocsd_err_t
>>>>> TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>>>> // save recorded next instuction address
>>>>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>>>>
>>>>> + // must have lost sync if an unconditional branch wasn't
>>>>> taken
>>>>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>>>>> + m_need_addr = true;
>>>>> + m_out_elem.addElemType(m_index_curr_pkt,
>>>>> OCSD_GEN_TRC_ELEM_NO_SYNC);
>>>>> + // wait for next address
>>>>> + return OCSD_OK;
>>>>> + }
>>>>> +
>>>>>
>>>>> Another one we can spot is when a new address comes that is before
>>>>> the current decode address (basically the backwards check that you
>>>>> added).
>>>>>
>>>>> There are probably others that can be spotted like an address
>>>>> appearing after a direct branch that doesn't match the branch target.
>>>>>
>>>>> I think at that point, desynchronising should cause the disassembly
>>>>> script to throw away the last bit, rather than force it to be
>>>>> printed as in this patch. As I mentioned above in the thread, it
>>>>> leads to printing disassembly that's implausible and misleading
>>>>> (where an unconditional branch wasn't taken).
>>>>
>>
>> Thanks,
>> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 9:57 ` James Clark
@ 2024-08-06 15:02 ` Steve Clevenger
2024-08-06 16:14 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Steve Clevenger @ 2024-08-06 15:02 UTC (permalink / raw)
To: James Clark, Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Leo Yan, Al.Grant, Mike Leach
On 8/6/2024 2:57 AM, James Clark wrote:
>
>
> On 06/08/2024 10:47 am, James Clark wrote:
>>
>>
>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>
>>>
>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>
>>>>
>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>
>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>
>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>
>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>   <no Python frame>
>>>>>>>>>>>>
>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'?
>>>>>>>>>>>>
>>>>>>>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>>>>>>>> packet: the ip saved by the previous packet is loaded into prev_ip
>>>>>>>>>>>> before it is overwritten with the current packet's.
>>>>>>>>>>>
>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>> previous sample. For addr, we return on the first None; with
>>>>>>>>>>> your change, we now "pretend" that the second one is also the
>>>>>>>>>>> previous one:
>>>>>>>>>>>
>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>     cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>     return  <---------------------------- sample 0 return
>>>>>>>>>>>
>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>     cpu_data[str(cpu) + 'ip'] = ip  <---- sample 1 save but no return
>>>>>>>>>>>
>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
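The point above can be checked with a small replay of the per-CPU bookkeeping. This is a hypothetical reduction of the first version of the patch (only the 'addr' and 'ip' keys are modeled, and `replay_first_patch` is an illustrative name). It assumes the saved stop_addr equals the sample's ip plus 4, and it shows sample 1 reading its own ip back as prev_ip.

```python
# Hypothetical replay of the per-CPU bookkeeping in the first version of the
# patch, reduced to the 'addr' and 'ip' keys. Not the real process_event().

INSN_SIZE = 4  # assumption: stop_addr is the sample's ip plus one A64 instruction


def replay_first_patch(samples):
    """Return (ip, prev_ip) for every sample that gets past the early return."""
    cpu_data = {}
    observed = []
    for cpu, addr, ip in samples:
        if cpu_data.get(str(cpu) + 'addr') is None:
            cpu_data[str(cpu) + 'addr'] = addr
            continue  # sample 0: the script returns here
        if cpu_data.get(str(cpu) + 'ip') is None:
            cpu_data[str(cpu) + 'ip'] = ip  # sample 1: saved, but no return
        prev_ip = cpu_data[str(cpu) + 'ip']
        observed.append((ip, prev_ip))
        # Record for the next sample, as the patch does with stop_addr.
        cpu_data[str(cpu) + 'addr'] = addr
        cpu_data[str(cpu) + 'ip'] = ip + INSN_SIZE
    return observed


result = replay_first_patch([(1, 0x1000, 0x2000), (1, 0x3000, 0x2100), (1, 0x4000, 0x3010)])
# Sample 1 sees its own ip as prev_ip; only sample 2 sees a genuinely previous value.
print([(hex(ip), hex(prev)) for ip, prev in result])
# prints [('0x2100', '0x2100'), ('0x3010', '0x2104')]
```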
>>>>>>>>>>
>>>>>>>>>> Yes, it is a dummy value for the first packet. I added it anticipating
>>>>>>>>>> that we won't hit a discontinuity on the first packet itself.
>>>>>>>>>>
>>>>>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>> return
>>>>>>>>>>
>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>
>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>
>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>
>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>>>> return
>>>>>>>>>>
>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>> +
>>>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>>>> return
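One detail in the revised diff above may be worth checking: prev_ip is only assigned when the 'ip' key already exists, so if the first range after initialisation runs backwards, `prev_ip != 0` is evaluated on an unbound local. The sketch reduces the logic to two hypothetical functions (not part of the script) to show that, plus a variant that defaults the lookup to 0. The variant still returns a backwards start in that case; it only avoids the exception.

```python
# Hypothetical reduction of the revised diff's prev_ip handling; the function
# names are illustrative and not part of arm-cs-trace-disasm.py.


def revised_as_posted(cpu_data, cpu, start_addr, stop_addr):
    # Mirrors the posted diff: prev_ip is only bound when the key exists.
    if cpu_data.get(str(cpu) + 'ip') is not None:
        prev_ip = cpu_data[str(cpu) + 'ip']
    if stop_addr < start_addr and prev_ip != 0:
        start_addr = prev_ip
    return start_addr


def revised_with_default(cpu_data, cpu, start_addr, stop_addr):
    # Same rule, but a missing key reads as 0, which the check already
    # treats as "no previous ip".
    prev_ip = cpu_data.get(str(cpu) + 'ip', 0)
    if stop_addr < start_addr and prev_ip != 0:
        start_addr = prev_ip
    return start_addr


try:
    revised_as_posted({}, 1, 0x200, 0x100)
except UnboundLocalError as err:
    print("posted version:", type(err).__name__)

print(hex(revised_with_default({}, 1, 0x200, 0x100)))              # no saved ip
print(hex(revised_with_default({'1ip': 0x150}, 1, 0x200, 0x100)))  # resumes at 0x150
```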
>>>>>>>>>>
>>>>>>>>>> For reference, below is the failure log (ending in a core dump)
>>>>>>>>>> without this patch:
>>>>>>>>>>
>>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>
>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>   <no Python frame>
>>>>>>>>>>
>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> dump snippet:
>>>>>>>>>> ============
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> With fix:
>>>>>>>>>> =========
>>>>>>>>>>
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>
>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>
>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>> Event type: branches
>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>>>>
>>>>>>>>> ffff80008125a8e8, which is just the previous one + 4. Isn't your
>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>> either the source or destination address is 0?
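The observation above can be expressed as a plausibility check on consecutive ranges. Purely as an illustration (later in the thread the argument is that such checks belong in OpenCSD rather than in the script), this sketch flags a range that ends on an always-taken branch when the next range starts at the very next instruction. The regex assumes the `address: encoding mnemonic operands` layout seen in the dumps; the mnemonic set is an assumption, not an exhaustive A64 list; and the function names are hypothetical.

```python
import re

# Objdump instruction line: "<hex address>: <8-hex-digit encoding> <mnemonic> ..."
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+[0-9a-f]{8}\s+(\S+)")

# Assumed (non-exhaustive) set of A64 branches that are always taken.
# Conditional forms such as b.eq, cbz or tbz are deliberately left out.
UNCONDITIONAL = {"b", "bl", "br", "blr", "ret"}

A64_INSN_SIZE = 4


def last_instruction(disasm_lines):
    """Return (address, mnemonic) of the last instruction line, or None."""
    for line in reversed(disasm_lines):
        m = INSN_RE.match(line)
        if m:
            return int(m.group(1), 16), m.group(2)
    return None


def implausible_fallthrough(prev_disasm, next_start):
    """Flag a range that ends on an always-taken branch when the next range
    starts at the instruction right after it, i.e. the branch 'was not taken'."""
    last = last_instruction(prev_disasm)
    if last is None:
        return False
    addr, mnemonic = last
    return mnemonic in UNCONDITIONAL and next_start == addr + A64_INSN_SIZE


# Tail of the sched_clock_noinstr range from the "With fix" output above.
tail = [
    "ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]",
    "ffff80008125a8e4: d63f0000 blr x0",
]
print(implausible_fallthrough(tail, 0xffff80008125a8e8))  # continues at +4
print(implausible_fallthrough(tail, 0xffff80008125b758))  # continues at the BLR target
```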
>>>>>>>>>
>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>> reproduce it.
>>>>>>>>
>>>>>>>> I am using the perf record command below:
>>>>>>>>
>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>
>>>>>>> Thanks, I managed to reproduce it. I'll take a look to see whether I
>>>>>>> think the issue is somewhere else.
>>>>>>>
>>>>>>
>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>> up being the wrong image to decode with because a load of branches
>>>>>> are actually turned into nops.
>>>>>>
>>>>>> Can you confirm whether you still get failures if you use --kcore
>>>>>> instead of vmlinux:
>>>>>>
>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>>>>     dd if=/dev/zero of=/dev/null
>>>>>>
>>>>>> perf script -i <output-folder.data> \
>>>>>>     tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>>>>     -k <output-folder.data>/kcore_dir/kcore
>>>>>>
>>>>>
>>>>> The issue is also seen with kcore when using the command combination
>>>>> below, as reported in this email chain:
>>>>>
>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>> dd if=/dev/zero of=/dev/null
>>>>>
>>>>> ./perf script -i kcore/data \
>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>
>>>>>
>>>>> However, with the sequence below (the same as your command), the
>>>>> issue is *not* seen:
>>>>>
>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>> dd if=/dev/zero of=/dev/null
>>>>>
>>>>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>
>>>>> Do you see any issue with the command that shows the problem? Also,
>>>>> the output logs produced by these two commands differ.
>>>>>
>>
>> BTW, are you running this on the target or somewhere else? It's
>> suspicious that "-i kcore/data" works at all, because there is no
>> kernel image given to Perf. It might work if you are running on the
>> target, in which case I think it will just open the one from /proc. Or
>> maybe it uses /boot/vmlinux by default, which also wouldn't work.
>>
>> Also, the difference between "--script=python:" and just giving the
>> script name is in how the arguments following " -- " are parsed.
>> Sometimes they're also parsed as Perf arguments (e.g. -v becomes Perf's
>> verbose flag and -k becomes Perf's vmlinux option rather than the
>> script's).
>>
>> I _think_ you want the " -- " when "--script" is used, and no "--"
>> when it's not. But there are some other combinations and you'll have
>> to debug it to compare your two exact scenarios to see why they're
>> different.
>>
>> But ignoring that issue with the argument format, you mentioned you
>> didn't see the issue anymore with one version of --kcore. So I'm
>> assuming that confirms it's just a decode image issue, and we
>> shouldn't try to patch this script?
>>
>
> One change we should make to the script, though, is to switch the
> example to kcore. We can keep one vmlinux example for when kcore isn't
> available, but add a note that it will fail if any patched code is
> traced (which is almost always the case).
James, you may recall the year-old thread
https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
In it I described an awkward workaround Ampere has used to solve the
patched code problem. At the time, it sounded like the maintainers were
interested in moving away from the Python script, mostly for speed
reasons. I didn't see the discussion go any further.
>
> And make the other fixes to OpenCSD to stop it from generating samples
> that go backwards. That will turn the hard exit in the script into a
> regular desynchronise.
>
>>>>
>>>> Double check the command I gave. "-i" needs to be the same as "-o"
>>>> (it's the folder, not the data file). I think this could be causing
>>>> your issue. Unless you give it the folder it doesn't open kcore
>>>> along with the data file.
>>>>
>>>
>>> As per 'perf script --help'
>>>
>>> -i, --input=
>>> Input file name. (default: perf.data unless stdin is a fifo)
>>>
>>
>> That could probably say "file name, or folder when --kcore is used",
>> if you mean that you think it's not accurate?
>>
>> But when you use --kcore the default folder (not file) name is still
>> perf.data, so the default argument gives you a clue that you're not
>> supposed to descend into the folder.
>>
>>> I also tried giving just the directory, as you suggested, and the
>>> result is the same:
>>>
>>> ./perf script -i kcore --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k kcore/kcore_dir/kcore
>>>
>>>>> The diff below that you shared has no effect on the failing case.
>>>>>
>>>>>> But I still think bad decode detection should be moved as much as
>>>>>> possible into OpenCSD and Perf rather than this script. Otherwise
>>>>>> every tool will have to re-implement it, and OpenCSD has a lot
>>>>>> more info to make decisions with.
>>>>>>
>>>>>> One change we can make is to desynchronize when an N atom is an
>>>>>> unconditional branch:
>>>>>>
>>>>>> diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>>> index c557998..3eefd5d 100644
>>>>>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>>>>>> @@ -1341,6 +1341,14 @@ ocsd_err_t TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>>>>> // save recorded next instuction address
>>>>>> ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>>>>>
>>>>>> + // must have lost sync if an unconditional branch wasn't taken
>>>>>> + if (atom == ATOM_N && !m_instr_info.is_conditional) {
>>>>>> + m_need_addr = true;
>>>>>> + m_out_elem.addElemType(m_index_curr_pkt, OCSD_GEN_TRC_ELEM_NO_SYNC);
>>>>>> + // wait for next address
>>>>>> + return OCSD_OK;
>>>>>> + }
>>>>>> +
>>>>>>
>>>>>> Another one we can spot is when a new address comes that is before
>>>>>> the current decode address (basically the backwards check that you
>>>>>> added).
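For comparison, a script-side form of that backwards check would refuse such a range before shelling out, so the hard exit becomes a skipped range. A minimal sketch, assuming the objdump invocation has the same shape as the failing command in the traceback earlier in the thread; `objdump_cmd` and `read_disasm_checked` are hypothetical names, and the discussion here favours doing the detection in the decoder instead.

```python
import subprocess


def objdump_cmd(objdump, image, start_addr, stop_addr):
    # Same argument shape as the failing command in the traceback.
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            image]


def read_disasm_checked(objdump, image, start_addr, stop_addr):
    """Return objdump's lines for [start_addr, stop_addr), or None when the
    range runs backwards (treated as lost sync instead of calling objdump)."""
    if stop_addr <= start_addr:
        return None
    out = subprocess.check_output(objdump_cmd(objdump, image, start_addr, stop_addr))
    return out.decode("utf-8").split("\n")


# The range from the failing sample: skipped, so objdump is never invoked.
print(read_disasm_checked("objdump", "../../vmlinux",
                          0xffff80008125b758, 0xffff80008125a934))
```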
>>>>>>
>>>>>> There are probably others that can be spotted like an address
>>>>>> appearing after a direct branch that doesn't match the branch target.
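The direct-branch case can be checked from the disassembly text: a direct `b`/`bl` encodes its target, so the next range should start exactly there. A minimal sketch, assuming objdump's textual layout from the dumps in this thread (`branch_target_mismatch` is a hypothetical name). Indirect branches such as `blr` carry no target in the text and are not checked.

```python
import re

# Matches an unconditional direct branch ("b" or "bl") followed by its hex
# target, e.g. "ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>".
# Conditional forms such as "b.eq", and "br"/"blr", do not match because
# whitespace must follow the mnemonic.
DIRECT_BRANCH_RE = re.compile(
    r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+(?:bl|b)\s+([0-9a-f]+)\b")


def branch_target_mismatch(last_line, next_start):
    """True if the range ended on a direct unconditional branch whose encoded
    target differs from where the next range starts."""
    m = DIRECT_BRANCH_RE.match(last_line)
    if m is None:
        return False  # not a direct b/bl, nothing to compare
    return int(m.group(1), 16) != next_start


line = "ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>"
print(branch_target_mismatch(line, 0xffff8000801bb4a8))  # next range starts at the target
print(branch_target_mismatch(line, 0xffff8000801bb4ac))  # would indicate lost sync
```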
>>>>>>
>>>>>> I think at that point, desynchronising should cause the
>>>>>> disassembly script to throw away the last bit, rather than force
>>>>>> it to be printed as in this patch. As I mentioned above in the
>>>>>> thread, it leads to printing disassembly that's implausible and
>>>>>> misleading (where an unconditional branch wasn't taken).
>>>>>
>>>
>>> Thanks,
>>> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 15:02 ` Steve Clevenger
@ 2024-08-06 16:14 ` James Clark
2024-08-07 12:17 ` Ganapatrao Kulkarni
2024-08-08 7:54 ` Leo Yan
0 siblings, 2 replies; 45+ messages in thread
From: James Clark @ 2024-08-06 16:14 UTC (permalink / raw)
To: scclevenger, Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Leo Yan, Al.Grant, Mike Leach
On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>
>
> On 8/6/2024 2:57 AM, James Clark wrote:
>>
>>
>> On 06/08/2024 10:47 am, James Clark wrote:
>>>
>>>
>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ./perf script
>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py --
>>>>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>> address
>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>> start_addr, stop_addr):
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99,
>>>>>>>>>>>>> in read_disam
>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>> check_output
>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python
>>>>>>>>>>>>> trace event handler
>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>
>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, the logic is same as holding the addr of previous packet.
>>>>>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>>>>>> overwriting with the current packet.
>>>>>>>>>>>>
>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>> previous one:
>>>>>>>>>>>>
>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>
>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but
>>>>>>>>>>>> no return
>>>>>>>>>>>>
>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>>>>
>>>>>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>>>>>
>>>>>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>> return
>>>>>>>>>>>
>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>
>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>
>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>
>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>>>>> return
>>>>>>>>>>>
>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>> +
>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>>>>> return
>>>>>>>>>>>
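A minimal sketch of the range logic in the diff above, pulled out into a standalone function so it can be exercised without perf. It assumes, as the objdump command in the failure log quoted below suggests, that the script disassembles from the previous sample's `addr` to the current sample's `ip + 4`. `compute_range`, `INSN_SIZE` and the module-level `cpu_data` dict are hypothetical stand-ins, not the script's actual code. Reading the saved ip with `dict.get(..., 0)` keeps `prev_ip` bound even before any range has been recorded for a CPU.

```python
# Hypothetical per-CPU state, keyed like the script's cpu_data ("1addr", "1ip").
cpu_data = {}

INSN_SIZE = 4  # assumed A64 instruction size: stop address = ip + 4


def compute_range(cpu, addr, ip):
    """Return (start_addr, stop_addr) to disassemble for this sample,
    or None for the first sample seen on this CPU."""
    addr_key = str(cpu) + 'addr'
    ip_key = str(cpu) + 'ip'

    prev_addr = cpu_data.get(addr_key)
    # Default to 0 so prev_ip is always bound, even before a range is saved.
    prev_ip = cpu_data.get(ip_key, 0)

    # Record this sample's branch target for the next call.
    cpu_data[addr_key] = addr
    if prev_addr is None:
        return None

    start_addr = prev_addr
    stop_addr = ip + INSN_SIZE
    # Remember where this range ended, as the proposed patch does.
    cpu_data[ip_key] = stop_addr

    if stop_addr < start_addr and prev_ip != 0:
        # Continuity broken: resume from where the previous range stopped.
        start_addr = prev_ip
    return start_addr, stop_addr
```

Fed the five `(addr, ip)` pairs from the dump snippet below in order, the last call returns `(0xffff80008125a8e8, 0xffff80008125a934)`, whose start matches the `sched_clock_noinstr+0x40` address printed in the "With fix" output.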
>>>>>>>>>>> Without this patch, below is the failure log (ending in a core dump)
>>>>>>>>>>> for reference.
>>>>>>>>>>>
>>>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>
>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>
>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> dump snippet:
>>>>>>>>>>> ============
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> With fix:
>>>>>>>>>>> =========
>>>>>>>>>>>
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>
>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>
>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>> Event type: branches
>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>>>>>
>>>>>>>>>> ffff80008125a8e8, which is just the previous one +4. Isn't your
>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>
>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>> reproduce it.
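To see where such discontinuities occur without patching around them, a small check can flag every pair of consecutive samples whose implied range runs backwards. This sketch assumes samples are given as `(addr, ip)` tuples for a single CPU in trace order, and reuses the `ip + 4` stop address implied by the objdump command in the failure log; `find_backward_ranges` is a hypothetical helper, not part of perf or the script.

```python
def find_backward_ranges(samples, insn_size=4):
    """Yield (index, start, stop) for each sample whose range, starting at
    the previous sample's branch target, would end before it starts."""
    for i in range(1, len(samples)):
        start = samples[i - 1][0]          # previous branch target (addr)
        stop = samples[i][1] + insn_size   # this branch source (ip) + 1 insn
        if stop < start:
            yield i, start, stop


# The five samples from the dump snippet quoted above (CPU 1).
samples = [
    (0xffff80008030cb00, 0xffff800080313f0c),
    (0xffff8000801bb4a8, 0xffff80008030cb0c),
    (0xffff80008125a8a8, 0xffff8000801bb4c8),
    (0xffff80008125b758, 0xffff80008125a8e4),
    (0xffff8000801bb4cc, 0xffff80008125a930),
]

for i, start, stop in find_backward_ranges(samples):
    print("sample %d: start 0x%x > stop 0x%x" % (i, start, stop))
```

Only the last pair is flagged, and its addresses are exactly the `--start-address`/`--stop-address` pair that objdump rejected: the BLR at `ffff80008125a8e4` targets `ffff80008125b758`, but the next sample resumes at `ffff80008125a930` with nothing in between, which is consistent with a decode gap rather than a problem in the script's range arithmetic.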
>>>>>>>>>
>>>>>>>>> I am using the perf record command below.
>>>>>>>>>
>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, I managed to reproduce it. I'll take a look to see whether
>>>>>>>> the issue is somewhere else.
>>>>>>>>
>>>>>>>
>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>> up being the wrong image to decode with because a load of branches
>>>>>>> are actually turned into nops.
>>>>>>>
>>>>>>> Can you confirm whether you still get failures if you use --kcore
>>>>>>> instead of vmlinux:
>>>>>>>
>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>
>>>>>>> perf script -i <output-folder.data> \
>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>
>>>>>>
>>>>>> With the command combination below, the issue is seen with kcore as
>>>>>> well, as reported in this email chain.
>>>>>>
>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>
>>>>>> ./perf script -i kcore/data \
>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>
>>>>>>
>>>>>> However, with the sequence below (the same as your command), the issue
>>>>>> is *not* seen.
>>>>>>
>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>
>>>>>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>
>>>>>> Do you see any issue with the command that shows the problem?
>>>>>> Also, the output logs produced by these two commands are different.
>>>>>>
>>>
>>> BTW, are you running this on the target or somewhere else? It's
>>> suspicious that "-i kcore/data" works at all, because there is no
>>> kernel image given to Perf, unless you are running on the target, in
>>> which case I think it will just open the one from /proc. Or maybe it
>>> uses /boot/vmlinux by default, which also wouldn't work.
>>>
>>> Also the difference between "--script=python:" and just giving the
>>> script name is in the parsing of the arguments following " -- ".
>>> Sometimes they're also parsed as Perf arguments (for example, -v becomes
>>> Perf's verbose option and -k becomes Perf's vmlinux rather than the script's).
>>>
>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>> when it's not. But there are some other combinations and you'll have
>>> to debug it to compare your two exact scenarios to see why they're
>>> different.
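One way to keep the two invocation styles apart while debugging is to build the argument lists explicitly instead of going through the shell. This sketch encodes the rule suggested above as an assumption, not as verified perf behaviour: with `--script=python:<path>`, script options follow a ` -- ` separator; with the script path given positionally, no separator is used. `perf_script_argv` is a hypothetical helper.

```python
import shlex


def perf_script_argv(perf, data, script, script_args, use_script_option):
    """Build a perf script command line that passes script_args to script.

    Assumed rule (unverified): '--' is only needed with --script=python:."""
    argv = [perf, 'script', '-i', data]
    if use_script_option:
        argv += ['--script=python:' + script, '--']
    else:
        argv += [script]
    return argv + list(script_args)


script = './scripts/python/arm-cs-trace-disasm.py'
args = ['-d', 'objdump', '-k', 'kcore/kcore_dir/kcore']
for use_option in (True, False):
    # Shell-safe rendering, for comparison with the commands quoted above.
    print(shlex.join(perf_script_argv('./perf', 'kcore/data', script, args,
                                      use_option)))
```

Under the assumed rule, the positional form drops the ` -- ` that the "not seen" command above still included, which makes it one of the combinations worth comparing directly.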
>>>
>>> But ignoring that issue with the argument format, you mentioned you
>>> didn't see the issue any more with one version of --kcore. So I'm
>>> assuming that confirms the issue is just a decode image issue, so we
>>> shouldn't try to patch this script?
>>>
>>
>> Although one change we should make to the script is to change the
>> example to use kcore. We can leave in one vmlinux example for when kcore
>> isn't available, but add a note that it will fail if any patched code is
>> traced (which is almost always).
>
> James, you may recall the year-old thread
> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
> I described there an awkward workaround Ampere has used to solve the
> patched code problem. At the time, it sounded like the maintainers were
> interested in getting away from using the python script, mostly for
> speed purposes. I didn't see the discussion go any further.
>
Oh yes, thanks for the reminder. I wasn't thinking about the source code
lines and debug symbols in this thread. I suppose your merging of kcore
and vmlinux gives both the correct image and the symbols, but I was only
focused on the image being correct, so kcore alone was enough.
It looks like everything we want to do from your previous thread is in
addition to the fixes from this one. Even if we auto-merge kcore +
symbols and move the disassembly into Perf, we still want to detect
decode issues earlier and not have IPs jumping backwards. Whether the
script or Perf does that, the behavior should be the same.
To summarise, I think these are the changes to make:
* Improve bad decode detection in OpenCSD
* Get the script to auto-merge kcore and vmlinux
  * Maybe we could get Perf to do this if both a kcore folder and -k
    vmlinux are used?
* Improve the performance, either in the script or by moving more
  functionality into Perf
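For the last item, one low-effort option inside the script is to skip impossible ranges before invoking objdump and to memoise the disassembly of ranges that repeat (for example, the same loop body traced many times). This sketch assumes the same objdump argument list that appears in the CalledProcessError above; `read_disasm_cached` is a hypothetical replacement for the script's `read_disam`, not its current code, and it returns an empty tuple instead of raising when `stop_addr <= start_addr`.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=4096)
def read_disasm_cached(objdump, dso_fname, start_addr, stop_addr):
    """Return objdump's output lines for [start_addr, stop_addr) as a tuple.

    Backwards or empty ranges are skipped instead of letting objdump fail."""
    if stop_addr <= start_addr:
        # Continuity broken (or empty range): nothing sensible to disassemble.
        return ()
    cmd = [objdump, '-d', '-z',
           '--start-address=0x%x' % start_addr,
           '--stop-address=0x%x' % stop_addr,
           dso_fname]
    out = subprocess.check_output(cmd).decode('utf-8')
    # A tuple is immutable, so callers cannot modify a cached result.
    return tuple(out.split('\n'))
```

Called with the range from the failure log, `read_disasm_cached('objdump', '../../vmlinux', 0xffff80008125b758, 0xffff80008125a934)` returns `()` without running objdump; a caller would then print a warning and reset its per-CPU state rather than abort.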
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 16:14 ` James Clark
@ 2024-08-07 12:17 ` Ganapatrao Kulkarni
2024-08-07 14:53 ` James Clark
2024-08-08 7:54 ` Leo Yan
1 sibling, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-07 12:17 UTC (permalink / raw)
To: James Clark, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Leo Yan, Al.Grant, Mike Leach
On 06-08-2024 09:44 pm, James Clark wrote:
>
>
> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>
>>
>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>
>>>
>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>
>>>>
>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>
>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>> To generate the instruction tracing, the script uses the address
>>>>>>>>>>>>>>>> range of 2 contiguous packets. If there is a continuity break due
>>>>>>>>>>>>>>>> to a discontiguous branch address, it is required to reset the
>>>>>>>>>>>>>>>> tracing and start tracing with the new set of contiguous packets.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Add a change to identify the break, complete the remaining tracing
>>>>>>>>>>>>>>>> of the current packets, and restart tracing from the new set of
>>>>>>>>>>>>>>>> packets if continuity is established.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It adds the tail end of the trace as well as avoiding the
>>>>>>>>>>>>>> crash of the perf application. Without this change, perf
>>>>>>>>>>>>>> crashes with the log below:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>>>>>>>>>> packet: the ip saved for the previous packet is loaded into
>>>>>>>>>>>>>> prev_ip before it is overwritten with the current packet's.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it is a dummy for the first packet. I added it anticipating that we
>>>>>>>>>>>> won't hit the discontinuity on the first packet itself.
>>>>>>>>>>>>
>>>>>>>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> return
>>>>>>>>>>>>
>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>
>>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>
>>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>
>>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>>>>>> return
>>>>>>>>>>>>
>>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>>> +
>>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>>>>>> return
>>>>>>>>>>>>
>>>>>>>>>>>> Without this patch, below is the failure log (ending in a core dump)
>>>>>>>>>>>> for reference.
>>>>>>>>>>>>
>>>>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>
>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>
>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>> ============
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> With fix:
>>>>>>>>>>>> =========
>>>>>>>>>>>>
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>> event->clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>
>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>
>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>>>>>>>>>>>
>>>>>>>>>>> ffff80008125a8e8, which is just the previous one +4. Isn't your
>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>
>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>> reproduce it.
>>>>>>>>>>
>>>>>>>>>> I am using the perf record command below:
>>>>>>>>>>
>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks, I managed to reproduce it. I'll take a look to see whether
>>>>>>>>> the issue is somewhere else.
>>>>>>>>>
>>>>>>>>
>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>> up being the wrong image to decode with because a load of branches
>>>>>>>> are actually turned into nops.
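To make the alternatives point concrete, here is a minimal Python sketch (hypothetical helpers, not part of perf or arm-cs-trace-disasm.py) that compares the same kernel text range taken from a static image such as vmlinux and from a live image such as kcore, and lists the words that the live image holds as NOPs. It assumes the caller has already read both byte ranges for identical virtual addresses, and that instructions are 4-byte little-endian A64 encodings.

```python
import struct

# A64 encoding of NOP (an alias of HINT #0).
AARCH64_NOP = 0xD503201F


def diff_words(static_img: bytes, live_img: bytes, base: int):
    """Yield (address, static_word, live_word) for each 32-bit word that differs.

    Both buffers are assumed to cover the same virtual address range starting
    at `base` (e.g. bytes read from vmlinux and from kcore). Trailing bytes
    beyond a multiple of 4 are ignored.
    """
    if len(static_img) != len(live_img):
        raise ValueError("images must cover the same address range")
    count = len(static_img) // 4
    fmt = "<%dI" % count  # little-endian unsigned 32-bit words
    old = struct.unpack(fmt, static_img[: count * 4])
    new = struct.unpack(fmt, live_img[: count * 4])
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            yield base + 4 * i, a, b


def patched_to_nop(static_img: bytes, live_img: bytes, base: int):
    """Addresses where the live image has a NOP but the static image does not."""
    return [addr for addr, a, b in diff_words(static_img, live_img, base)
            if b == AARCH64_NOP and a != AARCH64_NOP]


if __name__ == "__main__":
    # Synthetic example: a "b .+0x40" (0x14000010) in the static image that
    # runtime patching has replaced with a NOP in the live image.
    static = struct.pack("<3I", 0xD503233F, 0x14000010, 0x910003FD)
    live = struct.pack("<3I", 0xD503233F, AARCH64_NOP, 0x910003FD)
    print([hex(a) for a in patched_to_nop(static, live, 0x1000)])  # ['0x1004']
```

A decoder walking the static image would follow the branch at 0x1004, while the CPU actually executed a NOP there, which is the kind of mismatch that makes the trace look discontinuous.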
>>>>>>>>
>>>>>>>> Can you confirm whether you still get failures if you use --kcore
>>>>>>>> instead of vmlinux?
>>>>>>>>
>>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>> perf script -i <output-folder.data> \
>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>
>>>>>>>
>>>>>>> The issue is also seen with kcore when using the command
>>>>>>> combination below, as reported in this email chain.
>>>>>>>
>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>
>>>>>>> ./perf script -i kcore/data \
>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>
>>>>>>>
>>>>>>> However, with the sequence below (same as your command), the issue
>>>>>>> is *not* seen.
>>>>>>>
>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>
>>>>>>> ./perf script -i kcore/data \
>>>>>>> ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>
>>>>>>> Do you see any issue with the command that shows the problem?
>>>>>>> Also, the output logs produced by these two commands differ.
>>>>>>>
>>>>
>>>> BTW, are you running this on the target or somewhere else? It's
>>>> suspicious that "-i kcore/data" works at all, because no kernel image
>>>> is given to Perf, unless you are running on the target, in which case I
>>>> think it will just open the one from /proc. Or maybe it uses
>>>> /boot/vmlinux by default, which also wouldn't work.
>>>>
Yes, all tests are done natively on Ampere's ARM64 platform.
Here are some more command combinations that also fail:
./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k kcore/kcore_dir/kcore
./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py -d objdump -k kcore/kcore_dir/kcore
>>>> Also, the difference between "--script=python:" and just giving the
>>>> script name is in how the arguments following " -- " are parsed.
>>>> Sometimes they're also parsed as Perf arguments (e.g. -v becomes
>>>> Perf's verbose option and -k becomes Perf's vmlinux option rather than
>>>> the script's).
>>>>
>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>> when it's not. But there are some other combinations and you'll have
>>>> to debug it to compare your two exact scenarios to see why they're
>>>> different.
>>>>
>>>> But ignoring that issue with the argument format, you mentioned you
>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>> assuming that confirms the issue is just a decode image issue, so we
>>>> shouldn't try to patch this script?
>>>>
>>>
>>> One change we should make to the script, though, is to switch the
>>> example to kcore. We can keep one vmlinux example for when kcore
>>> isn't available, but add a note that it will fail if any patched code is
>>> traced (which is almost always the case).
>>
>> James, you may recall the year-old thread
>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>> There I described an awkward workaround Ampere has used to solve the
>> patched code problem. At the time, it sounded like the maintainers were
>> interested in getting away from using the python script, mostly for
>> speed purposes. I didn't see the discussion go any further.
>>
>
> Oh yes, thanks for the reminder. I wasn't thinking about the source code
> lines and debug symbols in this thread. I suppose your merging of kcore
> and vmlinux gives both the correct image and the symbols, but I was only
> focused on the image being correct, so only kcore was enough.
>
> It looks like everything we want to do from your previous thread is in
> addition to the fixes from this one. Even if we auto merge kcore +
> symbols and move the disassembly into Perf, we still want to detect
> decode issues earlier and not have IPs jumping backwards. Whether it's
> the script or Perf doing that the behavior should be the same.
>
Since it breaks the decode, can we please add a fix that drops the
packets from decode when a discontinuity is seen (with a warning message
in verbose mode), like the diff below?
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -257,6 +257,11 @@ def process_event(param_dict):
print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
return
+ if (stop_addr < start_addr):
+ if (options.verbose == True):
+ print("Packet Dropped, Discontinuity detected [stop_addr:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
+ return
+
if (options.objdump_name != None):
# It doesn't need to decrease virtual memory offset for disassembly
# for kernel dso and executable file dso, so in this case we set
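The check proposed in the diff above can be sketched as standalone Python. This is only an illustration: the helper names and the dso string are hypothetical, and the trace-on test mirrors the script's existing `start_addr == 0 and stop_addr == 4` check rather than any new perf behaviour.

```python
def classify_range(start_addr: int, stop_addr: int) -> str:
    """Classify the address range derived from two consecutive branch samples."""
    if start_addr == 0 and stop_addr == 4:
        return "trace_on"       # synthetic CS_ETM_TRACE_ON packet
    if stop_addr < start_addr:
        return "discontinuity"  # range runs backwards; objdump would reject it
    return "contiguous"


def range_to_disassemble(start_addr, stop_addr, dso, verbose=False):
    """Return (start, stop) to disassemble, or None if the packet is skipped."""
    kind = classify_range(start_addr, stop_addr)
    if kind == "trace_on":
        return None
    if kind == "discontinuity":
        if verbose:
            print("Packet dropped, discontinuity detected "
                  "[stop_addr:0x%x start_addr:0x%x] for dso %s"
                  % (stop_addr, start_addr, dso))
        return None
    return start_addr, stop_addr


# Addresses taken from the failing objdump command quoted in this thread;
# "vmlinux" is only an illustrative dso name.
print(range_to_disassemble(0xffff80008125b758, 0xffff80008125a934,
                           "vmlinux", verbose=True))
```

Dropping the range keeps perf running, but as discussed in this thread it hides the underlying decode-image problem, so the verbose warning is what makes the skipped packet visible.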
> To summarise I think these are the changes to make:
>
> * Improve bad decode detection in OpenCSD
> * Get the script to auto merge kcore and vmlinux
> * Maybe we could get Perf to do this if both a kcore folder and -k
> vmlinux are used?
> * Improve the performance, either in the script or move more
> functionality into Perf
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 12:17 ` Ganapatrao Kulkarni
@ 2024-08-07 14:53 ` James Clark
2024-08-07 16:18 ` Ganapatrao Kulkarni
2024-08-07 16:48 ` Leo Yan
0 siblings, 2 replies; 45+ messages in thread
From: James Clark @ 2024-08-07 14:53 UTC
To: Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Leo Yan, Al.Grant, Mike Leach
On 07/08/2024 1:17 pm, Ganapatrao Kulkarni wrote:
>
>
> On 06-08-2024 09:44 pm, James Clark wrote:
>>
>>
>> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>>
>>>
>>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>>
>>>>
>>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>>
>>>>>
>>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>> To generate the instruction trace, the script uses the address
>>>>>>>>>>>>>>>>> range between two contiguous packets. If there is a continuity
>>>>>>>>>>>>>>>>> break due to a discontiguous branch address, the tracing must be
>>>>>>>>>>>>>>>>> reset and restarted with the new set of contiguous packets.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Add a change to identify the break, complete the remaining
>>>>>>>>>>>>>>>>> tracing of the current packets, and restart tracing from the new
>>>>>>>>>>>>>>>>> set of packets once continuity is established.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me whether
>>>>>>>>>>>>>>>> this adds missing output or corrects the tail end of the output
>>>>>>>>>>>>>>>> that was previously wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It adds the tail end of the trace and also avoids the crash of the
>>>>>>>>>>>>>>> perf application. Without this change, perf crashes with the log
>>>>>>>>>>>>>>> below:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>>>   <no Python frame>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>>> Aborted (core dumped)
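The objdump arguments in this traceback can be reproduced from the two consecutive branch samples quoted in the dump snippets of this thread (addr 0xffff80008125b758, then ip 0xffff80008125a930). The Python sketch below assumes, as those arguments suggest, that the script starts disassembling at the previous sample's branch target (`addr`) and stops just after the current sample's branch instruction (`ip` + 4, since A64 instructions are 4 bytes); `disasm_range` is a hypothetical name.

```python
INSN_SIZE = 4  # every A64 instruction is 4 bytes


def disasm_range(prev_addr: int, cur_ip: int):
    """Assumed derivation: start at the previous branch target, stop after the current branch."""
    start_addr = prev_addr
    stop_addr = cur_ip + INSN_SIZE
    return start_addr, stop_addr


start, stop = disasm_range(0xffff80008125b758, 0xffff80008125a930)
print("--start-address=0x%x --stop-address=0x%x" % (start, stop))
print("backwards range:", stop < start)
```

This prints the same start/stop pair as the failing objdump command and shows that the range runs backwards, which is what objdump rejects with "the stop address should be after the start address".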
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, the logic is the same as for holding the addr of the previous
>>>>>>>>>>>>>>> packet: the IP saved for the previous packet is copied into
>>>>>>>>>>>>>>> prev_ip before it is overwritten with the current packet's value.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it is a dummy value for the first packet. I added it on the
>>>>>>>>>>>>> assumption that we won't hit a discontinuity on the first packet
>>>>>>>>>>>>> itself.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could it be changed to something more intuitive, like below?
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> return
>>>>>>>>>>>>>
>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>>>>>>>>>>> return
>>>>>>>>>>>>>
>>>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>>>>>>>>>>> + # packet ip to complete the remaining tracing of the address range.
>>>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>>>> +
>>>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>>>>>>>>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>>>>>>>>>>> return
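One possible way to avoid the dummy first value is to keep the previous stop address per CPU and report "no previous range" explicitly. The minimal Python sketch below uses a hypothetical per-CPU dict layout (`cpu_state` and `next_range` are illustrative names, not the script's). It also sidesteps a case in the diff above: if the `'ip'` key has not been set yet, `prev_ip` is never assigned, so evaluating `prev_ip != 0` after a backwards range would appear to raise `UnboundLocalError`.

```python
# Hypothetical per-CPU state: one small dict per CPU instead of
# string-concatenated keys such as str(cpu) + 'ip' in a flat dict.
cpu_state = {}


def next_range(cpu: int, ip: int, addr: int):
    """Return (start_addr, stop_addr, prev_stop) for this sample, or None.

    None is returned for the first branch sample seen on a CPU, because there
    is no previous branch target to start from yet. prev_stop is the stop
    address of the previous range on this CPU, or None if this is the first
    range, so callers never see a made-up "dummy" value.
    """
    state = cpu_state.get(cpu)
    if state is None:
        # First sample on this CPU: only remember where the branch went.
        cpu_state[cpu] = {"addr": addr, "stop": None}
        return None
    start_addr = state["addr"]
    stop_addr = ip + 4          # include the 4-byte branch instruction itself
    prev_stop = state["stop"]
    state["addr"] = addr        # this branch target starts the next range
    state["stop"] = stop_addr
    return start_addr, stop_addr, prev_stop


# Usage with made-up addresses on CPU 1.
print(next_range(1, 0x100, 0x200))   # None: first sample
print(next_range(1, 0x210, 0x300))
print(next_range(1, 0x308, 0x400))
```

A caller can then check `prev_stop is not None` before using it, instead of relying on a sentinel such as 0.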
>>>>>>>>>>>>>
>>>>>>>>>>>>> For reference, below is the failure log (with the crash) without
>>>>>>>>>>>>> this patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
>>>>>>>>>>>>>     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
>>>>>>>>>>>>>     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>>>>>>>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>>>>>>>>>>>>>     disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>                     ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>>>>>>>>>>>>>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>>>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>>>>>>     raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace event handler
>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>
>>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>>>   <no Python frame>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>> Aborted (core dumped)
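Independently of any range check, the abort in this log comes from `subprocess.CalledProcessError` propagating out of the perf event handler. The minimal Python sketch below shows one defensive option: a wrapper (the name `run_disassembler` is hypothetical) that reports the failure and lets the caller skip the range instead. The command passed in would be an argument list like the objdump one shown in the traceback; the demo uses the Python interpreter as a stand-in so it runs without objdump or a kernel image.

```python
import subprocess
import sys


def run_disassembler(cmd):
    """Run a disassembler command; return its output lines, or None on failure.

    Rather than letting CalledProcessError escape the event handler (which, as
    in the log above, ends with "Fatal Python error: handler_call_die"), print
    a warning to stderr and let the caller skip this address range.
    """
    try:
        out = subprocess.check_output(cmd)
    except subprocess.CalledProcessError as err:
        print("disassembler failed with exit status %d: %s"
              % (err.returncode, " ".join(cmd)), file=sys.stderr)
        return None
    return out.decode("utf-8").splitlines()


if __name__ == "__main__":
    # Stand-in commands: one that fails, one that succeeds.
    print(run_disassembler([sys.executable, "-c", "import sys; sys.exit(1)"]))
    print(run_disassembler([sys.executable, "-c", "print('ok')"]))
```

Note that this only keeps perf alive; as argued in this thread, a backwards range usually points at a decode-image problem that should still be fixed at the source.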
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>>> ============
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> With fix:
>>>>>>>>>>>>> =========
>>>>>>>>>>>>>
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3,
>>>>>>>>>>>>> [x23, x21]
>>>>>>>>>>>>
>>>>>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>>
>>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>>> reproduce it.
>>>>>>>>>>>
>>>>>>>>>>> I am using the perf record command below.
>>>>>>>>>>>
>>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero
>>>>>>>>>>> of=/dev/null
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>>>>>> think the issue is somewhere else.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>>> up being the wrong image to decode with because a load of branches
>>>>>>>>> are actually turned into nops.
>>>>>>>>>
>>>>>>>>> Can you confirm whether you still get failures if you use --kcore
>>>>>>>>> instead of vmlinux:
>>>>>>>>>
>>>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o
>>>>>>>>> <output-folder.data> -- \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> perf script -i <output-folder.data> \
>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d
>>>>>>>>> llvm-objdump \
>>>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>
>>>>>>>> With the command combination below, the issue is also seen with kcore,
>>>>>>>> as reported in this email chain.
>>>>>>>>
>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>> ./perf script -i kcore/data \
>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>
>>>>>>>>
>>>>>>>> However, with the sequence below (same as your command), the issue is
>>>>>>>> *not* seen.
>>>>>>>>
>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>> ./perf script -i kcore/data \
>>>>>>>> ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>
>>>>>>>> Do you see any issue with the command that shows the
>>>>>>>> problem?
>>>>>>>> Also, the output logs produced by these two commands differ.
>>>>>>>>
>>>>>
>>>>> BTW are you running this on the target or somewhere else? It's
>>>>> suspicious that "-i kcore/data" works at all because there is no
>>>>> kernel image given to Perf. Unless you are running on the target, in
>>>>> which case I think it will just open the one from /proc. Or maybe it
>>>>> uses /boot/vmlinux by default, which also wouldn't work.
>>>>>
>
> Yes, all tests are done natively on Ampere's ARM64 platform.
> Here are some more command combinations that also fail:
>
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- \
> -d objdump -k kcore/kcore_dir/kcore
>
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py \
> -F cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
>
> ./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py \
> -d objdump -k kcore/kcore_dir/kcore
>
>>>>> Also the difference between "--script=python:" and just giving the
>>>>> script name is in the parsing of the arguments following " -- ".
>>>>> Sometimes they're also parsed as Perf arguments (like the -v becomes
>>>>> perf verbose and -k becomes the Perf vmlinux rather than the script).
>>>>>
>>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>>> when it's not. But there are some other combinations and you'll have
>>>>> to debug it to compare your two exact scenarios to see why they're
>>>>> different.
>>>>>
>>>>> But ignoring that issue with the argument format, you mentioned you
>>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>>> assuming that confirms the issue is just a decode image issue, so we
>>>>> shouldn't try to patch this script?
>>>>>
>>>>
>>>> Although one change we should make to the script is to change the
>>>> example to use kcore. We can leave in one vmlinux example for when kcore
>>>> isn't available, but add a note that it will fail if any patched code
>>>> is traced (which is almost always).
>>>
>>> James, you may recall the year-old thread
>>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>>> I described there an awkward workaround Ampere has used to solve the
>>> patched code problem. At the time, it sounded like the maintainers were
>>> interested in getting away from using the python script, mostly for
>>> speed purposes. I didn't see the discussion go any further.
>>>
>>
>> Oh yes thanks for the reminder. I wasn't thinking about the source
>> code lines and debug symbols in this thread. I suppose your merging of
>> kcore and vmlinux gives both the correct image and the symbols, but I
>> was only focused on the image being correct, so only kcore was enough.
>>
>> It looks like everything we want to do from your previous thread is in
>> addition to the fixes from this one. Even if we auto merge kcore +
>> symbols and move the disassembly into Perf, we still want to detect
>> decode issues earlier and not have IPs jumping backwards. Whether it's
>> the script or Perf doing that the behavior should be the same.
>>
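The per-CPU bookkeeping discussed in this thread can be sketched in Python as below. This is a minimal illustration, not code from arm-cs-trace-disasm.py: the Sample tuple and ranges() helper are hypothetical, and the rule that a range runs from the previous sample's branch target (addr) to the current branch plus one instruction (ip + 4) is inferred from the objdump command in the traceback quoted later in this thread. The three samples are the CPU1 samples around sched_clock_noinstr from the dump above.

```python
from collections import namedtuple

# Hypothetical sample record; perf hands the real script a param_dict instead.
Sample = namedtuple("Sample", ["cpu", "addr", "ip"])

INSN_SIZE = 4  # AArch64 instructions are 4 bytes wide


def ranges(samples):
    """Yield (cpu, start, stop, backwards) for consecutive samples per CPU.

    The first sample on a CPU only seeds the state. Each later sample
    covers the code from the previous branch target up to and including
    the current branch instruction, so stop = ip + INSN_SIZE.
    """
    prev_addr = {}
    for s in samples:
        if s.cpu not in prev_addr:
            prev_addr[s.cpu] = s.addr
            continue
        start, stop = prev_addr[s.cpu], s.ip + INSN_SIZE
        prev_addr[s.cpu] = s.addr
        yield s.cpu, start, stop, stop < start


# CPU1 samples around sched_clock_noinstr, copied from the dump.
samples = [
    Sample(1, 0xffff80008125a8a8, 0xffff8000801bb4c8),
    Sample(1, 0xffff80008125b758, 0xffff80008125a8e4),
    Sample(1, 0xffff8000801bb4cc, 0xffff80008125a930),
]

for cpu, start, stop, backwards in ranges(samples):
    note = "  <-- IP went backwards" if backwards else ""
    print("CPU%d: 0x%x..0x%x%s" % (cpu, start, stop, note))
```

Run as-is, it prints two ranges; the second, 0xffff80008125b758..0xffff80008125a934, is flagged as going backwards, which is the same pair objdump rejects.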
>
> Since it is breaking the decode, can we please add a fix that drops the
> packets from decode when a discontinuity is seen (with a warning message
> in verbose mode), like the diff below?
>
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -257,6 +257,11 @@ def process_event(param_dict):
> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
> return
>
> + if (stop_addr < start_addr):
> + if (options.verbose == True):
> + print("Packet Dropped, Discontinuity detected [stop_addr:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> + return
> +
I suppose my only concern with this is that it hides real errors and
Perf shouldn't be outputting samples that go backwards. Considering that
fixing this in OpenCSD and Perf has a much wider benefit I think that
should be the ultimate goal. I'm putting this on my todo list for now
(including Steve's merging idea).
But in the meantime, what about having a force option?
> + if (stop_addr < start_addr):
> + if (options.verbose == True or not options.force):
> + print("Packet Dropped, Discontinuity detected [stop_addr:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> + if (not options.force):
> + return
> +
That way in the future we'll still get reports if something new goes wrong.
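The force option proposed above could look roughly like the following standalone Python sketch. The check_range() and parse_args() names are hypothetical, --force is the flag proposed here rather than an existing script option, and argparse is used only to keep the sketch self-contained; the real script's option handling may differ. As in the diff, without --force a backwards range is reported and dropped; with --force it is kept, and reported only when -v is also given.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical option parsing; --force is the proposed new flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debugging log")
    parser.add_argument("--force", action="store_true",
                        help="keep ranges even when the trace goes backwards")
    return parser.parse_args(argv)


def check_range(start_addr, stop_addr, dso, force=False, verbose=False):
    """Return (keep, message) for one disassembly range.

    Mirrors the proposed diff: a stop address below the start address is
    a discontinuity. A message is produced unless --force is given without
    -v; the range is dropped unless --force is given.
    """
    if stop_addr >= start_addr:
        return True, None
    message = None
    if verbose or not force:
        message = ("Packet Dropped, Discontinuity detected "
                   "[stop_addr:0x%x start_addr:0x%x ] for dso %s"
                   % (stop_addr, start_addr, dso))
    return force, message


if __name__ == "__main__":
    opts = parse_args([])
    # Addresses from the failing objdump call; the dso name is illustrative.
    keep, msg = check_range(0xffff80008125b758, 0xffff80008125a934,
                            "[kernel.kallsyms]", opts.force, opts.verbose)
    if msg:
        print(msg)
    print("disassemble" if keep else "skip")
```

Returning the message instead of printing it keeps the decision testable; the caller chooses where the warning goes.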
> if (options.objdump_name != None):
> # It doesn't need to decrease virtual memory offset for disassembly
> # for kernel dso and executable file dso, so in this case we set
>
>
>> To summarise I think these are the changes to make:
>>
>> * Improve bad decode detection in OpenCSD
>> * Get the script to auto merge kcore and vmlinux
>> * Maybe we could get Perf to do this if both a kcore folder and -k
>> vmlinux are used?
>> * Improve the performance, either in the script or move more
>> functionality into Perf
>
> Thanks,
> Ganapat
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 14:53 ` James Clark
@ 2024-08-07 16:18 ` Ganapatrao Kulkarni
2024-08-07 19:20 ` Leo Yan
2024-08-07 16:48 ` Leo Yan
1 sibling, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-07 16:18 UTC (permalink / raw)
To: James Clark, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Leo Yan, Al.Grant, Mike Leach
On 07-08-2024 08:23 pm, James Clark wrote:
>
>
> On 07/08/2024 1:17 pm, Ganapatrao Kulkarni wrote:
>>
>>
>> On 06-08-2024 09:44 pm, James Clark wrote:
>>>
>>>
>>> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>>>
>>>>
>>>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>>>
>>>>>
>>>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>>> To generate the instruction tracing, the script uses the address range
>>>>>>>>>>>>>>>>>> of 2 contiguous packets. If there is a continuity break due to a
>>>>>>>>>>>>>>>>>> discontiguous branch address, the tracing must be reset and restarted
>>>>>>>>>>>>>>>>>> with the new set of contiguous packets.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This change identifies the break, completes the remaining tracing of
>>>>>>>>>>>>>>>>>> the current packets, and restarts tracing from the new set of packets
>>>>>>>>>>>>>>>>>> if continuity is established.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you add a before and after example of what's
>>>>>>>>>>>>>>>>> changed to
>>>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It adds the tail end of the trace as well as avoiding the
>>>>>>>>>>>>>>>> segfault of the perf application. Without this change, perf
>>>>>>>>>>>>>>>> segfaults with the log below:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ./perf script
>>>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py --
>>>>>>>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 271,
>>>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 105,
>>>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>>>> start_addr, stop_addr):
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 99,
>>>>>>>>>>>>>>>> in read_disam
>>>>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line
>>>>>>>>>>>>>>>> 466, in
>>>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line
>>>>>>>>>>>>>>>> 571, in run
>>>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python
>>>>>>>>>>>>>>>> trace event handler
>>>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>>>>>>> <gankulkarni@os.amperecomputing.com>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here?
>>>>>>>>>>>>>>>>> Doesn't
>>>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, the logic is the same as holding the addr of the previous
>>>>>>>>>>>>>>>> packet. The ip saved for the previous packet is loaded into prev_ip
>>>>>>>>>>>>>>>> before it is overwritten with the current packet's value.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1
>>>>>>>>>>>>>>> save but
>>>>>>>>>>>>>>> no return
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current'
>>>>>>>>>>>>>>> IP:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, it is a dummy for the first packet. It was added anticipating that
>>>>>>>>>>>>>> we won't hit the discontinuity for the first packet itself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can this be changed to something more intuitive, like below?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>>>>>>>>> stop_addr=4
>>>>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>>>>>>>>> inserted" % cpu)
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>>>> + # Continuity of the Packets broken, set
>>>>>>>>>>>>>> start_addr to previous
>>>>>>>>>>>>>> + # packet ip to complete the remaining tracing
>>>>>>>>>>>>>> of the address range.
>>>>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr >
>>>>>>>>>>>>>> int(dso_end)):
>>>>>>>>>>>>>> print("Start address 0x%x is out of range [
>>>>>>>>>>>>>> 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>>>>>>>>> int(dso_end), dso))
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without this patch, below is the failure log (with segfault) for
>>>>>>>>>>>>>> reference.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@sut01sys-r214 perf]# timeout 4s ./perf record -e
>>>>>>>>>>>>>> cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>>>> [root@sut01sys-r214 perf]# ./perf script
>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>> address
>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>> start_addr,
>>>>>>>>>>>>>> stop_addr):
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>> 99, in
>>>>>>>>>>>>>> read_disam
>>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571,
>>>>>>>>>>>>>> in run
>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>>>>>>> event handler
>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>>>> ============
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz
>>>>>>>>>>>>>> w20,
>>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c
>>>>>>>>>>>>>> 105
>>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov
>>>>>>>>>>>>>> w24,
>>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr
>>>>>>>>>>>>>> w22,
>>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and
>>>>>>>>>>>>>> w21,
>>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
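The failing objdump invocation can be reconstructed by hand from the last two samples in this dump: the range starts at the previous sample's addr (0xffff80008125b758) and, judging by the --stop-address in the traceback above, ends at the current ip plus one 4-byte instruction (0xffff80008125a930 + 4 = 0xffff80008125a934). A minimal Python sketch follows; objdump_argv() is a hypothetical helper that builds the same argument list the CalledProcessError shows, but raises ValueError for a backwards range instead of running objdump.

```python
def objdump_argv(objdump, image, start_addr, stop_addr):
    """Build the objdump command line seen in the traceback.

    Hypothetical helper: instead of letting objdump exit with "the stop
    address should be after the start address", reject backwards ranges.
    """
    if stop_addr < start_addr:
        raise ValueError("backwards range: start 0x%x, stop 0x%x"
                         % (start_addr, stop_addr))
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            image]


prev_addr = 0xffff80008125b758  # addr of the sample ending in "blr x0"
cur_ip = 0xffff80008125a930     # ip of the following sample
try:
    print(objdump_argv("objdump", "../../vmlinux", prev_addr, cur_ip + 4))
except ValueError as err:
    print("skipped:", err)
```

With these values it prints "skipped: backwards range: start 0xffff80008125b758, stop 0xffff80008125a934" rather than aborting the whole perf script run.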
>>>>>>>>>>>>>> With fix:
>>>>>>>>>>>>>> =========
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz
>>>>>>>>>>>>>> w20,
>>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c
>>>>>>>>>>>>>> 105
>>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov
>>>>>>>>>>>>>> w24,
>>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr
>>>>>>>>>>>>>> w22,
>>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and
>>>>>>>>>>>>>> w21,
>>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3,
>>>>>>>>>>>>>> [x23, x21]
>>>>>>>>>>>>>
>>>>>>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>>>> reproduce it.
>>>>>>>>>>>>
>>>>>>>>>>>> I am using below perf record command.
>>>>>>>>>>>>
>>>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>>>>>>> think the issue is somewhere else.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>>>> up being the wrong image to decode with because a load of
>>>>>>>>>> branches
>>>>>>>>>> are actually turned into nops.
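To make the alternatives problem concrete, here is a minimal Python sketch of a hypothetical helper (not part of perf or arm-cs-trace-disasm.py) that compares instruction words at the same addresses in two images, for example vmlinux and the kcore copy, and lists the sites where the static image has some other instruction but the runtime image has the A64 NOP (0xd503201f). It assumes the words have already been extracted into address-to-word dicts, e.g. from objdump output; the example values are made up.

```python
# Hypothetical helper (not part of perf or arm-cs-trace-disasm.py): compare
# instruction words from a static image (vmlinux) with the runtime image
# (kcore) to find code rewritten by the alternatives mechanism.
A64_NOP = 0xD503201F  # A64 encoding of NOP


def patched_sites(static_words, runtime_words):
    """Return (addr, static_word, runtime_word) for every address whose word differs.

    Both arguments map an instruction address to its 32-bit word; only
    addresses present in both images are compared.
    """
    common = sorted(static_words.keys() & runtime_words.keys())
    return [(addr, static_words[addr], runtime_words[addr])
            for addr in common
            if static_words[addr] != runtime_words[addr]]


def branches_turned_into_nops(static_words, runtime_words):
    """Addresses that are NOPs at runtime but were something else in vmlinux."""
    return [addr for addr, old, new in patched_sites(static_words, runtime_words)
            if new == A64_NOP and old != A64_NOP]


if __name__ == "__main__":
    # Made-up values: "b .+0x40" in vmlinux, patched to a NOP in kcore.
    vmlinux = {0x1000: 0x14000010, 0x1004: 0xD63F0000}
    kcore = {0x1000: A64_NOP, 0x1004: 0xD63F0000}
    for addr in branches_turned_into_nops(vmlinux, kcore):
        print("0x%x: branch in vmlinux, NOP at runtime" % addr)
```

A decoder that follows the vmlinux words at such an address expects a branch the CPU never executed, which is one way the decoded ranges can end up jumping backwards.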
>>>>>>>>>>
>>>>>>>>>> Can you confirm whether you still get failures if you use --kcore
>>>>>>>>>> instead of vmlinux:
>>>>>>>>>>
>>>>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>>
>>>>>>>>>> perf script -i <output-folder.data> \
>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>>>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The issue is also seen with kcore, using the command combination
>>>>>>>>> below, as reported in this email chain.
>>>>>>>>>
>>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> ./perf script -i kcore/data \
>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> However, with the sequence below (same as your command), the issue
>>>>>>>>> is *not* seen.
>>>>>>>>>
>>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>> Do you see any issue with the command that shows the problem?
>>>>>>>>> Also, the output logs produced by the two commands differ.
>>>>>>>>>
>>>>>>
>>>>>> BTW are you running this on the target or somewhere else? It's
>>>>>> suspicious that "-i kcore/data" works at all because there is no
>>>>>> kernel image given to Perf. Unless you are running on the target and
>>>>>> then I think it will just open the one from /proc. Or maybe it uses
>>>>>> /boot/vmlinux by default which also wouldn't work.
>>>>>>
>>
>> Yes, all tests are done natively on Ampere's ARM64 platform.
>> Here are some more command combinations that also fail:
>>
>> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k kcore/kcore_dir/kcore
>>
>> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
>>
>> ./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py -d objdump -k kcore/kcore_dir/kcore
>>
>>>>>> Also the difference between "--script=python:" and just giving the
>>>>>> script name is in the parsing of the arguments following " -- ".
>>>>>> Sometimes they're also parsed as Perf arguments (e.g. -v becomes Perf's
>>>>>> verbose option and -k becomes Perf's vmlinux rather than the script's).
>>>>>>
>>>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>>>> when it's not. But there are some other combinations and you'll have
>>>>>> to debug it to compare your two exact scenarios to see why they're
>>>>>> different.
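As a rough illustration of why the argument forwarding matters, the sketch below rebuilds an option list like the script's own -k/-d/-v options with Python's optparse (the dest names are our assumption about arm-cs-trace-disasm.py) and parses two argument lists: one where perf forwarded -k to the script, and one where perf consumed it itself. In the second case the script simply never receives a kernel image path.

```python
from optparse import OptionParser, make_option

# Options modelled on the script's own -k/-d/-v options; the dest names are
# an assumption made for this illustration.
option_list = [
    make_option("-k", "--vmlinux", dest="vmlinux_name",
                help="Set path to vmlinux file"),
    make_option("-d", "--objdump", dest="objdump_name",
                help="Set path to objdump executable file"),
    make_option("-v", "--verbose", dest="verbose",
                action="store_true", default=False,
                help="Enable debugging log"),
]
parser = OptionParser(option_list=option_list)


def script_options(script_argv):
    """Parse only the arguments that perf actually hands to the script."""
    options, _args = parser.parse_args(script_argv)
    return options


# perf forwarded everything after " -- ", so the script sees -k:
forwarded = script_options(["-d", "objdump", "-k", "kcore/kcore_dir/kcore"])
# perf consumed -k (and -v) itself, so the script never sees them:
consumed = script_options(["-d", "objdump"])
print(forwarded.vmlinux_name, consumed.vmlinux_name)
```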
>>>>>>
>>>>>> But ignoring that issue with the argument format, you mentioned you
>>>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>>>> assuming that confirms the issue is just a decode image issue, so we
>>>>>> shouldn't try to patch this script?
>>>>>>
>>>>>
>>>>> Although one change we should make to the script is to change the
>>>>> example to use kcore. We can keep one vmlinux example in case kcore
>>>>> isn't available, but add a note that it will fail if any patched code
>>>>> is traced (which is almost always).
>>>>
>>>> James, you may recall the year-old thread
>>>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>>>> I described there an awkward workaround Ampere has used to solve the
>>>> patched code problem. At the time, it sounded like the maintainers were
>>>> interested in getting away from using the python script, mostly for
>>>> speed purposes. I didn't see the discussion go any further.
>>>>
>>>
>>> Oh yes thanks for the reminder. I wasn't thinking about the source
>>> code lines and debug symbols in this thread. I suppose your merging
>>> of kcore and vmlinux gives both the correct image and the symbols,
>>> but I was only focused on the image being correct, so only kcore was
>>> enough.
>>>
>>> It looks like everything we want to do from your previous thread is
>>> in addition to the fixes from this one. Even if we auto merge kcore +
>>> symbols and move the disassembly into Perf, we still want to detect
>>> decode issues earlier and not have IPs jumping backwards. Whether
>>> it's the script or Perf doing that the behavior should be the same.
>>>
>>
>> Since it breaks the decode, can we add a fix that drops the packets
>> from the decode when a discontinuity is seen (with a warning message
>> in verbose mode), like the diff below?
>>
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>  		return
>>
>> +	if (stop_addr < start_addr):
>> +		if (options.verbose == True):
>> +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>> +		return
>> +
>
> I suppose my only concern with this is that it hides real errors and
> Perf shouldn't be outputting samples that go backwards. Considering that
> fixing this in OpenCSD and Perf has a much wider benefit I think that
> should be the ultimate goal. I'm putting this on my todo list for now
> (including Steve's merging idea).
>
> But in the meantime, what about having a force option?
>
> > +	if (stop_addr < start_addr):
> > +		if (options.verbose == True or not options.force):
> > +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> > +		if (not options.force):
> > +			return
> > +
>
> That way in the future we'll still get reports if something new goes wrong.
Sure, makes sense.
Does the diff below with the force option look good?
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..efe34f308beb 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -36,7 +36,10 @@ option_list = [
 		    help="Set path to objdump executable file"),
 	make_option("-v", "--verbose", dest="verbose",
 		    action="store_true", default=False,
-		    help="Enable debugging log")
+		    help="Enable debugging log"),
+	make_option("-f", "--force", dest="force",
+		    action="store_true", default=False,
+		    help="Force decoder to continue")
 ]
 
 parser = OptionParser(option_list=option_list)
@@ -257,6 +260,12 @@ def process_event(param_dict):
 		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
 		return
 
+	if (stop_addr < start_addr):
+		if (options.verbose == True or options.force):
+			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
+		if (options.force):
+			return
+
 	if (options.objdump_name != None):
 		# It doesn't need to decrease virtual memory offset for disassembly
 		# for kernel dso and executable file dso, so in this case we set
>
>>  	if (options.objdump_name != None):
>>  		# It doesn't need to decrease virtual memory offset for disassembly
>>  		# for kernel dso and executable file dso, so in this case we set
>>
>>
>>> To summarise I think these are the changes to make:
>>>
>>> * Improve bad decode detection in OpenCSD
>>> * Get the script to auto merge kcore and vmlinux
>>> * Maybe we could get Perf to do this if both a kcore folder and -k
>>> vmlinux are used?
>>> * Improve the performance, either in the script or move more
>>> functionality into Perf
>>
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 14:53 ` James Clark
2024-08-07 16:18 ` Ganapatrao Kulkarni
@ 2024-08-07 16:48 ` Leo Yan
2024-08-08 9:32 ` James Clark
1 sibling, 1 reply; 45+ messages in thread
From: Leo Yan @ 2024-08-07 16:48 UTC (permalink / raw)
To: James Clark, Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
Hi all,
On 8/7/2024 3:53 PM, James Clark wrote:
A minor suggestion: if the discussion is too long, please delete the
irrelevant messages ;)
[...]
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>  		return
>>
>> +	if (stop_addr < start_addr):
>> +		if (options.verbose == True):
>> +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>> +		return
>> +
>
> I suppose my only concern with this is that it hides real errors and
> Perf shouldn't be outputting samples that go backwards. Considering that
> fixing this in OpenCSD and Perf has a much wider benefit I think that
> should be the ultimate goal. I'm putting this on my todo list for now
> (including Steve's merging idea).
In Perf's util/cs-etm.c, DISCONTINUITY is handled with:

	case CS_ETM_DISCONTINUITY:
		/*
		 * The trace is discontinuous, if the previous packet is
		 * instruction packet, set flag PERF_IP_FLAG_TRACE_END
		 * for previous packet.
		 */
		if (prev_packet->sample_type == CS_ETM_RANGE)
			prev_packet->flags |= PERF_IP_FLAG_BRANCH |
					      PERF_IP_FLAG_TRACE_END;

I am wondering whether OpenCSD passes the correct info so that the Perf
decoder can detect the discontinuity. If so, the flag
'PERF_IP_FLAG_TRACE_END' will be set (it is a generic flag in branch
samples), and we could consider using it in the Python script to handle
discontinuous data.
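A minimal sketch of that idea, assuming the sample dictionary given to the script carries the raw branch flags as an integer under a hypothetical "flags" key, and taking PERF_IP_FLAG_TRACE_END to be bit 9 (our reading of perf's util/event.h; check it against your tree). Because cs-etm.c sets the flag on the range *before* the discontinuity, the sketch still disassembles that range but stops chaining the next sample onto its branch target.

```python
# ASSUMPTIONS: the script's sample dict exposes the raw branch flags as an
# integer under a hypothetical "flags" key, and PERF_IP_FLAG_TRACE_END is
# bit 9 (check util/event.h in your perf tree).
PERF_IP_FLAG_BRANCH = 1 << 0
PERF_IP_FLAG_TRACE_END = 1 << 9


def is_trace_end(sample):
    """True if perf marked this branch sample as ending a run of trace."""
    return bool(sample.get("flags", 0) & PERF_IP_FLAG_TRACE_END)


class RangeTracker:
    """Chain per-CPU ranges from the previous branch target to the current branch."""

    def __init__(self):
        self.prev_addr = {}

    def next_range(self, cpu, ip, addr, flags=0):
        """Return (start, stop) to disassemble, or None if there is no valid start."""
        start = self.prev_addr.get(cpu)
        if flags & PERF_IP_FLAG_TRACE_END:
            # Trace stops after this range: do not chain the next sample onto addr.
            self.prev_addr.pop(cpu, None)
        else:
            self.prev_addr[cpu] = addr
        if start is None:
            return None
        return (start, ip + 4)
```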
>
> But in the meantime, what about having a force option?
>
>> +	if (stop_addr < start_addr):
>> +		if (options.verbose == True or not options.force):
>> +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>> +		if (not options.force):
>> +			return
If the stop address is less than the start address, something must be
wrong. In this case, we can report a warning for the discontinuity and
return directly (the `addr` also needs to be saved into the global variable
for the next sample).

I prefer not to add a force option for this case: eventually, reporting
this kind of failure and root-causing it will consume a lot of time. A
better way is to just print the reason in the log and continue dumping.
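The "warn, drop, keep going" behaviour described above could look roughly like this. The cpu_data key format mirrors the script's str(cpu) + 'addr' convention, but range_for_sample() and its return convention are hypothetical. The important detail is that the branch target is saved before the backwards range is dropped, so the next sample still starts from the right place.

```python
def range_for_sample(cpu_data, cpu, ip, addr, log=print):
    """Return (start_addr, stop_addr) to disassemble, or None to skip the sample.

    start_addr is the previous branch target seen on this CPU and stop_addr
    is just past the current branch instruction (ip + 4).
    """
    key = str(cpu) + 'addr'
    if cpu_data.get(key) is None:
        # First sample on this CPU: only remember where the branch went.
        cpu_data[key] = addr
        return None

    start_addr = cpu_data[key]
    stop_addr = ip + 4

    # Save the branch target before any early return, so the next sample
    # starts from the right place even if this one is dropped.
    cpu_data[key] = addr

    if stop_addr < start_addr:
        log("Packet Discontinuity detected [stop_addr:0x%x start_addr:0x%x ]"
            % (stop_addr, start_addr))
        return None
    return (start_addr, stop_addr)
```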
Thanks,
Leo
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 16:18 ` Ganapatrao Kulkarni
@ 2024-08-07 19:20 ` Leo Yan
2024-08-08 4:36 ` Ganapatrao Kulkarni
0 siblings, 1 reply; 45+ messages in thread
From: Leo Yan @ 2024-08-07 19:20 UTC (permalink / raw)
To: Ganapatrao Kulkarni, James Clark, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
> Does the diff below with the force option look good?
>
> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> index d973c2baed1c..efe34f308beb 100755
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -36,7 +36,10 @@ option_list = [
>  		    help="Set path to objdump executable file"),
>  	make_option("-v", "--verbose", dest="verbose",
>  		    action="store_true", default=False,
> -		    help="Enable debugging log")
> +		    help="Enable debugging log"),
> +	make_option("-f", "--force", dest="force",
> +		    action="store_true", default=False,
> +		    help="Force decoder to continue")
>  ]
>
>  parser = OptionParser(option_list=option_list)
> @@ -257,6 +260,12 @@ def process_event(param_dict):
>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>  		return
>
> +	if (stop_addr < start_addr):
> +		if (options.verbose == True or options.force):
> +			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> +		if (options.force):
> +			return
I struggled a bit with the code: it is confusing that force mode bails out
while non-force mode continues to run. I prefer to always bail out for
the discontinuity case, as it is pointless to continue in this case.

	if (stop_addr <= start_addr):
		print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % \
			(stop_addr, start_addr, dso))
		return
Thanks,
Leo
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 19:20 ` Leo Yan
@ 2024-08-08 4:36 ` Ganapatrao Kulkarni
2024-08-08 7:42 ` Leo Yan
0 siblings, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-08 4:36 UTC (permalink / raw)
To: Leo Yan, James Clark, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 08-08-2024 12:50 am, Leo Yan wrote:
> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
>
>> Does the diff below with the force option look good?
>>
>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> index d973c2baed1c..efe34f308beb 100755
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -36,7 +36,10 @@ option_list = [
>>  		    help="Set path to objdump executable file"),
>>  	make_option("-v", "--verbose", dest="verbose",
>>  		    action="store_true", default=False,
>> -		    help="Enable debugging log")
>> +		    help="Enable debugging log"),
>> +	make_option("-f", "--force", dest="force",
>> +		    action="store_true", default=False,
>> +		    help="Force decoder to continue")
>>  ]
>>
>>  parser = OptionParser(option_list=option_list)
>> @@ -257,6 +260,12 @@ def process_event(param_dict):
>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>  		return
>>
>> +	if (stop_addr < start_addr):
>> +		if (options.verbose == True or options.force):
>> +			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>> +		if (options.force):
>> +			return
>
> I struggled a bit with the code: it is confusing that force mode bails out
> while non-force mode continues to run. I prefer to always bail out for
> the discontinuity case, as it is pointless to continue in this case.
I kept the bail-out under the force option since I thought it is not good
to hide the error in normal use; otherwise we would never notice this error
in the future, and it would be hidden by default. Eventually this error
should be fixed.
Having said that, it also seems OK to avoid the error with a warning
message, as you suggested.
>
> 	if (stop_addr <= start_addr):
> 		print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % \
> 			(stop_addr, start_addr, dso))
> 		return
>
Thanks,
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 4:36 ` Ganapatrao Kulkarni
@ 2024-08-08 7:42 ` Leo Yan
2024-08-08 9:21 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Leo Yan @ 2024-08-08 7:42 UTC (permalink / raw)
To: Ganapatrao Kulkarni, James Clark, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 8/8/2024 5:36 AM, Ganapatrao Kulkarni wrote:
>
> On 08-08-2024 12:50 am, Leo Yan wrote:
>> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
>>
>>> Does the diff below with the force option look good?
>>>
>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> index d973c2baed1c..efe34f308beb 100755
>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> @@ -36,7 +36,10 @@ option_list = [
>>>  		    help="Set path to objdump executable file"),
>>>  	make_option("-v", "--verbose", dest="verbose",
>>>  		    action="store_true", default=False,
>>> -		    help="Enable debugging log")
>>> +		    help="Enable debugging log"),
>>> +	make_option("-f", "--force", dest="force",
>>> +		    action="store_true", default=False,
>>> +		    help="Force decoder to continue")
>>>  ]
>>>
>>>  parser = OptionParser(option_list=option_list)
>>> @@ -257,6 +260,12 @@ def process_event(param_dict):
>>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>  		return
>>>
>>> +	if (stop_addr < start_addr):
>>> +		if (options.verbose == True or options.force):
>>> +			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>>> +		if (options.force):
>>> +			return
>>
>> I struggled a bit with the code: it is confusing that force mode bails out
>> while non-force mode continues to run. I prefer to always bail out for
>> the discontinuity case, as it is pointless to continue in this case.
>
> I kept the bail-out under the force option since I thought it is not good
> to hide the error in normal use; otherwise we would never notice this error
> in the future, and it would be hidden by default. Eventually this error
> should be fixed.
As James said, the issue should be fixed in OpenCSD or in the Perf decoding flow.

Thus, the perf tool should tolerate errors: report a warning and drop
discontinuous samples. This makes things easier for developers who later
face the same issue, since they don't need to spend time locating the issue
and struggling to override the error.

If you prefer to use a force option, it might be better to give the
reasoning and a *suggestion* in one go, something like:

	if (stop_addr < start_addr):
		print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
		print("Use option '-f' following the script for force mode")
		if (options.force):
			return

Either way is fine with me. Thanks a lot for taking the time on this issue.
Leo
> Having said that, it also seems OK to avoid the error with a warning
> message, as you suggested.
>
>>
>> 	if (stop_addr <= start_addr):
>> 		print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % \
>> 			(stop_addr, start_addr, dso))
>> 		return
>>
>
> Thanks,
> Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-06 16:14 ` James Clark
2024-08-07 12:17 ` Ganapatrao Kulkarni
@ 2024-08-08 7:54 ` Leo Yan
1 sibling, 0 replies; 45+ messages in thread
From: Leo Yan @ 2024-08-08 7:54 UTC (permalink / raw)
To: James Clark, scclevenger, Ganapatrao Kulkarni
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 8/6/2024 5:14 PM, James Clark wrote:
[...]
>>> Although one change we should make to the script is to change the
>>> example to use kcore. We can keep one vmlinux example in case kcore
>>> isn't available, but add a note that it will fail if any patched code is
>>> traced (which is almost always).
>>
>> James, you may recall the year-old thread
>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>> I described there an awkward workaround Ampere has used to solve the
>> patched code problem. At the time, it sounded like the maintainers were
>> interested in getting away from using the python script, mostly for
>> speed purposes. I didn't see the discussion go any further.
>>
>
> Oh yes thanks for the reminder. I wasn't thinking about the source code
> lines and debug symbols in this thread. I suppose your merging of kcore
> and vmlinux gives both the correct image and the symbols, but I was only
> focused on the image being correct, so only kcore was enough.
>
> It looks like everything we want to do from your previous thread is in
> addition to the fixes from this one. Even if we auto merge kcore +
> symbols and move the disassembly into Perf, we still want to detect
> decode issues earlier and not have IPs jumping backwards. Whether it's
> the script or Perf doing that the behavior should be the same.
>
> To summarise I think these are the changes to make:
>
> * Improve bad decode detection in OpenCSD
> * Get the script to auto merge kcore and vmlinux
> * Maybe we could get Perf to do this if both a kcore folder and -k
> vmlinux are used?
We first need to be clear about the purpose of using kcore and vmlinux.
kcore contains the latest instructions, so it is used for disassembly.
vmlinux contains the debug info for locating source files and lines.
If so, this is a common issue for perf, and it might not be necessary to
merge the two files. Alternatively, we need to check how perf can use
kcore and vmlinux at the same time.
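To illustrate that split, the hypothetical helpers below only build command lines: objdump disassembles an address range from the kcore copy (which holds the runtime-patched instructions), while addr2line resolves source lines from vmlinux (which holds the debug info). -d, -z, --start-address and --stop-address are standard GNU objdump options, and -e is addr2line's executable option; running the commands (e.g. with subprocess.run) and how perf itself would combine the two images are left open, as in the thread.

```python
def disasm_cmd(objdump, kcore, start_addr, stop_addr):
    """objdump argv that disassembles [start_addr, stop_addr) from the kcore copy."""
    return [objdump, "-d", "-z",
            "--start-address=0x%x" % start_addr,
            "--stop-address=0x%x" % stop_addr,
            kcore]


def srcline_cmd(vmlinux, addrs):
    """addr2line argv that maps addresses to file:line using vmlinux debug info."""
    return ["addr2line", "-e", vmlinux] + ["0x%x" % a for a in addrs]


if __name__ == "__main__":
    # Addresses based on the sched_clock_noinstr sample quoted earlier in the thread.
    print(disasm_cmd("objdump", "kcore/kcore_dir/kcore",
                     0xffff80008125a8e8, 0xffff80008125a934))
    print(srcline_cmd("vmlinux", [0xffff80008125a8e8]))
```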
Thanks,
Leo
> * Improve the performance, either in the script or move more
> functionality into Perf
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 7:42 ` Leo Yan
@ 2024-08-08 9:21 ` James Clark
2024-08-08 10:51 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-08 9:21 UTC (permalink / raw)
To: Leo Yan, Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 08/08/2024 8:42 am, Leo Yan wrote:
> On 8/8/2024 5:36 AM, Ganapatrao Kulkarni wrote:
>>
>> On 08-08-2024 12:50 am, Leo Yan wrote:
>>> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
>>>
>>>> Does the diff below with the force option look good?
>>>>
>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> index d973c2baed1c..efe34f308beb 100755
>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> @@ -36,7 +36,10 @@ option_list = [
>>>>  		    help="Set path to objdump executable file"),
>>>>  	make_option("-v", "--verbose", dest="verbose",
>>>>  		    action="store_true", default=False,
>>>> -		    help="Enable debugging log")
>>>> +		    help="Enable debugging log"),
>>>> +	make_option("-f", "--force", dest="force",
>>>> +		    action="store_true", default=False,
>>>> +		    help="Force decoder to continue")
>>>>  ]
>>>>
>>>>  parser = OptionParser(option_list=option_list)
>>>> @@ -257,6 +260,12 @@ def process_event(param_dict):
>>>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>>  		return
>>>>
>>>> +	if (stop_addr < start_addr):
>>>> +		if (options.verbose == True or options.force):
>>>> +			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
I think the options.force in the print condition should be "options.verbose
or not options.force"? You want to print the error until the user adds -f,
then hide it, unless verbose is on.
>>>> +		if (options.force):
>>>> +			return
Oops, I had this one the wrong way around in my example. This way is correct.
>>>
>>> I struggled a bit with the code: it is confusing that force mode bails out
>>> while non-force mode continues to run. I prefer to always bail out for
>>> the discontinuity case, as it is pointless to continue in this case.
>>
>> I kept the bail-out under the force option since I thought it is not good
>> to hide the error in normal use; otherwise we would never notice this error
>> in the future, and it would be hidden by default. Eventually this error
>> should be fixed.
>
> As James said, the issue should be fixed in OpenCSD or in the Perf decoding flow.
>
> Thus, the perf tool should tolerate errors: report a warning and drop
> discontinuous samples. This makes things easier for developers who later
> face the same issue, since they don't need to spend time locating the issue
> and struggling to override the error.
>
> If you prefer to use a force option, it might be better to give the
> reasoning and a *suggestion* in one go, something like:
>
> 	if (stop_addr < start_addr):
> 		print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> 		print("Use option '-f' following the script for force mode")
> 		if (options.force):
> 			return
>
> Either way is fine with me. Thanks a lot for taking the time on this issue.
>
> Leo
>
But your diff looks good, Ganapat. I think you can send a patch with Leo's
extra help message added and the first force condition flipped.
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-07 16:48 ` Leo Yan
@ 2024-08-08 9:32 ` James Clark
2024-08-08 11:05 ` Leo Yan
2024-08-09 14:13 ` Mike Leach
0 siblings, 2 replies; 45+ messages in thread
From: James Clark @ 2024-08-08 9:32 UTC (permalink / raw)
To: Leo Yan, Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 07/08/2024 5:48 pm, Leo Yan wrote:
> Hi all,
>
> On 8/7/2024 3:53 PM, James Clark wrote:
>
> A minor suggestion: if the discussion is too long, please delete the
> irrelevant messages ;)
>
> [...]
>
>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>  		return
>>>
>>> +	if (stop_addr < start_addr):
>>> +		if (options.verbose == True):
>>> +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>>> +		return
>>> +
>>
>> I suppose my only concern with this is that it hides real errors and
>> Perf shouldn't be outputting samples that go backwards. Considering that
>> fixing this in OpenCSD and Perf has a much wider benefit I think that
>> should be the ultimate goal. I'm putting this on my todo list for now
>> (including Steve's merging idea).
>
> In Perf's util/cs-etm.c, DISCONTINUITY is handled with:
>
> 	case CS_ETM_DISCONTINUITY:
> 		/*
> 		 * The trace is discontinuous, if the previous packet is
> 		 * instruction packet, set flag PERF_IP_FLAG_TRACE_END
> 		 * for previous packet.
> 		 */
> 		if (prev_packet->sample_type == CS_ETM_RANGE)
> 			prev_packet->flags |= PERF_IP_FLAG_BRANCH |
> 					      PERF_IP_FLAG_TRACE_END;
>
> I am wondering whether OpenCSD passes the correct info so that the Perf
> decoder can detect the discontinuity. If so, the flag
> 'PERF_IP_FLAG_TRACE_END' will be set (it is a generic flag in branch
> samples), and we could consider using it in the Python script to handle
> discontinuous data.
No, OpenCSD isn't passing the correct info here. Higher up in the thread
I suggested an OpenCSD patch that makes it detect the error earlier and
fixes the issue. It also needs to output a discontinuity when the
address goes backwards. So with those two fixes, the script works
without modification.
>
>>
>> But in the meantime, what about having a force option?
>>
>>> +	if (stop_addr < start_addr):
>>> +		if (options.verbose == True or not options.force):
>>> +			print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>>> +		if (not options.force):
>>> +			return
>
> If the stop address is less than the start address, something must be
> wrong. In this case, we can report a warning for the discontinuity and
> return directly (the `addr` also needs to be saved into the global variable
> for the next sample).
>
> I prefer not to add a force option for this case: eventually, reporting
> this kind of failure and root-causing it will consume a lot of time. A
> better way is to just print the reason in the log and continue dumping.
But in this case we've identified all the known issues that would cause
the script to fail and we can fix them in Perf and OpenCSD. There may
not even be any more issues that will cause the script to fail in the
future so there's no point in softening the error IMO. That will only
hide future issues (of which there may be none) and make root causing
harder when it hits some other tool.
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 9:21 ` James Clark
@ 2024-08-08 10:51 ` James Clark
2024-08-08 11:14 ` Ganapatrao Kulkarni
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-08 10:51 UTC (permalink / raw)
To: Leo Yan, Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 08/08/2024 10:21 am, James Clark wrote:
>
>
> On 08/08/2024 8:42 am, Leo Yan wrote:
>> On 8/8/2024 5:36 AM, Ganapatrao Kulkarni wrote:
>>>
>>> On 08-08-2024 12:50 am, Leo Yan wrote:
>>>> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
>>>>
>>>>> Does the diff below with the force option look good?
>>>>>
>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> index d973c2baed1c..efe34f308beb 100755
>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> @@ -36,7 +36,10 @@ option_list = [
>>>>>  		    help="Set path to objdump executable file"),
>>>>>  	make_option("-v", "--verbose", dest="verbose",
>>>>>  		    action="store_true", default=False,
>>>>> -		    help="Enable debugging log")
>>>>> +		    help="Enable debugging log"),
>>>>> +	make_option("-f", "--force", dest="force",
>>>>> +		    action="store_true", default=False,
>>>>> +		    help="Force decoder to continue")
>>>>>  ]
>>>>>
>>>>>  parser = OptionParser(option_list=option_list)
>>>>> @@ -257,6 +260,12 @@ def process_event(param_dict):
>>>>>  		print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>>>  		return
>>>>>
>>>>> +	if (stop_addr < start_addr):
>>>>> +		if (options.verbose == True or options.force):
>>>>> +			print("Packet Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>
> I think the options.force in the print condition should be "options.verbose
> or not options.force"? You want to print the error until the user adds -f,
> then hide it, unless verbose is on.
>
>>>>> +		if (options.force):
>>>>> +			return
>
> Oops, I had this one the wrong way around in my example. This way is
> correct.
>
>>>>
>>>> I struggled a bit for the code - it is confused that force mode
>>>> bails out
>>>> and the non-force mode continues to run. I prefer to always bail out
>>>> for
>>>> the discontinuity case, as it is pointless to continue in this case.
>>>
>>> Kept bail out with force option since I though it is not good to hide
>>> the error in normal use, otherwise we never able to notice this error in
>>> the future and it becomes default hidden. Eventually this error should
>>> be fixed.
>>
>> As James said, the issue should be fixed in OpenCSD or Perf decoding
>> flow.
>>
>> Thus, perf tool should be tolerant errors - report warning and drop
>> discontinuous samples. This would be easier for developers later if face
>> the same issue, they don't need to spend time to locate issue and
>> struggle
>> for overriding the error.
>>
>> If you prefer to use force option, it might be better to give
>> reasoning and
>> *suggestion* in one go, something like:
>>
>> if (stop_addr < start_addr):
>> print("Packet Discontinuity detected [stop_add:0x%x
>> start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>> print("Use option '-f' following the script for force mode"
>> if (options.force)
>> return
>>
>> Either way is fine for me. Thanks a lot for taking time on the issue.
>>
>> Leo
>
> But your diff looks good Ganapat, I think send a patch with Leo's extra
> help message added and the first force flipped.
One other small detail about Leo's suggested print-out: can you also add an
instruction on how to keep the warnings, like this:
print("Use option '-f' following the script for force mode. Add '-v' \
to continue printing decode warnings.")
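For illustration, the behaviour agreed above can be sketched as a small standalone helper. By default it warns and points the user at '-f'. With '-f' it drops the sample quietly, unless '-v' is also given. The name `check_discontinuity` and the `options` object built with `SimpleNamespace` are hypothetical stand-ins for the script's `process_event()` body and its OptionParser result. This is a sketch of the discussed logic, not the V2 patch itself.

```python
from types import SimpleNamespace


def check_discontinuity(start_addr, stop_addr, dso, options):
    """Return True if the caller should drop this sample and return early.

    Hypothetical helper: print the warning unless -f was given (or always
    when -v is set), and only skip the sample when -f was given.
    """
    if stop_addr >= start_addr:
        return False  # addresses are continuous: disassemble as usual

    if options.verbose or not options.force:
        print("Packet Discontinuity detected [stop_addr:0x%x start_addr:0x%x ] "
              "for dso %s" % (stop_addr, start_addr, dso))
    if not options.force:
        # Tell the user how to get past the error and how to keep the warnings.
        print("Use option '-f' following the script for force mode. "
              "Add '-v' to continue printing decode warnings.")
    return options.force
```

Without '-f' the helper returns False, so the caller falls through to the normal disassembly path and the error is not hidden. This matches the intent that only the forced mode skips the bad sample.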
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 9:32 ` James Clark
@ 2024-08-08 11:05 ` Leo Yan
2024-08-09 14:13 ` Mike Leach
1 sibling, 0 replies; 45+ messages in thread
From: Leo Yan @ 2024-08-08 11:05 UTC
To: James Clark, Ganapatrao Kulkarni, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 8/8/24 10:32, James Clark wrote:
[...]
>>> I suppose my only concern with this is that it hides real errors and
>>> Perf shouldn't be outputting samples that go backwards. Considering that
>>> fixing this in OpenCSD and Perf has a much wider benefit I think that
>>> should be the ultimate goal. I'm putting this on my todo list for now
>>> (including Steve's merging idea).
>>
>> In the perf's util/cs-etm.c file, it handles DISCONTINUITY with:
>>
>> case CS_ETM_DISCONTINUITY:
>> /*
>> * The trace is discontinuous, if the previous packet is
>> * instruction packet, set flag PERF_IP_FLAG_TRACE_END
>> * for previous packet.
>> */
>> if (prev_packet->sample_type == CS_ETM_RANGE)
>> prev_packet->flags |= PERF_IP_FLAG_BRANCH |
>> PERF_IP_FLAG_TRACE_END;
>>
>> I am wandering if OpenCSD has passed the correct info so Perf decoder can
>> detect the discontinuity. If yes, then the flag 'PERF_IP_FLAG_TRACE_END' will
>> be set (it is a general flag in branch sample), then we can consider use it in
>> the python script to handle discontinuous data.
>
> No OpenCSD isn't passing the correct info here. Higher up in the thread
> I suggested an OpenCSD patch that makes it detect the error earlier and
> fixes the issue. It also needs to output a discontinuity when the
> address goes backwards. So two fixes and then the script works without
> modifications.
Great! Just a reminder: even with the fixes above, we might still need to
enhance the script to consume the PERF_IP_FLAG_TRACE_END flag, which would
make the script more reliable.
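A rough sketch of what consuming the flag in the script could look like follows. It makes two assumptions. First, the per-sample flags are available to the script as an integer. Second, PERF_IP_FLAG_TRACE_END is bit 9, as in perf's `tools/perf/util/event.h`. Both should be checked against the perf version in use. `next_range` is a hypothetical helper. As we understand the script, it starts each range at the previous sample's `addr` (kept in `cpu_data[str(cpu) + 'addr']`) and stops at `ip + 4`, and the sketch mirrors that convention.

```python
# Assumed to match PERF_IP_FLAG_TRACE_END (1 << 9) in perf's util/event.h;
# verify against the perf sources in use.
PERF_IP_FLAG_TRACE_END = 1 << 9


def next_range(cpu_data, cpu, ip, addr, flags):
    """Return (start_addr, stop_addr) to disassemble, or None to skip.

    Hypothetical helper: cpu_data mirrors the script's per-CPU dict keyed by
    str(cpu) + 'addr', which holds the branch target of the previous sample.
    """
    key = str(cpu) + 'addr'
    start_addr = cpu_data.get(key)

    if flags & PERF_IP_FLAG_TRACE_END:
        # Trace ends after this branch, so its target is not where execution
        # resumes: forget it rather than stitching a range across the gap.
        cpu_data.pop(key, None)
    else:
        cpu_data[key] = addr

    if start_addr is None:
        return None  # no valid start address yet for this CPU
    # Add 4 so that objdump includes the instruction at ip.
    return (start_addr, ip + 4)
```

The range ending at a TRACE_END sample is still emitted. Only the stitching of the next range across the gap is suppressed.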
[...]
>> I prefer to not add force option for this case - eventually, this will consume
>> much time for reporting this kind of failure and need to root causing it. A
>> better way is we just print out the reasoning in the log and continue to dump.
>
> But in this case we've identified all the known issues that would cause
> the script to fail and we can fix them in Perf and OpenCSD. There may
> not even be any more issues that will cause the script to fail in the
> future so there's no point in softening the error IMO. That will only
> hide future issues (of which there may be none) and make root causing
> harder when it hits some other tool.
That is fine with me, with friendly logs as discussed in other replies.
Thanks,
Leo
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 10:51 ` James Clark
@ 2024-08-08 11:14 ` Ganapatrao Kulkarni
2024-08-08 15:01 ` Mike Leach
0 siblings, 1 reply; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-08 11:14 UTC
To: James Clark, Leo Yan, scclevenger
Cc: acme, coresight, linux-arm-kernel, linux-kernel, darren,
james.clark, suzuki.poulose, Al.Grant, Mike Leach
On 08-08-2024 04:21 pm, James Clark wrote:
>
>
> On 08/08/2024 10:21 am, James Clark wrote:
>>
>>
>> On 08/08/2024 8:42 am, Leo Yan wrote:
>>> On 8/8/2024 5:36 AM, Ganapatrao Kulkarni wrote:
>>>>
>>>> On 08-08-2024 12:50 am, Leo Yan wrote:
>>>>> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>> Is below diff with force option looks good?
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..efe34f308beb 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -36,7 +36,10 @@ option_list = [
>>>>>> help="Set path to objdump executable file"),
>>>>>> make_option("-v", "--verbose", dest="verbose",
>>>>>> action="store_true", default=False,
>>>>>> - help="Enable debugging log")
>>>>>> + help="Enable debugging log"),
>>>>>> + make_option("-f", "--force", dest="force",
>>>>>> + action="store_true", default=False,
>>>>>> + help="Force decoder to continue")
>>>>>> ]
>>>>>>
>>>>>> parser = OptionParser(option_list=option_list)
>>>>>> @@ -257,6 +260,12 @@ def process_event(param_dict):
>>>>>> print("Stop address 0x%x is out of range [ 0x%x
>>>>>> .. 0x%x
>>>>>> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>>>> return
>>>>>>
>>>>>> + if (stop_addr < start_addr):
>>>>>> + if (options.verbose == True or options.force):
>>>>>> + print("Packet Discontinuity detected
>>>>>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr,
>>>>>> start_addr, dso))
>>
>> The options.force for the print should be "options.verbose or not
>> options.force" I think? You want to print the error until the user
>> adds -f, then hide it. Unless verbose is on.
>>
>>>>>> + if (options.force):
>>>>>> + return
>>
>> Oops I had this one the wrong way around in my example. This way is
>> correct.
>>
>>>>>
>>>>> I struggled a bit for the code - it is confused that force mode
>>>>> bails out
>>>>> and the non-force mode continues to run. I prefer to always bail
>>>>> out for
>>>>> the discontinuity case, as it is pointless to continue in this case.
>>>>
>>>> Kept bail out with force option since I though it is not good to hide
>>>> the error in normal use, otherwise we never able to notice this
>>>> error in
>>>> the future and it becomes default hidden. Eventually this error should
>>>> be fixed.
>>>
>>> As James said, the issue should be fixed in OpenCSD or Perf decoding
>>> flow.
>>>
>>> Thus, perf tool should be tolerant errors - report warning and drop
>>> discontinuous samples. This would be easier for developers later if face
>>> the same issue, they don't need to spend time to locate issue and
>>> struggle
>>> for overriding the error.
>>>
>>> If you prefer to use force option, it might be better to give
>>> reasoning and
>>> *suggestion* in one go, something like:
>>>
>>> if (stop_addr < start_addr):
>>> print("Packet Discontinuity detected [stop_add:0x%x
>>> start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>>> print("Use option '-f' following the script for force mode"
>>> if (options.force)
>>> return
>>>
>>> Either way is fine for me. Thanks a lot for taking time on the issue.
>>>
>>> Leo
>>
>> But your diff looks good Ganapat, I think send a patch with Leo's
>> extra help message added and the first force flipped.
>
> One other small detail about Leo's suggestion print out. Can you add an
> instruction of how to keep the warnings as well:
>
> print("Use option '-f' following the script for force mode. Add '-v' \
> to continue printing decode warnings.")
>
Thanks James and Leo for your comments.
I will send the V2 with the changes as discussed.
Thanks.
Ganapat
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 11:14 ` Ganapatrao Kulkarni
@ 2024-08-08 15:01 ` Mike Leach
0 siblings, 0 replies; 45+ messages in thread
From: Mike Leach @ 2024-08-08 15:01 UTC
To: Ganapatrao Kulkarni
Cc: James Clark, Leo Yan, scclevenger, acme, coresight,
linux-arm-kernel, linux-kernel, darren, james.clark,
suzuki.poulose, Al.Grant
On Thu, 8 Aug 2024 at 12:15, Ganapatrao Kulkarni
<gankulkarni@os.amperecomputing.com> wrote:
>
>
>
> On 08-08-2024 04:21 pm, James Clark wrote:
> >
> >
> > On 08/08/2024 10:21 am, James Clark wrote:
> >>
> >>
> >> On 08/08/2024 8:42 am, Leo Yan wrote:
> >>> On 8/8/2024 5:36 AM, Ganapatrao Kulkarni wrote:
> >>>>
> >>>> On 08-08-2024 12:50 am, Leo Yan wrote:
> >>>>> On 8/7/2024 5:18 PM, Ganapatrao Kulkarni wrote:
> >>>>>
> >>>>>> Is below diff with force option looks good?
> >>>>>>
> >>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> index d973c2baed1c..efe34f308beb 100755
> >>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>>> @@ -36,7 +36,10 @@ option_list = [
> >>>>>> help="Set path to objdump executable file"),
> >>>>>> make_option("-v", "--verbose", dest="verbose",
> >>>>>> action="store_true", default=False,
> >>>>>> - help="Enable debugging log")
> >>>>>> + help="Enable debugging log"),
> >>>>>> + make_option("-f", "--force", dest="force",
> >>>>>> + action="store_true", default=False,
> >>>>>> + help="Force decoder to continue")
> >>>>>> ]
> >>>>>>
> >>>>>> parser = OptionParser(option_list=option_list)
> >>>>>> @@ -257,6 +260,12 @@ def process_event(param_dict):
> >>>>>> print("Stop address 0x%x is out of range [ 0x%x
> >>>>>> .. 0x%x
> >>>>>> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
> >>>>>> return
> >>>>>>
> >>>>>> + if (stop_addr < start_addr):
> >>>>>> + if (options.verbose == True or options.force):
> >>>>>> + print("Packet Discontinuity detected
> >>>>>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr,
> >>>>>> start_addr, dso))
> >>
> >> The options.force for the print should be "options.verbose or not
> >> options.force" I think? You want to print the error until the user
> >> adds -f, then hide it. Unless verbose is on.
> >>
> >>>>>> + if (options.force):
> >>>>>> + return
> >>
> >> Oops I had this one the wrong way around in my example. This way is
> >> correct.
> >>
> >>>>>
> >>>>> I struggled a bit for the code - it is confused that force mode
> >>>>> bails out
> >>>>> and the non-force mode continues to run. I prefer to always bail
> >>>>> out for
> >>>>> the discontinuity case, as it is pointless to continue in this case.
> >>>>
> >>>> Kept bail out with force option since I though it is not good to hide
> >>>> the error in normal use, otherwise we never able to notice this
> >>>> error in
> >>>> the future and it becomes default hidden. Eventually this error should
> >>>> be fixed.
> >>>
> >>> As James said, the issue should be fixed in OpenCSD or Perf decoding
> >>> flow.
> >>>
> >>> Thus, perf tool should be tolerant errors - report warning and drop
> >>> discontinuous samples. This would be easier for developers later if face
> >>> the same issue, they don't need to spend time to locate issue and
> >>> struggle
> >>> for overriding the error.
> >>>
> >>> If you prefer to use force option, it might be better to give
> >>> reasoning and
> >>> *suggestion* in one go, something like:
> >>>
> >>> if (stop_addr < start_addr):
> >>> print("Packet Discontinuity detected [stop_add:0x%x
> >>> start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
> >>> print("Use option '-f' following the script for force mode"
> >>> if (options.force)
> >>> return
> >>>
> >>> Either way is fine for me. Thanks a lot for taking time on the issue.
> >>>
> >>> Leo
> >>
> >> But your diff looks good Ganapat, I think send a patch with Leo's
> >> extra help message added and the first force flipped.
> >
> > One other small detail about Leo's suggestion print out. Can you add an
> > instruction of how to keep the warnings as well:
> >
> > print("Use option '-f' following the script for force mode. Add '-v' \
> > to continue printing decode warnings.")
> >
>
> Thanks James and Leo for your comments.
> I will send the V2 with the changes as discussed.
>
> Thanks.
> Ganapat
>
Certainly any ARM trace decode depends on accurate program images
being input in order to produce correct trace decode at the output.
So if an OpenCSD client does not provide accurate information, it
should not really expect to get accurate trace as output!
That said, there are certainly a couple of changes that can be made:
1) Outputting a trace range with a finish address lower than the start
address is clearly incorrect. This can unconditionally output a hard
error.
2) Detection of N (not taken) atoms on unconditional branches should be
added as a configurable option: either all such branches, or direct
branches only. This can be achieved by adding flags to the library
configuration API.
For a client like perf, these could be controlled by the verbose
level, which I believe is an int in the range 0-10 or something?
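The two checks listed above can be illustrated with a simplified model. Python is used for readability only: OpenCSD itself is a C++ library, and the function names, the `mode` strings and the error type here are hypothetical, not OpenCSD's API. The model assumes that an unconditional branch must always be taken. An N atom reported on one therefore most likely means the program image does not match what actually executed.

```python
class TraceDecodeError(Exception):
    """Hypothetical hard decode error."""


def check_range(start_addr, end_addr):
    # Check 1: a range whose finish address is below its start is always wrong.
    if end_addr < start_addr:
        raise TraceDecodeError("range end 0x%x below start 0x%x"
                               % (end_addr, start_addr))


def n_atom_is_error(is_conditional, is_direct, mode):
    """Check 2: should a not-taken (N) atom on this branch be reported?

    mode is "off", "direct" (direct branches only) or "all".
    """
    if is_conditional or mode == "off":
        return False  # conditional branches may legitimately not be taken
    if mode == "direct":
        return is_direct
    if mode == "all":
        return True
    raise ValueError("unknown mode: %r" % mode)
```

Keeping check 2 selectable through a mode mirrors the suggestion that a client such as perf could choose the level, while check 1 stays unconditional.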
However, when we do detect these errors, it is essential that the
entire decoder is reset and tracing is not restarted until the next sync
point in the trace data.
That is, after a break caused by incorrect address input, it is simply
invalid to assume that the next range that happens to be consecutive is
correct. There is no way of knowing whether the branch taken / not taken
sequence still matches the addresses in the program image.
The solution that James proposes above needs to actually generate an
error, which will then automatically reset the decoder to an unsynced
state.
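The reset-and-resync rule can be modelled as a tiny state machine. After an error, every range is discarded until the next sync point. Decoding does not resume at the next range that merely looks consecutive. The event names below are hypothetical and stand in for OpenCSD's real packet and element types.

```python
def decode(events):
    """Return the ranges that are safe to output from (kind, payload) events.

    kind is "sync" (a sync point in the trace), "error", or "range".
    """
    synced = False
    ranges = []
    for kind, payload in events:
        if kind == "sync":
            synced = True
        elif kind == "error":
            # Reset the whole decoder: nothing is trusted until the next sync.
            synced = False
        elif kind == "range" and synced:
            ranges.append(payload)
    return ranges
```

The range that immediately follows the error is dropped even if it is contiguous with the last good one, because the atom sequence can no longer be trusted.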
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-08 9:32 ` James Clark
2024-08-08 11:05 ` Leo Yan
@ 2024-08-09 14:13 ` Mike Leach
2024-08-09 15:19 ` James Clark
1 sibling, 1 reply; 45+ messages in thread
From: Mike Leach @ 2024-08-09 14:13 UTC
To: James Clark
Cc: Leo Yan, Ganapatrao Kulkarni, scclevenger, acme, coresight,
linux-arm-kernel, linux-kernel, darren, james.clark,
suzuki.poulose, Al.Grant
Hi James
On Thu, 8 Aug 2024 at 10:32, James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 07/08/2024 5:48 pm, Leo Yan wrote:
> > Hi all,
> >
> > On 8/7/2024 3:53 PM, James Clark wrote:
> >
> > A minor suggestion: if the discussion is too long, please delete the
> > irrelevant message ;)
> >
> > [...]
> >
> >>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>> @@ -257,6 +257,11 @@ def process_event(param_dict):
> >>> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
> >>> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
> >>> return
> >>>
> >>> + if (stop_addr < start_addr):
> >>> + if (options.verbose == True):
> >>> + print("Packet Dropped, Discontinuity detected
> >>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> >>> dso))
> >>> + return
> >>> +
> >>
> >> I suppose my only concern with this is that it hides real errors and
> >> Perf shouldn't be outputting samples that go backwards. Considering that
> >> fixing this in OpenCSD and Perf has a much wider benefit I think that
> >> should be the ultimate goal. I'm putting this on my todo list for now
> >> (including Steve's merging idea).
> >
> > In the perf's util/cs-etm.c file, it handles DISCONTINUITY with:
> >
> > case CS_ETM_DISCONTINUITY:
> > /*
> > * The trace is discontinuous, if the previous packet is
> > * instruction packet, set flag PERF_IP_FLAG_TRACE_END
> > * for previous packet.
> > */
> > if (prev_packet->sample_type == CS_ETM_RANGE)
> > prev_packet->flags |= PERF_IP_FLAG_BRANCH |
> > PERF_IP_FLAG_TRACE_END;
> >
> > I am wandering if OpenCSD has passed the correct info so Perf decoder can
> > detect the discontinuity. If yes, then the flag 'PERF_IP_FLAG_TRACE_END' will
> > be set (it is a general flag in branch sample), then we can consider use it in
> > the python script to handle discontinuous data.
>
> No OpenCSD isn't passing the correct info here. Higher up in the thread
> I suggested an OpenCSD patch that makes it detect the error earlier and
> fixes the issue. It also needs to output a discontinuity when the
> address goes backwards. So two fixes and then the script works without
> modifications.
>
Which address is going backwards here? OpenCSD generates trace
ranges only by walking forwards from the last known address until it
hits a branch. Unless this wraps round 0x000000, this will never result
in a backwards address as far as I can see.
Do you have an example dump with OpenCSD outputting a range packet
with backwards addresses?
Mike
> >
> >>
> >> But in the mean time what about having a force option?
> >>
> >>> + if (stop_addr < start_addr):
> >>> + if (options.verbose == True or not options.force):
> >>> + print("Packet Dropped, Discontinuity detected
> >>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> >>> dso))
> >>> + if (not options.force):
> >>> + return
> >
> > If the stop address is less than the start address, it must be something
> > wrong. In this case, we can report a warning for discontinuity and directly
> > return (also need to save the `addr` into global variable for next parsing).
> >
> > I prefer to not add force option for this case - eventually, this will consume
> > much time for reporting this kind of failure and need to root causing it. A
> > better way is we just print out the reasoning in the log and continue to dump.
>
> But in this case we've identified all the known issues that would cause
> the script to fail and we can fix them in Perf and OpenCSD. There may
> not even be any more issues that will cause the script to fail in the
> future so there's no point in softening the error IMO. That will only
> hide future issues (of which there may be none) and make root causing
> harder when it hits some other tool.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-09 14:13 ` Mike Leach
@ 2024-08-09 15:19 ` James Clark
2024-08-19 10:59 ` Mike Leach
0 siblings, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-09 15:19 UTC
To: Mike Leach
Cc: Leo Yan, Ganapatrao Kulkarni, scclevenger, acme, coresight,
linux-arm-kernel, linux-kernel, darren, james.clark,
suzuki.poulose, Al.Grant
On 09/08/2024 3:13 pm, Mike Leach wrote:
> Hi James
>
> On Thu, 8 Aug 2024 at 10:32, James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 07/08/2024 5:48 pm, Leo Yan wrote:
>>> Hi all,
>>>
>>> On 8/7/2024 3:53 PM, James Clark wrote:
>>>
>>> A minor suggestion: if the discussion is too long, please delete the
>>> irrelevant message ;)
>>>
>>> [...]
>>>
>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>>>>> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
>>>>> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>>> return
>>>>>
>>>>> + if (stop_addr < start_addr):
>>>>> + if (options.verbose == True):
>>>>> + print("Packet Dropped, Discontinuity detected
>>>>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
>>>>> dso))
>>>>> + return
>>>>> +
>>>>
>>>> I suppose my only concern with this is that it hides real errors and
>>>> Perf shouldn't be outputting samples that go backwards. Considering that
>>>> fixing this in OpenCSD and Perf has a much wider benefit I think that
>>>> should be the ultimate goal. I'm putting this on my todo list for now
>>>> (including Steve's merging idea).
>>>
>>> In the perf's util/cs-etm.c file, it handles DISCONTINUITY with:
>>>
>>> case CS_ETM_DISCONTINUITY:
>>> /*
>>> * The trace is discontinuous, if the previous packet is
>>> * instruction packet, set flag PERF_IP_FLAG_TRACE_END
>>> * for previous packet.
>>> */
>>> if (prev_packet->sample_type == CS_ETM_RANGE)
>>> prev_packet->flags |= PERF_IP_FLAG_BRANCH |
>>> PERF_IP_FLAG_TRACE_END;
>>>
>>> I am wandering if OpenCSD has passed the correct info so Perf decoder can
>>> detect the discontinuity. If yes, then the flag 'PERF_IP_FLAG_TRACE_END' will
>>> be set (it is a general flag in branch sample), then we can consider use it in
>>> the python script to handle discontinuous data.
>>
>> No OpenCSD isn't passing the correct info here. Higher up in the thread
>> I suggested an OpenCSD patch that makes it detect the error earlier and
>> fixes the issue. It also needs to output a discontinuity when the
>> address goes backwards. So two fixes and then the script works without
>> modifications.
>>
>
> Which address is going backwards here? - OpenCSD generates trace
> ranges only by walking forwards from the last known address till it
> hits a branch. Unless this wraps round 0x000000 this will never result
> in a backwards address as far as I can see.
> Do you have an example dump with OpenCSD outputting a range packet
> with backwards addresses?
>
> Mike
>
I think the example I have is something like this:
1. Start address / trace on
2. E
3. Output range
...
4. Periodic address update
...
5. E
6. Output range
If decode has gone wrong (undetectably) between steps 1 and 3, then
the next steps still output a second range based on the last periodic
address received. (I think it doesn't necessarily have to be a periodic
address; it could also be an indirect address packet?) Perf
converts the ranges into branch samples by taking the end of the first
range and the beginning of the second range. The disassembly script then
converts those samples back into ranges by taking the source and
destination of the last two branch samples.
The original issue that Ganapat saw was that the periodic address caused
OpenCSD to put the source address of the second range somewhere before
the first one, even though it didn't output a branch or discontinuity
that would explain how it got there.
But yes, you're right: the ranges themselves always go forwards in terms
of their own start and end addresses.
I thought it might be possible for OpenCSD to check against the last
range output? Although I wasn't sure whether it's actually valid to do
a backwards jump like that without trace on/off packets, with address
filtering or something?
The root cause is still the incorrect image, but I think this check
along with the other direct branch check should make it pretty difficult
for people to make the mistake.
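A minimal sketch of the "check against the last range output" idea: if the previous range did not end in a taken branch, the next range must start immediately after it. Python is used for illustration only, and `Range` and `find_discontinuities` are hypothetical names. End addresses are treated as exclusive. The sketch ignores legitimate breaks such as trace on/off or address filtering, which a real decoder would have to allow for.

```python
from dataclasses import dataclass


@dataclass
class Range:
    start: int          # address of the first instruction
    end: int            # address one past the last instruction (exclusive)
    taken_branch: bool  # True if the last instruction was a taken branch


def find_discontinuities(ranges):
    """Return indices i where ranges[i] cannot follow ranges[i - 1]."""
    bad = []
    for i in range(1, len(ranges)):
        prev, cur = ranges[i - 1], ranges[i]
        # Without a taken branch, execution must fall through to prev.end.
        if not prev.taken_branch and cur.start != prev.end:
            bad.append(i)
    return bad
```

With this check, a second range placed before the first one, without any branch to explain it, is flagged at the point where it is produced instead of surfacing later as a backwards range in the script.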
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-09 15:19 ` James Clark
@ 2024-08-19 10:59 ` Mike Leach
2024-08-23 9:03 ` James Clark
0 siblings, 1 reply; 45+ messages in thread
From: Mike Leach @ 2024-08-19 10:59 UTC
To: James Clark
Cc: Leo Yan, Ganapatrao Kulkarni, scclevenger, acme, coresight,
linux-arm-kernel, linux-kernel, darren, james.clark,
suzuki.poulose, Al.Grant
Hi,
A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
The testing I managed to do confirms that the check for N atoms on
unconditional branches appears to work. I do not have a test case for the
range discontinuities.
The checks are enabled using operation flags on decoder creation. See
the docs for details.
Mike
On Fri, 9 Aug 2024 at 16:20, James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 09/08/2024 3:13 pm, Mike Leach wrote:
> > Hi James
> >
> > On Thu, 8 Aug 2024 at 10:32, James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 07/08/2024 5:48 pm, Leo Yan wrote:
> >>> Hi all,
> >>>
> >>> On 8/7/2024 3:53 PM, James Clark wrote:
> >>>
> >>> A minor suggestion: if the discussion is too long, please delete the
> >>> irrelevant message ;)
> >>>
> >>> [...]
> >>>
> >>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> >>>>> @@ -257,6 +257,11 @@ def process_event(param_dict):
> >>>>> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
> >>>>> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
> >>>>> return
> >>>>>
> >>>>> + if (stop_addr < start_addr):
> >>>>> + if (options.verbose == True):
> >>>>> + print("Packet Dropped, Discontinuity detected
> >>>>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> >>>>> dso))
> >>>>> + return
> >>>>> +
> >>>>
> >>>> I suppose my only concern with this is that it hides real errors and
> >>>> Perf shouldn't be outputting samples that go backwards. Considering that
> >>>> fixing this in OpenCSD and Perf has a much wider benefit I think that
> >>>> should be the ultimate goal. I'm putting this on my todo list for now
> >>>> (including Steve's merging idea).
> >>>
> >>> In the perf's util/cs-etm.c file, it handles DISCONTINUITY with:
> >>>
> >>> case CS_ETM_DISCONTINUITY:
> >>> /*
> >>> * The trace is discontinuous, if the previous packet is
> >>> * instruction packet, set flag PERF_IP_FLAG_TRACE_END
> >>> * for previous packet.
> >>> */
> >>> if (prev_packet->sample_type == CS_ETM_RANGE)
> >>> prev_packet->flags |= PERF_IP_FLAG_BRANCH |
> >>> PERF_IP_FLAG_TRACE_END;
> >>>
> >>> I am wandering if OpenCSD has passed the correct info so Perf decoder can
> >>> detect the discontinuity. If yes, then the flag 'PERF_IP_FLAG_TRACE_END' will
> >>> be set (it is a general flag in branch sample), then we can consider use it in
> >>> the python script to handle discontinuous data.
> >>
> >> No OpenCSD isn't passing the correct info here. Higher up in the thread
> >> I suggested an OpenCSD patch that makes it detect the error earlier and
> >> fixes the issue. It also needs to output a discontinuity when the
> >> address goes backwards. So two fixes and then the script works without
> >> modifications.
> >>
> >
> > Which address is going backwards here? - OpenCSD generates trace
> > ranges only by walking forwards from the last known address till it
> > hits a branch. Unless this wraps round 0x000000 this will never result
> > in a backwards address as far as I can see.
> > Do you have an example dump with OpenCSD outputting a range packet
> > with backwards addresses?
> >
> > Mike
> >
> The example I have I think is something like this:
>
> 1. Start address / trace on
> 2. E
> 3. Output range
> ...
> 4. Periodic address update
> ...
> 5. E
> 6. Output range
>
> If decode has gone wrong (but undetectably) between steps 1 and 3. Then
> the next steps still output a second range based on the last periodic
> address received. (I think it might not necessarily have to be a
> periodic address but could also be indirect address packet?). Perf
> converts the ranges into branch samples by taking the end of the first
> range and beginning of the second range. Then the disassembly script
> converts those samples into ranges again by taking the source and
> destination of the last two branch samples.
>
> The original issue that Ganapat saw was that the periodic address causes
> OpenCSD to put the source address of the second range somewhere before
> the first one, even though it didn't output a branch or discontinuity
> that would explain how it got there.
>
> But yes you're right the ranges themselves always go forwards from the
> point of view of their own start and end addresses.
>
> I thought it might be possible for OpenCSD to check against the last
> range output? Although I wasn't sure if maybe it's actually valid to do
> a backwards jump like that without the trace on/off packets with address
> filtering or something?
>
> The root cause is still the incorrect image, but I think this check
> along with the other direct branch check should make it pretty difficult
> for people to make the mistake.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-19 10:59 ` Mike Leach
@ 2024-08-23 9:03 ` James Clark
2024-08-23 9:57 ` Ganapatrao Kulkarni
2024-08-28 9:33 ` Mike Leach
0 siblings, 2 replies; 45+ messages in thread
From: James Clark @ 2024-08-23 9:03 UTC
To: Mike Leach, Ganapatrao Kulkarni
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 19/08/2024 11:59 am, Mike Leach wrote:
> Hi,
>
> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>
> Testing I managed to do confirms the N atom on unconditional branches
> appear to work. I do not have a test case for the range
> discontinuities.
>
> The checks are enabled using operation flags on decoder creation. See
> the docs for details.
>
> Mike
>
Hi Mike,
I tested the new OpenCSD and I no longer see the error in the
disassembly script. I'm not sure we need to go any further and add
the backwards check; it looks like just a later symptom, and the checks
that you've added already prevent it.
If you release a new version I can send the perf patch. I was going to
use these flags, if that looks right to you? As far as I know, that's the
set that can always be on and won't fail on bad hardware?
I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
for ETMv3 and is just a no-op?
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index e917985bbbe6..90967fd807e6 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
return 0;
if (d_params->operation == CS_ETM_OPERATION_DECODE) {
+ int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
+#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
+ decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
+ ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
+#endif
if (ocsd_dt_create_decoder(decoder->dcd_tree,
decoder->decoder_name,
- OCSD_CREATE_FLG_FULL_DECODER,
+ decode_flags,
trace_config, &csid))
return -1;
> On Fri, 9 Aug 2024 at 16:20, James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 09/08/2024 3:13 pm, Mike Leach wrote:
>>> Hi James
>>>
>>> On Thu, 8 Aug 2024 at 10:32, James Clark <james.clark@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 07/08/2024 5:48 pm, Leo Yan wrote:
>>>>> Hi all,
>>>>>
>>>>> On 8/7/2024 3:53 PM, James Clark wrote:
>>>>>
>>>>> A minor suggestion: if the discussion is too long, please delete the
>>>>> irrelevant message ;)
>>>>>
>>>>> [...]
>>>>>
>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>>>>>>> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>>>>>>> return
>>>>>>>
>>>>>>> + if (stop_addr < start_addr):
>>>>>>> + if (options.verbose == True):
>>>>>>> + print("Packet Dropped, Discontinuity detected [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
>>>>>>> + return
>>>>>>> +
>>>>>>
>>>>>> I suppose my only concern with this is that it hides real errors and
>>>>>> Perf shouldn't be outputting samples that go backwards. Considering that
>>>>>> fixing this in OpenCSD and Perf has a much wider benefit I think that
>>>>>> should be the ultimate goal. I'm putting this on my todo list for now
>>>>>> (including Steve's merging idea).
>>>>>
>>>>> In the perf's util/cs-etm.c file, it handles DISCONTINUITY with:
>>>>>
>>>>> case CS_ETM_DISCONTINUITY:
>>>>> /*
>>>>> * The trace is discontinuous, if the previous packet is
>>>>> * instruction packet, set flag PERF_IP_FLAG_TRACE_END
>>>>> * for previous packet.
>>>>> */
>>>>> if (prev_packet->sample_type == CS_ETM_RANGE)
>>>>> prev_packet->flags |= PERF_IP_FLAG_BRANCH |
>>>>> PERF_IP_FLAG_TRACE_END;
>>>>>
>>>>> I am wondering if OpenCSD has passed the correct info so Perf decoder can
>>>>> detect the discontinuity. If yes, then the flag 'PERF_IP_FLAG_TRACE_END' will
>>>>> be set (it is a general flag in branch sample), then we can consider using it in
>>>>> the python script to handle discontinuous data.
>>>>
>>>> No OpenCSD isn't passing the correct info here. Higher up in the thread
>>>> I suggested an OpenCSD patch that makes it detect the error earlier and
>>>> fixes the issue. It also needs to output a discontinuity when the
>>>> address goes backwards. So two fixes and then the script works without
>>>> modifications.
>>>>
>>>
>>> Which address is going backwards here? - OpenCSD generates trace
>>> ranges only by walking forwards from the last known address till it
>>> hits a branch. Unless this wraps round 0x000000 this will never result
>>> in a backwards address as far as I can see.
>>> Do you have an example dump with OpenCSD outputting a range packet
>>> with backwards addresses?
>>>
>>> Mike
>>>
>> The example I have I think is something like this:
>>
>> 1. Start address / trace on
>> 2. E
>> 3. Output range
>> ...
>> 4. Periodic address update
>> ...
>> 5. E
>> 6. Output range
>>
>> If decode has gone wrong (but undetectably) between steps 1 and 3. Then
>> the next steps still output a second range based on the last periodic
>> address received. (I think it might not necessarily have to be a
>> periodic address but could also be indirect address packet?). Perf
>> converts the ranges into branch samples by taking the end of the first
>> range and beginning of the second range. Then the disassembly script
>> converts those samples into ranges again by taking the source and
>> destination of the last two branch samples.
>>
>> The original issue that Ganapat saw was that the periodic address causes
>> OpenCSD to put the source address of the second range somewhere before
>> the first one, even though it didn't output a branch or discontinuity
>> that would explain how it got there.
>>
>> But yes you're right the ranges themselves always go forwards from the
>> point of view of their own start and end addresses.
>>
>> I thought it might be possible for OpenCSD to check against the last
>> range output? Although I wasn't sure if maybe it's actually valid to do
>> a backwards jump like that without the trace on/off packets with address
>> filtering or something?
>>
>> The root cause is still the incorrect image, but I think this check
>> along with the other direct branch check should make it pretty difficult
>> for people to make the mistake.
>
>
>
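The sample-to-range round trip described above can be modelled with a short, hypothetical Python sketch. It assumes, following the patch context, that each branch sample is an (ip, addr) pair (branch source and destination), that the script starts a range at the previous sample's destination and stops it just past the current sample's ip, and that instructions are 4 bytes. The function name and tuple layout are illustrative only, not the script's real data structures.

```python
# Hypothetical model of how consecutive branch samples become ranges.
INSN_SIZE = 4  # assumption: fixed-width A64 instructions


def ranges_from_samples(samples):
    """Rebuild executed ranges from consecutive (ip, addr) branch samples.

    Returns (ranges, backwards): ranges holds the (start, stop) pairs that go
    forwards, backwards holds the pairs whose stop is below their start.
    """
    ranges, backwards = [], []
    prev_addr = None  # destination of the previous branch sample
    for ip, addr in samples:
        if prev_addr is not None:
            start, stop = prev_addr, ip + INSN_SIZE
            if stop < start:
                backwards.append((start, stop))
            else:
                ranges.append((start, stop))
        prev_addr = addr
    return ranges, backwards
```

For samples [(0x1000, 0x2000), (0x2010, 0x3000), (0x2ff0, 0x4000)], the second sample's destination (0x3000) lies beyond the third sample's source, so the reconstructed range runs backwards; that is the condition the patch tests with stop_addr < start_addr.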
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-23 9:03 ` James Clark
@ 2024-08-23 9:57 ` Ganapatrao Kulkarni
2024-08-23 10:36 ` James Clark
2024-08-30 9:58 ` James Clark
2024-08-28 9:33 ` Mike Leach
1 sibling, 2 replies; 45+ messages in thread
From: Ganapatrao Kulkarni @ 2024-08-23 9:57 UTC (permalink / raw)
To: James Clark, Mike Leach
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
Hi James/Mike,
On 23-08-2024 02:33 pm, James Clark wrote:
>
>
> On 19/08/2024 11:59 am, Mike Leach wrote:
>> Hi,
>>
>> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>>
>> Testing I managed to do confirms the N atom on unconditional branches
>> appear to work. I do not have a test case for the range
>> discontinuities.
>>
>> The checks are enabled using operation flags on decoder creation. See
>> the docs for details.
>>
>> Mike
>>
>
> Hi Mike,
>
> I tested the new OpenCSD and I don't see the error anymore in the
> disassembly script. I'm not sure if we need to go any further and add
> the backwards check, it looks like just a later symptom and the checks
> that you've added already prevent it.
>
> If you release a new version I can send the perf patch. I was going to
> use these flags if that looks right to you? As far as I know that's the
> set that can be always on and won't fail on bad hardware?
>
> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
> for etmv3 and it's just a nop?
>
> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> index e917985bbbe6..90967fd807e6 100644
> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
> return 0;
>
> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
> +#endif
> if (ocsd_dt_create_decoder(decoder->dcd_tree,
> decoder->decoder_name,
> - OCSD_CREATE_FLG_FULL_DECODER,
> + decode_flags,
> trace_config, &csid))
> return -1;
>
I tried Mike's branch with James's patch above, and the segfault still
happens for us.
--
Thanks,
Ganapat/GK
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-23 9:57 ` Ganapatrao Kulkarni
@ 2024-08-23 10:36 ` James Clark
2024-08-23 10:37 ` James Clark
2024-08-30 9:58 ` James Clark
1 sibling, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-23 10:36 UTC (permalink / raw)
To: Ganapatrao Kulkarni, Mike Leach
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 23/08/2024 10:57 am, Ganapatrao Kulkarni wrote:
>
> Hi James/Mike,
>
> On 23-08-2024 02:33 pm, James Clark wrote:
>>
>>
>> On 19/08/2024 11:59 am, Mike Leach wrote:
>>> Hi,
>>>
>>> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>>>
>>> Testing I managed to do confirms the N atom on unconditional branches
>>> appear to work. I do not have a test case for the range
>>> discontinuities.
>>>
>>> The checks are enabled using operation flags on decoder creation. See
>>> the docs for details.
>>>
>>> Mike
>>>
>>
>> Hi Mike,
>>
>> I tested the new OpenCSD and I don't see the error anymore in the
>> disassembly script. I'm not sure if we need to go any further and add
>> the backwards check, it looks like just a later symptom and the checks
>> that you've added already prevent it.
>>
>> If you release a new version I can send the perf patch. I was going to
>> use these flags if that looks right to you? As far as I know that's the
>> set that can be always on and won't fail on bad hardware?
>>
>> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
>> for etmv3 and it's just a nop?
>>
>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> index e917985bbbe6..90967fd807e6 100644
>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
>> return 0;
>>
>> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
>> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
>> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
>> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
>> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
>> +#endif
>> if (ocsd_dt_create_decoder(decoder->dcd_tree,
>> decoder->decoder_name,
>> - OCSD_CREATE_FLG_FULL_DECODER,
>> + decode_flags,
>> trace_config, &csid))
>> return -1;
>>
>
> I tried Mike's branch with above James's patch and still the segfault is happening to us.
>
Did you update OpenCSD as well?
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-23 10:36 ` James Clark
@ 2024-08-23 10:37 ` James Clark
0 siblings, 0 replies; 45+ messages in thread
From: James Clark @ 2024-08-23 10:37 UTC (permalink / raw)
To: Ganapatrao Kulkarni, Mike Leach
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 23/08/2024 11:36 am, James Clark wrote:
>
>
> On 23/08/2024 10:57 am, Ganapatrao Kulkarni wrote:
>>
>> Hi James/Mike,
>>
>> On 23-08-2024 02:33 pm, James Clark wrote:
>>>
>>>
>>> On 19/08/2024 11:59 am, Mike Leach wrote:
>>>> Hi,
>>>>
>>>> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>>>>
>>>> Testing I managed to do confirms the N atom on unconditional branches
>>>> appear to work. I do not have a test case for the range
>>>> discontinuities.
>>>>
>>>> The checks are enabled using operation flags on decoder creation. See
>>>> the docs for details.
>>>>
>>>> Mike
>>>>
>>>
>>> Hi Mike,
>>>
>>> I tested the new OpenCSD and I don't see the error anymore in the
>>> disassembly script. I'm not sure if we need to go any further and add
>>> the backwards check, it looks like just a later symptom and the checks
>>> that you've added already prevent it.
>>>
>>> If you release a new version I can send the perf patch. I was going to
>>> use these flags if that looks right to you? As far as I know that's the
>>> set that can be always on and won't fail on bad hardware?
>>>
>>> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
>>> for etmv3 and it's just a nop?
>>>
>>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> index e917985bbbe6..90967fd807e6 100644
>>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
>>> return 0;
>>>
>>> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
>>> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
>>> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
>>> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
>>> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
>>> +#endif
>>> if (ocsd_dt_create_decoder(decoder->dcd_tree,
>>> decoder->decoder_name,
>>> - OCSD_CREATE_FLG_FULL_DECODER,
>>> + decode_flags,
>>> trace_config, &csid))
>>> return -1;
>>>
>>
>> I tried Mike's branch with above James's patch and still the segfault is happening to us.
>>
>
> Did you update OpenCSD as well?
>
Oh sorry, I only read the second part; I see you did.
Can you share your perf.data file? And do you see any of the new warnings:
DCD_ETMV4_0018 : 0x002e (OCSD_ERR_BAD_DECODE_IMAGE) [Mismatch between trace packets and decode image.]; TrcIdx=3059; CS ID=12; Bad program image - N Atom on unconditional direct BR.
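A quick way to check a saved decode log for these warnings is a small filter like the hypothetical Python helper below. The regular expression is built from the single example line above, so it is an assumption that other OpenCSD error messages follow the same layout.

```python
import re

# Pattern derived from the one example warning above; not a documented format.
OCSD_ERR_RE = re.compile(
    r"\((OCSD_ERR_\w+)\).*?TrcIdx=(\d+); CS ID=(\d+); (.*)$")


def find_ocsd_errors(lines):
    """Yield (error_name, trace_index, cs_id, message) for each matching line."""
    for line in lines:
        m = OCSD_ERR_RE.search(line.rstrip("\n"))
        if m:
            yield m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
```

Run it over whatever output the decode was captured to; lines that do not match the pattern are skipped.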
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-23 9:03 ` James Clark
2024-08-23 9:57 ` Ganapatrao Kulkarni
@ 2024-08-28 9:33 ` Mike Leach
2024-08-29 13:35 ` James Clark
1 sibling, 1 reply; 45+ messages in thread
From: Mike Leach @ 2024-08-28 9:33 UTC (permalink / raw)
To: James Clark
Cc: Ganapatrao Kulkarni, Leo Yan, scclevenger, acme, coresight,
linux-arm-kernel, linux-kernel, darren, james.clark,
suzuki.poulose, Al.Grant
Hi James,
On Fri, 23 Aug 2024 at 10:03, James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 19/08/2024 11:59 am, Mike Leach wrote:
> > Hi,
> >
> > A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
> >
> > Testing I managed to do confirms the N atom on unconditional branches
> > appear to work. I do not have a test case for the range
> > discontinuities.
> >
> > The checks are enabled using operation flags on decoder creation. See
> > the docs for details.
> >
> > Mike
> >
>
> Hi Mike,
>
> I tested the new OpenCSD and I don't see the error anymore in the
> disassembly script. I'm not sure if we need to go any further and add
> the backwards check, it looks like just a later symptom and the checks
> that you've added already prevent it.
>
The OCSD_OPFLG_CHK_RANGE_CONTINUE flag is the backwards address check, at
least as far as is possible in OpenCSD.
It checks whether the next range after a not-taken branch starts at
the end address of the previous range. However, this check is cancelled
if other packets intervene, e.g. trace on, exceptions, or
anything else that might imply a discontinuity.
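The check described above can be sketched as follows. This is a hypothetical Python model, not OpenCSD code: it assumes each range carries an exclusive end address and a flag saying whether it finished on a not-taken branch, and it treats any other decoded element as something that cancels the check.

```python
from dataclasses import dataclass


@dataclass
class Range:
    start: int
    end: int               # exclusive: address after the last instruction
    ended_not_taken: bool  # range finished on a not-taken branch (N atom)


@dataclass
class Marker:
    kind: str              # e.g. "trace_on", "exception": implies a gap


def check_range_continuity(elements):
    """Return indices of ranges that do not start where they should."""
    failures = []
    expected = None  # required start of the next range, if known
    for i, el in enumerate(elements):
        if isinstance(el, Marker):
            expected = None  # an intervening packet cancels the check
            continue
        if expected is not None and el.start != expected:
            failures.append(i)
        # Only a not-taken branch pins the next range's start address.
        expected = el.end if el.ended_not_taken else None
    return failures
```

A range that follows a taken branch, or that follows a marker, is never flagged, which mirrors the caveat that the check only applies when nothing else intervenes.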
The other caveat is that I did not have an example to see if the code
could actually get triggered, though I will go back and manually
trigger it in the debugger just to test functional correctness.
If you are still seeing backwards addresses after these changes then I
am not sure where they are coming from. It may be there is a missing
discontinuity somewhere that is not being flagged.
The other possibility that occurs to me now, thinking about
incorrect images, is that we incorrectly associate an atom with a direct
branch rather than an indirect branch.
For a direct branch the decoder calculates the target and carries on,
not looking for an address update because none is needed. Then, when the
address update does arrive, it is used as the latest address and the
range address is updated.
Unfortunately this would be difficult to test for. The decoder is
written to assume good trace and correct images, and adding code to
remember previous state and judge whether something is wrong,
without getting false positives, is difficult. It adds code complexity
that is not necessary for well-behaved clients!
> If you release a new version I can send the perf patch. I was going to
> use these flags if that looks right to you? As far as I know that's the
> set that can be always on and won't fail on bad hardware?
>
That set of flags is fine.
> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
> for etmv3 and it's just a nop?
>
It is safe, as there is no ETMv3 flag in the same slot.
Effectively, decode flags have a common set of bits plus a
decoder-specific set of bits that overlap between decoders. We have not
really needed anything for ETMv3 to date, and I don't expect that to
change.
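To illustrate the overlapping layout, here is a hypothetical Python model. The masks, flag names, and decoder table are made up for the example and do not reflect OpenCSD's actual bit assignments.

```python
# Illustrative values only: these are NOT the real OpenCSD constants.
COMMON_MASK = 0x0000FFFF    # bits whose meaning is shared by all decoders
SPECIFIC_MASK = 0xFFFF0000  # bits each protocol decoder interprets itself

FLG_COMMON_CHECK = 0x00000001  # hypothetical common check
FLG_ETMV4_CHECK = 0x00010000   # hypothetical ETMv4-only check

# Decoder-specific bits each (hypothetical) decoder understands. Because the
# specific bits overlap, the same slot could mean something else elsewhere.
KNOWN_SPECIFIC_BITS = {
    "etmv4": FLG_ETMV4_CHECK,
    "etmv3": 0,            # no decoder-specific flags defined
    "other": 0x00010000,   # hypothetical decoder reusing the same slot
}


def effective_flags(decoder, flags):
    """Return the subset of flags that a given decoder would act on."""
    specific = flags & SPECIFIC_MASK & KNOWN_SPECIFIC_BITS[decoder]
    return (flags & COMMON_MASK) | specific
```

Passing FLG_COMMON_CHECK | FLG_ETMV4_CHECK to the "etmv3" entry leaves only the common bit, while the made-up "other" decoder would act on the same slot; that difference is why it matters that ETMv3 defines nothing there.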
I'll get the new version released by the end of the week. I am off on
sabbatical for a month after that, so any further investigation or
changes will have to wait.
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-28 9:33 ` Mike Leach
@ 2024-08-29 13:35 ` James Clark
0 siblings, 0 replies; 45+ messages in thread
From: James Clark @ 2024-08-29 13:35 UTC (permalink / raw)
To: Mike Leach, Ganapatrao Kulkarni
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 28/08/2024 10:33 am, Mike Leach wrote:
> Hi James,
>
> On Fri, 23 Aug 2024 at 10:03, James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 19/08/2024 11:59 am, Mike Leach wrote:
>>> Hi,
>>>
>>> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>>>
>>> Testing I managed to do confirms the N atom on unconditional branches
>>> appear to work. I do not have a test case for the range
>>> discontinuities.
>>>
>>> The checks are enabled using operation flags on decoder creation. See
>>> the docs for details.
>>>
>>> Mike
>>>
>>
>> Hi Mike,
>>
>> I tested the new OpenCSD and I don't see the error anymore in the
>> disassembly script. I'm not sure if we need to go any further and add
>> the backwards check, it looks like just a later symptom and the checks
>> that you've added already prevent it.
>>
>
> The OCSD_OPFLG_CHK_RANGE_CONTINUE is the backwards address check - at
> least as so far as is possible in OpenCSD.
> What it checks is if the next range after a not taken branch starts at
> the end address of the previous range. However this check is cancelled
> if there are other packets that intervene - e.g. trace on / exceptions
> / anything that might imply a discontinuity.
>
> The other caveat is that I did not have an example to see if the code
> could actually get triggered - though I will go back and manually
> trigger it in the debugger just to test functional correctness.
>
> If you are still seeing backwards addresses after these changes then I
> am not sure where they are coming from. It may be there is a missing
> discontinuity somewhere that is not being flagged.
I tracked down this issue; there are actually two issues:
#1 Using vmlinux, which is a bad image. This is
fixed by your OpenCSD bad image detection changes.
#2 With Ganapat's kcore, which should be the correct image, the issue
is in Perf. There is a bug in the handling of a full packet_queue that
results in the last sample in the queue being given the previous branch
destination rather than the next one.
I should be able to send a patch for this. I'll also try to add a test,
because this decode script seems like a good place to catch bugs.
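The invariant behind the fix can be illustrated with a toy Python model. It is not perf's packet_queue code. It only assumes that ranges arrive in batches (standing in for a full queue), that range ends are exclusive, and that instructions are 4 bytes. The point is that batch boundaries must not change which destination a sample gets.

```python
INSN_SIZE = 4  # assumption: fixed-width A64 instructions


def branch_samples(batches):
    """Convert batches of (start, end) ranges into (src, dst) branch samples.

    Each range's branch source is its last instruction (end is exclusive) and
    its destination is the start of the next range. For the last range of a
    batch, that next range is in the following batch, so the range is held
    back until it arrives rather than being paired with stale state.
    """
    samples = []
    pending = None  # most recent range still waiting for its destination
    for batch in batches:
        for start, end in batch:
            if pending is not None:
                samples.append((pending[1] - INSN_SIZE, start))
            pending = (start, end)
    return samples
```

Splitting the same ranges into different batches yields identical samples; a version that filled in the destination for the last range of a batch from the previous sample would not.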
>
> The other alternative that does occur to me now - thinking about
> incorrect images, is if we incorrectly associate an atom with a direct
> branch rather than an indirect branch.
> For a direct branch the decoder will calculate the target and carry on
> - not looking for an address update as it is not needed. Then when the
> address update does arrive, it is used as a latest address and the
> range address will be updated.
Yeah, this is the backwards address issue I was thinking of, but with the
other OpenCSD changes I don't think there is an example of it, so we can
probably hold off for now, especially if it's difficult to test for.
>
> Unfortunately this would be difficult to test for - the decoder is
> written to assume good trace and correct images - adding in code to
> try to remember previous state and judge if something is wrong,
> without getting false positives is difficult. It adds code complexity
> that is not necessary for well behaved clients!
>
>> If you release a new version I can send the perf patch. I was going to
>> use these flags if that looks right to you? As far as I know that's the
>> set that can be always on and won't fail on bad hardware?
>>
>
> That set of flags is fine -
>
>> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
>> for etmv3 and it's just a nop?
>>
>
> It is safe - as there is no flag for ETMv3 in the same slot.
> Effectively decode flags have a common set of bits and decoder
> specific set of bits that overlap for each decoder. We have not
> really needed anything for ETMv3 to date, and I don't expect that to
> change.
>
>
> I'll get the new version released by the end of the week. I am off on
> sabbatical for a month after that so any further investigation /
> changes will have to wait
>
> Regards
>
> Mike
>
>
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
2024-08-23 9:57 ` Ganapatrao Kulkarni
2024-08-23 10:36 ` James Clark
@ 2024-08-30 9:58 ` James Clark
2024-09-02 6:12 ` Ganapatrao Kulkarni
1 sibling, 1 reply; 45+ messages in thread
From: James Clark @ 2024-08-30 9:58 UTC (permalink / raw)
To: Ganapatrao Kulkarni, Mike Leach
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 23/08/2024 10:57 am, Ganapatrao Kulkarni wrote:
>
> Hi James/Mike,
>
> On 23-08-2024 02:33 pm, James Clark wrote:
>>
>>
>> On 19/08/2024 11:59 am, Mike Leach wrote:
>>> Hi,
>>>
>>> A new branch of OpenCSD is available - ocsd-consistency-checks-1.5.4-rc1
>>>
>>> The testing I managed to do confirms that the check for N atoms on
>>> unconditional branches appears to work. I do not have a test case for
>>> the range discontinuities.
>>>
>>> The checks are enabled using operation flags on decoder creation. See
>>> the docs for details.
>>>
>>> Mike
>>>
>>
>> Hi Mike,
>>
>> I tested the new OpenCSD and I don't see the error anymore in the
>> disassembly script. I'm not sure if we need to go any further and add
>> the backwards check, it looks like just a later symptom and the checks
>> that you've added already prevent it.
>>
>> If you release a new version, I can send the perf patch. I was going to
>> use these flags; does that look right to you? As far as I know, that's
>> the set that can always be on and won't fail on bad hardware?
>>
>> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
>> for etmv3 and it's just a nop?
>>
>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> index e917985bbbe6..90967fd807e6 100644
>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
>> return 0;
>>
>> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
>> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
>> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
>> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
>> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
>> +#endif
>> if (ocsd_dt_create_decoder(decoder->dcd_tree,
>> decoder->decoder_name,
>> - OCSD_CREATE_FLG_FULL_DECODER,
>> + decode_flags,
>> trace_config, &csid))
>> return -1;
>>
>
> I tried Mike's branch with James's patch above, and the segfault still
> happens for us.
>
Looks like the perf bug is only on the timestamped decode path, so you can
force timeless decoding as a workaround. Timestamps aren't used by the
disassembly script anyway:

--itrace=Zb

Full command:

perf script -i ./kcore -s python:tools/perf/scripts/python/arm-cs-trace-disasm.py \
    --itrace=Zb -- -k ./kcore/kcore_dir/kcore

You can also disable timestamps when recording; then you don't need the
itrace option. This will save you a lot of data anyway.

But I'm still working on the proper fix.
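To script the workaround, the command above can be built as an argument list. This is only a sketch. `disasm_command` and `DISASM_SCRIPT` are hypothetical names, and the script path assumes you run from the top of the kernel source tree, as the command above does. In the itrace string, `Z` requests timeless decoding (timestamps ignored) and `b` synthesizes the branch samples the script consumes. Arguments after `--` are passed to the Python script.

```python
import shlex
from typing import List

# Assumption: run from the top of the kernel source tree.
DISASM_SCRIPT = "tools/perf/scripts/python/arm-cs-trace-disasm.py"


def disasm_command(perf_data: str, kcore: str, timeless: bool = True) -> List[str]:
    """Build the perf script argv for the CoreSight disassembly script."""
    argv = ["perf", "script", "-i", perf_data, "-s", "python:" + DISASM_SCRIPT]
    # 'b' = branch samples (needed by the script); 'Z' = ignore timestamps,
    # which avoids the timestamped decode path.
    argv.append("--itrace=Zb" if timeless else "--itrace=b")
    # Everything after "--" is handed to the Python script itself.
    argv += ["--", "-k", kcore]
    return argv


if __name__ == "__main__":
    print(shlex.join(disasm_command("./kcore", "./kcore/kcore_dir/kcore")))
```

Passing the list to `subprocess.run(disasm_command("./kcore", "./kcore/kcore_dir/kcore"), check=True)` avoids shell quoting entirely. `shlex.join` (Python 3.8+) is only used here to print a copy-pasteable form.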
* Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
From: Ganapatrao Kulkarni @ 2024-09-02 6:12 UTC (permalink / raw)
To: James Clark, Mike Leach
Cc: Leo Yan, scclevenger, acme, coresight, linux-arm-kernel,
linux-kernel, darren, james.clark, suzuki.poulose, Al.Grant
On 30-08-2024 03:28 pm, James Clark wrote:
>
>
> On 23/08/2024 10:57 am, Ganapatrao Kulkarni wrote:
>>
>> Hi James/Mike,
>>
>> On 23-08-2024 02:33 pm, James Clark wrote:
>>>
>>>
>>> On 19/08/2024 11:59 am, Mike Leach wrote:
>>>> Hi,
>>>>
>>>> A new branch of OpenCSD is available -
>>>> ocsd-consistency-checks-1.5.4-rc1
>>>>
>>>> The testing I managed to do confirms that the check for N atoms on
>>>> unconditional branches appears to work. I do not have a test case for
>>>> the range discontinuities.
>>>>
>>>> The checks are enabled using operation flags on decoder creation. See
>>>> the docs for details.
>>>>
>>>> Mike
>>>>
>>>
>>> Hi Mike,
>>>
>>> I tested the new OpenCSD and I don't see the error anymore in the
>>> disassembly script. I'm not sure if we need to go any further and add
>>> the backwards check, it looks like just a later symptom and the checks
>>> that you've added already prevent it.
>>>
>>> If you release a new version, I can send the perf patch. I was going to
>>> use these flags; does that look right to you? As far as I know, that's
>>> the set that can always be on and won't fail on bad hardware?
>>>
>>> I also assumed that ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK can be given even
>>> for etmv3 and it's just a nop?
>>>
>>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> index e917985bbbe6..90967fd807e6 100644
>>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>>> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct cs_etm_decoder_params *d_params,
>>> return 0;
>>>
>>> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
>>> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
>>> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
>>> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK | OCSD_OPFLG_CHK_RANGE_CONTINUE |
>>> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
>>> +#endif
>>> if (ocsd_dt_create_decoder(decoder->dcd_tree,
>>> decoder->decoder_name,
>>> - OCSD_CREATE_FLG_FULL_DECODER,
>>> + decode_flags,
>>> trace_config, &csid))
>>> return -1;
>>>
>>
>> I tried Mike's branch with James's patch above, and the segfault still
>> happens for us.
>>
>
> Looks like the perf bug is only on the timestamped decode path, so you
> can force timeless decoding as a workaround. Timestamps aren't used by
> the disassembly script anyway:
>
> --itrace=Zb
>
> Full command:
>
> perf script -i ./kcore -s python:tools/perf/scripts/python/arm-cs-trace-disasm.py \
>     --itrace=Zb -- -k ./kcore/kcore_dir/kcore
>
Thanks James, I was able to run it without any issues using "--itrace=Zb".
> You can also disable timestamps when recording; then you don't need the
> itrace option. This will save you a lot of data anyway.
>
> But I'm still working on the proper fix.
Thanks.
--
Thanks,
Ganapat/GK
Thread overview: 45+ messages
2024-07-19 9:26 [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken Ganapatrao Kulkarni
2024-07-19 14:39 ` James Clark
2024-07-22 10:02 ` Ganapatrao Kulkarni
2024-07-23 13:10 ` James Clark
2024-07-23 15:26 ` Ganapatrao Kulkarni
2024-07-23 15:46 ` James Clark
2024-07-24 6:38 ` Ganapatrao Kulkarni
2024-07-24 14:45 ` James Clark
2024-08-01 10:00 ` James Clark
2024-08-01 10:28 ` Al Grant
2024-08-01 11:26 ` James Clark
2024-08-01 11:58 ` Al Grant
2024-08-01 14:58 ` James Clark
2024-08-05 12:22 ` Ganapatrao Kulkarni
2024-08-05 13:59 ` James Clark
2024-08-06 7:02 ` Ganapatrao Kulkarni
2024-08-06 9:47 ` James Clark
2024-08-06 9:57 ` James Clark
2024-08-06 15:02 ` Steve Clevenger
2024-08-06 16:14 ` James Clark
2024-08-07 12:17 ` Ganapatrao Kulkarni
2024-08-07 14:53 ` James Clark
2024-08-07 16:18 ` Ganapatrao Kulkarni
2024-08-07 19:20 ` Leo Yan
2024-08-08 4:36 ` Ganapatrao Kulkarni
2024-08-08 7:42 ` Leo Yan
2024-08-08 9:21 ` James Clark
2024-08-08 10:51 ` James Clark
2024-08-08 11:14 ` Ganapatrao Kulkarni
2024-08-08 15:01 ` Mike Leach
2024-08-07 16:48 ` Leo Yan
2024-08-08 9:32 ` James Clark
2024-08-08 11:05 ` Leo Yan
2024-08-09 14:13 ` Mike Leach
2024-08-09 15:19 ` James Clark
2024-08-19 10:59 ` Mike Leach
2024-08-23 9:03 ` James Clark
2024-08-23 9:57 ` Ganapatrao Kulkarni
2024-08-23 10:36 ` James Clark
2024-08-23 10:37 ` James Clark
2024-08-30 9:58 ` James Clark
2024-09-02 6:12 ` Ganapatrao Kulkarni
2024-08-28 9:33 ` Mike Leach
2024-08-29 13:35 ` James Clark
2024-08-08 7:54 ` Leo Yan