* Some questions about using the perf tool in ARM-SPE
@ 2023-06-09  7:14 蔡沅信
  2023-06-09  9:06 ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-09  7:14 UTC (permalink / raw)
  To: linux-perf-users
Hi linux-perf-users,
I'm glad the perf tool supports Arm SPE.
However, I have run into some problems using it.
My perf tool version is 6.1. I compiled it to run on an Android system.
Here are my questions:
1. I can't use perf c2c to analyze the data. I'm not sure whether it
only supports certain hardware architectures.
>>user_shell:/data/local/tmp # ./perf c2c report
>> Failed setup nodes
2. I get more information via the command ./perf report --mem-mode, but
I'm confused about the data:
Snoop  Locked  Blocked  Local INSTR Latency
.....  ......  .......  ...................
N/A    No      N/A      0
N/A    No      N/A      0
N/A    No      N/A      0
N/A    No      N/A      0
[...]
I see a lot of samples, but the Snoop, Locked, Blocked and Local INSTR
Latency columns always show N/A, No, N/A and 0, which makes me think
these fields are not supported. Could you explain what they mean and
whether they are expected to be filled in?
Many Thanks
Best Regards
Zack.
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-09  7:14 Some questions about using the perf tool in ARM-SPE 蔡沅信
@ 2023-06-09  9:06 ` James Clark
       [not found]   ` <CALDTKqg01+xJ2xu218c_QH2PbX9wdhYOiJfDieCXL5PHWV-6FQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-06-09  9:06 UTC (permalink / raw)
  To: 蔡沅信, linux-perf-users

On 09/06/2023 08:14, 蔡沅信 wrote:
> Hi linux-perf-users,
>
> I'm glad the perf tool supports Arm SPE.
> However, I have run into some problems using it.
> My perf tool version is 6.1. I compiled it to run on an Android system.
> Here are my questions:
>
> 1. I can't use perf c2c to analyze the data. I'm not sure whether it
> only supports certain hardware architectures.
>
>>> user_shell:/data/local/tmp # ./perf c2c report
>>> Failed setup nodes
>
> 2. I get more information via the command ./perf report --mem-mode, but
> I'm confused about the data:
>
> Snoop  Locked  Blocked  Local INSTR Latency
> .....  ......  .......  ...................
> N/A    No      N/A      0
> N/A    No      N/A      0
> N/A    No      N/A      0
> N/A    No      N/A      0
> [...]
> I see a lot of samples, but the Snoop, Locked, Blocked and Local INSTR
> Latency columns always show N/A, No, N/A and 0, which makes me think
> these fields are not supported. Could you explain what they mean and
> whether they are expected to be filled in?
>
> Many Thanks
> Best Regards
> Zack.

Hi Zack,

I wasn't able to reproduce your issue on Ubuntu, so it could be something
to do with Android.

It looks like the error "Failed setup nodes" comes from the
setup_nodes() function. Can you debug it to see exactly which part of it
is failing? I see it's something to do with NUMA nodes. Maybe that
doesn't work on Android, or you need to add some NUMA options to the
kernel config. Or the function needs to handle missing data differently.

James

^ permalink raw reply	[flat|nested] 16+ messages in thread
[parent not found: <CALDTKqg01+xJ2xu218c_QH2PbX9wdhYOiJfDieCXL5PHWV-6FQ@mail.gmail.com>]
[parent not found: <77773641-26e5-a754-63cf-e7d3443e11fc@arm.com>]
* Re: Some questions about using the perf tool in ARM-SPE
       [not found]     ` <77773641-26e5-a754-63cf-e7d3443e11fc@arm.com>
@ 2023-06-13 13:23       ` 蔡沅信
  2023-06-14  1:21         ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-13 13:23 UTC (permalink / raw)
  To: James Clark, linux-perf-users

OK
I have a new discovery: c2c seems to support only certain Arm
Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?

Using the Arm Statistical Profiling Extension to detect false
cache-line sharing | Blog | Linaro

Thanks
Zack

On Tue, Jun 13, 2023 at 9:18 PM James Clark <james.clark@arm.com> wrote:
>
> Hi Zack,
>
> It looks like you replied just to me rather than to the list. Are you
> able to re-send this as a reply-all to the original post so others might
> be able to help as well?
>
> Thanks
> James
>
> On 12/06/2023 06:50, 蔡沅信 wrote:
> > Hi James.
> >
> > After some debugging, I found that there are eight CPUs (0-7) on my
> > platform, but I can't get them to work together.
> > They are divided into three clusters, which are set up through the
> > DTS:
> >
> > user_shell:/sys/bus/event_source/devices # cat arm_spe_0/cpumask
> > arm_spe_1/cpumask arm_spe_2/cpumask
> > 7
> > 4-6
> > 0-3
> >
> > This makes it impossible for me to record the whole system together
> > (cpu 0-7). Only one cluster can be recorded at a time. Here is the
> > -v output:
> >
> > user_shell:/data/local/tmp # ./perf record -e
> > arm_spe/branch_filter=1,load_filter=1,store_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,min_latency=10,jitter=1/
> > -a -v --vmlinux ./vmlinux -o text.data
> > Warning: option `vmlinux' is being ignored because NO_DWARF=1
> > DEBUGINFOD_URLS=
> > nr_cblocks: 0
> > affinity: SYS
> > mmap flush: 1
> > comp level: 0
> > maps__set_modules_path_dir: cannot open
> > /lib/modules/6.1.25-android14-5-maybe-dirty-mainline dir
> > Problems setting modules path maps, continuing anyway...
> > mmap size 528384B
> > Control descriptor is not initialized
> > ^C[ perf record: Woken up 51 times to write data ]
> > failed to write feature CPUDESC
> > failed to write feature NUMA_TOPOLOGY
> > failed to write feature MEM_TOPOLOGY
> > failed to write feature CPU_PMU_CAPS
> > failed to write feature HYBRID_TOPOLOGY
> > [ perf record: Captured and wrote 23.952 MB text.data ]
> >
> > Could you give me some help? Can the three clusters work together?
> > I think perf c2c should also work.
> > If I have left out any information you may need, please let me know!
> >
> > Many Thanks
> > Best Regards
> > Zack.
> >
> > On Fri, Jun 9, 2023 at 5:06 PM James Clark <james.clark@arm.com> wrote:
> >
> >> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread
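The three cpumask files quoted above each hold a CPU list in the usual sysfs form (7, 4-6, 0-3). A minimal sketch of turning such lists into bitmasks, to see which CPUs each SPE PMU instance covers and whether together they span CPUs 0-7. The helper name cpulist_to_mask is hypothetical and is not a perf function; only CPUs 0..63 are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of perf): convert a sysfs-style CPU list
 * such as "0-3" or "0-3,7" into a bitmask with one bit per CPU (0..63).
 * Returns 0 on success and -1 on malformed input.
 */
static int cpulist_to_mask(const char *list, uint64_t *mask)
{
	const char *p = list;
	char *end;

	assert(list != NULL && mask != NULL);
	*mask = 0;

	while (*p != '\0' && *p != '\n') {
		unsigned long first, last;

		first = strtoul(p, &end, 10);
		if (end == p)
			return -1;	/* no digits where a CPU number was expected */
		last = first;
		p = end;

		if (*p == '-') {	/* range of the form "first-last" */
			p++;
			last = strtoul(p, &end, 10);
			if (end == p)
				return -1;
			p = end;
		}
		if (first > last || last > 63)
			return -1;

		for (unsigned long cpu = first; cpu <= last; cpu++)
			*mask |= UINT64_C(1) << cpu;

		if (*p == ',')
			p++;		/* another entry follows */
		else if (*p != '\0' && *p != '\n')
			return -1;	/* unexpected character */
	}
	return 0;
}
```

For the three lists in the quoted output this yields 0x80, 0x70 and 0x0f; no single PMU instance covers every CPU, but the union does.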
* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-13 13:23       ` 蔡沅信
@ 2023-06-14  1:21         ` Leo Yan
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
  2023-07-03  8:18           ` Suzuki K Poulose
  0 siblings, 2 replies; 16+ messages in thread
From: Leo Yan @ 2023-06-14  1:21 UTC (permalink / raw)
  To: 蔡沅信, Mark Rutland, Suzuki Kuruppassery Poulose
  Cc: James Clark, linux-perf-users

Hi,

On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
> OK
> I have a new discovery: c2c seems to support only certain Arm
> Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?

Based on the Cortex-X4 TRM [1], we can see that Cortex-X4 has the same
SPE data source packet format as the Neoverse CPUs. In theory, we can
add Cortex-X4's MIDR to the neoverse_spe[] array in
tools/perf/util/arm-spe.c to support Cortex-X4.

The Linux master branch is missing the definition of Cortex-X4's MIDR
[2]. Mark.R / Suzuki / James, could you confirm whether Arm has a plan
or already has patches for enabling Cortex-X4's MIDR?

Coming back to your current issue: as James said, the issue seems to be
related to NUMA (or CPU topology) support missing from your kernel
config. It is very unlikely to be related to the CPU type; even if
Cortex-X4 is not supported, perf should still work for all SPE packets
except the data source packet. Your shared log is not related to
decoding, but you can try the change below to rule out whether the
issue is related to the CPU type:

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7b36ba6b4079..3c3a3846f253 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -527,6 +527,7 @@ static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 m
 	else
 		return 0;
 
+	is_neoverse = 1;
 	if (is_neoverse)
 		arm_spe__synth_data_source_neoverse(record, &data_src);
 	else

[1] https://developer.arm.com/documentation/102484/0001/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/cputype.h

> Using the Arm Statistical Profiling Extension to detect false
> cache-line sharing | Blog | Linaro
>
> Thanks
> Zack

^ permalink raw reply related	[flat|nested] 16+ messages in thread
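The MIDR check described in the message above can be pictured with a small standalone sketch. It is not the perf implementation: the table name, the helper name and the part-number values are assumptions of this sketch (the part numbers are taken from Arm's published MIDR encodings, not from this thread). It only shows the idea of matching the implementer and part-number fields while ignoring variant and revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 field layout: implementer in bits [31:24], part number in [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

#define IMPLEMENTER_ARM		0x41u

/*
 * Illustrative table (hypothetical name). The part numbers are assumptions
 * of this sketch, not values quoted in the thread.
 */
static const uint32_t neoverse_like_parts[] = {
	0xd0c,	/* Neoverse N1 */
	0xd49,	/* Neoverse N2 */
	0xd40,	/* Neoverse V1 */
	0xd82,	/* Cortex-X4: the entry the thread proposes adding */
};

/*
 * Hypothetical helper: true if the MIDR belongs to an Arm-implemented CPU
 * whose part number is listed above. Variant and revision are ignored.
 */
static bool midr_uses_neoverse_data_source(uint64_t midr)
{
	size_t n = sizeof(neoverse_like_parts) / sizeof(neoverse_like_parts[0]);

	assert(n > 0);
	if (MIDR_IMPLEMENTER(midr) != IMPLEMENTER_ARM)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (MIDR_PARTNUM(midr) == neoverse_like_parts[i])
			return true;
	}
	return false;
}
```

With a table like this, supporting a new core is a one-line addition, which is the kind of change discussed in the thread as opposed to forcing is_neoverse = 1 for every CPU.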
[parent not found: <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>]
* Re: Some questions about using the perf tool in ARM-SPE
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
@ 2023-06-14  6:08             ` 蔡沅信
  2023-06-18  9:28               ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-14  6:08 UTC (permalink / raw)
  To: Leo Yan, James Clark, linux-perf-users, Mark Rutland,
	Suzuki Kuruppassery Poulose

"Fix mail to text mode"

Hi,
How do I add NUMA nodes (or CPU topology) to the kernel config?
After I modified arm-spe.c, Snoop is working, but the results for
Locked, Blocked and Local INSTR Latency are always No, N/A and 0.

I merged the 3 clusters into one and have been able to record the whole
system.
user_shell:/sys/bus/event_source/devices/arm_spe_0 # cat cpumask
0-7

On the c2c side:
user_shell:/data/local/tmp # ./perf c2c report -vvv
coalesce sort fields: offset,iaddr
coalesce resort fields: offset,tot_peer
coalesce output fields: cl_num_empty,percent_rmt_peer,percent_lcl_peer,percent_stores_l1hit,percent_stores_l1miss,percent_stores_na,offset,offset_node,dcacheline_count,iaddr,mean_rmt_peer,mean_lcl_peer,mean_load,tot_recs,cpucnt,symbol,dso,cl_srcline,node
Failed setup nodes

On the other hand, the perf I use is statically compiled with the
aarch64 cross-compiler, and I can't enable all the features:
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ on ]
... libbfd: [ OFF ]
... libbfd-buildid: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ OFF ]

Does it affect the results?

Many Thanks
Best Regards
Zack.

On Wed, Jun 14, 2023 at 12:06 PM 蔡沅信 <fissure2010@gmail.com> wrote:
> [...]
>
> On Wed, Jun 14, 2023 at 9:21 AM Leo Yan <leo.yan@linaro.org> wrote:
>> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-14  6:08             ` 蔡沅信
@ 2023-06-18  9:28               ` Leo Yan
  2023-07-01  5:25                 ` 蔡沅信
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-06-18  9:28 UTC (permalink / raw)
  To: 蔡沅信
  Cc: James Clark, linux-perf-users, Mark Rutland, Suzuki Kuruppassery Poulose

On Wed, Jun 14, 2023 at 02:08:36PM +0800, 蔡沅信 wrote:
> "Fix mail to text mode"
>
> Hi,
> How do I add NUMA nodes (or CPU topology) to the kernel config?

The configurations below are enabled in my testing kernel:

root@leoy-huangpu:/home/leoy# zcat /proc/config.gz | grep NUMA
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA=y
CONFIG_ACPI_NUMA=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_GENERIC_ARCH_NUMA=y
CONFIG_OF_NUMA=y
CONFIG_DMA_PERNUMA_CMA=y

> After I modified arm-spe.c, Snoop is working, but the results for
> Locked, Blocked and Local INSTR Latency are always No, N/A and 0.

I cannot understand this ... What have you modified in arm-spe.c?

If you don't share a more complete perf log, it will be difficult to
understand and locate the issue.

> I merged the 3 clusters into one and have been able to record the whole
> system.
> user_shell:/sys/bus/event_source/devices/arm_spe_0 # cat cpumask
> 0-7

This seems fine to me.

> On the c2c side:
> user_shell:/data/local/tmp # ./perf c2c report -vvv
> coalesce sort fields: offset,iaddr
> coalesce resort fields: offset,tot_peer
> coalesce output fields:
> cl_num_empty,percent_rmt_peer,percent_lcl_peer,percent_stores_l1hit,percent_stores_l1miss,percent_stores_na,offset,offset_node,dcacheline_count,iaddr,mean_rmt_peer,mean_lcl_peer,mean_load,tot_recs,cpucnt,symbol,dso,cl_srcline,node
> Failed setup nodes

Before diving into the "perf c2c" tool, please use the "perf script"
tool to decode the perf data file and check whether you have captured
any SPE trace data. It's also good to dump the header info. This would
be useful for analyzing the issue.

  $ ./perf script --header -I

> On the other hand, the perf I use is statically compiled with the
> aarch64 cross-compiler, and I can't enable all the features

This should be fine. On my side, I built perf statically on an x86_64
machine with the command:

  $ make LDFLAGS=-static NO_LIBELF=1 NO_JVMTI=1 VF=1 DEBUG=1 NO_LIBTRACEEVENT=1

I then copied the perf binary to my Arm64 machine, and it works pretty
well for Arm SPE.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 16+ messages in thread
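Comparing a kernel config against the option list in the message above can be done by exact line matching. The sketch below assumes the standard kconfig text format ("CONFIG_X=y" for an enabled option, "# CONFIG_X is not set" otherwise); the helper name kconfig_enabled is hypothetical, and the thread does not establish which of the listed options perf c2c actually depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the config text contains a line that
 * is exactly "<option>=y". Prefix matches such as CONFIG_NUMA_BALANCING=y
 * and commented lines such as "# CONFIG_NUMA is not set" do not count.
 */
static bool kconfig_enabled(const char *config, const char *option)
{
	size_t optlen;
	const char *line = config;

	assert(config != NULL && option != NULL);
	optlen = strlen(option);

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len == optlen + 2 &&
		    strncmp(line, option, optlen) == 0 &&
		    line[optlen] == '=' && line[optlen + 1] == 'y')
			return true;

		if (!eol)
			break;		/* last line had no trailing newline */
		line = eol + 1;
	}
	return false;
}
```

The text passed in would be the decompressed contents of /proc/config.gz, as produced by the zcat command shown above.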
* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-18  9:28               ` Leo Yan
@ 2023-07-01  5:25                 ` 蔡沅信
  0 siblings, 0 replies; 16+ messages in thread
From: 蔡沅信 @ 2023-07-01  5:25 UTC (permalink / raw)
  To: Leo Yan, James Clark, linux-perf-users, Mark Rutland,
	Suzuki Kuruppassery Poulose

Hi Leo Yan,

On our platform I cannot enable many of the NUMA configurations,
because doing so leads to build failures. Could this be the reason for
the c2c failure? If so, I will try to identify which side is causing
the problem.
This is the config status of my platform. Could it support c2c?

user_shell:/data/local/tmp # zcat /proc/config.gz | grep NUMA
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
# CONFIG_NUMA is not set
CONFIG_DMA_PERNUMA_CMA=y

> ./perf script --header -I

I think this is the key message:

# ======
# missing features: TRACING_DATA CPUDESC NUMA_TOPOLOGY BRANCH_STACK GROUP_DESC STAT MEM_TOPOLOGY CLOCKID DIR_FORMAT (null) (null) COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
# ========
#
Only instruction-based sampling period is currently supported by Arm SPE.

Many events related to L1d, TLB, and memory are also displayed. If you
need this information, I can provide it.

perf6.1 4725 [000] 129.779296: 1 l1d-access: ffffffe012888f64 [unknown] ([unknown])

Many Thanks
Best Regards
Zack.

On Sun, Jun 18, 2023 at 5:28 PM Leo Yan <leo.yan@linaro.org> wrote:
> [...]

^ permalink raw reply	[flat|nested] 16+ messages in thread
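The "# missing features:" header line shown above can be checked mechanically for a given feature name. A minimal sketch with a hypothetical helper name follows; it does whole-word matching so that, for example, TOPOLOGY alone does not match NUMA_TOPOLOGY. Note that the thread does not confirm that the absent NUMA_TOPOLOGY feature is what makes setup_nodes() fail; that link is only suspected here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `feature` appears as a whole,
 * space-delimited word in a "# missing features: ..." header line.
 */
static bool feature_listed(const char *line, const char *feature)
{
	size_t flen = strlen(feature);
	const char *p = line;

	assert(flen > 0);

	while ((p = strstr(p, feature)) != NULL) {
		bool starts_word = (p == line) || (p[-1] == ' ');
		bool ends_word = (p[flen] == '\0' || p[flen] == ' ' ||
				  p[flen] == '\n');

		if (starts_word && ends_word)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```

Run against the header line from the message above, this reports NUMA_TOPOLOGY and MEM_TOPOLOGY as missing, matching the "failed to write feature" lines printed earlier by perf record.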
* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-14  1:21         ` Leo Yan
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
@ 2023-07-03  8:18           ` Suzuki K Poulose
  2023-07-03  8:24             ` James Clark
  1 sibling, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-03  8:18 UTC (permalink / raw)
  To: Leo Yan, 蔡沅信, Mark Rutland
  Cc: James Clark, linux-perf-users

Hi Leo

On 14/06/2023 02:21, Leo Yan wrote:
> Hi,
>
> On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
>> OK
>> I have a new discovery: c2c seems to support only certain Arm
>> Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?
>
> Based on the Cortex-X4 TRM [1], we can see that Cortex-X4 has the same
> SPE data source packet format as the Neoverse CPUs. In theory, we can
> add Cortex-X4's MIDR to the neoverse_spe[] array in
> tools/perf/util/arm-spe.c to support Cortex-X4.
>
> The Linux master branch is missing the definition of Cortex-X4's MIDR
> [2]. Mark.R / Suzuki / James, could you confirm whether Arm has a plan
> or already has patches for enabling Cortex-X4's MIDR?
>

Is there a particular reason why we need this in the kernel? The usual
policy is that the kernel distinguishes an MIDR only if it ever needs
to, e.g. for a CPU erratum specific to that MIDR.

Suzuki

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  8:18           ` Suzuki K Poulose
@ 2023-07-03  8:24             ` James Clark
  2023-07-03  9:39               ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-07-03  8:24 UTC (permalink / raw)
  To: Suzuki K Poulose, Leo Yan, 蔡沅信, Mark Rutland
  Cc: linux-perf-users

On 03/07/2023 09:18, Suzuki K Poulose wrote:
> Hi Leo
> On 14/06/2023 02:21, Leo Yan wrote:
>> Hi,
>>
>> On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
>>> OK
>>> I have a new discovery: c2c seems to support only certain Arm
>>> Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?
>>
>> Based on the Cortex-X4 TRM [1], we can see that Cortex-X4 has the same
>> SPE data source packet format as the Neoverse CPUs. In theory, we can
>> add Cortex-X4's MIDR to the neoverse_spe[] array in
>> tools/perf/util/arm-spe.c to support Cortex-X4.
>>
>> The Linux master branch is missing the definition of Cortex-X4's MIDR
>> [2]. Mark.R / Suzuki / James, could you confirm whether Arm has a plan
>> or already has patches for enabling Cortex-X4's MIDR?
>>
>
> Is there a particular reason why we need this in the kernel? The usual
> policy is that the kernel distinguishes an MIDR only if it ever needs
> to, e.g. for a CPU erratum specific to that MIDR.
>
> Suzuki
>

This is on the tool side rather than in the kernel. But yes, if the data
source encoding is the same as the existing ones, please send a patch
adding X4's MIDR to the list.

Thanks
James

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  8:24             ` James Clark
@ 2023-07-03  9:39               ` Leo Yan
  2023-07-03  9:42                 ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-03  9:39 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, 蔡沅信, Mark Rutland, linux-perf-users

Hi Suzuki, James,

On Mon, Jul 03, 2023 at 09:24:27AM +0100, James Clark wrote:

[...]

> >>> I have a new discovery: c2c seems to support only certain Arm
> >>> Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?
> >>
> >> Based on the Cortex-X4 TRM [1], we can see that Cortex-X4 has the same
> >> SPE data source packet format as the Neoverse CPUs. In theory, we can
> >> add Cortex-X4's MIDR to the neoverse_spe[] array in
> >> tools/perf/util/arm-spe.c to support Cortex-X4.
> >>
> >> The Linux master branch is missing the definition of Cortex-X4's MIDR
> >> [2]. Mark.R / Suzuki / James, could you confirm whether Arm has a plan
> >> or already has patches for enabling Cortex-X4's MIDR?
> >
> > Is there a particular reason why we need this in the kernel? The usual
> > policy is that the kernel distinguishes an MIDR only if it ever needs
> > to, e.g. for a CPU erratum specific to that MIDR.

Just to clarify a bit: the Arm SPE data source packet is implementation
dependent, which means different CPU types can use different data
source formats, and some CPU variants don't support the data source
packet at all. For this reason, the perf tool checks the CPU MIDR to
decide whether the data source format is supported.

The perf tool currently includes the kernel header
arch/arm64/include/asm/cputype.h directly for the CPU MIDR definitions,
which is why I am asking whether anyone is working on adding the MIDR
for Cortex-X4.

> This is on the tool side rather than in the kernel. But yes, if the data
> source encoding is the same as the existing ones, please send a patch
> adding X4's MIDR to the list.

I can do this. As elaborated above, I think we need two patches: a
kernel patch adding the MIDR, and a perf tool patch that consumes the
X4 MIDR. I would like to know whether this is what you expect.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  9:39               ` Leo Yan
@ 2023-07-03  9:42                 ` James Clark
  2023-07-03 10:20                   ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-07-03  9:42 UTC (permalink / raw)
  To: Leo Yan
  Cc: Suzuki K Poulose, 蔡沅信, Mark Rutland, linux-perf-users

On 03/07/2023 10:39, Leo Yan wrote:
> Hi Suzuki, James,
>
> On Mon, Jul 03, 2023 at 09:24:27AM +0100, James Clark wrote:
>
> [...]
>
>>>>> I have a new discovery: c2c seems to support only certain Arm
>>>>> Neoverse (N1/N2/V1) CPUs. I wonder if Cortex-X4 could support it?
>>>>
>>>> Based on the Cortex-X4 TRM [1], we can see that Cortex-X4 has the same
>>>> SPE data source packet format as the Neoverse CPUs. In theory, we can
>>>> add Cortex-X4's MIDR to the neoverse_spe[] array in
>>>> tools/perf/util/arm-spe.c to support Cortex-X4.
>>>>
>>>> The Linux master branch is missing the definition of Cortex-X4's MIDR
>>>> [2]. Mark.R / Suzuki / James, could you confirm whether Arm has a plan
>>>> or already has patches for enabling Cortex-X4's MIDR?
>>>
>>> Is there a particular reason why we need this in the kernel? The usual
>>> policy is that the kernel distinguishes an MIDR only if it ever needs
>>> to, e.g. for a CPU erratum specific to that MIDR.
>
> Just to clarify a bit: the Arm SPE data source packet is implementation
> dependent, which means different CPU types can use different data
> source formats, and some CPU variants don't support the data source
> packet at all. For this reason, the perf tool checks the CPU MIDR to
> decide whether the data source format is supported.
>
> The perf tool currently includes the kernel header
> arch/arm64/include/asm/cputype.h directly for the CPU MIDR definitions,
> which is why I am asking whether anyone is working on adding the MIDR
> for Cortex-X4.
>
>> This is on the tool side rather than in the kernel. But yes, if the data
>> source encoding is the same as the existing ones, please send a patch
>> adding X4's MIDR to the list.
>
> I can do this. As elaborated above, I think we need two patches: a
> kernel patch adding the MIDR, and a perf tool patch that consumes the
> X4 MIDR. I would like to know whether this is what you expect.
>

Personally I would just add it on the tool side only. If it's not really
needed in the kernel then it doesn't make sense to add it.

But I suppose from a consistency point of view we could add it in both
places. I'm not too fussed either way.

> Thanks,
> Leo

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  9:42                 ` James Clark
@ 2023-07-03 10:20                   ` Leo Yan
  2023-07-03 13:28                     ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-03 10:20 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, 蔡沅信, Mark Rutland, linux-perf-users

On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:

[...]

> >> This is on the tool side rather than in the kernel. But yes, if the data
> >> source encoding is the same as the existing ones, please send a patch
> >> adding X4's MIDR to the list.
> >
> > I can do this. As elaborated above, I think we need two patches: a
> > kernel patch adding the MIDR, and a perf tool patch that consumes the
> > X4 MIDR. I would like to know whether this is what you expect.
> >
>
> Personally I would just add it on the tool side only. If it's not really
> needed in the kernel then it doesn't make sense to add it.

Makes sense to me. I will try to work out a patch. It should also not
break the build if the kernel later adds the same Cortex-X4 MIDR
definition.

Thanks,
Leo

> But I suppose from a consistency point of view we could add it in both
> places. I'm not too fussed either way.
>
> > Thanks,
> > Leo

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03 10:20                   ` Leo Yan
@ 2023-07-03 13:28                     ` Suzuki K Poulose
  2023-07-05 17:33                       ` Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-03 13:28 UTC (permalink / raw)
  To: Leo Yan, James Clark
  Cc: 蔡沅信, Mark Rutland, linux-perf-users

On 03/07/2023 11:20, Leo Yan wrote:
> On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
>
> [...]
>
>>>> This is on the tool side rather than in the kernel. But yes, if the data
>>>> source encoding is the same as the existing ones, please send a patch
>>>> adding X4's MIDR to the list.
>>>
>>> I can do this. As elaborated above, I think we need two patches: a
>>> kernel patch adding the MIDR, and a perf tool patch that consumes the
>>> X4 MIDR. I would like to know whether this is what you expect.
>>>
>>
>> Personally I would just add it on the tool side only. If it's not really
>> needed in the kernel then it doesn't make sense to add it.
>
> Makes sense to me. I will try to work out a patch. It should also not
> break the build if the kernel later adds the same Cortex-X4 MIDR
> definition.

Do we somehow sync (copy) cputype.h for the perf tool? Or do we keep
the two in sync with a patch?

If it is the former, I wouldn't bother about updating the kernel
headers.

Also, how does the perf tool read the MIDR? It is safer to read:

/sys/devices/system/cpu/cpu<N>/regs/identification/midr_el1

than to rely on MRS emulation, where the task could get migrated to
another CPU and read a completely different MIDR_EL1, e.g. that of a
little CPU.

Suzuki

>
> Thanks,
> Leo
>
>> But I suppose from a consistency point of view we could add it in both
>> places. I'm not too fussed either way.
>>
>>> Thanks,
>>> Leo

^ permalink raw reply	[flat|nested] 16+ messages in thread
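Reading the MIDR of one specific CPU through the sysfs file named in the message above could look roughly like this. It is a sketch, not how perf does it: the function names are hypothetical, and the file is assumed to contain a single hexadecimal value such as 0x00000000410fd820 followed by a newline.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the per-CPU sysfs path. Returns 0, or -1 if the buffer is too small. */
static int midr_sysfs_path(int cpu, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len,
		     "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",
		     cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse the file contents; the hex format is an assumption of this sketch. */
static int parse_midr(const char *text, uint64_t *midr)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(text, &end, 16);	/* base 16 accepts an optional "0x" */
	if (end == text || errno != 0)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;		/* trailing garbage */
	*midr = val;
	return 0;
}

/* Read MIDR_EL1 of the given CPU. Returns 0 on success, -1 on any failure. */
static int read_midr(int cpu, uint64_t *midr)
{
	char path[128], text[64];
	FILE *f;
	int ret = -1;

	if (midr_sysfs_path(cpu, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. not arm64, or the CPU does not exist */
	if (fgets(text, sizeof(text), f))
		ret = parse_midr(text, midr);
	fclose(f);
	return ret;
}
```

Because the CPU number is part of the path, the value returned does not depend on which CPU the calling task happens to be running on, which is the property the message asks for.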
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03 13:28 ` Suzuki K Poulose
@ 2023-07-05 17:33 ` Namhyung Kim
  2023-07-17  5:57 ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2023-07-05 17:33 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Leo Yan, James Clark, 蔡沅信, Mark Rutland, linux-perf-users,
	Arnaldo Carvalho de Melo

Hello,

On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> On 03/07/2023 11:20, Leo Yan wrote:
> > On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
> >
> > [...]
> >
> >>>> This is on the tool side rather than in the kernel. But yes, if the data
> >>>> source encoding is the same as the existing ones please send a patch
> >>>> adding X4's MIDR to the list.
> >>>
> >>> I can do this; as elaborated above, I think we need two patches, one is
> >>> a kernel patch for adding MIDR and another patch is for perf tool to
> >>> consume the MIDR of X4. I would like to know if this is expected or
> >>> not from your side.
> >>>
> >>
> >> Personally I would just add it on the tool side only. If it's not really
> >> needed in the kernel then it doesn't make sense to add it.
> >
> > Makes sense to me. I will try to work out a patch, and it's good to not
> > break building when kernel adds the same MIDR x4 definition.
>
> Do we somehow sync (copy) the cputype.h for perf tool ? Or do we keep
> them in sync with a patch ?
>
> If it is the former, I wouldn't bother about updating the kernel
> headers.

In general, perf tool has a copy of kernel headers and there's a script
called check-headers.sh to verify they are in sync. We don't
recommend kernel patches to touch the tool's copy. And it's done
by tool devs separately.

Thanks,
Namhyung

>
> Also, how does the perf tool read the midr ? It is safer to read :
>
> /sys/devices/system/cpu/cpu<N>/regs/identification/midr_el1
>
> than relying mrs emulation, which could get you migrated to another CPU
> and get a completely different MIDR_EL1 of a little CPU.
>
> Suzuki
>
> >
> > Thanks,
> > Leo
> >
> >> But I suppose from a consistency point of view we could add it in both
> >> places. I'm not too fussed either way.
> >>
> >>> Thanks,
> >>> Leo
>
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-05 17:33 ` Namhyung Kim
@ 2023-07-17  5:57 ` Leo Yan
  2023-07-17  9:05 ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-17  5:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Suzuki K Poulose, James Clark, 蔡沅信, Mark Rutland, linux-perf-users,
	Arnaldo Carvalho de Melo

Hi Suzuki,

On Wed, Jul 05, 2023 at 10:33:18AM -0700, Namhyung Kim wrote:
> On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> > On 03/07/2023 11:20, Leo Yan wrote:
> > > On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:

[...]

> > Do we somehow sync (copy) the cputype.h for perf tool ? Or do we keep
> > them in sync with a patch ?
> >
> > If it is the former, I wouldn't bother about updating the kernel
> > headers.
>
> In general, perf tool has a copy of kernel headers and there's a script
> called check-headers.sh to verify they are in sync. We don't
> recommend kernel patches to touch the tool's copy. And it's done
> by tool devs separately.

We have the kernel header arch/arm64/include/asm/cputype.h, tools have a
copy which is placed in tools/arch/arm64/include/asm/cputype.h.

As Namhyung explained, usually the kernel header is firstly changed and
then the tool developers send a separate patch to sync with kernel
header. You could see a recent example by Arnaldo's patch [1].

By following this working model, I sent patch series to add Cortex-X4
CPU part and MIDR definitions. I personally think this is the best
way for us to keep the alignment between the kernel header and tools
header [2]. Please let me know if this doable or not?

Sorry for late response due to vacation :)

Thanks,
Leo

[1] https://lore.kernel.org/lkml/ZLFQ%2Ftu%2FATQwDEIW@kernel.org/
[2] https://lore.kernel.org/lkml/20230717054327.79815-1-leo.yan@linaro.org/
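
Editorial sketch, not part of the original message: the idea of checking that the tools copy of a header still matches the kernel header can be illustrated with a plain file comparison. This is a simplified stand-in written in Python, not a reimplementation of check-headers.sh, whose actual behaviour is not described in this thread. The function name `header_drift` is hypothetical; the two paths are the ones Leo gives above.

```python
import difflib
from pathlib import Path

# Paths quoted in the message above, relative to the kernel source tree.
KERNEL_HEADER = "arch/arm64/include/asm/cputype.h"
TOOLS_HEADER = "tools/arch/arm64/include/asm/cputype.h"


def header_drift(kernel_path, tools_path):
    """Return unified-diff lines between the two copies.

    An empty list means the tools copy is identical to the kernel header.
    """
    kernel_lines = Path(kernel_path).read_text().splitlines(keepends=True)
    tools_lines = Path(tools_path).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(
        kernel_lines, tools_lines,
        fromfile=str(kernel_path), tofile=str(tools_path)))


if __name__ == "__main__":
    # Intended to be run from the top of a kernel source tree;
    # does nothing if the headers are not found.
    if Path(KERNEL_HEADER).exists() and Path(TOOLS_HEADER).exists():
        drift = header_drift(KERNEL_HEADER, TOOLS_HEADER)
        print("in sync" if not drift else "".join(drift))
```

Under the working model described above, a non-empty diff after a kernel-side change is expected for a while; it is the cue for a separate tools-side sync patch rather than an error in the kernel patch.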
* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-17  5:57 ` Leo Yan
@ 2023-07-17  9:05 ` Suzuki K Poulose
  0 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-17  9:05 UTC (permalink / raw)
  To: Leo Yan, Namhyung Kim
  Cc: James Clark, 蔡沅信, Mark Rutland, linux-perf-users,
	Arnaldo Carvalho de Melo

On 17/07/2023 06:57, Leo Yan wrote:
> Hi Suzuki,
>
> On Wed, Jul 05, 2023 at 10:33:18AM -0700, Namhyung Kim wrote:
>> On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>> On 03/07/2023 11:20, Leo Yan wrote:
>>>> On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
>
> [...]
>
>>> Do we somehow sync (copy) the cputype.h for perf tool ? Or do we keep
>>> them in sync with a patch ?
>>>
>>> If it is the former, I wouldn't bother about updating the kernel
>>> headers.
>>
>> In general, perf tool has a copy of kernel headers and there's a script
>> called check-headers.sh to verify they are in sync. We don't
>> recommend kernel patches to touch the tool's copy. And it's done
>> by tool devs separately.
>
> We have the kernel header arch/arm64/include/asm/cputype.h, tools have a
> copy which is placed in tools/arch/arm64/include/asm/cputype.h.
>
> As Namhyung explained, usually the kernel header is firstly changed and
> then the tool developers send a separate patch to sync with kernel
> header. You could see a recent example by Arnaldo's patch [1].
>
> By following this working model, I sent patch series to add Cortex-X4
> CPU part and MIDR definitions. I personally think this is the best
> way for us to keep the alignment between the kernel header and tools
> header [2]. Please let me know if this doable or not?

Sure, if the tool relies on syncing the kernel headers, thats fine.

>
> Sorry for late response due to vacation :)

No worries, I will dig the series from the mailing list and respond
there. Thanks for sending this out.

Suzuki

>
> Thanks,
> Leo
>
> [1] https://lore.kernel.org/lkml/ZLFQ%2Ftu%2FATQwDEIW@kernel.org/
> [2] https://lore.kernel.org/lkml/20230717054327.79815-1-leo.yan@linaro.org/
Thread overview: 16+ messages
2023-06-09  7:14 Some questions about using the perf tool in ARM-SPE 蔡沅信
2023-06-09  9:06 ` James Clark
     [not found]   ` <CALDTKqg01+xJ2xu218c_QH2PbX9wdhYOiJfDieCXL5PHWV-6FQ@mail.gmail.com>
     [not found]     ` <77773641-26e5-a754-63cf-e7d3443e11fc@arm.com>
2023-06-13 13:23       ` 蔡沅信
2023-06-14  1:21         ` Leo Yan
     [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
2023-06-14  6:08             ` 蔡沅信
2023-06-18  9:28               ` Leo Yan
2023-07-01  5:25                 ` 蔡沅信
2023-07-03  8:18           ` Suzuki K Poulose
2023-07-03  8:24             ` James Clark
2023-07-03  9:39               ` Leo Yan
2023-07-03  9:42                 ` James Clark
2023-07-03 10:20                   ` Leo Yan
2023-07-03 13:28                     ` Suzuki K Poulose
2023-07-05 17:33                       ` Namhyung Kim
2023-07-17  5:57                         ` Leo Yan
2023-07-17  9:05                           ` Suzuki K Poulose