From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>,
Eranian Stephane <eranian@google.com>,
Mark Rutland <mark.rutland@arm.com>,
broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Zide Chen <zide.chen@intel.com>,
Falcon Thomas <thomas.falcon@intel.com>,
Dapeng Mi <dapeng1.mi@intel.com>,
Xudong Hao <xudong.hao@intel.com>,
Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format
Date: Tue, 20 Jan 2026 11:04:53 +0800 [thread overview]
Message-ID: <478c90df-61a8-4e19-a640-931ce791fe97@linux.intel.com> (raw)
In-Reply-To: <CAP-5=fUNeJpj-rdi+552-o9xhCiT41q-Q8Zhg8U7ev0Wbiyhog@mail.gmail.com>
On 1/20/2026 4:25 AM, Ian Rogers wrote:
> On Sun, Jan 18, 2026 at 10:55 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 1/17/2026 1:50 PM, Ian Rogers wrote:
>>> On Mon, Jan 5, 2026 at 11:27 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>>> Ian,
>>>>
>>>> I looked at these perf regs __weak helpers again, like
>>>> arch__intr_reg_mask()/arch__user_reg_mask(). It could be really hard to
>>>> eliminate these __weak helpers and convert them into a generic function
>>>> like perf_reg_name(). All these __weak helpers are arch-dependent and
>>>> usually need to call perf_event_open sysctrl to get the required registers
>>>> mask. So even we convert them into a generic function, we still have no way
>>>> to get the registers mask of a different arch, like get x86 registers mask
>>>> on arm machine. Another reason is that these __weak helpers may contain
>>>> some arch-specific instructions. If we want to convert them into a general
>>>> perf function like perf_reg_name(). It may cause building error since these
>>>> arch-specific instructions may not exist on the building machine.
>>> Hi Dapeng,
>>>
>>> There was already a patch to better support cross architecture
>>> libdw-unwind-ing and I've just sent out a series to clean this up so
>>> that this is achieved by having mapping functions between perf and
>>> dwarf register names. The functions use the e_machine of the binary to
>>> determine how to map, etc. The series is here:
>>> https://lore.kernel.org/lkml/20260117052849.2205545-1-irogers@google.com/
>>> and I think it can be the foundation for avoiding the weak functions.
>> Hi Ian,
>>
>> Thanks for the reference patch. But they are different. The reference
>> patches mainly parse the regs from perf.data and the __weak functions can
>> be eliminated in the parsing phase since the registers bitmap is fixed for
>> a fixed arch. While these __weak functions
>> arch__intr_reg_mask()/arch__user_reg_mask() are used to obtain the support
>> sampling registers on a specific platform.
>>
>> We know different platforms even for same arch may support different
>> registers, e.g., some x86 platforms may only support XMM registers, but
>> some others may support XMM/YMM/ZMM registers, then all these arch-specific
>> arch__intr_reg_mask()/arch__user_reg_mask() functions have to depend on the
>> perf_event_open() syscall to retrieve the supported registers mask from kernel.
>>
>> Thus, it becomes impossible to retrieve the supported registers mask for a
>> x86 specific platform from running on a arm platform.
>>
>> Even we don't consider this limitation and forcibly convert the
>> __weak arch__intr_reg_mask() function to some kind of below function, just
>> like currently what perf_reg_name() does.
>>
>> uint64_t perf_intr_reg_mask(const char *arch)
>> {
>> uint64_t mask = 0;
>>
>> if (!strcmp(arch, "csky"))
>> mask = perf_intr_reg_mask_csky(id);
>> else if (!strcmp(arch, "loongarch"))
>> mask = perf_intr_reg_mask_loongarch(id);
>> else if (!strcmp(arch, "mips"))
>> mask = perf_intr_reg_mask_mips(id);
>> else if (!strcmp(arch, "powerpc"))
>> mask = perf_intr_reg_mask_powerpc(id);
>> else if (!strcmp(arch, "riscv"))
>> mask = perf_intr_reg_mask_riscv(id);
>> else if (!strcmp(arch, "s390"))
>> mask = perf_intr_reg_mask_s390(id);
>> else if (!strcmp(arch, "x86"))
>> mask = perf_intr_reg_mask_x86(id);
>> else if (!strcmp(arch, "arm"))
>> mask = perf_intr_reg_mask_arm(id);
>> else if (!strcmp(arch, "arm64"))
>> mask = perf_intr_reg_mask_arm64(id);
>>
>> return mask;
>> }
>>
>> But currently there are some arch-dependent instructions in these
>> arch-specific instructions, like the below code in powerpc specific
>> arch__intr_reg_mask().
>>
>> version = (((mfspr(SPRN_PVR)) >> 16) & 0xFFFF);
>>
>> mfspr is a powerpc specific instruction, building this converted
>> perf_intr_reg_mask on non-powerpc platform would lead to building error.
> Hi Dapeng,
>
> So my main point is the arch directory and ifdefs, how do they differ
> from writing code that uses the ELF machine? For example, your code
> uses the arch/x86 directory and has ifdefs on
> HAVE_ARCH_X86_64_SUPPORT. How is that different from:
> ```
> switch(e_machine) {
> case EM_X86_64:
> ...
> case EM_I386:
> ...
> default:
> return 0;
> }
> ```
> If we need to determine for the current running machine then e_machine
> can equal EM_HOST that is set up for this purpose.
I think the key factor that determines if we can convert the code into
above e_machine switch ... case format is whether the code is
architecture-dependent both in building and execution phases.
If the code is not architecture-dependent, It's good to covert the code
into the e_machine switch ... case and that would provide better applicability.
Otherwise, the architecture-dependent code would lead to the building error
(building phase) or get incorrect execution results (execution phase).
Even if we introduce EM_HOST case, it won't really solve the building
error, instead it may introduce new building error, e.g.,
```
switch(e_machine) {
case EM_HOST:
...
case EM_X86_64:
...
case EM_I386:
...
default:
return 0;
}
```
Assume the code is built on a x86_64 machine, then EM_HOST equals
EM_X86_64, that would cause the "duplicate case value" building error.
If we want to limit the architecture-dependent code is built only on the
correct architecture, then we still have to introduce the architecture
#ifdefs. This is actually no difference with current arch directory __weak
functions and make it more complex.
>
> I agree that determining features needs calls that may not be
> supported on other architectures. That should yield EOPNOTSUPP and we
> can use information like that to populate generic information like the
> PMU missing features:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n190
> we also probe API support with:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/perf_api_probe.h?h=perf-tools-next
In general, I agree we can return EOPNOTSUPP or some generic information
for some architecture independent code. But it's not applicable for these 2
specific arch__intr_reg_mask()/arch__user_reg_mask() functions, current
perf code depends on these 2 functions to return the supported registers
mask on a specific (running) platform.
>
> The current code doing lots of string comparisons is unnecessary
> overhead and imprecise (x86 is used for both 32-bit and 64-bit x86).
> It is removed in the series I linked to, I think we can eventually get
> rid of the whole arch string for similar reasons of trying to minimize
> the use of the arch directory. I'm curious what happens with APX, will
> the e_machine change? We may need to pass in the sample regs_dump's
> abi field for cases like this.
Yes, I agree we should git rid of the arch-string comparison and minimize
the use of arch directory. It would improve the efficiency.
I don't think the support of APX would change the e_machine, it should
still be EM_X86_64.
Yes, we need the abi filed (exactly PERF_SAMPLE_REGS_ABI_SIMD) to determine
it's APX or legacy XMM.
>
> My point on the unwinding is that the sample register mask appears to
> be set up the same regardless, whereas for stack samples
> (--call-graph=dwarf) maybe just sample IP and SP suffices. So perhaps
> there should be additional registers to set up the sample mask.
Yes, that's true. It can be further optimized.
>
> By avoiding the arch functions we can avoid the problem of broken
> cross architecture support, we can also lay the groundwork for support
> on different architectures that may want to do similar things. I agree
> that doesn't matter until >1 architecture is trying to have more
> register masks, my concern is trying to keep the code generic and
> trying to make sure cross architecture is working. New weak functions
> is going in the opposite direction to that.
Yes, I agree we should git rid of these arch functions as much as possible.
But for these architecture dependent code (as above shows), it seems the
__weak functions are still the simplest and best way to handle them.
Thanks.
>
> Thanks,
> Ian
>
>> -Dapeng Mi
>>
>>> I also noticed that I think we're sampling the XMM registers for dwarf
>>> unwinding, but it seems unlikely the XMM registers will hold stack
>>> frame information - so this is probably an x86 inefficiency.
>>>
>>> Thanks,
>>> Ian
>>>
next prev parent reply other threads:[~2026-01-20 3:05 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-03 6:54 [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2025-12-03 6:54 ` [Patch v5 01/19] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2025-12-03 6:54 ` [Patch v5 02/19] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2025-12-03 6:54 ` [Patch v5 03/19] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2025-12-03 6:54 ` [Patch v5 04/19] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2025-12-03 6:54 ` [Patch v5 05/19] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2025-12-03 6:54 ` [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER Dapeng Mi
2025-12-04 15:17 ` Peter Zijlstra
2025-12-04 15:47 ` Peter Zijlstra
2025-12-05 6:37 ` Mi, Dapeng
2025-12-04 18:59 ` Dave Hansen
2025-12-05 8:42 ` Peter Zijlstra
2025-12-03 6:54 ` [Patch v5 07/19] perf: Add sampling support for SIMD registers Dapeng Mi
2025-12-05 11:07 ` Peter Zijlstra
2025-12-08 5:24 ` Mi, Dapeng
2025-12-05 11:40 ` Peter Zijlstra
2025-12-08 6:00 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 08/19] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2025-12-05 11:25 ` Peter Zijlstra
2025-12-08 6:10 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 09/19] perf/x86: Enable YMM " Dapeng Mi
2025-12-03 6:54 ` [Patch v5 10/19] perf/x86: Enable ZMM " Dapeng Mi
2025-12-03 6:54 ` [Patch v5 11/19] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2025-12-03 6:54 ` [Patch v5 12/19] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2025-12-05 12:16 ` Peter Zijlstra
2025-12-08 6:11 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 13/19] perf/x86: Enable SSP " Dapeng Mi
2025-12-05 12:20 ` Peter Zijlstra
2025-12-08 6:21 ` Mi, Dapeng
2025-12-24 5:45 ` Ravi Bangoria
2025-12-24 6:26 ` Mi, Dapeng
2026-01-06 6:55 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 14/19] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2025-12-03 6:54 ` [Patch v5 15/19] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2025-12-03 6:54 ` [Patch v5 16/19] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2025-12-05 12:39 ` Peter Zijlstra
2025-12-07 20:44 ` Andi Kleen
2025-12-08 6:46 ` Mi, Dapeng
2025-12-08 8:50 ` Peter Zijlstra
2025-12-08 8:53 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 17/19] perf headers: Sync with the kernel headers Dapeng Mi
2025-12-03 23:43 ` Ian Rogers
2025-12-04 1:37 ` Mi, Dapeng
2025-12-04 7:28 ` Ian Rogers
2026-01-20 7:01 ` Ian Rogers
2026-01-20 7:25 ` Mi, Dapeng
2026-01-20 7:16 ` Ian Rogers
2026-01-20 7:43 ` Mi, Dapeng
2026-01-20 8:00 ` Ian Rogers
2026-01-20 9:22 ` Mi, Dapeng
2026-01-20 18:11 ` Ian Rogers
2026-01-21 2:03 ` Mi, Dapeng
2025-12-03 6:54 ` [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format Dapeng Mi
2025-12-04 0:17 ` Ian Rogers
2025-12-04 2:58 ` Mi, Dapeng
2025-12-04 7:49 ` Ian Rogers
2025-12-04 9:20 ` Mi, Dapeng
2025-12-04 16:16 ` Ian Rogers
2025-12-05 4:00 ` Mi, Dapeng
2025-12-05 6:38 ` Ian Rogers
2025-12-05 8:10 ` Mi, Dapeng
2025-12-05 16:35 ` Ian Rogers
2025-12-08 4:20 ` Mi, Dapeng
2026-01-06 7:27 ` Mi, Dapeng
2026-01-17 5:50 ` Ian Rogers
2026-01-19 6:55 ` Mi, Dapeng
2026-01-19 20:25 ` Ian Rogers
2026-01-20 3:04 ` Mi, Dapeng [this message]
2026-01-20 5:16 ` Ian Rogers
2026-01-20 6:46 ` Mi, Dapeng
2026-01-20 6:56 ` Ian Rogers
2026-01-20 7:39 ` Ian Rogers
2026-01-20 9:04 ` Mi, Dapeng
2026-01-20 18:20 ` Ian Rogers
2026-01-21 5:17 ` Mi, Dapeng
2026-01-21 7:09 ` Ian Rogers
2026-01-21 7:52 ` Mi, Dapeng
2026-01-21 14:48 ` Ian Rogers
2026-01-22 1:49 ` Mi, Dapeng
2026-01-22 7:27 ` Ian Rogers
2026-01-22 8:29 ` Mi, Dapeng
2025-12-03 6:55 ` [Patch v5 19/19] perf regs: Enable dumping of SIMD registers Dapeng Mi
2025-12-04 0:24 ` [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Ian Rogers
2025-12-04 3:28 ` Mi, Dapeng
2025-12-16 4:42 ` Ravi Bangoria
2025-12-16 6:59 ` Mi, Dapeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=478c90df-61a8-4e19-a640-931ce791fe97@linux.intel.com \
--to=dapeng1.mi@linux.intel.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=ak@linux.intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=broonie@kernel.org \
--cc=dapeng1.mi@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=eranian@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=ravi.bangoria@amd.com \
--cc=tglx@linutronix.de \
--cc=thomas.falcon@intel.com \
--cc=xudong.hao@intel.com \
--cc=zide.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox