From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>,
Eranian Stephane <eranian@google.com>,
Mark Rutland <mark.rutland@arm.com>,
broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Zide Chen <zide.chen@intel.com>,
Falcon Thomas <thomas.falcon@intel.com>,
Dapeng Mi <dapeng1.mi@intel.com>,
Xudong Hao <xudong.hao@intel.com>,
Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format
Date: Tue, 20 Jan 2026 11:04:53 +0800
Message-ID: <478c90df-61a8-4e19-a640-931ce791fe97@linux.intel.com>
In-Reply-To: <CAP-5=fUNeJpj-rdi+552-o9xhCiT41q-Q8Zhg8U7ev0Wbiyhog@mail.gmail.com>
On 1/20/2026 4:25 AM, Ian Rogers wrote:
> On Sun, Jan 18, 2026 at 10:55 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 1/17/2026 1:50 PM, Ian Rogers wrote:
>>> On Mon, Jan 5, 2026 at 11:27 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>>> Ian,
>>>>
>>>> I looked at these perf regs __weak helpers again, like
>>>> arch__intr_reg_mask()/arch__user_reg_mask(). It could be really hard to
>>>> eliminate these __weak helpers and convert them into a generic function
>>>> like perf_reg_name(). All these __weak helpers are arch-dependent and
>>>> usually need to call perf_event_open sysctrl to get the required registers
>>>> mask. So even we convert them into a generic function, we still have no way
>>>> to get the registers mask of a different arch, like get x86 registers mask
>>>> on arm machine. Another reason is that these __weak helpers may contain
>>>> some arch-specific instructions. If we want to convert them into a general
>>>> perf function like perf_reg_name(). It may cause building error since these
>>>> arch-specific instructions may not exist on the building machine.
>>> Hi Dapeng,
>>>
>>> There was already a patch to better support cross architecture
>>> libdw-unwind-ing and I've just sent out a series to clean this up so
>>> that this is achieved by having mapping functions between perf and
>>> dwarf register names. The functions use the e_machine of the binary to
>>> determine how to map, etc. The series is here:
>>> https://lore.kernel.org/lkml/20260117052849.2205545-1-irogers@google.com/
>>> and I think it can be the foundation for avoiding the weak functions.
>> Hi Ian,
>>
>> Thanks for the reference patch. But they are different. The reference
>> patches mainly parse the regs from perf.data and the __weak functions can
>> be eliminated in the parsing phase since the registers bitmap is fixed for
>> a fixed arch. While these __weak functions
>> arch__intr_reg_mask()/arch__user_reg_mask() are used to obtain the support
>> sampling registers on a specific platform.
>>
>> We know different platforms even for same arch may support different
>> registers, e.g., some x86 platforms may only support XMM registers, but
>> some others may support XMM/YMM/ZMM registers, then all these arch-specific
>> arch__intr_reg_mask()/arch__user_reg_mask() functions have to depend on the
>> perf_event_open() syscall to retrieve the supported registers mask from kernel.
>>
>> Thus, it becomes impossible to retrieve the supported registers mask for a
>> x86 specific platform from running on a arm platform.
>>
>> Even we don't consider this limitation and forcibly convert the
>> __weak arch__intr_reg_mask() function to some kind of below function, just
>> like currently what perf_reg_name() does.
>>
>> uint64_t perf_intr_reg_mask(const char *arch)
>> {
>> uint64_t mask = 0;
>>
>> if (!strcmp(arch, "csky"))
>> mask = perf_intr_reg_mask_csky(id);
>> else if (!strcmp(arch, "loongarch"))
>> mask = perf_intr_reg_mask_loongarch(id);
>> else if (!strcmp(arch, "mips"))
>> mask = perf_intr_reg_mask_mips(id);
>> else if (!strcmp(arch, "powerpc"))
>> mask = perf_intr_reg_mask_powerpc(id);
>> else if (!strcmp(arch, "riscv"))
>> mask = perf_intr_reg_mask_riscv(id);
>> else if (!strcmp(arch, "s390"))
>> mask = perf_intr_reg_mask_s390(id);
>> else if (!strcmp(arch, "x86"))
>> mask = perf_intr_reg_mask_x86(id);
>> else if (!strcmp(arch, "arm"))
>> mask = perf_intr_reg_mask_arm(id);
>> else if (!strcmp(arch, "arm64"))
>> mask = perf_intr_reg_mask_arm64(id);
>>
>> return mask;
>> }
>>
>> But currently there are some arch-dependent instructions in these
>> arch-specific instructions, like the below code in powerpc specific
>> arch__intr_reg_mask().
>>
>> version = (((mfspr(SPRN_PVR)) >> 16) & 0xFFFF);
>>
>> mfspr is a powerpc specific instruction, building this converted
>> perf_intr_reg_mask on non-powerpc platform would lead to building error.
> Hi Dapeng,
>
> So my main point is the arch directory and ifdefs, how do they differ
> from writing code that uses the ELF machine? For example, your code
> uses the arch/x86 directory and has ifdefs on
> HAVE_ARCH_X86_64_SUPPORT. How is that different from:
> ```
> switch(e_machine) {
> case EM_X86_64:
> ...
> case EM_I386:
> ...
> default:
> return 0;
> }
> ```
> If we need to determine for the current running machine then e_machine
> can equal EM_HOST that is set up for this purpose.
I think the key factor in whether we can convert the code into the above
e_machine switch ... case format is whether the code is
architecture-dependent in the build and execution phases.

If the code is not architecture-dependent, it's good to convert it into
the e_machine switch ... case format, and that would make it more widely
applicable. Otherwise, the architecture-dependent code would cause a build
error (build phase) or produce incorrect results (execution phase).

Even if we introduce an EM_HOST case, it won't really solve the build
error; instead it may introduce a new build error, e.g.,
```
switch(e_machine) {
case EM_HOST:
...
case EM_X86_64:
...
case EM_I386:
...
default:
return 0;
}
```
Assume the code is built on an x86_64 machine: EM_HOST then equals
EM_X86_64, which causes a "duplicate case value" build error.

If we want to ensure the architecture-dependent code is built only on the
correct architecture, we still have to introduce architecture #ifdefs.
That is effectively no different from the current arch directory __weak
functions, and it makes the code more complex.
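
To make that trade-off concrete, here is a minimal sketch (not perf code)
of what an e_machine switch looks like once host-only code has to be fenced
off. HYP_EM_HOST, hyp_intr_reg_mask(), hyp_ppc_pvr_version() and the mask
values are hypothetical names invented for this example. The sketch assumes
EM_HOST is a compile-time alias for the build machine's EM_* constant, and
it uses EM_386, which is the <elf.h> spelling for 32-bit x86.

```c
#include <elf.h>
#include <stdint.h>

/* Assumption: EM_HOST is a compile-time alias for the build machine's EM_* value. */
#if defined(__x86_64__)
#define HYP_EM_HOST EM_X86_64
#elif defined(__i386__)
#define HYP_EM_HOST EM_386
#elif defined(__powerpc64__)
#define HYP_EM_HOST EM_PPC64
#else
#define HYP_EM_HOST EM_NONE
#endif

/* Made-up mask values, for illustration only. */
#define HYP_X86_64_MASK 0xffffULL
#define HYP_PPC64_MASK  0xffULL

#if defined(__powerpc64__)
/* mfspr only assembles on powerpc, so this helper must stay behind the #if. */
static unsigned long hyp_ppc_pvr_version(void)
{
	unsigned long pvr;

	__asm__ volatile("mfspr %0, 287" : "=r"(pvr));	/* SPR 287 is the PVR */
	return (pvr >> 16) & 0xFFFF;
}
#endif

uint64_t hyp_intr_reg_mask(uint16_t e_machine)
{
	/* A host-only probe cannot describe any other machine. */
	if (e_machine != HYP_EM_HOST)
		return 0;

	switch (e_machine) {
#if defined(__x86_64__)
	case EM_X86_64:
		return HYP_X86_64_MASK;
#endif
#if defined(__powerpc64__)
	case EM_PPC64:
		/* Pretend that a non-zero PVR version enables the mask. */
		return hyp_ppc_pvr_version() ? HYP_PPC64_MASK : 0;
#endif
	default:
		return 0;
	}
}
```

Every host-only branch ends up behind an #if, and any e_machine other than
the host's yields 0, which is the same partitioning the arch directory
already provides.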
>
> I agree that determining features needs calls that may not be
> supported on other architectures. That should yield EOPNOTSUPP and we
> can use information like that to populate generic information like the
> PMU missing features:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n190
> we also probe API support with:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/perf_api_probe.h?h=perf-tools-next
In general, I agree we can return EOPNOTSUPP or some generic information
for architecture-independent code. But that doesn't apply to these two
specific functions, arch__intr_reg_mask() and arch__user_reg_mask(): the
current perf code depends on them to return the supported register mask on
a specific (running) platform.
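
For illustration, a minimal sketch of the kind of runtime probe such a
helper relies on: ask the running kernel, via perf_event_open(), whether it
accepts an extra set of bits in sample_regs_intr. The function name
hyp_probe_intr_reg_mask() and its two-mask interface are assumptions made
for this example, not the perf implementation, and the sketch does not
distinguish "unsupported" from "not permitted".

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: return base_mask plus extra_mask if the running
 * kernel accepts extra_mask in sample_regs_intr, else base_mask alone.
 */
uint64_t hyp_probe_intr_reg_mask(uint64_t base_mask, uint64_t extra_mask)
{
	struct perf_event_attr attr;
	long fd;

	if (!extra_mask)
		return base_mask;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr = extra_mask;
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	/* pid = 0, cpu = -1: this thread on any CPU; no group fd, no flags. */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
	if (fd < 0)
		return base_mask;	/* rejected, unsupported, or not permitted */

	close((int)fd);
	return base_mask | extra_mask;
}
```

The answer depends on the kernel and CPU the binary is running on, so the
same call made on an arm machine says nothing about what an x86 platform
supports.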
>
> The current code doing lots of string comparisons is unnecessary
> overhead and imprecise (x86 is used for both 32-bit and 64-bit x86).
> It is removed in the series I linked to, I think we can eventually get
> rid of the whole arch string for similar reasons of trying to minimize
> the use of the arch directory. I'm curious what happens with APX, will
> the e_machine change? We may need to pass in the sample regs_dump's
> abi field for cases like this.
Yes, I agree we should get rid of the arch-string comparison and minimize
the use of the arch directory. It would improve efficiency.

I don't think APX support would change the e_machine; it should still be
EM_X86_64.

Yes, we need the abi field (specifically PERF_SAMPLE_REGS_ABI_SIMD) to
determine whether it's APX or legacy XMM.
>
> My point on the unwinding is that the sample register mask appears to
> be set up the same regardless, whereas for stack samples
> (--call-graph=dwarf) maybe just sample IP and SP suffices. So perhaps
> there should be additional registers to set up the sample mask.
Yes, that's true. It can be further optimized.
>
> By avoiding the arch functions we can avoid the problem of broken
> cross architecture support, we can also lay the groundwork for support
> on different architectures that may want to do similar things. I agree
> that doesn't matter until >1 architecture is trying to have more
> register masks, my concern is trying to keep the code generic and
> trying to make sure cross architecture is working. New weak functions
> is going in the opposite direction to that.
Yes, I agree we should get rid of these arch functions as much as possible.
But for this architecture-dependent code (as shown above), the __weak
functions still seem to be the simplest and best way to handle it.

Thanks.
>
> Thanks,
> Ian
>
>> -Dapeng Mi
>>
>>> I also noticed that I think we're sampling the XMM registers for dwarf
>>> unwinding, but it seems unlikely the XMM registers will hold stack
>>> frame information - so this is probably an x86 inefficiency.
>>>
>>> Thanks,
>>> Ian
>>>