Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format

public inbox for linux-perf-users@vger.kernel.org
 help / color / mirror / Atom feed

From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Zide Chen <zide.chen@intel.com>,
	Falcon Thomas <thomas.falcon@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Xudong Hao <xudong.hao@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format
Date: Tue, 20 Jan 2026 11:04:53 +0800	[thread overview]
Message-ID: <478c90df-61a8-4e19-a640-931ce791fe97@linux.intel.com> (raw)
In-Reply-To: <CAP-5=fUNeJpj-rdi+552-o9xhCiT41q-Q8Zhg8U7ev0Wbiyhog@mail.gmail.com>


On 1/20/2026 4:25 AM, Ian Rogers wrote:
> On Sun, Jan 18, 2026 at 10:55 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 1/17/2026 1:50 PM, Ian Rogers wrote:
>>> On Mon, Jan 5, 2026 at 11:27 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>>> Ian,
>>>>
>>>> I looked at these perf regs __weak helpers again, like
>>>> arch__intr_reg_mask()/arch__user_reg_mask(). It could be really hard to
>>>> eliminate these __weak helpers and convert them into a generic function
>>>> like perf_reg_name(). All these __weak helpers are arch-dependent and
>>>> usually need to call perf_event_open sysctrl to get the required registers
>>>> mask. So even we convert them into a generic function, we still have no way
>>>> to get the registers mask of a different arch, like get x86 registers mask
>>>> on arm machine. Another reason is that these __weak helpers may contain
>>>> some arch-specific instructions. If we want to convert them into a general
>>>> perf function like perf_reg_name(). It may cause building error since these
>>>> arch-specific instructions may not exist on the building machine.
>>> Hi Dapeng,
>>>
>>> There was already a patch to better support cross architecture
>>> libdw-unwind-ing and I've just sent out a series to clean this up so
>>> that this is achieved by having mapping functions between perf and
>>> dwarf register names. The functions use the e_machine of the binary to
>>> determine how to map, etc. The series is here:
>>> https://lore.kernel.org/lkml/20260117052849.2205545-1-irogers@google.com/
>>> and I think it can be the foundation for avoiding the weak functions.
>> Hi Ian,
>>
>> Thanks for the reference patch. But they are different. The reference
>> patches mainly parse the regs from perf.data and the __weak functions can
>> be eliminated in the parsing phase since the registers bitmap is fixed for
>> a fixed arch. While these __weak functions
>> arch__intr_reg_mask()/arch__user_reg_mask() are used to obtain the support
>> sampling registers on a specific platform.
>>
>> We know different platforms even for same arch may support different
>> registers, e.g., some x86 platforms may only support XMM registers, but
>> some others may support XMM/YMM/ZMM registers, then all these arch-specific
>> arch__intr_reg_mask()/arch__user_reg_mask() functions have to depend on the
>> perf_event_open() syscall to retrieve the supported registers mask from kernel.
>>
>> Thus, it becomes impossible to retrieve the supported registers mask for a
>> x86 specific platform from running on a arm platform.
>>
>> Even we don't consider this limitation and forcibly convert the
>> __weak arch__intr_reg_mask() function to some kind of below function, just
>> like currently what perf_reg_name() does.
>>
>> uint64_t perf_intr_reg_mask(const char *arch)
>> {
>>     uint64_t mask = 0;
>>
>>     if (!strcmp(arch, "csky"))
>>         mask = perf_intr_reg_mask_csky(id);
>>     else if (!strcmp(arch, "loongarch"))
>>         mask = perf_intr_reg_mask_loongarch(id);
>>     else if (!strcmp(arch, "mips"))
>>         mask = perf_intr_reg_mask_mips(id);
>>     else if (!strcmp(arch, "powerpc"))
>>         mask = perf_intr_reg_mask_powerpc(id);
>>     else if (!strcmp(arch, "riscv"))
>>         mask = perf_intr_reg_mask_riscv(id);
>>     else if (!strcmp(arch, "s390"))
>>         mask = perf_intr_reg_mask_s390(id);
>>     else if (!strcmp(arch, "x86"))
>>         mask = perf_intr_reg_mask_x86(id);
>>     else if (!strcmp(arch, "arm"))
>>         mask = perf_intr_reg_mask_arm(id);
>>     else if (!strcmp(arch, "arm64"))
>>         mask = perf_intr_reg_mask_arm64(id);
>>
>>     return mask;
>> }
>>
>> But currently there are some arch-dependent instructions in these
>> arch-specific instructions, like the below code in powerpc specific
>> arch__intr_reg_mask().
>>
>>     version = (((mfspr(SPRN_PVR)) >>  16) & 0xFFFF);
>>
>> mfspr is a powerpc specific instruction, building this converted
>> perf_intr_reg_mask on non-powerpc platform would lead to building error.
> Hi Dapeng,
>
> So my main point is the arch directory and ifdefs, how do they differ
> from writing code that uses the ELF machine? For example, your code
> uses the arch/x86 directory and has ifdefs on
> HAVE_ARCH_X86_64_SUPPORT. How is that different from:
> ```
> switch(e_machine) {
> case EM_X86_64:
> ...
> case EM_I386:
> ...
> default:
> return 0;
> }
> ```
> If we need to determine for the current running machine then e_machine
> can equal EM_HOST that is set up for this purpose.

I think the key factor that determines if we can convert the code into
above e_machine switch ... case format is whether the code is
architecture-dependent both in building and execution phases.

If the code is not architecture-dependent, It's good to covert the code
into the e_machine switch ... case and that would provide better applicability.

Otherwise, the architecture-dependent code would lead to the building error
(building phase) or get incorrect execution results (execution phase).

Even if we introduce EM_HOST case, it won't really solve the building
error,  instead it may introduce new building error, e.g.,

```
switch(e_machine) {
case EM_HOST:
...
case EM_X86_64:
...
case EM_I386:
...
default:
return 0;
}
```

Assume the code is built on a x86_64 machine, then EM_HOST equals
EM_X86_64, that would cause the "duplicate case value" building error. 

If we want to limit the architecture-dependent code is built only on the
correct architecture, then we still have to introduce the architecture
#ifdefs. This is actually no difference with current arch directory __weak
functions and make it more complex.


>
> I agree that determining features needs calls that may not be
> supported on other architectures. That should yield EOPNOTSUPP and we
> can use information like that to populate generic information like the
> PMU missing features:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n190
> we also probe API support with:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/perf_api_probe.h?h=perf-tools-next

In general, I agree we can return EOPNOTSUPP or some generic information
for some architecture independent code. But it's not applicable for these 2
specific arch__intr_reg_mask()/arch__user_reg_mask() functions, current
perf code depends on these 2 functions to return the supported registers
mask on a specific (running) platform.


>
> The current code doing lots of string comparisons is unnecessary
> overhead and imprecise (x86 is used for both 32-bit and 64-bit x86).
> It is removed in the series I linked to, I think we can eventually get
> rid of the whole arch string for similar reasons of trying to minimize
> the use of the arch directory. I'm curious what happens with APX, will
> the e_machine change? We may need to pass in the sample regs_dump's
> abi field for cases like this.

Yes, I agree we should git rid of the arch-string comparison and minimize
the use of arch directory. It would improve the efficiency.

I don't think the support of APX would change the e_machine, it should
still be EM_X86_64.

Yes, we need the abi filed (exactly PERF_SAMPLE_REGS_ABI_SIMD) to determine
it's APX or legacy XMM.


>
> My point on the unwinding is that the sample register mask appears to
> be set up the same regardless, whereas for stack samples
> (--call-graph=dwarf) maybe just sample IP and SP suffices. So perhaps
> there should be additional registers to set up the sample mask.

Yes, that's true. It can be further optimized.


>
> By avoiding the arch functions we can avoid the problem of broken
> cross architecture support, we can also lay the groundwork for support
> on different architectures that may want to do similar things. I agree
> that doesn't matter until >1 architecture is trying to have more
> register masks, my concern is trying to keep the code generic and
> trying to make sure cross architecture is working. New weak functions
> is going in the opposite direction to that.

Yes, I agree we should git rid of these arch functions as much as possible.
But for these architecture dependent code (as above shows), it seems the
__weak functions are still the simplest and best way to handle them.

Thanks.

>
> Thanks,
> Ian
>
>> -Dapeng Mi
>>
>>> I also noticed that I think we're sampling the XMM registers for dwarf
>>> unwinding, but it seems unlikely the XMM registers will hold stack
>>> frame information - so this is probably an x86 inefficiency.
>>>
>>> Thanks,
>>> Ian
>>>

next prev parent reply	other threads:[~2026-01-20  3:05 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-03  6:54 [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2025-12-03  6:54 ` [Patch v5 01/19] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2025-12-03  6:54 ` [Patch v5 02/19] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2025-12-03  6:54 ` [Patch v5 03/19] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2025-12-03  6:54 ` [Patch v5 04/19] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2025-12-03  6:54 ` [Patch v5 05/19] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2025-12-03  6:54 ` [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER Dapeng Mi
2025-12-04 15:17   ` Peter Zijlstra
2025-12-04 15:47     ` Peter Zijlstra
2025-12-05  6:37       ` Mi, Dapeng
2025-12-04 18:59     ` Dave Hansen
2025-12-05  8:42       ` Peter Zijlstra
2025-12-03  6:54 ` [Patch v5 07/19] perf: Add sampling support for SIMD registers Dapeng Mi
2025-12-05 11:07   ` Peter Zijlstra
2025-12-08  5:24     ` Mi, Dapeng
2025-12-05 11:40   ` Peter Zijlstra
2025-12-08  6:00     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 08/19] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2025-12-05 11:25   ` Peter Zijlstra
2025-12-08  6:10     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 09/19] perf/x86: Enable YMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 10/19] perf/x86: Enable ZMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 11/19] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2025-12-03  6:54 ` [Patch v5 12/19] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2025-12-05 12:16   ` Peter Zijlstra
2025-12-08  6:11     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 13/19] perf/x86: Enable SSP " Dapeng Mi
2025-12-05 12:20   ` Peter Zijlstra
2025-12-08  6:21     ` Mi, Dapeng
2025-12-24  5:45   ` Ravi Bangoria
2025-12-24  6:26     ` Mi, Dapeng
2026-01-06  6:55       ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 14/19] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2025-12-03  6:54 ` [Patch v5 15/19] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2025-12-03  6:54 ` [Patch v5 16/19] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2025-12-05 12:39   ` Peter Zijlstra
2025-12-07 20:44     ` Andi Kleen
2025-12-08  6:46     ` Mi, Dapeng
2025-12-08  8:50       ` Peter Zijlstra
2025-12-08  8:53         ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 17/19] perf headers: Sync with the kernel headers Dapeng Mi
2025-12-03 23:43   ` Ian Rogers
2025-12-04  1:37     ` Mi, Dapeng
2025-12-04  7:28       ` Ian Rogers
2026-01-20  7:01   ` Ian Rogers
2026-01-20  7:25     ` Mi, Dapeng
2026-01-20  7:16   ` Ian Rogers
2026-01-20  7:43     ` Mi, Dapeng
2026-01-20  8:00       ` Ian Rogers
2026-01-20  9:22         ` Mi, Dapeng
2026-01-20 18:11           ` Ian Rogers
2026-01-21  2:03             ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format Dapeng Mi
2025-12-04  0:17   ` Ian Rogers
2025-12-04  2:58     ` Mi, Dapeng
2025-12-04  7:49       ` Ian Rogers
2025-12-04  9:20         ` Mi, Dapeng
2025-12-04 16:16           ` Ian Rogers
2025-12-05  4:00             ` Mi, Dapeng
2025-12-05  6:38               ` Ian Rogers
2025-12-05  8:10                 ` Mi, Dapeng
2025-12-05 16:35                   ` Ian Rogers
2025-12-08  4:20                     ` Mi, Dapeng
2026-01-06  7:27                       ` Mi, Dapeng
2026-01-17  5:50                         ` Ian Rogers
2026-01-19  6:55                           ` Mi, Dapeng
2026-01-19 20:25                             ` Ian Rogers
2026-01-20  3:04                               ` Mi, Dapeng [this message]
2026-01-20  5:16                                 ` Ian Rogers
2026-01-20  6:46                                   ` Mi, Dapeng
2026-01-20  6:56                                     ` Ian Rogers
2026-01-20  7:39   ` Ian Rogers
2026-01-20  9:04     ` Mi, Dapeng
2026-01-20 18:20       ` Ian Rogers
2026-01-21  5:17         ` Mi, Dapeng
2026-01-21  7:09           ` Ian Rogers
2026-01-21  7:52             ` Mi, Dapeng
2026-01-21 14:48               ` Ian Rogers
2026-01-22  1:49                 ` Mi, Dapeng
2026-01-22  7:27                   ` Ian Rogers
2026-01-22  8:29                     ` Mi, Dapeng
2025-12-03  6:55 ` [Patch v5 19/19] perf regs: Enable dumping of SIMD registers Dapeng Mi
2025-12-04  0:24 ` [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Ian Rogers
2025-12-04  3:28   ` Mi, Dapeng
2025-12-16  4:42 ` Ravi Bangoria
2025-12-16  6:59   ` Mi, Dapeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=478c90df-61a8-4e19-a640-931ce791fe97@linux.intel.com \
    --to=dapeng1.mi@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=broonie@kernel.org \
    --cc=dapeng1.mi@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.falcon@intel.com \
    --cc=xudong.hao@intel.com \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox