From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi <dapeng1.mi@intel.com>
Subject: Re: [Patch v3 16/22] perf/core: Support to capture higher width vector registers
Date: Tue, 22 Apr 2025 11:05:11 +0800
Message-ID: <5906dfa9-3fca-4e29-b6ab-abd2c02ae9fe@linux.intel.com>
In-Reply-To: <20250416155327.GD17910@noisy.programming.kicks-ass.net>


On 4/16/2025 11:53 PM, Peter Zijlstra wrote:
> On Wed, Apr 16, 2025 at 02:42:12PM +0800, Mi, Dapeng wrote:
>
>> Just think twice, using bitmap to represent these extended registers indeed
>> wastes bits and is hard to extend, there could be much much more vector
>> registers if considering AMX.
> *Groan* so AMX should never have been register state :-(
>
>
>> Considering different arch/HW may support different number vector register,
>> like platform A supports 8 XMM registers and 8 YMM registers, but platform
>> B only supports 16 XMM registers, a better way to represent these vector
>> registers may add two fields, one is a bitmap which represents which kinds
>> of vector registers needs to be captures. The other field could be a u16
>> array which represents the corresponding register length of each kind of
>> vector register. It may look like this.
>>
>> #define    PERF_SAMPLE_EXT_REGS_XMM    BIT(0)
>> #define    PERF_SAMPLE_EXT_REGS_YMM    BIT(1)
>> #define    PERF_SAMPLE_EXT_REGS_ZMM    BIT(2)
>>     __u32    sample_regs_intr_ext;
>>     __u16    sample_regs_intr_ext_len[4];
>>     __u32    sample_regs_user_ext;
>>     __u16    sample_regs_user_ext_len[4];
>>
>>
>> Peter, how do you think this? Thanks.
> I'm not entirely sure I understand.
>
> How about something like:
>
> 	__u16 sample_simd_reg_words;
> 	__u64 sample_simd_reg_intr;
> 	__u64 sample_simd_reg_user;
>
> Then the simd_reg_words tell us how many (quad) words per register (8 for
> 512) and simd_reg_{intr,user} are a simple bitmap, one bit per actual
> simd reg.
>
> So then all of XMM would be:
>
>   words = 2;
>   intr = user = 0xFFFF;
>
> (16 regs, 128 wide)
>
> Whereas ZMM would be:
>
>   words = 8
>   intr = user = 0xFFFFFFFF;
>
> (32 regs, 512 wide)
>
>
> Would this be sufficient? Possibly we can split the words thing into two
> __u8, but does it make sense to ask for different vector width for
> intr and user ?

Hi Peter,

I discussed this with Kan offline. There appears to be a real requirement
for users to sample several different kinds of SIMD registers at once; for
example, a user may want to sample OPMASK and ZMM registers simultaneously.
To meet that requirement and make the interface more flexible, we propose
enhancing the interface as follows.

    /* Bitmap to represent SIMD regs. */
    __u64 sample_simd_reg_intr;
    __u64 sample_simd_reg_user;
    /*
     * Represent each kind of SIMD reg size (how many u64 words are needed)
     * in above bitmap order, e.g., x86 YMM regs are 256 bits and occupy 4
     * u64 words.
     */
    __u8 sample_simd_reg_size[4];

sample_simd_reg_intr/sample_simd_reg_user are bitmaps of SIMD regs. For
example, on x86, bit[7:0] represents OPMASK[7:0], bit[23:8] represents
YMM[15:0], and bit[55:24] represents ZMM[31:0].

sample_simd_reg_size[] gives the number of u64 words needed for each kind
of SIMD reg, in the same order as the kinds appear in the bitmap. For
example, sample_simd_reg_size[0] = 1 means each OPMASK occupies 1 u64 word,
sample_simd_reg_size[1] = 4 means each YMM occupies 4 u64 words, and
sample_simd_reg_size[2] = 8 means each ZMM occupies 8 u64 words.
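
To illustrate how the bitmap and the size array could combine, here is a
rough sketch that computes how many u64 words one sample would carry. It is
not part of any patch: struct simd_kind, x86_simd_kinds[], count_bits() and
simd_sample_words() are hypothetical names, and the per-kind bit ranges are
simply taken from the x86 example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one kind of SIMD reg inside the bitmap. */
struct simd_kind {
	unsigned int first_bit;	/* lowest bitmap bit used by this kind */
	unsigned int nr_regs;	/* number of regs of this kind (< 64) */
};

/* x86 layout from the example above; the table itself is an assumption. */
static const struct simd_kind x86_simd_kinds[] = {
	{  0,  8 },	/* bit[7:0]   -> OPMASK[7:0] */
	{  8, 16 },	/* bit[23:8]  -> YMM[15:0]   */
	{ 24, 32 },	/* bit[55:24] -> ZMM[31:0]   */
};

#define X86_NR_SIMD_KINDS \
	(sizeof(x86_simd_kinds) / sizeof(x86_simd_kinds[0]))

/* Count the set bits in a 64-bit value. */
static unsigned int count_bits(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Number of u64 words one sample would carry for the regs selected in
 * 'mask', given the per-kind sizes in 'size' (sample_simd_reg_size[]).
 */
static size_t simd_sample_words(uint64_t mask, const uint8_t size[4])
{
	size_t words = 0;

	for (size_t k = 0; k < X86_NR_SIMD_KINDS; k++) {
		const struct simd_kind *kind = &x86_simd_kinds[k];
		uint64_t kind_mask =
			((UINT64_C(1) << kind->nr_regs) - 1) << kind->first_bit;

		words += (size_t)count_bits(mask & kind_mask) * size[k];
	}
	return words;
}
```

With sample_simd_reg_size[] = { 1, 4, 8, 0 }, selecting all OPMASK and all
ZMM regs (mask 0x00FFFFFFFF0000FF) gives 8 * 1 + 32 * 8 = 264 u64 words per
sample.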

What do you think of this interface? Thanks.

Dapeng Mi

