All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi <dapeng1.mi@intel.com>
Subject: Re: [Patch v2 18/24] perf/x86/intel: Support arch-PEBS vector registers group capturing
Date: Wed, 26 Feb 2025 16:08:40 +0800	[thread overview]
Message-ID: <bda04ccd-fa90-4f14-89cc-9835de36bcfb@linux.intel.com> (raw)
In-Reply-To: <20250225153257.GQ11590@noisy.programming.kicks-ass.net>


On 2/25/2025 11:32 PM, Peter Zijlstra wrote:
> On Tue, Feb 18, 2025 at 03:28:12PM +0000, Dapeng Mi wrote:
>> Add x86/intel specific vector register (VECR) group capturing for
>> arch-PEBS. Enable corresponding VECR group bits in
>> GPx_CFG_C/FX0_CFG_C MSRs if users configures these vector registers
>> bitmap in perf_event_attr and parse VECR group in arch-PEBS record.
>>
>> Currently vector registers capturing is only supported by PEBS based
>> sampling, PMU driver would return error if PMI based sampling tries to
>> capture these vector registers.
>
>> @@ -676,6 +709,32 @@ int x86_pmu_hw_config(struct perf_event *event)
>>  			return -EINVAL;
>>  	}
>>  
>> +	/*
>> +	 * Architectural PEBS supports to capture more vector registers besides
>> +	 * XMM registers, like YMM, OPMASK and ZMM registers.
>> +	 */
>> +	if (unlikely(has_more_extended_regs(event))) {
>> +		u64 caps = hybrid(event->pmu, arch_pebs_cap).caps;
>> +
>> +		if (!(event->pmu->capabilities & PERF_PMU_CAP_MORE_EXT_REGS))
>> +			return -EINVAL;
>> +
>> +		if (has_opmask_regs(event) && !(caps & ARCH_PEBS_VECR_OPMASK))
>> +			return -EINVAL;
>> +
>> +		if (has_ymmh_regs(event) && !(caps & ARCH_PEBS_VECR_YMM))
>> +			return -EINVAL;
>> +
>> +		if (has_zmmh_regs(event) && !(caps & ARCH_PEBS_VECR_ZMMH))
>> +			return -EINVAL;
>> +
>> +		if (has_h16zmm_regs(event) && !(caps & ARCH_PEBS_VECR_H16ZMM))
>> +			return -EINVAL;
>> +
>> +		if (!event->attr.precise_ip)
>> +			return -EINVAL;
>> +	}
>> +
>>  	return x86_setup_perfctr(event);
>>  }
>>  
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index f21d9f283445..8ef5b9a05fcc 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2963,6 +2963,18 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
>>  			if (pebs_data_cfg & PEBS_DATACFG_XMMS)
>>  				ext |= ARCH_PEBS_VECR_XMM & cap.caps;
>>  
>> +			if (pebs_data_cfg & PEBS_DATACFG_YMMS)
>> +				ext |= ARCH_PEBS_VECR_YMM & cap.caps;
>> +
>> +			if (pebs_data_cfg & PEBS_DATACFG_OPMASKS)
>> +				ext |= ARCH_PEBS_VECR_OPMASK & cap.caps;
>> +
>> +			if (pebs_data_cfg & PEBS_DATACFG_ZMMHS)
>> +				ext |= ARCH_PEBS_VECR_ZMMH & cap.caps;
>> +
>> +			if (pebs_data_cfg & PEBS_DATACFG_H16ZMMS)
>> +				ext |= ARCH_PEBS_VECR_H16ZMM & cap.caps;
>> +
>>  			if (pebs_data_cfg & PEBS_DATACFG_LBRS)
>>  				ext |= ARCH_PEBS_LBR & cap.caps;
>>  
>> @@ -5115,6 +5127,9 @@ static inline void __intel_update_pmu_caps(struct pmu *pmu)
>>  
>>  	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM)
>>  		dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
>> +
>> +	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_EXT)
>> +		dest_pmu->capabilities |= PERF_PMU_CAP_MORE_EXT_REGS;
>>  }
> There is no technical reason for it to error out, right? We can use
> FPU/XSAVE interface to read the CPU state just fine.

I think it's not because of technical reason. Let me confirm if we can add
it for non-PEBS sampling.


>
>
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 4b01beee15f4..7e5a4202de37 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1437,9 +1438,37 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
>>  	if (gprs || (attr->precise_ip < 2) || tsx_weight)
>>  		pebs_data_cfg |= PEBS_DATACFG_GP;
>>  
>> -	if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
>> -	    (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK))
>> -		pebs_data_cfg |= PEBS_DATACFG_XMMS;
>> +	if (sample_type & PERF_SAMPLE_REGS_INTR) {
>> +		if (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)
>> +			pebs_data_cfg |= PEBS_DATACFG_XMMS;
>> +
>> +		for_each_set_bit_from(bit,
>> +			(unsigned long *)event->attr.sample_regs_intr_ext,
>> +			PERF_NUM_EXT_REGS) {
> This is indented wrong; please use cino=(0:0
> if you worry about indentation depth, break out in helper function.

Sure. would modify it.


>
>> +			switch (bit + PERF_REG_EXTENDED_OFFSET) {
>> +			case PERF_REG_X86_OPMASK0 ... PERF_REG_X86_OPMASK7:
>> +				pebs_data_cfg |= PEBS_DATACFG_OPMASKS;
>> +				bit = PERF_REG_X86_YMMH0 -
>> +				      PERF_REG_EXTENDED_OFFSET - 1;
>> +				break;
>> +			case PERF_REG_X86_YMMH0 ... PERF_REG_X86_ZMMH0 - 1:
>> +				pebs_data_cfg |= PEBS_DATACFG_YMMS;
>> +				bit = PERF_REG_X86_ZMMH0 -
>> +				      PERF_REG_EXTENDED_OFFSET - 1;
>> +				break;
>> +			case PERF_REG_X86_ZMMH0 ... PERF_REG_X86_ZMM16 - 1:
>> +				pebs_data_cfg |= PEBS_DATACFG_ZMMHS;
>> +				bit = PERF_REG_X86_ZMM16 -
>> +				      PERF_REG_EXTENDED_OFFSET - 1;
>> +				break;
>> +			case PERF_REG_X86_ZMM16 ... PERF_REG_X86_ZMM_MAX - 1:
>> +				pebs_data_cfg |= PEBS_DATACFG_H16ZMMS;
>> +				bit = PERF_REG_X86_ZMM_MAX -
>> +				      PERF_REG_EXTENDED_OFFSET - 1;
>> +				break;
>> +			}
>> +		}
>> +	}
>>  
>>  	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
>>  		/*

  reply	other threads:[~2025-02-26  8:08 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-18 15:27 [Patch v2 00/24] Arch-PEBS and PMU supports for Clearwater Forest and Panther Lake Dapeng Mi
2025-02-18 15:27 ` [Patch v2 01/24] perf/x86: Add dynamic constraint Dapeng Mi
2025-02-18 15:27 ` [Patch v2 02/24] perf/x86/intel: Add Panther Lake support Dapeng Mi
2025-02-18 15:27 ` [Patch v2 03/24] perf/x86/intel: Add PMU support for Clearwater Forest Dapeng Mi
2025-02-18 15:27 ` [Patch v2 04/24] perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs Dapeng Mi
2025-02-18 15:27 ` [Patch v2 05/24] perf/x86/intel: Decouple BTS initialization from PEBS initialization Dapeng Mi
2025-02-18 15:28 ` [Patch v2 06/24] perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebs Dapeng Mi
2025-02-18 15:28 ` [Patch v2 07/24] perf/x86/intel: Introduce pairs of PEBS static calls Dapeng Mi
2025-02-18 15:28 ` [Patch v2 08/24] perf/x86/intel: Initialize architectural PEBS Dapeng Mi
2025-02-18 15:28 ` [Patch v2 09/24] perf/x86/intel/ds: Factor out common PEBS processing code to functions Dapeng Mi
2025-02-18 15:28 ` [Patch v2 10/24] perf/x86/intel: Process arch-PEBS records or record fragments Dapeng Mi
2025-02-25 10:39   ` Peter Zijlstra
2025-02-25 11:00     ` Peter Zijlstra
2025-02-26  5:20       ` Mi, Dapeng
2025-02-26  9:35         ` Peter Zijlstra
2025-02-26 15:45           ` Liang, Kan
2025-02-27  2:04             ` Mi, Dapeng
2025-02-25 20:42     ` Andi Kleen
2025-02-26  2:54     ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 11/24] perf/x86/intel: Factor out common functions to process PEBS groups Dapeng Mi
2025-02-25 11:02   ` Peter Zijlstra
2025-02-26  5:24     ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 12/24] perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR Dapeng Mi
2025-02-25 11:18   ` Peter Zijlstra
2025-02-26  5:48     ` Mi, Dapeng
2025-02-26  9:46       ` Peter Zijlstra
2025-02-27  2:05         ` Mi, Dapeng
2025-02-25 11:25   ` Peter Zijlstra
2025-02-26  6:19     ` Mi, Dapeng
2025-02-26  9:48       ` Peter Zijlstra
2025-02-27  2:09         ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 13/24] perf/x86/intel: Update dyn_constranit base on PEBS event precise level Dapeng Mi
2025-02-27 14:06   ` Liang, Kan
2025-03-05  1:41     ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 14/24] perf/x86/intel: Setup PEBS data configuration and enable legacy groups Dapeng Mi
2025-02-18 15:28 ` [Patch v2 15/24] perf/x86/intel: Add SSP register support for arch-PEBS Dapeng Mi
2025-02-25 11:52   ` Peter Zijlstra
2025-02-26  6:56     ` Mi, Dapeng
2025-02-25 11:54   ` Peter Zijlstra
2025-02-25 20:44     ` Andi Kleen
2025-02-27  6:29       ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 16/24] perf/x86/intel: Add counter group " Dapeng Mi
2025-02-18 15:28 ` [Patch v2 17/24] perf/core: Support to capture higher width vector registers Dapeng Mi
2025-02-25 20:32   ` Peter Zijlstra
2025-02-26  7:55     ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 18/24] perf/x86/intel: Support arch-PEBS vector registers group capturing Dapeng Mi
2025-02-25 15:32   ` Peter Zijlstra
2025-02-26  8:08     ` Mi, Dapeng [this message]
2025-02-27  6:40       ` Mi, Dapeng
2025-03-04  3:08         ` Mi, Dapeng
2025-03-04 16:26           ` Liang, Kan
2025-03-05  1:34             ` Mi, Dapeng
2025-02-18 15:28 ` [Patch v2 19/24] perf tools: Support to show SSP register Dapeng Mi
2025-02-18 15:28 ` [Patch v2 20/24] perf tools: Enhance arch__intr/user_reg_mask() helpers Dapeng Mi
2025-02-18 15:28 ` [Patch v2 21/24] perf tools: Enhance sample_regs_user/intr to capture more registers Dapeng Mi
2025-02-18 15:28 ` [Patch v2 22/24] perf tools: Support to capture more vector registers (x86/Intel) Dapeng Mi
2025-02-18 15:28 ` [Patch v2 23/24] perf tools/tests: Add vector registers PEBS sampling test Dapeng Mi
2025-02-18 15:28 ` [Patch v2 24/24] perf tools: Fix incorrect --user-regs comments Dapeng Mi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bda04ccd-fa90-4f14-89cc-9835de36bcfb@linux.intel.com \
    --to=dapeng1.mi@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=dapeng1.mi@intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.