Linux Perf Users
From: sashiko-bot@kernel.org
To: "Dapeng Mi" <dapeng1.mi@linux.intel.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [Patch v8 10/23] perf/x86: Enable XMM Register Sampling for Non-PEBS Events
Date: Fri, 29 May 2026 09:02:55 +0000
Message-ID: <20260529090255.E5E801F00893@smtp.kernel.org>
In-Reply-To: <20260529075645.580362-11-dapeng1.mi@linux.intel.com>

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:

New issues:
- [High] reserve_ext_regs_buffers() fails to propagate memory allocation errors, leading to a predictable WARN_ON_ONCE in NMI context.
- [High] PEBS events on older platforms (e.g., Skylake) improperly accept `PERF_REG_EXTENDED_MASK`, bypassing capability checks.

Pre-existing issues:
- [High] Software PMU events (or hardware events requesting `PERF_SAMPLE_REGS_USER`) can request XMM registers, causing `perf_reg_value()` to perform an out-of-bounds read on the kernel stack.
--

commit cee33ccbb52239e18206f79aef67c4e7ac96ca59
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf/x86: Enable XMM Register Sampling for Non-PEBS Events

This commit extends XMM register sampling to non-PEBS events by utilizing the
xsaves instruction. It introduces a per-CPU 64-byte aligned buffer to store
SIMD and other registers, allocating it during hardware reservation. The XMM
data is then retrieved during the REGS_INTR case.
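
As a rough illustration of why the patch over-allocates by 64 bytes: rounding
an arbitrary pointer up to the next 64-byte boundary consumes at most 63 bytes,
so a buffer of size + 64 always contains an aligned region of the required
size. The following user-space sketch shows only the arithmetic; the helper
name align_up_64() and the XSAVE_ALIGN_SKETCH constant are hypothetical and are
not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define XSAVE_ALIGN_SKETCH 64	/* hypothetical name for the xsave area alignment */

/* Round a pointer up to the next 64-byte boundary (no-op if already aligned). */
static void *align_up_64(void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t mask = (uintptr_t)(XSAVE_ALIGN_SKETCH - 1);

	return (void *)((addr + mask) & ~mask);
}
```

The aligned pointer is never more than 63 bytes past the original one, which is
why padding the allocation by 64 bytes is sufficient.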

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 17c8f44ee43b3..c219a563434d4 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c

[ ... ]

> @@ -410,6 +410,56 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)

[ ... ]

> +static void reserve_ext_regs_buffers(void)
> +{
> +	bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
> +	unsigned int size;
> +	int cpu;
> +
> +	if (!x86_pmu.ext_regs_mask)
> +		return;
> +
> +	/* +64 bytes for the 64 bytes alignment request of xsave area. */
> +	size = xstate_calculate_size(x86_pmu.ext_regs_mask, compacted) + 64;
> +
> +	for_each_possible_cpu(cpu) {
> +		per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
> +							  cpu_to_node(cpu));
> +		if (WARN_ON_ONCE(!per_cpu(ext_regs_buf, cpu)))
> +			goto err;
> +	}
> +
> +	return;
> +
> +err:
> +	release_ext_regs_buffers();
> +}

[Severity: High]
Does reserve_ext_regs_buffers() need to propagate the allocation failure back
to the caller?

When kzalloc_node() fails, this function frees any previously allocated buffers
and returns void. The caller, x86_reserve_hardware(), then assumes success
and allows the perf_event_open syscall to succeed. Later, when an event
requiring extended registers triggers a PMI, x86_pmu_sample_xregs() retrieves
a NULL buffer and triggers a WARN_ON_ONCE in NMI context.

Could this allow a local user to deliberately exhaust memory and repeatedly
open a perf event to trigger the warning? If the system has panic_on_warn
enabled, this could escalate to a kernel panic.
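
One possible shape of the fix, sketched as a user-space model rather than as
kernel code: the reserve function returns an error code and the caller passes
it on, so event creation fails instead of leaving a NULL buffer for the NMI
path. Here calloc() stands in for kzalloc_node(), a plain array stands in for
the per-CPU variable, and NR_CPUS_SKETCH, fail_at_cpu, alloc_buf() and
reserve_hardware() are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the model */

static void *ext_regs_buf[NR_CPUS_SKETCH];	/* models per_cpu(ext_regs_buf, cpu) */
static int fail_at_cpu = -1;			/* test hook: CPU whose allocation fails */

/* Stand-in for kzalloc_node(); fails on demand to model memory pressure. */
static void *alloc_buf(int cpu, size_t size)
{
	return cpu == fail_at_cpu ? NULL : calloc(1, size);
}

static void release_ext_regs_buffers(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		free(ext_regs_buf[cpu]);
		ext_regs_buf[cpu] = NULL;
	}
}

/* Returns 0 on success or -ENOMEM on failure, instead of returning void. */
static int reserve_ext_regs_buffers(size_t size)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		ext_regs_buf[cpu] = alloc_buf(cpu, size);
		if (!ext_regs_buf[cpu]) {
			release_ext_regs_buffers();
			return -ENOMEM;
		}
	}
	return 0;
}

/* Models the caller: the error is propagated rather than swallowed. */
static int reserve_hardware(size_t size)
{
	int err = reserve_ext_regs_buffers(size);

	if (err)
		return err;	/* event creation fails here, not later in the NMI */
	return 0;
}
```

With this shape, a failed allocation surfaces as an error at event creation
time, and the sampling path never has to handle a missing buffer.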

[ ... ]

> @@ -655,18 +707,26 @@ int x86_pmu_hw_config(struct perf_event *event)

[ ... ]

> +	if (event->attr.sample_type & PERF_SAMPLE_REGS_INTR) {
> +		/*
> +		 * Besides the general purpose registers, XMM registers may
> +		 * be collected as well.
> +		 */
> +		if (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK) {
> +			if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
> +				return -EINVAL;
> +			if (is_sampling_event(event) && !event->attr.precise_ip &&
> +			    !this_cpu_has(X86_FEATURE_XSAVES))
> +				return -EINVAL;
> +		}
> +	}
>  
> -		if (!event->attr.precise_ip)
> +	if (event->attr.sample_type & PERF_SAMPLE_REGS_USER) {
> +		/*
> +		 * Currently XMM registers sampling for REGS_USER is not
> +		 * supported yet.
> +		 */
> +		if (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)
>  			return -EINVAL;
>  	}

[Severity: High]
This isn't a bug introduced by this patch, but can software PMU events request
XMM registers and cause an out-of-bounds read on the kernel stack?

While x86_pmu_hw_config() correctly blocks PERF_REG_EXTENDED_MASK for
unsupported scenarios here, software events bypass x86_pmu_hw_config()
completely. The function perf_reg_validate() checks requested registers against
PERF_REG_X86_RESERVED, which fails to exclude bits 32-63 (XMM registers).

If a software event requests XMM registers via PERF_SAMPLE_REGS_USER, and the
event overflows, a plain struct pt_regs located at the top of the kernel stack
(not embedded in a struct x86_perf_regs) is passed to perf_reg_value().

When it uses container_of(regs, struct x86_perf_regs, regs) and reads
perf_regs->xmm_regs, does it read past the end of the pt_regs on the stack,
potentially leaking kernel memory or triggering a page fault?

I note this is addressed later in the patch series by commit "perf/x86: Enable
XMM register sampling for REGS_USER case", but wanted to mention it here.
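
To make the layout concern concrete, here is a simplified user-space model. The
two *_sketch structs only mimic the relationship between struct pt_regs and
struct x86_perf_regs (the member count of the first is an assumption), and
validate_regs_mask() is a hypothetical helper showing the kind of check that
would reject XMM bits when no XMM storage backs the sample.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits 32-63 of the register mask select XMM registers, per the text above. */
#define EXT_REGS_MASK_SKETCH 0xffffffff00000000ULL

/* Simplified stand-ins for struct pt_regs and struct x86_perf_regs. */
struct pt_regs_sketch {
	unsigned long gpr[21];	/* assumed number of general purpose slots */
};

struct x86_perf_regs_sketch {
	struct pt_regs_sketch regs;
	uint64_t *xmm_regs;	/* lives beyond the end of the embedded regs */
};

/*
 * Hypothetical validation: extended-register bits are only acceptable when
 * the regs pointer is known to be embedded in the larger structure.
 */
static int validate_regs_mask(uint64_t mask, bool has_xmm_storage)
{
	if ((mask & EXT_REGS_MASK_SKETCH) && !has_xmm_storage)
		return -EINVAL;
	return 0;
}
```

Because xmm_regs sits at an offset at or beyond sizeof(struct pt_regs_sketch),
applying container_of() to a bare pt_regs and then dereferencing xmm_regs reads
memory that does not belong to the pt_regs at all.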

[ ... ]

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 92cb9a716e83d..f5d458e3ba3f6 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c

[ ... ]

> @@ -6176,8 +6179,37 @@ static inline void __intel_update_large_pebs_flags(struct pmu *pmu)

[ ... ]

> +static void intel_extended_regs_init(struct pmu *pmu)
> +{
> +	struct pmu *dest_pmu = pmu ? pmu : x86_get_pmu(smp_processor_id());
> +
> +	/*
> +	 * Extend the vector registers support to non-PEBS.
> +	 * The feature is limited to newer Intel machines with
> +	 * PEBS V4+ or archPerfmonExt (0x23) enabled for now.
> +	 * In theory, the vector registers can be retrieved as
> +	 * long as the CPU supports. The support for the old
> +	 * generations may be added later if there is a
> +	 * requirement.
> +	 * Only support the extension when XSAVES is available.
> +	 */
> +	if (!boot_cpu_has(X86_FEATURE_XSAVES))
> +		return;
>  
> +	if (!boot_cpu_has(X86_FEATURE_XMM) ||
> +	    !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
> +		return;
> +
> +	/*
> +	 * On current hybrid platforms, P-cores and E-cores expose the same
> +	 * XSAVE feature set. Therefore, using the global x86_pmu.ext_regs_mask
> +	 * is sufficient to represent the hardware-supported XSAVE features.
> +	 */
> +	x86_pmu.ext_regs_mask |= XFEATURE_MASK_SSE;
> +	dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
> +}

[Severity: High]
Will this allow PEBS events on older platforms like Skylake to bypass capability
checks?

Before this change, PERF_PMU_CAP_EXTENDED_REGS was only set for CPUs with PEBS
format >= 4 (Icelake+). By introducing intel_extended_regs_init(), the
capability is now set globally for any CPU supporting XSAVES, including
Skylake.

Looking back at x86_pmu_hw_config(), if an event is a PEBS event
(precise_ip > 0), it passes the capability check because the PMU now advertises
it.

Since Skylake's PEBS v3 hardware cannot sample XMM registers, does this result
in a silent ABI regression where the kernel returns zeroes in the sample
payload instead of returning -EINVAL at event creation? Should
x86_pmu_hw_config() enforce that PEBS XMM requests specifically require PEBS
v4+ support?
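
A minimal sketch of the stricter check suggested above, written as a
user-space model and not as the actual x86_pmu_hw_config() logic. The struct
cfg_sketch fields and check_intr_xmm() are hypothetical; in particular, how the
kernel would expose the PEBS format at this point is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs to the REGS_INTR check. */
struct cfg_sketch {
	bool wants_xmm;		/* sample_regs_intr & PERF_REG_EXTENDED_MASK */
	bool sampling;		/* is_sampling_event() */
	int precise_ip;		/* > 0 means a PEBS event */
	int pebs_format;	/* PEBS record format of the CPU */
	bool has_xsaves;	/* X86_FEATURE_XSAVES */
	bool cap_extended_regs;	/* PERF_PMU_CAP_EXTENDED_REGS */
};

static int check_intr_xmm(const struct cfg_sketch *c)
{
	if (!c->wants_xmm)
		return 0;
	if (!c->cap_extended_regs)
		return -EINVAL;
	/* PEBS path: XMM comes from the PEBS record, which needs format v4+. */
	if (c->precise_ip)
		return c->pebs_format >= 4 ? 0 : -EINVAL;
	/* Non-PEBS sampling path: XMM comes from xsaves. */
	if (c->sampling && !c->has_xsaves)
		return -EINVAL;
	return 0;
}
```

Under this model, a Skylake-like configuration (PEBS format 3 with XSAVES
available) is rejected for PEBS events while still being accepted for non-PEBS
sampling.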

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529075645.580362-1-dapeng1.mi@linux.intel.com?part=10
