All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Zide Chen <zide.chen@intel.com>,
	Falcon Thomas <thomas.falcon@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Xudong Hao <xudong.hao@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER
Date: Fri, 5 Dec 2025 14:37:24 +0800	[thread overview]
Message-ID: <becd48cc-8dce-463f-bb00-fd97a21bba7d@linux.intel.com> (raw)
In-Reply-To: <20251204154721.GB2619703@noisy.programming.kicks-ass.net>


On 12/4/2025 11:47 PM, Peter Zijlstra wrote:
> On Thu, Dec 04, 2025 at 04:17:35PM +0100, Peter Zijlstra wrote:
>> On Wed, Dec 03, 2025 at 02:54:47PM +0800, Dapeng Mi wrote:
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> While collecting XMM registers in a PEBS record has been supported since
>>> Icelake, non-PEBS events have lacked this feature. By leveraging the
>>> xsaves instruction, it is now possible to snapshot XMM registers for
>>> non-PEBS events, completing the feature set.
>>>
>>> To utilize the xsaves instruction, a 64-byte aligned buffer is required.
>>> A per-CPU ext_regs_buf is added to store SIMD and other registers, with
>>> the buffer size being approximately 2K. The buffer is allocated using
>>> kzalloc_node(), ensuring natural alignment and 64-byte alignment for all
>>> kmalloc() allocations with powers of 2.
>>>
>>> The XMM sampling support is extended for both REGS_USER and REGS_INTR.
>>> For REGS_USER, perf_get_regs_user() returns the registers from
>>> task_pt_regs(current), which is a pt_regs structure. It needs to be
>>> copied to user space secific x86_user_regs structure since kernel may
>>> modify pt_regs structure later.
>>>
>>> For PEBS, XMM registers are retrieved from PEBS records.
>>>
>>> In cases where userspace tasks are trapped within kernel mode (e.g.,
>>> during a syscall) when an NMI arrives, pt_regs information can still be
>>> retrieved from task_pt_regs(). However, capturing SIMD and other
>>> xsave-based registers in this scenario is challenging. Therefore,
>>> snapshots for these registers are omitted in such cases.
>>>
>>> The reasons are:
>>> - Profiling a userspace task that requires SIMD/eGPR registers typically
>>>   involves NMIs hitting userspace, not kernel mode.
>>> - Although it is possible to retrieve values when the TIF_NEED_FPU_LOAD
>>>   flag is set, the complexity introduced to handle this uncommon case in
>>>   the critical path is not justified.
>>> - Additionally, checking the TIF_NEED_FPU_LOAD flag alone is insufficient.
>>>   Some corner cases, such as an NMI occurring just after the flag switches
>>>   but still in kernel mode, cannot be handled.
>> Urgh.. Dave, Thomas, is there any reason we could not set
>> TIF_NEED_FPU_LOAD *after* doing the XSAVE (clearing is already done
>> after restore).
>>
>> That way, when an NMI sees TIF_NEED_FPU_LOAD it knows the task copy is
>> consistent.
>>
>> I'm not at all sure this is complex, it just needs a little care.
>>
>> And then there is the deferred thing, just like unwind, we can defer
>> REGS_USER/STACK_USER much the same, except someone went and built all
>> that deferred stuff with unwind all tangled into it :/
> With something like the below, the NMI could do something like:
>
> 	struct xregs_state *xr = NULL;
>
> 	/*
> 	 * fpu code does:
> 	 *  XSAVE
> 	 *  set_thread_flag(TIF_NEED_FPU_LOAD)
> 	 *  ...
> 	 *  XRSTOR
> 	 *  clear_thread_flag(TIF_NEED_FPU_LOAD)
> 	 * therefore, when TIF_NEED_FPU_LOAD, the task fpu state holds a
> 	 * whole copy.
> 	 */
> 	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
> 		struct fpu *fpu = x86_task_fpu(current);
> 		/*
> 		 * If __task_fpstate is set, it holds the right pointer,
> 		 * otherwise fpstate will.
> 		 */
> 		struct fpstate *fps = READ_ONCE(fpu->__task_fpstate);
> 		if (!fps)
> 			fps = fpu->fpstate;
> 		xr = &fps->regs.xregs_state;
> 	} else {
> 		/* like fpu_sync_fpstate(), except NMI local */
> 		xsave_nmi(xr, mask);
> 	}
>
> 	// frob xr into perf data
>
> Or did I miss something? I've not looked at this very long and the above
> was very vague on the actual issues.
>
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index da233f20ae6f..0f91a0d7e799 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -359,18 +359,22 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
>  	struct fpstate *cur_fps = fpu->fpstate;
>  
>  	fpregs_lock();
> -	if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD))
> +	if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD)) {
>  		save_fpregs_to_fpstate(fpu);
> +		set_thread_flag(TIF_NEED_FPU_LOAD);
> +	}
>  
>  	/* Swap fpstate */
>  	if (enter_guest) {
> -		fpu->__task_fpstate = cur_fps;
> +		WRITE_ONCE(fpu->__task_fpstate, cur_fps);
> +		barrier();
>  		fpu->fpstate = guest_fps;
>  		guest_fps->in_use = true;
>  	} else {
>  		guest_fps->in_use = false;
>  		fpu->fpstate = fpu->__task_fpstate;
> -		fpu->__task_fpstate = NULL;
> +		barrier();
> +		WRITE_ONCE(fpu->__task_fpstate, NULL);
>  	}
>  
>  	cur_fps = fpu->fpstate;
> @@ -456,8 +460,8 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
>  
>  	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
>  	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
> -		set_thread_flag(TIF_NEED_FPU_LOAD);
>  		save_fpregs_to_fpstate(x86_task_fpu(current));
> +		set_thread_flag(TIF_NEED_FPU_LOAD);
>  	}
>  	__cpu_invalidate_fpregs_state();
>  

Ok, I would involve these changes into next version and support the SIMD
registers sampling for user space registers sampling.



  reply	other threads:[~2025-12-05  6:37 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-03  6:54 [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2025-12-03  6:54 ` [Patch v5 01/19] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2025-12-03  6:54 ` [Patch v5 02/19] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2025-12-03  6:54 ` [Patch v5 03/19] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2025-12-03  6:54 ` [Patch v5 04/19] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2025-12-03  6:54 ` [Patch v5 05/19] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2025-12-03  6:54 ` [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER Dapeng Mi
2025-12-04 15:17   ` Peter Zijlstra
2025-12-04 15:47     ` Peter Zijlstra
2025-12-05  6:37       ` Mi, Dapeng [this message]
2025-12-04 18:59     ` Dave Hansen
2025-12-05  8:42       ` Peter Zijlstra
2025-12-03  6:54 ` [Patch v5 07/19] perf: Add sampling support for SIMD registers Dapeng Mi
2025-12-05 11:07   ` Peter Zijlstra
2025-12-08  5:24     ` Mi, Dapeng
2025-12-05 11:40   ` Peter Zijlstra
2025-12-08  6:00     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 08/19] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2025-12-05 11:25   ` Peter Zijlstra
2025-12-08  6:10     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 09/19] perf/x86: Enable YMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 10/19] perf/x86: Enable ZMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 11/19] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2025-12-03  6:54 ` [Patch v5 12/19] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2025-12-05 12:16   ` Peter Zijlstra
2025-12-08  6:11     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 13/19] perf/x86: Enable SSP " Dapeng Mi
2025-12-05 12:20   ` Peter Zijlstra
2025-12-08  6:21     ` Mi, Dapeng
2025-12-24  5:45   ` Ravi Bangoria
2025-12-24  6:26     ` Mi, Dapeng
2026-01-06  6:55       ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 14/19] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2025-12-03  6:54 ` [Patch v5 15/19] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2025-12-03  6:54 ` [Patch v5 16/19] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2025-12-05 12:39   ` Peter Zijlstra
2025-12-07 20:44     ` Andi Kleen
2025-12-08  6:46     ` Mi, Dapeng
2025-12-08  8:50       ` Peter Zijlstra
2025-12-08  8:53         ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 17/19] perf headers: Sync with the kernel headers Dapeng Mi
2025-12-03 23:43   ` Ian Rogers
2025-12-04  1:37     ` Mi, Dapeng
2025-12-04  7:28       ` Ian Rogers
2026-01-20  7:01   ` Ian Rogers
2026-01-20  7:25     ` Mi, Dapeng
2026-01-20  7:16   ` Ian Rogers
2026-01-20  7:43     ` Mi, Dapeng
2026-01-20  8:00       ` Ian Rogers
2026-01-20  9:22         ` Mi, Dapeng
2026-01-20 18:11           ` Ian Rogers
2026-01-21  2:03             ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format Dapeng Mi
2025-12-04  0:17   ` Ian Rogers
2025-12-04  2:58     ` Mi, Dapeng
2025-12-04  7:49       ` Ian Rogers
2025-12-04  9:20         ` Mi, Dapeng
2025-12-04 16:16           ` Ian Rogers
2025-12-05  4:00             ` Mi, Dapeng
2025-12-05  6:38               ` Ian Rogers
2025-12-05  8:10                 ` Mi, Dapeng
2025-12-05 16:35                   ` Ian Rogers
2025-12-08  4:20                     ` Mi, Dapeng
2026-01-06  7:27                       ` Mi, Dapeng
2026-01-17  5:50                         ` Ian Rogers
2026-01-19  6:55                           ` Mi, Dapeng
2026-01-19 20:25                             ` Ian Rogers
2026-01-20  3:04                               ` Mi, Dapeng
2026-01-20  5:16                                 ` Ian Rogers
2026-01-20  6:46                                   ` Mi, Dapeng
2026-01-20  6:56                                     ` Ian Rogers
2026-01-20  7:39   ` Ian Rogers
2026-01-20  9:04     ` Mi, Dapeng
2026-01-20 18:20       ` Ian Rogers
2026-01-21  5:17         ` Mi, Dapeng
2026-01-21  7:09           ` Ian Rogers
2026-01-21  7:52             ` Mi, Dapeng
2026-01-21 14:48               ` Ian Rogers
2026-01-22  1:49                 ` Mi, Dapeng
2026-01-22  7:27                   ` Ian Rogers
2026-01-22  8:29                     ` Mi, Dapeng
2025-12-03  6:55 ` [Patch v5 19/19] perf regs: Enable dumping of SIMD registers Dapeng Mi
2025-12-04  0:24 ` [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Ian Rogers
2025-12-04  3:28   ` Mi, Dapeng
2025-12-16  4:42 ` Ravi Bangoria
2025-12-16  6:59   ` Mi, Dapeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=becd48cc-8dce-463f-bb00-fd97a21bba7d@linux.intel.com \
    --to=dapeng1.mi@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=broonie@kernel.org \
    --cc=dapeng1.mi@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.falcon@intel.com \
    --cc=xudong.hao@intel.com \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.