linux-perf-users.vger.kernel.org archive mirror
From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Zide Chen <zide.chen@intel.com>,
	Falcon Thomas <thomas.falcon@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Xudong Hao <xudong.hao@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER
Date: Fri, 5 Dec 2025 14:37:24 +0800
Message-ID: <becd48cc-8dce-463f-bb00-fd97a21bba7d@linux.intel.com>
In-Reply-To: <20251204154721.GB2619703@noisy.programming.kicks-ass.net>


On 12/4/2025 11:47 PM, Peter Zijlstra wrote:
> On Thu, Dec 04, 2025 at 04:17:35PM +0100, Peter Zijlstra wrote:
>> On Wed, Dec 03, 2025 at 02:54:47PM +0800, Dapeng Mi wrote:
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> While collecting XMM registers in a PEBS record has been supported since
>>> Icelake, non-PEBS events have lacked this feature. By leveraging the
>>> xsaves instruction, it is now possible to snapshot XMM registers for
>>> non-PEBS events, completing the feature set.
>>>
>>> To utilize the xsaves instruction, a 64-byte aligned buffer is required.
>>> A per-CPU ext_regs_buf is added to store SIMD and other registers; the
>>> buffer size is approximately 2K. The buffer is allocated with
>>> kzalloc_node(), and since kmalloc() naturally aligns power-of-two sized
>>> allocations, the required 64-byte alignment is guaranteed.
>>>
>>> The XMM sampling support is extended for both REGS_USER and REGS_INTR.
>>> For REGS_USER, perf_get_regs_user() returns the registers from
>>> task_pt_regs(current), which is a pt_regs structure. It needs to be
>>> copied into the user-space-specific x86_user_regs structure, since the
>>> kernel may modify the pt_regs structure later.
>>>
>>> For PEBS, XMM registers are retrieved from PEBS records.
>>>
>>> In cases where userspace tasks are trapped within kernel mode (e.g.,
>>> during a syscall) when an NMI arrives, pt_regs information can still be
>>> retrieved from task_pt_regs(). However, capturing SIMD and other
>>> xsave-based registers in this scenario is challenging. Therefore,
>>> snapshots for these registers are omitted in such cases.
>>>
>>> The reasons are:
>>> - Profiling a userspace task that requires SIMD/eGPR registers typically
>>>   involves NMIs hitting userspace, not kernel mode.
>>> - Although it is possible to retrieve values when the TIF_NEED_FPU_LOAD
>>>   flag is set, the complexity introduced to handle this uncommon case in
>>>   the critical path is not justified.
>>> - Additionally, checking the TIF_NEED_FPU_LOAD flag alone is insufficient.
>>>   Some corner cases, such as an NMI occurring just after the flag switches
>>>   but still in kernel mode, cannot be handled.
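
As a side note on the alignment requirement above: a minimal sketch of the
allocation idea, assuming hypothetical names (reserve_ext_regs_buffers(),
X86_EXT_REGS_SIZE) rather than the patch's actual code. Rounding the ~2K
size up to a power of two lets kmalloc()'s natural alignment for
power-of-two sizes provide the 64 bytes that XSAVES needs:

#include <linux/kernel.h>
#include <linux/log2.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/topology.h>

#define X86_EXT_REGS_SIZE	2048	/* assumed ~2K, per the commit message */

static DEFINE_PER_CPU(void *, ext_regs_buf);

static int reserve_ext_regs_buffers(void)
{
	size_t size = roundup_pow_of_two(X86_EXT_REGS_SIZE);
	int cpu;

	for_each_possible_cpu(cpu) {
		void *buf = kzalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));

		if (!buf)
			return -ENOMEM;	/* error unwinding omitted in this sketch */

		/* power-of-two kmalloc() allocations are naturally aligned */
		WARN_ON_ONCE(!IS_ALIGNED((unsigned long)buf, 64));
		per_cpu(ext_regs_buf, cpu) = buf;
	}
	return 0;
}
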
>> Urgh.. Dave, Thomas, is there any reason we could not set
>> TIF_NEED_FPU_LOAD *after* doing the XSAVE (clearing is already done
>> after restore).
>>
>> That way, when an NMI sees TIF_NEED_FPU_LOAD it knows the task copy is
>> consistent.
>>
>> I'm not at all sure this is complex, it just needs a little care.
>>
>> And then there is the deferred thing, just like unwind, we can defer
>> REGS_USER/STACK_USER much the same, except someone went and built all
>> that deferred stuff with unwind all tangled into it :/
> With something like the below, the NMI could do something like:
>
> 	struct xregs_state *xr = NULL;
>
> 	/*
> 	 * fpu code does:
> 	 *  XSAVE
> 	 *  set_thread_flag(TIF_NEED_FPU_LOAD)
> 	 *  ...
> 	 *  XRSTOR
> 	 *  clear_thread_flag(TIF_NEED_FPU_LOAD)
> 	 * therefore, when TIF_NEED_FPU_LOAD, the task fpu state holds a
> 	 * whole copy.
> 	 */
> 	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
> 		struct fpu *fpu = x86_task_fpu(current);
> 		/*
> 		 * If __task_fpstate is set, it holds the right pointer,
> 		 * otherwise fpstate will.
> 		 */
> 		struct fpstate *fps = READ_ONCE(fpu->__task_fpstate);
> 		if (!fps)
> 			fps = fpu->fpstate;
> 		xr = &fps->regs.xregs_state;
> 	} else {
> 		/* like fpu_sync_fpstate(), except NMI local */
> 		xsave_nmi(xr, mask);
> 	}
>
> 	// frob xr into perf data
>
> Or did I miss something? I've not looked at this very long and the above
> was very vague on the actual issues.
>
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index da233f20ae6f..0f91a0d7e799 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -359,18 +359,22 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
>  	struct fpstate *cur_fps = fpu->fpstate;
>  
>  	fpregs_lock();
> -	if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD))
> +	if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD)) {
>  		save_fpregs_to_fpstate(fpu);
> +		set_thread_flag(TIF_NEED_FPU_LOAD);
> +	}
>  
>  	/* Swap fpstate */
>  	if (enter_guest) {
> -		fpu->__task_fpstate = cur_fps;
> +		WRITE_ONCE(fpu->__task_fpstate, cur_fps);
> +		barrier();
>  		fpu->fpstate = guest_fps;
>  		guest_fps->in_use = true;
>  	} else {
>  		guest_fps->in_use = false;
>  		fpu->fpstate = fpu->__task_fpstate;
> -		fpu->__task_fpstate = NULL;
> +		barrier();
> +		WRITE_ONCE(fpu->__task_fpstate, NULL);
>  	}
>  
>  	cur_fps = fpu->fpstate;
> @@ -456,8 +460,8 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
>  
>  	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
>  	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
> -		set_thread_flag(TIF_NEED_FPU_LOAD);
>  		save_fpregs_to_fpstate(x86_task_fpu(current));
> +		set_thread_flag(TIF_NEED_FPU_LOAD);
>  	}
>  	__cpu_invalidate_fpregs_state();
>  

Ok, I will incorporate these changes into the next version and support SIMD
register sampling for user-space registers as well.
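
For the REGS_USER side mentioned in the commit message, the idea boils down
to snapshotting task_pt_regs(current) before later kernel code can modify it.
A minimal sketch of that idea, with a hypothetical x86_user_regs layout and
helper name (the patch's real structure will differ and also carry the XMM
state):

#include <asm/processor.h>
#include <asm/ptrace.h>
#include <linux/sched.h>

struct x86_user_regs {			/* assumed minimal layout */
	struct pt_regs regs;
	/* extended (XMM etc.) register space would follow */
};

static void perf_snapshot_user_regs(struct x86_user_regs *uregs)
{
	/* copy the entry pt_regs before anything can change them */
	uregs->regs = *task_pt_regs(current);
}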



Thread overview: 55+ messages
2025-12-03  6:54 [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Dapeng Mi
2025-12-03  6:54 ` [Patch v5 01/19] perf: Eliminate duplicate arch-specific functions definations Dapeng Mi
2025-12-03  6:54 ` [Patch v5 02/19] perf/x86: Use x86_perf_regs in the x86 nmi handler Dapeng Mi
2025-12-03  6:54 ` [Patch v5 03/19] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data() Dapeng Mi
2025-12-03  6:54 ` [Patch v5 04/19] x86/fpu/xstate: Add xsaves_nmi() helper Dapeng Mi
2025-12-03  6:54 ` [Patch v5 05/19] perf: Move and rename has_extended_regs() for ARCH-specific use Dapeng Mi
2025-12-03  6:54 ` [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER Dapeng Mi
2025-12-04 15:17   ` Peter Zijlstra
2025-12-04 15:47     ` Peter Zijlstra
2025-12-05  6:37       ` Mi, Dapeng [this message]
2025-12-04 18:59     ` Dave Hansen
2025-12-05  8:42       ` Peter Zijlstra
2025-12-03  6:54 ` [Patch v5 07/19] perf: Add sampling support for SIMD registers Dapeng Mi
2025-12-05 11:07   ` Peter Zijlstra
2025-12-08  5:24     ` Mi, Dapeng
2025-12-05 11:40   ` Peter Zijlstra
2025-12-08  6:00     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 08/19] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Dapeng Mi
2025-12-05 11:25   ` Peter Zijlstra
2025-12-08  6:10     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 09/19] perf/x86: Enable YMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 10/19] perf/x86: Enable ZMM " Dapeng Mi
2025-12-03  6:54 ` [Patch v5 11/19] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields Dapeng Mi
2025-12-03  6:54 ` [Patch v5 12/19] perf/x86: Enable eGPRs sampling using sample_regs_* fields Dapeng Mi
2025-12-05 12:16   ` Peter Zijlstra
2025-12-08  6:11     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 13/19] perf/x86: Enable SSP " Dapeng Mi
2025-12-05 12:20   ` Peter Zijlstra
2025-12-08  6:21     ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 14/19] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability Dapeng Mi
2025-12-03  6:54 ` [Patch v5 15/19] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling Dapeng Mi
2025-12-03  6:54 ` [Patch v5 16/19] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Dapeng Mi
2025-12-05 12:39   ` Peter Zijlstra
2025-12-07 20:44     ` Andi Kleen
2025-12-08  6:46     ` Mi, Dapeng
2025-12-08  8:50       ` Peter Zijlstra
2025-12-08  8:53         ` Mi, Dapeng
2025-12-03  6:54 ` [Patch v5 17/19] perf headers: Sync with the kernel headers Dapeng Mi
2025-12-03 23:43   ` Ian Rogers
2025-12-04  1:37     ` Mi, Dapeng
2025-12-04  7:28       ` Ian Rogers
2025-12-03  6:54 ` [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format Dapeng Mi
2025-12-04  0:17   ` Ian Rogers
2025-12-04  2:58     ` Mi, Dapeng
2025-12-04  7:49       ` Ian Rogers
2025-12-04  9:20         ` Mi, Dapeng
2025-12-04 16:16           ` Ian Rogers
2025-12-05  4:00             ` Mi, Dapeng
2025-12-05  6:38               ` Ian Rogers
2025-12-05  8:10                 ` Mi, Dapeng
2025-12-05 16:35                   ` Ian Rogers
2025-12-08  4:20                     ` Mi, Dapeng
2025-12-03  6:55 ` [Patch v5 19/19] perf regs: Enable dumping of SIMD registers Dapeng Mi
2025-12-04  0:24 ` [Patch v5 00/19] Support SIMD/eGPRs/SSP registers sampling for perf Ian Rogers
2025-12-04  3:28   ` Mi, Dapeng
