From: Thomas Gleixner <tglx@linutronix.de>
To: "Chang S. Bae" <chang.seok.bae@intel.com>,
bp@suse.de, luto@kernel.org, mingo@kernel.org, x86@kernel.org
Cc: len.brown@intel.com, lenb@kernel.org, dave.hansen@intel.com,
thiago.macieira@intel.com, jing2.liu@intel.com,
ravi.v.shankar@intel.com, linux-kernel@vger.kernel.org,
chang.seok.bae@intel.com
Subject: Re: [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
Date: Tue, 05 Oct 2021 02:30:43 +0200 [thread overview]
Message-ID: <87o884fh3g.ffs@tglx> (raw)
In-Reply-To: <20211001223728.9309-16-chang.seok.bae@intel.com>
On Fri, Oct 01 2021 at 15:37, Chang S. Bae wrote:
> arch_prctl(ARCH_SET_STATE_ENABLE, u64 bitmask)
> Some XSTATE features, such as AMX, are unavailable to applications
> until that process explicitly requests them via this call. Requests can
> be made for any number of valid user XSTATEs in a single call. This
> call is intended to be invoked very early in process initialization. A
> forked child inherits access, but permission is reset upon exec. There
> is no concept of un-requesting XSTATE access.
> Return codes:
> 0: success (including repeated calls)
> EINVAL: no hardware feature for the request
>
> arch_prctl(ARCH_GET_STATE_ENABLE, u64 *bitmask)
> Return the bitmask of permitted user XSTATE features. If XSAVE is
> disabled, the bitmask indicates only legacy states.
>
> The permission is checked at every XSTATE buffer expansion: e.g.
> XFD-induced #NM event, and ptracer's XSTATE injection. When no permission
> is found, inform userspace via SIGILL or with error code.
>
> For "dynamic" XSTATE features that have XFD hardware support, the kernel
> can enforce that users can not touch state without permission. For state
> that has no XFD support, the kernel can not prevent a user from touching
> that state.
>
> The notion of granted permission is recorded in the group leader only. A
> new task copies its permission bitmask.
while this patch does again way more than the subject suggests and
really should be split into bits and pieces, the prctl is ill defined
and the implementation is partially buggy.
> + /*
> + * @state_perm:
> + *
> + * This bitmap indicates the permission for state components.
> + *
> + * Always reference group_leader's value via
> + * get_group_state_perm() as it readily represents the process's
> + * state permission.
> + */
> + u64 state_perm;
This want's to be __state_perm to denote that this should not be
accessed directly.
Also the only reason to access this is when a task triggers a permission
check vs. it's own permissions and not unconditionally be deferenced all
over the place.
The point is that you don't want to derefence tsk->group_leader if not
absolutely required. That's why I want to have the information in
fpstate which is thread local and has to be accessed anyway.
Only if tsk->fpstate does not provide the permission then the group
leader state has to be checked, i.e. in #NM and ptrace which is a slow
path anyway. In the case that the permission check on the thread local
info fails then the task is either going to be killed or an extended
buffer has to be allocated.
See?
> +/* Require ARCH_SET_STATE_ENABLE for future features */
> +#define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_MAX)
When you add AMX then XFEATURE_MAX is going to be past AMX. And no, even
if you fix that up later in weird ways, this does not make sense.
> +/**
> + * get_group_state_perm - Get a per-process state permission
> + * @tsk: A struct task_struct * pointer
> + * Return: A bitmap to indicate state permission.
> + */
> +static inline u64 get_group_state_perm(struct task_struct *tsk)
> +{
> + return tsk->group_leader->thread.fpu.state_perm;
This needs a READ_ONCE() because it can be concurrently modified and the
read is lockless. Which means that the write side needs a WRITE_ONCE(),
but see below.
> +}
> +
> +/**
> + * state_permitted - Check a task's permission for indicated features.
state_permitted and all the other state names are way too broad. This is
about xstate and not about random states. We have name spaces for a
reason.
> +#define ARCH_SET_STATE_ENABLE 0x1021
> +#define ARCH_GET_STATE_ENABLE 0x1022
This is about XSTATE components and should be named as such,
i.e. something which is entirely clear, e.g. _XCOMP_ because that is
what this is about. More below.
Aside of that why does this not start with the obvious and simple prctl
option to retrieve the possible supported features?
> +/**
> + * reset_state_perm - Reset a task's permission for dynamic user state
> + *
> + * It is expected to call at exec in which one task runs in a process.
> + *
> + * @task: A struct task_struct * pointer
> + */
> +void reset_state_perm(struct task_struct *tsk)
> +{
> + struct fpu *fpu = &tsk->thread.fpu;
> +
> + fpu->state_perm = xfeatures_mask_all & ~xfeatures_mask_user_perm();
> +
> + if (!xfeatures_mask_user_dynamic ||
> + !(fpu->state_mask & xfeatures_mask_user_dynamic))
> + return;
> +
> + WARN_ON(tsk->signal->nr_threads > 1);
Why? The only two callers are from set_personality*().
Aside of that why are you doing this from set_personality()?
This has absolutely nothing to do with set_personality() which is
exclusively about native and compat format of the executable.
At the point where set_personality() is invoked the task which does
exec() has already invoked flush_thread(), which invokes
fpu_flush_thread() which in turn resets the FPU state...
So what?
> +/**
> + * do_arch_prctl_state - Read or write the state permission.
> + * @fpu: A struct task_struct * pointer
fpu != tsk
Also yuck, that task argument is silly because task cannot be anything
else than current, but that's not your fault.
> + * @option: A subfunction of arch_prctl()
> + * @arg2: Either a pointer to userspace memory or state-component
> + * bitmap value.
> + * Return: 0 if successful; otherwise, return a relevant error code.
> + */
> +long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2)
> +{
> + u64 features_mask;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_FPU))
> + features_mask = 0;
> + else if (use_fxsr())
> + features_mask = XFEATURE_MASK_FPSSE;
> + else
> + features_mask = XFEATURE_MASK_FP;
Why? This feature mask should be evaluated once and not over and over again.
> + switch (option) {
> + case ARCH_SET_STATE_ENABLE: {
> + u64 state_perm = arg2;
This does nowhere mention in the comments above that this limits the
available state space to 32bit for 32bit tasks running on a 64 bit
kernel. I don't think it's a problem, but it has to be documented.
> +
> + if (use_xsave())
> + features_mask = xfeatures_mask_uabi();
> +
> + if (state_perm & ~features_mask)
> + return -EINVAL;
> +
> + state_perm &= xfeatures_mask_user_perm();
> + if (!state_perm)
> + return 0;
I really do not get the semantics of this prctl at all.
GET stores _all_ permitted bits in the user space variable which makes
sense.
SET is just accepting everything except not supported bits, but as it
takes a feature bitmask it suggests that this sets the xfeature bits
which are available for the task or the process.
How does prctl(..., SET, 0) make sense?
It does not make any sense at all. There is no support for downgrading
the permitted features:
1) Default features up to AVX512 cannot be disabled
2) Once AMX (or any upcoming state) is enabled there is not way back.
So no. This really want's to be
prctl(SET, xfeature_number)
and not something which is semanticaly ill defined.
> + tsk->group_leader->thread.fpu.state_perm |= state_perm;
While this "works" with a single permission controlled state this is
completely broken once more permission controlled states come into play
when tasks of the same process invoke this concurrently and request
different features.
> + return 0;
> + }
> + case ARCH_GET_STATE_ENABLE: {
> + if (use_xsave())
> + features_mask = get_group_state_perm(tsk);
> +
> + return put_user(features_mask, (unsigned long __user *)arg2);
This is broken for 32bit kernels. The prctl is unconditional and
therefore this needs to be a *u64 cast.
Thanks,
tglx
next prev parent reply other threads:[~2021-10-05 0:30 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 01/29] x86/fpu/xstate: Fix the state copy function to the XSTATE buffer Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 02/29] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 03/29] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 04/29] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 05/29] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 06/29] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 07/29] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 08/29] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 09/29] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 10/29] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 11/29] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 12/29] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 13/29] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 14/29] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
2021-10-05 0:30 ` Thomas Gleixner [this message]
2021-10-05 9:49 ` Thomas Gleixner
2021-10-05 11:23 ` Peter Zijlstra
2021-10-05 12:27 ` Thomas Gleixner
2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
2021-10-05 12:30 ` Thomas Gleixner
2021-10-05 15:19 ` Thomas Gleixner
2021-10-01 22:37 ` [PATCH v11 17/29] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 18/29] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 19/29] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 20/29] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 21/29] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 22/29] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 24/29] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 25/29] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 26/29] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 27/29] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 28/29] x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 29/29] x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC Chang S. Bae
2021-10-01 22:47 ` [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
2021-10-01 22:50 ` Bae, Chang Seok
2021-10-03 1:05 ` Thomas Gleixner
2021-10-04 14:48 ` Bae, Chang Seok
2021-10-02 21:54 ` Thomas Gleixner
2021-10-02 22:11 ` Bae, Chang Seok
2021-10-04 13:44 ` Thomas Gleixner
2021-10-04 14:47 ` Bae, Chang Seok
2021-10-02 22:20 ` Bae, Chang Seok
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o884fh3g.ffs@tglx \
--to=tglx@linutronix.de \
--cc=bp@suse.de \
--cc=chang.seok.bae@intel.com \
--cc=dave.hansen@intel.com \
--cc=jing2.liu@intel.com \
--cc=len.brown@intel.com \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=ravi.v.shankar@intel.com \
--cc=thiago.macieira@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox