* [PATCH v3 00/10] Introduce CET supervisor state support
@ 2025-03-07 16:41 Chao Gao
2025-03-07 16:41 ` [PATCH v3 01/10] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
` (10 more replies)
0 siblings, 11 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
==Changelog==
v2->v3:
- reorder patches to add fpu_guest_cfg first and then introduce dynamic kernel
feature concept (Dave)
- Revise changelog for all patches except the first and the last one (Dave)
- Split up patches that do multiple things into separate patches.
- collect tags for patch 1
v1->v2:
- rebase onto the latest kvm-x86/next
- Add performance data to the cover-letter
- v1: https://lore.kernel.org/kvm/73802bff-833c-4233-9a5b-88af0d062c82@intel.com/
==Background==
This series spins off from CET KVM virtualization enabling series [1].
The purpose is to get these preparation work resolved ahead of KVM part
landing. There was a discussion about introducing CET supervisor state
support [2] [3].
CET supervisor state, i.e., IA32_PL{0,1,2}_SSP, are xsave-managed MSRs,
it can be enabled via IA32_XSS[bit 12]. KVM relies on host side CET
supervisor state support to fully enable guest CET MSR contents storage.
The benefits are: 1) No need to manually save/restore the 3 MSRs when
vCPU fpu context is sched in/out. 2) Omit manually swapping the three
MSRs at VM-Exit/VM-Entry for guest/host. 3) Make guest CET user/supervisor
states managed in a consistent manner within host kernel FPU framework.
==Solution==
This series tries to:
1) Fix existing issue regarding enabling guest supervisor states support.
2) Add CET supervisor state support in core kernel.
3) Introduce new FPU config for guest fpstate setup.
With the preparation work landed, for guest fpstate, we have xstate_bv[12]
== xcomp_bv[12] == 1 and CET supervisor state is saved/reloaded when
xsaves/xrstors executes on guest fpstate.
For non-guest/normal fpstate, we have xstate_bv[12] == xcomp_bv[12] == 0,
then HW can optimize xsaves/xrstors operations.
==Performance==
We measured context-switching performance with the benchmark [4] in following
three cases.
case 1: the baseline. i.e., this series isn't applied
case 2: baseline + this series. CET-S space is allocated for guest fpu only.
case 3: baseline + allocate CET-S space for all tasks. Hardware init
optimization avoids writing out CET-S space on each XSAVES.
the data are as follows
case |IA32_XSS[12] | Space | RFBM[12] | Drop%
-----+-------------+-------+----------+------
1 | 0 | None | 0 | 0.0%
2 | 1 | None | 0 | 0.2%
3 | 1 | 24B? | 1 | 0.2%
Case 2 and 3 have no difference in performnace. But case 2 is preferred because
it can save 24B of CET-S space for all non-vCPU threads with just a one-line
change:
+ fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_KERNEL_DYNAMIC;
fpu_guest_cfg has its own merits. Regardless of the approach we take, using
different FPU configuration settings for the guest and the kernel improves
readability, decouples them from each other, and arguably enhances
extensibility.
[1]: https://lore.kernel.org/all/20240219074733.122080-1-weijiang.yang@intel.com/
[2]: https://lore.kernel.org/all/ZM1jV3UPL0AMpVDI@google.com/
[3]: https://lore.kernel.org/all/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/
[4]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c
Chao Gao (2):
x86/fpu/xstate: Drop @perm from guest pseudo FPU container
x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container
Sean Christopherson (1):
x86/fpu/xstate: Always preserve non-user xfeatures/flags in
__state_perm
Yang Weijiang (7):
x86/fpu/xstate: Correct guest fpstate size calculation
x86/fpu/xstate: Introduce guest FPU configuration
x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg
x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config
x86/fpu/xstate: Add CET supervisor xfeature support
x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
x86/fpu/xstate: Warn if CET supervisor state is detected in normal
fpstate
arch/x86/include/asm/fpu/types.h | 31 +++++++++++++++------------
arch/x86/include/asm/fpu/xstate.h | 11 ++++++----
arch/x86/kernel/fpu/core.c | 34 +++++++++++++++++-------------
arch/x86/kernel/fpu/xstate.c | 35 ++++++++++++++++++++++++-------
arch/x86/kernel/fpu/xstate.h | 2 ++
5 files changed, 74 insertions(+), 39 deletions(-)
--
2.46.1
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v3 01/10] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 02/10] x86/fpu/xstate: Drop @perm from guest pseudo FPU container Chao Gao
` (9 subsequent siblings)
10 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
From: Sean Christopherson <seanjc@google.com>
When granting userspace or a KVM guest access to an xfeature, preserve the
entity's existing supervisor and software-defined permissions as tracked
by __state_perm, i.e. use __state_perm to track *all* permissions even
though all supported supervisor xfeatures are granted to all FPUs and
FPU_GUEST_PERM_LOCKED disallows changing permissions.
Effectively clobbering supervisor permissions results in inconsistent
behavior, as xstate_get_group_perm() will report supervisor features for
process that do NOT request access to dynamic user xfeatures, whereas any
and all supervisor features will be absent from the set of permissions for
any process that is granted access to one or more dynamic xfeatures (which
right now means AMX).
The inconsistency isn't problematic because fpu_xstate_prctl() already
strips out everything except user xfeatures:
case ARCH_GET_XCOMP_PERM:
/*
* Lockless snapshot as it can also change right after the
* dropping the lock.
*/
permitted = xstate_get_host_group_perm();
permitted &= XFEATURE_MASK_USER_SUPPORTED;
return put_user(permitted, uptr);
case ARCH_GET_XCOMP_GUEST_PERM:
permitted = xstate_get_guest_group_perm();
permitted &= XFEATURE_MASK_USER_SUPPORTED;
return put_user(permitted, uptr);
and similarly KVM doesn't apply the __state_perm to supervisor states
(kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):
case 0xd: {
u64 permitted_xcr0 = kvm_get_filtered_xcr0();
u64 permitted_xss = kvm_caps.supported_xss;
But if KVM in particular were to ever change, dropping supervisor
permissions would result in subtle bugs in KVM's reporting of supported
CPUID settings. And the above behavior also means that having supervisor
xfeatures in __state_perm is correctly handled by all users.
Dropping supervisor permissions also creates another landmine for KVM. If
more dynamic user xfeatures are ever added, requesting access to multiple
xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
second invocation of __xstate_request_perm() computing the wrong ksize, as
as the mask passed to xstate_calculate_size() would not contain *any*
supervisor features.
Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE
permissions") fudged around the size issue for userspace FPUs, but for
reasons unknown skipped guest FPUs. Lack of a fix for KVM "works" only
because KVM doesn't yet support virtualizing features that have supervisor
xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant
xfeatures.
Simply extending the hack-a-fix for guests would temporarily solve the
ksize issue, but wouldn't address the inconsistency issue and would leave
another lurking pitfall for KVM. KVM support for virtualizing CET will
likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not
be set in xfeatures_mask_supervisor() and would again be dropped when
granting access to dynamic xfeatures.
Note, the existing clobbering behavior is rather subtle. The @permitted
parameter to __xstate_request_perm() comes from:
permitted = xstate_get_group_perm(guest);
which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
where __state_perm is initialized to:
fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
and copied to the guest side of things:
/* Same defaults for guests */
fpu->guest_perm = fpu->perm;
fpu_kernel_cfg.default_features contains everything except the dynamic
xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:
fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
When __xstate_request_perm() restricts the local "mask" variable to
compute the user state size:
mask &= XFEATURE_MASK_USER_SUPPORTED;
usize = xstate_calculate_size(mask, false);
it subtly overwrites the target __state_perm with "mask" containing only
user xfeatures:
perm = guest ? &fpu->guest_perm : &fpu->perm;
/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
WRITE_ONCE(perm->__state_perm, mask);
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: John Allen <john.allen@amd.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
arch/x86/include/asm/fpu/types.h | 8 +++++---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..46cc263f9f4f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -407,9 +407,11 @@ struct fpu_state_perm {
/*
* @__state_perm:
*
- * This bitmap indicates the permission for state components, which
- * are available to a thread group. The permission prctl() sets the
- * enabled state bits in thread_group_leader()->thread.fpu.
+ * This bitmap indicates the permission for state components
+ * available to a thread group, including both user and supervisor
+ * components and software-defined bits like FPU_GUEST_PERM_LOCKED.
+ * The permission prctl() sets the enabled state bits in
+ * thread_group_leader()->thread.fpu.
*
* All run time operations use the per thread information in the
* currently active fpu.fpstate which contains the xfeature masks
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 27417b685c1d..7caafdb7f6b8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1618,16 +1618,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
if ((permitted & requested) == requested)
return 0;
- /* Calculate the resulting kernel state size */
+ /*
+ * Calculate the resulting kernel state size. Note, @permitted also
+ * contains supervisor xfeatures even though supervisor are always
+ * permitted for kernel and guest FPUs, and never permitted for user
+ * FPUs.
+ */
mask = permitted | requested;
- /* Take supervisor states into account on the host */
- if (!guest)
- mask |= xfeatures_mask_supervisor();
ksize = xstate_calculate_size(mask, compacted);
- /* Calculate the resulting user state size */
- mask &= XFEATURE_MASK_USER_SUPPORTED;
- usize = xstate_calculate_size(mask, false);
+ /*
+ * Calculate the resulting user state size. Take care not to clobber
+ * the supervisor xfeatures in the new mask!
+ */
+ usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
if (!guest) {
ret = validate_sigaltstack(usize);
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 02/10] x86/fpu/xstate: Drop @perm from guest pseudo FPU container
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
2025-03-07 16:41 ` [PATCH v3 01/10] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container Chao Gao
` (8 subsequent siblings)
10 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
Remove @perm from the guest pseudo FPU container. The field is
initialized during allocation and never used later.
Rename fpu_init_guest_permissions() to show that its sole purpose is to
lock down guest permissions.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/include/asm/fpu/types.h | 7 -------
arch/x86/kernel/fpu/core.c | 7 ++-----
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 46cc263f9f4f..9f9ed406b179 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -526,13 +526,6 @@ struct fpu_guest {
*/
u64 xfeatures;
- /*
- * @perm: xfeature bitmap of features which are
- * permitted to be enabled for the guest
- * vCPU.
- */
- u64 perm;
-
/*
* @xfd_err: Save the guest value.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..dc169f3d336d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -195,7 +195,7 @@ void fpu_reset_from_exception_fixup(void)
#if IS_ENABLED(CONFIG_KVM)
static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
-static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
+static void fpu_lock_guest_permissions(struct fpu_guest *gfpu)
{
struct fpu_state_perm *fpuperm;
u64 perm;
@@ -211,8 +211,6 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED);
spin_unlock_irq(¤t->sighand->siglock);
-
- gfpu->perm = perm & ~FPU_GUEST_PERM_LOCKED;
}
bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
@@ -233,7 +231,6 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
gfpu->fpstate = fpstate;
gfpu->xfeatures = fpu_user_cfg.default_features;
- gfpu->perm = fpu_user_cfg.default_features;
/*
* KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
@@ -248,7 +245,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
gfpu->uabi_size = fpu_user_cfg.default_size;
- fpu_init_guest_permissions(gfpu);
+ fpu_lock_guest_permissions(gfpu);
return true;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
2025-03-07 16:41 ` [PATCH v3 01/10] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
2025-03-07 16:41 ` [PATCH v3 02/10] x86/fpu/xstate: Drop @perm from guest pseudo FPU container Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 17:48 ` Dave Hansen
2025-03-07 16:41 ` [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation Chao Gao
` (7 subsequent siblings)
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
The xfeatures field in struct fpu_guest is designed to track the enabled
xfeatures for guest FPUs. However, during allocation in
fpu_alloc_guest_fpstate(), gfpu->xfeatures is initialized to
fpu_user_cfg.default_features, while the corresponding
fpstate->xfeatures is set to fpu_kernel_cfg.default_features
Correct the mismatch to avoid confusion.
Note this mismatch does not cause any functional issues. The
gfpu->xfeatures is checked in fpu_enable_guest_xfd_features() to
verify if XFD features are already enabled:
xfeatures &= ~guest_fpu->xfeatures;
if (!xfeatures)
return 0;
It gets updated in fpstate_realloc() after enabling some XFD features:
guest_fpu->xfeatures |= xfeatures;
So, backport is not needed.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kernel/fpu/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index dc169f3d336d..6166a928d3f5 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -230,7 +230,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
fpstate->is_guest = true;
gfpu->fpstate = fpstate;
- gfpu->xfeatures = fpu_user_cfg.default_features;
+ gfpu->xfeatures = fpu_kernel_cfg.default_features;
/*
* KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (2 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 17:53 ` Dave Hansen
2025-03-07 21:37 ` Chang S. Bae
2025-03-07 16:41 ` [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration Chao Gao
` (6 subsequent siblings)
10 siblings, 2 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
From: Yang Weijiang <weijiang.yang@intel.com>
The guest fpstate size is calculated based on fpu_user_cfg, while
fpstate->xfeatures is set to fpu_kernel_cfg.default_features in
fpu_alloc_guest_fpstate(). In other words, the guest fpstate doesn't
allocate memory for all supervisor states, even though they are enabled.
Correct the calculation of the guest fpstate size.
Note that this issue does not cause any functional problems because the
guest fpstate is allocated using vmalloc(), which aligns the size to a
full page, providing enough space for all existing supervisor components.
On Emerald Rapids CPUs, the guest fpstate after this correction is ~2880
bytes.
Link: https://lore.kernel.org/kvm/20230914063325.85503-3-weijiang.yang@intel.com/
Fixes: 69f6ed1d14c6 ("x86/fpu: Provide infrastructure for KVM FPU cleanup")
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kernel/fpu/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6166a928d3f5..adc34914634e 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -218,7 +218,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
struct fpstate *fpstate;
unsigned int size;
- size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
+ size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
fpstate = vzalloc(size);
if (!fpstate)
return false;
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (3 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 18:06 ` Dave Hansen
2025-03-07 16:41 ` [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg Chao Gao
` (5 subsequent siblings)
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
From: Yang Weijiang <weijiang.yang@intel.com>
The guest fpstate is currently initialized using fpu_kernel_cfg. This lacks
flexibility because every feature added to the kernel FPU config is
automatically available for use in the guest FPU. And to make any feature
available for the guest FPU, the feature should be added to the kernel FPU
config.
Introduce fpu_guest_cfg to separate the guest FPU config from the kernel
FPU config. This enhances code readability by allowing the guest FPU to be
initialized with its own config and also improves extensibility by allowing
the guest FPU config and kernel FPU config to evolve independently.
Note fpu_guest_cfg and fpu_kernel_cfg remain the same for now.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
arch/x86/include/asm/fpu/types.h | 2 +-
arch/x86/kernel/fpu/core.c | 3 ++-
arch/x86/kernel/fpu/xstate.c | 10 ++++++++++
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 9f9ed406b179..d9515d7f65e4 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -596,6 +596,6 @@ struct fpu_state_config {
};
/* FPU state configuration information */
-extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
+extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg, fpu_guest_cfg;
#endif /* _ASM_X86_FPU_TYPES_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index adc34914634e..b0c1ef40d105 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -33,9 +33,10 @@ DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
DEFINE_PER_CPU(u64, xfd_state);
#endif
-/* The FPU state configuration data for kernel and user space */
+/* The FPU state configuration data for kernel, user space and guest */
struct fpu_state_config fpu_kernel_cfg __ro_after_init;
struct fpu_state_config fpu_user_cfg __ro_after_init;
+struct fpu_state_config fpu_guest_cfg __ro_after_init;
/*
* Represents the initial FPU state. It's mostly (but not completely) zeroes,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 7caafdb7f6b8..58325b3b8914 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -683,6 +683,7 @@ static int __init init_xstate_size(void)
{
/* Recompute the context size for enabled features: */
unsigned int user_size, kernel_size, kernel_default_size;
+ unsigned int guest_default_size;
bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
/* Uncompacted user space size */
@@ -704,13 +705,18 @@ static int __init init_xstate_size(void)
kernel_default_size =
xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);
+ guest_default_size =
+ xstate_calculate_size(fpu_guest_cfg.default_features, compacted);
+
if (!paranoid_xstate_size_valid(kernel_size))
return -EINVAL;
fpu_kernel_cfg.max_size = kernel_size;
fpu_user_cfg.max_size = user_size;
+ fpu_guest_cfg.max_size = kernel_size;
fpu_kernel_cfg.default_size = kernel_default_size;
+ fpu_guest_cfg.default_size = guest_default_size;
fpu_user_cfg.default_size =
xstate_calculate_size(fpu_user_cfg.default_features, false);
@@ -820,6 +826,10 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
fpu_user_cfg.default_features = fpu_user_cfg.max_features;
fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+ fpu_guest_cfg.max_features = fpu_kernel_cfg.max_features;
+ fpu_guest_cfg.default_features = fpu_guest_cfg.max_features;
+ fpu_guest_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+
/* Store it for paranoia check at the end */
xfeatures = fpu_kernel_cfg.max_features;
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (4 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 18:14 ` Dave Hansen
2025-03-07 16:41 ` [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config Chao Gao
` (4 subsequent siblings)
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
From: Yang Weijiang <weijiang.yang@intel.com>
Use the new fpu_guest_cfg to initialize guest permissions.
Note fpu_guest_cfg and fpu_kernel_cfg remain the same for now. So there
is no functional change.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
[Gao Chao: Extrace this from the previous patch ]
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kernel/fpu/core.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b0c1ef40d105..d7ae684adbad 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -534,8 +534,15 @@ void fpstate_reset(struct fpu *fpu)
fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
fpu->perm.__state_size = fpu_kernel_cfg.default_size;
fpu->perm.__user_state_size = fpu_user_cfg.default_size;
- /* Same defaults for guests */
- fpu->guest_perm = fpu->perm;
+
+ /* Guest permission settings */
+ fpu->guest_perm.__state_perm = fpu_guest_cfg.default_features;
+ fpu->guest_perm.__state_size = fpu_guest_cfg.default_size;
+ /*
+ * Set guest's __user_state_size to fpu_user_cfg.default_size so that
+ * existing uAPIs can still work.
+ */
+ fpu->guest_perm.__user_state_size = fpu_user_cfg.default_size;
}
static inline void fpu_inherit_perms(struct fpu *dst_fpu)
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (5 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 18:41 ` Dave Hansen
2025-03-07 16:41 ` [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support Chao Gao
` (3 subsequent siblings)
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
From: Yang Weijiang <weijiang.yang@intel.com>
Use fpu_guest_cfg to initialize the guest fpstate and the guest FPU pseduo
container.
The user_* fields remain unchanged for compatibility with KVM uAPIs.
Inline the logic of __fpstate_reset() to directly utilize fpu_guest_cfg.
Note fpu_guest_cfg and fpu_kernel_cfg remain the same for now. So there
is no functional change.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
arch/x86/kernel/fpu/core.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d7ae684adbad..9cb800918b6d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -194,8 +194,6 @@ void fpu_reset_from_exception_fixup(void)
}
#if IS_ENABLED(CONFIG_KVM)
-static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
-
static void fpu_lock_guest_permissions(struct fpu_guest *gfpu)
{
struct fpu_state_perm *fpuperm;
@@ -219,19 +217,22 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
struct fpstate *fpstate;
unsigned int size;
- size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
+ size = fpu_guest_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
+
fpstate = vzalloc(size);
if (!fpstate)
return false;
- /* Leave xfd to 0 (the reset value defined by spec) */
- __fpstate_reset(fpstate, 0);
+ fpstate->size = fpu_guest_cfg.default_size;
+ fpstate->xfeatures = fpu_guest_cfg.default_features;
+ fpstate->user_size = fpu_user_cfg.default_size;
+ fpstate->user_xfeatures = fpu_user_cfg.default_features;
fpstate_init_user(fpstate);
fpstate->is_valloc = true;
fpstate->is_guest = true;
gfpu->fpstate = fpstate;
- gfpu->xfeatures = fpu_kernel_cfg.default_features;
+ gfpu->xfeatures = fpu_guest_cfg.default_features;
/*
* KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (6 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 18:39 ` Dave Hansen
2025-03-07 16:41 ` [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set Chao Gao
` (2 subsequent siblings)
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
From: Yang Weijiang <weijiang.yang@intel.com>
To support CET virtualization, KVM needs the kernel to save/restore CET
supervisor xstate in guest FPUs when switching between guest and host
FPUs.
Add CET supervisor xstate (i.e., XFEATURE_CET_KERNEL) support. Both the
guest FPU and the kernel FPU will allocate memory for the new xstate.
For the guest FPU, the xstate remains unused until the upcoming CET
virtualization is added to KVM. For the kernel FPU, the xstate is unused
until CET_S is enabled within the kernel.
Note CET_S may or may not be enabled within the kernel, so always
allocating memory for XFEATURE_CET_KERNEL could potentially waste some
XSAVE buffer space. If necessary, this issue can be addressed by making
XFEATURE_CET_KERNEL a guest-only feature.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/include/asm/fpu/types.h | 14 ++++++++++++--
arch/x86/include/asm/fpu/xstate.h | 6 +++---
arch/x86/kernel/fpu/xstate.c | 6 +++++-
3 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index d9515d7f65e4..eb034b7ab8c0 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -118,7 +118,7 @@ enum xfeature {
XFEATURE_PKRU,
XFEATURE_PASID,
XFEATURE_CET_USER,
- XFEATURE_CET_KERNEL_UNUSED,
+ XFEATURE_CET_KERNEL,
XFEATURE_RSRVD_COMP_13,
XFEATURE_RSRVD_COMP_14,
XFEATURE_LBR,
@@ -141,7 +141,7 @@ enum xfeature {
#define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
#define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER)
-#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED)
+#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL)
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
@@ -266,6 +266,16 @@ struct cet_user_state {
u64 user_ssp;
};
+/*
+ * State component 12 is Control-flow Enforcement supervisor states
+ */
+struct cet_supervisor_state {
+ /* supervisor ssp pointers */
+ u64 pl0_ssp;
+ u64 pl1_ssp;
+ u64 pl2_ssp;
+};
+
/*
* State component 15: Architectural LBR configuration state.
* The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..8990cf381bef 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -47,7 +47,8 @@
/* All currently supported supervisor features */
#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
- XFEATURE_MASK_CET_USER)
+ XFEATURE_MASK_CET_USER | \
+ XFEATURE_MASK_CET_KERNEL)
/*
* A supervisor state component may not always contain valuable information,
@@ -74,8 +75,7 @@
* Unsupported supervisor features. When a supervisor feature in this mask is
* supported in the future, move it to the supported supervisor feature mask.
*/
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
- XFEATURE_MASK_CET_KERNEL)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
/* All supervisor states including supported and unsupported states. */
#define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 58325b3b8914..12613ebdbb5d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -55,7 +55,7 @@ static const char *xfeature_names[] =
"Protection Keys User registers",
"PASID state",
"Control-flow User registers",
- "Control-flow Kernel registers (unused)",
+ "Control-flow Kernel registers",
"unknown xstate feature",
"unknown xstate feature",
"unknown xstate feature",
@@ -78,6 +78,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_PKRU] = X86_FEATURE_OSPKE,
[XFEATURE_PASID] = X86_FEATURE_ENQCMD,
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
+ [XFEATURE_CET_KERNEL] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
};
@@ -283,6 +284,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_PKRU);
print_xstate_feature(XFEATURE_MASK_PASID);
print_xstate_feature(XFEATURE_MASK_CET_USER);
+ print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
}
@@ -352,6 +354,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
+ XFEATURE_MASK_CET_KERNEL | \
XFEATURE_MASK_XTILE)
/*
@@ -552,6 +555,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_CET_KERNEL: return XCHECK_SZ(sz, nr, struct cet_supervisor_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (7 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-09 22:06 ` Chang S. Bae
2025-03-07 16:41 ` [PATCH v3 10/10] x86/fpu/xstate: Warn if CET supervisor state is detected in normal fpstate Chao Gao
2025-03-07 19:09 ` [PATCH v3 00/10] Introduce CET supervisor state support Dave Hansen
10 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
From: Yang Weijiang <weijiang.yang@intel.com>
Define a new XFEATURE_MASK_KERNEL_DYNAMIC mask to specify the features
that can be optionally enabled by kernel components. This is similar to
XFEATURE_MASK_USER_DYNAMIC in that it contains optional xfeatures that
can allows the FPU buffer to be dynamically sized. The difference is that
the KERNEL variant contains supervisor features and will be enabled by
kernel components that need them, and not directly by the user. Currently
it's used by KVM to configure guest dedicated fpstate for calculating
the xfeature and fpstate storage size etc.
Kernel dynamic features are enabled for the guest FPU and disabled for
the kernel FPU, effectively making them guest-only features.
Set XFEATURE_CET_KERNEL as the first kernel dynamic feature, as it is
required only by the guest FPU for the upcoming CET virtualization
support in KVM.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
I am tempted to rename XFEATURE_MASK_KERNEL_DYNAMIC to
XFEATURE_MASK_GUEST_ONLY. But I am not sure if this was discussed
and rejected.
---
arch/x86/include/asm/fpu/xstate.h | 5 ++++-
arch/x86/kernel/fpu/xstate.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 8990cf381bef..f342715d204b 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -42,9 +42,12 @@
#define XFEATURE_MASK_USER_RESTORE \
(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
-/* Features which are dynamically enabled for a process on request */
+/* Features which are dynamically enabled per userspace request */
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
+/* Features which are dynamically enabled per kernel side request */
+#define XFEATURE_MASK_KERNEL_DYNAMIC XFEATURE_MASK_CET_KERNEL
+
/* All currently supported supervisor features */
#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 12613ebdbb5d..e5284e67dfec 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -826,6 +826,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
/* Clean out dynamic features from default */
fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+ fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_KERNEL_DYNAMIC;
fpu_user_cfg.default_features = fpu_user_cfg.max_features;
fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 10/10] x86/fpu/xstate: Warn if CET supervisor state is detected in normal fpstate
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (8 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set Chao Gao
@ 2025-03-07 16:41 ` Chao Gao
2025-03-07 19:09 ` [PATCH v3 00/10] Introduce CET supervisor state support Dave Hansen
10 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-07 16:41 UTC (permalink / raw)
To: chao.gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
From: Yang Weijiang <weijiang.yang@intel.com>
CET supervisor state bit is __ONLY__ enabled for guest fpstate, i.e.,
never for normal kernel fpstate. The bit is set when guest FPU config
is initialized.
For normal fpstate, the bit should have been removed when initializes
kernel FPU config settings, WARN_ONCE() if kernel detects normal fpstate
xfeatures contains CET supervisor state bit before xsaves operation.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
arch/x86/kernel/fpu/xstate.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index aa16f1a1bbcf..3df135a7d8bd 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -209,6 +209,8 @@ static inline void os_xsave(struct fpstate *fpstate)
WARN_ON_FPU(!alternatives_patched);
xfd_validate_state(fpstate, mask, false);
+ WARN_ON_FPU(!fpstate->is_guest && (mask & XFEATURE_MASK_CET_KERNEL));
+
XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
/* We should never fault when copying to a kernel buffer: */
--
2.46.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container
2025-03-07 16:41 ` [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container Chao Gao
@ 2025-03-07 17:48 ` Dave Hansen
2025-03-08 2:44 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 17:48 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/25 08:41, Chao Gao wrote:
> The xfeatures field in struct fpu_guest is designed to track the enabled
> xfeatures for guest FPUs. However, during allocation in
> fpu_alloc_guest_fpstate(), gfpu->xfeatures is initialized to
> fpu_user_cfg.default_features, while the corresponding
> fpstate->xfeatures is set to fpu_kernel_cfg.default_features
>
> Correct the mismatch to avoid confusion.
>
> Note this mismatch does not cause any functional issues. The
> gfpu->xfeatures is checked in fpu_enable_guest_xfd_features() to
> verify if XFD features are already enabled:
>
> xfeatures &= ~guest_fpu->xfeatures;
> if (!xfeatures)
> return 0;
>
> It gets updated in fpstate_realloc() after enabling some XFD features:
>
> guest_fpu->xfeatures |= xfeatures;
>
> So, backport is not needed.
I don't have any great suggestions for improving this, but I just don't
seem to find this changelog compelling. I can't put my finger on it, though.
I think I'd find it more convincing if you argued what the *CORRECT*
value is and why rather than just arguing for consistency with a random
value. I also don't get the pivot over the XFD for explaining why it is
harmless. XFD isn't even used in most cases, so I'd find a justification
separate from XFD more compelling.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-07 16:41 ` [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation Chao Gao
@ 2025-03-07 17:53 ` Dave Hansen
2025-03-08 2:56 ` Chao Gao
2025-03-07 21:37 ` Chang S. Bae
1 sibling, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 17:53 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/25 08:41, Chao Gao wrote:
> Note that this issue does not cause any functional problems because the
> guest fpstate is allocated using vmalloc(), which aligns the size to a
> full page, providing enough space for all existing supervisor components.
> On Emerald Rapids CPUs, the guest fpstate after this correction is ~2880
> bytes.
How about we move up the fpstate pointers at allocation time so they
just scrape the end of the vmalloc buffer? Basically, move the
page-alignment padding to the beginning of the first page instead of the
end of the last page.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration
2025-03-07 16:41 ` [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration Chao Gao
@ 2025-03-07 18:06 ` Dave Hansen
2025-03-08 3:00 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 18:06 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
On 3/7/25 08:41, Chao Gao wrote:
> + fpu_guest_cfg.max_features = fpu_kernel_cfg.max_features;
> + fpu_guest_cfg.default_features = fpu_guest_cfg.max_features;
> + fpu_guest_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
Is just copying and pasting the 'fpu_kernel_cfg' initialization really
the best way to explain what's going on?
Let's say you just did:
fpu_guest_cfg = fpu_kernel_cfg;
?
That would explicitly tell the reader that they're equal.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg
2025-03-07 16:41 ` [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg Chao Gao
@ 2025-03-07 18:14 ` Dave Hansen
2025-03-08 3:14 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 18:14 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/25 08:41, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Use the new fpu_guest_cfg to initialize guest permissions.
Background, please.
What are the guest permissions currently set to? Why does it need to change?
> Note fpu_guest_cfg and fpu_kernel_cfg remain the same for now. So there
> is no functional change.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> [Gao Chao: Extrace this from the previous patch ]
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> arch/x86/kernel/fpu/core.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index b0c1ef40d105..d7ae684adbad 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -534,8 +534,15 @@ void fpstate_reset(struct fpu *fpu)
> fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
> fpu->perm.__state_size = fpu_kernel_cfg.default_size;
> fpu->perm.__user_state_size = fpu_user_cfg.default_size;
> - /* Same defaults for guests */
> - fpu->guest_perm = fpu->perm;
> +
> + /* Guest permission settings */
> + fpu->guest_perm.__state_perm = fpu_guest_cfg.default_features;
> + fpu->guest_perm.__state_size = fpu_guest_cfg.default_size;
> + /*
> + * Set guest's __user_state_size to fpu_user_cfg.default_size so that
> + * existing uAPIs can still work.
> + */
I suspect that readers here will understand that this line:
> + fpu->guest_perm.__user_state_size = fpu_user_cfg.default_size;
> }
means "Set guest's __user_state_size to fpu_user_cfg.default_size". The
comment basically just literally restates what the code does. That part
of the comment doesn't add value.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support
2025-03-07 16:41 ` [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support Chao Gao
@ 2025-03-07 18:39 ` Dave Hansen
2025-03-08 3:24 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 18:39 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
On 3/7/25 08:41, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> To support CET virtualization, KVM needs the kernel to save/restore CET
> supervisor xstate in guest FPUs when switching between guest and host
> FPUs.
>
> Add CET supervisor xstate (i.e., XFEATURE_CET_KERNEL) support. Both the
> guest FPU and the kernel FPU will allocate memory for the new xstate.
> For the guest FPU, the xstate remains unused until the upcoming CET
> virtualization is added to KVM. For the kernel FPU, the xstate is unused
> until CET_S is enabled within the kernel.
>
> Note CET_S may or may not be enabled within the kernel, so always
> allocating memory for XFEATURE_CET_KERNEL could potentially waste some
> XSAVE buffer space. If necessary, this issue can be addressed by making
> XFEATURE_CET_KERNEL a guest-only feature.
I feel like these changelogs are long but say very little.
This patch *WASTES* resources. Granted, it's only for a single patch,
but it's totally not obvious.
Could you work on tightening down the changelog, please?
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -55,7 +55,7 @@ static const char *xfeature_names[] =
> "Protection Keys User registers",
> "PASID state",
> "Control-flow User registers",
> - "Control-flow Kernel registers (unused)",
> + "Control-flow Kernel registers",
This should probably be:
> + "Control-flow Kernel registers (KVM only)",
or something similar for now. XFEATURE_CET_KERNEL is *VERY* different
from all of the other features and it's silly to pretend that it's the same.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config
2025-03-07 16:41 ` [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config Chao Gao
@ 2025-03-07 18:41 ` Dave Hansen
2025-03-08 3:38 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 18:41 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
Maxim Levitsky
On 3/7/25 08:41, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Use fpu_guest_cfg to initialize the guest fpstate and the guest FPU pseduo
> container.
>
> The user_* fields remain unchanged for compatibility with KVM uAPIs.
>
> Inline the logic of __fpstate_reset() to directly utilize fpu_guest_cfg.
Why? Seriously, why? Why would you just inline it? Could you please
revisit the moment when you decided to do this? Please go back to that
moment and try to unlearn whatever propensity you have for taking this path.
There are two choices: make the existing function work for guests, or
add a new guest-only reset function.
Just an an example:
static void __fpstate_reset(struct fpstate *fpstate,
struct fpu_state_config *kernel_cfg,
u64 xfd)
{
/* Initialize sizes and feature masks */
fpstate->size = kernel_cfg->default_size;
fpstate->xfeatures = kernel_cfg->default_features;
/* Some comment about why user states don't vary */
fpstate->user_size = fpu_user_cfg.default_size;
fpstate->user_xfeatures = fpu_user_cfg.default_features;
fpstate->xfd = xfd;
}
Then you have two call sites:
__fpstate_reset(fpstate, &fpu_guest_cfg, 0);
and
__fpstate_reset(fpu->fpstate, &fpu_kernel_cfg,
init_fpstate.xfd);
What does this tell you?
It clearly lays out that to reset an fpstate, you need a specific kernel
config. That kernel config is (can be) different for guests.
Refactoring the code as you go along is not optional. It's a requirement.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 00/10] Introduce CET supervisor state support
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
` (9 preceding siblings ...)
2025-03-07 16:41 ` [PATCH v3 10/10] x86/fpu/xstate: Warn if CET supervisor state is detected in normal fpstate Chao Gao
@ 2025-03-07 19:09 ` Dave Hansen
2025-03-18 15:24 ` Chao Gao
10 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2025-03-07 19:09 UTC (permalink / raw)
To: Chao Gao, tglx, x86, seanjc, pbonzini, linux-kernel, kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/25 08:41, Chao Gao wrote:
> case |IA32_XSS[12] | Space | RFBM[12] | Drop%
> -----+-------------+-------+----------+------
> 1 | 0 | None | 0 | 0.0%
> 2 | 1 | None | 0 | 0.2%
> 3 | 1 | 24B? | 1 | 0.2%
So, 0.2% is still, what, dozens of cycles? Are you sure that it really
takes the CPU dozens of cycles to skip over the feature during XSAVE?
If it really turns out to be this measurable, we should probably follow
up with the folks that implement XSAVE and see what's going on under the
covers.
On a separate note, I was bugging Thomas a bit on IRC. His memory was
that the AMX-era FPU rework only expected KVM to support user features.
You might want to dig through the history a bit and see if _that_ was
ever properly addressed because that would change the problem you're
trying to solve.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-07 16:41 ` [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation Chao Gao
2025-03-07 17:53 ` Dave Hansen
@ 2025-03-07 21:37 ` Chang S. Bae
2025-03-08 2:49 ` Chao Gao
1 sibling, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-07 21:37 UTC (permalink / raw)
To: Chao Gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/2025 8:41 AM, Chao Gao wrote:
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 6166a928d3f5..adc34914634e 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -218,7 +218,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
> struct fpstate *fpstate;
> unsigned int size;
>
> - size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
> + size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
> fpstate = vzalloc(size);
> if (!fpstate)
> return false;
BTW, did you ever base this series on the tip/master branch? The fix has
already been merged there:
1937e18cc3cf ("x86/fpu: Fix guest FPU state buffer allocation size")
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container
2025-03-07 17:48 ` Dave Hansen
@ 2025-03-08 2:44 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 2:44 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp
On Fri, Mar 07, 2025 at 09:48:25AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> The xfeatures field in struct fpu_guest is designed to track the enabled
>> xfeatures for guest FPUs. However, during allocation in
>> fpu_alloc_guest_fpstate(), gfpu->xfeatures is initialized to
>> fpu_user_cfg.default_features, while the corresponding
>> fpstate->xfeatures is set to fpu_kernel_cfg.default_features
>>
>> Correct the mismatch to avoid confusion.
>>
>> Note this mismatch does not cause any functional issues. The
>> gfpu->xfeatures is checked in fpu_enable_guest_xfd_features() to
>> verify if XFD features are already enabled:
>>
>> xfeatures &= ~guest_fpu->xfeatures;
>> if (!xfeatures)
>> return 0;
>>
>> It gets updated in fpstate_realloc() after enabling some XFD features:
>>
>> guest_fpu->xfeatures |= xfeatures;
>>
>> So, backport is not needed.
>
>I don't have any great suggestions for improving this, but I just don't
>seem to find this changelog compelling. I can't put my finger on it, though.
>
>I think I'd find it more convincing if you argued what the *CORRECT*
>value is and why rather than just arguing for consistency with a random
>value. I also don't get the pivot over the XFD for explaining why it is
fpstate->xfeatures isn't a random value. It is the RFBM, right? see os_xsave().
The xfeatures in the guest FPU pesudo container (gfpu->xfeatures) is to track
enabled xfeatures of the guest FPU. I think "enabled" refers to RFBM because
only enabled features need save/restore. so gfpu->xfeatures should be
consistent with fpstate->xfeatures.
They become misaligned during allocation. Specifically, gfpu->xfeatures does
not track any supervisor features. Excluding all _supervisor_ features is
harmless, as the value is solely used to check if XFD features, which are all
_user_ features, are already enabled in fpu_enable_guest_xfd_features(). It
just causes confusion.
>harmless. XFD isn't even used in most cases, so I'd find a justification
>separate from XFD more compelling.
>
To me, there is a discrepancy between the field's name and the value it holds.
We have two options to fix it:
1. rename @xfeatures in struct fpu_guest to @user_xfeatures and update the
comment above to state the field only tracks enabled _user_ features.
2. ensure @xfeatures in struct fpu_guest matches fpstate->xfeatures
this patch implements the option #2.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-07 21:37 ` Chang S. Bae
@ 2025-03-08 2:49 ` Chao Gao
2025-03-09 22:06 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-08 2:49 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Fri, Mar 07, 2025 at 01:37:15PM -0800, Chang S. Bae wrote:
>On 3/7/2025 8:41 AM, Chao Gao wrote:
>>
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index 6166a928d3f5..adc34914634e 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -218,7 +218,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>> struct fpstate *fpstate;
>> unsigned int size;
>> - size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>> + size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>> fpstate = vzalloc(size);
>> if (!fpstate)
>> return false;
>
>BTW, did you ever base this series on the tip/master branch? The fix has
>already been merged there:
>
> 1937e18cc3cf ("x86/fpu: Fix guest FPU state buffer allocation size")
Thanks for the information. I will remove this patch.
This series is currently based on 6.14-rc5. I should have used the tip/master
branch as the base and will do so next time.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-07 17:53 ` Dave Hansen
@ 2025-03-08 2:56 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 2:56 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp
On Fri, Mar 07, 2025 at 09:53:40AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> Note that this issue does not cause any functional problems because the
>> guest fpstate is allocated using vmalloc(), which aligns the size to a
>> full page, providing enough space for all existing supervisor components.
>> On Emerald Rapids CPUs, the guest fpstate after this correction is ~2880
>> bytes.
>
>How about we move up the fpstate pointers at allocation time so they
>just scrape the end of the vmalloc buffer? Basically, move the
>page-alignment padding to the beginning of the first page instead of the
>end of the last page.
That sounds like a good way to detect similar errors and might be helpful for
all other vmalloc'ed buffers.
I can try to implement this for the fpstate pointers. The patch will be put
at the end of the series or even in a separate series.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration
2025-03-07 18:06 ` Dave Hansen
@ 2025-03-08 3:00 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 3:00 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp, Maxim Levitsky
On Fri, Mar 07, 2025 at 10:06:56AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> + fpu_guest_cfg.max_features = fpu_kernel_cfg.max_features;
>> + fpu_guest_cfg.default_features = fpu_guest_cfg.max_features;
>> + fpu_guest_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
>
>Is just copying and pasting the 'fpu_kernel_cfg' initialization really
>the best way to explain what's going on?
>
>Let's say you just did:
>
> fpu_guest_cfg = fpu_kernel_cfg;
>
>?
>
>That would explicitly tell the reader that they're equal.
>
Yes. This is much better.
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg
2025-03-07 18:14 ` Dave Hansen
@ 2025-03-08 3:14 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 3:14 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp
On Fri, Mar 07, 2025 at 10:14:19AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> From: Yang Weijiang <weijiang.yang@intel.com>
>>
>> Use the new fpu_guest_cfg to initialize guest permissions.
>
>Background, please.
>
>What are the guest permissions currently set to? Why does it need to change?
Ok. Will add:
Currently, fpu->guest_perm is copied from fpu->perm, which is derived from
fpu_kernel_cfg. To separate the guest FPU from the kernel FPU, switch to use
the new fpu_guest_cfg to initialize guest permissions. This ensures that any
future changes to fpu_guest_cfg will automatically update the guest permissions
The __user_state_size is tied to existing uAPIs, so it remains unchanged.
>
>> Note fpu_guest_cfg and fpu_kernel_cfg remain the same for now. So there
>> is no functional change.
>>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> [Gao Chao: Extrace this from the previous patch ]
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> arch/x86/kernel/fpu/core.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index b0c1ef40d105..d7ae684adbad 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -534,8 +534,15 @@ void fpstate_reset(struct fpu *fpu)
>> fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
>> fpu->perm.__state_size = fpu_kernel_cfg.default_size;
>> fpu->perm.__user_state_size = fpu_user_cfg.default_size;
>> - /* Same defaults for guests */
>> - fpu->guest_perm = fpu->perm;
>> +
>> + /* Guest permission settings */
>> + fpu->guest_perm.__state_perm = fpu_guest_cfg.default_features;
>> + fpu->guest_perm.__state_size = fpu_guest_cfg.default_size;
>> + /*
>> + * Set guest's __user_state_size to fpu_user_cfg.default_size so that
>> + * existing uAPIs can still work.
>> + */
>
>I suspect that readers here will understand that this line:
>
>> + fpu->guest_perm.__user_state_size = fpu_user_cfg.default_size;
>> }
>
>means "Set guest's __user_state_size to fpu_user_cfg.default_size". The
>comment basically just literally restates what the code does. That part
>of the comment doesn't add value.
Will drop this useless comment.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support
2025-03-07 18:39 ` Dave Hansen
@ 2025-03-08 3:24 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 3:24 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp, Maxim Levitsky
On Fri, Mar 07, 2025 at 10:39:47AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> From: Yang Weijiang <weijiang.yang@intel.com>
>>
>> To support CET virtualization, KVM needs the kernel to save/restore CET
>> supervisor xstate in guest FPUs when switching between guest and host
>> FPUs.
>>
>> Add CET supervisor xstate (i.e., XFEATURE_CET_KERNEL) support. Both the
>> guest FPU and the kernel FPU will allocate memory for the new xstate.
>> For the guest FPU, the xstate remains unused until the upcoming CET
>> virtualization is added to KVM. For the kernel FPU, the xstate is unused
>> until CET_S is enabled within the kernel.
>>
>> Note CET_S may or may not be enabled within the kernel, so always
>> allocating memory for XFEATURE_CET_KERNEL could potentially waste some
>> XSAVE buffer space. If necessary, this issue can be addressed by making
>> XFEATURE_CET_KERNEL a guest-only feature.
>
>I feel like these changelogs are long but say very little.
>
>This patch *WASTES* resources. Granted, it's only for a single patch,
>but it's totally not obvious.
>
>Could you work on tightening down the changelog, please?
ok. will update the changelog to:
To support CET virtualization, KVM needs the kernel to save and restore the CET
supervisor xstate in guest FPUs when switching between guest and host FPUs.
Add CET supervisor xstate support in preparation for the upcoming CET
virtualization in KVM.
Currently, kernel FPUs will not utilize the CET supervisor xstate, resulting in
some wasted XSAVE buffer space (24 Bytes) for all kernel FPUs.
>
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -55,7 +55,7 @@ static const char *xfeature_names[] =
>> "Protection Keys User registers",
>> "PASID state",
>> "Control-flow User registers",
>> - "Control-flow Kernel registers (unused)",
>> + "Control-flow Kernel registers",
>
>This should probably be:
>
>> + "Control-flow Kernel registers (KVM only)",
>
>or something similar for now. XFEATURE_CET_KERNEL is *VERY* different
>from all of the other features and it's silly to pretend that it's the same.
Agreed. Should "KVM only" tag be added in the next patch, where CET supervisor
xstate becomes a guest-only feature?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config
2025-03-07 18:41 ` Dave Hansen
@ 2025-03-08 3:38 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-08 3:38 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp, Maxim Levitsky
On Fri, Mar 07, 2025 at 10:41:49AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> From: Yang Weijiang <weijiang.yang@intel.com>
>>
>> Use fpu_guest_cfg to initialize the guest fpstate and the guest FPU pseduo
>> container.
>>
>> The user_* fields remain unchanged for compatibility with KVM uAPIs.
>>
>> Inline the logic of __fpstate_reset() to directly utilize fpu_guest_cfg.
>
>Why? Seriously, why? Why would you just inline it? Could you please
>revisit the moment when you decided to do this? Please go back to that
>moment and try to unlearn whatever propensity you have for taking this path.
Thanks for this suggestion.
>
>There are two choices: make the existing function work for guests, or
>add a new guest-only reset function.
>
>Just an an example:
>
>static void __fpstate_reset(struct fpstate *fpstate,
> struct fpu_state_config *kernel_cfg,
> u64 xfd)
>{
> /* Initialize sizes and feature masks */
> fpstate->size = kernel_cfg->default_size;
> fpstate->xfeatures = kernel_cfg->default_features;
>
> /* Some comment about why user states don't vary */
> fpstate->user_size = fpu_user_cfg.default_size;
> fpstate->user_xfeatures = fpu_user_cfg.default_features;
>
> fpstate->xfd = xfd;
>}
>
>Then you have two call sites:
>
> __fpstate_reset(fpstate, &fpu_guest_cfg, 0);
>and
> __fpstate_reset(fpu->fpstate, &fpu_kernel_cfg,
> init_fpstate.xfd);
>
>What does this tell you?
>
>It clearly lays out that to reset an fpstate, you need a specific kernel
>config. That kernel config is (can be) different for guests.
>
>Refactoring the code as you go along is not optional. It's a requirement.
Got it. I was actually tempted to refactor __fpstate_reset() while preparing
the v3. I considered two options:
1. Move "fpstate->is_guest = true" before calling __fpstate_reset() and use it
within the function to select the right config.
2. Add a boolean parameter to __fpstate_reset() to indicate whether it is
operating on a guest fpstate.
I dislike both of them. So I gave up and left it as-is.
Your version looks good. I will incorporate it in the next version.
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-08 2:49 ` Chao Gao
@ 2025-03-09 22:06 ` Chang S. Bae
2025-03-10 1:33 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-09 22:06 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/2025 6:49 PM, Chao Gao wrote:
> On Fri, Mar 07, 2025 at 01:37:15PM -0800, Chang S. Bae wrote:
>> On 3/7/2025 8:41 AM, Chao Gao wrote:
>>>
>>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>>> index 6166a928d3f5..adc34914634e 100644
>>> --- a/arch/x86/kernel/fpu/core.c
>>> +++ b/arch/x86/kernel/fpu/core.c
>>> @@ -218,7 +218,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>>> struct fpstate *fpstate;
>>> unsigned int size;
>>> - size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>>> + size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>>> fpstate = vzalloc(size);
>>> if (!fpstate)
>>> return false;
>>
>> BTW, did you ever base this series on the tip/master branch? The fix has
>> already been merged there:
>>
>> 1937e18cc3cf ("x86/fpu: Fix guest FPU state buffer allocation size")
>
> Thanks for the information. I will remove this patch.
But, I think there is a fallout that someone should follow up:
The merged patch ensures size consistency between
fpu_alloc_guest_fpstate() and fpstate_realloc(), maintaining a
consistent reference to the kernel buffer size. However, within
fpu_alloc_guest_fpstate(), fpu_guest->xfeatures should also be adjusted
accordingly for consistency. Instead of referencing fpu_user_cfg, it
should reference fpu_kernel_cfg.
Thanks,
CHang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-07 16:41 ` [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set Chao Gao
@ 2025-03-09 22:06 ` Chang S. Bae
2025-03-10 3:49 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-09 22:06 UTC (permalink / raw)
To: Chao Gao, tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel,
kvm
Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/7/2025 8:41 AM, Chao Gao wrote:
>
> Define a new XFEATURE_MASK_KERNEL_DYNAMIC mask to specify the features
> that can be optionally enabled by kernel components. This is similar to
> XFEATURE_MASK_USER_DYNAMIC in that it contains optional xfeatures that
> can allows the FPU buffer to be dynamically sized. The difference is that
> the KERNEL variant contains supervisor features and will be enabled by
> kernel components that need them, and not directly by the user. Currently
> it's used by KVM to configure guest dedicated fpstate for calculating
> the xfeature and fpstate storage size etc.
>
> Kernel dynamic features are enabled for the guest FPU and disabled for
> the kernel FPU, effectively making them guest-only features.
>
> Set XFEATURE_CET_KERNEL as the first kernel dynamic feature, as it is
> required only by the guest FPU for the upcoming CET virtualization
> support in KVM.
When introducing user dynamic features, AMX required a large state, so
buffer reallocation for expansion was deferred until it was actually
used. This introduction was associated with introducing a permission
mechanism, which was expected to be requested by userspace.
For VCPU tasks, the userspace component (QEMU) requests permission [1],
and buffer expansion then follows based on the exposed CPUID
determination [2].
Now, regarding the new kernel dynamic features, I’m unsure whether this
changelog or anything else sufficiently describes its semantics
distintively. It appears that both permission grant and buffer
allocation for the kernel dynamic feature occur at VCPU allocation time.
However, this model differs from the deferred buffer expansion model for
user dynamic features.
If the kernel dynamic feature model were to follow the same deferred
reallocation approach as user dynamic features, buffer reallocation
would be expected. In that case, I'd also question whether fpu_guest_cfg
is truly necessary.
VCPU allocation could still rely on fpu_kernel_cfg, and fpu->guest_perm
could be extrapolated from fpu->perm or fpu_kernel_cfg. Then,
reallocation could proceed as usual based on the permission, extending
fpu_enable_guest_xfd_features(), possibly renaming it to
fpu_enable_dynamic_features().
That said, this is a relatively small state. Even if the intent was to
introduce a new semantic model distinct from user dynamic features, it
should be clearly documented to avoid confusion.
On the other hand, if the goal is rather to establish a new approach for
handling a previously nonexistent set of guest-exclusive features, then
the current approach remains somewhat convoluted without clear
descriptions. Perhaps, I'm missing something.
Thanks,
Chang
[1] https://github.com/qemu/qemu/blob/master/target/i386/kvm/kvm.c#L6395
[2]
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/cpuid.c#n195
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-09 22:06 ` Chang S. Bae
@ 2025-03-10 1:33 ` Chao Gao
2025-03-10 5:21 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-10 1:33 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Sun, Mar 09, 2025 at 03:06:19PM -0700, Chang S. Bae wrote:
>On 3/7/2025 6:49 PM, Chao Gao wrote:
>> On Fri, Mar 07, 2025 at 01:37:15PM -0800, Chang S. Bae wrote:
>> > On 3/7/2025 8:41 AM, Chao Gao wrote:
>> > >
>> > > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> > > index 6166a928d3f5..adc34914634e 100644
>> > > --- a/arch/x86/kernel/fpu/core.c
>> > > +++ b/arch/x86/kernel/fpu/core.c
>> > > @@ -218,7 +218,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>> > > struct fpstate *fpstate;
>> > > unsigned int size;
>> > > - size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>> > > + size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
>> > > fpstate = vzalloc(size);
>> > > if (!fpstate)
>> > > return false;
>> >
>> > BTW, did you ever base this series on the tip/master branch? The fix has
>> > already been merged there:
>> >
>> > 1937e18cc3cf ("x86/fpu: Fix guest FPU state buffer allocation size")
>>
>> Thanks for the information. I will remove this patch.
>
>But, I think there is a fallout that someone should follow up:
>
>The merged patch ensures size consistency between fpu_alloc_guest_fpstate()
>and fpstate_realloc(), maintaining a consistent reference to the kernel
>buffer size. However, within fpu_alloc_guest_fpstate(), fpu_guest->xfeatures
>should also be adjusted accordingly for consistency. Instead of referencing
>fpu_user_cfg, it should reference fpu_kernel_cfg.
This is fixed by the patch 3.
>
>Thanks,
>CHang
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-09 22:06 ` Chang S. Bae
@ 2025-03-10 3:49 ` Chao Gao
2025-03-10 5:20 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-10 3:49 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
>When introducing user dynamic features, AMX required a large state, so buffer
>reallocation for expansion was deferred until it was actually used. This
>introduction was associated with introducing a permission mechanism, which
>was expected to be requested by userspace.
>
>For VCPU tasks, the userspace component (QEMU) requests permission [1], and
>buffer expansion then follows based on the exposed CPUID determination [2].
>
>Now, regarding the new kernel dynamic features, I’m unsure whether this
>changelog or anything else sufficiently describes its semantics distintively.
>It appears that both permission grant and buffer allocation for the kernel
>dynamic feature occur at VCPU allocation time. However, this model differs
>from the deferred buffer expansion model for user dynamic features.
>
>If the kernel dynamic feature model were to follow the same deferred
>reallocation approach as user dynamic features, buffer reallocation would be
>expected. In that case, I'd also question whether fpu_guest_cfg is truly
>necessary.
>
>VCPU allocation could still rely on fpu_kernel_cfg, and fpu->guest_perm could
>be extrapolated from fpu->perm or fpu_kernel_cfg. Then, reallocation could
>proceed as usual based on the permission, extending
>fpu_enable_guest_xfd_features(), possibly renaming it to
>fpu_enable_dynamic_features().
>
>That said, this is a relatively small state.
Yes, there's no need to make the guest FPU dynamically sized for the CET
supervisor state, as it is only 24 bytes.
XFEATURE_MASK_KERNEL_DYNAMIC is a misnomer. It is misleading readers into
thinking it involves permission requests and dynamic sizing, similar to
XFEATURE_MASK_USER_DYNAMIC
>Even if the intent was to
>introduce a new semantic model distinct from user dynamic features, it should
>be clearly documented to avoid confusion.
The goal isn't to add a new semantic model for dynamic features.
>
>On the other hand, if the goal is rather to establish a new approach for
>handling a previously nonexistent set of guest-exclusive features, then the
Yes. This is the goal of this patch.
>current approach remains somewhat convoluted without clear descriptions.
>Perhaps, I'm missing something.
Do you mean this patch is "somewhat convoluted"? or the whole series?
I am assuming you meant this series as this patch itself is quite small.
Here is how this series is organized:
Patches 1–4 : Cleanups and preparatory fixes.
Patches 5–7 : Introduce fpu_guest_cfg to formalize guest FPU configuration.
Patch 8 (Primary Goal): Add CET supervisor state support.
Patches 9–10 : make CET supserviosr state a guest-only feature to save XSAVE buffer
space for non-guest FPUs (placed at the end for easier review/drop).
I believe the "somewhat convoluted" impression comes from the introduction of
fpu_guest_cfg. But as I alluded to in patch 5's changelog, fpu_guest_cfg
actually simplifies the architecture rather than adding complexity, with
minimal overhead, i.e., a single global config. It was suggested by Sean [1].
In my view, it offers three benefits:
- Readability: Removes ambiguity in fpu_alloc_guest_fpstate() by initializing
the guest FPU with its own config.
- Extensibility: Supports clean addition of guest-only features (e.g., CET
supervisor state) or potentially kernel-only features (e.g.,
PASID, which is not used by guest FPUs)
- Robustness: Prevent issues like those addressed by patches 3/4.
It is possible to make some features guest-only without fpu_guest_cfg, but
doing so would make fpu_alloc_guest_fpstate() a bit difficult to understand.
See [2].
[1]: https://lore.kernel.org/kvm/ZTf5wPKXuHBQk0AN@google.com/
[2]: https://lore.kernel.org/kvm/20230914063325.85503-8-weijiang.yang@intel.com/
>
>Thanks,
>Chang
>
>[1] https://github.com/qemu/qemu/blob/master/target/i386/kvm/kvm.c#L6395
>[2] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/cpuid.c#n195
Thanks for these references.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-10 3:49 ` Chao Gao
@ 2025-03-10 5:20 ` Chang S. Bae
2025-03-10 5:53 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-10 5:20 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/9/2025 8:49 PM, Chao Gao wrote:
>
> It was suggested by Sean [1].
...
> [1]: https://lore.kernel.org/kvm/ZTf5wPKXuHBQk0AN@google.com/
But, you're defining a kernel "dynamic" feature while introducing a
"guest-only" xfeature concept. Both seem to be mixed together with this
patch. Why not call it as a guest-only feature? That's what Sean was
suggesting, no?
"I would much prefer to avoid the whole "dynamic" thing and instead make
CET explicitly guest-only. E.g. fpu_kernel_guest_only_xfeatures? Or
even better if it doesn't cause weirdness elsewhere, a dedicated
fpu_guest_cfg. For me at least, a fpu_guest_cfg would make it easier to
understand what all is going on."
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-10 1:33 ` Chao Gao
@ 2025-03-10 5:21 ` Chang S. Bae
2025-03-10 7:06 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-10 5:21 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/9/2025 6:33 PM, Chao Gao wrote:
>
> This is fixed by the patch 3.
Well, take a look at your changelog — the context is quite different. I
don't think it'S mergeable without a rewrite. Also, this should be a
standalone fix to complement the recent tip-tree changes.
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-10 5:20 ` Chang S. Bae
@ 2025-03-10 5:53 ` Chao Gao
2025-03-11 12:27 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-10 5:53 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Sun, Mar 09, 2025 at 10:20:47PM -0700, Chang S. Bae wrote:
>On 3/9/2025 8:49 PM, Chao Gao wrote:
>>
>> It was suggested by Sean [1].
>...
>> [1]: https://lore.kernel.org/kvm/ZTf5wPKXuHBQk0AN@google.com/
>
>But, you're defining a kernel "dynamic" feature while introducing a
>"guest-only" xfeature concept. Both seem to be mixed together with this
>patch. Why not call it as a guest-only feature? That's what Sean was
>suggesting, no?
Yes. I agree that we should call it as a guest-only feature. That's also why I
included a note in this patch below the "---" to seek feedback on the naming:
I am tempted to rename XFEATURE_MASK_KERNEL_DYNAMIC to
XFEATURE_MASK_GUEST_ONLY. But I am not sure if this was discussed
and rejected.
Thanks for confirming that the renaming is necessary.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-10 5:21 ` Chang S. Bae
@ 2025-03-10 7:06 ` Chao Gao
2025-03-10 17:33 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-10 7:06 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Sun, Mar 09, 2025 at 10:21:12PM -0700, Chang S. Bae wrote:
>On 3/9/2025 6:33 PM, Chao Gao wrote:
>>
>> This is fixed by the patch 3.
>
>Well, take a look at your changelog — the context is quite different. I don't
>think it'S mergeable without a rewrite. Also, this should be a standalone fix
>to complement the recent tip-tree changes.
Should patch 2 be posted separately?
Because current tip/master branch has:
gfpu->xfeatures = fpu_user_cfg.default_features;
gfpu->perm = fpu_user_cfg.default_features;
Adjusting only fpu_guest->features raises the question: why isn't gfpu->perm
adjusted as well?
If patch 2 should go first, I don't think it's necessary to post patches 2-3
separately as maintainers can easily pick up patches 1-3 when they are in good
shape.
Regarding the changelog, I am uncertain what's quite different in the context.
It seems both you and I are talking about the inconsistency between
gfpu->xfeatures and fpstate->xfeatures. Did I miss something obvious?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-10 7:06 ` Chao Gao
@ 2025-03-10 17:33 ` Chang S. Bae
2025-03-11 12:09 ` Chao Gao
0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-03-10 17:33 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/10/2025 12:06 AM, Chao Gao wrote:
>
> Should patch 2 be posted separately?
gfpu->perm has been somewhat overlooked, as __xstate_request_perm() does
not update this field. However, I see that as a separate issue. The
options are either to fix it so that it remains in sync with
fpu->guest_perm consistently or to remove it entirely, as you proposed,
if it has no actual use.
There hasn’t been any relevant change that would justify a quick
follow-up like the other case. So, I'd assume it as part of this series.
But yes, I think gfpu->perm is also going to be
fpu_kernel_cfg.default_features at the moment.
> Regarding the changelog, I am uncertain what's quite different in the context.
> It seems both you and I are talking about the inconsistency between
> gfpu->xfeatures and fpstate->xfeatures. Did I miss something obvious?
I saw a distinction between inconsistencies within a function and
inconsistencies across functions.
Stepping back a bit, the approach for defining the VCPU xfeature set was
originally intended to include only user features, but it now appears
somewhat inconsistent:
(a) In fpu_alloc_guest_fpstate(), fpu_user_cfg is used.
(b) However, __fpstate_reset() references fpu_kernel_cfg to set storage
attributes.
(c) Additionally, fpu->guest_perm takes fpu_kernel_cfg, which affects
fpstate_realloc().
To maintain a consistent VCPU xfeature set, (b) and (c) should be corrected.
Alternatively, the VCPU xfeature set could be reconsidered to align with
how other tasks handle it. This might offer better maintainability
across functions. In that case, another option would be simply updating
fpu_alloc_guest_fpstate().
The recent tip-tree change seems somewhat incomplete — perhaps in
hindsight. If following up on this, the changelog should specifically
address inconsistencies within a function. I saw this as a way to
solidify an upcoming change, where addressing it sooner rather than
later would be beneficial.
In patch 3, you've pointed out the inconsistency between (a) and (b),
which is a valid point. However, the fix is only partial and does not
fully address the issue either. Moreover, the patch does not reference
the recent tip-tree change as it didn't have any context at that time.
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-10 17:33 ` Chang S. Bae
@ 2025-03-11 12:09 ` Chao Gao
2025-03-12 1:03 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-11 12:09 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Mon, Mar 10, 2025 at 10:33:20AM -0700, Chang S. Bae wrote:
>On 3/10/2025 12:06 AM, Chao Gao wrote:
>>
>> Should patch 2 be posted separately?
>
>gfpu->perm has been somewhat overlooked, as __xstate_request_perm() does not
>update this field. However, I see that as a separate issue. The options are
>either to fix it so that it remains in sync with fpu->guest_perm consistently
>or to remove it entirely, as you proposed, if it has no actual use.
>
>There hasn’t been any relevant change that would justify a quick follow-up
>like the other case. So, I'd assume it as part of this series.
>
>But yes, I think gfpu->perm is also going to be
>fpu_kernel_cfg.default_features at the moment.
>
>> Regarding the changelog, I am uncertain what's quite different in the context.
>> It seems both you and I are talking about the inconsistency between
>> gfpu->xfeatures and fpstate->xfeatures. Did I miss something obvious?
>
>I saw a distinction between inconsistencies within a function and
>inconsistencies across functions.
>
>Stepping back a bit, the approach for defining the VCPU xfeature set was
>originally intended to include only user features, but it now appears
>somewhat inconsistent:
>
>(a) In fpu_alloc_guest_fpstate(), fpu_user_cfg is used.
>(b) However, __fpstate_reset() references fpu_kernel_cfg to set storage
> attributes.
>(c) Additionally, fpu->guest_perm takes fpu_kernel_cfg, which affects
> fpstate_realloc().
>
>To maintain a consistent VCPU xfeature set, (b) and (c) should be corrected.
>
>Alternatively, the VCPU xfeature set could be reconsidered to align with how
>other tasks handle it. This might offer better maintainability across
>functions. In that case, another option would be simply updating
>fpu_alloc_guest_fpstate().
>
>The recent tip-tree change seems somewhat incomplete — perhaps in hindsight.
>If following up on this, the changelog should specifically address
>inconsistencies within a function. I saw this as a way to solidify an
>upcoming change, where addressing it sooner rather than later would be
>beneficial.
>
>In patch 3, you've pointed out the inconsistency between (a) and (b), which
>is a valid point. However, the fix is only partial and does not fully address
>the issue either. Moreover, the patch does not reference the recent tip-tree
>change as it didn't have any context at that time.
Hi Chang,
All of the above makes sense to me. Thank you for your review and suggestions.
I will update the changelog to reference the recent change in tip-tree and post
it separately.
One thing I'm not entirely clear on is "the fix is only partial". I assume I
need to update gfpu->perm to reference fpu_kernel_cfg to complement the fix.
Is that correct?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-10 5:53 ` Chao Gao
@ 2025-03-11 12:27 ` Chao Gao
2025-03-12 1:03 ` Chang S. Bae
0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-03-11 12:27 UTC (permalink / raw)
To: Chang S. Bae
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On Mon, Mar 10, 2025 at 01:53:09PM +0800, Chao Gao wrote:
>On Sun, Mar 09, 2025 at 10:20:47PM -0700, Chang S. Bae wrote:
>>On 3/9/2025 8:49 PM, Chao Gao wrote:
>>>
>>> It was suggested by Sean [1].
>>...
>>> [1]: https://lore.kernel.org/kvm/ZTf5wPKXuHBQk0AN@google.com/
>>
>>But, you're defining a kernel "dynamic" feature while introducing a
>>"guest-only" xfeature concept. Both seem to be mixed together with this
>>patch. Why not call it as a guest-only feature? That's what Sean was
>>suggesting, no?
>
>Yes. I agree that we should call it as a guest-only feature. That's also why I
>included a note in this patch below the "---" to seek feedback on the naming:
>
> I am tempted to rename XFEATURE_MASK_KERNEL_DYNAMIC to
> XFEATURE_MASK_GUEST_ONLY. But I am not sure if this was discussed
> and rejected.
>
>Thanks for confirming that the renaming is necessary.
Hi Chang,
I dug through the history and found a discussion about the naming at:
https://lore.kernel.org/all/893ac578-baaf-4f4f-96ee-e012dfc073a8@intel.com/#t
I think I should revise the changelog to call out why 'DYNAMIC' is preferred
over 'GUEST' and reference that discussion.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation
2025-03-11 12:09 ` Chao Gao
@ 2025-03-12 1:03 ` Chang S. Bae
0 siblings, 0 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-03-12 1:03 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/11/2025 5:09 AM, Chao Gao wrote:
>
> One thing I'm not entirely clear on is "the fix is only partial". I assume I
> need to update gfpu->perm to reference fpu_kernel_cfg to complement the fix.
> Is that correct?
Yes, I think so.
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
2025-03-11 12:27 ` Chao Gao
@ 2025-03-12 1:03 ` Chang S. Bae
0 siblings, 0 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-03-12 1:03 UTC (permalink / raw)
To: Chao Gao
Cc: tglx, dave.hansen, x86, seanjc, pbonzini, linux-kernel, kvm,
peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp
On 3/11/2025 5:27 AM, Chao Gao wrote:
>
> I dug through the history and found a discussion about the naming at:
>
> https://lore.kernel.org/all/893ac578-baaf-4f4f-96ee-e012dfc073a8@intel.com/#t
Okay, it looks like you've finally captured the full context!
> I think I should revise the changelog to call out why 'DYNAMIC' is preferred
> over 'GUEST' and reference that discussion.
You might want to take some time to think about some code comments when
defining the feature. I think 'independent feature' is a good example to
look at:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/fpu/xstate.h#n53
Thanks,
Chang
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 00/10] Introduce CET supervisor state support
2025-03-07 19:09 ` [PATCH v3 00/10] Introduce CET supervisor state support Dave Hansen
@ 2025-03-18 15:24 ` Chao Gao
0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-03-18 15:24 UTC (permalink / raw)
To: Dave Hansen
Cc: tglx, x86, seanjc, pbonzini, linux-kernel, kvm, peterz,
rick.p.edgecombe, weijiang.yang, john.allen, bp
On Fri, Mar 07, 2025 at 11:09:42AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> case |IA32_XSS[12] | Space | RFBM[12] | Drop%
>> -----+-------------+-------+----------+------
>> 1 | 0 | None | 0 | 0.0%
>> 2 | 1 | None | 0 | 0.2%
>> 3 | 1 | 24B? | 1 | 0.2%
>
>So, 0.2% is still, what, dozens of cycles? Are you sure that it really
>takes the CPU dozens of cycles to skip over the feature during XSAVE?
>
>If it really turns out to be this measurable, we should probably follow
>up with the folks that implement XSAVE and see what's going on under the
>covers.
I reran the performance tests and observed a run-to-run variation of 0.4%
to 0.7%. So, I don't think there is any measurable performance difference.
I will update the performance statements in the cover letter.
>
>On a separate note, I was bugging Thomas a bit on IRC. His memory was
>that the AMX-era FPU rework only expected KVM to support user features.
>You might want to dig through the history a bit and see if _that_ was
>ever properly addressed because that would change the problem you're
>trying to solve.
I went through the email discussions and found only one relevant thread:
https://lore.kernel.org/kvm/87wnmf66m5.ffs@tglx/#t
where Thomas mentioned that guest_perm would be set as follows:
guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
If implemented this way, KVM would only support user features. However, the
committed change is:
980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")
In this change, fpu->guest_perm is copied from fpu->perm:
+ /* Same defaults for guests */
+ fpu->guest_perm = fpu->perm;
There are indeed some issues with enabling supervisor features for guest
FPUs, but they have been addressed by recent changes in the tip-tree ([1],
[2]) and patch 1 of this series.
[1]: https://lore.kernel.org/all/20250218141045.85201-1-stanspas@amazon.de/
[2]: https://lore.kernel.org/all/20250317140613.1761633-1-chao.gao@intel.com/
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2025-03-18 15:24 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-07 16:41 [PATCH v3 00/10] Introduce CET supervisor state support Chao Gao
2025-03-07 16:41 ` [PATCH v3 01/10] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
2025-03-07 16:41 ` [PATCH v3 02/10] x86/fpu/xstate: Drop @perm from guest pseudo FPU container Chao Gao
2025-03-07 16:41 ` [PATCH v3 03/10] x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container Chao Gao
2025-03-07 17:48 ` Dave Hansen
2025-03-08 2:44 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 04/10] x86/fpu/xstate: Correct guest fpstate size calculation Chao Gao
2025-03-07 17:53 ` Dave Hansen
2025-03-08 2:56 ` Chao Gao
2025-03-07 21:37 ` Chang S. Bae
2025-03-08 2:49 ` Chao Gao
2025-03-09 22:06 ` Chang S. Bae
2025-03-10 1:33 ` Chao Gao
2025-03-10 5:21 ` Chang S. Bae
2025-03-10 7:06 ` Chao Gao
2025-03-10 17:33 ` Chang S. Bae
2025-03-11 12:09 ` Chao Gao
2025-03-12 1:03 ` Chang S. Bae
2025-03-07 16:41 ` [PATCH v3 05/10] x86/fpu/xstate: Introduce guest FPU configuration Chao Gao
2025-03-07 18:06 ` Dave Hansen
2025-03-08 3:00 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 06/10] x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg Chao Gao
2025-03-07 18:14 ` Dave Hansen
2025-03-08 3:14 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 07/10] x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config Chao Gao
2025-03-07 18:41 ` Dave Hansen
2025-03-08 3:38 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 08/10] x86/fpu/xstate: Add CET supervisor xfeature support Chao Gao
2025-03-07 18:39 ` Dave Hansen
2025-03-08 3:24 ` Chao Gao
2025-03-07 16:41 ` [PATCH v3 09/10] x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set Chao Gao
2025-03-09 22:06 ` Chang S. Bae
2025-03-10 3:49 ` Chao Gao
2025-03-10 5:20 ` Chang S. Bae
2025-03-10 5:53 ` Chao Gao
2025-03-11 12:27 ` Chao Gao
2025-03-12 1:03 ` Chang S. Bae
2025-03-07 16:41 ` [PATCH v3 10/10] x86/fpu/xstate: Warn if CET supervisor state is detected in normal fpstate Chao Gao
2025-03-07 19:09 ` [PATCH v3 00/10] Introduce CET supervisor state support Dave Hansen
2025-03-18 15:24 ` Chao Gao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox