public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/7] Introduce CET supervisor state support
@ 2025-04-10  7:24 Chao Gao
  2025-04-10  7:24 ` [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
                   ` (7 more replies)
  0 siblings, 8 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Aruna Ramakrishna, Dave Hansen,
	Eric Biggers, H. Peter Anvin, Ingo Molnar, Maxim Levitsky,
	Mitchell Levy, Samuel Holland, Stanislav Spassov, Uros Bizjak,
	Vignesh Balasubramanian, Zhao Liu

Dear maintainers and reviewers,

I would appreciate your feedback on two unresolved issues:

1. Management of guest default features and sizes (patch 3)

The CET supervisor state will be added as a guest-only feature, causing
guest FPU default features and sizes to diverge from those of non-guest
FPUs. Calculating guest FPU default features and sizes at runtime based on
defaults in fpu_kernel/user_cfg is undesirable due to unnecessary runtime
overhead and the need for calculations in multiple places, such as during
vCPU fpstate allocation and FPU permission initialization.

The preferred approach is to calculate guest default features and sizes
at boot time and then cache them.

Versions v3 and earlier used a guest-specific fpu_state_config to hold
guest defaults, but the other fields in the struct were unused, suggesting
that fpu_guest_cfg might not be the best fit. In v4, inspired by the
existing independent_features field, I switched to adding guest defaults to
fpu_kernel/user_cfg. During the review, Chang suggested a cleaner and more
structured alternative by introducing a dedicated struct for guest
defaults. This v5 follows Chang's suggestion.

2. Naming of the guest-only feature mask (patch 6)

Instead of hard-coding the CET supervisor state in various places, we
establish an infrastructure that introduces a mask to abstract the features
necessary for guest FPUs, which are not enabled by non-guest FPUs.

The mask was previously named XFEATURE_MASK_KERNEL_DYNAMIC. It was slightly
preferred because it reflected the XSAVE buffer state (some buffers have
the features while others do not) rather than being tied to its
usage - KVM. However, this name led to confusion. In this v5, it has been
renamed to XFEATURE_MASK_GUEST_SUPERVISOR. Please refer to patch 6 for the
decision-making process.

Aside from these issues, I believe this series is in good shape. However,
since most of the patches have not yet received Reviewed-by/Acked-by tags,
please take a look at them as well.

Also, credit to Chang and Rick for their help with this series.

== Changelog ==
v4->v5:
 - Reorder patches to ensure new features come after cleanups and bug fixes
   (Chang, Dave, Ingo)
 - Add a dedicated structure for default xfeatures and sizes for guest FPUs
   (Chang)
 - Provide a detailed explanation for choosing XFEATURE_MASK_GUEST_SUPERVISOR
   (Chang)
 - Summarize the long history of the CET FPU series and the rationale behind
   the "guest-only" approach in the cover-letter and the last patch
 - Other minor changes; please see each patch for details. (Chang)
 - v4: https://lore.kernel.org/kvm/20250318153316.1970147-1-chao.gao@intel.com/

== Background ==

CET defines two register states: CET user, which includes user-mode control
registers, and CET supervisor, which consists of shadow-stack pointers for
privilege levels 0-2.

Current kernels disable shadow stacks in kernel mode, making the CET
supervisor state unused and eliminating the need for context switching.

== Problem ==

To virtualize CET for guests, KVM must accurately emulate hardware
behavior. A key challenge arises because there is no CPUID flag to indicate
that shadow stack is supported only in user mode. Therefore, KVM cannot
assume guests will not enable shadow stacks in kernel mode and must
preserve the CET supervisor state of vCPUs.

== Solution ==

An initial proposal to manually save and restore CET supervisor states
using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
its impact on KVM's ABI. Instead, leveraging the kernel's FPU
infrastructure for context switching was favored [1].

The main question then became whether to enable the CET supervisor state
globally for all processes or restrict it to vCPU processes. This decision
involves a trade-off between a 24-byte XSTATE buffer waste for all non-vCPU
processes and approximately 100 lines of code complexity in the kernel [2].
The agreed approach is to first try this optimal solution [3], i.e.,
restricting the CET supervisor state to guest FPUs only and eliminating
unnecessary space waste.

Key changes in this series are:

1) Fix existing issue regarding enabling guest supervisor states support.
2) Add default features and size for guest FPUs.
3) Add infrastructure to support guest-only features.
4) Add CET supervisor state as the first guest-only feature.

With the series in place, guest FPUs have xstate_bv[12] == xcomp_bv[12] == 1
and CET supervisor state is saved/reloaded when xsaves/xrstors executes on
guest FPUs. non-guest FPUs have xstate_bv[12] == xcomp_bv[12] == 0, then
CET supervisor state is not saved/restored.

== Performance ==

We measured context-switching performance with the benchmark [4] in following
three cases.

case 1: the baseline. i.e., this series isn't applied
case 2: baseline + this series. CET-S space is allocated for guest fpu only.
case 3: baseline + allocate CET-S space for all tasks. Hardware init
        optimization avoids writing out CET-S space on each XSAVES.

The performance differences in the three cases are very small and fall within the
run-to-run variation.

[1]: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/
[2]: https://lore.kernel.org/kvm/1c2fd06e-2e97-4724-80ab-8695aa4334e7@intel.com/
[3]: https://lore.kernel.org/kvm/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/
[4]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c

Chao Gao (4):
  x86/fpu: Drop @perm from guest pseudo FPU container
  x86/fpu/xstate: Differentiate default features for host and guest FPUs
  x86/fpu: Initialize guest FPU permissions from guest defaults
  x86/fpu: Initialize guest fpstate and FPU pseudo container from guest
    defaults

Sean Christopherson (1):
  x86/fpu/xstate: Always preserve non-user xfeatures/flags in
    __state_perm

Yang Weijiang (2):
  x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set
  x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only
    feature

 arch/x86/include/asm/fpu/types.h  | 81 +++++++++++++++++++++++++------
 arch/x86/include/asm/fpu/xstate.h |  9 ++--
 arch/x86/kernel/fpu/core.c        | 42 ++++++++++------
 arch/x86/kernel/fpu/xstate.c      | 60 +++++++++++++++++------
 arch/x86/kernel/fpu/xstate.h      |  5 ++
 5 files changed, 149 insertions(+), 48 deletions(-)


base-commit: ed91ad084ef9fa49f6c6dfef2cb10c12ce3786a6
-- 
2.46.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-18 20:50   ` Chang S. Bae
  2025-04-10  7:24 ` [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container Chao Gao
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Maxim Levitsky, Chao Gao, Ingo Molnar,
	Dave Hansen, H. Peter Anvin, Samuel Holland, Mitchell Levy,
	Vignesh Balasubramanian, Aruna Ramakrishna

From: Sean Christopherson <seanjc@google.com>

When granting userspace or a KVM guest access to an xfeature, preserve the
entity's existing supervisor and software-defined permissions as tracked
by __state_perm, i.e. use __state_perm to track *all* permissions even
though all supported supervisor xfeatures are granted to all FPUs and
FPU_GUEST_PERM_LOCKED disallows changing permissions.

Effectively clobbering supervisor permissions results in inconsistent
behavior, as xstate_get_group_perm() will report supervisor features for
process that do NOT request access to dynamic user xfeatures, whereas any
and all supervisor features will be absent from the set of permissions for
any process that is granted access to one or more dynamic xfeatures (which
right now means AMX).

The inconsistency isn't problematic because fpu_xstate_prctl() already
strips out everything except user xfeatures:

        case ARCH_GET_XCOMP_PERM:
                /*
                 * Lockless snapshot as it can also change right after the
                 * dropping the lock.
                 */
                permitted = xstate_get_host_group_perm();
                permitted &= XFEATURE_MASK_USER_SUPPORTED;
                return put_user(permitted, uptr);

        case ARCH_GET_XCOMP_GUEST_PERM:
                permitted = xstate_get_guest_group_perm();
                permitted &= XFEATURE_MASK_USER_SUPPORTED;
                return put_user(permitted, uptr);

and similarly KVM doesn't apply the __state_perm to supervisor states
(kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):

        case 0xd: {
                u64 permitted_xcr0 = kvm_get_filtered_xcr0();
                u64 permitted_xss = kvm_caps.supported_xss;

But if KVM in particular were to ever change, dropping supervisor
permissions would result in subtle bugs in KVM's reporting of supported
CPUID settings.  And the above behavior also means that having supervisor
xfeatures in __state_perm is correctly handled by all users.

Dropping supervisor permissions also creates another landmine for KVM.  If
more dynamic user xfeatures are ever added, requesting access to multiple
xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
second invocation of __xstate_request_perm() computing the wrong ksize, as
as the mask passed to xstate_calculate_size() would not contain *any*
supervisor features.

Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE
permissions") fudged around the size issue for userspace FPUs, but for
reasons unknown skipped guest FPUs.  Lack of a fix for KVM "works" only
because KVM doesn't yet support virtualizing features that have supervisor
xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant
xfeatures.

Simply extending the hack-a-fix for guests would temporarily solve the
ksize issue, but wouldn't address the inconsistency issue and would leave
another lurking pitfall for KVM.  KVM support for virtualizing CET will
likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not
be set in xfeatures_mask_supervisor() and would again be dropped when
granting access to dynamic xfeatures.

Note, the existing clobbering behavior is rather subtle.  The @permitted
parameter to __xstate_request_perm() comes from:

	permitted = xstate_get_group_perm(guest);

which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
where __state_perm is initialized to:

        fpu->perm.__state_perm          = fpu_kernel_cfg.default_features;

and copied to the guest side of things:

	/* Same defaults for guests */
	fpu->guest_perm = fpu->perm;

fpu_kernel_cfg.default_features contains everything except the dynamic
xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:

        fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
        fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;

When __xstate_request_perm() restricts the local "mask" variable to
compute the user state size:

	mask &= XFEATURE_MASK_USER_SUPPORTED;
	usize = xstate_calculate_size(mask, false);

it subtly overwrites the target __state_perm with "mask" containing only
user xfeatures:

	perm = guest ? &fpu->guest_perm : &fpu->perm;
	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
	WRITE_ONCE(perm->__state_perm, mask);

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: John Allen <john.allen@amd.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/include/asm/fpu/types.h |  8 +++++---
 arch/x86/kernel/fpu/xstate.c     | 18 +++++++++++-------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..46cc263f9f4f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -407,9 +407,11 @@ struct fpu_state_perm {
 	/*
 	 * @__state_perm:
 	 *
-	 * This bitmap indicates the permission for state components, which
-	 * are available to a thread group. The permission prctl() sets the
-	 * enabled state bits in thread_group_leader()->thread.fpu.
+	 * This bitmap indicates the permission for state components
+	 * available to a thread group, including both user and supervisor
+	 * components and software-defined bits like FPU_GUEST_PERM_LOCKED.
+	 * The permission prctl() sets the enabled state bits in
+	 * thread_group_leader()->thread.fpu.
 	 *
 	 * All run time operations use the per thread information in the
 	 * currently active fpu.fpstate which contains the xfeature masks
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 46c45e2f2a5a..6f10f5490022 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1642,16 +1642,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 	if ((permitted & requested) == requested)
 		return 0;
 
-	/* Calculate the resulting kernel state size */
+	/*
+	 * Calculate the resulting kernel state size.  Note, @permitted also
+	 * contains supervisor xfeatures even though supervisor are always
+	 * permitted for kernel and guest FPUs, and never permitted for user
+	 * FPUs.
+	 */
 	mask = permitted | requested;
-	/* Take supervisor states into account on the host */
-	if (!guest)
-		mask |= xfeatures_mask_supervisor();
 	ksize = xstate_calculate_size(mask, compacted);
 
-	/* Calculate the resulting user state size */
-	mask &= XFEATURE_MASK_USER_SUPPORTED;
-	usize = xstate_calculate_size(mask, false);
+	/*
+	 * Calculate the resulting user state size.  Take care not to clobber
+	 * the supervisor xfeatures in the new mask!
+	 */
+	usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
 
 	if (!guest) {
 		ret = validate_sigaltstack(usize);
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
  2025-04-10  7:24 ` [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-18 20:51   ` Chang S. Bae
  2025-04-10  7:24 ` [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs Chao Gao
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Maxim Levitsky, Ingo Molnar,
	Dave Hansen, H. Peter Anvin, Samuel Holland, Mitchell Levy,
	Stanislav Spassov, Eric Biggers

Remove @perm from the guest pseudo FPU container. The field is
initialized during allocation and never used later.

Rename fpu_init_guest_permissions() to show that its sole purpose is to
lock down guest permissions.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v5: drop the useless fpu_guest argument (Chang)
---
 arch/x86/include/asm/fpu/types.h | 7 -------
 arch/x86/kernel/fpu/core.c       | 7 ++-----
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 46cc263f9f4f..9f9ed406b179 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -526,13 +526,6 @@ struct fpu_guest {
 	 */
 	u64				xfeatures;
 
-	/*
-	 * @perm:			xfeature bitmap of features which are
-	 *				permitted to be enabled for the guest
-	 *				vCPU.
-	 */
-	u64				perm;
-
 	/*
 	 * @xfd_err:			Save the guest value.
 	 */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1b734a9ff088..28ad7ec56eaa 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -202,7 +202,7 @@ void fpu_reset_from_exception_fixup(void)
 #if IS_ENABLED(CONFIG_KVM)
 static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
 
-static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
+static void fpu_lock_guest_permissions(void)
 {
 	struct fpu_state_perm *fpuperm;
 	u64 perm;
@@ -218,8 +218,6 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
 	WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED);
 
 	spin_unlock_irq(&current->sighand->siglock);
-
-	gfpu->perm = perm & ~FPU_GUEST_PERM_LOCKED;
 }
 
 bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
@@ -240,7 +238,6 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 
 	gfpu->fpstate		= fpstate;
 	gfpu->xfeatures		= fpu_kernel_cfg.default_features;
-	gfpu->perm		= fpu_kernel_cfg.default_features;
 
 	/*
 	 * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
@@ -255,7 +252,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
 		gfpu->uabi_size = fpu_user_cfg.default_size;
 
-	fpu_init_guest_permissions(gfpu);
+	fpu_lock_guest_permissions();
 
 	return true;
 }
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
  2025-04-10  7:24 ` [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
  2025-04-10  7:24 ` [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-24 22:52   ` Edgecombe, Rick P
  2025-05-01 14:24   ` Chang S. Bae
  2025-04-10  7:24 ` [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults Chao Gao
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Maxim Levitsky, Samuel Holland, Mitchell Levy,
	Eric Biggers, Stanislav Spassov, Vignesh Balasubramanian,
	Aruna Ramakrishna

Currently, guest and host FPUs share the same default features. However,
the CET supervisor xstate is the first feature that needs to be enabled
exclusively for guest FPUs. Enabling it for host FPUs leads to a waste of
24 bytes in the XSAVE buffer.

To support "guest-only" features, add a new structure to hold the
default features and sizes for guest FPUs to clearly differentiate them
from those for host FPUs.

An alternative approach is adding a guest_only_xfeatures member to
fpu_kernel_cfg and adding two helper functions to calculate the guest
default xfeatures and size. However, calculating these defaults at runtime
would introduce unnecessary overhead.

Note that, for now, the default features for guest and host FPUs remain the
same. This will change in a follow-up patch once guest permissions, default
xfeatures, and fpstate size are all converted to use the guest defaults.

Suggested-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v5:
Add a new vcpu_fpu_config instead of adding new members to
fpu_state_config (Chang)
Extract a helper to set default values (Chang)
---
 arch/x86/include/asm/fpu/types.h | 43 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/core.c       |  1 +
 arch/x86/kernel/fpu/xstate.c     | 29 ++++++++++++++++-----
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 9f9ed406b179..769155a0401a 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -542,6 +542,48 @@ struct fpu_guest {
 	struct fpstate			*fpstate;
 };
 
+/*
+ * FPU state configuration data for fpu_guest.
+ * Initialized at boot time. Read only after init.
+ */
+struct vcpu_fpu_config {
+	/*
+	 * @size:
+	 *
+	 * The default size of the register state buffer in guest FPUs.
+	 * Includes all supported features except independent managed
+	 * features and features which have to be requested by user space
+	 * before usage.
+	 */
+	unsigned int size;
+
+	/*
+	 * @user_size:
+	 *
+	 * The default UABI size of the register state buffer in guest
+	 * FPUs. Includes all supported user features except independent
+	 * managed features and features which have to be requested by
+	 * user space before usage.
+	 */
+	unsigned int user_size;
+
+	/*
+	 * @features:
+	 *
+	 * The default supported features bitmap in guest FPUs. Does not
+	 * include independent managed features and features which have to
+	 * be requested by user space before usage.
+	 */
+	u64 features;
+
+	/*
+	 * @user_features:
+	 *
+	 * Same as @features except only user xfeatures are included.
+	 */
+	u64 user_features;
+};
+
 /*
  * FPU state configuration data. Initialized at boot time. Read only after init.
  */
@@ -597,5 +639,6 @@ struct fpu_state_config {
 
 /* FPU state configuration information */
 extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
+extern struct vcpu_fpu_config guest_default_cfg;
 
 #endif /* _ASM_X86_FPU_TYPES_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 28ad7ec56eaa..25f13cc8ad92 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -36,6 +36,7 @@ DEFINE_PER_CPU(u64, xfd_state);
 /* The FPU state configuration data for kernel and user space */
 struct fpu_state_config	fpu_kernel_cfg __ro_after_init;
 struct fpu_state_config fpu_user_cfg __ro_after_init;
+struct vcpu_fpu_config guest_default_cfg __ro_after_init;
 
 /*
  * Represents the initial FPU state. It's mostly (but not completely) zeroes,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6f10f5490022..cdd1e51fb93e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -738,6 +738,11 @@ static int __init init_xstate_size(void)
 	fpu_user_cfg.default_size =
 		xstate_calculate_size(fpu_user_cfg.default_features, false);
 
+	guest_default_cfg.size =
+		xstate_calculate_size(guest_default_cfg.features, compacted);
+	guest_default_cfg.user_size =
+		xstate_calculate_size(guest_default_cfg.user_features, false);
+
 	return 0;
 }
 
@@ -766,6 +771,22 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
 	fpstate_reset(&current->thread.fpu);
 }
 
+static void __init init_default_features(u64 kernel_max_features, u64 user_max_features)
+{
+	u64 kfeatures = kernel_max_features;
+	u64 ufeatures = user_max_features;
+
+	/* Default feature sets should not include dynamic xfeatures. */
+	kfeatures &= ~XFEATURE_MASK_USER_DYNAMIC;
+	ufeatures &= ~XFEATURE_MASK_USER_DYNAMIC;
+
+	fpu_kernel_cfg.default_features = kfeatures;
+	fpu_user_cfg.default_features   = ufeatures;
+
+	guest_default_cfg.features      = kfeatures;
+	guest_default_cfg.user_features = ufeatures;
+}
+
 /*
  * Enable and initialize the xsave feature.
  * Called once per system bootup.
@@ -837,12 +858,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	fpu_user_cfg.max_features = fpu_kernel_cfg.max_features;
 	fpu_user_cfg.max_features &= XFEATURE_MASK_USER_SUPPORTED;
 
-	/* Clean out dynamic features from default */
-	fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
-	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
-
-	fpu_user_cfg.default_features = fpu_user_cfg.max_features;
-	fpu_user_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
+	/* Now, given maximum feature set, determine default values */
+	init_default_features(fpu_kernel_cfg.max_features, fpu_user_cfg.max_features);
 
 	/* Store it for paranoia check at the end */
 	xfeatures = fpu_kernel_cfg.max_features;
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
                   ` (2 preceding siblings ...)
  2025-04-10  7:24 ` [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-30 15:45   ` Edgecombe, Rick P
  2025-04-10  7:24 ` [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container " Chao Gao
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Eric Biggers, Stanislav Spassov

Currently, fpu->guest_perm is copied from fpu->perm, which is derived from
fpu_kernel_cfg.default_features.

Guest defaults were introduced to differentiate the features and sizes of
host and guest FPUs. Copying guest FPU permissions from the host will lead
to inconsistencies between the guest default features and permissions.

Initialize guest FPU permissions from guest defaults instead of host
defaults. This ensures that any changes to guest default features are
automatically reflected in guest permissions, which in turn guarantees
that fpstate_realloc() allocates a correctly sized XSAVE buffer for guest
FPUs.

Suggested-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kernel/fpu/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 25f13cc8ad92..e23e435b85c4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -543,8 +543,10 @@ void fpstate_reset(struct fpu *fpu)
 	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;
 	fpu->perm.__state_size		= fpu_kernel_cfg.default_size;
 	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
-	/* Same defaults for guests */
-	fpu->guest_perm = fpu->perm;
+
+	fpu->guest_perm.__state_perm	= guest_default_cfg.features;
+	fpu->guest_perm.__state_size	= guest_default_cfg.size;
+	fpu->guest_perm.__user_state_size = guest_default_cfg.user_size;
 }
 
 static inline void fpu_inherit_perms(struct fpu *dst_fpu)
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container from guest defaults
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
                   ` (3 preceding siblings ...)
  2025-04-10  7:24 ` [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-30 18:29   ` Edgecombe, Rick P
  2025-04-10  7:24 ` [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set Chao Gao
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Eric Biggers, Stanislav Spassov

fpu_alloc_guest_fpstate() currently uses host defaults to initialize guest
fpstate and pseudo containers. Guest defaults were introduced to
differentiate the features and sizes of host and guest FPUs. Switch to
using guest defaults instead.

Additionally, incorporate the initialization of indicators (is_valloc and
is_guest) into the newly added guest-specific reset function to centralize
the resetting of guest fpstate.

Suggested-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v5: init is_valloc/is_guest in the guest-specific reset function (Chang)
---
 arch/x86/kernel/fpu/core.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e23e435b85c4..f5593f6009a4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -201,7 +201,20 @@ void fpu_reset_from_exception_fixup(void)
 }
 
 #if IS_ENABLED(CONFIG_KVM)
-static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
+static void __guest_fpstate_reset(struct fpstate *fpstate, u64 xfd)
+{
+	/* Initialize sizes and feature masks */
+	fpstate->size		= guest_default_cfg.size;
+	fpstate->user_size	= guest_default_cfg.user_size;
+	fpstate->xfeatures	= guest_default_cfg.features;
+	fpstate->user_xfeatures	= guest_default_cfg.user_features;
+	fpstate->xfd		= xfd;
+
+	/* Initialize indicators to reflect properties of the fpstate */
+	fpstate->is_valloc	= true;
+	fpstate->is_guest	= true;
+}
+
 
 static void fpu_lock_guest_permissions(void)
 {
@@ -226,19 +239,18 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	struct fpstate *fpstate;
 	unsigned int size;
 
-	size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
+	size = guest_default_cfg.size + ALIGN(offsetof(struct fpstate, regs), 64);
+
 	fpstate = vzalloc(size);
 	if (!fpstate)
 		return false;
 
 	/* Leave xfd to 0 (the reset value defined by spec) */
-	__fpstate_reset(fpstate, 0);
+	__guest_fpstate_reset(fpstate, 0);
 	fpstate_init_user(fpstate);
-	fpstate->is_valloc	= true;
-	fpstate->is_guest	= true;
 
 	gfpu->fpstate		= fpstate;
-	gfpu->xfeatures		= fpu_kernel_cfg.default_features;
+	gfpu->xfeatures		= guest_default_cfg.features;
 
 	/*
 	 * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
@@ -250,8 +262,8 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	 * all features that can expand the uABI size must be opt-in.
 	 */
 	gfpu->uabi_size		= sizeof(struct kvm_xsave);
-	if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
-		gfpu->uabi_size = fpu_user_cfg.default_size;
+	if (WARN_ON_ONCE(guest_default_cfg.user_size > gfpu->uabi_size))
+		gfpu->uabi_size = guest_default_cfg.user_size;
 
 	fpu_lock_guest_permissions();
 
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
                   ` (4 preceding siblings ...)
  2025-04-10  7:24 ` [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container " Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-24 22:58   ` Edgecombe, Rick P
  2025-04-10  7:24 ` [PATCH v5 7/7] x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature Chao Gao
  2025-04-24 23:28 ` [PATCH v5 0/7] Introduce CET supervisor state support Edgecombe, Rick P
  7 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Maxim Levitsky, Mitchell Levy, Samuel Holland,
	Zhao Liu, Vignesh Balasubramanian, Aruna Ramakrishna, Uros Bizjak

From: Yang Weijiang <weijiang.yang@intel.com>

In preparation for upcoming CET virtualization support, the CET supervisor
state will be added as a "guest-only" feature, since it is required only by
KVM (i.e., guest FPUs). Establish the infrastructure for "guest-only"
features.

Define a new XFEATURE_MASK_GUEST_SUPERVISOR mask to specify features that
are enabled by default in guest FPUs but not in host FPUs. Specifically,
for any bit in this set, permission is granted and XSAVE space is allocated
during vCPU creation. Non-guest FPUs cannot enable guest-only features,
even dynamically, and no XSAVE space will be allocated for them.

The mask is currently empty, but this will be changed by a subsequent
patch.

Note that there is no plan to add "guest-only" user xfeatures, so the user
default features remain unchanged.

Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
v5: Explain in detail the reasoning behind the mask name choice below the
"---" separator line.

In previous versions, the mask was named "XFEATURE_MASK_SUPERVISOR_DYNAMIC"
Dave suggested this name [1], but he also noted, "I don't feel strongly about
it and I've said my piece. I won't NAK it one way or the other."

The term "dynamic" was initially preferred because it reflects the impact
on XSAVE buffers—some buffers accommodate dynamic features while others do
not. This naming allows for the introduction of dynamic features that are
not strictly "guest-only", offering flexibility beyond KVM.

However, using "dynamic" has led to confusion [2]. Chang pointed out that
permission granting and buffer allocation are actually static at VCPU
allocation, diverging from the model for user dynamic features. He also
questioned the rationale for introducing a kernel dynamic feature mask
while using it as a guest-only feature mask [3]. Moreover, Thomas remarked
that "the dynamic naming is really bad" [4]. Although his specific concerns
are unclear, we should be cautious about reinstating the "kernel dynamic
feature" naming.

Therefore, in v4, I renamed the mask to "XFEATURE_MASK_SUPERVISOR_GUEST"
and further refined it to "XFEATURE_MASK_GUEST_SUPERVISOR" in this v5.

[1]: https://lore.kernel.org/all/893ac578-baaf-4f4f-96ee-e012dfc073a8@intel.com/#t
[2]: https://lore.kernel.org/kvm/e15d1074-d5ec-431d-86e5-a58bc6297df8@intel.com/
[3]: https://lore.kernel.org/kvm/7bee70fd-b2b9-4466-a694-4bf3486b19c7@intel.com/
[4]: https://lore.kernel.org/all/87sg1owmth.ffs@nanos.tec.linutronix.de/
---
 arch/x86/include/asm/fpu/types.h  |  9 +++++----
 arch/x86/include/asm/fpu/xstate.h |  6 +++++-
 arch/x86/kernel/fpu/xstate.c      | 14 +++++++++++---
 arch/x86/kernel/fpu/xstate.h      |  5 +++++
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 769155a0401a..7494d732b296 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -600,8 +600,9 @@ struct fpu_state_config {
 	 * @default_size:
 	 *
 	 * The default size of the register state buffer. Includes all
-	 * supported features except independent managed features and
-	 * features which have to be requested by user space before usage.
+	 * supported features except independent managed features,
+	 * guest-only features and features which have to be requested by
+	 * user space before usage.
 	 */
 	unsigned int		default_size;
 
@@ -617,8 +618,8 @@ struct fpu_state_config {
 	 * @default_features:
 	 *
 	 * The default supported features bitmap. Does not include
-	 * independent managed features and features which have to
-	 * be requested by user space before usage.
+	 * independent managed features, guest-only features and features
+	 * which have to be requested by user space before usage.
 	 */
 	u64 default_features;
 	/*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..62768d2131ec 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -45,9 +45,13 @@
 /* Features which are dynamically enabled for a process on request */
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
+/* Supervisor features which are enabled only in guest FPUs */
+#define XFEATURE_MASK_GUEST_SUPERVISOR	0
+
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
-					    XFEATURE_MASK_CET_USER)
+					    XFEATURE_MASK_CET_USER | \
+					    XFEATURE_MASK_GUEST_SUPERVISOR)
 
 /*
  * A supervisor state component may not always contain valuable information,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index cdd1e51fb93e..c7db9f1407f5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -776,14 +776,22 @@ static void __init init_default_features(u64 kernel_max_features, u64 user_max_f
 	u64 kfeatures = kernel_max_features;
 	u64 ufeatures = user_max_features;
 
-	/* Default feature sets should not include dynamic xfeatures. */
-	kfeatures &= ~XFEATURE_MASK_USER_DYNAMIC;
+	/*
+	 * Default feature sets should not include dynamic and guest-only
+	 * xfeatures at all.
+	 */
+	kfeatures &= ~(XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_GUEST_SUPERVISOR);
 	ufeatures &= ~XFEATURE_MASK_USER_DYNAMIC;
 
 	fpu_kernel_cfg.default_features = kfeatures;
 	fpu_user_cfg.default_features   = ufeatures;
 
-	guest_default_cfg.features      = kfeatures;
+	/*
+	 * Ensure VCPU FPU container only reserves a space for guest-only
+	 * xfeatures. This distinction can save kernel memory by
+	 * maintaining a necessary amount of XSAVE buffer.
+	 */
+	guest_default_cfg.features      = kfeatures | xfeatures_mask_guest_supervisor();
 	guest_default_cfg.user_features = ufeatures;
 }
 
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 0fd34f53f025..8be3df4aa28b 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -61,6 +61,11 @@ static inline u64 xfeatures_mask_supervisor(void)
 	return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
 }
 
+static inline u64 xfeatures_mask_guest_supervisor(void)
+{
+	return fpu_kernel_cfg.max_features & XFEATURE_MASK_GUEST_SUPERVISOR;
+}
+
 static inline u64 xfeatures_mask_independent(void)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR))
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 7/7] x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
                   ` (5 preceding siblings ...)
  2025-04-10  7:24 ` [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set Chao Gao
@ 2025-04-10  7:24 ` Chao Gao
  2025-04-24 23:28 ` [PATCH v5 0/7] Introduce CET supervisor state support Edgecombe, Rick P
  7 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-10  7:24 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp,
	chang.seok.bae, xin3.li, Chao Gao, Maxim Levitsky, Ingo Molnar,
	Dave Hansen, H. Peter Anvin, Samuel Holland, Mitchell Levy,
	Zhao Liu, Aruna Ramakrishna, Vignesh Balasubramanian

From: Yang Weijiang <weijiang.yang@intel.com>

== Background ==

CET defines two register states: CET user, which includes user-mode control
registers, and CET supervisor, which consists of shadow-stack pointers for
privilege levels 0-2.

Current kernels disable shadow stacks in kernel mode, making the CET
supervisor state unused and eliminating the need for context switching.

== Problem ==

To virtualize CET for guests, KVM must accurately emulate hardware
behavior. A key challenge arises because there is no CPUID flag to indicate
that shadow stack is supported only in user mode. Therefore, KVM cannot
assume guests will not enable shadow stacks in kernel mode and must
preserve the CET supervisor state of vCPUs.

== Solution ==

An initial proposal to manually save and restore CET supervisor states
using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
its impact on KVM's ABI. Instead, leveraging the kernel's FPU
infrastructure for context switching was favored [1].

The main question then became whether to enable the CET supervisor state
globally for all processes or restrict it to vCPU processes. This decision
involves a trade-off between a 24-byte XSTATE buffer waste for all non-vCPU
processes and approximately 100 lines of code complexity in the kernel [2].
The agreed approach is to first try this optimal solution [3], i.e.,
restricting the CET supervisor state to guest FPUs only and eliminating
unnecessary space waste.

The guest-only xfeature infrastructure has already been added. Now,
introduce CET supervisor xstate support as the first guest-only feature
to prepare for the upcoming CET virtualization in KVM.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/ [1]
Link: https://lore.kernel.org/kvm/1c2fd06e-2e97-4724-80ab-8695aa4334e7@intel.com/ [2]
Link: https://lore.kernel.org/kvm/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/ [3]
---
v5:
Introduce CET supervisor xfeature directly as a guest-only feature, rather
than first introducing it in one patch and then converting it to guest-only
in a subsequent patch. (Chang)
Add new features after cleanups/bug fixes (Chang, Dave, Ingo)
Improve the commit message to follow the suggested
background-problem-solution pattern.
---
 arch/x86/include/asm/fpu/types.h  | 14 ++++++++++++--
 arch/x86/include/asm/fpu/xstate.h |  5 ++---
 arch/x86/kernel/fpu/xstate.c      |  5 ++++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7494d732b296..c9b83beb6d74 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -118,7 +118,7 @@ enum xfeature {
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
 	XFEATURE_CET_USER,
-	XFEATURE_CET_KERNEL_UNUSED,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -141,7 +141,7 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
-#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -266,6 +266,16 @@ struct cet_user_state {
 	u64 user_ssp;
 };
 
+/*
+ * State component 12 is Control-flow Enforcement supervisor states.
+ * This state includes SSP pointers for privilege levels 0 through 2.
+ */
+struct cet_supervisor_state {
+	u64 pl0_ssp;
+	u64 pl1_ssp;
+	u64 pl2_ssp;
+} __packed;
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 62768d2131ec..86070ac1c708 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -46,7 +46,7 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* Supervisor features which are enabled only in guest FPUs */
-#define XFEATURE_MASK_GUEST_SUPERVISOR	0
+#define XFEATURE_MASK_GUEST_SUPERVISOR	XFEATURE_MASK_CET_KERNEL
 
 /* All currently supported supervisor features */
 #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
@@ -78,8 +78,7 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
-					      XFEATURE_MASK_CET_KERNEL)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c7db9f1407f5..e12df668291c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -56,7 +56,7 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"Control-flow User registers",
-	"Control-flow Kernel registers (unused)",
+	"Control-flow Kernel registers (KVM only)",
 	"unknown xstate feature",
 	"unknown xstate feature",
 	"unknown xstate feature",
@@ -79,6 +79,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PKRU]				= X86_FEATURE_OSPKE,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
 	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
+	[XFEATURE_CET_KERNEL]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -369,6 +370,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
 	 XFEATURE_MASK_CET_USER |		\
+	 XFEATURE_MASK_CET_KERNEL |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -569,6 +571,7 @@ static bool __init check_xstate_against_struct(int nr)
 	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
 	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
 	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_CET_KERNEL: return XCHECK_SZ(sz, nr, struct cet_supervisor_state);
 	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
 	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
-- 
2.46.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm
  2025-04-10  7:24 ` [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
@ 2025-04-18 20:50   ` Chang S. Bae
  0 siblings, 0 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-04-18 20:50 UTC (permalink / raw)
  To: Chao Gao, x86, linux-kernel, kvm, tglx, dave.hansen, seanjc,
	pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Maxim Levitsky, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	Samuel Holland, Mitchell Levy, Vignesh Balasubramanian,
	Aruna Ramakrishna

On 4/10/2025 12:24 AM, Chao Gao wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> When granting userspace or a KVM guest access to an xfeature, preserve the
> entity's existing supervisor and software-defined permissions as tracked
> by __state_perm, i.e. use __state_perm to track *all* permissions even
> though all supported supervisor xfeatures are granted to all FPUs and
> FPU_GUEST_PERM_LOCKED disallows changing permissions.
> 
> Effectively clobbering supervisor permissions results in inconsistent
> behavior, as xstate_get_group_perm() will report supervisor features for
> process that do NOT request access to dynamic user xfeatures, whereas any
> and all supervisor features will be absent from the set of permissions for
> any process that is granted access to one or more dynamic xfeatures (which
> right now means AMX).
> 
> The inconsistency isn't problematic because fpu_xstate_prctl() already
> strips out everything except user xfeatures:
> 
>          case ARCH_GET_XCOMP_PERM:
>                  /*
>                   * Lockless snapshot as it can also change right after the
>                   * dropping the lock.
>                   */
>                  permitted = xstate_get_host_group_perm();
>                  permitted &= XFEATURE_MASK_USER_SUPPORTED;
>                  return put_user(permitted, uptr);
> 
>          case ARCH_GET_XCOMP_GUEST_PERM:
>                  permitted = xstate_get_guest_group_perm();
>                  permitted &= XFEATURE_MASK_USER_SUPPORTED;
>                  return put_user(permitted, uptr);
> 
> and similarly KVM doesn't apply the __state_perm to supervisor states
> (kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):
> 
>          case 0xd: {
>                  u64 permitted_xcr0 = kvm_get_filtered_xcr0();
>                  u64 permitted_xss = kvm_caps.supported_xss;
> 
> But if KVM in particular were to ever change, dropping supervisor
> permissions would result in subtle bugs in KVM's reporting of supported
> CPUID settings.  And the above behavior also means that having supervisor
> xfeatures in __state_perm is correctly handled by all users.
> 
> Dropping supervisor permissions also creates another landmine for KVM.  If
> more dynamic user xfeatures are ever added, requesting access to multiple
> xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
> second invocation of __xstate_request_perm() computing the wrong ksize, as
> as the mask passed to xstate_calculate_size() would not contain *any*
> supervisor features.
> 
> Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE
> permissions") fudged around the size issue for userspace FPUs, but for
> reasons unknown skipped guest FPUs.  Lack of a fix for KVM "works" only
> because KVM doesn't yet support virtualizing features that have supervisor
> xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant
> xfeatures.
> 
> Simply extending the hack-a-fix for guests would temporarily solve the
> ksize issue, but wouldn't address the inconsistency issue and would leave
> another lurking pitfall for KVM.  KVM support for virtualizing CET will
> likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not
> be set in xfeatures_mask_supervisor() and would again be dropped when
> granting access to dynamic xfeatures.
> 
> Note, the existing clobbering behavior is rather subtle.  The @permitted
> parameter to __xstate_request_perm() comes from:
> 
> 	permitted = xstate_get_group_perm(guest);
> 
> which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
> where __state_perm is initialized to:
> 
>          fpu->perm.__state_perm          = fpu_kernel_cfg.default_features;
> 
> and copied to the guest side of things:
> 
> 	/* Same defaults for guests */
> 	fpu->guest_perm = fpu->perm;
> 
> fpu_kernel_cfg.default_features contains everything except the dynamic
> xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:
> 
>          fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
>          fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
> 
> When __xstate_request_perm() restricts the local "mask" variable to
> compute the user state size:
> 
> 	mask &= XFEATURE_MASK_USER_SUPPORTED;
> 	usize = xstate_calculate_size(mask, false);
> 
> it subtly overwrites the target __state_perm with "mask" containing only
> user xfeatures:
> 
> 	perm = guest ? &fpu->guest_perm : &fpu->perm;
> 	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
> 	WRITE_ONCE(perm->__state_perm, mask);
> 
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: Weijiang Yang <weijiang.yang@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Chao Gao <chao.gao@intel.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: John Allen <john.allen@amd.com>
> Cc: kvm@vger.kernel.org
> Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Acked-by: Dave Hansen <dave.hansen@intel.com>

The code change looks reasonable to me.

While the changelog is a bit TL;DR, I understand and respect the intent 
to keep the full context, especially given the KVM maintainer's 
preference. So, feel free to add my tag:

     Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container
  2025-04-10  7:24 ` [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container Chao Gao
@ 2025-04-18 20:51   ` Chang S. Bae
  2025-04-18 20:54     ` Chang S. Bae
  2025-04-19  1:01     ` Chao Gao
  0 siblings, 2 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-04-18 20:51 UTC (permalink / raw)
  To: Chao Gao, x86, linux-kernel, kvm, tglx, dave.hansen, seanjc,
	pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Maxim Levitsky, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	Samuel Holland, Mitchell Levy, Stanislav Spassov, Eric Biggers

On 4/10/2025 12:24 AM, Chao Gao wrote:
> Remove @perm from the guest pseudo FPU container. The field is
> initialized during allocation and never used later.
> 
> Rename fpu_init_guest_permissions() to show that its sole purpose is to
> lock down guest permissions.
> 
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>

This patch appears to be new in V3, as I can see from the diff here:

https://github.com/ChangSeokBae/kernel/compare/xstate-scet-chao-v2...xstate-scet-chao-v3

However, I don’t see any relevant comment from Maxim on your V2 series. 
Unlike patch 1, this one doesn’t include a URL referencing the 
suggestion either -- so I suspect the Suggested-by tag might be incorrect.
> @@ -255,7 +252,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>   	if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
>   		gfpu->uabi_size = fpu_user_cfg.default_size;
>   
> -	fpu_init_guest_permissions(gfpu);
> +	fpu_lock_guest_permissions();

As a future improvement, you might consider updating this to:

     if (xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED)
         fpu_lock_guest_permissions();

Or, embed the check inside fpu_lock_guest_permissions():

     if (xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED)
         return;

But for this patch itself, the change looks good to me. Please feel free 
to add my tag:

     Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>

Thanks,
Chang

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container
  2025-04-18 20:51   ` Chang S. Bae
@ 2025-04-18 20:54     ` Chang S. Bae
  2025-04-19  1:01     ` Chao Gao
  1 sibling, 0 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-04-18 20:54 UTC (permalink / raw)
  To: Chao Gao, x86, linux-kernel, kvm, tglx, dave.hansen, seanjc,
	pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Maxim Levitsky, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	Samuel Holland, Mitchell Levy, Stanislav Spassov, Eric Biggers

On 4/18/2025 1:51 PM, Chang S. Bae wrote:
>      if (xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED)
>          fpu_lock_guest_permissions();

Sorry, this should be:

     if (!(xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED))
         fpu_lock_guest_permissions();

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container
  2025-04-18 20:51   ` Chang S. Bae
  2025-04-18 20:54     ` Chang S. Bae
@ 2025-04-19  1:01     ` Chao Gao
  1 sibling, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-19  1:01 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini,
	peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Maxim Levitsky, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	Samuel Holland, Mitchell Levy, Stanislav Spassov, Eric Biggers

On Fri, Apr 18, 2025 at 01:51:02PM -0700, Chang S. Bae wrote:
>On 4/10/2025 12:24 AM, Chao Gao wrote:
>> Remove @perm from the guest pseudo FPU container. The field is
>> initialized during allocation and never used later.
>> 
>> Rename fpu_init_guest_permissions() to show that its sole purpose is to
>> lock down guest permissions.
>> 
>> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
>
>This patch appears to be new in V3, as I can see from the diff here:
>
>https://github.com/ChangSeokBae/kernel/compare/xstate-scet-chao-v2...xstate-scet-chao-v3
>
>However, I don’t see any relevant comment from Maxim on your V2 series.
>Unlike patch 1, this one doesn’t include a URL referencing the suggestion
>either -- so I suspect the Suggested-by tag might be incorrect.

v3 was the version where I truly began refining the patches based on my
understanding, the historical discussion, and feedback on v2 [*]. While
reviewing the historical discussion, I found Maxim's suggestion to be
valuable:

https://lore.kernel.org/kvm/af972fe5981b9e7101b64de43c7be0a8cc165323.camel@redhat.com/

So, I implemented it in v3, but I should have included the link.

[*] v2 was simply a resend of v1
    https://lore.kernel.org/kvm/20241126101710.62492-1-chao.gao@intel.com/

>> @@ -255,7 +252,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>>   	if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
>>   		gfpu->uabi_size = fpu_user_cfg.default_size;
>> -	fpu_init_guest_permissions(gfpu);
>> +	fpu_lock_guest_permissions();
>
>As a future improvement, you might consider updating this to:
>
>    if (xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED)
>        fpu_lock_guest_permissions();
>
>Or, embed the check inside fpu_lock_guest_permissions():
>
>    if (xstate_get_guest_group_perm() & FPU_GUEST_PERM_LOCKED)
>        return;
>
>But for this patch itself, the change looks good to me. Please feel free to
>add my tag:
>
>    Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>

Thanks a lot.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-10  7:24 ` [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs Chao Gao
@ 2025-04-24 22:52   ` Edgecombe, Rick P
  2025-04-25  8:24     ` Chao Gao
  2025-05-01 14:24   ` Chang S. Bae
  1 sibling, 1 reply; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-24 22:52 UTC (permalink / raw)
  To: Gao, Chao, Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: ebiggers@google.com, dave.hansen@linux.intel.com,
	Spassov, Stanislav, levymitchell0@gmail.com,
	samuel.holland@sifive.com, Li, Xin3, Yang, Weijiang,
	mingo@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	Bae, Chang Seok, vigbalas@amd.com, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, bp@alien8.de

On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:
> +
> +	/*
> +	 * @user_size:
> +	 *
> +	 * The default UABI size of the register state buffer in guest
> +	 * FPUs. Includes all supported user features except independent
> +	 * managed features and features which have to be requested by
> +	 * user space before usage.
> +	 */
> +	unsigned int user_size;
> +
> +	/*
> +	 * @features:
> +	 *
> +	 * The default supported features bitmap in guest FPUs. Does not
> +	 * include independent managed features and features which have to
> +	 * be requested by user space before usage.
> +	 */
> +	u64 features;
> +
> +	/*
> +	 * @user_features:
> +	 *
> +	 * Same as @features except only user xfeatures are included.
> +	 */
> +	u64 user_features;
> +};

Tracing through the code, it seems that fpu_user_cfg.default_features and
guest_default_cfg.user_features are the same, leading to
fpu_user_cfg.default_size and guest_default_cfg.user_size being also the same.

In the later patches, it doesn't seem to change the "user" parts. These
configurations end up controlling the default size and features that gets copied
to userspace in KVM_SET_XSAVE. I guess today there is only one default size and
feature set for xstate copied to userspace. The suggestion from Chang was that
it makes the code more readable, but it seems like it also breaks apart a
unified concept for no functional benefit.

Maybe we don't need user_features or user_size here in vcpu_fpu_config? Or did I
get lost somewhere along the way in all the twists and turns that features and
sizes go through.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set
  2025-04-10  7:24 ` [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set Chao Gao
@ 2025-04-24 22:58   ` Edgecombe, Rick P
  0 siblings, 0 replies; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-24 22:58 UTC (permalink / raw)
  To: Gao, Chao, Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: dave.hansen@linux.intel.com, Bae, Chang Seok,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	Yang, Weijiang, mingo@redhat.com, mlevitsk@redhat.com,
	john.allen@amd.com, vigbalas@amd.com, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, bp@alien8.de,
	Liu, Zhao1, ubizjak@gmail.com

On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
> 
> In preparation for upcoming CET virtualization support, the CET supervisor
> state will be added as a "guest-only" feature, since it is required only by
> KVM (i.e., guest FPUs). Establish the infrastructure for "guest-only"
> features.
> 
> Define a new XFEATURE_MASK_GUEST_SUPERVISOR mask to specify features that
> are enabled by default in guest FPUs but not in host FPUs. Specifically,
> for any bit in this set, permission is granted and XSAVE space is allocated
> during vCPU creation. Non-guest FPUs cannot enable guest-only features,
> even dynamically, and no XSAVE space will be allocated for them.
> 
> The mask is currently empty, but this will be changed by a subsequent
> patch.
> 
> Note that there is no plan to add "guest-only" user xfeatures, so the user
> default features remain unchanged.
> 
> Co-developed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/7] Introduce CET supervisor state support
  2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
                   ` (6 preceding siblings ...)
  2025-04-10  7:24 ` [PATCH v5 7/7] x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature Chao Gao
@ 2025-04-24 23:28 ` Edgecombe, Rick P
  7 siblings, 0 replies; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-24 23:28 UTC (permalink / raw)
  To: Gao, Chao, Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: ebiggers@google.com, Bae, Chang Seok, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	Yang, Weijiang, mingo@redhat.com, mlevitsk@redhat.com,
	john.allen@amd.com, dave.hansen@linux.intel.com, vigbalas@amd.com,
	hpa@zytor.com, peterz@infradead.org, aruna.ramakrishna@oracle.com,
	bp@alien8.de, Liu, Zhao1, ubizjak@gmail.com

On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:
> == Performance ==
> 
> We measured context-switching performance with the benchmark [4] in following
> three cases.
> 
> case 1: the baseline. i.e., this series isn't applied
> case 2: baseline + this series. CET-S space is allocated for guest fpu only.
> case 3: baseline + allocate CET-S space for all tasks. Hardware init
>         optimization avoids writing out CET-S space on each XSAVES.
> 
> The performance differences in the three cases are very small and fall within the
> run-to-run variation.

It seems like this dilemma is settled.

I had the question about why we need a separate guest user features and size.
Otherwise it looks pretty good to me. I'll return to review the rest after the
answer for the need for separate guest default user features.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-24 22:52   ` Edgecombe, Rick P
@ 2025-04-25  8:24     ` Chao Gao
  2025-04-25 16:09       ` Edgecombe, Rick P
  0 siblings, 1 reply; 40+ messages in thread
From: Chao Gao @ 2025-04-25  8:24 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com, ebiggers@google.com,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	Yang, Weijiang, mingo@redhat.com, mlevitsk@redhat.com,
	john.allen@amd.com, Bae, Chang Seok, vigbalas@amd.com,
	hpa@zytor.com, peterz@infradead.org, aruna.ramakrishna@oracle.com,
	bp@alien8.de

On Fri, Apr 25, 2025 at 06:52:59AM +0800, Edgecombe, Rick P wrote:
>On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:
>> +
>> +	/*
>> +	 * @user_size:
>> +	 *
>> +	 * The default UABI size of the register state buffer in guest
>> +	 * FPUs. Includes all supported user features except independent
>> +	 * managed features and features which have to be requested by
>> +	 * user space before usage.
>> +	 */
>> +	unsigned int user_size;
>> +
>> +	/*
>> +	 * @features:
>> +	 *
>> +	 * The default supported features bitmap in guest FPUs. Does not
>> +	 * include independent managed features and features which have to
>> +	 * be requested by user space before usage.
>> +	 */
>> +	u64 features;
>> +
>> +	/*
>> +	 * @user_features:
>> +	 *
>> +	 * Same as @features except only user xfeatures are included.
>> +	 */
>> +	u64 user_features;
>> +};
>
>Tracing through the code, it seems that fpu_user_cfg.default_features and
>guest_default_cfg.user_features are the same, leading to
>fpu_user_cfg.default_size and guest_default_cfg.user_size being also the same.

Right. This is primarily for readability and symmetry.

I slightly prefer __guest_fpstate_reset() in this series:

	fpstate->size		= guest_default_cfg.size;
	fpstate->user_size	= guest_default_cfg.user_size;
	fpstate->xfeatures	= guest_default_cfg.features;
	fpstate->user_xfeatures	= guest_default_cfg.user_features;

over this version:

	fpstate->size		= guest_default_cfg.size;
	fpstate->xfeatures	= guest_default_cfg.features;

	/*
	 * use fpu_user_cfg for user_* settings for compatibility of exiting
	 * uAPIs.
	 */
	fpstate->user_size	= fpu_user_cfg.user_size;
	fpstate->user_xfeatures	= fpu_user_cfg.default_features;

Referencing different structures for size/xfeatures and their user_*
counterparts is not elegant to me. The need for a comment indicates that
this chunk may cause confusion. And this pattern will repeat when
initializing fpu->guest_perm in fpstate_reset().

>
>In the later patches, it doesn't seem to change the "user" parts. These
>configurations end up controlling the default size and features that gets copied
>to userspace in KVM_SET_XSAVE. I guess today there is only one default size and
>feature set for xstate copied to userspace. The suggestion from Chang was that
>it makes the code more readable, but it seems like it also breaks apart a
>unified concept for no functional benefit.

In the future, the feature and size of the uABI buffer for guest FPUs may
differ from those of non-guest FPUs. Sean rejected the idea of saving/restoring
CET_S xstate in KVM partly because:

 :Especially because another big negative is that not utilizing XSTATE bleeds into
 :KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
 :letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
 :snafu if Linux does gain support for SSS.

*: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/

[To be clear, it is not an issue caused by Chang's suggestion. v4 which adds
new members @guest_size @guest_default_features to fpu_state_config has the
same problem. i.e., fpu_user_cfg.guest_default_feaures is identical to
fpu_user_cfg.default_features, adding no functional benefit.]

>
>Maybe we don't need user_features or user_size here in vcpu_fpu_config? Or did I

I don't have a strong opinion on this. I am ok with dropping them. Do you have
a strong preference?

>get lost somewhere along the way in all the twists and turns that features and
>sizes go through.

No, your analysis is correct.

>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25  8:24     ` Chao Gao
@ 2025-04-25 16:09       ` Edgecombe, Rick P
  2025-04-25 23:48         ` Sean Christopherson
  2025-04-28  5:51         ` Xin Li
  0 siblings, 2 replies; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-25 16:09 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, ebiggers@google.com, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, Yang, Weijiang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, seanjc@google.com, Bae, Chang Seok,
	vigbalas@amd.com, peterz@infradead.org, hpa@zytor.com,
	bp@alien8.de, aruna.ramakrishna@oracle.com, x86@kernel.org

On Fri, 2025-04-25 at 16:24 +0800, Chao Gao wrote:
> > 
> > In the later patches, it doesn't seem to change the "user" parts. These
> > configurations end up controlling the default size and features that gets
> > copied
> > to userspace in KVM_SET_XSAVE. I guess today there is only one default size
> > and
> > feature set for xstate copied to userspace. The suggestion from Chang was
> > that
> > it makes the code more readable, but it seems like it also breaks apart a
> > unified concept for no functional benefit.
> 
> In the future, the feature and size of the uABI buffer for guest FPUs may
> differ from those of non-guest FPUs. Sean rejected the idea of
> saving/restoring
> CET_S xstate in KVM partly because:
> 
>  :Especially because another big negative is that not utilizing XSTATE bleeds
> into
>  :KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead
> of just
>  :letting KVM_{G,S}ET_XSAVE handle the state.

Hmm, interesting. I guess there are two things.
1. Should CET_S be part of KVM_GET_XSAVE instead of via MSRs ioctls? It never
was in the KVM CET patches.
2. A feature mask far away in the FPU code controls KVM's xsave ABI.

For (1), does any userspace depend on their not being supervisor features? (i.e.
tries to restore the guest FPU for emulation or something). There probably are
some advantages to keeping supervisor features out of it, or at least a separate
ioctl. (2) is an existing problem. But if we think KVM should have its own
feature set of bits for ABI purposes, it seems like maybe it should have some
dedicated consideration. 

I'd think we should not try to address it in this series. Let's stick to what
the current CET KVM series needs. Changing KVM_GET_XSAVE behavior or cleanup
could be a separate series.

>   And that will create a bit of a
>  :snafu if Linux does gain support for SSS.
> 
> *: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/

I chatted with Xin about this a few weeks ago. It sounds like FRED bare metal
SSS will not need CET_S state, but it wasn't 100% clear.

> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25 16:09       ` Edgecombe, Rick P
@ 2025-04-25 23:48         ` Sean Christopherson
  2025-04-28  3:26           ` Chao Gao
                             ` (2 more replies)
  2025-04-28  5:51         ` Xin Li
  1 sibling, 3 replies; 40+ messages in thread
From: Sean Christopherson @ 2025-04-25 23:48 UTC (permalink / raw)
  To: Rick P Edgecombe
  Cc: Chao Gao, kvm@vger.kernel.org, ebiggers@google.com, Dave Hansen,
	dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, Weijiang Yang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Chang Seok Bae, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

On Fri, Apr 25, 2025, Rick P Edgecombe wrote:
> On Fri, 2025-04-25 at 16:24 +0800, Chao Gao wrote:
> > > 
> > > In the later patches, it doesn't seem to change the "user" parts. These
> > > configurations end up controlling the default size and features that gets
> > > copied
> > > to userspace in KVM_SET_XSAVE. I guess today there is only one default size
> > > and
> > > feature set for xstate copied to userspace. The suggestion from Chang was
> > > that
> > > it makes the code more readable, but it seems like it also breaks apart a
> > > unified concept for no functional benefit.
> > 
> > In the future, the feature and size of the uABI buffer for guest FPUs may
> > differ from those of non-guest FPUs. Sean rejected the idea of
> > saving/restoring
> > CET_S xstate in KVM partly because:
> > 
> >  :Especially because another big negative is that not utilizing XSTATE bleeds
> > into
> >  :KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead
> > of just
> >  :letting KVM_{G,S}ET_XSAVE handle the state.
> 
> Hmm, interesting. I guess there are two things.
> 1. Should CET_S be part of KVM_GET_XSAVE instead of via MSRs ioctls? It never
> was in the KVM CET patches.
> 2. A feature mask far away in the FPU code controls KVM's xsave ABI.
> 
> For (1), does any userspace depend on their not being supervisor features? (i.e.
> tries to restore the guest FPU for emulation or something). There probably are
> some advantages to keeping supervisor features out of it, or at least a separate
> ioctl.

CET_S probably shouldn't be in XSAVE ABI, because that would technically leak
kernel state to userspace for the non-KVM use case.  I assume the kernel has
bigger problems if CET_S is somehow tied to a userspace task.

For KVM, it's just the one MSR, and KVM needs to support save/restore of that MSR
no matter what, so supporting it via XSAVE would be more work, a bit sketchy, and
create yet another way for userspace to do weird things when saving/restoring vCPU
state.

> (2) is an existing problem. But if we think KVM should have its own
> feature set of bits for ABI purposes, it seems like maybe it should have some
> dedicated consideration. 

Nah, don't bother.  The kernel needs to solve the exact same problems for the
signal ABI, I don't see any reason to generate more work.  From a validation
coverage perspective, I see a lot of value in shared code.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25 23:48         ` Sean Christopherson
@ 2025-04-28  3:26           ` Chao Gao
  2025-04-28  7:44             ` Xin Li
  2025-04-28 14:28             ` Sean Christopherson
  2025-04-28  6:31           ` Xin Li
  2025-04-28 15:42           ` Edgecombe, Rick P
  2 siblings, 2 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-28  3:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rick P Edgecombe, kvm@vger.kernel.org, ebiggers@google.com,
	Dave Hansen, dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, Weijiang Yang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Chang Seok Bae, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

>> Hmm, interesting. I guess there are two things.
>> 1. Should CET_S be part of KVM_GET_XSAVE instead of via MSRs ioctls? It never
>> was in the KVM CET patches.
>> 2. A feature mask far away in the FPU code controls KVM's xsave ABI.
>> 
>> For (1), does any userspace depend on their not being supervisor features? (i.e.
>> tries to restore the guest FPU for emulation or something). There probably are
>> some advantages to keeping supervisor features out of it, or at least a separate
>> ioctl.
>
>CET_S probably shouldn't be in XSAVE ABI, because that would technically leak
>kernel state to userspace for the non-KVM use case.

ok. thanks for the clarification.

>I assume the kernel has
>bigger problems if CET_S is somehow tied to a userspace task.

To be clear, CET_S here refers to the CET supervisor state, which includes SSP
pointers for privilege levels 0 through 2. The IA32_S_CET MSR is not part of
that state.

>
>For KVM, it's just the one MSR, and KVM needs to support save/restore of that MSR
>no matter what, so supporting it via XSAVE would be more work, a bit sketchy, and
>create yet another way for userspace to do weird things when saving/restoring vCPU
>state.

Agreed. One more issue of including CET_S into KVM_GET/SET_XSAVE{2} is:

XSAVE UABI buffers adhere to the standard format defined by the SDM, which
never includes supervisor states. Attempting to incorporate supervisor states
into UABI buffers would lead to many issues, such as deviating from the
standard format and the need to define offsets for each supervisor state.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25 16:09       ` Edgecombe, Rick P
  2025-04-25 23:48         ` Sean Christopherson
@ 2025-04-28  5:51         ` Xin Li
  2025-04-28  6:12           ` Xin Li
  1 sibling, 1 reply; 40+ messages in thread
From: Xin Li @ 2025-04-28  5:51 UTC (permalink / raw)
  To: Edgecombe, Rick P, Gao, Chao
  Cc: kvm@vger.kernel.org, ebiggers@google.com, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, Yang, Weijiang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, seanjc@google.com, Bae, Chang Seok,
	vigbalas@amd.com, peterz@infradead.org, hpa@zytor.com,
	bp@alien8.de, aruna.ramakrishna@oracle.com, x86@kernel.org

On 4/25/2025 9:09 AM, Edgecombe, Rick P wrote:
>>    And that will create a bit of a
>>   :snafu if Linux does gain support for SSS.
>>
>> *:https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/
> I chatted with Xin about this a few weeks ago. It sounds like FRED bare metal
> SSS will not need CET_S state, but it wasn't 100% clear.

FRED reuses one CET_S MSR IA32_PL0_SSP, and give it an alias
IA32_FRED_SSP0.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-28  5:51         ` Xin Li
@ 2025-04-28  6:12           ` Xin Li
  0 siblings, 0 replies; 40+ messages in thread
From: Xin Li @ 2025-04-28  6:12 UTC (permalink / raw)
  To: Edgecombe, Rick P, Gao, Chao
  Cc: kvm@vger.kernel.org, ebiggers@google.com, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, Yang, Weijiang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, seanjc@google.com, Bae, Chang Seok,
	vigbalas@amd.com, peterz@infradead.org, hpa@zytor.com,
	bp@alien8.de, aruna.ramakrishna@oracle.com, x86@kernel.org

On 4/27/2025 10:51 PM, Xin Li wrote:
> On 4/25/2025 9:09 AM, Edgecombe, Rick P wrote:
>>>    And that will create a bit of a
>>>   :snafu if Linux does gain support for SSS.
>>>
>>> *:https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/
>> I chatted with Xin about this a few weeks ago. It sounds like FRED 
>> bare metal
>> SSS will not need CET_S state, but it wasn't 100% clear.
> 
> FRED reuses one CET_S MSR IA32_PL0_SSP, and give it an alias
> IA32_FRED_SSP0.

Native use of IA32_FRED_SSP0 is very much like IA32_FRED_RSP0:

1) Both are per-task constants.

2) Both are only used for delivering events when running userspace.


IA32_FRED_RSP0 is set on return to userspace:

https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe85ee391966c4cf3bfe1c405314e894c951f521


I suppose we'll likely apply the same approach to IA32_FRED_SSP0 if we
plan to enable SSS for the kernel.  This won't add any extra maintenance
cost, as both x86 and KVM maintainers are well aware.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25 23:48         ` Sean Christopherson
  2025-04-28  3:26           ` Chao Gao
@ 2025-04-28  6:31           ` Xin Li
  2025-04-28 15:42           ` Edgecombe, Rick P
  2 siblings, 0 replies; 40+ messages in thread
From: Xin Li @ 2025-04-28  6:31 UTC (permalink / raw)
  To: Sean Christopherson, Rick P Edgecombe
  Cc: Chao Gao, kvm@vger.kernel.org, ebiggers@google.com, Dave Hansen,
	dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, Weijiang Yang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Chang Seok Bae, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

On 4/25/2025 4:48 PM, Sean Christopherson wrote:
> CET_S probably shouldn't be in XSAVE ABI, because that would technically leak
> kernel state to userspace for the non-KVM use case.  I assume the kernel has
> bigger problems if CET_S is somehow tied to a userspace task.

+1000

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-28  3:26           ` Chao Gao
@ 2025-04-28  7:44             ` Xin Li
  2025-04-28 14:28             ` Sean Christopherson
  1 sibling, 0 replies; 40+ messages in thread
From: Xin Li @ 2025-04-28  7:44 UTC (permalink / raw)
  To: Chao Gao, Sean Christopherson
  Cc: Rick P Edgecombe, kvm@vger.kernel.org, ebiggers@google.com,
	Dave Hansen, dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, Weijiang Yang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Chang Seok Bae, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

On 4/27/2025 8:26 PM, Chao Gao wrote:
>> For KVM, it's just the one MSR, and KVM needs to support save/restore of that MSR
>> no matter what, so supporting it via XSAVE would be more work, a bit sketchy, and
>> create yet another way for userspace to do weird things when saving/restoring vCPU
>> state.
> Agreed. One more issue of including CET_S into KVM_GET/SET_XSAVE{2} is:
> 
> XSAVE UABI buffers adhere to the standard format defined by the SDM, which
> never includes supervisor states. Attempting to incorporate supervisor states
> into UABI buffers would lead to many issues, such as deviating from the
> standard format and the need to define offsets for each supervisor state.

Good point.

FRED RSPs are in the supervisor state and are not defined within the
XSAVE area.  Their save/restore operations need to be explicitly added 
one by one.

Would it be sensible to introduce a KVM supervisor MSR state that
includes new MSRs?

This approach is similar to what KVM has with its CPUID API, allowing
batch MSR set/get operations and improving MSR code sharing.  Since the
FRED state is entirely defined in MSRs (except for CR4.FRED), this
should simplify the context save and restore for FRED.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-28  3:26           ` Chao Gao
  2025-04-28  7:44             ` Xin Li
@ 2025-04-28 14:28             ` Sean Christopherson
  1 sibling, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2025-04-28 14:28 UTC (permalink / raw)
  To: Chao Gao
  Cc: Rick P Edgecombe, kvm@vger.kernel.org, ebiggers@google.com,
	Dave Hansen, dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, Weijiang Yang, mingo@redhat.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Chang Seok Bae, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

On Mon, Apr 28, 2025, Chao Gao wrote:
> >I assume the kernel has bigger problems if CET_S is somehow tied to a
> >userspace task.
> 
> To be clear, CET_S here refers to the CET supervisor state, which includes SSP
> pointers for privilege levels 0 through 2. The IA32_S_CET MSR is not part of
> that state.
> 
> >
> >For KVM, it's just the one MSR, and KVM needs to support save/restore of that MSR
> >no matter what, 

Oh, it's not just one MSR.  I was indeed thinking this was just IA32_S_CET.  But
lucky for me, the statement holds for SSP0-SS2.

> so supporting it via XSAVE would be more work, a bit sketchy, and
> >create yet another way for userspace to do weird things when saving/restoring vCPU
> >state.
> 
> Agreed. One more issue of including CET_S into KVM_GET/SET_XSAVE{2} is:
> 
> XSAVE UABI buffers adhere to the standard format defined by the SDM, which
> never includes supervisor states. Attempting to incorporate supervisor states
> into UABI buffers would lead to many issues, such as deviating from the
> standard format and the need to define offsets for each supervisor state.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-25 23:48         ` Sean Christopherson
  2025-04-28  3:26           ` Chao Gao
  2025-04-28  6:31           ` Xin Li
@ 2025-04-28 15:42           ` Edgecombe, Rick P
  2025-04-29  1:11             ` Chang S. Bae
  2 siblings, 1 reply; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-28 15:42 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: kvm@vger.kernel.org, ebiggers@google.com, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, Yang, Weijiang, mingo@redhat.com,
	pbonzini@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	tglx@linutronix.de, Bae, Chang Seok, vigbalas@amd.com,
	peterz@infradead.org, hpa@zytor.com, Gao, Chao, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org

On Fri, 2025-04-25 at 16:48 -0700, Sean Christopherson wrote:
> > (2) is an existing problem. But if we think KVM should have its own
> > feature set of bits for ABI purposes, it seems like maybe it should have
> > some
> > dedicated consideration. 
> 
> Nah, don't bother.  The kernel needs to solve the exact same problems for the
> signal ABI, I don't see any reason to generate more work.  From a validation
> coverage perspective, I see a lot of value in shared code.

Right, so there should be no need to keep a separate features and buffer size
for KVM's xsave UABI, as this patch does. Let's just leave it using the core
kernels UABI version.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-28 15:42           ` Edgecombe, Rick P
@ 2025-04-29  1:11             ` Chang S. Bae
  2025-04-29  2:50               ` Edgecombe, Rick P
  0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-04-29  1:11 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, ebiggers@google.com, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, Yang, Weijiang, mingo@redhat.com,
	pbonzini@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	tglx@linutronix.de, vigbalas@amd.com, peterz@infradead.org,
	hpa@zytor.com, Gao, Chao, bp@alien8.de,
	aruna.ramakrishna@oracle.com, x86@kernel.org, seanjc@google.com

On 4/28/2025 8:42 AM, Edgecombe, Rick P wrote:
> 
> Right, so there should be no need to keep a separate features and buffer size
> for KVM's xsave UABI, as this patch does. Let's just leave it using the core
> kernels UABI version.

Hmm, why so?

As I see it, the vcpu->arch.guest_fpu structure is already exposed to 
KVM. This series doesn’t modify those structures (fpu_guest and 
fpstate), other than removing a dead field (patch 2).

Both ->usersize and ->user_xfeatures fields are already exposed -- 
currently KVM just doesn’t reference them at all.

All the changes introduced here are transparent to KVM. Organizing the 
initial values and wiring up guest_perm and fpstate are entirely 
internal to the x86 core, no?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-29  1:11             ` Chang S. Bae
@ 2025-04-29  2:50               ` Edgecombe, Rick P
  2025-04-29  3:22                 ` Chang S. Bae
  0 siblings, 1 reply; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-29  2:50 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, Yang, Weijiang,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, vigbalas@amd.com, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Gao, Chao,
	bp@alien8.de, seanjc@google.com, x86@kernel.org

On Mon, 2025-04-28 at 18:11 -0700, Chang S. Bae wrote:
> On 4/28/2025 8:42 AM, Edgecombe, Rick P wrote:
> > 
> > Right, so there should be no need to keep a separate features and buffer size
> > for KVM's xsave UABI, as this patch does. Let's just leave it using the core
> > kernels UABI version.
> 
> Hmm, why so?
> 
> As I see it, the vcpu->arch.guest_fpu structure is already exposed to 
> KVM. This series doesn’t modify those structures (fpu_guest and 
> fpstate), other than removing a dead field (patch 2).
> 
> Both ->usersize and ->user_xfeatures fields are already exposed -- 
> currently KVM just doesn’t reference them at all.
> 
> All the changes introduced here are transparent to KVM. Organizing the 
> initial values and wiring up guest_perm and fpstate are entirely 
> internal to the x86 core, no?

This patch adds struct vcpu_fpu_config, with new fields user_size,
user_features. Then those fields are used to configure the guest FPU, where
today it just uses fpu_user_cfg.default_features, etc.

KVM doesn't refer to any of those fields specifically, but since they are used
to configure struct fpu_guest they become part of KVM's uABI.

Per Sean, KVM's KVM_GET_XSAVE API won't differ from arch/x86's uABI behavior.
There is (and will be) only one default user features and size. So what is the
point of having a special guest version with identical values? Just use the
single one for guest FPU and normal.

Chao mentioned offline it was for symmetry. I don't want to make a big deal out
of it, but it doesn't make sense to me. It made me wonder if there was some
divergence in KVM and arch/x86 user features.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-29  2:50               ` Edgecombe, Rick P
@ 2025-04-29  3:22                 ` Chang S. Bae
  2025-04-29  3:36                   ` Edgecombe, Rick P
  0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-04-29  3:22 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, Yang, Weijiang,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, vigbalas@amd.com, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Gao, Chao,
	bp@alien8.de, seanjc@google.com, x86@kernel.org

On 4/28/2025 7:50 PM, Edgecombe, Rick P wrote:
> On Mon, 2025-04-28 at 18:11 -0700, Chang S. Bae wrote:
>> On 4/28/2025 8:42 AM, Edgecombe, Rick P wrote:
>>>
>>> Right, so there should be no need to keep a separate features and buffer size
>>> for KVM's xsave UABI, as this patch does. Let's just leave it using the core
>>> kernels UABI version.
>>
>> Hmm, why so?
>>
>> As I see it, the vcpu->arch.guest_fpu structure is already exposed to
>> KVM. This series doesn’t modify those structures (fpu_guest and
>> fpstate), other than removing a dead field (patch 2).
>>
>> Both ->usersize and ->user_xfeatures fields are already exposed --
>> currently KVM just doesn’t reference them at all.
>>
>> All the changes introduced here are transparent to KVM. Organizing the
>> initial values and wiring up guest_perm and fpstate are entirely
>> internal to the x86 core, no?
> 
> This patch adds struct vcpu_fpu_config, with new fields user_size,
> user_features. Then those fields are used to configure the guest FPU, where
> today it just uses fpu_user_cfg.default_features, etc.
> 
> KVM doesn't refer to any of those fields specifically, but since they are used
> to configure struct fpu_guest they become part of KVM's uABI.

Today, fpu_alloc_guest_fpstate() -> __fpstate_reset() sets
vcpu->arch.guest_fpu.fpstate->user_xfeatures using 
fpu_user_cfg.default_features.

Are you really saying that switching this to 
guest_default_cfg.user_features would constitute a uABI change? Do you 
consider fpu_user_cfg.default_features to be part of the uABI or 
anything else?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-29  3:22                 ` Chang S. Bae
@ 2025-04-29  3:36                   ` Edgecombe, Rick P
  2025-04-30  3:27                     ` Chao Gao
  2025-04-30 15:01                     ` Chang S. Bae
  0 siblings, 2 replies; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-29  3:36 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	Yang, Weijiang, tglx@linutronix.de, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Gao, Chao,
	bp@alien8.de, seanjc@google.com, x86@kernel.org

On Mon, 2025-04-28 at 20:22 -0700, Chang S. Bae wrote:
> > 
> > This patch adds struct vcpu_fpu_config, with new fields user_size,
> > user_features. Then those fields are used to configure the guest FPU, where
> > today it just uses fpu_user_cfg.default_features, etc.
> > 
> > KVM doesn't refer to any of those fields specifically, but since they are
> > used
> > to configure struct fpu_guest they become part of KVM's uABI.
> 
> Today, fpu_alloc_guest_fpstate() -> __fpstate_reset() sets
> vcpu->arch.guest_fpu.fpstate->user_xfeatures using 
> fpu_user_cfg.default_features.
> 
> Are you really saying that switching this to 
> guest_default_cfg.user_features would constitute a uABI change?

I'm not saying there is a uABI change... I don't see a change in uABI.

>  Do you 
> consider fpu_user_cfg.default_features to be part of the uABI or 
> anything else?

KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
it would change KVM's uABI. But I'm starting to suspect we are talking past each
other.

It should be simple. Two new configuration fields are added in this patch that
match the existing concept and values of existing configurations fields. Per
Sean, there are no plans to have them diverge. So why add them. If anyone feels
strongly, I won't argue. But I think there is just miscommunication.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-29  3:36                   ` Edgecombe, Rick P
@ 2025-04-30  3:27                     ` Chao Gao
  2025-04-30 15:01                     ` Chang S. Bae
  1 sibling, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-04-30  3:27 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Bae, Chang Seok, ebiggers@google.com, kvm@vger.kernel.org,
	Hansen, Dave, dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	Yang, Weijiang, tglx@linutronix.de, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, bp@alien8.de,
	seanjc@google.com, x86@kernel.org

On Tue, Apr 29, 2025 at 11:36:40AM +0800, Edgecombe, Rick P wrote:
>On Mon, 2025-04-28 at 20:22 -0700, Chang S. Bae wrote:
>> > 
>> > This patch adds struct vcpu_fpu_config, with new fields user_size,
>> > user_features. Then those fields are used to configure the guest FPU, where
>> > today it just uses fpu_user_cfg.default_features, etc.
>> > 
>> > KVM doesn't refer to any of those fields specifically, but since they are
>> > used
>> > to configure struct fpu_guest they become part of KVM's uABI.
>> 
>> Today, fpu_alloc_guest_fpstate() -> __fpstate_reset() sets
>> vcpu->arch.guest_fpu.fpstate->user_xfeatures using 
>> fpu_user_cfg.default_features.
>> 
>> Are you really saying that switching this to 
>> guest_default_cfg.user_features would constitute a uABI change?
>
>I'm not saying there is a uABI change... I don't see a change in uABI.

Yes. We all agree that this series has no uABI change.

>
>>  Do you 
>> consider fpu_user_cfg.default_features to be part of the uABI or 
>> anything else?
>
>KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
>fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
>it would change KVM's uABI. But I'm starting to suspect we are talking past each
>other.
>
>It should be simple. Two new configuration fields are added in this patch that

Yes. it is a minor issue.

>match the existing concept and values of existing configurations fields. Per
>Sean, there are no plans to have them diverge. So why add them. If anyone feels
>strongly, I won't argue. But I think there is just miscommunication.

Ok. I will drop vcpu_fpu_config.user*.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-29  3:36                   ` Edgecombe, Rick P
  2025-04-30  3:27                     ` Chao Gao
@ 2025-04-30 15:01                     ` Chang S. Bae
  2025-04-30 15:33                       ` Edgecombe, Rick P
  1 sibling, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-04-30 15:01 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, mlevitsk@redhat.com, john.allen@amd.com,
	Yang, Weijiang, tglx@linutronix.de, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Gao, Chao,
	bp@alien8.de, seanjc@google.com, x86@kernel.org

On 4/28/2025 8:36 PM, Edgecombe, Rick P wrote:
> 
> KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
> fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
> it would change KVM's uABI. 

Not quite. The ABI reflects the XSAVE format directly. The XSAVE header 
indicates which feature states are present, so while the _contents_ of 
the buffer may vary depending on the feature set, the _format_ itself 
remains unchanged. That doesn't constitute a uABI change.

> It should be simple. Two new configuration fields are added in this patch that
> match the existing concept and values of existing configurations fields. Per
> Sean, there are no plans to have them diverge. So why add them. 

I'm fine with dropping them -- as long as the resulting code remains 
clear and avoids unnecessary complexity around VCPU allocation.

Here are some of the considerations that led me to suggest them in the 
first place:

  * The guest-only feature model should be established in a clean and
    structured way.
  * The initialization logic should stay explicit -- especially to make
    it clear what constitutes guest features, even when they match host
    features. That naturally led to introducing a dedicated data
    structure.
  * Since the VCPU FPU container includes struct fpstate, it felt
    appropriate to mirror relevant fields where useful.
  * Including user_size and user_xfeatures made the VCPU allocation logic
    more straightforward and self-contained.

And to clarify -- this addition doesn’t necessarily imply divergence 
from fpu_guest_cfg. Its usage is local to setting up the guest fpstate, 
and nothing more.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-30 15:01                     ` Chang S. Bae
@ 2025-04-30 15:33                       ` Edgecombe, Rick P
  2025-04-30 16:20                         ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-30 15:33 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Hansen, Dave,
	dave.hansen@linux.intel.com, Spassov, Stanislav,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Li, Xin3,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Yang, Weijiang, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Gao, Chao,
	bp@alien8.de, seanjc@google.com, x86@kernel.org

On Wed, 2025-04-30 at 08:01 -0700, Chang S. Bae wrote:
> On 4/28/2025 8:36 PM, Edgecombe, Rick P wrote:
> > 
> > KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
> > fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
> > it would change KVM's uABI. 
> 
> Not quite. The ABI reflects the XSAVE format directly. The XSAVE header 
> indicates which feature states are present, so while the _contents_ of 
> the buffer may vary depending on the feature set, the _format_ itself 
> remains unchanged. That doesn't constitute a uABI change.

Heh, ok sure.

> 
> > It should be simple. Two new configuration fields are added in this patch that
> > match the existing concept and values of existing configurations fields. Per
> > Sean, there are no plans to have them diverge. So why add them. 
> 
> I'm fine with dropping them -- as long as the resulting code remains 
> clear and avoids unnecessary complexity around VCPU allocation.
> 
> Here are some of the considerations that led me to suggest them in the 
> first place:
> 
>   * The guest-only feature model should be established in a clean and
>     structured way.
>   * The initialization logic should stay explicit -- especially to make
>     it clear what constitutes guest features, even when they match host
>     features. That naturally led to introducing a dedicated data
>     structure.
>   * Since the VCPU FPU container includes struct fpstate, it felt
>     appropriate to mirror relevant fields where useful.
>   * Including user_size and user_xfeatures made the VCPU allocation logic
>     more straightforward and self-contained.
> 
> And to clarify -- this addition doesn’t necessarily imply divergence 
> from fpu_guest_cfg. Its usage is local to setting up the guest fpstate, 
> and nothing more.

I'd like to close this out. I see there there is currently one concept of user
features and size, and per Sean, KVM intends to stay consistent with the rest of
the kernel - leaving it at one concept. This was new info since you suggested
the fields. So why don't you propose a resolution here and we'll just go with
it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults
  2025-04-10  7:24 ` [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults Chao Gao
@ 2025-04-30 15:45   ` Edgecombe, Rick P
  0 siblings, 0 replies; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-30 15:45 UTC (permalink / raw)
  To: Gao, Chao, Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: Yang, Weijiang, Li, Xin3, ebiggers@google.com, bp@alien8.de,
	dave.hansen@linux.intel.com, peterz@infradead.org,
	john.allen@amd.com, Bae, Chang Seok, mingo@redhat.com,
	hpa@zytor.com, Spassov, Stanislav

On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:

> +	fpu->guest_perm.__user_state_size = guest_default_cfg.user_size;

This part depends on whether we have the separate guest field or not. Otherwise
patch looks good.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-30 15:33                       ` Edgecombe, Rick P
@ 2025-04-30 16:20                         ` Sean Christopherson
  2025-04-30 18:26                           ` Chang S. Bae
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2025-04-30 16:20 UTC (permalink / raw)
  To: Rick P Edgecombe
  Cc: Chang Seok Bae, ebiggers@google.com, kvm@vger.kernel.org,
	Dave Hansen, dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Weijiang Yang, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Chao Gao,
	bp@alien8.de, x86@kernel.org

On Wed, Apr 30, 2025, Rick P Edgecombe wrote:
> On Wed, 2025-04-30 at 08:01 -0700, Chang S. Bae wrote:
> > On 4/28/2025 8:36 PM, Edgecombe, Rick P wrote:
> > > 
> > > KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
> > > fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
> > > it would change KVM's uABI. 
> > 
> > Not quite. The ABI reflects the XSAVE format directly. The XSAVE header 
> > indicates which feature states are present, so while the _contents_ of 
> > the buffer may vary depending on the feature set, the _format_ itself 
> > remains unchanged. That doesn't constitute a uABI change.
> 
> Heh, ok sure.

Hmm, it's a valid point that format isn't changing, and that host userspace and
guests will inevitably have different state in the XSAVE buffer.

That said, it's still an ABI change in the sense that once support for CET_S is
added, userspace can rely on KVM_{G,S}ET_XSAVE(2) to save/restore CET_S state,
and dropping that support would clearly break userspace.

> > > It should be simple. Two new configuration fields are added in this patch that
> > > match the existing concept and values of existing configurations fields. Per
> > > Sean, there are no plans to have them diverge. So why add them. 
> > 
> > I'm fine with dropping them -- as long as the resulting code remains 
> > clear and avoids unnecessary complexity around VCPU allocation.
> > 
> > Here are some of the considerations that led me to suggest them in the 
> > first place:
> > 
> >   * The guest-only feature model should be established in a clean and
> >     structured way.
> >   * The initialization logic should stay explicit -- especially to make
> >     it clear what constitutes guest features, even when they match host
> >     features. That naturally led to introducing a dedicated data
> >     structure.
> >   * Since the VCPU FPU container includes struct fpstate, it felt
> >     appropriate to mirror relevant fields where useful.
> >   * Including user_size and user_xfeatures made the VCPU allocation logic
> >     more straightforward and self-contained.
> > 
> > And to clarify -- this addition doesn’t necessarily imply divergence 
> > from fpu_guest_cfg. Its usage is local to setting up the guest fpstate, 
> > and nothing more.
> 
> I'd like to close this out. I see there there is currently one concept of user
> features and size, and per Sean, KVM intends to stay consistent with the rest of
> the kernel - leaving it at one concept. This was new info since you suggested
> the fields. So why don't you propose a resolution here and we'll just go with
> it.

I'm not totally opposed to diverging, but IMO there needs to be strong motivation
to do so.  Sharing code and ABI between KVM and the kernel is mutually beneficial
on multiple fronts.

Unless I've missed something, KVM will need to provide save/restore support via
MSRs for all CET_S state anyways, so I don't see any motivation whatsoever in this
particular case.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-30 16:20                         ` Sean Christopherson
@ 2025-04-30 18:26                           ` Chang S. Bae
  0 siblings, 0 replies; 40+ messages in thread
From: Chang S. Bae @ 2025-04-30 18:26 UTC (permalink / raw)
  To: Sean Christopherson, Rick P Edgecombe
  Cc: ebiggers@google.com, kvm@vger.kernel.org, Dave Hansen,
	dave.hansen@linux.intel.com, Stanislav Spassov,
	levymitchell0@gmail.com, samuel.holland@sifive.com, Xin3 Li,
	linux-kernel@vger.kernel.org, mingo@redhat.com, vigbalas@amd.com,
	pbonzini@redhat.com, tglx@linutronix.de, john.allen@amd.com,
	mlevitsk@redhat.com, Weijiang Yang, hpa@zytor.com,
	peterz@infradead.org, aruna.ramakrishna@oracle.com, Chao Gao,
	bp@alien8.de, x86@kernel.org

On 4/30/2025 9:20 AM, Sean Christopherson wrote:
> On Wed, Apr 30, 2025, Rick P Edgecombe wrote:
>> On Wed, 2025-04-30 at 08:01 -0700, Chang S. Bae wrote:
>>> On 4/28/2025 8:36 PM, Edgecombe, Rick P wrote:
>>>>
>>>> KVM_GET_XSAVE is part of KVM's API. It uses fields configured in struct
>>>> fpu_guest. If fpu_user_cfg.default_features changes value (in the current code)
>>>> it would change KVM's uABI.
>>>
>>> Not quite. The ABI reflects the XSAVE format directly. The XSAVE header
>>> indicates which feature states are present, so while the _contents_ of
>>> the buffer may vary depending on the feature set, the _format_ itself
>>> remains unchanged. That doesn't constitute a uABI change.
>>
>> Heh, ok sure.
> 
> Hmm, it's a valid point that format isn't changing, and that host userspace and
> guests will inevitably have different state in the XSAVE buffer.
> 
> That said, it's still an ABI change in the sense that once support for CET_S is
> added, userspace can rely on KVM_{G,S}ET_XSAVE(2) to save/restore CET_S state,
> and dropping that support would clearly break userspace.

I think my comment was specifically in response to this statement "if 
fpu_user_cfg.default_features changes value," which I took to mean 
changes limited to user features.

Diverging guest user features wasn’t something I intended here -- 
although I briefly considered MPX but dropped it due to complexity:

https://lore.kernel.org/lkml/2ac2d1e7-d04b-443a-8fff-7aa3f436dcce@intel.com/

At this point, I think the reaction here speaks for itself. If adding 
those two fields leads to confusion or demands fatty code comment, the 
net benefit goes negative.

So yes, overall, let's just reference fpu_guest_cfg directly as-is.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container from guest defaults
  2025-04-10  7:24 ` [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container " Chao Gao
@ 2025-04-30 18:29   ` Edgecombe, Rick P
  2025-05-01 14:24     ` Chang S. Bae
  0 siblings, 1 reply; 40+ messages in thread
From: Edgecombe, Rick P @ 2025-04-30 18:29 UTC (permalink / raw)
  To: Gao, Chao, Hansen, Dave, seanjc@google.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: Yang, Weijiang, Li, Xin3, ebiggers@google.com, bp@alien8.de,
	dave.hansen@linux.intel.com, peterz@infradead.org,
	john.allen@amd.com, Bae, Chang Seok, mingo@redhat.com,
	hpa@zytor.com, Spassov, Stanislav

On Thu, 2025-04-10 at 15:24 +0800, Chao Gao wrote:
> fpu_alloc_guest_fpstate() currently uses host defaults to initialize guest
> fpstate and pseudo containers. Guest defaults were introduced to
> differentiate the features and sizes of host and guest FPUs. Switch to
> using guest defaults instead.
> 
> Additionally, incorporate the initialization of indicators (is_valloc and
> is_guest) into the newly added guest-specific reset function to centralize
> the resetting of guest fpstate.
> 
> Suggested-by: Chang S. Bae <chang.seok.bae@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v5: init is_valloc/is_guest in the guest-specific reset function (Chang)
> ---
>  arch/x86/kernel/fpu/core.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e23e435b85c4..f5593f6009a4 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -201,7 +201,20 @@ void fpu_reset_from_exception_fixup(void)
>  }
>  
>  #if IS_ENABLED(CONFIG_KVM)
> -static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
> +static void __guest_fpstate_reset(struct fpstate *fpstate, u64 xfd)
> +{
> +	/* Initialize sizes and feature masks */
> +	fpstate->size		= guest_default_cfg.size;
> +	fpstate->user_size	= guest_default_cfg.user_size;
> +	fpstate->xfeatures	= guest_default_cfg.features;
> +	fpstate->user_xfeatures	= guest_default_cfg.user_features;
> +	fpstate->xfd		= xfd;
> +
> +	/* Initialize indicators to reflect properties of the fpstate */
> +	fpstate->is_valloc	= true;
> +	fpstate->is_guest	= true;
> +}
> +
>  
>  static void fpu_lock_guest_permissions(void)
>  {
> @@ -226,19 +239,18 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>  	struct fpstate *fpstate;
>  	unsigned int size;
>  
> -	size = fpu_kernel_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64);
> +	size = guest_default_cfg.size + ALIGN(offsetof(struct fpstate, regs), 64);
> +
>  	fpstate = vzalloc(size);
>  	if (!fpstate)
>  		return false;
>  
>  	/* Leave xfd to 0 (the reset value defined by spec) */
> -	__fpstate_reset(fpstate, 0);
> +	__guest_fpstate_reset(fpstate, 0);
>  	fpstate_init_user(fpstate);
> -	fpstate->is_valloc	= true;
> -	fpstate->is_guest	= true;
>  
>  	gfpu->fpstate		= fpstate;
> -	gfpu->xfeatures		= fpu_kernel_cfg.default_features;
> +	gfpu->xfeatures		= guest_default_cfg.features;
>  
>  	/*
>  	 * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
> @@ -250,8 +262,8 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>  	 * all features that can expand the uABI size must be opt-in.
>  	 */

The above comment is enlightening to the debate about whether guest needs a
separate user size and features:

	/*
	 * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
	 * to userspace, even when XSAVE is unsupported, so that restoring FPU
	 * state on a different CPU that does support XSAVE can cleanly load
	 * the incoming state using its natural XSAVE.  In other words, KVM's
	 * uABI size may be larger than this host's default size.  Conversely,
	 * the default size should never be larger than KVM's base uABI size;
	 * all features that can expand the uABI size must be opt-in.
	 */


The KVM FPU user xsave behavior *is* different, just not in the way than we have
been discussing. So the below code responds to mismatch between
fpu_user_cfg.default_size and KVM's ABI.

The fix that added it, d187ba531230 ("x86/fpu: KVM: Set the base guest FPU uABI
size to sizeof(struct kvm_xsave)"), seems like quick fix that could have instead
been fixed more properly by something like proposed in this series.

I propose we drop it from this series and follow up with a proper cleanup. It
deserves more than currently done here. For example in the below hunk it's now
comparing guest_default_cfg.user_size which is a guest only thing. I also wonder
if we really need gfpu->uabi_size.

So let's drop the code but not the idea. Chang what do you think of that?

>  	gfpu->uabi_size		= sizeof(struct kvm_xsave);
> -	if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
> -		gfpu->uabi_size = fpu_user_cfg.default_size;
> +	if (WARN_ON_ONCE(guest_default_cfg.user_size > gfpu->uabi_size))
> +		gfpu->uabi_size = guest_default_cfg.user_size;
>  
>  	fpu_lock_guest_permissions();
>  


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container from guest defaults
  2025-04-30 18:29   ` Edgecombe, Rick P
@ 2025-05-01 14:24     ` Chang S. Bae
  2025-05-06  3:33       ` Chao Gao
  0 siblings, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-05-01 14:24 UTC (permalink / raw)
  To: Edgecombe, Rick P, Gao, Chao, Hansen, Dave, seanjc@google.com,
	x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com
  Cc: Yang, Weijiang, Li, Xin3, ebiggers@google.com, bp@alien8.de,
	dave.hansen@linux.intel.com, peterz@infradead.org,
	john.allen@amd.com, mingo@redhat.com, hpa@zytor.com,
	Spassov, Stanislav

On 4/30/2025 11:29 AM, Edgecombe, Rick P wrote:
> 
> So let's drop the code but not the idea. Chang what do you think of that?
Sure, that makes sense. Thanks for bringing up this case.

Staring at the struct guest_fpu fields, some of them appear to match 
with fields in fpstate. In any case, I agree it would be helpful to 
consider a follow-up series after this.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-04-10  7:24 ` [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs Chao Gao
  2025-04-24 22:52   ` Edgecombe, Rick P
@ 2025-05-01 14:24   ` Chang S. Bae
  2025-05-06  3:29     ` Chao Gao
  1 sibling, 1 reply; 40+ messages in thread
From: Chang S. Bae @ 2025-05-01 14:24 UTC (permalink / raw)
  To: Chao Gao, x86, linux-kernel, kvm, tglx, dave.hansen, seanjc,
	pbonzini
  Cc: peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Ingo Molnar, Dave Hansen, H. Peter Anvin, Maxim Levitsky,
	Samuel Holland, Mitchell Levy, Eric Biggers, Stanislav Spassov,
	Vignesh Balasubramanian, Aruna Ramakrishna

On 4/10/2025 12:24 AM, Chao Gao wrote:
> 
> +struct vcpu_fpu_config guest_default_cfg __ro_after_init;

I just noticed that the patch set is missing the initialization of 
guest_default_cfg.size = size (or legacy_size) in the following functions:

    fpu__init_system_xstate_size_legacy()
    fpu__init_disable_system_xstate()

Without that, it looks buggy when XSAVE is either unavailable or disabled.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs
  2025-05-01 14:24   ` Chang S. Bae
@ 2025-05-06  3:29     ` Chao Gao
  0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-05-06  3:29 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: x86, linux-kernel, kvm, tglx, dave.hansen, seanjc, pbonzini,
	peterz, rick.p.edgecombe, weijiang.yang, john.allen, bp, xin3.li,
	Ingo Molnar, Dave Hansen, H. Peter Anvin, Maxim Levitsky,
	Samuel Holland, Mitchell Levy, Eric Biggers, Stanislav Spassov,
	Vignesh Balasubramanian, Aruna Ramakrishna

On Thu, May 01, 2025 at 07:24:25AM -0700, Chang S. Bae wrote:
>On 4/10/2025 12:24 AM, Chao Gao wrote:
>> 
>> +struct vcpu_fpu_config guest_default_cfg __ro_after_init;
>
>I just noticed that the patch set is missing the initialization of
>guest_default_cfg.size = size (or legacy_size) in the following functions:
>
>   fpu__init_system_xstate_size_legacy()
>   fpu__init_disable_system_xstate()
>
>Without that, it looks buggy when XSAVE is either unavailable or disabled.

Good catch. Will fix them.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container from guest defaults
  2025-05-01 14:24     ` Chang S. Bae
@ 2025-05-06  3:33       ` Chao Gao
  0 siblings, 0 replies; 40+ messages in thread
From: Chao Gao @ 2025-05-06  3:33 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: Edgecombe, Rick P, Hansen, Dave, seanjc@google.com,
	x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de,
	kvm@vger.kernel.org, pbonzini@redhat.com, Yang, Weijiang,
	Li, Xin3, ebiggers@google.com, bp@alien8.de,
	dave.hansen@linux.intel.com, peterz@infradead.org,
	john.allen@amd.com, mingo@redhat.com, hpa@zytor.com,
	Spassov, Stanislav

On Thu, May 01, 2025 at 07:24:09AM -0700, Chang S. Bae wrote:
>On 4/30/2025 11:29 AM, Edgecombe, Rick P wrote:
>> 
>> So let's drop the code but not the idea. Chang what do you think of that?
>Sure, that makes sense. Thanks for bringing up this case.

Yea, I agree with dropping vcpu_fpu_config.user_* for now.

>
>Staring at the struct guest_fpu fields, some of them appear to match with
>fields in fpstate. In any case, I agree it would be helpful to consider a
>follow-up series after this.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2025-05-06  3:34 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10  7:24 [PATCH v5 0/7] Introduce CET supervisor state support Chao Gao
2025-04-10  7:24 ` [PATCH v5 1/7] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm Chao Gao
2025-04-18 20:50   ` Chang S. Bae
2025-04-10  7:24 ` [PATCH v5 2/7] x86/fpu: Drop @perm from guest pseudo FPU container Chao Gao
2025-04-18 20:51   ` Chang S. Bae
2025-04-18 20:54     ` Chang S. Bae
2025-04-19  1:01     ` Chao Gao
2025-04-10  7:24 ` [PATCH v5 3/7] x86/fpu/xstate: Differentiate default features for host and guest FPUs Chao Gao
2025-04-24 22:52   ` Edgecombe, Rick P
2025-04-25  8:24     ` Chao Gao
2025-04-25 16:09       ` Edgecombe, Rick P
2025-04-25 23:48         ` Sean Christopherson
2025-04-28  3:26           ` Chao Gao
2025-04-28  7:44             ` Xin Li
2025-04-28 14:28             ` Sean Christopherson
2025-04-28  6:31           ` Xin Li
2025-04-28 15:42           ` Edgecombe, Rick P
2025-04-29  1:11             ` Chang S. Bae
2025-04-29  2:50               ` Edgecombe, Rick P
2025-04-29  3:22                 ` Chang S. Bae
2025-04-29  3:36                   ` Edgecombe, Rick P
2025-04-30  3:27                     ` Chao Gao
2025-04-30 15:01                     ` Chang S. Bae
2025-04-30 15:33                       ` Edgecombe, Rick P
2025-04-30 16:20                         ` Sean Christopherson
2025-04-30 18:26                           ` Chang S. Bae
2025-04-28  5:51         ` Xin Li
2025-04-28  6:12           ` Xin Li
2025-05-01 14:24   ` Chang S. Bae
2025-05-06  3:29     ` Chao Gao
2025-04-10  7:24 ` [PATCH v5 4/7] x86/fpu: Initialize guest FPU permissions from guest defaults Chao Gao
2025-04-30 15:45   ` Edgecombe, Rick P
2025-04-10  7:24 ` [PATCH v5 5/7] x86/fpu: Initialize guest fpstate and FPU pseudo container " Chao Gao
2025-04-30 18:29   ` Edgecombe, Rick P
2025-05-01 14:24     ` Chang S. Bae
2025-05-06  3:33       ` Chao Gao
2025-04-10  7:24 ` [PATCH v5 6/7] x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set Chao Gao
2025-04-24 22:58   ` Edgecombe, Rick P
2025-04-10  7:24 ` [PATCH v5 7/7] x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature Chao Gao
2025-04-24 23:28 ` [PATCH v5 0/7] Introduce CET supervisor state support Edgecombe, Rick P

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox