[PATCH v10 0/6] Add RMPOPT support.

* [PATCH v10 0/6] Add RMPOPT support.
@ 2026-06-30 18:08 Ashish Kalra
  2026-06-30 18:09 ` [PATCH v10 1/6] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag Ashish Kalra
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:08 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

In the SEV-SNP architecture, hypervisor and non-SNP guests are subject
to RMP checks on writes to provide integrity of SEV-SNP guest memory.

The RMPOPT architecture enables optimizations whereby the RMP checks
can be skipped if 1GB regions of memory are known to not contain any
SNP guest memory.

RMPOPT is a new instruction designed to minimize the performance
overhead of RMP checks for the hypervisor and non-SNP guests.

The RMPOPT instruction currently supports two functions. For the
verify-and-report-status function, the CPU reads the RMP contents and
verifies that the entire 1GB region starting at the provided SPA is
HV-owned, i.e. that no RMP entry in the region is in the assigned
state. It then updates the RMPOPT table to indicate whether the
optimization has been enabled and reports to software whether the
optimization was successful.

For the report-status function, the CPU returns the optimization
status for the 1GB region.

The RMPOPT table is managed by a combination of software and hardware.
Software uses the RMPOPT instruction to set bits in the table,
indicating that regions of memory are entirely HV-owned.  Hardware
automatically clears bits in the RMPOPT table when RMP contents are
changed by the RMPUPDATE instruction.
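
The split of responsibilities described above can be illustrated with a
small userspace toy model. This is only a sketch under stated
assumptions, not the hardware behavior or the kernel code in this
series: the names (toy_rmpopt, toy_verify_and_report, and so on) are
hypothetical, the table is assumed to hold one bit per 1GB region of a
2TB window, and the RMP is reduced to a per-region count of assigned
pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 2TB window tracked at 1GB granularity gives 2048 regions. */
#define TOY_REGIONS 2048u

struct toy_rmpopt {
	uint64_t optimized[TOY_REGIONS / 64];	/* toy RMPOPT table, 1 bit per region */
	uint32_t assigned[TOY_REGIONS];		/* toy RMP: assigned pages per region */
};

static void toy_init(struct toy_rmpopt *t)
{
	memset(t, 0, sizeof(*t));
}

/* Models the report-status function: return the current bit only. */
static bool toy_report_status(const struct toy_rmpopt *t, unsigned int region)
{
	assert(region < TOY_REGIONS);
	return (t->optimized[region / 64] >> (region % 64)) & 1;
}

/* Models verify-and-report: set the bit only if the region is HV-owned. */
static bool toy_verify_and_report(struct toy_rmpopt *t, unsigned int region)
{
	assert(region < TOY_REGIONS);
	if (t->assigned[region] != 0)
		return false;
	t->optimized[region / 64] |= UINT64_C(1) << (region % 64);
	return true;
}

/* Models RMPUPDATE assigning a page: any RMP change clears the bit. */
static void toy_rmpupdate_assign(struct toy_rmpopt *t, unsigned int region)
{
	assert(region < TOY_REGIONS);
	t->assigned[region]++;
	t->optimized[region / 64] &= ~(UINT64_C(1) << (region % 64));
}

/* Models RMPUPDATE releasing a page: the bit is cleared, never set. */
static void toy_rmpupdate_release(struct toy_rmpopt *t, unsigned int region)
{
	assert(region < TOY_REGIONS);
	if (t->assigned[region])
		t->assigned[region]--;
	t->optimized[region / 64] &= ~(UINT64_C(1) << (region % 64));
}
```

In this model a region stays unoptimized after its last page is
released until software runs the verify step again, which is the reason
the series re-runs RMPOPT after SNP guest termination.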

For more information on the RMPOPT instruction, see the AMD64 RMPOPT
technical documentation.

As SNP is enabled by default, the hypervisor and non-SNP guests are
subject to RMP write checks to provide integrity of SNP guest memory.

This patch series adds support to enable RMP optimizations for up to
2TB of system RAM across the system and allows RMPUPDATE to disable
those optimizations as SNP guests are launched.

Support for RAM larger than 2TB will be added in a follow-on series.
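
As a rough illustration of the 2TB limit, the covered window can be
computed as below. This is a userspace sketch, not the code from the
patches: the struct and function names are hypothetical, and rounding
the end of RAM up to a 1GB boundary is an assumption made here for
illustration. Only the 1GB align-down of the start address and the 2TB
cap come from this series.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G		(UINT64_C(1) << 30)
#define RMPOPT_WINDOW	(UINT64_C(2) << 40)	/* 2TB covered per RMPOPT table */

struct rmpopt_window {
	uint64_t start;		/* 1GB-aligned base of the window */
	uint64_t end;		/* exclusive end, capped at start + 2TB */
	uint64_t nr_regions;	/* number of 1GB regions to scan */
};

static uint64_t align_down_1g(uint64_t pa)
{
	return pa & ~(SZ_1G - 1);
}

static uint64_t align_up_1g(uint64_t pa)
{
	return (pa + SZ_1G - 1) & ~(SZ_1G - 1);
}

/* Hypothetical helper: derive the scan window from the RAM extent. */
static struct rmpopt_window rmpopt_window_for(uint64_t ram_start, uint64_t ram_end)
{
	struct rmpopt_window w;

	assert(ram_start < ram_end);

	w.start = align_down_1g(ram_start);
	w.end = align_up_1g(ram_end);		/* assumption: round the end up */
	if (w.end - w.start > RMPOPT_WINDOW)
		w.end = w.start + RMPOPT_WINDOW;	/* RAM beyond 2TB is not covered */
	w.nr_regions = (w.end - w.start) / SZ_1G;

	return w;
}
```

For a system with 3TB of RAM starting near address zero, this yields
2048 regions, and the last 1TB is left unoptimized until the follow-on
work lands.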

This series also adds support to disable CPU hotplug while SNP is
active, as the SEV firmware enumerates CPUs at SNP initialization and is
not aware of the OS bringing CPUs online or offline afterwards.  This
also keeps the set of CPUs stable for the asynchronous RMPOPT scan, so
the per-core RMPOPT_BASE MSRs programmed during setup remain valid.

This series also introduces support to re-enable RMP optimizations
during SNP guest termination, after guest pages have been converted
back to shared.

RMP optimizations are performed asynchronously by queuing work on a
dedicated workqueue after a 10 second delay.

Delaying work allows batching of multiple SNP guest terminations.
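
The batching effect can be sketched with a toy timer model. The series
uses mod_delayed_work() so that the delay tracks the last guest
termination; the code below is only a userspace analogy of that
re-arming behavior, with hypothetical names (toy_delayed_work,
toy_request, toy_tick) and a clock counted in whole seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RMPOPT_DELAY_SECS 10u	/* the 10 second delay from the cover letter */

struct toy_delayed_work {
	bool pending;
	uint64_t deadline;	/* absolute time at which the work may run */
	unsigned int runs;	/* how many optimization passes have executed */
};

/* Analogous to mod_delayed_work(): (re)arm the timer relative to 'now'. */
static void toy_request(struct toy_delayed_work *w, uint64_t now)
{
	assert(w);
	w->pending = true;
	w->deadline = now + RMPOPT_DELAY_SECS;
}

/* Advance the clock; run the work once if its deadline has passed. */
static bool toy_tick(struct toy_delayed_work *w, uint64_t now)
{
	assert(w);
	if (!w->pending || now < w->deadline)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}
```

Three terminations at t=0, t=4 and t=9 therefore result in a single
pass at t=19 in this model, rather than three separate scans.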

Once 1GB hugetlb guest_memfd support is merged, support for
re-enabling RMPOPT optimizations during 1GB page cleanup will be added
in a follow-on series.

v10:
- Rework the CPU-hotplug patch (3/6): disable CPU hotplug in
  snp_prepare(), before SnpEn is set, instead of late in
  __sev_snp_init_locked(), so no CPU can come online without SnpEn during
  SNP initialization (per upstream review).  Tie hotplug to SnpEn: it
  stays disabled while SnpEn is set -- including across a failed SNP_INIT
  and across the legacy SNP_SHUTDOWN_EX path -- and is re-enabled only
  once the firmware clears SnpEn on the x86_snp_shutdown path.  Drop the
  separate idempotent flag: snp_prepare() re-enables hotplug on its own
  early failure, and a kexec target that boots with SnpEn already set
  disables hotplug once in snp_rmptable_init().  Reword the commit log and
  comments accordingly.
- Emit a pr_warn() in rmpopt_work_handler() (4/6) when the follower
  cpumask allocation fails, instead of silently skipping the optimization
  pass.

  Sashiko AI upstream review identified several of the above issues.

v9:
- Rename rmpopt_configured to rmpopt_capable.
- Make rmpopt_cpumask a cpumask_var_t (allocated/freed at setup/cleanup)
  instead of a static cpumask_t.
- Drop the v8 WARN_ON_ONCE() on the RMPOPT_BASE writes; use a plain
  wrmsrq_on_cpu(), matching the SNP MSR-write convention in this file.
- Disable CPU hotplug with cpu_hotplug_disable()/cpu_hotplug_enable()
  (per tglx); re-enable only on the full x86_snp_shutdown path.
- Simplify rmpopt_work_handler() to a single leader-then-followers path:
  with CPU hotplug disabled while SNP is active and snp_prepare()
  requiring all CPUs online when RMPOPT_BASE is programmed, every core is
  always programmed, so the explicit-leader fallback is now unreachable.
  Drop it along with the v8 work_on_cpu()/rmpopt_leader_fn() helper.
- Drop the debugfs interface (was patch 7/7) and its report-only
  plumbing; observability will be revisited after this series is merged.
- Restrict snp_rmpopt_all_physmem()'s export to the kvm-amd module.
- Use scoped_guard(cpus_read_lock) for the per-CPU MSR and follower
  loops.

  Sashiko AI upstream review identified several of the above issues.

v8:
- Add a new patch to disable CPU hotplug while SNP is active, keeping
  the CPU set stable for the RMPOPT work handler.
- Drop the setup_clear_cpu_cap(X86_FEATURE_RMPOPT) calls; the
  rmpopt_configured bool is the runtime guard.
- WARN_ON_ONCE() on the RMPOPT_BASE MSR writes that previously ignored
  their return value.
- Simplify rmpopt_work_handler() by removing the explicit-leader
  fallback: with CPU hotplug disabled while SNP is active and
  snp_prepare() requiring all CPUs online when RMPOPT_BASE is programmed,
  every core is always programmed, so the running CPU can always be the
  leader.  This drops the smp_call_function_single() fallback (and with
  it the AB-BA deadlock and IRQ-latency concerns) and collapses the
  leader selection into a single leader-then-followers path.
- Use mod_delayed_work() in snp_rmpopt_all_physmem() so the batching
  delay tracks the last SNP guest termination.

  Sashiko AI code review identified several of the above issues.

v7:
- Sync tools/arch/x86/include/asm/cpufeatures.h to mirror the kernel
  header for X86_FEATURE_RMPOPT.
- Fix commit title to use X86_FEATURE_RMPOPT to match the code
  (was X86_FEATURE_AMD_RMPOPT).
- Add static bool rmpopt_configured, set only when segmented RMP setup
  succeeds in setup_rmptable().  Check rmpopt_configured alongside
  cpu_feature_enabled(X86_FEATURE_RMPOPT) in snp_setup_rmpopt() and
  snp_rmpopt_all_physmem(), because setup_clear_cpu_cap() is unreliable
  after alternatives are patched.  Add snp_clear_rmpopt_configured()
  called from amd_cc_platform_clear() when CC_ATTR_HOST_SEV_SNP is
  cleared.  Do not use __ro_after_init on rmpopt_configured since the
  writer snp_clear_rmpopt_configured() is not __init.
- Add cond_resched() to all three leader loops in rmpopt_work_handler()
  to prevent soft lockups on systems with up to 2TB of RAM.
- Add comment above __rmpopt() documenting the RMPOPT instruction
  encoding (F2 0F 01 FC) and register interface (RAX = system physical
  address input, RCX = operation type input, RFLAGS.CF = output).
  Note: RMPOPT does not modify RAX unlike PVALIDATE/RMPUPDATE, so
  the existing "a" (input-only) constraint is correct.

  Sashiko AI code review identified several of the above issues.

v6:
- Drop wrmsrq_on_cpus() helper; use for_each_cpu() with wrmsrq_on_cpu()
  instead, as RMPOPT_BASE MSR programming is not performance-critical.
- Rewrite rmpopt_work_handler() leader selection to use a local
  follower_mask copy instead of modifying the global rmpopt_cpumask.
  This eliminates the current_cpu_cleared tracking and the restore at
  the end, and removes the need for synchronization comments about
  transient cpumask inconsistency.
- Add three-way leader selection in rmpopt_work_handler():
  1. Current CPU is a primary thread in cpumask: run leader locally.
  2. Current CPU is a sibling thread whose primary is in cpumask:
     run leader locally (RMPOPT_BASE MSR is per-core), remove the
     primary from followers via cpumask_andnot(topology_sibling_cpumask).
  3. Current CPU's core has no RMPOPT_BASE MSR programmed: pick an
     explicit leader via cpumask_first() + smp_call_function_single()
     to avoid #UD, with cpus_read_lock() around the IPI loop.
- Add WARN_ON_ONCE guard for empty cpumask in the explicit leader
  fallback path, with migrate_enable() before goto out.
- Add .llseek = seq_lseek to rmpopt_table_fops for consistency with
  other seq_file-based debugfs files and to support tools like "less".
- Change debugfs file permissions from 0444 to 0400 to restrict access
  to root only.
- Add comment in rmpopt_table_seq_show() explaining why cpu_online_mask
  is safe: RMPOPT_BASE MSR is per-core and snp_prepare() ensures all
  CPUs are online when the MSR is programmed.

  Sashiko AI code review identified several of the above issues.

v5:
- Introduce rmpopt_cleanup() to tear down workqueue, debugfs, cpumask,
  and MSR state, called from snp_shutdown().
- Introduce rmpopt_wq_mutex to serialize snp_setup_rmpopt(),
  snp_rmpopt_all_physmem(), and rmpopt_cleanup().
- Introduce rmpopt_show_mutex to serialize debugfs reporting of
  rmpopt_report_cpumask.
- Move snp_rmpopt_all_physmem() call after SNP DECOMMISSION during
  guest shutdown.
- Use migrate_disable()/migrate_enable() for CPU pinning in the
  rmpopt_work_handler() leader loop to maintain CPU affinity without
  disabling preemption for the entire RMPOPT scan.
- Add cpus_read_lock()/cpus_read_unlock() around the follower
  on_each_cpu_mask() loop in rmpopt_work_handler().
- Guard snp_setup_rmpopt() against re-initialization when
  SNP_SHUTDOWN_EX with x86_snp_shutdown=0 skips rmpopt_cleanup()
  but clears snp_initialized, preventing workqueue and resource
  leaks on repeated init/shutdown cycles.
- Replace setup_clear_cpu_cap() with pr_err() on alloc_workqueue()
  failure in snp_setup_rmpopt(), as setup_clear_cpu_cap() cannot be
  used after alternatives are patched; callers check rmpopt_wq != NULL
  as the runtime guard instead.
- Add pr_info() when RMPOPT coverage is capped at 2TB.
- Add comments noting CPU hotplug is not supported with SNP enabled
  and only online primary threads are covered by rmpopt_cpumask.
- Add comment in setup_rmptable() noting Segmented RMP must be
  enabled to enable RMPOPT.
- Simplify cpumask setup loop to set if primary thread rather than
  skip if not primary.
- Improve grammar and clarity in snp_setup_rmpopt() comments.
- Added Reviewed-by's.

  Sashiko AI code review identified several of the above issues.

v4:
- Add new wrmsrq_on_cpus() helper to write same u64 value to a
  per-CPU MSR across a cpumask without per-cpu struct allocation
  overhead.
- Rename configure_and_enable_rmpopt() to snp_setup_rmpopt().
- Use wrmsrq_on_cpus() instead of wrmsrq_on_cpu() loop for
  programming RMPOPT_BASE MSRs.
- Add setup_clear_cpu_cap(X86_FEATURE_RMPOPT) if segmented RMP
  setup fails or workqueue allocation fails.
- Add X86_FEATURE_RMPOPT feature clear logic in amd_cc_platform_clear()
  for CC_ATTR_HOST_SEV_SNP.
- All of the above allow checking for only X86_FEATURE_RMPOPT for both
  RMPOPT setup/enable and RMP re-optimizations.
- Rename snp_perform_rmp_optimization() to snp_rmpopt_all_physmem().
- Split rmpopt() into rmpopt() and rmpopt_smp() for SMP callback use.
- Introduce separate rmpopt_report_cpumask for debugfs reporting,
  distinct from rmpopt_cpumask used for primary thread tracking.
- Remove snp_perform_rmp_optimization() call from __sev_snp_init_locked()
  and instead setup and enable RMPOPT after SNP is enabled and
  initialized.

v3:
- Drop all RMPOPT kthread support and introduce a custom, dedicated
  workqueue to schedule delayed and asynchronous RMPOPT work.
- Drop the guest_memfd inode cleanup interface and add support to
  re-enable RMP optimizations during guest shutdown using the
  asynchronous and delayed workqueue interface.
- Introduce new __rmpopt() helper and rmpopt() and
  rmpopt_report_status() wrappers on top which use rax and rcx
  parameters to closely match RMPOPT specs.
- Use a new optimized RMPOPT loop to issue RMPOPT instructions on all
  system RAM up to 2TB and on all CPUs, by optimizing each range on one
  CPU first, then letting the other CPUs execute RMPOPT in parallel so
  they can skip most of the work as the range has already been optimized.
- Also add support for running the optimized RMPOPT loop only on
  one thread per core.
- Replace all PUD_SIZE references with SZ_1G to conform to 1GB regions
  as specified by RMPOPT specifications and not be dependent on PUD_SIZE
  which makes the RMPOPT patch-set independent of x86 page table sizes.
- Use wrmsrq_on_cpu() to program the RMPOPT_BASE MSRs on all CPUs,
  which removes the ugly casting needed to use on_each_cpu_mask().
- Fix inline comments and patch commit messages.


v2:
- Drop all NUMA and Socket configuration and enablement support and
  enable RMPOPT support for up to 2TB of system RAM.
- Drop get_cpumask_of_primary_threads() and enable per-core RMPOPT
  base MSRs and issue RMPOPT instruction on all CPUs.
- Drop the configfs interface to manually re-enable RMP optimizations.
- Add new guest_memfd cleanup interface to automatically re-enable
  RMP optimizations during guest shutdown.
- Include references to the public RMPOPT documentation.
- Move debugfs directory for RMPOPT under architecture-specific
  parent directory.

Ashish Kalra (6):
  x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag
  x86/sev: Initialize RMPOPT configuration MSRs
  x86/sev: Disable CPU hotplug while SNP is active
  x86/sev: Add support to perform RMP optimizations asynchronously
  x86/sev: Add interface to re-enable RMP optimizations.
  KVM: SEV: Perform RMP optimizations on SNP guest shutdown

 arch/x86/coco/core.c                     |   2 +
 arch/x86/include/asm/cpufeatures.h       |   2 +-
 arch/x86/include/asm/msr-index.h         |   3 +
 arch/x86/include/asm/sev.h               |   6 +
 arch/x86/kernel/cpu/scattered.c          |   1 +
 arch/x86/kvm/svm/sev.c                   |  10 +
 arch/x86/virt/svm/sev.c                  | 277 +++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.c             |   3 +
 tools/arch/x86/include/asm/cpufeatures.h |   2 +-
 9 files changed, 304 insertions(+), 2 deletions(-)

-- 
2.43.0



* [PATCH v10 1/6] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
@ 2026-06-30 18:09 ` Ashish Kalra
  2026-06-30 18:10 ` [PATCH v10 2/6] x86/sev: Initialize RMPOPT configuration MSRs Ashish Kalra
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:09 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

Add a flag indicating whether the RMPOPT instruction is supported.

RMPOPT is a new instruction that reduces the performance overhead of
RMP checks for the hypervisor and non-SNP guests by allowing those
checks to be skipped when 1-GB memory regions are known to contain no
SEV-SNP guest memory.

For more information on the RMPOPT instruction, see the AMD64 RMPOPT
technical documentation.

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 2 +-
 arch/x86/kernel/cpu/scattered.c          | 1 +
 tools/arch/x86/include/asm/cpufeatures.h | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1b4a48bff18f..14f23d19d864 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -76,7 +76,7 @@
 #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
 #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
-/* Free                                 ( 3*32+ 7) */
+#define X86_FEATURE_RMPOPT		( 3*32+ 7) /* Support for AMD RMPOPT instruction */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
 #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
 #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 937129ce6a96..021c0bf22de2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -67,6 +67,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PERFMON_V2,		CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,		CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
+	{ X86_FEATURE_RMPOPT,			CPUID_EDX,  0, 0x80000025, 0 },
 	{ X86_FEATURE_AMD_HTR_CORES,		CPUID_EAX, 30, 0x80000026, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 86d17b195e79..7ce681af1dd7 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -76,7 +76,7 @@
 #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
 #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
-/* Free                                 ( 3*32+ 7) */
+#define X86_FEATURE_RMPOPT		( 3*32+ 7) /* Support for AMD RMPOPT instruction */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
 #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
 #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
-- 
2.43.0



* [PATCH v10 2/6] x86/sev: Initialize RMPOPT configuration MSRs
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
  2026-06-30 18:09 ` [PATCH v10 1/6] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag Ashish Kalra
@ 2026-06-30 18:10 ` Ashish Kalra
  2026-06-30 18:11 ` [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active Ashish Kalra
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:10 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

The new RMPOPT instruction helps manage per-CPU RMP optimization
structures inside the CPU. It takes a 1GB-aligned physical address
and either returns the status of the optimizations or tries to enable
the optimizations.

Per-CPU RMPOPT tables support at most 2 TB of addressable memory for
RMP optimizations.

Initialize the per-CPU RMPOPT table base to the starting physical
address. This enables RMP optimization for up to 2 TB of system RAM on
all CPUs.

Additionally, add support to set up and enable RMPOPT once SNP is
enabled and initialized.

Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/coco/core.c             |  2 +
 arch/x86/include/asm/msr-index.h |  3 ++
 arch/x86/include/asm/sev.h       |  4 ++
 arch/x86/virt/svm/sev.c          | 70 ++++++++++++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.c     |  3 ++
 5 files changed, 82 insertions(+)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 989ca9f72ba3..f0ed6c62d86c 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -16,6 +16,7 @@
 #include <asm/archrandom.h>
 #include <asm/coco.h>
 #include <asm/processor.h>
+#include <asm/sev.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 SYM_PIC_ALIAS(cc_vendor);
@@ -172,6 +173,7 @@ static void amd_cc_platform_clear(enum cc_attr attr)
 	switch (attr) {
 	case CC_ATTR_HOST_SEV_SNP:
 		cc_flags.host_sev_snp = 0;
+		snp_clear_rmpopt_capable();
 		break;
 	default:
 		break;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18c4be75e927..d2cb0a7cd0a2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -761,6 +761,9 @@
 #define MSR_AMD64_SEG_RMP_ENABLED_BIT	0
 #define MSR_AMD64_SEG_RMP_ENABLED	BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
 #define MSR_AMD64_RMP_SEGMENT_SHIFT(x)	(((x) & GENMASK_ULL(13, 8)) >> 8)
+#define MSR_AMD64_RMPOPT_BASE		0xc0010139
+#define MSR_AMD64_RMPOPT_ENABLE_BIT	0
+#define MSR_AMD64_RMPOPT_ENABLE		BIT_ULL(MSR_AMD64_RMPOPT_ENABLE_BIT)
 
 #define MSR_SVSM_CAA			0xc001f000
 
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 594cfa19cbd4..0243989f229b 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -662,6 +662,8 @@ static inline void snp_leak_pages(u64 pfn, unsigned int pages)
 	__snp_leak_pages(pfn, pages, true);
 }
 int snp_prepare(void);
+void snp_setup_rmpopt(void);
+void snp_clear_rmpopt_capable(void);
 void snp_shutdown(void);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
@@ -680,6 +682,8 @@ static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 static inline void kdump_sev_callback(void) { }
 static inline void snp_fixup_e820_tables(void) {}
 static inline int snp_prepare(void) { return -ENODEV; }
+static inline void snp_setup_rmpopt(void) {}
+static inline void snp_clear_rmpopt_capable(void) {}
 static inline void snp_shutdown(void) {}
 #endif
 
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 8bcdce98f6dc..dab6e1c290bc 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -124,6 +124,10 @@ static void *rmp_bookkeeping __ro_after_init;
 
 static u64 probed_rmp_base, probed_rmp_size;
 
+static cpumask_var_t rmpopt_cpumask;
+static phys_addr_t rmpopt_pa_start;
+static bool rmpopt_capable;
+
 static LIST_HEAD(snp_leaked_pages_list);
 static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
 
@@ -490,6 +494,11 @@ static bool __init setup_rmptable(void)
 	if (rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED) {
 		if (!setup_segmented_rmptable())
 			return false;
+		/*
+		 * RMPOPT requires a segmented RMP, so indicate that the
+		 * system is capable of configuring and running RMPOPT.
+		 */
+		rmpopt_capable = true;
 	} else {
 		if (!setup_contiguous_rmptable())
 			return false;
@@ -555,6 +564,19 @@ int snp_prepare(void)
 }
 EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
 
+static void rmpopt_cleanup(void)
+{
+	int cpu;
+
+	scoped_guard(cpus_read_lock) {
+		for_each_cpu(cpu, rmpopt_cpumask)
+			wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, 0);
+	}
+
+	free_cpumask_var(rmpopt_cpumask);
+	rmpopt_pa_start = 0;
+}
+
 void snp_shutdown(void)
 {
 	u64 syscfg;
@@ -563,11 +585,59 @@ void snp_shutdown(void)
 	if (syscfg & MSR_AMD64_SYSCFG_SNP_EN)
 		return;
 
+	rmpopt_cleanup();
+
 	clear_rmp();
 	on_each_cpu(mfd_reconfigure, NULL, 1);
 }
 EXPORT_SYMBOL_FOR_MODULES(snp_shutdown, "ccp");
 
+void snp_clear_rmpopt_capable(void)
+{
+	rmpopt_capable = false;
+}
+
+void snp_setup_rmpopt(void)
+{
+	u64 rmpopt_base;
+	int cpu;
+
+	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_capable)
+		return;
+
+	if (!zalloc_cpumask_var(&rmpopt_cpumask, GFP_KERNEL)) {
+		pr_err("Failed to allocate RMPOPT cpumask\n");
+		return;
+	}
+
+	/*
+	 * The RMPOPT_BASE MSR is per-core, so only one thread per core needs
+	 * to set up the RMPOPT_BASE MSR.
+	 *
+	 * Note: only online primary threads are included.  If a core's
+	 * primary thread is offline, that core is not covered.  CPU hotplug
+	 * is not currently supported with SNP enabled.
+	 */
+	scoped_guard(cpus_read_lock) {
+		for_each_online_cpu(cpu)
+			if (topology_is_primary_thread(cpu))
+				cpumask_set_cpu(cpu, rmpopt_cpumask);
+
+		rmpopt_pa_start = ALIGN_DOWN(PFN_PHYS(min_low_pfn), SZ_1G);
+		rmpopt_base = rmpopt_pa_start | MSR_AMD64_RMPOPT_ENABLE;
+
+		/*
+		 * Per-CPU RMPOPT tables support at most 2 TB of addressable memory
+		 * for RMP optimizations. Initialize the per-CPU RMPOPT table base
+		 * to the starting physical address to enable RMP optimizations for
+		 * up to 2 TB of system RAM on all CPUs.
+		 */
+		for_each_cpu(cpu, rmpopt_cpumask)
+			wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
+	}
+}
+EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
+
 /*
  * Do the necessary preparations which are verified by the firmware as
  * described in the SNP_INIT_EX firmware command description in the SNP
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ca473ca198b8..c002a7ca26a8 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1477,6 +1477,9 @@ static int __sev_snp_init_locked(int *error, unsigned int max_snp_asid)
 	}
 
 	snp_hv_fixed_pages_state_update(sev, HV_FIXED);
+
+	snp_setup_rmpopt();
+
 	sev->snp_initialized = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized, SEV-TIO is %s\n",
 		data.tio_en ? "enabled" : "disabled");
-- 
2.43.0



* [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
  2026-06-30 18:09 ` [PATCH v10 1/6] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag Ashish Kalra
  2026-06-30 18:10 ` [PATCH v10 2/6] x86/sev: Initialize RMPOPT configuration MSRs Ashish Kalra
@ 2026-06-30 18:11 ` Ashish Kalra
  2026-06-30 18:32   ` sashiko-bot
  2026-07-01  9:40   ` Jethro Beekman
  2026-06-30 18:11 ` [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously Ashish Kalra
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:11 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

While SNP is active, every memory write is checked against the RMP to
protect SEV-SNP guest memory.  A core performs these RMP checks only once
SNP has been initialized via SNP_INIT and the SNP-enable bit in SYSCFG is
set on that core; the firmware requires the SNP-enable bit to be set on
every present CPU before SNP initialization.  A core that is not
SNP-enabled and not SNP-initialized performs no RMP checks at all, so
there is no valid configuration with SNP active and any CPU exempt from
RMP checks.

The firmware determines which CPUs are present from the processor and the
BIOS/UEFI configuration (e.g. SMT disabled in the BIOS) and enumerates
them at SNP init; it is not aware of the OS bringing CPUs online or
offline afterwards.  SNP_INIT fails unless SnpEn is set on all CPUs, so a
CPU that is offline at SNP init does not have SnpEn set, SNP_INIT fails,
and there can be no SNP guest memory.  OS CPU hotplug can thus diverge
from the firmware's expectations and break SNP.

Tie CPU hotplug to the SNP-enable bit: disable it in snp_prepare() before
SNP is enabled, and re-enable it in snp_shutdown() once the firmware has
disabled SNP.  If snp_prepare() fails before enabling SNP it re-enables
hotplug itself; once SNP is enabled hotplug stays disabled, including
across a failed SNP_INIT and across the legacy SNP_SHUTDOWN_EX path, both
of which leave SNP enabled.  A kexec target that boots with SNP already
enabled disables hotplug once in snp_rmptable_init(), since snp_prepare()
bails when SNP is already enabled.

This also keeps the CPU set stable for the asynchronous RMPOPT scan added
later in this series, and ensures cpus_read_lock() in the scan is
uncontended.

Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/virt/svm/sev.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index dab6e1c290bc..04a58ac4339c 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -535,6 +535,15 @@ int snp_prepare(void)
 
 	clear_rmp();
 
+	/*
+	 * Disable CPU hotplug before enabling SNP, so no CPU can come online
+	 * without SnpEn while SNP is enabled; it is re-enabled in snp_shutdown()
+	 * once SNP is disabled.  Must be before cpus_read_lock():
+	 * cpu_hotplug_disable() takes cpu_add_remove_lock, which nests above
+	 * cpu_hotplug_lock.
+	 */
+	cpu_hotplug_disable();
+
 	cpus_read_lock();
 
 	if (!cpumask_equal(cpu_online_mask, cpu_present_mask)) {
@@ -560,6 +569,10 @@ int snp_prepare(void)
 unlock:
 	cpus_read_unlock();
 
+	/* Re-enable CPU hotplug; SnpEn was never set. */
+	if (ret)
+		cpu_hotplug_enable();
+
 	return ret;
 }
 EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
@@ -587,6 +600,13 @@ void snp_shutdown(void)
 
 	rmpopt_cleanup();
 
+	/*
+	 * Re-enable CPU hotplug now that the firmware has disabled SNP; CPU
+	 * hotplug is not re-enabled for a legacy SNP shutdown.  After
+	 * rmpopt_cleanup() so RMPOPT_BASE is cleared with hotplug still disabled.
+	 */
+	cpu_hotplug_enable();
+
 	clear_rmp();
 	on_each_cpu(mfd_reconfigure, NULL, 1);
 }
@@ -645,6 +665,8 @@ EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
  */
 int __init snp_rmptable_init(void)
 {
+	u64 val;
+
 	if (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP)))
 		return -ENOSYS;
 
@@ -654,6 +676,15 @@ int __init snp_rmptable_init(void)
 	if (!setup_rmptable())
 		return -ENOSYS;
 
+	/*
+	 * On a kexec boot SNP may already be enabled (legacy firmware leaves
+	 * SnpEn set across shutdown), in which case snp_prepare() bails without
+	 * disabling CPU hotplug, so disable it here.
+	 */
+	rdmsrq(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		cpu_hotplug_disable();
+
 	/*
 	 * Setting crash_kexec_post_notifiers to 'true' to ensure that SNP panic
 	 * notifier is invoked to do SNP IOMMU shutdown before kdump.
-- 
2.43.0



* [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
                   ` (2 preceding siblings ...)
  2026-06-30 18:11 ` [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active Ashish Kalra
@ 2026-06-30 18:11 ` Ashish Kalra
  2026-06-30 18:28   ` sashiko-bot
  2026-06-30 18:11 ` [PATCH v10 5/6] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
  2026-06-30 18:12 ` [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown Ashish Kalra
  5 siblings, 1 reply; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:11 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

When SEV-SNP is enabled, all writes to memory are checked to ensure
integrity of SNP guest memory. This imposes performance overhead on the
whole system.

RMPOPT is a new instruction that minimizes the performance overhead of
RMP checks on the hypervisor and on non-SNP guests by allowing RMP
checks to be skipped for 1GB regions of memory that are known not to
contain any SEV-SNP guest memory.

Add support for performing RMP optimizations asynchronously using a
dedicated workqueue.

Enable RMPOPT optimizations for up to 2TB of system RAM starting from
the lowest physical memory address aligned down to a 1GB boundary at
RMP initialization time. RMP checks can initially be skipped for 1GB
memory ranges that do not contain SEV-SNP guest memory (excluding
preassigned pages such as the RMP table and firmware pages). As SNP
guests are launched, RMPUPDATE will disable the corresponding RMPOPT
optimizations.
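
As a rough illustration of the range arithmetic described above, here is a
user-space sketch. It assumes the window starts at the lowest physical address
aligned down to 1GB, ends at the top of RAM aligned up to 1GB, and is clamped
to 2TB. The struct and helper names (toy_rmpopt_window, toy_scan_window,
toy_window_regions) are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G	(1ULL << 30)
#define SZ_2T	(1ULL << 41)

/* Hypothetical container for the scan window; not part of the patch. */
struct toy_rmpopt_window {
	uint64_t start;	/* inclusive, 1GB aligned */
	uint64_t end;	/* exclusive, 1GB aligned */
};

static uint64_t toy_align_down_1g(uint64_t pa)
{
	return pa & ~(SZ_1G - 1);
}

static uint64_t toy_align_up_1g(uint64_t pa)
{
	return (pa + SZ_1G - 1) & ~(SZ_1G - 1);
}

/*
 * Build the window from the lowest RAM address and the top of RAM,
 * then clamp it to at most 2TB, mirroring the commit message.
 */
static struct toy_rmpopt_window toy_scan_window(uint64_t lowest_pa,
						uint64_t top_pa)
{
	struct toy_rmpopt_window w;

	assert(lowest_pa <= top_pa);

	w.start = toy_align_down_1g(lowest_pa);
	w.end = toy_align_up_1g(top_pa);

	if (w.end - w.start > SZ_2T)
		w.end = w.start + SZ_2T;

	return w;
}

/* Number of 1GB regions one pass would visit. */
static uint64_t toy_window_regions(struct toy_rmpopt_window w)
{
	return (w.end - w.start) / SZ_1G;
}
```

With RAM spanning 1MB to 18GB this yields 18 regions starting at address 0;
with 3TB of RAM the window is cut off at 2TB, i.e. 2048 regions.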

Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/virt/svm/sev.c | 167 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 164 insertions(+), 3 deletions(-)

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 04a58ac4339c..40b06e959ee8 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -19,6 +19,7 @@
 #include <linux/iommu.h>
 #include <linux/amd-iommu.h>
 #include <linux/nospec.h>
+#include <linux/workqueue.h>
 
 #include <asm/sev.h>
 #include <asm/processor.h>
@@ -125,9 +126,20 @@ static void *rmp_bookkeeping __ro_after_init;
 static u64 probed_rmp_base, probed_rmp_size;
 
 static cpumask_var_t rmpopt_cpumask;
-static phys_addr_t rmpopt_pa_start;
+static phys_addr_t rmpopt_pa_start, rmpopt_pa_end;
 static bool rmpopt_capable;
 
+enum rmpopt_function {
+	RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS,
+	RMPOPT_FUNC_REPORT_STATUS
+};
+
+#define RMPOPT_WORK_TIMEOUT	10000
+
+static struct workqueue_struct *rmpopt_wq;
+static struct delayed_work rmpopt_delayed_work;
+static DEFINE_MUTEX(rmpopt_wq_mutex);
+
 static LIST_HEAD(snp_leaked_pages_list);
 static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
 
@@ -581,13 +593,22 @@ static void rmpopt_cleanup(void)
 {
 	int cpu;
 
+	guard(mutex)(&rmpopt_wq_mutex);
+
+	if (!rmpopt_wq)
+		return;
+
+	cancel_delayed_work_sync(&rmpopt_delayed_work);
+	destroy_workqueue(rmpopt_wq);
+
 	scoped_guard(cpus_read_lock) {
 		for_each_cpu(cpu, rmpopt_cpumask)
 			wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, 0);
 	}
 
 	free_cpumask_var(rmpopt_cpumask);
-	rmpopt_pa_start = 0;
+	rmpopt_pa_start = rmpopt_pa_end = 0;
+	rmpopt_wq = NULL;
 }
 
 void snp_shutdown(void)
@@ -617,6 +638,103 @@ void snp_clear_rmpopt_capable(void)
 	rmpopt_capable = false;
 }
 
+/*
+ * RMPOPT: F2 0F 01 FC
+ *   Input:  RAX = system physical address (1GB aligned)
+ *           RCX = operation type
+ *   Output: CF set if the range was optimized
+ */
+static inline bool __rmpopt(u64 pa_start, u64 op_type)
+{
+	bool optimized;
+
+	asm volatile(".byte 0xf2, 0x0f, 0x01, 0xfc"
+		     : "=@ccc" (optimized)
+		     : "a" (pa_start), "c" (op_type)
+		     : "memory", "cc");
+
+	return optimized;
+}
+
+static void rmpopt(u64 pa)
+{
+	u64 pa_start = ALIGN_DOWN(pa, SZ_1G);
+	u64 op_type = RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS;
+
+	__rmpopt(pa_start, op_type);
+}
+
+/*
+ * 'val' is a system physical address.
+ */
+static void rmpopt_smp(void *val)
+{
+	rmpopt((u64)val);
+}
+
+/*
+ * RMPOPT optimizations skip RMP checks at 1GB granularity if this
+ * range of memory does not contain any SNP guest memory.
+ */
+static void rmpopt_work_handler(struct work_struct *work)
+{
+	cpumask_var_t follower_mask;
+	phys_addr_t pa;
+	int this_cpu;
+
+	pr_info("Attempt RMP optimizations on physical address range @1GB alignment [0x%016llx - 0x%016llx]\n",
+		rmpopt_pa_start, rmpopt_pa_end);
+
+	if (!alloc_cpumask_var(&follower_mask, GFP_KERNEL)) {
+		pr_warn("RMP optimization pass skipped: cpumask allocation failed\n");
+		return;
+	}
+
+	/*
+	 * RMPOPT scans the RMP table and stores the result of the scan in
+	 * reserved processor memory. The RMP scan is the most expensive
+	 * part. If a second RMPOPT occurs, it can skip the expensive scan
+	 * if it sees a cached result in the reserved processor memory.
+	 *
+	 * Do RMPOPT on one CPU alone. Then, follow that up with RMPOPT
+	 * on every other primary thread. Followers are "designed to"
+	 * skip the scan if they see the "cached" scan results.
+	 *
+	 * Pin the worker to the current CPU for the leader loop so that
+	 * this_cpu remains valid and the RMPOPT instruction executes on
+	 * the correct CPU.  Use migrate_disable() rather than get_cpu() to
+	 * prevent migration while still allowing preemption.
+	 */
+	migrate_disable();
+	this_cpu = smp_processor_id();
+
+	cpumask_andnot(follower_mask, rmpopt_cpumask,
+		       topology_sibling_cpumask(this_cpu));
+
+	for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
+		rmpopt(pa);
+		cond_resched();
+	}
+	migrate_enable();
+
+	/*
+	 * Followers: run RMPOPT on remaining cores.  CPUs cannot go offline
+	 * while SNP is active, so the follower set stays valid across the
+	 * scan and cpus_read_lock() is uncontended.
+	 */
+	scoped_guard(cpus_read_lock) {
+		for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
+			on_each_cpu_mask(follower_mask, rmpopt_smp,
+					 (void *)pa, true);
+
+			/* Give a chance for other threads to run */
+			cond_resched();
+		}
+	}
+
+	free_cpumask_var(follower_mask);
+}
+
 void snp_setup_rmpopt(void)
 {
 	u64 rmpopt_base;
@@ -625,14 +743,42 @@ void snp_setup_rmpopt(void)
 	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_capable)
 		return;
 
+	guard(mutex)(&rmpopt_wq_mutex);
+
+	/*
+	 * Guard against re-initialization.  When SNP_SHUTDOWN_EX is issued
+	 * with x86_snp_shutdown=0, snp_shutdown() is not called and
+	 * rmpopt_cleanup() is skipped, but snp_initialized is still cleared.
+	 * A subsequent __sev_snp_init_locked() would call snp_setup_rmpopt()
+	 * again, leaking the existing workqueue, delayed work, and cpumask
+	 * state.
+	 */
+	if (rmpopt_wq)
+		return;
+
+	/*
+	 * Create an RMPOPT-specific workqueue to avoid scheduling
+	 * RMPOPT workitem on the global system workqueue.
+	 */
+	rmpopt_wq = alloc_workqueue("rmpopt_wq", WQ_UNBOUND, 1);
+	if (!rmpopt_wq) {
+		pr_err("Failed to allocate RMPOPT workqueue\n");
+		return;
+	}
+
+	INIT_DELAYED_WORK(&rmpopt_delayed_work, rmpopt_work_handler);
+
 	if (!zalloc_cpumask_var(&rmpopt_cpumask, GFP_KERNEL)) {
 		pr_err("Failed to allocate RMPOPT cpumask\n");
+		destroy_workqueue(rmpopt_wq);
+		rmpopt_wq = NULL;
 		return;
 	}
 
 	/*
 	 * The RMPOPT_BASE MSR is per-core, so only one thread per core needs
-	 * to set up the RMPOPT_BASE MSR.
+	 * to set up the RMPOPT_BASE MSR. Likewise, only one thread per core
+	 * needs to issue the RMPOPT instruction.
 	 *
 	 * Note: only online primary threads are included.  If a core's
 	 * primary thread is offline, that core is not covered.  CPU hotplug
@@ -655,6 +801,21 @@ void snp_setup_rmpopt(void)
 		for_each_cpu(cpu, rmpopt_cpumask)
 			wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
 	}
+
+	rmpopt_pa_end = ALIGN(PFN_PHYS(max_pfn), SZ_1G);
+
+	/* Limit memory scanning to 2TB of RAM */
+	if ((rmpopt_pa_end - rmpopt_pa_start) > SZ_2T) {
+		pr_info("RMPOPT coverage limited to 2TB; memory above 0x%llx not optimized\n",
+			rmpopt_pa_start + SZ_2T);
+		rmpopt_pa_end = rmpopt_pa_start + SZ_2T;
+	}
+
+	/*
+	 * Once all per-CPU RMPOPT tables have been configured, enable RMPOPT
+	 * optimizations on all physical memory.
+	 */
+	queue_delayed_work(rmpopt_wq, &rmpopt_delayed_work, 0);
 }
 EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 5/6] x86/sev: Add interface to re-enable RMP optimizations.
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
                   ` (3 preceding siblings ...)
  2026-06-30 18:11 ` [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously Ashish Kalra
@ 2026-06-30 18:11 ` Ashish Kalra
  2026-06-30 18:12 ` [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown Ashish Kalra
  5 siblings, 0 replies; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:11 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

The RMPOPT table is a per-CPU table that indicates whether 1GB regions
of physical memory are entirely hypervisor-owned.

When performing host memory accesses in hypervisor mode as well as
non-SNP guest mode, the processor may consult the RMPOPT table to
potentially skip an RMP access and improve performance.

Normal guest events clear RMP optimizations: pages are converted from
shared to private as SNP guests are launched, and large pages are split
and collapsed during guest operation -- both clear the RMPOPT
optimizations for the affected 1GB regions.  Conversely, guest pages are
converted back to shared during SNP guest termination, so those regions
may become eligible for RMPOPT optimization again.

Without some intervention, all RMP optimizations would eventually be
lost.  Add an interface to re-optimize all of physical memory.

The interface uses mod_delayed_work() instead of queue_delayed_work()
so that the delay timer is reset on each call. This provides proper
batching semantics: re-optimization runs 10 seconds after the *last*
VM termination rather than after the first. mod_delayed_work() also
re-queues work that is already in-flight, so a re-scan request
during an active scan is not silently dropped.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/virt/svm/sev.c    | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 0243989f229b..54b4ae5c3735 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -662,6 +662,7 @@ static inline void snp_leak_pages(u64 pfn, unsigned int pages)
 	__snp_leak_pages(pfn, pages, true);
 }
 int snp_prepare(void);
+void snp_rmpopt_all_physmem(void);
 void snp_setup_rmpopt(void);
 void snp_clear_rmpopt_capable(void);
 void snp_shutdown(void);
@@ -682,6 +683,7 @@ static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 static inline void kdump_sev_callback(void) { }
 static inline void snp_fixup_e820_tables(void) {}
 static inline int snp_prepare(void) { return -ENODEV; }
+static inline void snp_rmpopt_all_physmem(void) {}
 static inline void snp_setup_rmpopt(void) {}
 static inline void snp_clear_rmpopt_capable(void) {}
 static inline void snp_shutdown(void) {}
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 40b06e959ee8..6672c7f17825 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -735,6 +735,21 @@ static void rmpopt_work_handler(struct work_struct *work)
 	free_cpumask_var(follower_mask);
 }

+void snp_rmpopt_all_physmem(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_capable)
+		return;
+
+	guard(mutex)(&rmpopt_wq_mutex);
+
+	if (!rmpopt_wq)
+		return;
+
+	mod_delayed_work(rmpopt_wq, &rmpopt_delayed_work,
+			 msecs_to_jiffies(RMPOPT_WORK_TIMEOUT));
+}
+EXPORT_SYMBOL_FOR_MODULES(snp_rmpopt_all_physmem, "kvm-amd");
+
 void snp_setup_rmpopt(void)
 {
 	u64 rmpopt_base;
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown
  2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
                   ` (4 preceding siblings ...)
  2026-06-30 18:11 ` [PATCH v10 5/6] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
@ 2026-06-30 18:12 ` Ashish Kalra
  2026-06-30 18:24   ` sashiko-bot
  5 siblings, 1 reply; 14+ messages in thread
From: Ashish Kalra @ 2026-06-30 18:12 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

From: Ashish Kalra <ashish.kalra@amd.com>

Pages are converted from shared to private as SNP guests are launched.
This destroys existing RMPOPT optimizations in the regions where
pages are converted.

Conversely, guest pages are converted back to shared during SNP guest
termination and their region may become eligible for RMPOPT
optimization.

To take advantage of this, perform RMPOPT after guest termination.
Do it after a delay so that a single RMPOPT pass can be done if
multiple guests terminate in a short period of time.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 427229347876..c574849b1587 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2992,6 +2992,16 @@ void sev_vm_destroy(struct kvm *kvm)
 		 */
 		if (snp_decommission_context(kvm))
 			return;
+
+		/*
+		 * Perform RMP optimizations on memory freed by terminating
+		 * guests.  The scan is deferred, so it normally runs after
+		 * sev_gmem_invalidate() has converted this guest's pages back to
+		 * shared, and picks them up then.  A very large guest whose
+		 * conversion has not finished by then is picked up by a later
+		 * teardown's scan.
+		 */
+		snp_rmpopt_all_physmem();
 	} else {
 		sev_unbind_asid(kvm, sev->handle);
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown
  2026-06-30 18:12 ` [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown Ashish Kalra
@ 2026-06-30 18:24   ` sashiko-bot
  0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-30 18:24 UTC (permalink / raw)
  To: Ashish Kalra; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Indefinite deferral of RMP optimizations via mod_delayed_work timer reset.
--

commit 5a78324bed688cb2c4b58d363233a323513bd9b4
Author: Ashish Kalra <ashish.kalra@amd.com>

KVM: SEV: Perform RMP optimizations on SNP guest shutdown

This commit performs RMP optimizations on memory freed by terminating SNP
guests. The scan is deferred so that a single optimization pass can batch
memory from multiple guests terminating in a short period of time.

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4272293478766..c574849b15878 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -2992,6 +2992,16 @@ void sev_vm_destroy(struct kvm *kvm)
>  		 */
>  		if (snp_decommission_context(kvm))
>  			return;
> +
> +		/*
> +		 * Perform RMP optimizations on memory freed by terminating
> +		 * guests.  The scan is deferred, so it normally runs after
> +		 * sev_gmem_invalidate() has converted this guest's pages back to
> +		 * shared, and picks them up then.  A very large guest whose
> +		 * conversion has not finished by then is picked up by a later
> +		 * teardown's scan.
> +		 */
> +		snp_rmpopt_all_physmem();

[Severity: High]
Can an unprivileged user indefinitely defer the RMP optimizations by
repeatedly creating and destroying minimal SNP VMs?

Looking at snp_rmpopt_all_physmem(), it uses mod_delayed_work(), which
cancels any pending timer and pushes the execution timeout back on every
call.

If VMs are destroyed at a frequency higher than the timeout,
the background optimization task might be starved and never execute,
leading to degraded memory access performance.

Would using queue_delayed_work() instead prevent this starvation by
ensuring the timer is not pushed back if the work is already queued?
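
The difference between the two policies can be sketched with a toy user-space
model. This is not kernel code: toy_queue() only arms the timer when nothing
is pending (the queue_delayed_work()-like policy), while toy_mod() always
re-arms it (the mod_delayed_work()-like policy). All names and the millisecond
tick loop are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a single delayed work item; times are in milliseconds. */
struct toy_delayed_work {
	bool pending;
	uint64_t expires;
};

/* Arm only if not already pending; an existing deadline is kept. */
static bool toy_queue(struct toy_delayed_work *w, uint64_t now, uint64_t delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* Always re-arm, pushing the deadline out; returns whether it was pending. */
static bool toy_mod(struct toy_delayed_work *w, uint64_t now, uint64_t delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}

/* Run the item if its deadline has passed; returns true if it ran. */
static bool toy_tick(struct toy_delayed_work *w, uint64_t now)
{
	if (!w->pending || now < w->expires)
		return false;
	w->pending = false;
	return true;
}

/*
 * Count how often the work runs when a request arrives every 'period' ms
 * over 'duration' ms, using either policy.
 */
static unsigned int toy_count_runs(bool use_mod, uint64_t period,
				   uint64_t delay, uint64_t duration)
{
	struct toy_delayed_work w = { 0 };
	unsigned int runs = 0;
	uint64_t now;

	assert(period > 0);

	for (now = 0; now < duration; now++) {
		if (toy_tick(&w, now))
			runs++;
		if (now % period == 0) {
			if (use_mod)
				toy_mod(&w, now, delay);
			else
				toy_queue(&w, now, delay);
		}
	}
	return runs;
}
```

In this model, a request every 5 seconds with a 10 second delay never lets the
re-arming policy run within a 60 second window, while the arm-once policy runs
five times. When requests are spaced further apart than the delay, both
policies behave the same.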

>  	} else {
>  		sev_unbind_asid(kvm, sev->handle);
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782841284.git.ashish.kalra@amd.com?part=6

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously
  2026-06-30 18:11 ` [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously Ashish Kalra
@ 2026-06-30 18:28   ` sashiko-bot
  0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-30 18:28 UTC (permalink / raw)
  To: Ashish Kalra; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Re-initializing SNP after a partial shutdown bypasses the RMPOPT optimization pass, leaving the new RMP table unoptimized.
--

commit 2d85f8f029da9849b60d0154807adccc997f6502
Author: Ashish Kalra <ashish.kalra@amd.com>

x86/sev: Add support to perform RMP optimizations asynchronously

This patch introduces support for the RMPOPT instruction to minimize performance
overhead by skipping RMP checks on 1GB memory ranges known not to contain
SEV-SNP guest memory. It adds a dedicated workqueue to perform these
optimizations asynchronously.

> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 04a58ac4339cb..40b06e959ee87 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c

[ ... ]

> @@ -625,14 +743,42 @@ void snp_setup_rmpopt(void)
>  	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_capable)
>  		return;
>  
> +	guard(mutex)(&rmpopt_wq_mutex);
> +
> +	/*
> +	 * Guard against re-initialization.  When SNP_SHUTDOWN_EX is issued
> +	 * with x86_snp_shutdown=0, snp_shutdown() is not called and
> +	 * rmpopt_cleanup() is skipped, but snp_initialized is still cleared.
> +	 * A subsequent __sev_snp_init_locked() would call snp_setup_rmpopt()
> +	 * again, leaking the existing workqueue, delayed work, and cpumask
> +	 * state.
> +	 */
> +	if (rmpopt_wq)
> +		return;

[Severity: Medium]
Will this early return prevent the initial RMPOPT optimization pass if SNP is
re-initialized after a partial shutdown?

If a partial shutdown occurs where rmpopt_cleanup() is skipped, rmpopt_wq
remains allocated. When snp_setup_rmpopt() is called again during
re-initialization, returning early here skips the queue_delayed_work()
call at the end of the function.

Does this mean the system will run without RMP optimizations on the newly
initialized RMP table, incurring RMP check overhead until a later event
triggers a rescan?
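
The control flow in question can be modelled in a few lines of user-space C.
This is only a sketch of the concern, assuming the guard and the initial queue
behave as quoted above. The struct and function names are hypothetical, and
wq_allocated stands in for a non-NULL rmpopt_wq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the RMPOPT setup state; not kernel code. */
struct toy_rmpopt_state {
	bool wq_allocated;		/* models rmpopt_wq != NULL */
	unsigned int scans_queued;	/* models initial queue_delayed_work() calls */
};

/* Models snp_setup_rmpopt(): the guard returns before the scan is queued. */
static void toy_setup_rmpopt(struct toy_rmpopt_state *s)
{
	assert(s != NULL);

	if (s->wq_allocated)
		return;

	s->wq_allocated = true;
	s->scans_queued++;
}

/* Models a full shutdown where rmpopt_cleanup() runs. */
static void toy_full_shutdown(struct toy_rmpopt_state *s)
{
	assert(s != NULL);
	s->wq_allocated = false;
}

/* Models a partial shutdown where rmpopt_cleanup() is skipped. */
static void toy_partial_shutdown(struct toy_rmpopt_state *s)
{
	assert(s != NULL);
	/* Nothing is torn down, so the workqueue stays allocated. */
}
```

In the model, setup followed by a partial shutdown and a second setup leaves
scans_queued at 1, whereas a full shutdown in between raises it to 2.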

[ ... ]

> +	/* Limit memory scanning to 2TB of RAM */
> +	if ((rmpopt_pa_end - rmpopt_pa_start) > SZ_2T) {
> +		pr_info("RMPOPT coverage limited to 2TB; memory above 0x%llx not optimized\n",
> +			rmpopt_pa_start + SZ_2T);
> +		rmpopt_pa_end = rmpopt_pa_start + SZ_2T;
> +	}
> +
> +	/*
> +	 * Once all per-CPU RMPOPT tables have been configured, enable RMPOPT
> +	 * optimizations on all physical memory.
> +	 */
> +	queue_delayed_work(rmpopt_wq, &rmpopt_delayed_work, 0);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782841284.git.ashish.kalra@amd.com?part=4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-06-30 18:11 ` [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active Ashish Kalra
@ 2026-06-30 18:32   ` sashiko-bot
  2026-07-01  9:40   ` Jethro Beekman
  1 sibling, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-30 18:32 UTC (permalink / raw)
  To: Ashish Kalra; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] System suspend (PM transitions) bypasses `cpu_hotplug_disable()`, stripping `SnpEn` from secondary CPUs on resume.
--

commit 29b8362e9ad72efa8892f6bb3124dbbcc29ee822
Author: Ashish Kalra <ashish.kalra@amd.com>

x86/sev: Disable CPU hotplug while SNP is active

This commit disables CPU hotplug in snp_prepare() before SNP is enabled and
re-enables it in snp_shutdown(). This aims to prevent CPUs from coming online
without the SnpEn bit set while SNP is active, as the firmware cannot track
OS CPU hotplug events.

> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index dab6e1c290bca..04a58ac4339cb 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -535,6 +535,15 @@ int snp_prepare(void)
>  
>  	clear_rmp();
>  
> +	/*
> +	 * Disable CPU hotplug before enabling SNP, so no CPU can come online
> +	 * without SnpEn while SNP is enabled; it is re-enabled in snp_shutdown()
> +	 * once SNP is disabled.  Must be before cpus_read_lock():
> +	 * cpu_hotplug_disable() takes cpu_add_remove_lock, which nests above
> +	 * cpu_hotplug_lock.
> +	 */
> +	cpu_hotplug_disable();

[Severity: Critical]
Does this actually prevent all CPUs from coming online without SnpEn?

Looking at system suspend (PM transitions), freeze_secondary_cpus() calls
_cpu_down() directly, which bypasses the cpu_hotplug_disabled check. Upon
resume, thaw_secondary_cpus() brings them back online via _cpu_up().

Since snp_enable() is called via a one-off on_each_cpu() in snp_prepare()
rather than being registered as a CPU hotplug callback, wouldn't secondary
CPUs resume with the SnpEn MSR bit cleared? 

This could allow a suspend/resume cycle to create an asymmetrical state where
hardware RMP checks are bypassed.

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782841284.git.ashish.kalra@amd.com?part=3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-06-30 18:11 ` [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active Ashish Kalra
  2026-06-30 18:32   ` sashiko-bot
@ 2026-07-01  9:40   ` Jethro Beekman
  2026-07-01 16:39     ` K Prateek Nayak
  2026-07-01 21:25     ` Kalra, Ashish
  1 sibling, 2 replies; 14+ messages in thread
From: Jethro Beekman @ 2026-07-01  9:40 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco

[-- Attachment #1: Type: text/plain, Size: 4510 bytes --]

Hi Ashish,

I don't believe my concern has been addressed

https://lore.kernel.org/lkml/0df3b665-3a9c-4c46-a7aa-14388e8e1577@fortanix.com/

--
Jethro Beekman | CTO | Fortanix

On 2026-06-30 20:11, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> While SNP is active, every memory write is checked against the RMP to
> protect SEV-SNP guest memory.  A core performs these RMP checks only once
> SNP has been initialized via SNP_INIT and the SNP-enable bit in SYSCFG is
> set on that core; the firmware requires the SNP-enable bit to be set on
> every present CPU before SNP initialization.  A core that is not
> SNP-enabled and not SNP-initialized performs no RMP checks at all, so
> there is no valid configuration with SNP active and any CPU exempt from
> RMP checks.
> 
> The firmware determines which CPUs are present from the processor and the
> BIOS/UEFI configuration (e.g. SMT disabled in the BIOS) and enumerates
> them at SNP init; it is not aware of the OS bringing CPUs online or
> offline afterwards.  SNP_INIT fails unless SnpEn is set on all CPUs, so a
> CPU that is offline at SNP init does not have SnpEn set, SNP_INIT fails,
> and there can be no SNP guest memory.  OS CPU hotplug can thus diverge
> from the firmware's expectations and break SNP.
> 
> Tie CPU hotplug to the SNP-enable bit: disable it in snp_prepare() before
> SNP is enabled, and re-enable it in snp_shutdown() once the firmware has
> disabled SNP.  If snp_prepare() fails before enabling SNP it re-enables
> hotplug itself; once SNP is enabled hotplug stays disabled, including
> across a failed SNP_INIT and across the legacy SNP_SHUTDOWN_EX path, both
> of which leave SNP enabled.  A kexec target that boots with SNP already
> enabled disables hotplug once in snp_rmptable_init(), since snp_prepare()
> bails when SNP is already enabled.
> 
> This also keeps the CPU set stable for the asynchronous RMPOPT scan added
> later in this series, and ensures cpus_read_lock() in the scan is
> uncontended.
> 
> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/virt/svm/sev.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index dab6e1c290bc..04a58ac4339c 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -535,6 +535,15 @@ int snp_prepare(void)
>  
>  	clear_rmp();
>  
> +	/*
> +	 * Disable CPU hotplug before enabling SNP, so no CPU can come online
> +	 * without SnpEn while SNP is enabled; it is re-enabled in snp_shutdown()
> +	 * once SNP is disabled.  Must be before cpus_read_lock():
> +	 * cpu_hotplug_disable() takes cpu_add_remove_lock, which nests above
> +	 * cpu_hotplug_lock.
> +	 */
> +	cpu_hotplug_disable();
> +
>  	cpus_read_lock();
>  
>  	if (!cpumask_equal(cpu_online_mask, cpu_present_mask)) {
> @@ -560,6 +569,10 @@ int snp_prepare(void)
>  unlock:
>  	cpus_read_unlock();
>  
> +	/* Re-enable CPU hotplug; SnpEn was never set. */
> +	if (ret)
> +		cpu_hotplug_enable();
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
> @@ -587,6 +600,13 @@ void snp_shutdown(void)
>  
>  	rmpopt_cleanup();
>  
> +	/*
> +	 * Re-enable CPU hotplug now that the firmware has disabled SNP; CPU
> +	 * hotplug is not re-enabled for a legacy SNP shutdown.  After
> +	 * rmpopt_cleanup() so RMPOPT_BASE is cleared with hotplug still disabled.
> +	 */
> +	cpu_hotplug_enable();
> +
>  	clear_rmp();
>  	on_each_cpu(mfd_reconfigure, NULL, 1);
>  }
> @@ -645,6 +665,8 @@ EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
>   */
>  int __init snp_rmptable_init(void)
>  {
> +	u64 val;
> +
>  	if (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP)))
>  		return -ENOSYS;
>  
> @@ -654,6 +676,15 @@ int __init snp_rmptable_init(void)
>  	if (!setup_rmptable())
>  		return -ENOSYS;
>  
> +	/*
> +	 * On a kexec boot SNP may already be enabled (legacy firmware leaves
> +	 * SnpEn set across shutdown), in which case snp_prepare() bails without
> +	 * disabling CPU hotplug, so disable it here.
> +	 */
> +	rdmsrq(MSR_AMD64_SYSCFG, val);
> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
> +		cpu_hotplug_disable();
> +
>  	/*
>  	 * Setting crash_kexec_post_notifiers to 'true' to ensure that SNP panic
>  	 * notifier is invoked to do SNP IOMMU shutdown before kdump.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4839 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-07-01  9:40   ` Jethro Beekman
@ 2026-07-01 16:39     ` K Prateek Nayak
  2026-07-01 21:08       ` Kalra, Ashish
  2026-07-01 21:25     ` Kalra, Ashish
  1 sibling, 1 reply; 14+ messages in thread
From: K Prateek Nayak @ 2026-07-01 16:39 UTC (permalink / raw)
  To: Jethro Beekman, Ashish Kalra, tglx, mingo, bp, dave.hansen, x86,
	hpa, seanjc, peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco

Hello Jethro,

On 7/1/2026 3:10 PM, Jethro Beekman wrote:
> I don't believe my concern has been addressed
> 
> https://lore.kernel.org/lkml/0df3b665-3a9c-4c46-a7aa-14388e8e1577@fortanix.com/

Quoting your question:

> I think this is too broad. If I have a hypervisor that supports SNP
> virtualization, a (non-confidential) L1 guest running Linux should
> still support CPU hotplug while also running confidential L2 guests.

Ashish, Tom, correct me if I'm wrong, but I don't think KVM exposes SNP
support to L1, at least as per
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/cpuid.c?h=v7.2-rc1#n1221
and only SNP initialization disables hotplug - not the other variants.

L1, running a confidential guest (SEV/SEV-ES), should still be able to
support hotplug since it doesn't go through SNP init. Only the base
hypervisor can set up the RMP tables and go through snp_prepare().

Also bsp_determine_snp() should clear CC_ATTR_HOST_SEV_SNP if it
detects X86_FEATURE_HYPERVISOR so I don't see how this can be a
problem for hotplug in L1.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/amd.c?h=v7.2-rc1#n368

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-07-01 16:39     ` K Prateek Nayak
@ 2026-07-01 21:08       ` Kalra, Ashish
  0 siblings, 0 replies; 14+ messages in thread
From: Kalra, Ashish @ 2026-07-01 21:08 UTC (permalink / raw)
  To: K Prateek Nayak, Jethro Beekman, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco

Hi Prateek,

On 7/1/2026 11:39 AM, K Prateek Nayak wrote:
> Hello Jethro,
> 
> On 7/1/2026 3:10 PM, Jethro Beekman wrote:
>> I don't believe my concern has been addressed
>>
>> https://lore.kernel.org/lkml/0df3b665-3a9c-4c46-a7aa-14388e8e1577@fortanix.com/
> 
> Quoting your question:
> 
>> I think this is too broad. If I have a hypervisor that supports SNP
>> virtualization, a (non-confidential) L1 guest running Linux should
>> still support CPU hotplug while also running confidential L2 guests.
> 
> Ashish, Tom, correct me if I'm wrong, but I don't think KVM exposes SNP
> support to L1, at least as per
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/cpuid.c?h=v7.2-rc1#n1221
> and only SNP initialization disables hotplug - not the other variants.
> 
> L1, running a confidential guest (SEV/SEV-ES), should still be able to
> support hotplug since it doesn't go through SNP init. Only the base
> hypervisor can set up the RMP tables and go through snp_prepare().
> 
> Also bsp_determine_snp() should clear CC_ATTR_HOST_SEV_SNP if it
> detects X86_FEATURE_HYPERVISOR so I don't see how this can be a
> problem for hotplug in L1.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/amd.c?h=v7.2-rc1#n368
> 

bsp_determine_snp() only sets CC_ATTR_HOST_SEV_SNP when X86_FEATURE_HYPERVISOR is clear:

  if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
      (ZEN3 || ZEN4 || RMPREAD) && snp_probe_rmptable_info()) {
          cc_platform_set(CC_ATTR_HOST_SEV_SNP);
  } else {
          setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
          cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
  }

So Linux running as an L1 guest (HYPERVISOR set) never has CC_ATTR_HOST_SEV_SNP.

And both hotplug-disable sites sit behind that flag:
  - snp_prepare() is only called from __sev_snp_init_locked(), which returns -ENODEV early if !cc_platform_has(CC_ATTR_HOST_SEV_SNP).
  - snp_rmptable_init() bails (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP))) before its kexec one-shot disable.

So an L1 guest can't reach the disable at all; only the bare-metal host that programs the RMP does.

An L1 running SEV/SEV-ES guests never goes through SNP host init, so its hotplug is unaffected. KVM doesn't expose SNP to L1 either.

So there's no impact on L1 hotplug currently.
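
For illustration only, the gating above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the model_* function names and the three boolean parameters are made up here, standing in for the X86_FEATURE_HYPERVISOR check, the (ZEN3 || ZEN4 || RMPREAD) condition and snp_probe_rmptable_info().

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the gating, not kernel code.
 * hypervisor  : X86_FEATURE_HYPERVISOR is set (running as a guest)
 * snp_capable : stands in for (ZEN3 || ZEN4 || RMPREAD)
 * rmptable_ok : stands in for snp_probe_rmptable_info() succeeding
 *
 * Returns whether CC_ATTR_HOST_SEV_SNP would end up set.
 */
bool model_host_sev_snp(bool hypervisor, bool snp_capable, bool rmptable_ok)
{
	return !hypervisor && snp_capable && rmptable_ok;
}

/*
 * Both hotplug-disable sites (snp_prepare() and snp_rmptable_init()) sit
 * behind CC_ATTR_HOST_SEV_SNP, so they are reachable only when it is set.
 */
bool model_hotplug_disable_reachable(bool hypervisor, bool snp_capable,
				     bool rmptable_ok)
{
	return model_host_sev_snp(hypervisor, snp_capable, rmptable_ok);
}
```

In this model, any call with hypervisor == true returns false, which is the L1 guest case discussed above.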

Thanks,
Ashish


* Re: [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active
  2026-07-01  9:40   ` Jethro Beekman
  2026-07-01 16:39     ` K Prateek Nayak
@ 2026-07-01 21:25     ` Kalra, Ashish
  1 sibling, 0 replies; 14+ messages in thread
From: Kalra, Ashish @ 2026-07-01 21:25 UTC (permalink / raw)
  To: Jethro Beekman, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco


On 7/1/2026 4:40 AM, Jethro Beekman wrote:
> Hi Ashish,
> 
> I don't believe my concern has been addressed
> 
> https://lore.kernel.org/lkml/0df3b665-3a9c-4c46-a7aa-14388e8e1577@fortanix.com/
> 
> --

The disable tracks SNP_INIT, not "SNP" in general: SNP_INIT requires SnpEn to be set on all present CPUs, and a CPU brought online afterward wouldn't have it, so the kernel that runs SNP_INIT must keep its CPU set stable. Today the only kernel that runs SNP_INIT is the bare-metal host, so a plain L1 guest keeps full CPU hotplug.

Concretely, the path is gated by CC_ATTR_HOST_SEV_SNP, which bsp_determine_snp() sets only when X86_FEATURE_HYPERVISOR is clear and clears otherwise (as Prateek pointed out). So a Linux L1 guest never has it set, never reaches the hotplug disable in snp_prepare()/snp_rmptable_init(), and keeps CPU hotplug, including while running SEV/SEV-ES confidential L2 guests. Only SNP initialization disables hotplug; the other SEV variants don't. And KVM doesn't expose SNP to L1, so an L1 can't be an SNP host today in any case.
  
On the nested scenario you raised: if SNP-guest-as-L2 support is added, an L1 acting as an SNP host would run a *virtualized* SNP_INIT. A faithful virtualization carries the same constraint as physical SNP_INIT, namely that all present (v)CPUs must have SnpEn set, so that L1 would have the same hotplug-disable requirement, just over its virtual CPUs, and this same code would apply at that level. So the disable isn't too broad; it correctly tracks SNP_INIT. It simply doesn't apply to a plain L1 guest today, because such a guest isn't running SNP_INIT.
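
To make the SNP_INIT constraint concrete, here is a rough user-space model of it. It is only a sketch under stated assumptions: struct model_cpu, MODEL_MAX_CPUS and the model_* functions are hypothetical names invented for this example, and the model assumes SnpEn can only be set on CPUs that are online at the time SNP is enabled.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CPUS 64

/* Hypothetical per-CPU state; not a kernel structure. */
struct model_cpu {
	bool present;
	bool online;
	bool snp_en;	/* models the SNP-enable bit in SYSCFG */
};

/* Model assumption: SnpEn is set only on CPUs that are online right now. */
void model_enable_snp(struct model_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].present && cpus[i].online)
			cpus[i].snp_en = true;
}

/* Modelled SNP_INIT: succeeds only if every present CPU has SnpEn set. */
bool model_snp_init(const struct model_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].present && !cpus[i].snp_en)
			return false;
	return true;
}

/*
 * Scenario helper: n_present CPUs, the last n_offline of them offline when
 * SNP is enabled.  Returns the result of the modelled SNP_INIT.
 */
bool model_snp_init_scenario(size_t n_present, size_t n_offline)
{
	struct model_cpu cpus[MODEL_MAX_CPUS] = { 0 };

	if (n_present > MODEL_MAX_CPUS || n_offline > n_present)
		return false;

	for (size_t i = 0; i < n_present; i++) {
		cpus[i].present = true;
		cpus[i].online = i < n_present - n_offline;
	}

	model_enable_snp(cpus, n_present);
	return model_snp_init(cpus, n_present);
}
```

With these definitions, model_snp_init_scenario(4, 0) succeeds and model_snp_init_scenario(4, 1) fails, which is the divergence between the OS and the firmware's view that the hotplug disable is meant to rule out.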

Thanks,
Ashish

> Jethro Beekman | CTO | Fortanix
> 
> On 2026-06-30 20:11, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> While SNP is active, every memory write is checked against the RMP to
>> protect SEV-SNP guest memory.  A core performs these RMP checks only once
>> SNP has been initialized via SNP_INIT and the SNP-enable bit in SYSCFG is
>> set on that core; the firmware requires the SNP-enable bit to be set on
>> every present CPU before SNP initialization.  A core that is not
>> SNP-enabled and not SNP-initialized performs no RMP checks at all, so
>> there is no valid configuration with SNP active and any CPU exempt from
>> RMP checks.
>>
>> The firmware determines which CPUs are present from the processor and the
>> BIOS/UEFI configuration (e.g. SMT disabled in the BIOS) and enumerates
>> them at SNP init; it is not aware of the OS bringing CPUs online or
>> offline afterwards.  SNP_INIT fails unless SnpEn is set on all CPUs, so a
>> CPU that is offline at SNP init does not have SnpEn set, SNP_INIT fails,
>> and there can be no SNP guest memory.  OS CPU hotplug can thus diverge
>> from the firmware's expectations and break SNP.
>>
>> Tie CPU hotplug to the SNP-enable bit: disable it in snp_prepare() before
>> SNP is enabled, and re-enable it in snp_shutdown() once the firmware has
>> disabled SNP.  If snp_prepare() fails before enabling SNP it re-enables
>> hotplug itself; once SNP is enabled hotplug stays disabled, including
>> across a failed SNP_INIT and across the legacy SNP_SHUTDOWN_EX path, both
>> of which leave SNP enabled.  A kexec target that boots with SNP already
>> enabled disables hotplug once in snp_rmptable_init(), since snp_prepare()
>> bails when SNP is already enabled.
>>
>> This also keeps the CPU set stable for the asynchronous RMPOPT scan added
>> later in this series, and ensures cpus_read_lock() in the scan is
>> uncontended.
>>
>> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>  arch/x86/virt/svm/sev.c | 31 +++++++++++++++++++++++++++++++
>>  1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
>> index dab6e1c290bc..04a58ac4339c 100644
>> --- a/arch/x86/virt/svm/sev.c
>> +++ b/arch/x86/virt/svm/sev.c
>> @@ -535,6 +535,15 @@ int snp_prepare(void)
>>  
>>  	clear_rmp();
>>  
>> +	/*
>> +	 * Disable CPU hotplug before enabling SNP, so no CPU can come online
>> +	 * without SnpEn while SNP is enabled; it is re-enabled in snp_shutdown()
>> +	 * once SNP is disabled.  Must be before cpus_read_lock():
>> +	 * cpu_hotplug_disable() takes cpu_add_remove_lock, which nests above
>> +	 * cpu_hotplug_lock.
>> +	 */
>> +	cpu_hotplug_disable();
>> +
>>  	cpus_read_lock();
>>  
>>  	if (!cpumask_equal(cpu_online_mask, cpu_present_mask)) {
>> @@ -560,6 +569,10 @@ int snp_prepare(void)
>>  unlock:
>>  	cpus_read_unlock();
>>  
>> +	/* Re-enable CPU hotplug; SnpEn was never set. */
>> +	if (ret)
>> +		cpu_hotplug_enable();
>> +
>>  	return ret;
>>  }
>>  EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
>> @@ -587,6 +600,13 @@ void snp_shutdown(void)
>>  
>>  	rmpopt_cleanup();
>>  
>> +	/*
>> +	 * Re-enable CPU hotplug now that the firmware has disabled SNP; CPU
>> +	 * hotplug is not re-enabled for a legacy SNP shutdown.  After
>> +	 * rmpopt_cleanup() so RMPOPT_BASE is cleared with hotplug still disabled.
>> +	 */
>> +	cpu_hotplug_enable();
>> +
>>  	clear_rmp();
>>  	on_each_cpu(mfd_reconfigure, NULL, 1);
>>  }
>> @@ -645,6 +665,8 @@ EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
>>   */
>>  int __init snp_rmptable_init(void)
>>  {
>> +	u64 val;
>> +
>>  	if (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP)))
>>  		return -ENOSYS;
>>  
>> @@ -654,6 +676,15 @@ int __init snp_rmptable_init(void)
>>  	if (!setup_rmptable())
>>  		return -ENOSYS;
>>  
>> +	/*
>> +	 * On a kexec boot SNP may already be enabled (legacy firmware leaves
>> +	 * SnpEn set across shutdown), in which case snp_prepare() bails without
>> +	 * disabling CPU hotplug, so disable it here.
>> +	 */
>> +	rdmsrq(MSR_AMD64_SYSCFG, val);
>> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
>> +		cpu_hotplug_disable();
>> +
>>  	/*
>>  	 * Setting crash_kexec_post_notifiers to 'true' to ensure that SNP panic
>>  	 * notifier is invoked to do SNP IOMMU shutdown before kdump.
> 


end of thread, other threads:[~2026-07-01 21:25 UTC | newest]

Thread overview: 14+ messages
2026-06-30 18:08 [PATCH v10 0/6] Add RMPOPT support Ashish Kalra
2026-06-30 18:09 ` [PATCH v10 1/6] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag Ashish Kalra
2026-06-30 18:10 ` [PATCH v10 2/6] x86/sev: Initialize RMPOPT configuration MSRs Ashish Kalra
2026-06-30 18:11 ` [PATCH v10 3/6] x86/sev: Disable CPU hotplug while SNP is active Ashish Kalra
2026-06-30 18:32   ` sashiko-bot
2026-07-01  9:40   ` Jethro Beekman
2026-07-01 16:39     ` K Prateek Nayak
2026-07-01 21:08       ` Kalra, Ashish
2026-07-01 21:25     ` Kalra, Ashish
2026-06-30 18:11 ` [PATCH v10 4/6] x86/sev: Add support to perform RMP optimizations asynchronously Ashish Kalra
2026-06-30 18:28   ` sashiko-bot
2026-06-30 18:11 ` [PATCH v10 5/6] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
2026-06-30 18:12 ` [PATCH v10 6/6] KVM: SEV: Perform RMP optimizations on SNP guest shutdown Ashish Kalra
2026-06-30 18:24   ` sashiko-bot
