From: "Kalra, Ashish" <ashish.kalra@amd.com>
To: Dave Hansen <dave.hansen@intel.com>,
Sean Christopherson <seanjc@google.com>
Cc: tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, thomas.lendacky@amd.com,
herbert@gondor.apana.org.au, davem@davemloft.net,
ardb@kernel.org, pbonzini@redhat.com, aik@amd.com,
Michael.Roth@amd.com, KPrateek.Nayak@amd.com,
Tycho.Andersen@amd.com, Nathan.Fontenot@amd.com,
jackyli@google.com, pgonda@google.com, rientjes@google.com,
jacobhxu@google.com, xin@zytor.com,
pawan.kumar.gupta@linux.intel.com, babu.moger@amd.com,
dyoung@redhat.com, nikunj@amd.com, john.allen@amd.com,
darwi@linutronix.de, linux-kernel@vger.kernel.org,
linux-crypto@vger.kernel.org, kvm@vger.kernel.org,
linux-coco@lists.linux.dev
Subject: Re: [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction
Date: Wed, 4 Mar 2026 19:40:25 -0600 [thread overview]
Message-ID: <0fbb94ad-bfcf-4fbe-bf40-d79051d67ad8@amd.com> (raw)
In-Reply-To: <7ab8d3af-b4f5-481c-ab2e-059ddd7e718e@intel.com>
Hello Dave and Sean,
On 3/4/2026 9:25 AM, Dave Hansen wrote:
> On 3/4/26 07:01, Sean Christopherson wrote:
>> I don't see any performance data in either posted version. Bluntly, this series
>> isn't going anywhere without data to guide us. E.g. comments like this from v1
>>
>> : And there is a cost associated with re-enabling the optimizations for all
>> : system RAM (even though it runs as a background kernel thread executing RMPOPT
>> : on different 1GB regions in parallel and with inline cond_resched()'s),
>> : we don't want to run this periodically.
>>
>> suggest there is meaningful cost associated with the scan.
>
> Well the RMP is 0.4% of the size of system memory, and I assume that you
> need to scan the whole table. There are surely shortcuts for 2M pages,
> but with 4k, that's ~8.5GB of RMP table for 2TB of memory. That's an
> awful lot of memory traffic for each CPU.
The RMPOPT instruction is optimized for 2MB pages: it checks that none of
the 512 2MB RMP entries in the 1GB region containing the specified address
are assigned.
>
> It'll be annoying to keep a refcount per 1GB of paddr space.
>
> One other way to do it would be to loosely mirror the RMPOPT bitmap and
> keep our own bitmap of 1GB regions that _need_ RMPOPT run on them. Any
> private=>shared conversion sets a bit in the bitmap and schedules some
> work out in the future.
>
> It could also be less granular than that. Instead of any private=>shared
> conversion, the RMPOPT scan could be triggered on VM destruction which
> is much more likely to result in RMPOPT doing anything useful.
Yes, it should be less granular than scheduling RMPOPT work for every
private->shared conversion, and that's what we are doing in the v2 patch
series: the RMPOPT scan is triggered on VM destruction.
>
> BTW, I assume that the RMPOPT disable machinery is driven from the
> INVLPGB-like TLB invalidations that are a part of the SNP
> shared=>private conversions. It's a darn shame that RMPOPT wasn't
> broadcast in the same way. It would save the poor OS a lot of work. The
> RMPOPT table is per-cpu of course, but I'm not sure what keeps *a* CPU
> from broadcasting its success finding an SNP-free physical region to
> other CPUs.
The hardware does this broadcast for the RMPUPDATE instruction: RMPUPDATE
broadcasts to all CPUs to clear matching entries in their RMPOPT tables.
For the RMPOPT instruction itself there is no such broadcast, but RMPOPT
only needs to be executed on one thread per core: executing the
instruction also programs the per-CPU RMPOPT table of the sibling thread.
That's why the v1 patch series included the optimization of executing
RMPOPT only on the primary thread, and I believe we should include this
optimization in a future series.
>
> tl;dr: I agree with you. The cost of these scans is going to be
> annoying, and it's going to need OS help to optimize it.
Here is some performance data:
Raw CPU cycles for a single RMPOPT instruction (func=0):
RMPOPT during snp_rmptable_init() while booting:
....
[ 12.098580] SEV-SNP: RMPOPT max. CPU cycles 501460
[ 12.103839] SEV-SNP: RMPOPT min. CPU cycles 60
[ 12.108799] SEV-SNP: RMPOPT average cycles 139790
RMPOPT during SNP_INIT_EX, at CCP module load at boot:
[ 40.206619] SEV-SNP: RMPOPT max. CPU cycles 248083620
[ 40.206629] SEV-SNP: RMPOPT min. CPU cycles 60
[ 40.206629] SEV-SNP: RMPOPT average cycles 249820
RMPOPT after SNP guest shutdown:
...
[ 298.746893] SEV-SNP: RMPOPT max. CPU cycles 248083620
[ 298.746898] SEV-SNP: RMPOPT min. CPU cycles 60
[ 298.746900] SEV-SNP: RMPOPT average cycles 127859
I believe the min. CPU cycles is the case where RMPOPT fails early.
Raw CPU cycles for one complete iteration of executing RMPOPT (func=0) on
all CPUs over the whole of RAM, measured with the following loop (the
cond_resched() calls removed):
	while (!kthread_should_stop()) {
		phys_addr_t pa;

		pr_info("RMP optimizations enabled on physical address range @1GB alignment [0x%016llx - 0x%016llx]\n",
			pa_start, pa_end);

		start = rdtsc_ordered();
		/*
		 * RMPOPT optimizations skip RMP checks at 1GB granularity if
		 * this range of memory does not contain any SNP guest memory.
		 */
		for (pa = pa_start; pa < pa_end; pa += PUD_SIZE) {
			/* Bit zero passes the function to the RMPOPT instruction. */
			on_each_cpu_mask(cpu_online_mask, rmpopt,
					 (void *)(pa | RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS),
					 true);
		}
		end = rdtsc_ordered();

		pr_info("RMPOPT cycles taken for physical address range 0x%016llx - 0x%016llx on all cpus %llu cycles\n",
			pa_start, pa_end, end - start);

		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}
RMPOPT during snp_rmptable_init() while booting:
...
[ 12.114047] SEV-SNP: RMPOPT cycles taken for physical address range 0x0000000000000000 - 0x0000010380000000 on all cpus 1499496600 cycles
RMPOPT during SNP_INIT_EX:
...
[ 40.206630] SEV-SNP: RMPOPT cycles taken for physical address range 0x0000000000000000 - 0x0000010380000000 on all cpus 686519180 cycles
RMPOPT after SNP guest shutdown:
...
[ 298.746900] SEV-SNP: RMPOPT cycles taken for physical address range 0x0000000000000000 - 0x0000010380000000 on all cpus 369059160 cycles
Thanks,
Ashish