From: K Prateek Nayak <kprateek.nayak@amd.com>
To: "Kalra, Ashish" <ashish.kalra@amd.com>, <tglx@kernel.org>,
<mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
<x86@kernel.org>, <hpa@zytor.com>, <seanjc@google.com>,
<peterz@infradead.org>, <thomas.lendacky@amd.com>,
<herbert@gondor.apana.org.au>, <davem@davemloft.net>,
<ardb@kernel.org>
Cc: <pbonzini@redhat.com>, <aik@amd.com>, <Michael.Roth@amd.com>,
<Tycho.Andersen@amd.com>, <Nathan.Fontenot@amd.com>,
<ackerleytng@google.com>, <jackyli@google.com>,
<pgonda@google.com>, <rientjes@google.com>, <jacobhxu@google.com>,
<xin@zytor.com>, <pawan.kumar.gupta@linux.intel.com>,
<babu.moger@amd.com>, <dyoung@redhat.com>, <nikunj@amd.com>,
<john.allen@amd.com>, <darwi@linutronix.de>,
<linux-kernel@vger.kernel.org>, <linux-crypto@vger.kernel.org>,
<kvm@vger.kernel.org>, <linux-coco@lists.linux.dev>
Subject: Re: [PATCH v8 4/7] x86/sev: Add support to perform RMP optimizations asynchronously
Date: Wed, 17 Jun 2026 09:50:06 +0530 [thread overview]
Message-ID: <75cf11f1-51fc-4f1a-a9a7-4b9403d2bb8b@amd.com> (raw)
In-Reply-To: <8c5f4082-e3a5-4f65-b058-33938a7ee324@amd.com>
Hello Ashish,
On 6/17/2026 1:26 AM, Kalra, Ashish wrote:
> Hello Prateek,
>
> On 6/16/2026 2:27 AM, K Prateek Nayak wrote:
>> Hello Ashish,
>>
>> On 6/16/2026 1:19 AM, Ashish Kalra wrote:
>>> + /*
>>> + * RMPOPT scans the RMP table, stores the result of the scan in the
>>> + * reserved processor memory. The RMP scan is the most expensive
>>> + * part. If a second RMPOPT occurs, it can skip the expensive scan
>>> + * if they can see a cached result in the reserved processor memory.
>>> + *
>>> + * Do RMPOPT on one CPU alone. Then, follow that up with RMPOPT
>>> + * on every other primary thread. Followers are "designed to"
>>> + * skip the scan if they see the "cached" scan results.
>>> + */
>>> + cpumask_copy(follower_mask, &rmpopt_cpumask);
>>
>> rmpopt_cpumask is constructed after hotplug is disabled but ...
>>
>>> +
>>> + /*
>>> + * Pin the worker to the current CPU for the leader loop so that
>>> + * this_cpu remains valid and the RMPOPT instruction executes on
>>> + * the correct CPU.
>>> + *
>>> + * Use migrate_disable() rather than get_cpu() to prevent
>>> + * migration while still allowing preemption.
>>> + */
>>> + migrate_disable();
>>> + this_cpu = smp_processor_id();
>>> +
>>> + if (cpumask_test_cpu(this_cpu, follower_mask)) {
>>> + /*
>>> + * Current CPU is a primary thread in rmpopt_cpumask.
>>> + * Run leader locally and remove from follower mask.
>>> + */
>>> + cpumask_clear_cpu(this_cpu, follower_mask);
>>> +
>>> + for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
>>> + rmpopt(pa);
>>> + cond_resched();
>>> + }
>>> + } else if (cpumask_intersects(topology_sibling_cpumask(this_cpu),
>>> + follower_mask)) {
>>> + /*
>>> + * Current CPU is a sibling thread whose primary is in
>>> + * rmpopt_cpumask. RMPOPT_BASE MSR is per-core, so it
>>> + * is safe to run the leader locally. Remove the sibling's
>>> + * primary from the follower mask as this core is already
>>> + * covered by the leader.
>>> + */
>>> + cpumask_andnot(follower_mask, follower_mask,
>>> + topology_sibling_cpumask(this_cpu));
>>> +
>>> + for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
>>> + rmpopt(pa);
>>> + cond_resched();
>>> + }
>>> + } else {
>>> + /*
>>> + * Current CPU does not have RMPOPT_BASE MSR programmed.
>>> + * Pick an explicit leader from the cpumask to avoid #UD.
>>> + * Use work_on_cpu() to run in process context on the leader,
>>> + * avoiding IPI latency.
>>> + */
>>
>> ... this_cpu is neither in the "rmpopt_cpumask", nor is any of its
>> siblings on "rmpopt_cpumask".
>>
>> How does that happen?
>
> Actually, this was the implementation before the CPU hotplug disable enforcement code was implemented and added in v8,
> and i should have fixed this rmpopt_work_handler() accordingly for v8.
>
> With the enforced cpu hotplug disable support, case #3 here (above) is now dead code, and removing it lets
> cases #1 and #2 collapse too.
>
> snp_prepare() requires cpu_online_mask == cpu_present_mask before SNP init — so when snp_setup_rmpopt() programs the MSRs, every
> core's primary is online -> every core is in rmpopt_cpumask.
>
> So now the work handler always runs on a CPU whose core is programmed. topology_sibling_cpumask(this_cpu) therefore always intersects
> rmpopt_cpumask -> case #1 or #2 always matches.
>
> So i should actually drop case #3 here - which is: "this_cpu is neither in the "rmpopt_cpumask", nor is any of its
> siblings on rmpopt_cpumask"
Ack.
Also the fact that cpu_mark_primary_thread() uses LSBs of APICID and if
you have some insanely weird configuration - like boot with maxcpus=1,
online all the secondary threads (CPUs 256-511 on a 256C/512T system),
launch an SNP guest - it can actually leave everything except CORE0 out
of the "rmpopt_cpumask".
>
>
>>
>>> + int leader_cpu = cpumask_first(follower_mask);
>>> +
>>> + if (WARN_ON_ONCE(leader_cpu >= nr_cpu_ids)) {
>>> + migrate_enable();
>>> + goto out;
>>> + }
>>> +
>>> + cpumask_clear_cpu(leader_cpu, follower_mask);
>>> +
>>> + /* Release migration pin before work_on_cpu(). */
>>> + migrate_enable();
>>> +
>>> + work_on_cpu(leader_cpu, rmpopt_leader_fn, NULL);
>>
>> This creates a delayed work and also waits for it to finish execution
>> which will add more latency than a simple IPI if the comment about IPI
>> latency above is accurate.
>>
>> I think there is some corner case in construction of the
>> "rmpopt_cpumask" that requires this not-so-pretty else block. Can you
>> elaborate why this is required?
>>
>> Perhaps the "rmpopt_cpumask" construction needs:
>>
>> for_each_online_cpu(cpu) {
>> /* Nominate the first CPU on the sibling mask for RMPOPT */
>> if (cpu != cpumask_first(topology_sibling_cpumask(cpu)))
>> continue;
>> cpumask_set_cpu(cpu, &rmpopt_cpumask);
>> }
>>
>>
>> and all you need here is:
>>
>> /* Do RMPOPt for local core */
>> for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G)
>> rmpopt(pa);
>>
>> /* Skip this core from concurrent RMPOPT */
>> cpumask_and_not(follower_mask, &rmpopt_cpumask, topology_sibling_cpumask(cpu));
>>
>> No?
>>
>
> Yes, a simpler implementation will be like this:
> ...
>
> if (!alloc_cpumask_var(&follower_mask, GFP_KERNEL))
> return;
>
If you move the migrate_disable() here, you can simply do an andnot
without needing to copy the rmpopt_cpumask beforehand and save on one
cpumask iteration.
> cpumask_copy(follower_mask, &rmpopt_cpumask);
>
> /*
> * The current CPU's core always has RMPOPT_BASE programmed
> * (snp_prepare() required all CPUs online at setup and CPU hotplug
> * is disabled while SNP is active), so it can always be the leader.
> * RMPOPT_BASE is per-core; exclude this core from the followers.
> */
> migrate_disable();
> cpumask_andnot(follower_mask, follower_mask,
> topology_sibling_cpumask(smp_processor_id()));
>
> for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
> rmpopt(pa);
> cond_resched();
> }
> migrate_enable();
>
> cpus_read_lock();
I think you can even skip the cpus_read_lock() since we know for a
fact that hotplug is disabled when we are here.
Perhaps we can have a lockdep_assert_cpu_hotplug_disabled() which
ensures we'll get a splat if that assumption ever changes when
running with LOCKDEP?
I'll let others comment if that is a good idea or not.
> for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
> on_each_cpu_mask(follower_mask, rmpopt_smp, (void *)pa, true);
> cond_resched();
> }
> cpus_read_unlock();
>
> free_cpumask_var(follower_mask);
>
>
> Here, the leader exclusion must use the sibling mask, not clear_cpu(this_cpu). That's why my collapsed version uses:
>
> cpumask_andnot(follower_mask, follower_mask,
> topology_sibling_cpumask(smp_processor_id()));
>
> - If this_cpu is a primary: its sibling mask contains itself (the primary) -> andnot removes this core's primary from the followers.
>
> - If this_cpu is a secondary: it isn't in follower_mask at all, but its sibling mask contains its primary, which is in
> follower_mask -> andnot still removes this core's primary.
>
> So either way the current core is dropped from the followers. (The old code needed two branches because case #1 used
> clear_cpu(this_cpu) — only correct when this_cpu is the primary — while case #2 used the sibling andnot. The single andnot works for
> both cases).
Ack! And I think this looks much cleaner (to my eyes at least ;-)
--
Thanks and Regards,
Prateek
next prev parent reply other threads:[~2026-06-17 4:20 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1781419998.git.ashish.kalra@amd.com>
2026-06-15 19:48 ` [PATCH v8 1/7] x86/cpufeatures: Add X86_FEATURE_RMPOPT feature flag Ashish Kalra
2026-06-15 19:48 ` [PATCH v8 2/7] x86/sev: Initialize RMPOPT configuration MSRs Ashish Kalra
2026-06-16 6:03 ` K Prateek Nayak
2026-06-15 19:49 ` [PATCH v8 3/7] crypto/ccp: Disable CPU hotplug while SNP is active Ashish Kalra
2026-06-17 4:33 ` K Prateek Nayak
2026-06-15 19:49 ` [PATCH v8 4/7] x86/sev: Add support to perform RMP optimizations asynchronously Ashish Kalra
2026-06-16 7:27 ` K Prateek Nayak
2026-06-16 19:56 ` Kalra, Ashish
2026-06-17 4:20 ` K Prateek Nayak [this message]
2026-06-15 19:49 ` [PATCH v8 5/7] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
2026-06-15 19:50 ` [PATCH v8 6/7] KVM: SEV: Perform RMP optimizations on SNP guest shutdown Ashish Kalra
2026-06-15 19:50 ` [PATCH v8 7/7] x86/sev: Add debugfs support for RMPOPT Ashish Kalra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=75cf11f1-51fc-4f1a-a9a7-4b9403d2bb8b@amd.com \
--to=kprateek.nayak@amd.com \
--cc=Michael.Roth@amd.com \
--cc=Nathan.Fontenot@amd.com \
--cc=Tycho.Andersen@amd.com \
--cc=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=ardb@kernel.org \
--cc=ashish.kalra@amd.com \
--cc=babu.moger@amd.com \
--cc=bp@alien8.de \
--cc=darwi@linutronix.de \
--cc=dave.hansen@linux.intel.com \
--cc=davem@davemloft.net \
--cc=dyoung@redhat.com \
--cc=herbert@gondor.apana.org.au \
--cc=hpa@zytor.com \
--cc=jackyli@google.com \
--cc=jacobhxu@google.com \
--cc=john.allen@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nikunj@amd.com \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pgonda@google.com \
--cc=rientjes@google.com \
--cc=seanjc@google.com \
--cc=tglx@kernel.org \
--cc=thomas.lendacky@amd.com \
--cc=x86@kernel.org \
--cc=xin@zytor.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox