public inbox for linux-coco@lists.linux.dev
From: "Kalra, Ashish" <ashish.kalra@amd.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Sean Christopherson <seanjc@google.com>
Cc: tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, thomas.lendacky@amd.com,
	herbert@gondor.apana.org.au, davem@davemloft.net,
	ardb@kernel.org, pbonzini@redhat.com, aik@amd.com,
	Michael.Roth@amd.com, KPrateek.Nayak@amd.com,
	Tycho.Andersen@amd.com, Nathan.Fontenot@amd.com,
	jackyli@google.com, pgonda@google.com, rientjes@google.com,
	jacobhxu@google.com, xin@zytor.com,
	pawan.kumar.gupta@linux.intel.com, babu.moger@amd.com,
	dyoung@redhat.com, nikunj@amd.com, john.allen@amd.com,
	darwi@linutronix.de, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev
Subject: Re: [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction
Date: Wed, 11 Mar 2026 16:24:13 -0500	[thread overview]
Message-ID: <cc9bf918-a14b-4619-a084-3f424fa16ea1@amd.com> (raw)
In-Reply-To: <d7ba3790-a959-4150-87e0-c87dea4d09c5@intel.com>

Hello Dave and Sean,

On 3/5/2026 1:40 PM, Dave Hansen wrote:
> On 3/5/26 11:22, Kalra, Ashish wrote:
>> But, these are the performance numbers you should be considering : 
>>
>> RMPOPT during boot: 
>>
>> [   49.913402] SEV-SNP: RMPOPT largest cycles 1143020
>> [   49.913407] SEV-SNP: RMPOPT smallest cycles 60
>> [   49.913408] SEV-SNP: RMPOPT average cycles 5226
>>
>> RMPOPT after SNP guest shutdown: 
>>
>> [  276.435091] SEV-SNP: RMPOPT largest cycles 83680
>> [  276.435096] SEV-SNP: RMPOPT smallest cycles 60
>> [  276.435097] SEV-SNP: RMPOPT average cycles 5658
> 
> First of all, I'd really appreciate wall clock measurements on these.
> It's just less math and guesswork. Cycles are easy to measure but hard
> to read. Please make these easier to read. Also, the per-RMPOPT numbers
> don't mean much. You have to scale it by the number of CPUs and memory
> (or 2TB) to get to a real, useful number.
> 
> The thing that matters is how long this loop takes:
> 
> 	for (pa = pa_start; pa < pa_end; pa += PUD_SIZE)
> 
> and *especially* how long it takes per-cpu and when the system has a
> full 2TB load of memory.
> 
> That will tell us how many resources this RMPOPT thing is going to take,
> which is the _real_ thing we need to know.
> 
> Also, to some degree, the thing we care about here the *most* is the
> worst case scenario. I think the worst possible case is that there's one
> 4k private page in each 1GB of memory, and that it's the last 4k page.
> I'd like to see numbers for something close to *that*, not when there
> are no private pages.
> 
> The two things you measured above are interesting, but they're only part
> of the story.
> 

Here is the relevant performance data:

All these measurements are done with 2TB RAM installed on the server:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           2.0Ti        13Gi       1.9Ti       8.8Mi       1.6Gi       1.9Ti
Swap:          2.0Gi          0B       2.0Gi


For the loop executing RMPOPT on up to 2TB of RAM on all CPUs:

                ..
                start = ktime_get();
               
                for (pa = pa_start; pa < pa_end; pa += PUD_SIZE) {
                        /* Bit zero passes the function to the RMPOPT instruction. */
                        on_each_cpu_mask(cpu_online_mask, rmpopt,
                                         (void *)(pa | RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS),
                                         true);
                }
                end = ktime_get();

                elapsed_ns = ktime_to_ns(ktime_sub(end, start));
		...

There are 2 active SNP VMs here; one SNP VM is terminated while the other is still running. Both VMs are configured with 100GB of guest RAM:

Timing of this loop when the SNP guest terminates:

[  232.789187] SEV-SNP: RMPOPT execution time 391609638 ns for physical address range 0x0000000000000000 - 0x0000020000000000 on all cpus -> ~391 ms

[  234.647462] SEV-SNP: RMPOPT execution time 457933019 ns for physical address range 0x0000000000000000 - 0x0000020000000000 on all cpus -> ~457 ms


Now, there are a couple of additional RMPOPT optimizations that can be applied to this loop:

1). RMPOPT can skip the bulk of its work if another CPU has already optimized that region.
The optimal thing may be to optimize all memory on one CPU first, and then let all the others
run RMPOPT in parallel.

2). The other optimization applied here is executing RMPOPT on only one thread per core.

The code sequence being used here:

	...
        /* Only one thread per core needs to issue RMPOPT instruction */
        for_each_online_cpu(cpu) {
                if (!topology_is_primary_thread(cpu))
                        continue;

                cpumask_set_cpu(cpu, cpus);
        }

        while (!kthread_should_stop()) {
         	...
                start = ktime_get();
               
                /*
                 * RMPOPT is optimized to skip the bulk of its work if another CPU has already
                 * optimized that region. Optimize all memory on one CPU first, and then let all
                 * the others run RMPOPT in parallel.
                 */
                cpumask_clear_cpu(smp_processor_id(), cpus);

                /* current CPU */
                for (pa = pa_start; pa < pa_end; pa += PUD_SIZE)
                        rmpopt((void *)(pa | RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS));

                for (pa = pa_start; pa < pa_end; pa += PUD_SIZE) {
                        /* Bit zero passes the function to the RMPOPT instruction. */
                        on_each_cpu_mask(cpus, rmpopt,
                                         (void *)(pa | RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS),
                                         true);                       
                }
                end = ktime_get();

                elapsed_ns = ktime_to_ns(ktime_sub(end, start));
		...

With these optimizations applied:

When this loop is executed as an SNP guest terminates, again with 2 active SNP VMs each configured with 100GB of guest RAM:

[  363.926595] SEV-SNP: RMPOPT execution time 317016656 ns for physical address range 0x0000000000000000 - 0x0000020000000000 on all cpus -> ~317 ms

[  365.415243] SEV-SNP: RMPOPT execution time 369659769 ns for physical address range 0x0000000000000000 - 0x0000020000000000 on all cpus -> ~369 ms.

So, with these two optimizations applied, there is a ~16-20% performance improvement (when an SNP guest terminates) in the execution of this loop,
which runs RMPOPT on up to 2TB of RAM on all CPUs.

Any thoughts or feedback on the performance numbers?

Ideally, we should issue RMPOPT only on the 1GB regions that contained memory associated with that guest, which should be
significantly less than the whole 2TB RAM range.

But that is something we planned to do once 1GB hugetlb guest_memfd support is merged, which I believe depends on:
1). in-place conversion for guest_memfd,
2). 2M hugepage support for guest_memfd, and finally
3). 1GB hugeTLB support for guest_memfd.

The other alternative will probably be to follow Dave's suggestion to loosely mirror the RMPOPT bitmap: keep our own
bitmap of 1GB regions that _need_ RMPOPT run on them. This bitmap would probably live in guest_memfd, where we would
track regions as they are freed and then issue RMPOPT on those 1GB regions
(and this would be independent of the 1GB hugeTLB support for guest_memfd).

Thanks,
Ashish


Thread overview: 41+ messages
2026-03-02 21:35 [PATCH v2 0/7] Add RMPOPT support Ashish Kalra
2026-03-02 21:35 ` [PATCH v2 1/7] x86/cpufeatures: Add X86_FEATURE_AMD_RMPOPT feature flag Ashish Kalra
2026-03-02 23:00   ` Dave Hansen
2026-03-05 12:36   ` Borislav Petkov
2026-03-02 21:35 ` [PATCH v2 2/7] x86/sev: add support for enabling RMPOPT Ashish Kalra
2026-03-02 22:32   ` Dave Hansen
2026-03-02 22:55     ` Kalra, Ashish
2026-03-02 23:00       ` Dave Hansen
2026-03-02 23:11         ` Kalra, Ashish
2026-03-02 22:33   ` Dave Hansen
2026-03-06 15:18   ` Borislav Petkov
2026-03-06 15:33     ` Tom Lendacky
2026-03-02 21:36 ` [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction Ashish Kalra
2026-03-02 22:57   ` Dave Hansen
2026-03-02 23:09     ` Kalra, Ashish
2026-03-02 23:15       ` Dave Hansen
2026-03-04 15:56     ` Andrew Cooper
2026-03-04 16:03       ` Dave Hansen
2026-03-25 21:53       ` Kalra, Ashish
2026-03-26  0:40         ` Andrew Cooper
2026-03-26  2:02           ` Kalra, Ashish
2026-03-26  2:14             ` Kalra, Ashish
2026-03-04 15:01   ` Sean Christopherson
2026-03-04 15:25     ` Dave Hansen
2026-03-04 15:32       ` Dave Hansen
2026-03-05  1:40       ` Kalra, Ashish
2026-03-05 19:22         ` Kalra, Ashish
2026-03-05 19:40           ` Dave Hansen
2026-03-11 21:24             ` Kalra, Ashish [this message]
2026-03-11 22:20               ` Dave Hansen
2026-03-16 19:03                 ` Kalra, Ashish
2026-03-18 14:00                   ` Dave Hansen
2026-03-02 21:36 ` [PATCH v2 4/7] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
2026-03-02 21:36 ` [PATCH v2 5/7] KVM: guest_memfd: Add cleanup interface for guest teardown Ashish Kalra
2026-03-09  9:01   ` Ackerley Tng
2026-03-10 22:18     ` Kalra, Ashish
2026-03-11  6:00       ` Ackerley Tng
2026-03-11 21:49         ` Kalra, Ashish
2026-03-27 17:16           ` Ackerley Tng
2026-03-02 21:37 ` [PATCH v2 6/7] KVM: SEV: Implement SEV-SNP specific guest cleanup Ashish Kalra
2026-03-02 21:37 ` [PATCH v2 7/7] x86/sev: Add debugfs support for RMPOPT Ashish Kalra
