public inbox for linux-coco@lists.linux.dev
From: "Kalra, Ashish" <ashish.kalra@amd.com>
To: Ackerley Tng <ackerleytng@google.com>,
	tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	seanjc@google.com, peterz@infradead.org, thomas.lendacky@amd.com,
	herbert@gondor.apana.org.au, davem@davemloft.net,
	ardb@kernel.org
Cc: pbonzini@redhat.com, aik@amd.com, Michael.Roth@amd.com,
	KPrateek.Nayak@amd.com, Tycho.Andersen@amd.com,
	Nathan.Fontenot@amd.com, jackyli@google.com, pgonda@google.com,
	rientjes@google.com, jacobhxu@google.com, xin@zytor.com,
	pawan.kumar.gupta@linux.intel.com, babu.moger@amd.com,
	dyoung@redhat.com, nikunj@amd.com, john.allen@amd.com,
	darwi@linutronix.de, linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev
Subject: Re: [PATCH v2 5/7] KVM: guest_memfd: Add cleanup interface for guest teardown
Date: Wed, 11 Mar 2026 16:49:03 -0500	[thread overview]
Message-ID: <75cd28a5-fb51-47ae-97c7-191fe9a6e045@amd.com> (raw)
In-Reply-To: <CAEvNRgGdaA1ynF8jxQDPh9U0U8Q0RkE0=KJx4FNrh_=+dVRaLQ@mail.gmail.com>

Hello Ackerley,

On 3/11/2026 1:00 AM, Ackerley Tng wrote:
> "Kalra, Ashish" <ashish.kalra@amd.com> writes:
> 
>> Hello Ackerley,
>>
>> On 3/9/2026 4:01 AM, Ackerley Tng wrote:
>>> Ashish Kalra <Ashish.Kalra@amd.com> writes:
>>>
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> Introduce kvm_arch_gmem_cleanup() to perform architecture-specific
>>>> cleanups when the last file descriptor for the guest_memfd inode is
>>>> closed. This typically occurs during guest shutdown and termination
>>>> and allows for final resource release.
>>>>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>
>>>> [...snip...]
>>>>
>>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>>> index 017d84a7adf3..2724dd1099f2 100644
>>>> --- a/virt/kvm/guest_memfd.c
>>>> +++ b/virt/kvm/guest_memfd.c
>>>> @@ -955,6 +955,14 @@ static void kvm_gmem_destroy_inode(struct inode *inode)
>>>>
>>>>  static void kvm_gmem_free_inode(struct inode *inode)
>>>>  {
>>>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CLEANUP
>>>> +	/*
>>>> +	 * Finalize cleanup for the inode once the last guest_memfd
>>>> +	 * reference is released. This usually occurs after guest
>>>> +	 * termination.
>>>> +	 */
>>>> +	kvm_arch_gmem_cleanup();
>>>> +#endif
>>>
>>> Folks have already talked about the performance implications of doing
>>> the scan and rmpopt, I just want to call out that one VM could have more
>>> than one associated guest_memfd too.
>>
Yes, I have observed that kvm_gmem_free_inode() gets invoked multiple times
at SNP guest shutdown.
>>
>> And the same is true for kvm_gmem_destroy_inode() too.
>>
>>>
>>> I think the cleanup function should be thought of as cleanup for the
>>> inode (even if it doesn't take an inode pointer since it's not (yet)
>>> required).
>>>
>>> So, the gmem cleanup function should not handle deduplicating cleanup
>>> requests, but the arch function should, if the cleanup needs
>>> deduplicating.
>>
>> I agree, the arch function will have to handle deduplicating,  and for that
>> the arch function will probably need to be passed the inode pointer,
>> to have a parameter to assist with deduplicating.
>>
> 
> By the time .free_folio() is called, folio->mapping may no longer exist,
> so if we definitely want to deduplicate using something in the inode,
> .free_folio() won't be the right callback to use.

Ok.

> 
> I was thinking that deduplicating using something in the folio would be
> better. Can rmpopt take a PFN range? Then there's really no
> deduplication, the cleanup would be nicely narrowed to whatever was just
> freed. Perhaps the PFNs could be aligned up to the nearest PMD or PUD
> size for rmpopt to do the right thing.
> 

It would really be ideal if the cleanup could be narrowed down to whatever was just freed.

RMPOPT takes an SPA which is GB-aligned, so if the PFNs are aligned to the nearest
PUD, then RMPOPT will be perfectly aligned to optimize the 1GB regions that contained
memory associated with the guest being freed.

This is also the most efficient way to use RMPOPT, as we only optimize the 1GB regions
that contain memory associated with that guest, which should be much less work than
optimizing the whole 2TB of RAM.

And that matches the actual plans for RMPOPT.

We have planned a phased RMPOPT implementation.

In the first phase, we were planning to do RMP re-optimization of the entire 2TB
of RAM.

Once 1GB hugetlb guest_memfd support is merged, we planned to re-enable RMPOPT
optimizations during 1GB page cleanup as a follow-on series.

But I believe this support is dependent on:
1) in-place conversion for guest_memfd,
2) 2M hugepage support for guest_memfd.

Another alternative we are considering is to implement a bitmap of 1GB regions in
guest_memfd that tracks when they are freed, and then issue RMPOPT on those 1GB
regions (this would be independent of the 1GB hugeTLB support for guest_memfd).

> Or perhaps some more tracking is required to check that the entire
> aligned range is freed before doing the rmpopt.
> 
> I need to implement some of this tracking for guest_memfd HugeTLB
> support, so if the tracking is useful for you, we should discuss!

Yes, this tracking is going to be useful for RMPOPT.

Is this going to be implemented as part of the 1GB hugeTLB support for guest_memfd?

> 
>>>
>>> Also, .free_inode() is called through RCU, so it could be called after
>>> some delay. Could it be possible that .free_inode() ends up being called
>>> way after the associated VM gets torn down, or after KVM the module gets
>>> unloaded?  Does rmpopt still work fine if KVM the module got unloaded?
>>
>> Yes, .free_inode() can probably get called after the associated VM has
>> been torn down and which should be fine for issuing RMPOPT to do
>> RMP re-optimizations.
>>
>> As far as about KVM module getting unloaded, then as part of the forthcoming patch-series,
>> during KVM module unload, X86_SNP_SHUTDOWN would be issued which means SNP would get
>> disabled and therefore, RMP checks are also disabled.
>>
>> And as CC_ATTR_HOST_SEV_SNP would then be cleared, therefore, snp_perform_rmp_optimization()
>> will simply return.
>>
> 
> I think relying on CC_ATTR_HOST_SEV_SNP to skip optimization should be
> best as long as there are no races (like the .free_inode() will
> definitely not try to optimize when SNP is half shut down or something
> like that.

Yeah, I will have to take a look at such races.

> 
>> Another option is to add a new guest_memfd superblock operation, and then do the
>> final guest_memfd cleanup using the .evict_inode() callback. This will then ensure
>> that the cleanup is not called through RCU and avoids any kind of delays, as following:
>>
>> +static void kvm_gmem_evict_inode(struct inode *inode)
>> +{
>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CLEANUP
>> +        kvm_arch_gmem_cleanup();
>> +#endif
>> +       truncate_inode_pages_final(&inode->i_data);
>> +       clear_inode(inode);
>> +}
>> +
>>
> 
> At the point of .evict_inode(), CoCo-shared guest_memfd pages could
> still be pinned (for DMA or whatever, accidentally or maliciously), can
> rmpopt work on shared pages that might still be used for DMA?
> 

Yes, RMPOPT should be safe to use here, as it checks the RMP table for assigned
or private pages in the specified 1GB range. For a 1GB range full of shared pages,
it will mark that range as RMP-optimized.

If all RMPUPDATEs for the private->shared page conversions have completed by the
point of .evict_inode(), then RMPOPT re-optimization will work nicely.

> .invalidate_folio() and .free_folio() both actually happen on removal
> from guest_memfd ownership, though both are not exactly when the folio
> is completely not in use.
> 
> Is the best time to optimize when the pages are truly freed?
> 

Yes.

Thanks,
Ashish

>> @@ -971,6 +979,7 @@ static const struct super_operations kvm_gmem_super_operations = {
>>         .alloc_inode    = kvm_gmem_alloc_inode,
>>         .destroy_inode  = kvm_gmem_destroy_inode,
>>         .free_inode     = kvm_gmem_free_inode,
>> +       .evict_inode    = kvm_gmem_evict_inode,
>>  };
>>
>>
>> Thanks,
>> Ashish
>>
>>>
>>> IIUC the current kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
>>> is fine because in kvm_gmem_exit(), there is a rcu_barrier() before
>>> kmem_cache_destroy(kvm_gmem_inode_cachep);.
>>>
>>>>  	kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
>>>>  }
>>>>
>>>> --
>>>> 2.43.0


Thread overview: 41+ messages
2026-03-02 21:35 [PATCH v2 0/7] Add RMPOPT support Ashish Kalra
2026-03-02 21:35 ` [PATCH v2 1/7] x86/cpufeatures: Add X86_FEATURE_AMD_RMPOPT feature flag Ashish Kalra
2026-03-02 23:00   ` Dave Hansen
2026-03-05 12:36   ` Borislav Petkov
2026-03-02 21:35 ` [PATCH v2 2/7] x86/sev: add support for enabling RMPOPT Ashish Kalra
2026-03-02 22:32   ` Dave Hansen
2026-03-02 22:55     ` Kalra, Ashish
2026-03-02 23:00       ` Dave Hansen
2026-03-02 23:11         ` Kalra, Ashish
2026-03-02 22:33   ` Dave Hansen
2026-03-06 15:18   ` Borislav Petkov
2026-03-06 15:33     ` Tom Lendacky
2026-03-02 21:36 ` [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction Ashish Kalra
2026-03-02 22:57   ` Dave Hansen
2026-03-02 23:09     ` Kalra, Ashish
2026-03-02 23:15       ` Dave Hansen
2026-03-04 15:56     ` Andrew Cooper
2026-03-04 16:03       ` Dave Hansen
2026-03-25 21:53       ` Kalra, Ashish
2026-03-26  0:40         ` Andrew Cooper
2026-03-26  2:02           ` Kalra, Ashish
2026-03-26  2:14             ` Kalra, Ashish
2026-03-04 15:01   ` Sean Christopherson
2026-03-04 15:25     ` Dave Hansen
2026-03-04 15:32       ` Dave Hansen
2026-03-05  1:40       ` Kalra, Ashish
2026-03-05 19:22         ` Kalra, Ashish
2026-03-05 19:40           ` Dave Hansen
2026-03-11 21:24             ` Kalra, Ashish
2026-03-11 22:20               ` Dave Hansen
2026-03-16 19:03                 ` Kalra, Ashish
2026-03-18 14:00                   ` Dave Hansen
2026-03-02 21:36 ` [PATCH v2 4/7] x86/sev: Add interface to re-enable RMP optimizations Ashish Kalra
2026-03-02 21:36 ` [PATCH v2 5/7] KVM: guest_memfd: Add cleanup interface for guest teardown Ashish Kalra
2026-03-09  9:01   ` Ackerley Tng
2026-03-10 22:18     ` Kalra, Ashish
2026-03-11  6:00       ` Ackerley Tng
2026-03-11 21:49         ` Kalra, Ashish [this message]
2026-03-27 17:16           ` Ackerley Tng
2026-03-02 21:37 ` [PATCH v2 6/7] KVM: SEV: Implement SEV-SNP specific guest cleanup Ashish Kalra
2026-03-02 21:37 ` [PATCH v2 7/7] x86/sev: Add debugfs support for RMPOPT Ashish Kalra
