From: Adrian Hunter <adrian.hunter@intel.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: <pbonzini@redhat.com>, <seanjc@google.com>, <kvm@vger.kernel.org>,
<rick.p.edgecombe@intel.com>, <kirill.shutemov@linux.intel.com>,
<kai.huang@intel.com>, <reinette.chatre@intel.com>,
<xiaoyao.li@intel.com>, <tony.lindgren@linux.intel.com>,
<binbin.wu@linux.intel.com>, <isaku.yamahata@intel.com>,
<linux-kernel@vger.kernel.org>, <yan.y.zhao@intel.com>,
<chao.gao@intel.com>
Subject: Re: [PATCH RFC] KVM: TDX: Defer guest memory removal to decrease shutdown time
Date: Thu, 27 Mar 2025 12:10:05 +0200 [thread overview]
Message-ID: <8d0a4585-9e48-4e8d-8acb-7cb99142654c@intel.com> (raw)
In-Reply-To: <CAGtprH_o_Vbvk=jONSep64wRhAJ+Y51uZfX7-DDS28vh=ALQOA@mail.gmail.com>
On 27/03/25 10:14, Vishal Annapurve wrote:
> On Thu, Mar 13, 2025 at 11:17 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>> ...
>> == Problem ==
>>
>> Currently, Dynamic Page Removal is being used when the TD is being
>> shutdown for the sake of having simpler initial code.
>>
>> This happens when guest_memfds are closed, refer kvm_gmem_release().
>> guest_memfds hold a reference to struct kvm, so VM destruction cannot
>> happen until after they are released.
>>
>> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
>> reclaim time. For example:
>>
>>  VCPUs  Size (GB)  Before (secs)  After (secs)
>>      4         18             72            24
>>     32        107            517           134
>
> If the time for reclaim grows linearly with memory size, then this is
> a significant amount of time for TD cleanup (~21 minutes for a 1TB VM).
>
>>
>> Note, the V19 patch set:
>>
>> https://lore.kernel.org/all/cover.1708933498.git.isaku.yamahata@intel.com/
>>
>> did not have this issue because the HKID was released early, something that
>> Sean effectively NAK'ed:
>>
>> "No, the right answer is to not release the HKID until the VM is
>> destroyed."
>>
>> https://lore.kernel.org/all/ZN+1QHGa6ltpQxZn@google.com/
>
> IIUC, Sean is suggesting to treat S-EPT page removal and page reclaim
> separately. Through his proposal:
Thanks for looking at this!
It seems I am using the term "reclaim" wrongly. Sorry!

I am talking about taking private memory away from the guest, not what
happens to it subsequently. When the TDX VM is in "Runnable" state,
taking private memory away is slow (slow S-EPT removal). When the TDX
VM is in "Teardown" state, taking private memory away is faster (via a
TDX SEAMCALL named TDH.PHYMEM.PAGE.RECLAIM, which is where I picked up
the term "reclaim").

Once guest memory is removed from the S-EPT, no further action is
needed to reclaim it. It belongs to KVM at that point.

guest_memfd memory can be added directly to the S-EPT. No intermediate
state or step is used. Any guest_memfd memory not given to the MMU
(S-EPT) can be freed directly if userspace/KVM wants to. Again, there
is no intermediate state or (reclaim) step.
> 1) If userspace drops last reference on gmem inode before/after
> dropping the VM reference
> -> slow S-EPT removal and slow page reclaim
Currently, slow S-EPT removal happens when the file is released.
> 2) If memslots are removed before closing the gmem and dropping the VM reference
> -> slow S-EPT page removal and no page reclaim until the gmem is around.
>
> Reclaim should ideally happen when the host wants to use that memory
> i.e. for following scenarios:
> 1) Truncation of private guest_memfd ranges
> 2) Conversion of private guest_memfd ranges to shared when supporting
> in-place conversion (Could be deferred to the faulting in as shared as
> well).
>
> Would it be possible for you to provide the split of the time spent in
> slow S-EPT page removal vs page reclaim?
Based on what I wrote above, all the time is spent removing pages
from the S-EPT. Greater than 99% of shutdown time is spent in
kvm_gmem_release().
>
> It might be worth exploring the possibility of parallelizing or giving
> userspace the flexibility to parallelize both these operations to
> bring the cleanup time down (to be comparable with non-confidential VM
> cleanup time for example).
Thread overview: 12+ messages
2025-03-13 18:16 [PATCH RFC] KVM: TDX: Defer guest memory removal to decrease shutdown time Adrian Hunter
2025-03-13 18:39 ` Paolo Bonzini
2025-03-13 19:07 ` Adrian Hunter
2025-03-17 8:13 ` Kirill A. Shutemov
2025-03-18 15:41 ` Adrian Hunter
2025-03-18 21:11 ` Vishal Annapurve
2025-03-20 0:42 ` Vishal Annapurve
2025-03-24 10:40 ` Adrian Hunter
2025-03-27 8:14 ` Vishal Annapurve
2025-03-27 10:10 ` Adrian Hunter [this message]
2025-03-27 15:54 ` Sean Christopherson
2025-03-28 10:26 ` Adrian Hunter