From: Michael Roth <michael.roth@amd.com>
To: <kvm@vger.kernel.org>
Cc: <linux-coco@lists.linux.dev>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <jroedel@suse.de>,
<thomas.lendacky@amd.com>, <pbonzini@redhat.com>,
<seanjc@google.com>, <vbabka@suse.cz>, <amit.shah@amd.com>,
<pratikrajesh.sampat@amd.com>, <ashish.kalra@amd.com>,
<liam.merwick@oracle.com>, <david@redhat.com>,
<vannapurve@google.com>, <ackerleytng@google.com>,
<quic_eberman@quicinc.com>
Subject: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes
Date: Thu, 12 Dec 2024 00:36:30 -0600
Message-ID: <20241212063635.712877-1-michael.roth@amd.com>
This patchset is also available at:
https://github.com/amdese/linux/commits/snp-prepare-thp-rfc1
and is based on top of Paolo's kvm-coco-queue-2024-11 tag, which includes
a snapshot of his patches[1] that track whether sub-pages of a huge folio
still need kvm_arch_gmem_prepare() hooks issued before guest access:
d55475f23cea KVM: gmem: track preparedness a page at a time
64b46ca6cd6d KVM: gmem: limit hole-punching to ranges within the file
17df70a5ea65 KVM: gmem: add a complete set of functions to query page preparedness
e3449f6841ef KVM: gmem: allocate private data for the gmem inode
[1] https://lore.kernel.org/lkml/20241108155056.332412-1-pbonzini@redhat.com/
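
For readers not following those patches closely, here is a rough, purely
illustrative sketch of the idea behind that tracking. struct kvm_gmem_inode
and the helpers below are simplified stand-ins invented for this sketch,
not the actual code from [1]:

#include <linux/bitops.h>
#include <linux/types.h>

/*
 * Illustrative sketch only -- not the code from [1].  The idea is to
 * keep one bit per 4K page in inode-private data so that
 * kvm_arch_gmem_prepare() can be issued exactly once per sub-page of
 * a huge folio before the guest first accesses it.
 */
struct kvm_gmem_inode {			/* hypothetical name */
	unsigned long *prepared;	/* 1 bit per PAGE_SIZE page */
	pgoff_t nr_pages;
};

/* Has every page in [start, start + nr) already been prepared? */
static bool gmem_range_is_prepared(struct kvm_gmem_inode *gi,
				   pgoff_t start, pgoff_t nr)
{
	pgoff_t i;

	for (i = start; i < start + nr; i++)
		if (!test_bit(i, gi->prepared))
			return false;
	return true;
}

/* Record that [start, start + nr) has been handed to the arch hook. */
static void gmem_mark_prepared(struct kvm_gmem_inode *gi,
			       pgoff_t start, pgoff_t nr)
{
	pgoff_t i;

	for (i = start; i < start + nr; i++)
		set_bit(i, gi->prepared);
}
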
This series addresses some of the pending review comments for those patches
(feel free to squash/rework as needed) and implements a first real user in
the form of a reworked version of Sean's original 2MB THP support for gmem.
It is still somewhat up in the air whether gmem should support THP at all,
rather than moving straight to 2MB/1GB hugepages in the form of something
like HugeTLB folios[2] or the lower-level PFN range allocator presented by
Yu Zhao during the guest_memfd call last week. The main argument against
THP, as I understand it, is that THPs will become split over time due to
hole-punching and rarely have an opportunity to get rebuilt, because
current CoCo hypervisor implementations like SNP lack memory migration
support (and adding migration support to resolve that would not
necessarily result in a net performance gain). The current plan for SNP,
as discussed during the first guest_memfd call, is to implement something
similar to 2MB HugeTLB and disallow hole-punching at sub-2MB granularity.
However, there have also been discussions during recent PUCK calls where
the KVM maintainers have still expressed some interest in pulling in gmem
THP support in a more official capacity. The thinking there is that
hole-punching is a userspace policy, and that userspace could in theory
avoid hole-punching sub-2MB GFN ranges to avoid degradation over time.
And if there's a desire to enforce this from the kernel side by blocking
sub-2MB hole-punching from the host, that would provide similar
semantics/behavior to the 2MB HugeTLB-like approach above.
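
For what it's worth, the userspace-policy angle is simple to picture; here
is a minimal sketch of what a VMM could do, purely illustrative and not
tied to any particular VMM (gmem_punch_2m_only() and GMEM_HUGE_SIZE are
made-up names for this sketch):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

#define GMEM_HUGE_SIZE	(2UL * 1024 * 1024)

/*
 * Userspace policy sketch (not from this series): only hole-punch a
 * guest_memfd range when it covers whole 2MB-aligned chunks, so that
 * backing THPs never get split by partial punches.  gmem_fd, offset
 * and len are whatever the VMM tracks for the range being discarded.
 */
static int gmem_punch_2m_only(int gmem_fd, uint64_t offset, uint64_t len)
{
	/* Round the start up and the end down to 2MB boundaries. */
	uint64_t start = (offset + GMEM_HUGE_SIZE - 1) & ~(GMEM_HUGE_SIZE - 1);
	uint64_t end = (offset + len) & ~(GMEM_HUGE_SIZE - 1);

	if (end <= start)
		return 0;	/* nothing 2MB-aligned to discard */

	return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 start, end - start);
}
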
So maybe there is still some room for discussion about these approaches.
Outside of that, there are a number of other development areas where it
would be useful to at least have some experimental 2MB support in place so
that those efforts can be pursued in parallel: the preparedness tracking
touched on here, and exploring how it intersects with other work such as
using gmem for both shared and private memory, mmap support, the
guest_memfd library, etc. My hope is that this approach could be useful
for that purpose at least, even if only as an out-of-tree stop-gap.
Thoughts/comments welcome!
[2] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@google.com/
Testing
-------
Currently, this series does not enable 2MB support by default; instead, it
can be switched on/off dynamically via a module parameter:
echo 1 >/sys/module/kvm/parameters/gmem_2m_enabled
echo 0 >/sys/module/kvm/parameters/gmem_2m_enabled
This can be useful for simulating things like host memory pressure, where
we start getting a mix of 4K/2MB allocations. I've used this to help test
that the preparedness tracking still handles things properly in these
situations. But if we do decide to pull THP support upstream, it would
make more sense to drop the parameter completely.
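
For reference, the sysfs path above implies a writable kvm.ko module
parameter; a minimal sketch of how such a knob could be wired up is below
(the exact name, placement, default, and kvm_gmem_preferred_order() helper
are assumptions for illustration and may differ from what the series does):

#include <linux/module.h>
#include <linux/mm.h>

static bool gmem_2m_enabled;
module_param(gmem_2m_enabled, bool, 0644);	/* writable at runtime via sysfs */

/* Allocation paths would consult the knob when picking a folio order. */
static unsigned int kvm_gmem_preferred_order(void)
{
	return READ_ONCE(gmem_2m_enabled) ? PMD_SHIFT - PAGE_SHIFT : 0;
}
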
----------------------------------------------------------------
Michael Roth (4):
KVM: gmem: Don't rely on __kvm_gmem_get_pfn() for preparedness
KVM: gmem: Don't clear pages that have already been prepared
KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios
KVM: SEV: Improve handling of large ranges in gmem prepare callback
Sean Christopherson (1):
KVM: Add hugepage support for dedicated guest memory
arch/x86/kvm/svm/sev.c | 163 ++++++++++++++++++++++++++------------------
include/linux/kvm_host.h | 2 +
virt/kvm/guest_memfd.c | 173 ++++++++++++++++++++++++++++++++++-------------
virt/kvm/kvm_main.c | 4 ++
4 files changed, 228 insertions(+), 114 deletions(-)