From: Sean Christopherson <seanjc@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
David Hildenbrand <david@redhat.com>,
Patrick Roy <patrick.roy@linux.dev>,
Fuad Tabba <tabba@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Nikita Kalyazin <kalyazin@amazon.co.uk>,
shivankg@amd.com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Wed, 1 Oct 2025 17:04:20 -0700 [thread overview]
Message-ID: <aN3BhKZkCC4-iphM@google.com> (raw)
In-Reply-To: <CAGtprH-5NWVVyEM63ou4XjG4JmF2VYNakoFkwFwNR1AnJmiDpA@mail.gmail.com>
On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> > > > > >
> > > > > > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > that we don't need to add a capability every time a new flag comes along,
> > > > > > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > > > > > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > that's a non-issue relatively speaking.
> > > > > >
> > > > >
> > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > >
> > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > >
> > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > KVM caps per guest_memfd feature?
> >
> > Yes? No? It depends on the feature and the actual implementation. E.g.
> > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
>
> I think I am confused. Is the proposal here as follows?
> * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> creation flags.
No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
supported KVM_CREATE_GUEST_MEMFD flags. Whether or not there is an associated
"feature" is irrelevant. I.e. it's a very literal "these are the supported
flags".
> * Use KVM caps for guest_memfd features that don't map to any flags.
>
> I think in general it would be better to have a KVM cap for each
> feature irrespective of the flags as the feature may also need
^^^
> additional UAPIs like IOCTLs.
If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
a CAP is gross overkill. Even if there are other assets that accompany the new
flag, there's no reason we couldn't say "this feature exist if XYZ flag is
supported".
E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
far beyond being able to create a VM of type KVM_X86_TDX_VM.
KVM_CAP_XEN_HVM is a big collection of flags that have very little in common other
than being for Xen emulation.
> I fail to see the benefits of KVM_CAP_GUEST_MEMFD_FLAGS over
> KVM_CAP_GUEST_MEMFD_MMAP:
Adding a new flag doesn't require all of the things that come along with a new
capability. E.g. there's zero chance of collisions between maintainer sub-trees
(at least as far as capabilities go; if multiple maintainers are adding multiple
gmem flags in the same kernel release, I really hope they'd be coordinating).
Enumerating in userspace is also more natural, e.g. userspace doesn't have to
manually build the mask of valid flags.
Writing documentation should be much easier (much less boilerplate), e.g. the
sum total of uAPI for adding GUEST_MEMFD_FLAG_INIT_SHARED is:
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7ba92f2ced38..754b662a453c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6438,6 +6438,11 @@ specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
============================ ================================================
GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
descriptor.
+ GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+ KVM_CREATE_GUEST_MEMFD (memory files created
+ without INIT_SHARED will be marked private).
+ Shared memory can be faulted into host userspace
+ page tables. Private memory cannot.
============================ ================================================
When the KVM MMU performs a PFN lookup to service a guest fault and the backing
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b1d52d0c56ec..52f6000ab020 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
-#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
+#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
+#define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
struct kvm_create_guest_memfd {
__u64 size;
> 1) It limits the possible values to 32 even though we could pass 64 flags to
> the original ioctl.
So because we're currently limited to 32 flags, we should instead throw in the
towel and artificially limit ourselves to 1 flag (0 or 1)? Because for all intents
and purposes, that's what we'd be doing.
Again, that is unlikely to be problematic before I retire. It might not even be
a problem _ever_, because with luck we'll kill off 32-bit KVM in the next few
years and then we can actually leverage returning a "long" from ioctls. Literally
every capability that returns a mask of flags has this "problem"; it's not notable
or even an issue in practice.
> 2) Userspace has to anyways assume the semantics of each bit position.
Not always.
uint64_t valid_flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
uint64_t flag;
int fd;
for (flag = BIT(0); flag; flag <<= 1) {
fd = __vm_create_guest_memfd(vm, page_size, flag);
if (flag & valid_flags) {
TEST_ASSERT(fd >= 0,
"guest_memfd() with flag '0x%lx' should succeed",
flag);
close(fd);
} else {
TEST_ASSERT(fd < 0 && errno == EINVAL,
"guest_memfd() with flag '0x%lx' should fail with EINVAL",
flag);
}
}
But pedantry aside, I don't see how this is at all an interesting point. Yes,
userspace has to know how to use a feature.
> 3) Userspace still has to check for caps for features that carry extra
> UAPI baggage.
That's simply not true. E.g. see the example with VM types.
> KVM_CAP_GUEST_MEMFD_MMAP allows userspace to assume that mmap is
> supported and userspace can just pass in the mmap flag that it anyways
> has to assume.
next prev parent reply other threads:[~2025-10-02 0:04 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29 8:38 ` David Hildenbrand
2025-09-29 8:57 ` Fuad Tabba
2025-09-29 9:01 ` David Hildenbrand
2025-09-29 9:04 ` Fuad Tabba
2025-09-29 9:43 ` Ackerley Tng
2025-09-29 10:15 ` Patrick Roy
2025-09-29 10:22 ` David Hildenbrand
2025-09-29 10:51 ` Ackerley Tng
2025-09-29 16:55 ` Sean Christopherson
2025-09-30 0:15 ` Sean Christopherson
2025-09-30 8:36 ` Ackerley Tng
2025-10-01 14:22 ` Vishal Annapurve
2025-10-01 16:15 ` Sean Christopherson
2025-10-01 16:31 ` Vishal Annapurve
2025-10-01 17:16 ` Sean Christopherson
2025-10-01 22:13 ` Vishal Annapurve
2025-10-02 0:04 ` Sean Christopherson [this message]
2025-10-02 15:41 ` Vishal Annapurve
2025-10-03 0:12 ` Sean Christopherson
2025-10-03 4:10 ` Vishal Annapurve
2025-10-03 16:13 ` Sean Christopherson
2025-10-03 20:30 ` Vishal Annapurve
2025-09-29 16:54 ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29 9:12 ` Fuad Tabba
2025-09-29 9:17 ` David Hildenbrand
2025-09-29 10:56 ` Ackerley Tng
2025-09-29 16:58 ` Sean Christopherson
2025-09-30 6:52 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29 9:18 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 11:02 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29 9:21 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 11:08 ` Ackerley Tng
2025-09-29 17:32 ` Sean Christopherson
2025-09-30 7:09 ` Ackerley Tng
2025-09-30 14:24 ` Sean Christopherson
2025-10-01 10:18 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 14:38 ` Ackerley Tng
2025-09-29 18:10 ` Sean Christopherson
2025-09-29 18:35 ` Sean Christopherson
2025-09-30 7:53 ` Ackerley Tng
2025-09-30 14:58 ` Sean Christopherson
2025-10-01 10:26 ` Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aN3BhKZkCC4-iphM@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=borntraeger@linux.ibm.com \
--cc=david@redhat.com \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=patrick.roy@linux.dev \
--cc=pbonzini@redhat.com \
--cc=shivankg@amd.com \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox