From: Sean Christopherson <seanjc@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
David Hildenbrand <david@redhat.com>,
Patrick Roy <patrick.roy@linux.dev>,
Fuad Tabba <tabba@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Nikita Kalyazin <kalyazin@amazon.co.uk>,
shivankg@amd.com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Thu, 2 Oct 2025 17:12:09 -0700 [thread overview]
Message-ID: <aN8U2c8KMXTy6h9Q@google.com> (raw)
In-Reply-To: <CAGtprH_evo=nyk1B6ZRdKJXX2s7g1W8dhwJhEPJkG=o2ORU48g@mail.gmail.com>
On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > > >
> > > > > > > > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> > > > > > > >
> > > > > > > > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > > > that we don't need to add a capability every time a new flag comes along,
> > > > > > > > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > > > > > > > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > > > that's a non-issue relatively speaking.
> > > > > > > >
> > > > > > >
> > > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > > >
> > > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > > >
> > > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > > KVM caps per guest_memfd feature?
> > > >
> > > > Yes? No? It depends on the feature and the actual implementation. E.g.
> > > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> > >
> > > I think I am confused. Is the proposal here as follows?
> > > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > > creation flags.
> >
> > No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> > supported KVM_CREATE_GUEST_MEMFD flags. Whether or not there is an associated
> > "feature" is irrelevant. I.e. it's a very literal "these are the supported
> > flags".
> >
> > > * Use KVM caps for guest_memfd features that don't map to any flags.
> > >
> > > I think in general it would be better to have a KVM cap for each
> > > feature irrespective of the flags as the feature may also need
> > ^^^
> > > additional UAPIs like IOCTLs.
> >
> > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > a CAP is gross overkill. Even if there are other assets that accompany the new
> > flag, there's no reason we couldn't say "this feature exist if XYZ flag is
> > supported".
> >
> > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> >
>
> What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> of guest_memfd?
I'd much prefer to have both. Describing flags for an ioctl via a bitmask that
doesn't *exactly* match the flags is asking for problems. At best, it will be
confusing. E.g. we'll probably end up with code like this:
gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;
Those types of patterns often lead to typos causing problems (LOL, case in point,
there's a typo above; I'm leaving it to illustrate my point). That can be largely
solved by userspace via macro shenanigans, but userspace really shouldn't have to
jump through hoops for such a simple thing.
An ever worse outcome is if userspace does something like:
gmem_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);
Which might actually work initially, e.g. if KVM_CAP_GUEST_MEMFD_MMAP and
GUEST_MEMFD_FLAG_MMAP have the same value. But eventually userspace will be sad.
Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
bits before we run out of flags.
And if we use memory attributes, we're also guaranteed to have at least one gmem
capability that returns a bitmask separately from a dedicated one-size-fits-all
cap, e.g.
case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
if (vm_memory_attributes)
return 0;
return kvm_supported_mem_attributes(kvm);
Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS, I'm
pretty sure we can simply extend KVM_CAP_GUEST_MEMFD. E.g.
#define KVM_GUEST_MEMFD_FEAT_BASIC (1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY (1ULL << 1)
case KVM_CAP_GUEST_MEMFD:
return KVM_GUEST_MEMFD_FEAT_BASIC |
KVM_GUEST_MEMFD_FEAT_FANCY;
> That seems more consistent to me in order for userspace to deduce the
> supported features and assume flags/ioctls/... associated with the feature
> as a group.
If we add a feature that comes with a flag, we could always add both, i.e. a
feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
KVM_CAP_GUEST_MEMFD_FLAGS. That certainly wouldn't be my first choice, but it's
a possibility, e.g. if it really is the most intuitive solution. But that's
getting quite hypothetical.
next prev parent reply other threads:[~2025-10-03 0:12 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29 8:38 ` David Hildenbrand
2025-09-29 8:57 ` Fuad Tabba
2025-09-29 9:01 ` David Hildenbrand
2025-09-29 9:04 ` Fuad Tabba
2025-09-29 9:43 ` Ackerley Tng
2025-09-29 10:15 ` Patrick Roy
2025-09-29 10:22 ` David Hildenbrand
2025-09-29 10:51 ` Ackerley Tng
2025-09-29 16:55 ` Sean Christopherson
2025-09-30 0:15 ` Sean Christopherson
2025-09-30 8:36 ` Ackerley Tng
2025-10-01 14:22 ` Vishal Annapurve
2025-10-01 16:15 ` Sean Christopherson
2025-10-01 16:31 ` Vishal Annapurve
2025-10-01 17:16 ` Sean Christopherson
2025-10-01 22:13 ` Vishal Annapurve
2025-10-02 0:04 ` Sean Christopherson
2025-10-02 15:41 ` Vishal Annapurve
2025-10-03 0:12 ` Sean Christopherson [this message]
2025-10-03 4:10 ` Vishal Annapurve
2025-10-03 16:13 ` Sean Christopherson
2025-10-03 20:30 ` Vishal Annapurve
2025-09-29 16:54 ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29 9:12 ` Fuad Tabba
2025-09-29 9:17 ` David Hildenbrand
2025-09-29 10:56 ` Ackerley Tng
2025-09-29 16:58 ` Sean Christopherson
2025-09-30 6:52 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29 9:18 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 11:02 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29 9:21 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 11:08 ` Ackerley Tng
2025-09-29 17:32 ` Sean Christopherson
2025-09-30 7:09 ` Ackerley Tng
2025-09-30 14:24 ` Sean Christopherson
2025-10-01 10:18 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 14:38 ` Ackerley Tng
2025-09-29 18:10 ` Sean Christopherson
2025-09-29 18:35 ` Sean Christopherson
2025-09-30 7:53 ` Ackerley Tng
2025-09-30 14:58 ` Sean Christopherson
2025-10-01 10:26 ` Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aN8U2c8KMXTy6h9Q@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=borntraeger@linux.ibm.com \
--cc=david@redhat.com \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=patrick.roy@linux.dev \
--cc=pbonzini@redhat.com \
--cc=shivankg@amd.com \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox