From: Sean Christopherson <seanjc@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
	David Hildenbrand <david@redhat.com>,
	 Patrick Roy <patrick.roy@linux.dev>,
	Fuad Tabba <tabba@google.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	 Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Nikita Kalyazin <kalyazin@amazon.co.uk>,
	shivankg@amd.com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Thu, 2 Oct 2025 17:12:09 -0700
Message-ID: <aN8U2c8KMXTy6h9Q@google.com>
In-Reply-To: <CAGtprH_evo=nyk1B6ZRdKJXX2s7g1W8dhwJhEPJkG=o2ORU48g@mail.gmail.com>

On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > > >
> > > > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > > > >
> > > > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > > >     that's a non-issue relatively speaking.
> > > > > > > >
> > > > > > >
> > > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > > >
> > > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > > >
> > > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > > KVM caps per guest_memfd feature?
> > > >
> > > > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> > >
> > > I think I am confused. Is the proposal here as follows?
> > > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > > creation flags.
> >
> > No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> > supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
> > "feature" is irrelevant.  I.e. it's a very literal "these are the supported
> > flags".
> >
> > > * Use KVM caps for guest_memfd features that don't map to any flags.
> > >
> > > I think in general it would be better to have a KVM cap for each
> > > feature irrespective of the flags as the feature may also need
> >                                                    ^^^
> > > additional UAPIs like IOCTLs.
> >
> > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > supported".
> >
> > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> >
> 
> What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> of guest_memfd?

I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
doesn't *exactly* match the flags is asking for problems.  At best, it will be
confusing.  E.g. we'll probably end up with code like this:

	gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

	if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
	if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
		gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;

Those types of patterns often lead to typos causing problems (LOL, case in point,
there's a typo above; I'm leaving it to illustrate my point).  That can be largely
solved by userspace via macro shenanigans, but userspace really shouldn't have to
jump through hoops for such a simple thing.
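For contrast, here is a minimal sketch of the userspace side when the capability value is literally the set of supported KVM_CREATE_GUEST_MEMFD flags. The flag values are local stand-ins (assumptions) so the snippet compiles without kernel headers. `supported` is assumed to be the already error-checked return value of KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD_FLAGS). Note that no per-flag translation step exists, so there is nowhere for the typo above to hide.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values for illustration; real ones come from <linux/kvm.h>. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/*
 * The capability value *is* a KVM_CREATE_GUEST_MEMFD flags mask, so picking
 * the usable subset of the desired flags is a single AND.
 */
static uint64_t pick_gmem_flags(uint64_t supported, uint64_t wanted)
{
	return supported & wanted;
}

/* Nonzero if every requested flag is advertised by KVM. */
static int gmem_flags_supported(uint64_t supported, uint64_t wanted)
{
	return (wanted & ~supported) == 0;
}
```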

An even worse outcome is if userspace does something like:

	gmem_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

Which might actually work initially, e.g. if KVM_CAP_GUEST_MEMFD_MMAP and
GUEST_MEMFD_FLAG_MMAP have the same value.  But eventually userspace will be sad.
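To make the failure mode concrete, here is a sketch with two hypothetical layouts: a feature mask in which a flag-less feature occupies bit 1, and the create flags in which INIT_SHARED is bit 1. Every name and bit position below is invented for illustration and does not reflect real uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bits in an imagined KVM_CAP_GUEST_MEMFD_CAPS feature mask. */
#define GMEM_CAP_MMAP		(1ULL << 0)
#define GMEM_CAP_SOME_IOCTL	(1ULL << 1)	/* feature with no create flag */
#define GMEM_CAP_INIT_SHARED	(1ULL << 2)

/* Hypothetical bits in the KVM_CREATE_GUEST_MEMFD flags field. */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)

/* The shortcut: correct only while the two layouts happen to coincide. */
static uint64_t flags_from_caps_wrong(uint64_t caps)
{
	return caps;
}

/* What userspace is forced to write instead: an explicit translation. */
static uint64_t flags_from_caps(uint64_t caps)
{
	uint64_t flags = 0;

	if (caps & GMEM_CAP_MMAP)
		flags |= GUEST_MEMFD_FLAG_MMAP;
	if (caps & GMEM_CAP_INIT_SHARED)
		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
	return flags;
}
```

With only MMAP reported, both functions agree. Once the flag-less feature at bit 1 is reported, the shortcut asks KVM for INIT_SHARED even though it was never advertised.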

Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
bits before we run out of flags.

And if we use memory attributes, we're also guaranteed to have at least one gmem
capability that returns a bitmask separately from a dedicated one-size-fits-all
cap, e.g.

	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
		if (vm_memory_attributes)
			return 0;

		return kvm_supported_mem_attributes(kvm);
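Here is a rough sketch of how userspace might consume such a capability, assuming the semantics of the snippet above: 0 presumably means guest_memfd does not track attributes itself (they stay VM-scoped), and a positive value is a mask of supported attributes. KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES is only a proposal here. KVM_MEMORY_ATTRIBUTE_PRIVATE is redefined locally with its <linux/kvm.h> value so the sketch stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as in <linux/kvm.h>; redefined so the sketch is self-contained. */
#define KVM_MEMORY_ATTRIBUTE_PRIVATE	(1ULL << 3)

/*
 * cap_val: raw KVM_CHECK_EXTENSION result for the (proposed)
 * KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES.  Negative is an ioctl error,
 * 0 means no per-gmem attributes, positive is an attribute mask.
 */
static bool gmem_attr_supported(int cap_val, uint64_t attr)
{
	if (cap_val <= 0)
		return false;
	return ((uint64_t)cap_val & attr) == attr;
}
```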

Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS; I'm
pretty sure we can simply extend KVM_CAP_GUEST_MEMFD.  E.g.

#define KVM_GUEST_MEMFD_FEAT_BASIC		(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY		(1ULL << 1)

	case KVM_CAP_GUEST_MEMFD:
		return KVM_GUEST_MEMFD_FEAT_BASIC |
		       KVM_GUEST_MEMFD_FEAT_FANCY;
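One property worth noting, assuming current kernels report a plain 1 for KVM_CAP_GUEST_MEMFD: with BASIC at bit 0, an old kernel's answer decodes as "basic support only". Userspace could therefore use one decoding path for old and new kernels. The feature names are the placeholders from the example above, not real uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder feature bits from the example above (hypothetical). */
#define KVM_GUEST_MEMFD_FEAT_BASIC	(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY	(1ULL << 1)

/* Keeping BASIC at bit 0 makes a legacy return value of 1 mean "BASIC only". */
static_assert(KVM_GUEST_MEMFD_FEAT_BASIC == 1, "BASIC must stay bit 0");

/* cap_val: raw KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD) result. */
static int gmem_has_feature(int cap_val, uint64_t feat)
{
	if (cap_val <= 0)	/* unsupported, or an ioctl error */
		return 0;
	return ((uint64_t)cap_val & feat) == feat;
}
```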

> That seems more consistent to me in order for userspace to deduce the
> supported features and assume flags/ioctls/...  associated with the feature
> as a group.

If we add a feature that comes with a flag, we could always add both, i.e. a
feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
KVM_CAP_GUEST_MEMFD_FLAGS.  That certainly wouldn't be my first choice, but it's
a possibility, e.g. if it really is the most intuitive solution.  But that's
getting quite hypothetical.
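If both enumerations were added, a cautious userspace could gate on each of them. It would take the flag bit straight from KVM_CAP_GUEST_MEMFD_FLAGS so that no translation step exists. FEAT_FANCY and FLAG_FANCY are hypothetical placeholders, and the two ints are assumed to be the raw KVM_CHECK_EXTENSION results.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholders; neither is real uAPI. */
#define KVM_GUEST_MEMFD_FEAT_FANCY	(1ULL << 1)
#define GUEST_MEMFD_FLAG_FANCY		(1ULL << 2)

/*
 * gmem_cap:       KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD) result
 * gmem_flags_cap: KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD_FLAGS) result
 * Returns the create flag(s) to OR in, or 0 if the feature is unusable.
 */
static uint64_t fancy_create_flags(int gmem_cap, int gmem_flags_cap)
{
	/* Negative values are ioctl errors; treat them as "not supported". */
	if (gmem_cap <= 0 || gmem_flags_cap <= 0)
		return 0;

	/* The feature bit gates any ancillary ioctls that come with it. */
	if (!((uint64_t)gmem_cap & KVM_GUEST_MEMFD_FEAT_FANCY))
		return 0;

	/* Reuse the flag bit verbatim from the flags capability. */
	return (uint64_t)gmem_flags_cap & GUEST_MEMFD_FLAG_FANCY;
}
```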

Thread overview: 55+ messages
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29  8:38   ` David Hildenbrand
2025-09-29  8:57     ` Fuad Tabba
2025-09-29  9:01       ` David Hildenbrand
2025-09-29  9:04   ` Fuad Tabba
2025-09-29  9:43     ` Ackerley Tng
2025-09-29 10:15       ` Patrick Roy
2025-09-29 10:22         ` David Hildenbrand
2025-09-29 10:51           ` Ackerley Tng
2025-09-29 16:55             ` Sean Christopherson
2025-09-30  0:15               ` Sean Christopherson
2025-09-30  8:36                 ` Ackerley Tng
2025-10-01 14:22                 ` Vishal Annapurve
2025-10-01 16:15                   ` Sean Christopherson
2025-10-01 16:31                     ` Vishal Annapurve
2025-10-01 17:16                       ` Sean Christopherson
2025-10-01 22:13                         ` Vishal Annapurve
2025-10-02  0:04                           ` Sean Christopherson
2025-10-02 15:41                             ` Vishal Annapurve
2025-10-03  0:12                               ` Sean Christopherson [this message]
2025-10-03  4:10                                 ` Vishal Annapurve
2025-10-03 16:13                                   ` Sean Christopherson
2025-10-03 20:30                                     ` Vishal Annapurve
2025-09-29 16:54       ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29  9:12   ` Fuad Tabba
2025-09-29  9:17   ` David Hildenbrand
2025-09-29 10:56   ` Ackerley Tng
2025-09-29 16:58     ` Sean Christopherson
2025-09-30  6:52       ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29  9:18   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-29 11:02   ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29  9:21   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 11:08   ` Ackerley Tng
2025-09-29 17:32     ` Sean Christopherson
2025-09-30  7:09       ` Ackerley Tng
2025-09-30 14:24         ` Sean Christopherson
2025-10-01 10:18           ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 14:38   ` Ackerley Tng
2025-09-29 18:10     ` Sean Christopherson
2025-09-29 18:35       ` Sean Christopherson
2025-09-30  7:53       ` Ackerley Tng
2025-09-30 14:58         ` Sean Christopherson
2025-10-01 10:26           ` Ackerley Tng
