public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
	David Hildenbrand <david@redhat.com>,
	 Patrick Roy <patrick.roy@linux.dev>,
	Fuad Tabba <tabba@google.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	 Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Nikita Kalyazin <kalyazin@amazon.co.uk>,
	shivankg@amd.com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Thu, 2 Oct 2025 17:12:09 -0700	[thread overview]
Message-ID: <aN8U2c8KMXTy6h9Q@google.com> (raw)
In-Reply-To: <CAGtprH_evo=nyk1B6ZRdKJXX2s7g1W8dhwJhEPJkG=o2ORU48g@mail.gmail.com>

On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > > >
> > > > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > > > >
> > > > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > > >     that's a non-issue relatively speaking.
> > > > > > > >
> > > > > > >
> > > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > > >
> > > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > > >
> > > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > > KVM caps per guest_memfd feature?
> > > >
> > > > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> > >
> > > I think I am confused. Is the proposal here as follows?
> > > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > > creation flags.
> >
> > No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> > supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
> > "feature" is irrelevant.  I.e. it's a very literal "these are the supported
> > flags".
> >
> > > * Use KVM caps for guest_memfd features that don't map to any flags.
> > >
> > > I think in general it would be better to have a KVM cap for each
> > > feature irrespective of the flags as the feature may also need
> >                                                    ^^^
> > > additional UAPIs like IOCTLs.
> >
> > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > a CAP is gross overkill.  Even if there are other assets that accompany the new
> > flag, there's no reason we couldn't say "this feature exist if XYZ flag is
> > supported".
> >
> > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> >
> 
> What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> of guest_memfd?

I'd much prefer to have both.  Describing flags for an ioctl via a bitmask that
doesn't *exactly* match the flags is asking for problems.  At best, it will be
confusing.  E.g. we'll probably end up with code like this:

	gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

	if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
	if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
		gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;

Those types of patterns often lead to typos causing problems (LOL, case in point,
there's a typo above; I'm leaving it to illustrate my point).  That can be largely
solved by userspace via macro shenanigans, but userspace really shouldn't have to
jump through hoops for such a simple thing.

An ever worse outcome is if userspace does something like:

	gmem_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

Which might actually work initially, e.g. if KVM_CAP_GUEST_MEMFD_MMAP and
GUEST_MEMFD_FLAG_MMAP have the same value.  But eventually userspace will be sad.

Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
bits before we run out of flags.

And if we use memory attributes, we're also guaranteed to have at least one gmem
capability that returns a bitmask separately from a dedicated one-size-fits-all
cap, e.g.

	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
		if (vm_memory_attributes)
			return 0;

		return kvm_supported_mem_attributes(kvm);

Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS, I'm
pretty sure we can simply extend KVM_CAP_GUEST_MEMFD.  E.g. 

#define KVM_GUEST_MEMFD_FEAT_BASIC		(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY		(1ULL << 1)

	case KVM_CAP_GUEST_MEMFD:
		return KVM_GUEST_MEMFD_FEAT_BASIC |
		       KVM_GUEST_MEMFD_FEAT_FANCY;

> That seems more consistent to me in order for userspace to deduce the
> supported features and assume flags/ioctls/...  associated with the feature
> as a group.

If we add a feature that comes with a flag, we could always add both, i.e. a
feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
KVM_CAP_GUEST_MEMFD_FLAGS.  That certainly wouldn't be my first choice, but it's
a possibility, e.g. if it really is the most intuitive solution.  But that's
getting quite hypothetical.

  reply	other threads:[~2025-10-03  0:12 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29  8:38   ` David Hildenbrand
2025-09-29  8:57     ` Fuad Tabba
2025-09-29  9:01       ` David Hildenbrand
2025-09-29  9:04   ` Fuad Tabba
2025-09-29  9:43     ` Ackerley Tng
2025-09-29 10:15       ` Patrick Roy
2025-09-29 10:22         ` David Hildenbrand
2025-09-29 10:51           ` Ackerley Tng
2025-09-29 16:55             ` Sean Christopherson
2025-09-30  0:15               ` Sean Christopherson
2025-09-30  8:36                 ` Ackerley Tng
2025-10-01 14:22                 ` Vishal Annapurve
2025-10-01 16:15                   ` Sean Christopherson
2025-10-01 16:31                     ` Vishal Annapurve
2025-10-01 17:16                       ` Sean Christopherson
2025-10-01 22:13                         ` Vishal Annapurve
2025-10-02  0:04                           ` Sean Christopherson
2025-10-02 15:41                             ` Vishal Annapurve
2025-10-03  0:12                               ` Sean Christopherson [this message]
2025-10-03  4:10                                 ` Vishal Annapurve
2025-10-03 16:13                                   ` Sean Christopherson
2025-10-03 20:30                                     ` Vishal Annapurve
2025-09-29 16:54       ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29  9:12   ` Fuad Tabba
2025-09-29  9:17   ` David Hildenbrand
2025-09-29 10:56   ` Ackerley Tng
2025-09-29 16:58     ` Sean Christopherson
2025-09-30  6:52       ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29  9:18   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-29 11:02   ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29  9:21   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 11:08   ` Ackerley Tng
2025-09-29 17:32     ` Sean Christopherson
2025-09-30  7:09       ` Ackerley Tng
2025-09-30 14:24         ` Sean Christopherson
2025-10-01 10:18           ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 14:38   ` Ackerley Tng
2025-09-29 18:10     ` Sean Christopherson
2025-09-29 18:35       ` Sean Christopherson
2025-09-30  7:53       ` Ackerley Tng
2025-09-30 14:58         ` Sean Christopherson
2025-10-01 10:26           ` Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aN8U2c8KMXTy6h9Q@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kalyazin@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patrick.roy@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox