public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Roy <patrick.roy@linux.dev>
To: Ackerley Tng <ackerleytng@google.com>,
	Fuad Tabba <tabba@google.com>,
	Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	"Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
	shivankg@amd.com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Mon, 29 Sep 2025 11:15:54 +0100	[thread overview]
Message-ID: <a4976f04-959d-48ae-9815-d192365bdcc6@linux.dev> (raw)
In-Reply-To: <diqz7bxh386h.fsf@google.com>

Hi Ackerley!

On Mon, 2025-09-29 at 10:43 +0100, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
> 
>> Hi Sean,
>>
>> On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>>>
>>> Add a guest_memfd flag to allow userspace to state that the underlying
>>> memory should be configured to be shared by default, and reject user page
>>> faults if the guest_memfd instance's memory isn't shared by default.
>>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>>> guest_memfd memory effectively follows the default state.
>>>
>>> Alternatively, KVM could deduce the default state based on MMAP, which for
>>> all intents and purposes is what KVM currently does.  However, implicitly
>>> deriving the default state based on MMAP will result in a messy ABI when
>>> support for in-place conversions is added.
>>>
>>> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
>>> by default (otherwise the memory would be unusable).  If MMAP implies
>>> memory is shared by default, then the default state for CoCo VMs will vary
>>> based on MMAP, and from userspace's perspective, will change when in-place
>>> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
>>> would need to immediately convert all memory from shared=>private, which
>>> is both ugly and inefficient.  The inefficiency could be avoided by adding
>>> a flag to state that memory is _private_ by default, irrespective of MMAP,
>>> but that would lead to an equally messy and hard to document ABI.
>>>
>>> Bite the bullet and immediately add a flag to control the default state so
>>> that the effective behavior is explicit and straightforward.
>>>
> 
> I like having this flag, but didn't propose this because I thought folks
> depending on the default being shared (Patrick/Nikita) might have their
> usage broken.

We'll just need to pass the new flag in Firecracker, that's not a problem
at all :) We aren't running this anywhere in production yet, so nothing
would break on our end.

>>> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Fuad Tabba <tabba@google.com>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
>>>  include/uapi/linux/kvm.h                       |  3 ++-
>>>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
>>>  virt/kvm/guest_memfd.c                         |  6 +++++-
>>>  4 files changed, 18 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index c17a87a0a5ac..4dfe156bbe3c 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>>>  a single guest_memfd file, but the bound ranges must not overlap).
>>>
>>>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>>> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>>> -enables mmap() and faulting of guest_memfd memory to host userspace.
>>> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
>>
>> There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
>>
> 
> +1 on this. Also, would you consider putting the concept of "at creation
> time" or "at initialization time" into the name of the flag?
> 
> "Default" could be interpreted as "whenever a folio is allocated for
> this guest_memfd", the memory the folio represents is by default
> shared.
> 
> What we want to represent is that when the guest_memfd is created,
> memory at all indices are initialized as shared.
> 
> Looking a bit further, when conversion is supported, if this flag is not
> specified, then all the indices are initialized as private, right?
> 
>>> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
>>> +memory to host userspace (so long as the memory is currently shared).  Setting
>>> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
>>> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
>>> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
>>> +userspace page tables.  This limitation will go away when in-place conversions
>>> +are supported.
>>
>> I think that a more accurate (and future proof) description of the
>> mmap flag could be something along the lines of:
>>
> 
> +1 on these suggestions, I agree that making the concepts of SHARED vs
> MMAP orthogonal from the start is more future proof.
> 
>> + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
>>
>> + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
>> + by default
> 
> See above, I'd prefer clarifying this as "at initialization time" or
> something similar.
> 
>> , as opposed to private. Shared memory can be faulted into host
>> + userspace page tables. Private memory cannot.
>>
>>>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 6efa98a57ec1..38a2c083b6aa 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>>>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>>>
>>>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
>>> -#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
>>> +#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
>>> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED        (1ULL << 1)
>>>
>>>  struct kvm_create_guest_memfd {
>>>         __u64 size;
>>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>>> index b3ca6737f304..81b11a958c7a 100644
>>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>>> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>>>         vm = vm_create_barebones_type(vm_type);
>>>
>>>         if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
>>> -               flags |= GUEST_MEMFD_FLAG_MMAP;
>>> +               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>>
>>>         test_create_guest_memfd_multiple(vm);
>>>         test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
>>> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>>>                     "Default VM type should always support guest_memfd mmap()");
>>>
>>>         size = vm->page_size;
>>> -       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
>>> +       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
>>> +                                            GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>>>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>>>
>>>         mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 08a6bc7d25b6..19f05a45be04 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>>         if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>>                 return VM_FAULT_SIGBUS;
>>>
>>> +       if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
>>> +               return VM_FAULT_SIGBUS;
>>> +
>>>         folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>>         if (IS_ERR(folio)) {
>>>                 int err = PTR_ERR(folio);
>>> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>>>         u64 valid_flags = 0;
>>>
>>>         if (kvm_arch_supports_gmem_mmap(kvm))
>>> -               valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>>> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
>>> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>
>> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
>> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
>> checking for that, at least until we have in-place conversion? Having
>> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but GUEST_MEMFD_FLAG_MMAP,
>> isn't a useful combination.
>>
> 
> I think it's okay to have the two flags be orthogonal from the start.

I think I dimly remember someone at one of the guest_memfd syncs
bringing up a usecase for having a VMA even if all memory is private,
not for faulting anything in, but to do madvise or something? Maybe it
was the NUMA stuff? (+Shivank)

So for that, having the flags be orthogonal would be useful even without
conversion support.

> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> 
>> That said, these are all nits, I'll leave it to you. With that:
>>
>> Reviewed-by: Fuad Tabba <tabba@google.com>
>> Tested-by: Fuad Tabba <tabba@google.com>
>>
>> Cheers,
>> /fuad
>>
>>
>>
>>>
>>>         if (flags & ~valid_flags)
>>>                 return -EINVAL;
>>> --
>>> 2.51.0.536.g15c5d4f767-goog
>>>

Best,
Patrick

  reply	other threads:[~2025-09-29 10:15 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29  8:38   ` David Hildenbrand
2025-09-29  8:57     ` Fuad Tabba
2025-09-29  9:01       ` David Hildenbrand
2025-09-29  9:04   ` Fuad Tabba
2025-09-29  9:43     ` Ackerley Tng
2025-09-29 10:15       ` Patrick Roy [this message]
2025-09-29 10:22         ` David Hildenbrand
2025-09-29 10:51           ` Ackerley Tng
2025-09-29 16:55             ` Sean Christopherson
2025-09-30  0:15               ` Sean Christopherson
2025-09-30  8:36                 ` Ackerley Tng
2025-10-01 14:22                 ` Vishal Annapurve
2025-10-01 16:15                   ` Sean Christopherson
2025-10-01 16:31                     ` Vishal Annapurve
2025-10-01 17:16                       ` Sean Christopherson
2025-10-01 22:13                         ` Vishal Annapurve
2025-10-02  0:04                           ` Sean Christopherson
2025-10-02 15:41                             ` Vishal Annapurve
2025-10-03  0:12                               ` Sean Christopherson
2025-10-03  4:10                                 ` Vishal Annapurve
2025-10-03 16:13                                   ` Sean Christopherson
2025-10-03 20:30                                     ` Vishal Annapurve
2025-09-29 16:54       ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29  9:12   ` Fuad Tabba
2025-09-29  9:17   ` David Hildenbrand
2025-09-29 10:56   ` Ackerley Tng
2025-09-29 16:58     ` Sean Christopherson
2025-09-30  6:52       ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29  9:18   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-29 11:02   ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29  9:21   ` David Hildenbrand
2025-09-29  9:24   ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 11:08   ` Ackerley Tng
2025-09-29 17:32     ` Sean Christopherson
2025-09-30  7:09       ` Ackerley Tng
2025-09-30 14:24         ` Sean Christopherson
2025-10-01 10:18           ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29  9:24   ` Fuad Tabba
2025-09-29  9:28   ` David Hildenbrand
2025-09-29 14:38   ` Ackerley Tng
2025-09-29 18:10     ` Sean Christopherson
2025-09-29 18:35       ` Sean Christopherson
2025-09-30  7:53       ` Ackerley Tng
2025-09-30 14:58         ` Sean Christopherson
2025-10-01 10:26           ` Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a4976f04-959d-48ae-9815-d192365bdcc6@linux.dev \
    --to=patrick.roy@linux.dev \
    --cc=ackerleytng@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kalyazin@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox