From: Ackerley Tng <ackerleytng@google.com>
To: Fuad Tabba <tabba@google.com>, Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
David Hildenbrand <david@redhat.com>,
roypat@amazon.co.uk, "Kalyazin, Nikita" <kalyazin@amazon.co.uk>
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set
Date: Mon, 29 Sep 2025 09:43:50 +0000 [thread overview]
Message-ID: <diqz7bxh386h.fsf@google.com> (raw)
In-Reply-To: <CA+EHjTzdX8+MbsYOHAJn6Gkayfei-jE6Q_5HfZhnfwnMijmucw@mail.gmail.com>
Fuad Tabba <tabba@google.com> writes:
> Hi Sean,
>
> On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@google.com> wrote:
>>
>> Add a guest_memfd flag to allow userspace to state that the underlying
>> memory should be configured to be shared by default, and reject user page
>> faults if the guest_memfd instance's memory isn't shared by default.
>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>> guest_memfd memory effectively follows the default state.
>>
>> Alternatively, KVM could deduce the default state based on MMAP, which for
>> all intents and purposes is what KVM currently does. However, implicitly
>> deriving the default state based on MMAP will result in a messy ABI when
>> support for in-place conversions is added.
>>
>> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
>> by default (otherwise the memory would be unusable). If MMAP implies
>> memory is shared by default, then the default state for CoCo VMs will vary
>> based on MMAP, and from userspace's perspective, will change when in-place
>> conversion support is added. I.e. to maintain guest<=>host ABI, userspace
>> would need to immediately convert all memory from shared=>private, which
>> is both ugly and inefficient. The inefficiency could be avoided by adding
>> a flag to state that memory is _private_ by default, irrespective of MMAP,
>> but that would lead to an equally messy and hard to document ABI.
>>
>> Bite the bullet and immediately add a flag to control the default state so
>> that the effective behavior is explicit and straightforward.
>>
I like having this flag, but didn't propose this because I thought folks
depending on the default being shared (Patrick/Nikita) might have their
usage broken.
>> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Fuad Tabba <tabba@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>> Documentation/virt/kvm/api.rst | 10 ++++++++--
>> include/uapi/linux/kvm.h | 3 ++-
>> tools/testing/selftests/kvm/guest_memfd_test.c | 5 +++--
>> virt/kvm/guest_memfd.c | 6 +++++-
>> 4 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index c17a87a0a5ac..4dfe156bbe3c 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>> a single guest_memfd file, but the bound ranges must not overlap).
>>
>> When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>> -supports GUEST_MEMFD_FLAG_MMAP. Setting this flag on guest_memfd creation
>> -enables mmap() and faulting of guest_memfd memory to host userspace.
>> +supports GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_DEFAULT_SHARED. Setting
>
> There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
>
+1 on this. Also, would you consider putting the concept of "at creation
time" or "at initialization time" into the name of the flag?
"Default" could be interpreted as "whenever a folio is allocated for
this guest_memfd", the memory the folio represents is by default
shared.
What we want to represent is that when the guest_memfd is created,
memory at all indices are initialized as shared.
Looking a bit further, when conversion is supported, if this flag is not
specified, then all the indices are initialized as private, right?
>> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
>> +memory to host userspace (so long as the memory is currently shared). Setting
>> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
>> +by default). Note! Because KVM doesn't yet support in-place private<=>shared
>> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
>> +userspace page tables. This limitation will go away when in-place conversions
>> +are supported.
>
> I think that a more accurate (and future proof) description of the
> mmap flag could be something along the lines of:
>
+1 on these suggestions, I agree that making the concepts of SHARED vs
MMAP orthogonal from the start is more future proof.
> + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
>
> + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
> + by default
See above, I'd prefer clarifying this as "at initialization time" or
something similar.
> , as opposed to private. Shared memory can be faulted into host
> + userspace page tables. Private memory cannot.
>
>> When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>> guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6efa98a57ec1..38a2c083b6aa 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>>
>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>> -#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED (1ULL << 1)
>>
>> struct kvm_create_guest_memfd {
>> __u64 size;
>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>> index b3ca6737f304..81b11a958c7a 100644
>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>> vm = vm_create_barebones_type(vm_type);
>>
>> if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
>> - flags |= GUEST_MEMFD_FLAG_MMAP;
>> + flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>
>> test_create_guest_memfd_multiple(vm);
>> test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
>> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>> "Default VM type should always support guest_memfd mmap()");
>>
>> size = vm->page_size;
>> - fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
>> + fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
>> + GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>> vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>>
>> mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 08a6bc7d25b6..19f05a45be04 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>> if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>> return VM_FAULT_SIGBUS;
>>
>> + if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
>> + return VM_FAULT_SIGBUS;
>> +
>> folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>> if (IS_ERR(folio)) {
>> int err = PTR_ERR(folio);
>> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>> u64 valid_flags = 0;
>>
>> if (kvm_arch_supports_gmem_mmap(kvm))
>> - valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>> + valid_flags |= GUEST_MEMFD_FLAG_MMAP |
>> + GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
> checking for that, at least until we have in-place conversion? Having
> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but GUEST_MEMFD_FLAG_MMAP,
> isn't a useful combination.
>
I think it's okay to have the two flags be orthogonal from the start.
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> That said, these are all nits, I'll leave it to you. With that:
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Tested-by: Fuad Tabba <tabba@google.com>
>
> Cheers,
> /fuad
>
>
>
>>
>> if (flags & ~valid_flags)
>> return -EINVAL;
>> --
>> 2.51.0.536.g15c5d4f767-goog
>>
next prev parent reply other threads:[~2025-09-29 9:43 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 16:31 [PATCH 0/6] KVM: Avoid a lurking guest_memfd ABI mess Sean Christopherson
2025-09-26 16:31 ` [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set Sean Christopherson
2025-09-29 8:38 ` David Hildenbrand
2025-09-29 8:57 ` Fuad Tabba
2025-09-29 9:01 ` David Hildenbrand
2025-09-29 9:04 ` Fuad Tabba
2025-09-29 9:43 ` Ackerley Tng [this message]
2025-09-29 10:15 ` Patrick Roy
2025-09-29 10:22 ` David Hildenbrand
2025-09-29 10:51 ` Ackerley Tng
2025-09-29 16:55 ` Sean Christopherson
2025-09-30 0:15 ` Sean Christopherson
2025-09-30 8:36 ` Ackerley Tng
2025-10-01 14:22 ` Vishal Annapurve
2025-10-01 16:15 ` Sean Christopherson
2025-10-01 16:31 ` Vishal Annapurve
2025-10-01 17:16 ` Sean Christopherson
2025-10-01 22:13 ` Vishal Annapurve
2025-10-02 0:04 ` Sean Christopherson
2025-10-02 15:41 ` Vishal Annapurve
2025-10-03 0:12 ` Sean Christopherson
2025-10-03 4:10 ` Vishal Annapurve
2025-10-03 16:13 ` Sean Christopherson
2025-10-03 20:30 ` Vishal Annapurve
2025-09-29 16:54 ` Sean Christopherson
2025-09-26 16:31 ` [PATCH 2/6] KVM: selftests: Stash the host page size in a global in the guest_memfd test Sean Christopherson
2025-09-29 9:12 ` Fuad Tabba
2025-09-29 9:17 ` David Hildenbrand
2025-09-29 10:56 ` Ackerley Tng
2025-09-29 16:58 ` Sean Christopherson
2025-09-30 6:52 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 3/6] KVM: selftests: Create a new guest_memfd for each testcase Sean Christopherson
2025-09-29 9:18 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 11:02 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 4/6] KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP Sean Christopherson
2025-09-29 9:21 ` David Hildenbrand
2025-09-29 9:24 ` Fuad Tabba
2025-09-26 16:31 ` [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 11:08 ` Ackerley Tng
2025-09-29 17:32 ` Sean Christopherson
2025-09-30 7:09 ` Ackerley Tng
2025-09-30 14:24 ` Sean Christopherson
2025-10-01 10:18 ` Ackerley Tng
2025-09-26 16:31 ` [PATCH 6/6] KVM: selftests: Verify that faulting in private guest_memfd memory fails Sean Christopherson
2025-09-29 9:24 ` Fuad Tabba
2025-09-29 9:28 ` David Hildenbrand
2025-09-29 14:38 ` Ackerley Tng
2025-09-29 18:10 ` Sean Christopherson
2025-09-29 18:35 ` Sean Christopherson
2025-09-30 7:53 ` Ackerley Tng
2025-09-30 14:58 ` Sean Christopherson
2025-10-01 10:26 ` Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=diqz7bxh386h.fsf@google.com \
--to=ackerleytng@google.com \
--cc=borntraeger@linux.ibm.com \
--cc=david@redhat.com \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tabba@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.