From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson
Date: Tue, 31 Oct 2023 15:13:23 -0700
Subject: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
References: <20231027182217.3615211-1-seanjc@google.com> <20231027182217.3615211-17-seanjc@google.com>
To: kvm-riscv@lists.infradead.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Tue, Oct 31, 2023, Fuad Tabba wrote:
> Hi,
>
> On Fri, Oct 27, 2023 at 7:23 PM Sean Christopherson wrote:
>
> ...
>
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index e2252c748fd6..e82c69d5e755 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6079,6 +6079,15 @@ applied.
> >  :Parameters: struct kvm_userspace_memory_region2 (in)
> >  :Returns: 0 on success, -1 on error
> >
> > +KVM_SET_USER_MEMORY_REGION2 is an extension to KVM_SET_USER_MEMORY_REGION that
> > +allows mapping guest_memfd memory into a guest.  All fields shared with
> > +KVM_SET_USER_MEMORY_REGION identically.  Userspace can set KVM_MEM_PRIVATE in
> > +flags to have KVM bind the memory region to a given guest_memfd range of
> > +[guest_memfd_offset, guest_memfd_offset + memory_size].  The target guest_memfd
> > +must point at a file created via KVM_CREATE_GUEST_MEMFD on the current VM, and
> > +the target range must not be bound to any other memory region.  All standard
> > +bounds checks apply (use common sense).
> > +
>
> Bikeshedding here: Not sure if KVM_MEM_PRIVATE is the best name for
> this. It gets confusing with KVM_MEMORY_ATTRIBUTE_PRIVATE, i.e., that
> a region marked as KVM_MEM_PRIVATE is only potentially private. It did
> confuse the rest of the team when I walked them through a previous
> version of this code once. Would something like KVM_MEM_GUESTMEM make
> more sense?

Heh, deja vu.
We discussed this back in v7[*], and I came to the conclusion that
choosing a name that wasn't explicitly tied to private memory wasn't justified.
But that was before a KVM-owned guest_memfd was even an idea, and thus before we
had anything close to a real use case.

Since we now know that at least pKVM will use guest_memfd for shared memory, and
odds are quite good that "regular" VMs will also do the same, i.e. will want
guest_memfd without the concept of private memory, I agree that we should avoid
PRIVATE.

Though I vote for KVM_MEM_GUEST_MEMFD (or KVM_MEM_GUEST_MEMFD_VALID or
KVM_MEM_USE_GUEST_MEMFD).  I.e. do our best to avoid ambiguity between referring
to "guest memory" at-large and guest_memfd.

Copying a few relevant points from v7 to save a click or three.

 : I don't have a concrete use case (this is a recent idea on my end), but since we're
 : already adding fd-based memory, I can't think of a good reason not make it more generic
 : for not much extra cost.  And there are definitely classes of VMs for which fd-based
 : memory would Just Work, e.g. large VMs that are never oversubscribed on memory don't
 : need to support reclaim, so the fact that fd-based memslots won't support page aging
 : (among other things) right away is a non-issue.

...

 : Hrm, but basing private memory on top of a generic FD_VALID would effectively require
 : shared memory to use hva-based memslots for confidential VMs.  That'd yield a very
 : weird API, e.g. non-confidential VMs could be backed entirely by fd-based memslots,
 : but confidential VMs would be forced to use hva-based memslots.
 :
 : Ignore this idea for now.  If there's an actual use case for generic fd-based memory
 : then we'll want a separate flag, fd, and offset, i.e. that support could be added
 : independent of KVM_MEM_PRIVATE.

...

 : One alternative would be to call it KVM_MEM_PROTECTED.  That shouldn't cause
 : problems for the known use of "private" (TDX and SNP), and it gives us a little
 : wiggle room, e.g.
 : if we ever get a use case where VMs can share memory that is
 : otherwise protected.
 :
 : That's a pretty big "if" though, and odds are good we'd need more memslot flags and
 : fd+offset pairs to allow differentiating "private" vs. "protected-shared" without
 : forcing userspace to punch holes in memslots, so I don't know that hedging now will
 : buy us anything.
 :
 : So I'd say that if people think KVM_MEM_PRIVATE brings additional and meaningful
 : clarity over KVM_MEM_PROTECTECD, then lets go with PRIVATE.  But if PROTECTED is
 : just as good, go with PROTECTED as it gives us a wee bit of wiggle room for the
 : future.

[*] https://lore.kernel.org/all/Yuh0ikhoh+tCK6VW@google.com

> > -See KVM_SET_USER_MEMORY_REGION.
> > +A KVM_MEM_PRIVATE region _must_ have a valid guest_memfd (private memory) and
> > +userspace_addr (shared memory).  However, "valid" for userspace_addr simply
> > +means that the address itself must be a legal userspace address.  The backing
> > +mapping for userspace_addr is not required to be valid/populated at the time of
> > +KVM_SET_USER_MEMORY_REGION2, e.g. shared memory can be lazily mapped/allocated
> > +on-demand.
>
> Regarding requiring that a private region have both a valid
> guest_memfd and a userspace_addr, should this be
> implementation-specific? In pKVM at least, all regions for protected
> VMs are private, and KVM doesn't care about the host userspace address
> for those regions even when part of the memory is shared.

Hmm, as of this patch, no, because the pKVM usage doesn't exist and this
literally documents the current ABI.

> > +When mapping a gfn into the guest, KVM selects shared vs. private, i.e consumes
> > +userspace_addr vs. guest_memfd, based on the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE
> > +state.  At VM creation time, all memory is shared, i.e. the PRIVATE attribute
> > +is '0' for all gfns.
> > +Userspace can control whether memory is shared/private by
> > +toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed.
>
> In pKVM, guest memory is private by default, and most of it will
> remain so for the lifetime of the VM. Userspace could explicitly mark
> all the guest's memory as private at initialization, but it would save
> a slight amount of work. That said, I understand that it might be
> better to be consistent across implementations.

Yeah, we discussed this in v12[*].  The default really doesn't matter for memory
overhead or performance once KVM supports range-based xarray entries, and if that
isn't sufficient, KVM can internally invert the polarity of PRIVATE.

But for the ABI, I think we put a stake in the ground and say that all memory is
shared by default.  That way CoCo VMs and regular VMs (i.e. VMs without the concept
of private memory) all have the same ABI.  Practically speaking, the cost to pKVM
(and likely every other CoCo VM type) is a single ioctl() during VM creation to
"convert" all memory to private.

[*] https://lore.kernel.org/all/ZRw6X2BptZnRPNK7@google.com

> > --- /dev/null
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -0,0 +1,548 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
>
> nit: should this include be first (to maintain alphabetical ordering
> of the includes)?

Heh, yeah.  I would argue this isn't a nit though ;-)

> > +static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > +	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > +	pgoff_t start = offset >> PAGE_SHIFT;
> > +	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > +	struct kvm_gmem *gmem;
> > +
> > +	/*
> > +	 * Bindings must stable across invalidation to ensure the start+end
>
> nit: Bindings must _be/stay?_ stable

"be" is what's intended.

> ...
>
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 78a0b09ef2a5..5d1a2f1b4e94 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -798,7 +798,7 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
> >  	}
> >  }
> >
> > -static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > +bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >  	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
> >  	return kvm_unmap_gfn_range(kvm, range);
> > @@ -1034,6 +1034,9 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> >  /* This does not remove the slot from struct kvm_memslots data structures */
> >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >  {
> > +	if (slot->flags & KVM_MEM_PRIVATE)
> > +		kvm_gmem_unbind(slot);
> > +
>
> Should this be called after kvm_arch_free_memslot()? Arch-specific ode
> might need some of the data before the unbinding, something I thought
> might be necessary at one point for the pKVM port when deleting a
> memslot, but realized later that kvm_invalidate_memslot() ->
> kvm_arch_guest_memory_reclaimed() was the more logical place for it.
> Also, since that seems to be the pattern for arch-specific handlers in
> KVM.

Maybe?  But only if we care about symmetry between the allocation and free paths.
I really don't think kvm_arch_free_memslot() should be doing anything beyond a
"pure" free.  E.g. kvm_arch_free_memslot() is also called after moving a memslot,
which hopefully we never actually have to allow for guest_memfd, but any code in
kvm_arch_free_memslot() would bring about "what if" questions regarding memslot
movement.  I.e. the API is intended to be a "free arch metadata associated with
the memslot".

Out of curiosity, what does pKVM need to do at kvm_arch_guest_memory_reclaimed()?
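
For anyone skimming the thread, here is a rough userspace-side model of the
shared vs. private selection that the quoted api.rst text describes.  It is
purely illustrative and not code from the patch: every sketch_* name and the
fixed 4KiB page shift are made up for this example, and real KVM tracks the
PRIVATE attribute per gfn internally rather than taking it as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for the sketch: 4KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for a memslot that is bound to a guest_memfd range. */
struct sketch_memslot {
	uint64_t base_gfn;           /* first guest frame number in the slot */
	uint64_t npages;             /* memory_size in pages */
	uint64_t userspace_addr;     /* backing for shared accesses */
	uint64_t guest_memfd_offset; /* backing for private accesses */
};

/* Hypothetical result of resolving a gfn to its backing. */
struct sketch_backing {
	bool is_private;   /* true: guest_memfd, false: userspace_addr */
	uint64_t location; /* file offset if private, else host address */
};

/*
 * Pick the backing for @gfn.  @private_attr models the gfn's
 * KVM_MEMORY_ATTRIBUTE_PRIVATE state, which is '0' (shared) for all gfns
 * until userspace converts them.  Returns false if @gfn is outside the slot.
 */
bool sketch_resolve_gfn(const struct sketch_memslot *slot, uint64_t gfn,
			bool private_attr, struct sketch_backing *out)
{
	uint64_t byte_index;

	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return false;

	byte_index = (gfn - slot->base_gfn) << SKETCH_PAGE_SHIFT;

	out->is_private = private_attr;
	out->location = (private_attr ? slot->guest_memfd_offset
				      : slot->userspace_addr) + byte_index;
	return true;
}
```

With a slot at base_gfn 0x100 covering 16 pages, gfn 0x102 resolves to
userspace_addr + 0x2000 while shared and to guest_memfd_offset + 0x2000 once
its attribute is flipped to private, which is why the region needs both a
guest_memfd and a userspace_addr under the ABI as documented.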
aXR5IG9mIFBSSVZBVEUuCgpCdXQgZm9yIHRoZSBBQkksIEkgdGhpbmsgd2UgcHV0IGEgc3Rha2Ug aW4gdGhlIGdyb3VuZCBhbmQgc2F5IHRoYXQgYWxsIG1lbW9yeSBpcwpzaGFyZWQgYnkgZGVmYXVs dC4gIFRoYXQgd2F5IENvQ28gVk1zIGFuZCByZWd1bGFyIFZNcyAoaS5lIFZNcyB3aXRob3V0IHRo ZSBjb25jZXB0Cm9mIHByaXZhdGUgbWVtb3J5KSBhbGwgaGF2ZSB0aGUgc2FtZSBBQkkuICBQcmFj dGljYWxseSBzcGVha2luZywgdGhlIGNvc3QgdG8gcEtWTQooYW5kIGxpa2VseSBldmVyeSBvdGhl ciBDb0NvIFZNIHR5cGUpIGlzIGEgc2luZ2xlIGlvY3RsKCkgZHVyaW5nIFZNIGNyZWF0aW9uIHRv CiJjb252ZXJ0IiBhbGwgbWVtb3J5IHRvIHByaXZhdGUuCgpbKl0gaHR0cHM6Ly9sb3JlLmtlcm5l bC5vcmcvYWxsL1pSdzZYMkJwdFpuUlBOSzdAZ29vZ2xlLmNvbQoKPiA+IC0tLSAvZGV2L251bGwK PiA+ICsrKyBiL3ZpcnQva3ZtL2d1ZXN0X21lbWZkLmMKPiA+IEBAIC0wLDAgKzEsNTQ4IEBACj4g PiArLy8gU1BEWC1MaWNlbnNlLUlkZW50aWZpZXI6IEdQTC0yLjAKPiA+ICsjaW5jbHVkZSA8bGlu dXgvYmFja2luZy1kZXYuaD4KPiA+ICsjaW5jbHVkZSA8bGludXgvZmFsbG9jLmg+Cj4gPiArI2lu Y2x1ZGUgPGxpbnV4L2t2bV9ob3N0Lmg+Cj4gPiArI2luY2x1ZGUgPGxpbnV4L3BhZ2VtYXAuaD4K PiA+ICsjaW5jbHVkZSA8bGludXgvYW5vbl9pbm9kZXMuaD4KPiAKPiBuaXQ6IHNob3VsZCB0aGlz IGluY2x1ZGUgYmUgZmlyc3QgKHRvIG1haW50YWluIGFscGhhYmV0aWNhbCBvcmRlcmluZwo+IG9m IHRoZSBpbmNsdWRlcyk/CgpIZWgsIHllYWguICBJIHdvdWxkIGFyZ3VlIHRoaXMgaXNuJ3QgYSBu aXQgdGhvdWdoIDstKQoKPiA+ICtzdGF0aWMgbG9uZyBrdm1fZ21lbV9wdW5jaF9ob2xlKHN0cnVj dCBpbm9kZSAqaW5vZGUsIGxvZmZfdCBvZmZzZXQsIGxvZmZfdCBsZW4pCj4gPiArewo+ID4gKyAg ICAgICBzdHJ1Y3QgbGlzdF9oZWFkICpnbWVtX2xpc3QgPSAmaW5vZGUtPmlfbWFwcGluZy0+cHJp dmF0ZV9saXN0Owo+ID4gKyAgICAgICBwZ29mZl90IHN0YXJ0ID0gb2Zmc2V0ID4+IFBBR0VfU0hJ RlQ7Cj4gPiArICAgICAgIHBnb2ZmX3QgZW5kID0gKG9mZnNldCArIGxlbikgPj4gUEFHRV9TSElG VDsKPiA+ICsgICAgICAgc3RydWN0IGt2bV9nbWVtICpnbWVtOwo+ID4gKwo+ID4gKyAgICAgICAv Kgo+ID4gKyAgICAgICAgKiBCaW5kaW5ncyBtdXN0IHN0YWJsZSBhY3Jvc3MgaW52YWxpZGF0aW9u IHRvIGVuc3VyZSB0aGUgc3RhcnQrZW5kCj4gCj4gbml0OiBCaW5kaW5ncyBtdXN0IF9iZS9zdGF5 P18gc3RhYmxlCgoiYmUiIGlzIHdoYXQncyBpbnRlbmRlZC4KCj4gLi4uCj4gCj4gPiBkaWZmIC0t Z2l0IGEvdmlydC9rdm0va3ZtX21haW4uYyBiL3ZpcnQva3ZtL2t2bV9tYWluLmMKPiA+IGluZGV4 
IDc4YTBiMDllZjJhNS4uNWQxYTJmMWI0ZTk0IDEwMDY0NAo+ID4gLS0tIGEvdmlydC9rdm0va3Zt X21haW4uYwo+ID4gKysrIGIvdmlydC9rdm0va3ZtX21haW4uYwo+ID4gQEAgLTc5OCw3ICs3OTgs NyBAQCB2b2lkIGt2bV9tbXVfaW52YWxpZGF0ZV9yYW5nZV9hZGQoc3RydWN0IGt2bSAqa3ZtLCBn Zm5fdCBzdGFydCwgZ2ZuX3QgZW5kKQo+ID4gICAgICAgICB9Cj4gPiAgfQo+ID4KPiA+IC1zdGF0 aWMgYm9vbCBrdm1fbW11X3VubWFwX2dmbl9yYW5nZShzdHJ1Y3Qga3ZtICprdm0sIHN0cnVjdCBr dm1fZ2ZuX3JhbmdlICpyYW5nZSkKPiA+ICtib29sIGt2bV9tbXVfdW5tYXBfZ2ZuX3JhbmdlKHN0 cnVjdCBrdm0gKmt2bSwgc3RydWN0IGt2bV9nZm5fcmFuZ2UgKnJhbmdlKQo+ID4gIHsKPiA+ICAg ICAgICAga3ZtX21tdV9pbnZhbGlkYXRlX3JhbmdlX2FkZChrdm0sIHJhbmdlLT5zdGFydCwgcmFu Z2UtPmVuZCk7Cj4gPiAgICAgICAgIHJldHVybiBrdm1fdW5tYXBfZ2ZuX3JhbmdlKGt2bSwgcmFu Z2UpOwo+ID4gQEAgLTEwMzQsNiArMTAzNCw5IEBAIHN0YXRpYyB2b2lkIGt2bV9kZXN0cm95X2Rp cnR5X2JpdG1hcChzdHJ1Y3Qga3ZtX21lbW9yeV9zbG90ICptZW1zbG90KQo+ID4gIC8qIFRoaXMg ZG9lcyBub3QgcmVtb3ZlIHRoZSBzbG90IGZyb20gc3RydWN0IGt2bV9tZW1zbG90cyBkYXRhIHN0 cnVjdHVyZXMgKi8KPiA+ICBzdGF0aWMgdm9pZCBrdm1fZnJlZV9tZW1zbG90KHN0cnVjdCBrdm0g Kmt2bSwgc3RydWN0IGt2bV9tZW1vcnlfc2xvdCAqc2xvdCkKPiA+ICB7Cj4gPiArICAgICAgIGlm IChzbG90LT5mbGFncyAmIEtWTV9NRU1fUFJJVkFURSkKPiA+ICsgICAgICAgICAgICAgICBrdm1f Z21lbV91bmJpbmQoc2xvdCk7Cj4gPiArCj4gCj4gU2hvdWxkIHRoaXMgYmUgY2FsbGVkIGFmdGVy IGt2bV9hcmNoX2ZyZWVfbWVtc2xvdCgpPyBBcmNoLXNwZWNpZmljIG9kZQo+IG1pZ2h0IG5lZWQg c29tZSBvZiB0aGUgZGF0YSBiZWZvcmUgdGhlIHVuYmluZGluZywgc29tZXRoaW5nIEkgdGhvdWdo dAo+IG1pZ2h0IGJlIG5lY2Vzc2FyeSBhdCBvbmUgcG9pbnQgZm9yIHRoZSBwS1ZNIHBvcnQgd2hl biBkZWxldGluZyBhCj4gbWVtc2xvdCwgYnV0IHJlYWxpemVkIGxhdGVyIHRoYXQga3ZtX2ludmFs aWRhdGVfbWVtc2xvdCgpIC0+Cj4ga3ZtX2FyY2hfZ3Vlc3RfbWVtb3J5X3JlY2xhaW1lZCgpIHdh cyB0aGUgbW9yZSBsb2dpY2FsIHBsYWNlIGZvciBpdC4KPiBBbHNvLCBzaW5jZSB0aGF0IHNlZW1z IHRvIGJlIHRoZSBwYXR0ZXJuIGZvciBhcmNoLXNwZWNpZmljIGhhbmRsZXJzIGluCj4gS1ZNLgoK TWF5YmU/ICBCdXQgb25seSBpZiB3ZSBjYW4gYWJvdXQgc3ltbWV0cnkgYmV0d2VlbiB0aGUgYWxs b2NhdGlvbiBhbmQgZnJlZSBwYXRocwpJIHJlYWxseSBkb24ndCB0aGluayBrdm1fYXJjaF9mcmVl 
X21lbXNsb3QoKSBzaG91bGQgYmUgZG9pbmcgYW55dGhpbmcgYmV5b25kIGEKInB1cmUiIGZyZWUu ICBFLmcuIGt2bV9hcmNoX2ZyZWVfbWVtc2xvdCgpIGlzIGFsc28gY2FsbGVkIGFmdGVyIG1vdmlu ZyBhIG1lbXNsb3QsCndoaWNoIGhvcGVmdWxseSB3ZSBuZXZlciBhY3R1YWxseSBoYXZlIHRvIGFs bG93IGZvciBndWVzdF9tZW1mZCwgYnV0IGFueSBjb2RlIGluCmt2bV9hcmNoX2ZyZWVfbWVtc2xv dCgpIHdvdWxkIGJyaW5nIGFib3V0ICJ3aGF0IGlmIiBxdWVzdGlvbnMgcmVnYXJkaW5nIG1lbXNs b3QKbW92ZW1lbnQuICBJLmUuIHRoZSBBUEkgaXMgaW50ZW5kZWQgdG8gYmUgYSAiZnJlZSBhcmNo IG1ldGFkYXRhIGFzc29jaWF0ZWQgd2l0aAp0aGUgbWVtc2xvdCIuCgpPdXQgb2YgY3VyaW9zaXR5 LCB3aGF0IGRvZXMgcEtWTSBuZWVkIHRvIGRvIGF0IGt2bV9hcmNoX2d1ZXN0X21lbW9yeV9yZWNs YWltZWQoKT8KCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f CmxpbnV4LXJpc2N2IG1haWxpbmcgbGlzdApsaW51eC1yaXNjdkBsaXN0cy5pbmZyYWRlYWQub3Jn Cmh0dHA6Ly9saXN0cy5pbmZyYWRlYWQub3JnL21haWxtYW4vbGlzdGluZm8vbGludXgtcmlzY3YK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5702FC4332F for ; Tue, 31 Oct 2023 22:14:20 +0000 (UTC) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20230601 header.b=dGdpMxr6; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4SKkvH07BQz3cVD for ; Wed, 1 Nov 2023 09:14:19 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20230601 header.b=dGdpMxr6; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=flex--seanjc.bounces.google.com 
(client-ip=2607:f8b0:4864:20::b4a; helo=mail-yb1-xb4a.google.com; envelope-from=3bxxbzqykddqiuqdzsweewbu.secbydknffs-tulbyiji.epbqri.ehw@flex--seanjc.bounces.google.com; receiver=lists.ozlabs.org) Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4SKktL4Lwpz3bgs for ; Wed, 1 Nov 2023 09:13:28 +1100 (AEDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d99ec34829aso5939359276.1 for ; Tue, 31 Oct 2023 15:13:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698790405; x=1699395205; darn=lists.ozlabs.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=zs8biX5N196Q/2fqRFnHuCbiMBjZx3eWIMZhT9G5lkM=; b=dGdpMxr6hcFQi9/W2c6avefwcWzt3zechUGTcbzfPEbg9aQfbWXJ+P3If45mhiVIpq XjdAh+s96L9hZStMosp9HzJPzvlEbGL0hgxiDTIBNIOPRBx1XcomYQ3IS6At92VOsWcq VWz+Z3Vf4IOB+u8UYYG0MnPar1LLtIFuyu5TkzxUDlGQi3HFKY2rcgIH/ISJ8eizEfCW 4Qy/TAWiZBK7pV30FOnJRiWENU1+HOe253zwOwCf/nc03RDi6wxWmJNz/iwh0vEIecWl LI0m9BFtlXG+cqi4Rmw6OmqpzivUvzkWoaVAhsnpn85rjgQG7OG2u+MdYyw11m4BaVnI 12vQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698790405; x=1699395205; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=zs8biX5N196Q/2fqRFnHuCbiMBjZx3eWIMZhT9G5lkM=; b=j4BOa6ZEHZQaGwn3UbKqmRXmuFG7v4HEzV0lyDiMGRSJg24HdA80AgjnzDqoW7XT8L S0UIpONrd3WECeHHFAZqB8M3eN6/IfM7nH+Y0S85iNBDMPIHb2qg73GUbpGnteQGQa8d eOounFhZvHgR9/FWhV06YFbMrZlJmzOPeQl3yqWmcp7SieT+xEchkCEX0mIHkOL+0MYQ L905DbzFnneSgKfjbrsqWPTrZzmPpBuUl/5/KinmPGwA0/EIpDZPSvq6zDcKiwC+uuSI 
/OpvRCRD4ZPSe7Kv53+fT4EN658bACilRjOEVl8s0Ux28cdb3Red4KHwSfWGsJbtLEkh 4v0Q== X-Gm-Message-State: AOJu0YyNW1zitwoJmS/+AuRzVzwQvybvbaHqRPtCDk/wooMiDGudzwtG uXW1IWIue2bcXehgBM0QV/LQb2icNqA= X-Google-Smtp-Source: AGHT+IFQSb3cO99m4kmVkxpYDSR1DbV1n07kMYxpUGk3r7RnUdGfKSLZ2dTeiYGKN4sKIS0uvhtR25tzD1o= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:1746:b0:d9a:59cb:8bed with SMTP id bz6-20020a056902174600b00d9a59cb8bedmr238072ybb.5.1698790405257; Tue, 31 Oct 2023 15:13:25 -0700 (PDT) Date: Tue, 31 Oct 2023 15:13:23 -0700 In-Reply-To: Mime-Version: 1.0 References: <20231027182217.3615211-1-seanjc@google.com> <20231027182217.3615211-17-seanjc@google.com> Message-ID: Subject: Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory From: Sean Christopherson To: Fuad Tabba Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kvm@vger.kernel.org, David Hildenbrand , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chao Peng , linux-riscv@lists.infradead.org, Isaku Yamahata , Marc Zyngier , Huacai Chen , Xiaoyao Li , "Matthew Wilcox \(Oracle\)" , Wang , Vlastimil Babka , Yu Zhang , Maciej Szmigiero , Albert Ou , Michael Roth , Ackerley Tng , Alexander Viro , Paul Walmsley , kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, =?utf-8?Q?Micka=C3=ABl_Sala=C3=BCn?= , Isaku Yamahata , Christian Brauner , Quentin Perret , Liam Merwick , linux-mips@vger.kernel.org, Oliver Upton , David Matlack , Jarkko Sakkinen , Palmer Dabbelt , "Kirill A . 
Shutemov" , kvm-riscv@lists.infradead.org, Anup Patel , linux-fsdevel@vger.kernel.org, Paolo Bonzini , Andrew Morton , Vishal Annapurve , linuxppc-dev@lists.ozlabs.org, Xu Yilun , Anish Moorthy
Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org
Sender: "Linuxppc-dev"

On Tue, Oct 31, 2023, Fuad Tabba wrote:
> Hi,
>
> On Fri, Oct 27, 2023 at 7:23 PM Sean Christopherson <seanjc@google.com> wrote:
>
> ...
>
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index e2252c748fd6..e82c69d5e755 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6079,6 +6079,15 @@ applied.
> >  :Parameters: struct kvm_userspace_memory_region2 (in)
> >  :Returns: 0 on success, -1 on error
> >
> > +KVM_SET_USER_MEMORY_REGION2 is an extension to KVM_SET_USER_MEMORY_REGION that
> > +allows mapping guest_memfd memory into a guest.  All fields shared with
> > +KVM_SET_USER_MEMORY_REGION identically.  Userspace can set KVM_MEM_PRIVATE in
> > +flags to have KVM bind the memory region to a given guest_memfd range of
> > +[guest_memfd_offset, guest_memfd_offset + memory_size].  The target guest_memfd
> > +must point at a file created via KVM_CREATE_GUEST_MEMFD on the current VM, and
> > +the target range must not be bound to any other memory region.  All standard
> > +bounds checks apply (use common sense).
> > +
>
> Bikeshedding here: Not sure if KVM_MEM_PRIVATE is the best name for
> this. It gets confusing with KVM_MEMORY_ATTRIBUTE_PRIVATE, i.e., that
> a region marked as KVM_MEM_PRIVATE is only potentially private. It did
> confuse the rest of the team when I walked them through a previous
> version of this code once. Would something like KVM_MEM_GUESTMEM make
> more sense?

Heh, deja vu.  We discussed this back in v7[*], and I came to the conclusion that
choosing a name that wasn't explicitly tied to private memory wasn't justified.
But that was before a KVM-owned guest_memfd was even an idea, and thus before we
had anything close to a real use case.

Since we now know that at least pKVM will use guest_memfd for shared memory, and
odds are quite good that "regular" VMs will also do the same, i.e. will want
guest_memfd without the concept of private memory, I agree that we should avoid
PRIVATE.

Though I vote for KVM_MEM_GUEST_MEMFD (or KVM_MEM_GUEST_MEMFD_VALID or
KVM_MEM_USE_GUEST_MEMFD).  I.e. do our best to avoid ambiguity between referring
to "guest memory" at-large and guest_memfd.

Copying a few relevant points from v7 to save a click or three.

 : I don't have a concrete use case (this is a recent idea on my end), but since we're
 : already adding fd-based memory, I can't think of a good reason not to make it more generic
 : for not much extra cost.  And there are definitely classes of VMs for which fd-based
 : memory would Just Work, e.g. large VMs that are never oversubscribed on memory don't
 : need to support reclaim, so the fact that fd-based memslots won't support page aging
 : (among other things) right away is a non-issue.

...

 : Hrm, but basing private memory on top of a generic FD_VALID would effectively require
 : shared memory to use hva-based memslots for confidential VMs.  That'd yield a very
 : weird API, e.g. non-confidential VMs could be backed entirely by fd-based memslots,
 : but confidential VMs would be forced to use hva-based memslots.
 :
 : Ignore this idea for now.  If there's an actual use case for generic fd-based memory
 : then we'll want a separate flag, fd, and offset, i.e. that support could be added
 : independent of KVM_MEM_PRIVATE.

...

 : One alternative would be to call it KVM_MEM_PROTECTED.  That shouldn't cause
 : problems for the known use of "private" (TDX and SNP), and it gives us a little
 : wiggle room, e.g. if we ever get a use case where VMs can share memory that is
 : otherwise protected.
 :
 : That's a pretty big "if" though, and odds are good we'd need more memslot flags and
 : fd+offset pairs to allow differentiating "private" vs. "protected-shared" without
 : forcing userspace to punch holes in memslots, so I don't know that hedging now will
 : buy us anything.
 :
 : So I'd say that if people think KVM_MEM_PRIVATE brings additional and meaningful
 : clarity over KVM_MEM_PROTECTED, then let's go with PRIVATE.  But if PROTECTED is
 : just as good, go with PROTECTED as it gives us a wee bit of wiggle room for the
 : future.

[*] https://lore.kernel.org/all/Yuh0ikhoh+tCK6VW@google.com

> > -See KVM_SET_USER_MEMORY_REGION.
> > +A KVM_MEM_PRIVATE region _must_ have a valid guest_memfd (private memory) and
> > +userspace_addr (shared memory).  However, "valid" for userspace_addr simply
> > +means that the address itself must be a legal userspace address.  The backing
> > +mapping for userspace_addr is not required to be valid/populated at the time of
> > +KVM_SET_USER_MEMORY_REGION2, e.g. shared memory can be lazily mapped/allocated
> > +on-demand.
>
> Regarding requiring that a private region have both a valid
> guest_memfd and a userspace_addr, should this be
> implementation-specific? In pKVM at least, all regions for protected
> VMs are private, and KVM doesn't care about the host userspace address
> for those regions even when part of the memory is shared.

Hmm, as of this patch, no, because the pKVM usage doesn't exist.  This literally
documents the current ABI.

> > +When mapping a gfn into the guest, KVM selects shared vs. private, i.e consumes
> > +userspace_addr vs. guest_memfd, based on the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE
> > +state.  At VM creation time, all memory is shared, i.e. the PRIVATE attribute
> > +is '0' for all gfns.  Userspace can control whether memory is shared/private by
> > +toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed.
>
> In pKVM, guest memory is private by default, and most of it will
> remain so for the lifetime of the VM. Userspace could explicitly mark
> all the guest's memory as private at initialization, but it would save
> a slight amount of work. That said, I understand that it might be
> better to be consistent across implementations.

Yeah, we discussed this in v12[*].  The default really doesn't matter for memory
overheads or performance once KVM supports range-based xarray entries, and if
that isn't sufficient, KVM can internally invert the polarity of PRIVATE.

But for the ABI, I think we put a stake in the ground and say that all memory is
shared by default.  That way CoCo VMs and regular VMs (i.e. VMs without the
concept of private memory) all have the same ABI.  Practically speaking, the cost
to pKVM (and likely every other CoCo VM type) is a single ioctl() during VM
creation to "convert" all memory to private.

[*] https://lore.kernel.org/all/ZRw6X2BptZnRPNK7@google.com

> > --- /dev/null
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -0,0 +1,548 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/backing-dev.h>
> > +#include <linux/falloc.h>
> > +#include <linux/kvm_host.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/anon_inodes.h>
>
> nit: should this include be first (to maintain alphabetical ordering
> of the includes)?

Heh, yeah.  I would argue this isn't a nit though ;-)

> > +static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > +       struct list_head *gmem_list = &inode->i_mapping->private_list;
> > +       pgoff_t start = offset >> PAGE_SHIFT;
> > +       pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > +       struct kvm_gmem *gmem;
> > +
> > +       /*
> > +        * Bindings must stable across invalidation to ensure the start+end
>
> nit: Bindings must _be/stay?_ stable

"be" is what's intended.

> ...
>
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 78a0b09ef2a5..5d1a2f1b4e94 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -798,7 +798,7 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
> >         }
> >  }
> >
> > -static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > +bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >         kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
> >         return kvm_unmap_gfn_range(kvm, range);
> > @@ -1034,6 +1034,9 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> >  /* This does not remove the slot from struct kvm_memslots data structures */
> >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >  {
> > +       if (slot->flags & KVM_MEM_PRIVATE)
> > +               kvm_gmem_unbind(slot);
> > +
>
> Should this be called after kvm_arch_free_memslot()? Arch-specific code
> might need some of the data before the unbinding, something I thought
> might be necessary at one point for the pKVM port when deleting a
> memslot, but realized later that kvm_invalidate_memslot() ->
> kvm_arch_guest_memory_reclaimed() was the more logical place for it.
> Also, since that seems to be the pattern for arch-specific handlers in
> KVM.

Maybe?  But only if we care about symmetry between the allocation and free paths.
I really don't think kvm_arch_free_memslot() should be doing anything beyond a
"pure" free.  E.g. kvm_arch_free_memslot() is also called after moving a memslot,
which hopefully we never actually have to allow for guest_memfd, but any code in
kvm_arch_free_memslot() would bring about "what if" questions regarding memslot
movement.  I.e. the API is intended to be a "free arch metadata associated with
the memslot".

Out of curiosity, what does pKVM need to do at kvm_arch_guest_memory_reclaimed()?
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 711C0C4332F for ; Tue, 31 Oct 2023 22:13:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID: References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=Yts+ocgrZCz37H19sdK/e0Ef+jqp7A5J9bY51+OxwZk=; b=c5GrUsw4VnwM6warv5SO0nvd6s h4jzdOk1pOqWmxnhfGgWlHE4lwGmuMN/ygwSdwAzKmaQgRjbfNCvQtDXaMzg+i+gLLReuKWbB8xVk 8WquFw10JdJR7UmVlFOw2Y8iCzpYm/TQ563u26EMsoiKSopO1pb1MMrSurHRiPeMIIylDNdpvpnQZ FbCoCr09Na19N6n4nEwWCDW3VAb34kk9JsMomrPhX5wTK0H0CmDHud3tgsmwdtgVd4AhySOMBD1JQ YbsEa3hMbrkTXtnaI4Hp4lWvEZr+FB5aHsCXWAMyIs6TbeIqS5Mf5zWfwXtqcFjFGNFeEob7EBW6/ wMno9CLw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1qxwzN-006D5s-13; Tue, 31 Oct 2023 22:13:33 +0000 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1qxwzK-006D3q-0E for linux-arm-kernel@lists.infradead.org; Tue, 31 Oct 2023 22:13:32 +0000 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d9a5a3f2d4fso5940335276.3 for ; Tue, 31 Oct 2023 15:13:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698790405; x=1699395205; darn=lists.infradead.org; 
h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=zs8biX5N196Q/2fqRFnHuCbiMBjZx3eWIMZhT9G5lkM=; b=u6N15J7ExppM0L24ku08thKF0urKpg55RLqbIw3D7pg5ivLfxql17sUdwsEN+7L26y sKi4iWz1tWD80ouKwHoGWMRvd79BgYXQbHoKGiuDZRilTh1Pvi5WfLhvBs/jqMIXgG0V vFUosB2OgYUCvqi5tbwT/SQ/R34hX9tEpueF3pm7neP/CBgd9OQdz2e6MjzX+f6RsUsV gAo54Tltf4h1+9Xg9NYEITsZr9vVdCoS1wQdqDx3jTvUccRMWQOBsFrXn8WQFNqllYeN G07MOOm8A+t4eAlN8yFx9kg9I+qSvapD42U5dgNN2sdDfmu/ngpQk5oEaFLftiMUIwqo +yhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698790405; x=1699395205; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=zs8biX5N196Q/2fqRFnHuCbiMBjZx3eWIMZhT9G5lkM=; b=emLL2tibxvhDfb+AT0/8Po6P9Ou9flpaQYUJU0QX7NMbUDs/36Hz/1786+PAFqqzX8 N+95TmkTxz1TX9EMSqd41vxBMR6TRIUeohB681iZcYPQhQMIOsJZgN9Dak2DQ92mxCzu Lt/ao/sHsDiJWdlP5n1rRYxsJlhfjRmf8McQCXNCVY91eqyIY/cg7VAwKgowNss2/2sO Tw1NcBR+dSEoyV6z2L4pclJ+QPZiALTeIQl+DjwtF2p1cetywvjMc2E7vdR5MbeIarSh poh6fqFmd5+BnHyAb7qiJbwtq3uLl/4XGHu/JqEwuwWCH1i6E+sPn8VlUTkFKa0iVlJX JH8Q== X-Gm-Message-State: AOJu0YyoWhfs20J+9UBw6kig+f70GRm6vsFq0O4ntvUub8+uIBKSwlo/ fpsNntIM66Kg/K/F2vGugOFW+uao7os= X-Google-Smtp-Source: AGHT+IFQSb3cO99m4kmVkxpYDSR1DbV1n07kMYxpUGk3r7RnUdGfKSLZ2dTeiYGKN4sKIS0uvhtR25tzD1o= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:1746:b0:d9a:59cb:8bed with SMTP id bz6-20020a056902174600b00d9a59cb8bedmr238072ybb.5.1698790405257; Tue, 31 Oct 2023 15:13:25 -0700 (PDT) Date: Tue, 31 Oct 2023 15:13:23 -0700 In-Reply-To: Mime-Version: 1.0 References: <20231027182217.3615211-1-seanjc@google.com> <20231027182217.3615211-17-seanjc@google.com> Message-ID: Subject: Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory 
From: Sean Christopherson To: Fuad Tabba Cc: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Alexander Viro , Christian Brauner , "Matthew Wilcox (Oracle)" , Andrew Morton , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li , Xu Yilun , Chao Peng , Jarkko Sakkinen , Anish Moorthy , David Matlack , Yu Zhang , Isaku Yamahata , "Mickaël Salaün" , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231031_151330_113683_CABF1A45 X-CRM114-Status: GOOD ( 47.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
aXR5IG9mIFBSSVZBVEUuCgpCdXQgZm9yIHRoZSBBQkksIEkgdGhpbmsgd2UgcHV0IGEgc3Rha2Ug aW4gdGhlIGdyb3VuZCBhbmQgc2F5IHRoYXQgYWxsIG1lbW9yeSBpcwpzaGFyZWQgYnkgZGVmYXVs dC4gIFRoYXQgd2F5IENvQ28gVk1zIGFuZCByZWd1bGFyIFZNcyAoaS5lIFZNcyB3aXRob3V0IHRo ZSBjb25jZXB0Cm9mIHByaXZhdGUgbWVtb3J5KSBhbGwgaGF2ZSB0aGUgc2FtZSBBQkkuICBQcmFj dGljYWxseSBzcGVha2luZywgdGhlIGNvc3QgdG8gcEtWTQooYW5kIGxpa2VseSBldmVyeSBvdGhl ciBDb0NvIFZNIHR5cGUpIGlzIGEgc2luZ2xlIGlvY3RsKCkgZHVyaW5nIFZNIGNyZWF0aW9uIHRv CiJjb252ZXJ0IiBhbGwgbWVtb3J5IHRvIHByaXZhdGUuCgpbKl0gaHR0cHM6Ly9sb3JlLmtlcm5l bC5vcmcvYWxsL1pSdzZYMkJwdFpuUlBOSzdAZ29vZ2xlLmNvbQoKPiA+IC0tLSAvZGV2L251bGwK PiA+ICsrKyBiL3ZpcnQva3ZtL2d1ZXN0X21lbWZkLmMKPiA+IEBAIC0wLDAgKzEsNTQ4IEBACj4g PiArLy8gU1BEWC1MaWNlbnNlLUlkZW50aWZpZXI6IEdQTC0yLjAKPiA+ICsjaW5jbHVkZSA8bGlu dXgvYmFja2luZy1kZXYuaD4KPiA+ICsjaW5jbHVkZSA8bGludXgvZmFsbG9jLmg+Cj4gPiArI2lu Y2x1ZGUgPGxpbnV4L2t2bV9ob3N0Lmg+Cj4gPiArI2luY2x1ZGUgPGxpbnV4L3BhZ2VtYXAuaD4K PiA+ICsjaW5jbHVkZSA8bGludXgvYW5vbl9pbm9kZXMuaD4KPiAKPiBuaXQ6IHNob3VsZCB0aGlz IGluY2x1ZGUgYmUgZmlyc3QgKHRvIG1haW50YWluIGFscGhhYmV0aWNhbCBvcmRlcmluZwo+IG9m IHRoZSBpbmNsdWRlcyk/CgpIZWgsIHllYWguICBJIHdvdWxkIGFyZ3VlIHRoaXMgaXNuJ3QgYSBu aXQgdGhvdWdoIDstKQoKPiA+ICtzdGF0aWMgbG9uZyBrdm1fZ21lbV9wdW5jaF9ob2xlKHN0cnVj dCBpbm9kZSAqaW5vZGUsIGxvZmZfdCBvZmZzZXQsIGxvZmZfdCBsZW4pCj4gPiArewo+ID4gKyAg ICAgICBzdHJ1Y3QgbGlzdF9oZWFkICpnbWVtX2xpc3QgPSAmaW5vZGUtPmlfbWFwcGluZy0+cHJp dmF0ZV9saXN0Owo+ID4gKyAgICAgICBwZ29mZl90IHN0YXJ0ID0gb2Zmc2V0ID4+IFBBR0VfU0hJ RlQ7Cj4gPiArICAgICAgIHBnb2ZmX3QgZW5kID0gKG9mZnNldCArIGxlbikgPj4gUEFHRV9TSElG VDsKPiA+ICsgICAgICAgc3RydWN0IGt2bV9nbWVtICpnbWVtOwo+ID4gKwo+ID4gKyAgICAgICAv Kgo+ID4gKyAgICAgICAgKiBCaW5kaW5ncyBtdXN0IHN0YWJsZSBhY3Jvc3MgaW52YWxpZGF0aW9u IHRvIGVuc3VyZSB0aGUgc3RhcnQrZW5kCj4gCj4gbml0OiBCaW5kaW5ncyBtdXN0IF9iZS9zdGF5 P18gc3RhYmxlCgoiYmUiIGlzIHdoYXQncyBpbnRlbmRlZC4KCj4gLi4uCj4gCj4gPiBkaWZmIC0t Z2l0IGEvdmlydC9rdm0va3ZtX21haW4uYyBiL3ZpcnQva3ZtL2t2bV9tYWluLmMKPiA+IGluZGV4 
IDc4YTBiMDllZjJhNS4uNWQxYTJmMWI0ZTk0IDEwMDY0NAo+ID4gLS0tIGEvdmlydC9rdm0va3Zt X21haW4uYwo+ID4gKysrIGIvdmlydC9rdm0va3ZtX21haW4uYwo+ID4gQEAgLTc5OCw3ICs3OTgs NyBAQCB2b2lkIGt2bV9tbXVfaW52YWxpZGF0ZV9yYW5nZV9hZGQoc3RydWN0IGt2bSAqa3ZtLCBn Zm5fdCBzdGFydCwgZ2ZuX3QgZW5kKQo+ID4gICAgICAgICB9Cj4gPiAgfQo+ID4KPiA+IC1zdGF0 aWMgYm9vbCBrdm1fbW11X3VubWFwX2dmbl9yYW5nZShzdHJ1Y3Qga3ZtICprdm0sIHN0cnVjdCBr dm1fZ2ZuX3JhbmdlICpyYW5nZSkKPiA+ICtib29sIGt2bV9tbXVfdW5tYXBfZ2ZuX3JhbmdlKHN0 cnVjdCBrdm0gKmt2bSwgc3RydWN0IGt2bV9nZm5fcmFuZ2UgKnJhbmdlKQo+ID4gIHsKPiA+ICAg ICAgICAga3ZtX21tdV9pbnZhbGlkYXRlX3JhbmdlX2FkZChrdm0sIHJhbmdlLT5zdGFydCwgcmFu Z2UtPmVuZCk7Cj4gPiAgICAgICAgIHJldHVybiBrdm1fdW5tYXBfZ2ZuX3JhbmdlKGt2bSwgcmFu Z2UpOwo+ID4gQEAgLTEwMzQsNiArMTAzNCw5IEBAIHN0YXRpYyB2b2lkIGt2bV9kZXN0cm95X2Rp cnR5X2JpdG1hcChzdHJ1Y3Qga3ZtX21lbW9yeV9zbG90ICptZW1zbG90KQo+ID4gIC8qIFRoaXMg ZG9lcyBub3QgcmVtb3ZlIHRoZSBzbG90IGZyb20gc3RydWN0IGt2bV9tZW1zbG90cyBkYXRhIHN0 cnVjdHVyZXMgKi8KPiA+ICBzdGF0aWMgdm9pZCBrdm1fZnJlZV9tZW1zbG90KHN0cnVjdCBrdm0g Kmt2bSwgc3RydWN0IGt2bV9tZW1vcnlfc2xvdCAqc2xvdCkKPiA+ICB7Cj4gPiArICAgICAgIGlm IChzbG90LT5mbGFncyAmIEtWTV9NRU1fUFJJVkFURSkKPiA+ICsgICAgICAgICAgICAgICBrdm1f Z21lbV91bmJpbmQoc2xvdCk7Cj4gPiArCj4gCj4gU2hvdWxkIHRoaXMgYmUgY2FsbGVkIGFmdGVy IGt2bV9hcmNoX2ZyZWVfbWVtc2xvdCgpPyBBcmNoLXNwZWNpZmljIG9kZQo+IG1pZ2h0IG5lZWQg c29tZSBvZiB0aGUgZGF0YSBiZWZvcmUgdGhlIHVuYmluZGluZywgc29tZXRoaW5nIEkgdGhvdWdo dAo+IG1pZ2h0IGJlIG5lY2Vzc2FyeSBhdCBvbmUgcG9pbnQgZm9yIHRoZSBwS1ZNIHBvcnQgd2hl biBkZWxldGluZyBhCj4gbWVtc2xvdCwgYnV0IHJlYWxpemVkIGxhdGVyIHRoYXQga3ZtX2ludmFs aWRhdGVfbWVtc2xvdCgpIC0+Cj4ga3ZtX2FyY2hfZ3Vlc3RfbWVtb3J5X3JlY2xhaW1lZCgpIHdh cyB0aGUgbW9yZSBsb2dpY2FsIHBsYWNlIGZvciBpdC4KPiBBbHNvLCBzaW5jZSB0aGF0IHNlZW1z IHRvIGJlIHRoZSBwYXR0ZXJuIGZvciBhcmNoLXNwZWNpZmljIGhhbmRsZXJzIGluCj4gS1ZNLgoK TWF5YmU/ICBCdXQgb25seSBpZiB3ZSBjYW4gYWJvdXQgc3ltbWV0cnkgYmV0d2VlbiB0aGUgYWxs b2NhdGlvbiBhbmQgZnJlZSBwYXRocwpJIHJlYWxseSBkb24ndCB0aGluayBrdm1fYXJjaF9mcmVl 
X21lbXNsb3QoKSBzaG91bGQgYmUgZG9pbmcgYW55dGhpbmcgYmV5b25kIGEKInB1cmUiIGZyZWUu ICBFLmcuIGt2bV9hcmNoX2ZyZWVfbWVtc2xvdCgpIGlzIGFsc28gY2FsbGVkIGFmdGVyIG1vdmlu ZyBhIG1lbXNsb3QsCndoaWNoIGhvcGVmdWxseSB3ZSBuZXZlciBhY3R1YWxseSBoYXZlIHRvIGFs bG93IGZvciBndWVzdF9tZW1mZCwgYnV0IGFueSBjb2RlIGluCmt2bV9hcmNoX2ZyZWVfbWVtc2xv dCgpIHdvdWxkIGJyaW5nIGFib3V0ICJ3aGF0IGlmIiBxdWVzdGlvbnMgcmVnYXJkaW5nIG1lbXNs b3QKbW92ZW1lbnQuICBJLmUuIHRoZSBBUEkgaXMgaW50ZW5kZWQgdG8gYmUgYSAiZnJlZSBhcmNo IG1ldGFkYXRhIGFzc29jaWF0ZWQgd2l0aAp0aGUgbWVtc2xvdCIuCgpPdXQgb2YgY3VyaW9zaXR5 LCB3aGF0IGRvZXMgcEtWTSBuZWVkIHRvIGRvIGF0IGt2bV9hcmNoX2d1ZXN0X21lbW9yeV9yZWNs YWltZWQoKT8KCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f CmxpbnV4LWFybS1rZXJuZWwgbWFpbGluZyBsaXN0CmxpbnV4LWFybS1rZXJuZWxAbGlzdHMuaW5m cmFkZWFkLm9yZwpodHRwOi8vbGlzdHMuaW5mcmFkZWFkLm9yZy9tYWlsbWFuL2xpc3RpbmZvL2xp bnV4LWFybS1rZXJuZWwK